If you work with translators or translation companies, you may have heard the acronym CAT (amongst many others). But why would we be talking about family pets during a translation project, you may ask? Allow me to explain what kind of ‘CATs’ we actually mean and how they help with your translation assignments.
In translation, CAT stands for…
CAT, or Computer-Assisted Translation, is an umbrella term for the technology used by freelance translators, translation companies and large global corporations on translation projects. Because of how this technology works, you might also hear it referred to as TM (Translation Memory) or even TENT (Translation ENvironment Tool) technology. Various companies offer their own brands of this technology; the one we use here at Alexika is Trados Studio 2021.
So, what does a CAT tool do?
This is all very interesting background, but you are probably still wondering what a CAT tool actually does. A very simple way to explain its main use is that a CAT tool replicates the function of the translator’s memory whilst translating.
Let’s take a phrase, keeping it simple for the purposes of demonstration: ‘The cat is in the garden’ (this time I do mean the animal variety). Say this phrase was translated in one document. A few months later, that translation is possibly a distant memory for the translator, who has done lots of other translations since then. But the same phrase, ‘The cat is in the garden’, now reappears in a new document. The Translation Memory will recognise that this phrase has been translated before and will suggest the previous translation, indicating that the phrase in the current document is an exact match (also known as a 100% match).
Pretty nifty. This is why I like to think of it, in very simple terms, as replicating the human brain’s own processes; only, of course, a machine’s memory capacity is far greater than a human’s. It can suggest translations done many years ago that the brain alone might struggle to recall.
What’s even niftier is that the TM will also return lower-percentage matches. What does that mean? Well, sticking with our scenario above, say you have a further document containing a similar phrase, ‘The dog is in the garden’, where the word ‘cat’ has been replaced with ‘dog’ but the rest of the sentence is the same as before.
A CAT tool will suggest what’s called a ‘fuzzy match’. This means that the source sentence in the current file partially matches a source sentence stored in the TM. The tool also assigns a percentage value to show how similar the two are (fuzzy matches, or ‘fuzzies’, typically range from around 70% to 99%). The phrase in our example amounts to roughly an 89% fuzzy match (note that match values can vary slightly between tool manufacturers).
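Here is a minimal Python sketch that makes the idea of a percentage score more concrete. It assumes a simple character-based similarity measure (the standard library’s difflib.SequenceMatcher) and a hypothetical 70% cut-off taken from the range above. Commercial tools such as Trados Studio use their own scoring, so their figures will differ.

```python
from difflib import SequenceMatcher

def fuzzy_score(new_source: str, tm_source: str) -> int:
    """Rough 0-100 similarity between a new source segment and a TM source segment."""
    # 100 is reserved for identical text, as with a true exact match.
    if new_source == tm_source:
        return 100
    # Assumed character-based ratio; real CAT tools use their own scoring.
    ratio = SequenceMatcher(None, new_source, tm_source).ratio()
    return min(99, round(ratio * 100))

def classify(score: int, fuzzy_threshold: int = 70) -> str:
    """Label a score, using an assumed 70% lower bound for fuzzy matches."""
    if score == 100:
        return "exact match"
    if score >= fuzzy_threshold:
        return "fuzzy match"
    return "no match"

score = fuzzy_score("The dog is in the garden", "The cat is in the garden")
print(score, classify(score))  # 88 fuzzy match
```

With this measure, the ‘dog’ sentence scores 88%. That is close to the 89% quoted above but not identical, which illustrates why match values vary from tool to tool.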
How do CAT tools return matches?
Put simply, the source language sentence (‘source segment’) is saved in the Translation Memory along with the target language sentence (‘target segment’) to create what is known as a ‘translation unit’. Every time a segment is translated and saved to the TM, the TM gets bigger, creating what is essentially a bilingual database of all previously translated work.
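A translation unit and the growing TM can be modelled in a few lines. The class names below (TranslationUnit, TranslationMemory) are purely illustrative and do not belong to any real CAT tool. The German targets are simply example translations.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class TranslationUnit:
    """A source segment saved together with its target segment."""
    source: str
    target: str

@dataclass
class TranslationMemory:
    """A toy TM: a bilingual store that grows with every saved segment."""
    name: str
    units: list[TranslationUnit] = field(default_factory=list)

    def save(self, source: str, target: str) -> None:
        # Saving a translated segment adds one translation unit to the TM.
        self.units.append(TranslationUnit(source, target))

    def exact_match(self, source: str) -> Optional[str]:
        # Check the newest units first so the latest translation wins.
        for unit in reversed(self.units):
            if unit.source == source:
                return unit.target
        return None

tm = TranslationMemory("Client 1 TM")
tm.save("The cat is in the garden", "Die Katze ist im Garten")
print(len(tm.units))                               # 1
print(tm.exact_match("The cat is in the garden"))  # Die Katze ist im Garten
print(tm.exact_match("The dog is in the garden"))  # None
```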
The beauty of this technology is the flexibility it allows. For instance, you can create as many TMs as required and define them in the most suitable way for you and your processes. You can create client-specific TMs (e.g. ‘Client 1 TM’, ‘Client 2 TM’), industry-specific TMs (e.g. ‘Patent TM’, ‘Marketing TM’), or TMs by specific subject area (e.g. ‘Mechanical engineering TM’, ‘Biotechnology TM’).
When a source file is opened in a CAT tool, the first thing that happens is an automatic process known in the industry as ‘segmentation’; this is where the source document is analysed and split up into segments (‘segmented’).
Note that the source file itself is not changed during this process — this all happens in an intermediate bilingual format, native to the CAT tool.
Segmentation rules stored in the TM tell the CAT tool exactly how to split up the document. These rules are typically based on end punctuation, so a full stop, question mark or colon marks the end of a segment. A paragraph break also ends a segment. This is why segments are most often full sentences but can also be headings, single phrases or single words (for example, from a bulleted list). The rules themselves can be tweaked.
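A toy version of punctuation-based segmentation might look like this. The single regular expression is an assumed, simplified rule set. Real tools ship richer, editable rules, for instance exceptions that stop abbreviations such as ‘e.g.’ from ending a segment, which this sketch does not handle.

```python
from __future__ import annotations
import re

# Assumed rule: a segment ends after a full stop, question mark,
# exclamation mark or colon that is followed by whitespace.
SEGMENT_END = re.compile(r"(?<=[.?!:])\s+")

def segment(document: str) -> list[str]:
    """Split a document into segments: paragraphs first, then sentences."""
    segments = []
    for paragraph in document.splitlines():
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip blank lines between paragraphs
        # A heading or list item without end punctuation stays a single segment.
        segments.extend(s for s in SEGMENT_END.split(paragraph) if s)
    return segments

doc = """Garden rules
The cat is in the garden. Is the dog there too?
Water the plants
Note: keep the gate closed."""

for seg in segment(doc):
    print(seg)
```

The colon rule splits ‘Note: keep the gate closed.’ into two segments. This is exactly the kind of behaviour you might choose to tweak.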
Once the file has been segmented and the bilingual file prepared, the fun of translating can begin. Each time the translator clicks in an empty target segment, the TM is automatically searched for matches and the best (highest) match is inserted into the target segment, where the translator checks and edits it as appropriate. There is also a process called ‘Pre-Translate’, which pre-inserts all 100% matches into the target segments of a document or set of documents (a Project) in one go.
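The lookup-and-insert step and Pre-Translate can be sketched as follows. Again, this assumes a character-based similarity score rather than any vendor’s real algorithm. The function names best_match and pre_translate are hypothetical, and the TM here is just a list of (source, target) pairs.

```python
from __future__ import annotations
from difflib import SequenceMatcher
from typing import Optional

def score(new_source: str, tm_source: str) -> int:
    # Assumed character-based similarity; 100 is reserved for identical text.
    if new_source == tm_source:
        return 100
    return min(99, round(SequenceMatcher(None, new_source, tm_source).ratio() * 100))

def best_match(source: str, tm: list[tuple[str, str]],
               threshold: int = 70) -> Optional[tuple[int, str]]:
    """Return (score, target) of the highest-scoring TM unit at or above threshold."""
    scored = [(score(source, tm_source), target) for tm_source, target in tm]
    best = max(scored, key=lambda pair: pair[0], default=None)
    if best is None or best[0] < threshold:
        return None
    return best

def pre_translate(sources: list[str], tm: list[tuple[str, str]]) -> list[str]:
    """Insert 100% (identical) matches only; other targets stay empty."""
    exact = dict(tm)  # later units with the same source overwrite earlier ones
    return [exact.get(source, "") for source in sources]

tm = [("The cat is in the garden", "Die Katze ist im Garten")]
print(best_match("The dog is in the garden", tm))
# (88, 'Die Katze ist im Garten'): a fuzzy suggestion for the translator to edit
print(pre_translate(["The cat is in the garden", "The dog is in the garden"], tm))
# ['Die Katze ist im Garten', '']
```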
In addition to TMs, glossaries can be added as a further resource; in our tool of choice, these are called Termbases. Each source segment is checked against the Termbase for any terminology it contains. Returning to our example above, the terms ‘cat’ and ‘dog’ could be stored in the Termbase along with their translations, and the tool would then recognise and suggest them as the translator works.
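Term recognition can be approximated with whole-word matching, as in this sketch. The termbase contents and the recognise_terms function are hypothetical. Real Termbases also cope with plurals and other word forms, which this simple version does not.

```python
from __future__ import annotations
import re

# Hypothetical termbase: source term -> approved target term (German for illustration).
TERMBASE = {"cat": "Katze", "dog": "Hund", "garden": "Garten"}

def recognise_terms(source_segment: str, termbase: dict[str, str]) -> dict[str, str]:
    """Return the termbase entries whose source term appears as a whole word."""
    found = {}
    for term, translation in termbase.items():
        # Whole-word, case-insensitive match, so 'cat' does not fire on 'category'.
        if re.search(rf"\b{re.escape(term)}\b", source_segment, re.IGNORECASE):
            found[term] = translation
    return found

print(recognise_terms("The dog is in the garden", TERMBASE))
# {'dog': 'Hund', 'garden': 'Garten'}

# A term added during translation is recognised in every later segment.
TERMBASE["gate"] = "Tor"
print(recognise_terms("Close the gate", TERMBASE))  # {'gate': 'Tor'}
```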
During translation, the two work together: the TM suggests translations for exact or fuzzy matches of whole sentences (source segments), while the Termbase suggests translations of any individual terms or phrases it recognises within the source segment. The TM can even be searched for part of a source segment using a feature known as ‘Concordance’. Going back to the earlier example, the translator could run a Concordance search for ‘the dog’ or ‘garden’ to see whether those words appear in the TM on their own or within another sentence. The Termbase can also be searched manually.
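At its simplest, a Concordance search is a text search over the source side of the TM. The sketch below assumes a plain, case-insensitive substring match and uses a hypothetical concordance function. It only approximates what real tools do.

```python
from __future__ import annotations

def concordance(query: str, tm: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every TM unit whose source segment contains the query text."""
    q = query.lower()
    return [(src, tgt) for src, tgt in tm if q in src.lower()]

tm = [
    ("The cat is in the garden", "Die Katze ist im Garten"),
    ("The dog barks", "Der Hund bellt"),
]
for src, tgt in concordance("the dog", tm):
    print(src, "->", tgt)  # The dog barks -> Der Hund bellt
print(len(concordance("garden", tm)))  # 1
```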
Terms can also be added very quickly and easily to the Termbase during translation, and will be recognised from that point on by the tool in any subsequent segments.
As well as being used during translation, most glossary systems allow the user to view the glossary on its own and update it outside of any ongoing translation work. This is handy if changes need to be made before a new project starts.
Not only this but…
The Translation Memory and Glossary features are just some parts of this type of technology.
In addition, CAT tools allow the user to translate file types without having the program that created them installed. This is because CAT tools work in an intermediate bilingual file format; in our case, this format is SDLXLIFF. Only once translation is complete is the SDLXLIFF file converted back to the original (source) file format.
For example, many manuals are created using InDesign. It’s unlikely that a translator would have this program installed on their PC, which would otherwise pose a challenge, but with a CAT tool this isn’t a problem. The InDesign file (in IDML, the interchange format for INDD files) is processed by Trados Studio and becomes an SDLXLIFF during translation. Once translation is complete, the SDLXLIFF is converted back to IDML format at the touch of a button. An artworker then carries out the standard post-translation layout work in InDesign to create the translated, laid-out INDD file (plus PDFs). Even so, you can see how much the CAT tool helps here.
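For the curious, the idea of an intermediate bilingual format can be illustrated with plain XLIFF 1.2, the open standard that SDLXLIFF builds on. This sketch writes only the core xliff/file/body/trans-unit structure. It is not SDLXLIFF itself, which adds Trados-specific markup, and the file name, language codes and to_xliff function are made up for the example.

```python
from __future__ import annotations
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", NS)  # write the XLIFF namespace as the default namespace

def to_xliff(original: str, pairs: list[tuple[str, str]],
             src_lang: str = "en", tgt_lang: str = "de") -> str:
    """Build a minimal XLIFF 1.2 document holding source/target segment pairs."""
    root = ET.Element(f"{{{NS}}}xliff", version="1.2")
    file_el = ET.SubElement(root, f"{{{NS}}}file", {
        "original": original,  # name of the file the segments came from
        "source-language": src_lang,
        "target-language": tgt_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, f"{{{NS}}}body")
    for i, (source, target) in enumerate(pairs, start=1):
        # One trans-unit per segment, pairing source and target text.
        unit = ET.SubElement(body, f"{{{NS}}}trans-unit", id=str(i))
        ET.SubElement(unit, f"{{{NS}}}source").text = source
        ET.SubElement(unit, f"{{{NS}}}target").text = target
    return ET.tostring(root, encoding="unicode")

print(to_xliff("manual.idml", [("The cat is in the garden", "Die Katze ist im Garten")]))
```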
There are many other benefits and features of using CAT tools. Hopefully this has provided a bit of an insight into how they work and are used by professional translators and companies. If you want to find out more about them or how they are used in your translation work, please get in touch. Send me a tweet or start a LinkedIn conversation with me.
Gemma Smith, August 2021