Introduction

Computer-Assisted Translation: Human Translation Enhanced with Computerized Tools

Development of CAT

While it signaled the end of public funding for machine translation research in the United States, the ALPAC report also encouraged the pursuit of a more realistic goal: computer-assisted translation. The report praised the glossaries produced by the German army’s translation agency, as well as the terminology base of the European Coal and Steel Community – a resource that prefigured EURODICAUTOM and IATE – and concluded that such resources were a genuine aid to translation. Its final recommendations clearly encouraged the development of CAT, in particular the reuse of glossaries initially created for machine translation.

At that point, a whole range of tools intended to help translators in their work, rather than replace them, began to be developed. The first terminology management programs appeared in the 1960s and evolved into multilingual terminology databases such as TERMIUM or UNTERM. Bilingual concordancers are also an invaluable aid: they let the translator see the contexts in which a word or term occurs and compare how those contexts are translated in the target language.
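
To make the mechanism concrete, here is a minimal sketch of a bilingual concordancer in Python, assuming a small sentence-aligned corpus held in memory; the example sentences and function names are illustrative and not drawn from any of the tools mentioned above.

```python
# A minimal bilingual concordancer sketch: given a sentence-aligned
# parallel corpus, list every context in which a query term occurs,
# alongside the aligned translation of that context.
# The corpus below is a hypothetical illustration.

parallel_corpus = [
    ("The contract shall be terminated.", "Le contrat sera résilié."),
    ("Termination of the contract requires notice.",
     "La résiliation du contrat exige un préavis."),
    ("The parties signed the contract.", "Les parties ont signé le contrat."),
]

def concordance(term: str, corpus) -> list[tuple[str, str]]:
    """Return all aligned sentence pairs whose source side contains the term."""
    term = term.lower()
    return [(src, tgt) for src, tgt in corpus if term in src.lower()]

for src, tgt in concordance("contract", parallel_corpus):
    print(f"{src}\n    -> {tgt}\n")
```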

The rise of computer-assisted translation came in the 1970s with the creation of translation memory software, which lets the translator recycle past translations: when a new sentence has to be translated, the software scans the memory for similar previously translated sentences and, when it finds any, suggests the earlier translation as a model. The time saved is all the greater when the texts are repetitive, as is often the case with specialized documents such as technical manuals.
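
The core lookup can be sketched in a few lines of Python, here using difflib’s string similarity as a stand-in for the more refined fuzzy-matching algorithms of commercial translation memory software; the memory contents and the 70% threshold are illustrative assumptions.

```python
# A minimal translation-memory lookup sketch. Similarity is measured
# with difflib's ratio; real CAT tools use more sophisticated fuzzy
# matching. Memory contents and threshold are hypothetical.

from difflib import SequenceMatcher

memory = {
    "Press the power button to start the device.":
        "Appuyez sur le bouton d'alimentation pour démarrer l'appareil.",
    "Press the reset button to restart the device.":
        "Appuyez sur le bouton de réinitialisation pour redémarrer l'appareil.",
}

def best_match(sentence: str, tm: dict, threshold: float = 0.7):
    """Return the most similar stored segment, its translation and its
    score, or None if nothing scores above the threshold."""
    best, best_score = None, threshold
    for src, tgt in tm.items():
        score = SequenceMatcher(None, sentence.lower(), src.lower()).ratio()
        if score >= best_score:
            best, best_score = (src, tgt, score), score
    return best

match = best_match("Press the power button to stop the device.", memory)
if match:
    src, tgt, score = match
    print(f"{score:.0%} match: {tgt!r} (from {src!r})")
```

On a repetitive technical manual, most new segments will score above the threshold against some stored segment, which is precisely why the time saved grows with the repetitiveness of the text.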

These sets of translated documents make up what we call parallel corpora, and their exploitation intensified in the 1980s, allowing a resurgence of machine translation. While rule-based translation systems had dominated the field until then, access to large databases of translation examples furthered the development of data-driven systems. The two paradigms arising from this turnaround are example-based machine translation and statistical machine translation, which remains the current dominant trend. The quality of machine translation is improving: today it produces usable results in specialized fields where vocabulary and structures are fairly repetitive. General texts remain the last stronghold; for these, machine translation offers, at best, an aid to understanding.

During the 1990s, CAT benefited from the combined input of machine translation and computational terminology. It was at that point that term alignment algorithms based on parallel corpora appeared. The bilingual terminology lists they generate are particularly useful for specialized translation.
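
One simple family of such algorithms scores source and target words by how often they co-occur in aligned sentence pairs. The sketch below uses the Dice coefficient over a toy corpus; it is an assumption-laden miniature of the idea, and real term-alignment systems rely on richer statistical models and multi-word term extraction.

```python
# A minimal sketch of co-occurrence-based term alignment over a
# sentence-aligned corpus, using the Dice coefficient. The corpus is
# a toy illustration; production systems use richer alignment models.

from collections import Counter
from itertools import product

pairs = [
    ("the hard disk is full", "le disque dur est plein"),
    ("format the hard disk", "formater le disque dur"),
    ("the screen is broken", "l'écran est cassé"),
]

src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
for src, tgt in pairs:
    src_words, tgt_words = set(src.split()), set(tgt.split())
    src_freq.update(src_words)
    tgt_freq.update(tgt_words)
    joint.update(product(src_words, tgt_words))  # count co-occurring pairs

def dice(s: str, t: str) -> float:
    """Dice coefficient: how strongly s and t co-occur across sentence pairs."""
    return 2 * joint[(s, t)] / (src_freq[s] + tgt_freq[t])

# Rank candidate translations of "disk" by association strength.
candidates = sorted(tgt_freq, key=lambda t: dice("disk", t), reverse=True)
print(candidates[:3])  # the strongest candidates should include "disque"
```

Even on this tiny corpus, function words like "le" score as highly as the true equivalent, which is why practical systems filter candidates by frequency and part of speech before building terminology lists.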

Automatic terminology extraction and management, bilingual concordancers, pre-translation and translation memories, comprehension aids: today, the translator’s workstation is a complex and highly digital environment. The language technology industry has grown and diversified, producing many pieces of CAT software: TRADOS, WORDFAST, DÉJÀ VU, and SIMILIS, to name just a few. The general public is also served: on the one hand, Google has opened up instant translation to anyone with its GOOGLE TRANSLATE tool; on the other, open-access bilingual concordancers have recently appeared on the Internet.