EU offers translation software developers one million sentences

PARIS - Translation engines sometimes return unintentionally incomprehensible results. The European Union wants to improve their quality and accessibility.

The European Commission is offering translation software developers free access to around one million sentences translated between 22 of the European Union’s 23 official languages. It hopes the data will help improve the quality of a variety of language tools, including grammar and spelling checkers, online dictionaries and machine translators — particularly in less well-served languages such as Latvian or Romanian.

The sentences are mostly drawn from the “Acquis Communautaire,” the body of law that must be implemented by all new E.U. member states, and include the treaties, directives and regulations adopted by the E.U., and rulings from the European Court of Justice. Translated by professional translators, they cover topics such as IT, telecommunications, labor law, agriculture and fishing.

The translations form part of the “translation memory” used by the Commission’s permanent staff of 1,750 translators. They are matched up, sentence by sentence, in each of the 22 languages and tagged with subject classifications.

The matching and tagging make the sentences especially useful for developers of statistical machine translation software, who must amass a corpus of thousands of matched sentences in the languages between which they wish to translate, so that they can calculate the most likely translation for any given expression. Since the sentences have already been matched, developers will save time, and the immense size of the Acquis Communautaire will help make their calculations more accurate.
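To illustrate how matched sentences let software "calculate the most likely translation," here is a minimal sketch of one classic word-level technique, IBM Model 1 trained with expectation-maximization. It is not the method of any particular developer or of the Commission. The three-sentence English-German corpus and the function names `train_word_table` and `best_translation` are hypothetical, chosen only for illustration; a real system would train on far more data.

```python
from collections import defaultdict


def train_word_table(pairs, iterations=10):
    """Estimate t(target_word | source_word) from matched sentence pairs.

    `pairs` is a list of (source_sentence, target_sentence) strings.
    Tokenization is plain whitespace splitting, an assumption made
    to keep the sketch short.
    """
    corpus = [(src.split(), tgt.split()) for src, tgt in pairs]
    target_vocab = {w for _, tgt in corpus for w in tgt}
    # Start from a uniform guess for every word pair that is looked up.
    t = defaultdict(lambda: 1.0 / len(target_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # Re-estimate the probabilities from the expected counts.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]

    # Only word pairs that co-occurred in some sentence pair are kept.
    return dict(t)


def best_translation(table, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {tw: p for (tw, sw), p in table.items() if sw == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus of matched sentences.
toy_pairs = [
    ("the house", "das Haus"),
    ("the book", "das Buch"),
    ("a book", "ein Buch"),
]
table = train_word_table(toy_pairs)
print(best_translation(table, "house"))
```

With only three sentence pairs the estimates are crude; the more matched sentences are available, the more reliable the counts become, which is why a large pre-aligned corpus is attractive.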

Until now, developers have typically resorted to scouring the Web for texts translated into several languages, and using other software tools to make a guess at where sentences start and end in order to match them up.
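The guesswork involved can be seen in a small sketch. The code below assumes a deliberately naive rule (a sentence ends at ".", "!" or "?" followed by whitespace and a capital letter) and pairs sentences purely by position; the function names are hypothetical and real alignment tools use more robust heuristics.

```python
import re

# Naive boundary rule (an assumption for this sketch): end punctuation,
# then whitespace, then an ASCII capital letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Guess where sentences start and end in a block of text."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]


def align_by_position(source_text, target_text):
    """Pair the n-th source sentence with the n-th target sentence.

    Raises ValueError when the two sides split into different numbers
    of sentences, since positional pairing would then be unreliable.
    """
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    if len(src) != len(tgt):
        raise ValueError("sentence counts differ; cannot align by position")
    return list(zip(src, tgt))


pairs = align_by_position(
    "The law applies. It enters into force today.",
    "Das Gesetz gilt. Es tritt heute in Kraft.",
)
print(pairs)

# The same rule misfires on abbreviations: "Dr." is treated as a sentence.
print(split_sentences("Dr. Smith signed."))
```

Errors like the abbreviation case above are what a corpus that has already been matched sentence by sentence avoids.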

While the release of the data will benefit software developers, the Commission is not being entirely altruistic: it hopes that the availability of better, cheaper automated translation software will help speakers of the E.U.’s minority languages by giving them access to online information currently available only in the more widely spoken languages.

Interested developers can download the texts from the Web site of the Commission’s Directorate General of Translation. They will also need the text extraction program and its library.
