EU offers translation software developers one million sentences

PARIS — Translation engines sometimes return unintentionally incomprehensible results. The European Union wants to improve both the quality of those results and access to the tools that produce them.

The European Commission is offering translation software developers free access to around one million sentences translated between 22 of the European Union’s 23 official languages. It hopes the data will help improve the quality of a variety of language tools, including grammar and spelling checkers, online dictionaries and machine translators — particularly in less well-served languages such as Latvian or Romanian.

The sentences are mostly drawn from the “Acquis Communautaire,” the body of law that must be implemented by all new E.U. member states, and include the treaties, directives and regulations adopted by the E.U., and rulings from the European Court of Justice. Translated by professional translators, they cover topics such as IT, telecommunications, labor law, agriculture and fishing.

The translations form part of the “translation memory” used by the Commission’s permanent staff of 1,750 translators. They are matched up, sentence by sentence, in each of the 22 languages and tagged with subject classifications.

The matching and tagging make the sentences especially useful for developers of statistical machine translation software, who must amass a corpus of thousands of matched sentences in the languages between which they wish to translate so that they can calculate the most likely translation for any given expression. Because the sentences have already been matched, developers will save time, and the immense size of the Acquis Communautaire will help them make their calculations more accurate.
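To illustrate how a corpus of matched sentences can yield "the most likely translation" of a word, here is a minimal sketch of one classic approach, the expectation-maximization procedure of IBM Model 1. It is not the Commission's method or any particular vendor's. The function names and the three-sentence English-German corpus are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) from matched sentence pairs.

    `pairs` is a list of (source_sentence, target_sentence) strings.
    Tokenization is a plain whitespace split, an assumption made for brevity.
    """
    corpus = [(src.lower().split(), tgt.lower().split()) for src, tgt in pairs]
    target_vocab = {word for _, tgt in corpus for word in tgt}
    uniform = 1.0 / len(target_vocab)
    # t[(target_word, source_word)] starts out uniform.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word
        # E-step: share each target word among the source words of its sentence.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    fraction = t[(tw, sw)] / norm
                    count[(tw, sw)] += fraction
                    total[sw] += fraction
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(
            float,
            {(tw, sw): c / total[sw] for (tw, sw), c in count.items()},
        )
    return t


def most_likely_translation(t, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus; a real system would use many thousands of pairs.
toy_pairs = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]
model = train_ibm_model1(toy_pairs)
print(most_likely_translation(model, "house"))
```

With only three sentence pairs the estimates are crude. The same calculation generally becomes more reliable as the number of matched sentences grows, which is why the size of the corpus matters.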

Until now, developers have typically resorted to scouring the Web for texts translated into several languages, and using other software tools to make a guess at where sentences start and end in order to match them up.
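The guesswork described above can be sketched roughly as follows. This is a deliberately naive illustration, not a tool any developer is known to use: it guesses sentence boundaries from punctuation, then pairs sentences by position when both texts contain the same number of them. The function names, the `max_length_ratio` parameter and the sample texts are all hypothetical.

```python
import re


def split_sentences(text):
    """Guess sentence boundaries: ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def align_by_position(text_a, text_b, max_length_ratio=2.0):
    """Pair up sentences from two translations of the same text.

    Returns an empty list when the sentence counts differ, because
    matching by position is then unsafe. Pairs whose lengths differ by
    more than `max_length_ratio` are dropped as probable mismatches.
    """
    sentences_a = split_sentences(text_a)
    sentences_b = split_sentences(text_b)
    if len(sentences_a) != len(sentences_b):
        return []
    pairs = []
    for a, b in zip(sentences_a, sentences_b):
        longer, shorter = max(len(a), len(b)), min(len(a), len(b))
        if longer <= max_length_ratio * shorter:
            pairs.append((a, b))
    return pairs


english = "The house is small. The book is old."
german = "Das Haus ist klein. Das Buch ist alt."
for pair in align_by_position(english, german):
    print(pair)
```

The splitter is easily fooled: an abbreviation such as "Dr." is treated as the end of a sentence. Errors of that kind are what a corpus already matched sentence by sentence avoids.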

While the release of the data will benefit software developers, the Commission is not being entirely altruistic: it hopes that the availability of better, cheaper automated translation software will help speakers of the E.U.’s minority languages by giving them access to online information currently available only in the more widely spoken languages.

Interested developers can download the texts from the Web site of the Commission’s Directorate General of Translation. They will also need the text extraction program and its library.

