The Oxford Global Languages initiative will empower millions of people across the globe with digital lexical data in 100 of the world's languages. Lexical information, held in a single linked repository, will be available for free to speakers and learners, and will also be offered for licensing and integration into technology products and applications.
Initially, OUP carried out all dictionary conversions in-house. The process was not unified and took at least 3 weeks, in some cases up to 3 months depending on the initial data format, with up to 20% of the data lost along the way. A new converter had to be created for each dictionary.
Digiteum created the Dictionaries Conversion Framework (DCF) to ensure the quality and consistency of the output data. It reduced conversion time to 2-5 working days with 100% data accuracy.
Figuratively speaking, the creation of the framework could be compared to the invention of the printing press for dictionary conversion. It is a "one converter fits all dictionaries" platform that produces unified output lexical data regardless of the format and structure of the input.
With about 40 dictionaries converted so far, Phase II of the project was launched to continue Oxford University Press's tradition of digital innovation.
- Dictionary data conversion from arbitrary input formats (e.g. XML, RTF, HTML, plain text) into specified output formats (e.g. LeXml, DTD6)
- 2 to 5 business days per dictionary conversion (compared with up to 3 months using legacy methods)
- 100% data accuracy vs. 15-20% data loss with previous methods
- 10 times faster conversion compared with the previous XSL-based conversions
- Over 40 dictionaries in 18 languages of Europe, Asia, and Africa converted to date
CLIENT: Oxford University Press, Global Languages