Lexicala news


Lynx knowledge-based AI service platform featured in Information Systems journal

An article presenting the main results of the EU-funded Lynx project was published in open access in the journal Information Systems (Volume 106, May 2022, Elsevier https://www.sciencedirect.com/science/article/pii/S0306437921001563?via%3Dihub).

The paper, entitled ‘Lynx: A Knowledge-based AI Service Platform for Content Processing, Enrichment and Analysis for the Legal Domain’, is co-authored by Dr Julian Moreno Schneider from DFKI (Berlin) and partners from the Lynx consortium. It describes the creation of a knowledge graph for the legal domain and its use for the semantic processing, analysis, and enrichment of documents. It also covers the Lynx use cases, the platform as a whole, and the semantic analysis services that operate on the documents.

Lynx – Legal Knowledge Graph for Multilingual Compliance Services – was a 40-month research and innovation project held in the framework of the EU Horizon 2020 program and completed in March 2021. The project was led by the Ontology Engineering Group of Madrid Polytechnic University and included ten more commercial and academic partners. Its main objective was to create an ecosystem of smart cloud services to better manage compliance, based on a legal knowledge graph that integrates and links heterogeneous compliance data sources including legislation, case law, standards, and other private contracts. See https://lynx-project.eu/.



Parallel Corpora for better Korean Translation

Naver Corporation will integrate over a quarter of a million sentence pairs from K Dictionaries to enhance the performance of Papago Translator and Naver Dictionary services.

Papago is the world leader in Neural and Semantic Machine Translation for Korean and 13 other languages, and is available on the Web and in mobile apps for professional and personal use.

Naver Dictionary is the common dictionary service in Korea, including 49 bilingual dictionaries. It recently launched an English Dictionary service that features renowned dictionaries from leading American and British publishers.

The Lexicala Korean parallel corpora stem from quality lexicographic resources of K Dictionaries and are applied to training machine learning models and to Naver dictionaries. They consist of 260,000 bilingual examples of usage from dictionary entries between Korean and four major Western languages: English, French, German, and Spanish. The data is developed by converging human-created and curated content with smart automated processing methods, including a review of all the sentence pairs by Korean language experts to ensure that they match their equivalents in the other languages.

Naver and K Dictionaries began to cooperate in 2017 on the development of Korean trilingual dictionaries and plan to expand their collaboration on Naver’s new Open Dictionary Platform.


SuperMemo publish Czech and Greek PowerWords!

SuperMemo World has launched new language courses in the PowerWords! vocabulary learning series for Czech and Greek. PowerWords! Čeština and PowerWords! Ελληνικά include versions for speakers of Chinese, English, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, and Spanish. The PowerWords! series integrates lexicographic content from Lexicala, and the languages covered so far include Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Norwegian, Polish, Portuguese (Brazilian and European), Russian, Spanish, and Swedish. More language courses are in preparation.


Sales of parallel corpora for machine translation

We are delighted to announce the first sale of Lexicala parallel corpora on TAUS Data Marketplace. The bilingual datasets, for English-Korean and English-Turkish, will serve to train machine learning models for neural machine translation (NMT) systems. Most big data used for this purpose is harvested on the Web and often contains various types of noise and shortcomings. The Lexicala resources, by contrast, converge human-curated and automatically generated sentences, stemming from examples of usage that are translated by our editors, and can serve to enhance the quality of NMT processes and their results. The TAUS Data Marketplace is a pioneering platform for exchange between data sellers and buyers, used by major Language Service Providers worldwide. Currently it features 357 language pairs by Lexicala, which makes us its biggest provider of parallel corpora for NMT.