NexusLinguarum: Where we are, where we go

Multilingual Data & Knowledge


Jorge Gracia

The ‘European network for Web-centred linguistic data science’ (NexusLinguarum for short) is a COST Action that was launched on 28 October 2019 for a four-year duration. At launch, the network comprised representatives of 33 countries working in areas such as computer science, the Semantic Web, artificial intelligence, linguistics and the humanities. The main aim of the network is to support the construction of an ecosystem of multilingual and semantically interoperable linguistic data at the scale of the Web. To this end, methods and techniques from the Semantic Web, Natural Language Processing and Language Resources are studied and combined. Such an ecosystem could reduce language barriers in Europe (and eventually beyond) and favour both electronic commerce and cultural exchange between countries with different languages.

In an article published last year, we described the main aims of NexusLinguarum, its organisation into working groups and its initial developments. In the following paragraphs, we give a brief update on its current status and next steps.

Since its constitution, the Action has accepted new representatives from more countries and today counts researchers from 42 different countries. The number of members who actively participate in the Working Groups (WGs) is steadily growing, with 170 participants registered in one or several WGs. The Action remains open to new members.

Unfortunately, as with many other aspects of our professional and personal lives, the COVID-19 crisis had an impact on NexusLinguarum activities. A number of Short Term Scientific Missions (STSMs), that is, research visits hosted by members of the network, had to be postponed or cancelled due to travel restrictions. Similarly, some of the Action’s events were postponed and/or changed to an online or hybrid format. For instance, the second NexusLinguarum plenary meeting took place in Lisbon, at the Universidade Nova de Lisboa, on 26-27 October 2020 in a hybrid setting, with most participants attending online.

In addition, the pandemic obliged us to postpone our first training school, Introduction to Linked Data for Linguistics, which was initially planned as a face-to-face event. It eventually took place fully online in February 2021, hosted by the Romanian Academy and the University of Iași in Romania. Twelve lecturers and over 80 participants took part in the school, which combined theoretical presentations with hands-on sessions. The slides and the training materials are available online. This initial experience will be followed by more training schools in the coming years.

Recently, the Action published a policy brief, “Towards an open ecosystem of multilingual interoperable linguistic data”, to help raise awareness among policy makers, stakeholders and the general public of the social and technological value of Linguistic Data Science as a means of overcoming language barriers in Europe and worldwide.

It is also worth noting that NexusLinguarum joined the interdisciplinary network COST Actions against COVID-19, contributing from the field of linguistics. This is an initiative of several Actions wishing to connect and collaborate in addressing COVID-19 issues from different angles, offering substantial potential for mobilising experts and tackling challenges as they arise, for this and future pandemics.

Another important outcome of this initial period has been the release of a document on Use Case Description and Requirements Elicitation, where a number of use cases and applications were described, in which the Action’s methodologies and technologies can be tested and validated. An extensive summary of this document is available in this issue. The list of current use cases can serve to illustrate the broad range of application domains for the Action’s topics:

Media and Social Media
Language Acquisition
Humanities
Social Sciences
Cybersecurity
Fintech
Public Health
Pharmacy

As for the next steps in NexusLinguarum, considerable effort is going into the use cases, and progress is being made on them. Interaction across WGs to develop them further has also increased, which in turn generates new research opportunities and collaborations among network members that were not originally anticipated.

Several joint scientific publications have appeared so far in the context of the different WGs, and an even larger number of papers are currently under review or in preparation. Further, joint funding opportunities will be explored at different levels, with a transnational project proposal already submitted on the topic of humanities and social sciences.

Another policy brief in preparation concerns the “inclusion of data from under-resourced languages”, to be released later this year. In fact, NexusLinguarum is especially sensitive to the situation of under-resourced languages and their lack of language technologies, and is promoting the use of linked data technologies to improve this situation.

Interaction with standardisation groups, such as the Ontolex and the Linked Data for Language Technologies W3C community groups, has been strengthened and is expected to continue in the future, leading to the publication of guidelines and best practices on linguistic data science topics.

On the educational side, we are currently preparing a common curriculum for a Europe-wide master's degree that the participating institutions could adopt to train a new generation of researchers in the area, thus introducing linguistic data science into a cross-disciplinary academic infrastructure.

Finally, the Language, Data, and Knowledge conference (LDK 2021), the flagship conference of NexusLinguarum, will be held on September 1-4 in Zaragoza, Spain. This will be an excellent opportunity for sharing new ideas and raising awareness of recent advancements in the community. A number of workshops and tutorials co-located with LDK are also being organised by NexusLinguarum participants.

In summary, despite the difficulties imposed by these pandemic times, we are moving ahead and steadily progressing towards an ecosystem of interoperable multilingual linguistic data. And we are doing so by connecting different research lines from different researchers, coming from different fields, through a number of networking tools such as STSMs, conferences, scientific meetings and, more importantly, the daily work at the level of the different tasks and WGs. That is, through connecting excellent researchers to achieve excellent research.

Jorge Gracia is Chair of the NexusLinguarum ‘European network for Web-centred linguistic data science’ COST Action. He works as a senior research fellow at the Department of Computer Science and Systems Engineering (University of Zaragoza, Spain), where he is a member of the Aragon Institute of Engineering Research (I3A) and of the Distributed Information Systems research group. His main research interests are Semantic Web, Ontology Matching, Multilingual Web of Data, Query Interpretation, and Web Intelligence. His recent work focuses on linked data-based lexicography as well as on methods and techniques for crosslingual linking and crosslingual information access.

http://jogracia.url.ph/web/

