Workshop on Deep Learning and Neural Approaches for Linguistic Data


Radovan Garabík and Dagmar Gromann

Deep learning and neural network approaches are indispensable in modern Natural Language Processing and, more generally, in all kinds of linguistic data analysis. Recent advances suggest that hitherto intractable problems may be solved in the near future, as demonstrated by the ongoing revolution in NLP driven by the “unreasonable effectiveness” of Transformer architectures.

These huge and complex language models require a lot of training data. Fortunately, at least for the official EU languages, substantial text data has been collected in various repositories through many years of corpus linguistic research (and we can often resort to web corpora too). However, such resources are highly non-uniform, often difficult to find, and in various states of usability. Projects exist to collect and describe language resources comprehensively, or even to provide unified cloud-based access to tools, e.g. the Common Language Resources and Technology Infrastructure (CLARIN), META-SHARE, the European Language Grid, and European Language Equality. Nevertheless, the situation is still very fragmented and access to data is difficult, especially in automated modes. The Linguistic Linked Open Data (LLOD) initiative attempts to improve the situation by designing a database-like access method based on existing standard web technologies, providing a uniform mode of accessing the data regardless of its physical location or the natural language in question. Several of these initiatives, e.g. the European Language Grid, also include language technologies in their repositories.

COST Action CA18209, European network for Web-centred linguistic data science (NexusLinguarum), is organizing a workshop focused on deep learning and neural approaches in connection with LLOD, within the context of its Working Group 3, “Support for linguistic data science”.

The goal of the workshop is to connect researchers working on all aspects of deep learning in relation to linguistic data and on the effective use of deep learning for understanding the specificities of such data, so that the data can be better exploited and combined with linked data mechanisms.

Researchers from all areas of NLP, corpus and computational linguistics, or any other field working with linguistic data are invited to submit a short description of their research, results, tools and applications. Submissions need not be finished novel results: ongoing research, existing projects, work in progress, future plans, and notable already published results are also welcome.

The workshop is focused on, but not restricted to, these topics:

Language models for the Multilingual Semantic Web
Enhancement of language models with structured linguistic data
Neural Machine Translation for LLOD interlinking
Structured linguistic data to improve Neural Machine Translation
Use cases combining language models and structured linguistic data

The workshop will be co-located with the next NexusLinguarum plenary meeting, to be held in Skopje, North Macedonia, on 30 September 2021. It will be held in hybrid mode, both online and in person. The deadline for submitting proposals is 15 July 2021.

For further details please visit the workshop webpage.
