Linguistic Linked Data – Advanced Topics

Nexus Linguarum DGN_linkeddata-advanced

The Linguistic Linked Data field studies techniques and tools for modelling and publishing language resources on the Web in ways that enable their interoperability and reuse. This course introduces some advanced topics in linguistic linked data, as well as its application to fields such as lexicography and terminology. It continues an earlier course that covers the essentials of linguistic linked data (Linguistic Linked Data – Essentials), which we encourage you to take in preparation for this one.

Course Start: Self-paced
Estimated Effort: 15 hours in total

About This Course

The Linguistic Linked Data field studies techniques and tools for modelling and publishing language resources on the Web in ways that enable their interoperability and reuse.

After covering the basic concepts of linguistic linked data in our previous, introductory course (Linguistic Linked Data – Essentials), this course tackles more advanced and complementary topics. These include metadata representation, the description of particular linguistic linked data resources (such as DBnary and Wikidata), and the application of linked data to specific fields in linguistics such as lexicography and terminology. The course also discusses current topics such as the relationship between deep learning and linguistic linked data.

The lesson that wraps up the course describes a use case: the LiLa project. LiLa builds a network of interconnected linguistic resources (dictionaries, glossaries, corpora, etc.) for the study of the Latin language, and it illustrates many of the concepts explained throughout the course in a real-world setting.

Course Topics

Metadata
Linked data resources
Linked data and lexicography
Linked data and terminology
Deep learning and linked data
Use case: LiLa project

Required Skills

General IT knowledge and the basics of linguistics. We recommend first completing the course "Linguistic Linked Data – Essentials".

Course Level

Intermediate course.

Target Group

Anyone interested in language technologies who wants to represent and generate linguistic data in standard and interoperable ways on the Web.

Effort

3-4 hours per week

Course Staff

Jorge Gracia, University of Zaragoza

Jorge Gracia is a senior research fellow (“Ramón y Cajal” postdoctoral position) at the Department of Computer Science and Systems Engineering (University of Zaragoza, Spain). He is a member of the Aragon Institute of Engineering Research (I3A) and of the Distributed Information Systems (SID) research group. His main research interests include Semantic Web, Ontology Matching, Linguistic Linked Data and Natural Language Processing. He has been chair of NexusLinguarum, the “European network for Web-centred linguistic data science”, a COST Action that joined the efforts of researchers from 42 countries.

Penny Labropoulou, Athena Research Center

Penny Labropoulou is a Principal Applications Researcher at the Institute for Language and Speech Processing/Athena RC. She works mainly in the areas of metadata models and ontologies, infrastructures for sharing and exploiting Language Resources and Technologies (LRTs), Semantic Web technologies, semantic interoperability, and licensing and ethical issues for LRTs. She also has extensive experience in computational lexicography and lexicology, pedagogical lexicography and computational terminology. She has held leading roles in many European and national projects, most recently Common European Language Data Space, European Language Grid and CLARIN-EL.

Andon Tchechmedjiev, IMT Mines Alès

Andon Tchechmedjiev is an Associate Professor at Institut Mines Telecom in the EuroMov Digital Health in Motion (EDHM) interdisciplinary lab (IMT Mines Alès, University of Montpellier), at the crossroads of artificial intelligence, human movement science and embodiment, and medicine. He is co-PI of the Semantics and Taxonomy of Human Movement research axis and a member of the lab steering committee. He also co-leads the data ecosystem and governance theme in the Data & AI scientific community at Institut Mines Telecom.

Gilles Sérasset, University Grenoble Alpes

Gilles Sérasset is an associate professor at Université Grenoble Alpes. His main research activities focus on the maintenance and management of large multilingual lexical databases. He first theorised and described the notion of "Interlingual Acceptions", used as a pivot component of multilingual lexical databases that avoids the introduction of artificial contrastive semantic problems. This notion has inspired several works, among them the notion of "Axis" defined in the LMF (Lexical Markup Framework) ISO standard. He has also worked on multilingual information retrieval, participating in several CLEF campaigns, and on multilingual communication as head of the French team in the UNL (Universal Networking Language) project under the auspices of the United Nations University. Since 2012, Gilles Sérasset has been the main developer of DBnary, one of the largest multilingual lexical linked datasets available in RDF format. The DBnary dataset won two international challenges recognising work on Lexical Linked Open Data.

Sara Carvalho, University of Aveiro/CLLC-UA/NOVA CLUNL

Sara Carvalho is an Assistant Professor at the University of Aveiro and a researcher at both the Centre for Languages, Literatures and Cultures of the University of Aveiro and the Linguistics Research Centre of NOVA University Lisbon. Her research focuses on the intersection of Terminology and Ontologies, especially in the medical domain.

Dagmar Gromann, University of Vienna

Dagmar Gromann is an Associate Professor at the University of Vienna and Coordinator of the master’s program Multilingual Technologies. She primarily works in multilingual information extraction and knowledge extraction, including deriving cognitive concepts from textual data. In terms of deep learning, she is particularly interested in socio-technical implications and bias. Furthermore, she is interested in neurosymbolic approaches that combine neural language models with structured data and knowledge representation methods.

Marco Passarotti, Università Cattolica del Sacro Cuore

Marco Carlo Passarotti is Full Professor of Computational Linguistics at Università Cattolica del Sacro Cuore (Milan, Italy), where he is Director of the CIRCSE Research Centre and Coordinator of the MA in Linguistic Computing. His main research interests are building, using and disseminating linguistic resources and natural language processing tools for Latin. A former pupil of one of the pioneers of humanities computing, Father Roberto Busa SJ, he has headed the Index Thomisticus Treebank project since 2006. Between 2018 and 2023, he was the principal investigator of the LiLa project, an ERC-Consolidator Grant that built a Linked Data Knowledge Base of interoperable linguistic resources for Latin.

Francesco Mambrini, Università Cattolica del Sacro Cuore

Francesco Mambrini is a researcher at Università Cattolica del Sacro Cuore. He holds a PhD in Classical Philology from the University of Trento (Italy) and EHESS (France). He has collaborated with several DH projects and institutes, such as The Perseus Project and the DAI Berlin. At Università Cattolica he teaches Computational Linguistics, Linguistic Linked Data, and Digital Tools for the Humanities.

Slavko Žitnik, University of Ljubljana (Course coordinator)

Slavko Žitnik is Associate Professor and Vice Dean for Education at the University of Ljubljana, Faculty of Computer and Information Science. His research covers natural language processing, information retrieval, information extraction, the Semantic Web, and information systems, and comprises more than 100 bibliographic items. He actively collaborates with researchers from Université Paris 1 - Sorbonne, University of Belgrade, University of South Florida, and Harvard University. He has received multiple shared task awards and the University of Ljubljana award for extraordinary pedagogical, research, and artistic achievements. He teaches courses related to data science, databases, semantics, and natural language processing.

Collaborators

Rute Costa, NOVA University Lisbon/NOVA CLUNL
Dagmar Gromann, University of Vienna
Elena Montiel-Ponsoda, Universidad Politécnica de Madrid
Patricia Martín-Chozas, Universidad Politécnica de Madrid
Ilan Kernerman, Lexicala
Ana Salgado, Universidade NOVA de Lisboa/Academia das Ciências de Lisboa
Anas Fahad Khan, CNR-Institute for Computational Linguistics
Panagiotis Karioris, Athena Research Center
Gokhan Ozkan, Kırklareli University

Frequently Asked Questions

What web browser should I use?

Our German-UDS.academy platform works best with current versions of Chrome, Edge, Firefox, or Safari.

See our list of supported browsers for the most up-to-date information.