Opening the door to Europe’s digital language library

Imagine a library containing every piece of language research ever created in Europe—not just written research, but spoken and multimodal, too. Every recording, every study, every data set, all in one place. Scientists and researchers could share knowledge like never before. They’d have opportunities to mine, reference and visualise massive datasets. It would lead to new ideas, stronger grant proposals, better science products—and potentially big leaps in discovery.

This is exactly what the CLARIN research infrastructure (or simply “CLARIN”) set out to do in 2012. The objective? To make digital language resources from Europe accessible to humanities and social science researchers through a single sign-on.

Here’s how it is working towards that goal.

A Europe-wide library. One single sign-on

CLARIN stands for “Common Language Resources and Technology Infrastructure”.
Through a combination of access and advanced interoperable tools, researchers can use the CLARIN infrastructure to explore, annotate and combine complex data sets to support their work.

The resources are hosted at 38 CLARIN centres, usually universities or academic institutes, and connected via a central online portal. Anyone with an academic computer account can access protected resources using a federated login. From the Bavarian Archive for Speech Signals to the Language Bank of Finland repository, all data and services adhere to the same access conditions and standards of the CLARIN framework.

Why is CLARIN important?

CLARIN centres are home to some of the world’s most important repositories of language resources and documentation. From major European languages such as French, Italian, Spanish, German and English to endangered and lesser-studied languages, CLARIN deals with all human languages, in present-day and historical forms.

CLARIN is also working to cater to other languages studied in its member countries, such as Arabic, Chinese, Russian, Japanese and even Swahili. In a nutshell, CLARIN supports scholars who want to engage in cutting-edge, data-driven research, contributing to a truly multilingual European Research Area.

PORTULAN CLARIN is the Research Infrastructure for the Science and Technology of Language. It exists to support researchers, innovators, citizen scientists, students and language professionals interested in language.

The challenge of multiple identity federations

With so many participating institutions and thousands of users Europe-wide to connect, the complexities around data security are many. For instance, research and education institutions use different architectures, systems, and policies in their Authentication and Authorisation Infrastructure (AAI). This meant CLARIN had to negotiate the trustworthy exchange of information with each identity federation individually.

But with such a large number of identity federations, and with not every country having one, extensive negotiations were time-consuming and costly. This presented a huge and complex challenge for CLARIN as more and more institutions and countries came on board.

How GÉANT helps

In 2015, CLARIN adopted eduGAIN, which GÉANT developed to enable research and education institutions across the world to interconnect more easily. Because participants sign up to an agreed set of policies for exchanging information about users and resources, they can negotiate connections from a shared starting point with a clearer path forward. This reduces the administrative burden on CLARIN. Attributes are released faster. And cross-border logins have increased, suggesting that real progress is being made in the tricky world of inter-country bilateral agreements.

Importantly, more cross-border logins mean more collaboration and more benefit to society, as sensitive and gated data becomes more widely available to scientists working for the greater good.

Published: 01/2021
