May 2019

Why TROLLing is the thing to do for linguists

Europe | GÉANT (Europe) | UNINETT (Norway)

The word “trolling” carries quite a few meanings. It can refer to a kind of fishing in which a baited line is trailed behind a boat, or to walking in a rambling manner. Since the early 1990s it has also been used to describe people who post inflammatory or off-topic messages on the Internet.

To many linguists, however, the word took on a whole new meaning in June 2014, when the University Library and the Department of Language and Linguistics at UiT The Arctic University of Norway launched the Tromsø Repository of Language and Linguistics, or TROLLing for short.

Open archive

TROLLing is an open archive for linguists worldwide to post datasets and statistical models used in their research. All content is accompanied by searchable metadata that identify the researchers, the languages and linguistic phenomena involved, the statistical methods applied, and scholarly publications based on the data.

So, if you feel an irresistible urge to dive into the use of accusative of negation in Borderland Polish, TROLLing is the place to go. Likewise, if multi-dimensional analysis of Czech is your thing, or if you want a closer look at Norwegian compounds and their Russian counterparts, please do not hesitate to start TROLLing.

For a short presentation of what TROLLing can do for researchers, please watch the video. But be warned: linguist humour may occur!

Fundamental changes

Over the past 15 years, linguistics has gone through fundamental changes. Extensive digitization has given researchers access to large quantities of linguistic data, and they have started to apply sophisticated statistical software to analyse it. The number of statistical and data-driven studies in linguistics has increased significantly, and the majority of linguistic research is now based on digital datasets “crunched” by statistical code.

This trend creates a need for greater transparency: because all researchers depend on the work others have done before them, they need access not only to relevant scholarly publications but also to the datasets and software used to produce the research results. So, in linguistics as in many other parts of the scientific community, the concept of Open Science is gaining momentum. This means openness in all parts of the research cycle, from bibliometrics to access to scientific journals, and from science workflows to data use and reuse.

TROLLing and CLARIN

The TROLLing repository is part of a larger European network of similar data repositories for scholars in the social sciences and humanities, called the “Common Language Resources and Technology Infrastructure”, or CLARIN for short. European research and education (R&E) networks provide fast, secure and stable access to the vast amounts of CLARIN data and services. In addition, Federated Identity technology developed by R&E networks makes it easy to access CLARIN: academic users can log in with their existing institutional credentials rather than having to register a new username and password for each individual web application.

The concepts of Open Science and Open Data are important to research as a whole, not least to linguists. They live in exciting times: suddenly they have access to enormous amounts of data (or corpora, as they call them) and powerful statistical software to uncover hidden structures and meanings in the data.

Our only defence

But there is a downside to that development, a downside that may be an additional reason for linguistic researchers to insist on maximum transparency.

According to Laura A. Janda, initiator of TROLLing and professor of Russian Linguistics at UiT The Arctic University of Norway, in recent years many statistically capable linguists have been picked up by tech giants like Google, Apple and Facebook.

“These corporations are doing a lot of clandestine research on you and me, and everything they do is kept under cover. They are using linguistics and data techniques to spy on us, and various governmental organizations are doing similar things,” she said.

“I think this is pretty much unstoppable. But in my opinion the best countermeasure against this is to make things as public as possible, and put everything in plain sight. In fact, I believe this is our only defence.”

Further information:

The repository: https://trolling.uit.no

The blog/info site: https://info.trolling.uit.no

See also TROLLing on Twitter: @TROLLingRepo

The CLARIN infrastructure: https://www.clarin.eu/

TROLLing participates in the EU project Social Sciences & Humanities Open Cloud (SSHOC).