According to experts, the world needs Data Stewards, and a lot of them. Estimates say that in Europe alone 500,000 Data Stewards will be needed over the coming years. They are the “humanware” that connects the vast amounts of data in the digital domain with the people who want to protect, harvest or exploit its potential.
Data Stewards have an important future role to play in all parts of society: in the public sector, in businesses and, needless to say, in the research and education sector.
As research and education become increasingly data driven, national and international frameworks for responsible research data management have emerged, setting up requirements concerning data use, data protection, and terms of customer services. Also, the vision of Open Science, and concepts like FAIR data (findable, accessible, interoperable, reusable), require the research community to live up to new standards of research data management. Here, Data Stewards can take on the tasks and responsibilities of documenting, curating and structuring data across a scientific domain. They can align data processes and applications, help develop and enforce data governance and compliance, and facilitate FAIR data.
As providers of connectivity for research and education, the R&E networks have an interest in promoting data-driven research, and therefore they engage in supporting Data Stewardship. One example is Denmark, where the local NREN, DeiC, is working to pave the way for a formal Data Steward education. Michael Svendsen, special advisor at the Department for Research Support at the Copenhagen University Library, who is involved in the initiative, explains:
– We all have a shared responsibility to promote well-curated, well-documented and well-structured data. That goes for the network providers, the research infrastructures, the researchers and the research libraries. The biggest part of this challenge is about humans and the ways in which humans collaborate. We can’t solve all our problems with Data Science, Machine Learning and Artificial Intelligence. You still need a human touch point to handle the complexity of working with data in organizations. That is what Data Stewards are for, and they will be collaborating closely with data managers, data scientists, security specialists and IT staff.
– The concept of a Data Steward is not a new one. It has existed for more than 20 years. But what is new in the academic sector is that Data Stewards are to become more responsible for the handling, collection and interoperability of data, for example with regard to responsible data management in businesses, and to the European Open Science Cloud (EOSC) and the FAIR data principles in research-performing organisations.
– Enterprises, research organisations and the public sector all need Data Stewards. Researchers are good at working with data relevant to their discipline-specific domain of research. But documenting, organising and preserving data is labour-intensive and time-consuming, and many researchers also find it a challenge to provide, for example, the metadata required for their research data to fit into a FAIR ecosystem. FAIR is all about documenting your data with good metadata and providing consistency and context. It is about connecting and persistently linking all parts of the data with other taxonomies and vocabularies, and about organising, structuring, and curating data in a way that increases its value. That is what good data stewardship should do.
As Michael Svendsen points out, all stakeholders in the R&E sector have a shared responsibility in this, including R&E networks. And they are chipping in. DeiC and a number of other R&E networks are promoting new strategies for Data Management, and are contributing to establishing a formalised Data Steward education as well. Finnish CSC and Dutch SURFsara are doing something similar on a European level in the FAIRsFAIR project, which aims to embed FAIR education in university programs and to supply practical solutions for the use of the FAIR data principles throughout the research data life cycle.
– Networks, infrastructures, HPC, and data management: everything has to connect even more closely in the future. R&E connectivity providers, apart from offering fast and powerful connections, are already contributing to the effort of gluing all this together in more meaningful workflows.
– This trend will continue. The key is to have a diverse set of tools, software and infrastructure at your disposal. You have to be able to easily model your data to document it, almost in real time, with timestamp, location etc. You need a suite of software to make your data findable and reproducible, and an infrastructure provider to make it interoperable. All this has to be interconnected in a workflow.
– A few research domains are leading the way. Life science, which produces more than half of the data in the academic sector, has extremely well-structured data that are highly interoperable due to the use of well-defined semantics in linked data graphs built on technologies such as RDF (Resource Description Framework). If you take a look at the Linked Open Data Cloud (https://lod-cloud.net/), which shows the relationships between large databases and sources on the Internet that use RDF, you can see how life science databases dominate the picture. They have done a great job of becoming more FAIR and interoperable.
– Other research domains will follow, and hopefully, an increasing number of Data Stewards will do their important part to make the vision of an internet of FAIR data and services and the EOSC a reality.
On the 4th of October 2019, Danish NREN DeiC held a one-day conference, “Data Stewards – competition or coordination?”, gathering academia, industry, and the public sector to explore the interest in and requirements for a nationally coordinated and internationally aligned effort towards data steward education.