Enabling researchers to reuse sensitive biomedical data

Genome sequencing has become an integral part of medical practice. The value of the data multiplies when data sets are combined to reveal underlying patterns, benefiting not only the individual patient but thousands of others. A test led by CSC, the Finnish national research and education network (NREN), has shown that this can be done safely while respecting patients' data privacy.

CSC has recently developed new services for sensitive data management, CSC Sensitive Data services for research. These services also allow for publishing biomedical data under controlled access via Federated EGA (European Genome-phenome Archive), a network of European repositories for biomedical data.

The original version of EGA was launched in 2008 by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), located in the UK. EGA stores data from about 4,500 studies from more than 1,000 institutions worldwide. The data can be reused for further analysis in compliance with the consent provided by the individual patients.

A scientist wanting to use data must submit a data access request. Each dataset in EGA is linked to policies which specify the conditions of reuse. Further, each dataset is managed by a data controller who approves or denies access based on an evaluation of the request. The data controller is represented by a Data Access Committee nominated for each dataset.

Datasets stay in the country of origin

Now, the EGA will be supplemented by the Federated EGA, a network of repositories across Europe. Here, the sensitive data is not stored in the central EGA archive but in its country of origin. Federated EGA is managed within ELIXIR, the digital life science infrastructure for which CSC is the Finnish node.

In collaboration with EMBL-EBI, the Centre for Genomic Regulation in Spain, and the ELIXIR nodes in Norway and Sweden, CSC completed the first end-to-end test for Federated EGA.

“During the demonstration, we simulated the entire data submission process: preparation of legal agreements, encryption, and uploading sensitive data to the Finnish Federated EGA node at CSC. Finally, the public metadata was shared with the central EGA,” says Francesca Morello, Customer Liaison Officer for the CSC Sensitive Data services offices.

Access to the dataset was linked to the Data Access Committee's decision process, and the release was finalized by publishing the dataset and its permanent identifier (accession number) on the Central EGA webpage.

“Data remains under control of the Data Access Committee even though it is shared and can be discovered in the federated network,” Francesca Morello notes.

Simple web user interface

An important technical element of the solution is a set of standardized machine-readable messages that allow service providers to establish a researcher's identity, affiliation, and data access permissions when they log in to a specific service.
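To illustrate what such a message might carry, here is a minimal Python sketch. The article does not specify the message format, so the `AccessClaim` structure, its field names, and the `may_access` helper are all assumptions made for illustration. A real deployment would also verify a cryptographic signature on each message, which is omitted here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessClaim:
    """Hypothetical machine-readable statement about a researcher."""
    researcher_id: str  # who the researcher is
    affiliation: str    # the researcher's home organisation
    dataset_id: str     # the dataset the permission covers
    expires: datetime   # when the permission lapses


def may_access(claims, researcher_id, dataset_id, now):
    """Return True if an unexpired claim grants this researcher the dataset."""
    return any(
        c.researcher_id == researcher_id
        and c.dataset_id == dataset_id
        and c.expires > now
        for c in claims
    )


# Example values are invented for this sketch.
claims = [
    AccessClaim(
        researcher_id="researcher-1",
        affiliation="Example University",
        dataset_id="dataset-A",
        expires=datetime(2030, 1, 1, tzinfo=timezone.utc),
    )
]
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
granted = may_access(claims, "researcher-1", "dataset-A", now)
```

A service receiving such claims at login only needs to check them, not to maintain its own copy of every committee decision.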

“For example, for datasets held in the Finnish Federated EGA node, data access requests are possible using a service called Sensitive Data Apply. This simple web user interface allows the Data Access Committee to easily review, approve or deny access to a specific dataset. Once the request is approved, the applicant is given access to the data to analyze it in a private cloud environment called Sensitive Data Desktop via a web browser. The virtual Desktop is a secure encapsulated environment where data export or download requires specific authorization,” explains Francesca Morello, adding:

“With this tool, the original copy of the dataset never leaves the country of origin, and in compliance with European and national regulations no extra copies are created after each application.”
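The request flow described above (application, committee decision, analysis in the secure desktop, export only with specific authorization) can be sketched as a small state model. This is an illustrative Python sketch and not the implementation of Sensitive Data Apply or Sensitive Data Desktop; the class, method, and state names are hypothetical.

```python
from enum import Enum


class RequestState(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    DENIED = "denied"


class AccessRequest:
    """Hypothetical model of one data access request for one dataset."""

    def __init__(self, applicant, dataset_id):
        self.applicant = applicant
        self.dataset_id = dataset_id
        self.state = RequestState.SUBMITTED
        self.export_authorized = False

    def decide(self, approve):
        """Record the Data Access Committee's decision (allowed only once)."""
        if self.state is not RequestState.SUBMITTED:
            raise ValueError("request has already been decided")
        self.state = RequestState.APPROVED if approve else RequestState.DENIED

    def authorize_export(self):
        """Grant the separate authorization needed to export or download."""
        if self.state is not RequestState.APPROVED:
            raise ValueError("only approved requests can be granted export")
        self.export_authorized = True

    def can_analyze(self):
        """Analysis in the secure environment requires approval."""
        return self.state is RequestState.APPROVED

    def can_export(self):
        """Export requires approval plus the specific authorization."""
        return self.can_analyze() and self.export_authorized


request = AccessRequest("researcher-1", "dataset-A")
request.decide(approve=True)
```

Keeping export as a separate flag mirrors the point made in the quote: approval opens the data for analysis inside the environment, while taking results out is a distinct, explicitly authorized step.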

Published: 06/2022
