“It is expected that in the next few years all newborns will have their whole genomes sequenced. With medicine increasingly built upon genome processing, it is evident that managing genomic data will shortly be a very serious problem.”
Aisling O’Driscoll, Jurate Daugelaite and Roy Sleator, Journal of Biomedical Informatics, 2013
As we enter an era of genomics-based healthcare, critical decisions are increasingly genetically informed and personalised to each patient. But with a single sequenced human genome file taking up about 3.2 GB, the required computing power and, by extension, the data-communications needs are growing exponentially. The Catalogue of Human Genetic Variations (1000 Genomes) alone has produced 464 TB for just 1000 samples!
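To put these figures in perspective, the following is a back-of-the-envelope sketch of how long such files would take to move over a network. The link speeds are illustrative assumptions, not values from the ARES project, and the calculation ignores protocol overhead and congestion, so real transfers would be slower.

```python
# Sizes quoted in the text, using decimal units (1 GB = 10**9 bytes).
GENOME_BYTES = 3.2 * 10**9      # one sequenced human genome file
ARCHIVE_BYTES = 464 * 10**12    # 1000 Genomes data set

# Assumed link speeds in bits per second (illustrative only).
LINK_SPEEDS = {
    "100 Mbit/s": 100 * 10**6,
    "1 Gbit/s": 10**9,
    "10 Gbit/s": 10 * 10**9,
}


def transfer_time_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    for label, rate in LINK_SPEEDS.items():
        genome_s = transfer_time_seconds(GENOME_BYTES, rate)
        archive_days = transfer_time_seconds(ARCHIVE_BYTES, rate) / 86400
        print(f"{label}: one genome in {genome_s:.1f} s, "
              f"full archive in {archive_days:.1f} days")
```

Even under these idealised assumptions, moving the full archive over a 1 Gbit/s link takes weeks rather than hours, which is why the distribution model matters.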
Next-generation sequencing technologies are changing the biology landscape, flooding the databases with massive amounts of raw sequence data. To realise the promise of more effective disease diagnosis and research, it is essential to rethink the data distribution model in a robust network environment, avoiding bottlenecks and giving medical doctors and researchers access to these critical data sets.
Providing such a distribution solution for genomic data is at the very heart of the activities of the ARES project (Advanced networking for the EU genomic Research), a collaboration between students and researchers at the University of Perugia (UoP) and the Polo d’Innovazione di Genomica, Genetica e Biologica SCARL in Italy.
Driven by the foreseeable unsuitability of current data networks for supporting genomic processing services in the near future, in 2014 the ARES team submitted a proposal within the GÉANT Open Call Innovation programme with the aim of getting a better understanding of the network problems related to the sustained proliferation of genomic data.
The strategic objective was to design and deploy pilot network solutions over the pan-European GÉANT backbone for exchanging genomic data, allowing thousands of medical doctors and genetics scientists across Europe to download it quickly and efficiently for diagnosis and research purposes. This resulted in the creation of an advanced Content Distribution Network (CDN) architecture, accessible through a cloud interface, as well as open source software packages.
ARES now efficiently and effectively returns a diagnosis by processing sequenced data sets stored in nodes via software pipelines selected by the users. The priority level of the diagnosis request is managed by differentiated network services in line with the degree of seriousness of a particular situation needing genomic processing.
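The idea of serving diagnosis requests according to urgency can be illustrated with a small sketch. This is not the ARES implementation, which relies on differentiated network services; it is an application-level analogue using a priority queue. The urgency levels, class names, patient identifiers and pipeline names are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical urgency levels; a lower number is served first.
URGENT, ROUTINE, RESEARCH = 0, 1, 2


@dataclass(order=True)
class DiagnosisRequest:
    priority: int
    sequence: int                      # tie-breaker: preserves arrival order
    patient_id: str = field(compare=False)
    pipeline: str = field(compare=False)


class RequestQueue:
    """Orders requests by urgency, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, patient_id, pipeline, priority):
        request = DiagnosisRequest(priority, next(self._counter),
                                   patient_id, pipeline)
        heapq.heappush(self._heap, request)
        return request

    def next_request(self):
        # Raises IndexError if the queue is empty.
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = RequestQueue()
    queue.submit("patient-A", "variant-calling", RESEARCH)
    queue.submit("patient-B", "variant-calling", URGENT)
    queue.submit("patient-C", "gene-panel", ROUTINE)
    while len(queue):
        request = queue.next_request()
        print(request.patient_id, request.pipeline, request.priority)
```

In this sketch the urgent request is processed first even though it arrived second; requests of equal urgency keep their arrival order.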
ARES contributes to the needs of EU strategic R&E communities involved in genomic research in relation to content delivery. The end results are: measurable cost reduction, infrastructure virtualization and consolidation, improved services for specific research and medical needs and competitive advantages over legacy solutions.
ARES has been instrumental in developing an infrastructure that is likely to assist doctors and genomics researchers at potentially thousands of hospitals and clinics across Europe, supporting ground-breaking research such as the study of particular gene mutations. In doing so, the team is contributing a new CDN solution for European users that has the potential to be rolled out across the GÉANT network as a service for handling large data sets beyond genomics.
ARES also aims to provide tools that allow medical centres without computational or genome-sequencing resources to obtain a diagnosis for patients whose sequenced genome is stored in a node shared over a network such as GÉANT.