Proteomics project faster with grids and lightpaths

The research field of proteomics contributes to important new medical insights, producing vast amounts of data along the way. Converting this massive volume of proteomics data into knowledge requires dedicated computing and storage capacity. SURF, the Dutch research and education network, has created a dynamic lightpath to connect a number of life science compute clusters, speeding up proteomics research and enabling researchers to exchange data.

Proteins play an important role

Dr Peter Horvatovich, assistant professor in bio-informatics at the University of Groningen, is developing software to process large quantities of data generated during proteomics research. Horvatovich is an expert in the field of proteomics, the study of the proteome. He explains:

“The proteome is the collective term for all the proteins in an organism that are produced on the basis of the genome, the genetic information it contains. These proteins differ from cell to cell, and they change constantly over the course of a lifetime due to all kinds of biochemical interactions, for example with the environment.”

In many medical conditions, defects in proteins, and in the interactions between them, play an important role.

“Proteomics research can contribute to new medical insights, for example in the form of new therapies,” says Dr Horvatovich. “The research focuses, among other things, on which proteins occur in a cell and in what quantities (the protein profile), what changes proteins undergo, and what interactions occur between them. One of our aims is to be able to identify cancer cells with the aid of biomarkers, which are proteins that function as indicators.”

Growing quantities causing delay

Proteomics studies are data-intensive. “Using a mass spectrometer produces a particularly large amount of information,” says Dr Horvatovich.

“It generates between ten thousand and a hundred thousand results from a single sample. Bio-informatics is the discipline that brings together biology and IT. We use IT to store data in databases, to analyse information, and to convert data into useful knowledge.”

Distributing the growing quantities of information to computer centres and storage locations was causing a great deal of delay. At the same time, processing the information generated during the research requires a lot of computing capacity.

Alleviating bottlenecks

To address the challenges faced by Dr Horvatovich and many other life science researchers, a multi-faceted solution was created that combines better network connections with access to computing facilities. To speed up the research process, lightpaths were used to transmit data faster to researchers at other locations and to supercomputers, which are available in only a few places in the Netherlands, where the data is processed and stored.

“Using SURF’s dynamic lightpaths has alleviated those bottlenecks,” says Dr Horvatovich. “We can now share data between the various computer centres, which are connected by lightpaths, and we can also process it significantly faster and more cheaply. That has created new opportunities for our research.”

Computing capacity and the associated storage capacity are available in Groningen, at the SURF High Performance Compute centre in Amsterdam, and in Utrecht, Rotterdam and Delft. These clusters are part of the Life Science Grid (LSG), a network of 10 compute clusters intended specifically for researchers in the life sciences. Each member institution has access to an on-site compute cluster managed by SURF. The grid structure gives researchers access not only to their own local LSG cluster but also to the other LSG clusters, so institutions can scale up to greater computational power or storage capacity when performing large-scale analyses.

Published: 07/2018
