Leaders in big data and biomedicine

Research in biomedicine would not be possible without the massive use of data. It is the only way to find solutions in the fight against cancer, to explain how millions of neurons give structure to our brains or to carry out virtual trials for drugs. Big data ensures that medicine can move forward by leaps and bounds.

© Òscar Julve

Catalonia is playing an important role in some of the most ambitious projects on big data and biomedicine, such as the Cancer Genome Project (specifically looking at leukaemia), the Human Brain Project and the ENCODE Consortium, which aims to shed light on the parts of our genome we know least about. Many more projects are under way which require enormous quantities of data to test molecules that are candidates for use in drugs. Sorting between the many candidate compounds and using a virtual environment to simulate their effects leads to faster progress and greater confidence in results before starting clinical trials.

More than eight hundred scientists are working in the field of bioinformatics in Catalonia, some of them internationally renowned experts. Catalonia is also home to state-of-the-art facilities for storage, analysis and production of data such as the Barcelona Supercomputing Center (BSC), where the MareNostrum supercomputer was recently upgraded and equipped to store and analyse much larger amounts of data, and the National Centre for Genome Analysis (CNAG), which has cutting-edge sequencing machinery.

Genetic data mining

Big data is nothing new to biomedicine. “The first massive databases were built in the 1950s, when storage of protein sequences began”, explains Roderic Guigó. He is coordinator of the bioinformatics programme at the Centre for Genomic Regulation (CRG) and has been recognised as one of the world’s leading experts on bioinformatics since the first human genome was obtained in 2000 (he was one of the few Europeans involved). However, it was not until the 1980s, when early computers became widely available, that it was possible to take advantage of the first electronic databases. “In 1983, data mining led to the first oncogene being found”, recalls Guigó. The three billion base pairs in one human genome take up three gigabytes. “It doesn’t seem like much, but with lots of them it quickly adds up”, he points out.
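Guigó's figure can be checked with some quick arithmetic. The sketch below is only an illustration and rests on assumptions that are not in the article: that each base is stored as one plain-text character (one byte), that a gigabyte means one billion bytes, and, for comparison, that the four bases A, C, G and T could instead be packed into two bits each. The function names are hypothetical.

```python
# Rough storage estimate for genome sequences.
# Assumptions (not from the article): 1 byte per base for plain text,
# 2 bits per base for a packed encoding, 1 GB = 10**9 bytes.

BASE_PAIRS_PER_GENOME = 3_000_000_000  # figure quoted in the article


def genome_size_bytes(base_pairs: int, bits_per_base: int = 8) -> int:
    """Bytes needed to store a sequence at a fixed number of bits per base."""
    total_bits = base_pairs * bits_per_base
    return (total_bits + 7) // 8  # round up to whole bytes


def to_gigabytes(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes."""
    return num_bytes / 1e9


if __name__ == "__main__":
    plain_text = genome_size_bytes(BASE_PAIRS_PER_GENOME, bits_per_base=8)
    packed = genome_size_bytes(BASE_PAIRS_PER_GENOME, bits_per_base=2)
    print(f"One genome, 1 byte per base: {to_gigabytes(plain_text)} GB")
    print(f"One genome, 2 bits per base: {to_gigabytes(packed)} GB")
    # "With lots of them it quickly adds up": a thousand genomes as plain text.
    print(f"1,000 genomes, 1 byte per base: {to_gigabytes(1000 * plain_text)} GB")
```

Under these assumptions one genome comes to three gigabytes as plain text, which matches the figure Guigó gives, and a thousand of them already reach three terabytes. Real sequencing files are usually much larger, because they also store read quality and redundant coverage.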

Nowadays research in biomedicine would be inconceivable without big data, which, according to Guigó, poses two major challenges: the computing power needed for complex calculations, and storage capacity. In Europe, Catalonia plays a key role in both of these areas. It is no coincidence that the European Bioinformatics Institute (EBI) had the confidence to set up the headquarters of the European Genome-phenome Archive (EGA) in Barcelona, led by the CRG. This archive stores the genetic data of a hundred thousand patients who have taken part in over seven hundred scientific studies on cancer, diabetes, cardiovascular and autoimmune diseases and numerous other pathologies.

There is only one other genetic database in the world of a similar size, led by the National Institutes of Health (NIH) in the United States. The Catalan archive contains data from work conducted with both healthy and sick people in trials at around two hundred sites across the globe. The EGA safeguards genomes (genetic data) and phenomes (phenotypic data, which range from hair or eye colour to the types of conditions suffered by the people taking part in the trials).

Researchers from all around the world – including those working in non-profit organisations – have access to these data. In just the first four months of 2014, data stored by the EGA were downloaded more than two hundred thousand times by almost five thousand research groups on every continent. Among the many prize assets held in the Barcelona EGA headquarters is the data from one of the most ambitious projects ever undertaken to study seven complex diseases, carried out by the Wellcome Trust using data from over five thousand people. Scientists have free access to this project.

© Albert Armengol
Roderic Guigó, coordinator of the bioinformatics programme at the Centre for Genomic Regulation (CRG) and one of the world’s leading experts on bioinformatics.

The complete map of leukaemia

The EGA also holds the genetic data of thousands of genomes sequenced as part of the International Cancer Genome Consortium, an ambitious worldwide project which seeks to build a complete genetic map of every type of cancer. The consortium was launched in 2008 and studies more than forty types of cancer; the work is divided into a number of different projects, one of which is set in Barcelona. Each project studies a minimum of five hundred patients.

Participants in Barcelona include the National Centre for Genome Analysis, the Barcelona Supercomputing Center and researchers from the Hospital Clínic. Elías Campo, head of the research team on human and experimental functional oncomorphology for the IDIBAPS research institute at the Hospital Clínic, co-directs one section of this macro-project, the Spanish Chronic Lymphocytic Leukaemia (CLL) Genome Consortium. The team has sequenced the complete genomes of one hundred and fifty people and the exomes of four hundred. The exome is made up of the regions of the genome that contain protein-coding genes: these are transcribed into messenger RNA which, when translated by the cell’s machinery, produces proteins. It is the most important functional part of the genome as it determines the organism’s final constitution.

The whole brain in one supercomputer

Every year around sixty thousand high-quality scientific articles are published on the brain. However, none of them tells more than one part of the story. As a result, despite all these efforts, the brain remains a largely impenetrable black box. Some scientists dream of bringing all these data together to form one great virtual brain in which each neuron, each electrical pulse, each neurotransmitter and each brain circuit could be recreated. This could shed light on the activity that takes place when, for example, a thought occurs or a decision is made. It could also reveal what goes wrong in the more than five hundred brain-related diseases that affect a third of the European population, many of which currently have no cure.

Work on making this dream a reality has already begun at more than eighty research centres around the world (mostly European) that are participating in the Human Brain Project (HBP). This ambitious scheme is led by the Swiss Federal Institute of Technology Lausanne (EPFL), and two Catalan research centres are involved: the Barcelona Supercomputing Center (BSC) and Barcelona’s Institute for Research in Biomedicine (IRB Barcelona).

The BSC and the IRB are working on modelling the complex molecular interactions that take place between two neurons. “A neuron is like a switch”, explains Modesto Orozco, head of the project at the IRB, which provides the BSC with the mathematical data to use in the modelling. “Our goal is to simulate interactions between neurons on an atomic scale. That would enable us to model studies of drugs that could change synaptic transmission properties.”

This involves building mathematical models of the electric potentials and of the molecules generated between one neuron and another. One of the areas of study will be ion channels, complex protein mechanisms that open and close to allow or prevent the circulation of ions between neurons. “We want to visualise and model how synapses work, and how their effect can be blocked or increased”, says Orozco. These channels can be altered by external factors, such as drug abuse, the side effects of some medications or some diseases.

The models to be created will yield data that help to explain, for example, why some people suffering from depression respond to drugs while others do not, and why some experience significant side effects while others feel them much less. They will also make it possible to gain a better understanding of other diseases such as schizophrenia and Alzheimer’s. “We’ll be able to reconstruct the architecture of memory”, says Orozco, “and essentially, at a molecular level, understand what makes us human”.

Mònica L. Ferrado

Science journalist. Head of science for the newspaper Ara
