Bioinformatics: open access databases help researchers worldwide
Twenty-three pairs of chromosomes, around 23,000 genes, and more than three billion base pairs: this is the size of the human genome. It contains more than just information about whether a person has green eyes or brown hair. If you know where to look and what you are looking for, the human genome also offers clues as to whether a person is at risk of developing cancer. So that physicians, geneticists and biologists don’t have to spend years searching through massive charts, they have teamed up with bioinformatics specialists to develop algorithms that let them search vast sets of data far more quickly by computer.
One of these specialists is Dr Jan Grau, who conducts research at the University of Halle in the working group “Pattern Recognition and Bioinformatics” led by Professor Stefan Posch. “Our methods can be applied to more than just human DNA. They are also used in the field of plant genetics and for the genomes of bacteria,” says Grau. The approach of using information technology to analyse large quantities of data and make new discoveries has gained a foothold in many areas of science and in parts of the humanities and social sciences.
Research groups around the world often work on similar topics and questions at the same time. It would be exceedingly time-consuming if scientists always had to start from scratch when decoding genomes or analysing the structure of proteins. “This is why researchers save their datasets in large, publicly accessible databases – so-called repositories,” Grau reports. This means researchers in Halle can access data obtained by other research groups around the world. “We profit from the open access data in the large databases. In return, we upload the data that we obtained together with our experimental partners to public databases. We also give back to the community newly developed methods that were frequently developed and tested using open access data.” The methods he is referring to include algorithms that can search large datasets for similarities or patterns more quickly. The scale matters: the human genome alone amounts to around three gigabytes of data.
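To put the figure of around three gigabytes in perspective, the following Python sketch does the arithmetic under a simple assumption: the genome is rounded to three billion bases and stored either as plain text (one byte per base) or packed into two bits per base. It also shows the most basic form of pattern search, finding every position of a short motif in a sequence. This is an illustration only, not the method developed by Grau's group; the names `storage_bytes` and `find_motif` and the example sequence are made up for this sketch.

```python
# Illustrative sketch only: rough storage arithmetic and a naive motif search.
# All names and the example sequence are hypothetical.

BASE_PAIRS = 3_000_000_000  # "more than three billion base pairs", rounded down


def storage_bytes(n_bases: int, bits_per_base: int) -> int:
    """Bytes needed to store n_bases at a fixed number of bits per base (rounded up)."""
    return (n_bases * bits_per_base + 7) // 8


def find_motif(sequence: str, motif: str) -> list:
    """Return the 0-based start position of every match, including overlapping ones."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Continue one character further so overlapping matches are found too.
        start = sequence.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    as_text = storage_bytes(BASE_PAIRS, 8)  # one character (A, C, G or T) per byte
    packed = storage_bytes(BASE_PAIRS, 2)   # four possible letters fit into two bits
    print(f"Plain text: about {as_text / 1e9:.1f} GB")
    print(f"Two bits per base: about {packed / 1e6:.0f} MB")
    print(find_motif("ATATAT", "ATA"))
```

A linear scan like this is fine for a short sequence, but it has to read the whole text for every query. For genome-scale data, search tools typically build an index first so that repeated queries do not start over each time.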
Grau is currently working on a research project with Professor Jens Boch, a plant geneticist formerly of Halle who now works at the University of Hannover. Their project examines, among other things, the genome of bacterial pathogens that infest rice plants.
There are many repositories around the world in which scientists make their raw data publicly available to other researchers. Two of the largest platforms are the portal “GenBank”, operated by the National Center for Biotechnology Information in the US, and the databases of the European Bioinformatics Institute. Alongside these are many small databases for specific fields.
Many international journals now require that the raw data on which a research article is based be made publicly accessible. This allows other scientists to review the details of a paper and to use the data in their own research. At the same time, data from different sources can be combined to answer questions that could not be explored using individual datasets, for instance questions about evolution in the plant and animal kingdoms.
Access to the data can be partly restricted so that colleagues cannot publish it before the researchers who collected it. In this case the data can only be seen by the reviewers until the article is published. Researchers can also decide to publish their data straight away but to restrict its use in other publications until their own article has appeared.
Contact: Dr. Jan Grau
Bioinformatics
Phone: 0345 5524768