Image Source: Pixabay
Bioinformatics is generally defined as the application of computer science, mathematics and statistics to the development of algorithms and statistical models for managing and analyzing biological data. Although bioinformatics may seem like a new field of research, it has been around since the 1960s. One of the first people to see the potential of computers in biology and medicine was the remarkable scientist Margaret Dayhoff. During her PhD in quantum chemistry, Dayhoff applied computational methods to calculate the molecular energies of organic molecules. Over her career she made several contributions to the field, the most notable being the publication of the first catalog of protein sequences, the Atlas of Protein Sequence and Structure. She has since been known as a pioneer of bioinformatics.
The Human Genome Project: the driving force behind bioinformatics
The Human Genome Project, launched in 1990, aimed to determine the sequence of letters that make up human DNA and to map all protein-coding genes. In a cell, DNA stores the information that biologically defines an individual, written in the letters A (adenine), T (thymine), C (cytosine) and G (guanine). This huge amount of information, which totals more than 3 billion letters, is stored and organized in 23 pairs of chromosomes. Through a complex process, the information in DNA is converted into proteins, the basic working components of life, which perform many functions that ensure the cell's survival.
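The path from DNA letters to protein mentioned above can be illustrated with a minimal sketch. It assumes the standard genetic code and a made-up 24-letter coding sequence; the names `transcribe`, `translate` and `coding_strand` are illustrative only, and the real cellular process (splicing, regulation and so on) is far more involved than this string manipulation suggests.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated in
# T, C, A, G order and paired with one-letter amino acid codes ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Turn a DNA coding strand into its mRNA equivalent (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three letters at a time until a stop codon is reached."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the protein ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequence, not a real gene.
coding_strand = "ATGGCCATTGTAATGGGCCGCTGA"
mrna = transcribe(coding_strand)
protein = translate(mrna)
print(mrna)     # AUGGCCAUUGUAAUGGGCCGCUGA
print(protein)  # MAIVMGR
```

Each letter of the resulting protein string stands for one amino acid, so eight DNA codons (including the final stop signal) yield a chain of seven amino acids here.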
Mapping all protein-coding genes has implications not only for the understanding of human biology, but also for health care. For example, it has allowed the identification both of genes involved in normal biology and of genes linked to diseases. Identifying the genes involved in the pathophysiology of diseases such as Alzheimer's disease or cancer is crucial to understanding their causes, which should eventually lead to improved diagnosis and treatment.
Thus, the Human Genome Project became a key moment for the field of modern bioinformatics. It fueled the need to develop algorithms and models that can uncover the gems of information hidden in amounts of data too vast for humans to examine unaided.
The role of bioinformaticians
In the context of the Human Genome Project, for example, programs were written to assemble whole-genome sequencing (WGS) data. This may seem a bit mysterious, but in simple terms, a genome is so large that its complete sequence cannot be read in one pass: it first has to be cut into small pieces of DNA. These pieces are processed by a sequencing instrument that reads the DNA and converts it into a digital sequence of letters (A, T, C, G). The next step is to assemble all of these short sequences into the full sequence of the genome, which finally makes it possible to map all the protein-coding genes.
Image credit: Maria Virginia Ruiz Cuevas
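The assembly step described above can be sketched with a toy example. This is a deliberately simplified greedy approach that assumes error-free reads taken from a single strand; it is not the method used by the Human Genome Project, whose assemblers also had to handle sequencing errors, repeats and both DNA strands. The function names and the 14-letter "genome" are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` letters exists.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the two reads that share the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_len:
                        best_len, best_i, best_j = size, i, j
        if best_len == 0:
            break  # nothing overlaps any more; fragments stay separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return max(reads, key=len)  # report the longest assembled fragment


# Hypothetical genome and the short reads a sequencer might produce from it.
genome = "ATGCGTACGTTAGC"
reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]
assembled = greedy_assemble(reads)
print(assembled == genome)  # True
```

The greedy strategy is easy to follow but can be misled by repeated regions, which is one reason practical assemblers rely on more elaborate graph-based methods.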
Similarly, since the 2000s it has been possible to sequence the key molecule in the conversion of DNA into protein: messenger RNA (mRNA). The RNA-seq technique reveals the RNA sequences present in a cell, which, among other things, allow proteins to be generated. These sequences are stored digitally as strings of characters written in the DNA alphabet, so that the information can be analyzed.
Image credit: Maria Virginia Ruiz Cuevas
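Because RNA-seq reads are stored as plain strings, even a few lines of code can begin to analyze them. The sketch below is a toy illustration under strong assumptions: the two transcripts and five reads are invented, and reads are assigned to a gene by exact substring matching, whereas real pipelines use dedicated alignment or pseudo-alignment tools that tolerate mismatches.

```python
from collections import Counter

# Hypothetical reference transcripts (not real genes).
transcripts = {
    "geneA": "ATGGCCATTGTAATGGGCCGC",
    "geneB": "ATGTTTCCGGAACTGTAA",
}

# Hypothetical RNA-seq reads, stored as strings of A, T, C and G.
reads = ["GCCATTGT", "TTTCCGGA", "ATGGGCCG", "CCATTGTA", "GGGGGGGG"]


def count_reads(reads: list[str], transcripts: dict[str, str]) -> Counter:
    """Count the reads that match exactly one transcript."""
    counts = Counter({name: 0 for name in transcripts})
    for read in reads:
        hits = [name for name, seq in transcripts.items() if read in seq]
        if len(hits) == 1:  # ignore unmatched and ambiguous reads
            counts[hits[0]] += 1
    return counts


counts = count_reads(reads, transcripts)
print(counts["geneA"], counts["geneB"])  # 3 1
```

Counts like these are the raw material for asking which genes are more or less active in one sample than in another.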
As biological data collection advances, bioinformaticians are looking to write more powerful programs capable of performing critical tasks and managing the large load of new data.
RNA-seq opens the door to a myriad of questions. Bioinformaticians must use their ingenuity to develop tools to study the intrinsic and often invisible relationships between DNA and its transformation into proteins.
In medicine, for example, bioinformaticians have contributed RNA-seq applications that provide information on differential gene expression, disease biology, biomarkers, genetic diagnostics, and disease-associated single nucleotide polymorphisms (SNPs). Notably, in recent years, they have used RNA sequencing data to develop several approaches that improve the selection and identification of candidate antigens for cancer vaccine design.
In addition, data analysis through these applications can raise important and challenging questions, which bioinformaticians can pursue. A quick count of the letters in DNA that code for proteins shows that only about 2% of the 3 billion letters are actually used to make proteins. This observation certainly raises more questions than answers: why does the cell carry such a large amount of data if it appears to be useless? What could be the biological utility of the 98% of the genome that does not code for proteins?
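The arithmetic behind this observation is simple to reproduce. The snippet below uses the rounded figures quoted in the text (3 billion letters, 2% coding); the variable names are illustrative.

```python
GENOME_LENGTH = 3_000_000_000  # approximate number of letters, as quoted above
CODING_PERCENT = 2             # approximate share that codes for proteins

# Integer arithmetic avoids floating-point rounding surprises.
coding_letters = GENOME_LENGTH * CODING_PERCENT // 100
noncoding_letters = GENOME_LENGTH - coding_letters

print(f"{coding_letters:,}")     # 60,000,000
print(f"{noncoding_letters:,}")  # 2,940,000,000
```

In other words, roughly 60 million letters carry protein-coding instructions, while close to 3 billion do not.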
It is thus an exciting time for bioinformaticians, as they have a key role to play in research. With a wealth of new data, there is much work to be done in many areas of biological research and therefore much to be discovered.