Bioinformatics is the branch of science that addresses biological questions with the help of computers, software tools and databases. It is a subdiscipline of computer science and biology which involves the acquisition, storage and analysis of biological data. This biological data is stored in molecular form in DNA, RNA and proteins. The biological data of any organism is accessed and analyzed with the help of bioinformatics tools.
With the help of computer programs, bioinformatics studies the function of genes and proteins, traces the evolutionary relationships between organisms, and predicts the 3D shapes of proteins.
Bioinformatics is thus a study of patterns in genes. It closely analyzes biological molecules and then tries to relate organisms to each other by identifying the patterns they share.
Bioinformatics is an interdisciplinary branch of Life Sciences. The term was coined by Paulien Hogeweg and Ben Hesper in 1970. It involves the analysis of huge amounts of biological data. Bioinformatics largely helps to store, systematize, organize, visualize, mine, understand and interpret complex data.
Statistics, mathematics, cloud computing and modern computer science contribute to bioinformatics in pattern recognition and reconstruction of the data. The importance of bioinformatics keeps growing with the advancement of computer technology.
Bioinformatics is used in the following fields-
- Microbial Genome Applications
- Molecular Medicine
- Personalized Medicine
- Preventative Medicine
- Gene Therapy
- Drug Development
- Antibiotic Resistance
- Waste Cleanup
- Biotechnology
- Bio-Weapon Creation
- Veterinary Science
- Forensic Analysis etc.
Bioinformatics supports many kinds of analysis of molecular sequences. For example, we can take the sequence of a DNA or RNA molecule and then, with a wide range of analytical methods, study the features, function, structure and evolution of that particular molecule. The same goes for other molecules such as proteins. Thus bioinformatics is primarily concerned with analyzing the molecular sequences of different organisms.
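As a rough illustration of what analyzing a molecular sequence can look like in practice, the short Python sketch below computes a few basic features of a DNA sequence. The example sequence and the function names (gc_content, reverse_complement, transcribe) are assumptions made for this illustration only; they do not belong to any particular bioinformatics tool.

```python
# Minimal sketch: a few basic analyses of a DNA sequence.
# The sequence and all function names are illustrative assumptions.

# Watson-Crick base pairing used to build the complementary strand
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def transcribe(seq: str) -> str:
    """Return the RNA version of a coding-strand DNA sequence (T replaced by U)."""
    return seq.upper().replace("T", "U")


dna = "ATGGCCATTGTAATGGGCCGC"  # made-up example sequence
print(f"GC content: {gc_content(dna):.2f}")
print("Reverse complement:", reverse_complement(dna))
print("RNA transcript:", transcribe(dna))
```

Real tools perform far more elaborate analyses than this, but the idea is the same: the sequence is treated as text, and programs extract biologically meaningful features from it.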
Prominent bioinformatics examples include-
- Database interfaces- GenBank, Medline, SwissProt
- Sequence alignment- BLAST, FASTA
- Multiple sequence alignment- ClustalW, MultAlin, etc.
- Gene finding- Genscan, GenomeScan, GRAIL etc.
- Protein Domain analysis and identification- Pfam, BLOCKS, ProDom
- Pattern Identification- Gibbs sampler, AlignACE, MEME
- Protein Folding prediction- PredictProtein, SWISS-MODEL.
These were some important examples of bioinformatics. Now let us trace the history of bioinformatics.
History of Bioinformatics
The term Bioinformatics was coined by Paulien Hogeweg and Ben Hesper in the year 1970. However, the emergence of Bioinformatics can be traced back to the 1960s. The field emerged because of the development of protein sequencing methods.
Frederick Sanger determined the amino acid sequence of insulin in the early 1950s. As more protein sequences became available, there was a need for a tool that could analyze and compare a huge number of protein sequences with each other. It was impossible to read and analyze these sequences manually. To give you an idea of the scale, the human genome consists of about 3 billion base pairs of DNA, and figuring out the exact order of these base pairs is next to impossible without computing power.
So, as the manual analysis and comparison of protein sequences looked impossible, researchers worked on developing computer methods that would help them. That’s when the first protein sequence database, which later became the “Protein Information Resource” (PIR), was developed by Margaret Oakley Dayhoff and her collaborators at the National Biomedical Research Foundation.
The PIR (Protein Information Resource) was like an atlas (a map) of protein sequences. These protein sequences were classified into different groups and subgroups according to their sequence similarity, using percent accepted mutation (PAM) matrices. This atlas has been used widely in the field of Bioinformatics since then.
In the 1970s, Elvin A. Kabat further advanced the field through extensive protein sequence analysis of antibodies. He collaborated with Tai Te Wu on this project and released antibody sequence collections between 1980 and 1991.
Further, the field was enriched by the following events-
· Collection of DNA sequences into GenBank* during 1982-1992. GenBank was prepared by Walter Goad’s group.
*GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms.
· The DNA sequence database became more useful when researchers developed web-based search algorithms. These helped researchers find and compare the data they needed, and saved them from going through an entire sequence when they needed only a part of it.
· Subsequently, a software program called GENEINFO was developed, with which researchers could rapidly search sequences and match them against other sequences.
· After that, other software developed by the National Center for Biotechnology Information (NCBI) helped in the analysis, comparison and visualization of molecular sequences.
· Development of FASTA and BLAST* greatly improved biological data analysis.
* FASTA and BLAST are two similarity searching programs that identify homologous DNA and protein sequences. They provide facilities for comparing DNA and protein sequences with the existing DNA and protein databases.
· In addition, tools were developed for predicting putative protein sequences, their structures and the functions of proteins based on DNA sequences. Full genome sequences were completed with these tools and a base genome database of various organisms was created.
· Finally, the ability to identify, store, mine and query large volumes of biological data has led to the unprecedented popularity and applications of Bioinformatics.
The following timeline of events helps to further clarify the history of bioinformatics.
Year | Development |
1952 | Alfred Day Hershey and Martha Chase showed that DNA carries genetic information |
1961 | Sydney Brenner, François Jacob and Matthew Meselson identified messenger RNA |
1962 | Pauling proposed the theory of molecular evolution |
1965 | Margaret Dayhoff developed an Atlas of Protein Sequences |
1970 | Needleman-Wunsch algorithm developed |
1977 | Software to analyse DNA sequences developed |
1981 | Smith-Waterman algorithm developed |
1982 | GenBank Release 3 made available |
1983 | Sequence database searching algorithm developed |
1988 | Creation of National Center for Biotechnology Information (NCBI) |
1988 | EMBnet network developed for the distribution of the database |
1990 | BLAST, software for fast sequence searching, released |
1995 | First bacterial genomes completely sequenced |
2001 | Human genome published |
Applications of Bioinformatics
Bioinformatics finds application in multiple fields such as molecular medicine, climate change, biotechnology, forensic analysis, bioweapon creation, alternative energy sources, agriculture, veterinary science, etc.
The importance of bioinformatics is illustrated by the following applications-
Molecular Medicine
The human genome contains information which, if analyzed and interpreted correctly, can advance the fields of biomedical research and clinical medicine.
We can understand the molecular basis of diseases more clearly by studying the genes directly associated with a particular disease.
Understanding this molecular information with the help of Bioinformatics can enable better treatments and cures.
Personalized Medicine
Pharmacogenomics is the study of how a person’s genetic makeup affects the body’s response to drugs. With the development of bioinformatics, we can more easily analyse how an individual is likely to respond and then prescribe the medicine that suits that individual best.
The trial and error methods used by doctors could largely be replaced once this information is available, and obtaining it depends on Bioinformatics.
Gene Therapy
With the advancement of Bioinformatics, it would be possible to cure diseases by using genes.
In gene therapy, the disease is treated or prevented by changing the expression of the patient’s genes.
Waste Cleanup
Some environments or sites contaminated with radiation and toxic chemicals can be cleaned up with the help of a bacterium called Deinococcus radiodurans. This is the toughest known bacterium, and through bioinformatics, scientists are exploring its potential.
Climate Change
Increasing levels of carbon dioxide emissions lead to global climate change. Recently, the US Department of Energy launched a program to decrease atmospheric carbon dioxide levels. One of the methods in this program is to study the genomes of microbes that use carbon dioxide. Applying the findings of this study may help to lower atmospheric CO2 levels.
Bioinformatics Tools
Bioinformatics tools are software programs that are used to save, retrieve and analyze biological data, and to extract the information contained in that data.
The Bioinformatics Tools can be categorised in four groups:
- Homology and Similarity Tools
- Protein Function Analysis
- Structural Analysis
- Sequence Analysis
The following are some widely used bioinformatics tools that were developed after years of hard work by scientists and researchers.
- BLAST- The Basic Local Alignment Search Tool is used for comparing gene and protein sequences. The BLAST tool has several versions including PSI-BLAST, PHI-BLAST and BLAST 2. Some special BLASTs can study the human, microbial, malaria and other genomes.
- FASTA- This tool was developed for database similarity searching.
- EMBOSS- The European Molecular Biology Open Software Suite is an open-source software package that caters to the needs of the molecular biology community. Within EMBOSS, one can find hundreds of applications for sequence alignment, protein motif identification and domain analysis, nucleotide sequence pattern analysis, codon usage analysis, and much more.
- ClustalW- ClustalW is a multiple sequence alignment program for DNA and protein sequences. This software aligns the selected sequences and calculates the best match between them so that one can find the identities, similarities and differences among them.
- RasMol- It is a computer program developed in the 90s. This program was intended for the graphical visualization of macromolecular structures.
Other useful tools include CHEMSKETCH, WINCOOT, AUTODOCK, Swiss-Pdb Viewer, etc.
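To give a feel for what sequence alignment tools compute, here is a minimal Python sketch of global pairwise alignment in the style of the Needleman-Wunsch algorithm listed in the timeline above. It is a teaching sketch only and is not how BLAST, FASTA or ClustalW are implemented; the scoring values (match = 1, mismatch = -1, gap = -1) and the function name needleman_wunsch are assumptions chosen for illustration.

```python
# Minimal sketch of global pairwise alignment (Needleman-Wunsch style).
# Scoring values and the function name are illustrative assumptions.

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(x, y):
        return match if x == y else mismatch

    # Fill the table: each cell is the best of a diagonal step or a gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right corner to recover one best alignment
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


total, top, bottom = needleman_wunsch("ACGT", "ACT")
print(total)
print(top)
print(bottom)
```

Production tools use biologically derived scoring schemes (such as the PAM matrices mentioned earlier) and heuristics that make database-scale searches fast, but the dynamic-programming idea shown here underlies many of them.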
Conclusion
Bioinformatics is a rapidly developing field. It is expected that by 2025, the genomes of 1 billion people will have been sequenced. This would generate around 240 billion GB of genome data. And this is just human data! The importance of bioinformatics is growing day by day.
It can be easily said that bioinformatics has replaced the microscope of biologists with computer systems. Only time will tell what great marvels this field brings.