Bioinformatics is a blend of BIOlogy and INFORMATion technology: the application of information technology to biological studies. It combines several disciplines, including biology, information technology, biochemistry, and mathematics. In this field, computer technologies and statistical methods are used to manage and analyze huge amounts of biological data: DNA, RNA, and protein sequences, protein structures, protein interactions, and gene expression profiles.
It also involves the development of algorithms and statistical techniques to analyze biological data and determine the relationships within them, the development of databases to store and retrieve biological data, and the development of tools to identify, interpret, and mine datasets. The field plays an important role in the study of fundamental biological problems because of the explosion of sequence and structural information, which is likely to keep growing exponentially.
However, like other technologies, bioinformatics has to overcome some challenges. These fall broadly into two categories: 1) data management and 2) knowledge discovery.
The data management challenge concerns the maintenance and integrity of existing biological databases. Widely used biological databases are:
- Primary nucleic acid databases: GenBank (NCBI), the EMBL Nucleotide Sequence Database, and the DNA Data Bank of Japan (DDBJ)
- Protein sequence databases: SWISS-PROT and TrEMBL
- Structural databases: Protein Data Bank (PDB) and Macromolecular Structure Database (MSD)
- Literature databases: MEDLINE
The knowledge discovery challenge is to transform huge amounts of biological data into useful information and valuable knowledge. Identifying and interpreting the interesting patterns hidden in trillions of genetic and other biological data points is a critical goal of bioinformatics. This includes identifying useful gene structures in biological sequences, deriving knowledge from experimental data, and extracting scientific information from the available literature.
Research in bioinformatics centers on knowledge discovery through sequence analysis, structure analysis, and expression analysis. Sequence analysis involves discovering functional and structural similarities and differences between multiple biological sequences. This is usually done by comparing an unknown (new) sequence with well-studied, annotated (known) sequences. Two similar sequences often share the same functional role, regulatory or biochemical pathway, and protein structure. Similar sequences that derive from a common ancestor, for example in different organisms, are called homologous sequences. Homologous sequence analysis is important for predicting the nature of a protein, which can be of great help in developing new drugs and in performing phylogenetic analysis. Another method for sequence comparison is sequence alignment. It is a base-by-base comparison of two (pairwise) or more (multiple) sequences that searches for a series of individual characters or character patterns occurring in the same order in the sequences. String-matching techniques are widely used to find identical characters or character patterns. An active research area in this field is gene prediction, which detects meaningful signals in uncharacterized DNA sequences, often using homology searches to identify regions of interest.
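The pairwise, base-by-base comparison described above can be illustrated with a small global alignment routine. This is only a minimal sketch using the classic Needleman-Wunsch dynamic-programming scheme. The function name `global_alignment` and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for illustration, not values taken from the text, and real tools use more elaborate scoring.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b (Needleman-Wunsch sketch).

    Scoring values are illustrative assumptions. Returns a tuple
    (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(x, y):
        return match if x == y else mismatch

    # Fill the table: each cell takes the best of a substitution
    # (diagonal), a gap in b (up), or a gap in a (left).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = global_alignment("ACGT", "AGT")
    print(total)
    print(top)
    print(bottom)
```

Under these assumed scores, a higher total suggests more similar sequences. Multiple sequence alignment and database-scale homology searches rely on heuristics that go beyond this sketch.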
Structure analysis is the study of protein structures and their interactions. Understanding protein structures and their functions can lead to new approaches for diagnosing and treating diseases and for discovering new drugs. Ongoing research on protein structural analysis involves the comparison and prediction of protein structures.
Expression analysis involves gene expression analysis and gene clustering. Gene expression analysis determines the similarities or differences of genes expressed in a particular cell type or tissue. Gene expression data, represented as a matrix, can be analyzed in two ways. First, by comparing the expression profiles of genes: if the profiles are similar, the genes are likely co-regulated and functionally related. Second, by comparing the expression profiles of samples, which shows whether genes are expressed differently between them. Gene clustering aims to group together genes with similar expression profiles. Genes within a group are more closely co-regulated and functionally related to each other than to genes in different groups. However, because of the complexity and sheer volume of biological data, traditional information technology and algorithms often fail to solve complex real-world biological problems.
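The idea of grouping genes with similar expression profiles can be sketched in a few lines. The example below is only an illustration under stated assumptions: it measures similarity with the Pearson correlation and links genes whose correlation meets a cutoff (a simple single-linkage grouping). The function names, the `threshold` value, and the gene names and expression values are all hypothetical, not taken from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as unrelated
    return cov / (sd_x * sd_y)


def cluster_genes(profiles, threshold=0.9):
    """Group genes linked by correlation >= threshold (single-linkage sketch).

    profiles maps a gene name to its expression values across samples,
    i.e. one row of the expression matrix. The threshold is an assumed cutoff.
    """
    genes = list(profiles)
    unassigned = set(genes)
    clusters = []
    for seed in genes:
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        cluster, stack = [seed], [seed]
        # Grow the cluster by following links to still-unassigned genes.
        while stack:
            current = stack.pop()
            linked = [g for g in genes
                      if g in unassigned
                      and pearson(profiles[current], profiles[g]) >= threshold]
            for g in linked:
                unassigned.remove(g)
                cluster.append(g)
                stack.append(g)
        clusters.append(sorted(cluster))
    return clusters


if __name__ == "__main__":
    # Hypothetical expression matrix: rows are genes, columns are samples.
    example = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.1, 6.0, 8.2],
        "geneC": [4.0, 3.0, 2.0, 1.0],
        "geneD": [8.1, 6.0, 4.2, 2.0],
    }
    print(cluster_genes(example))
```

Comparing samples instead of genes would apply the same correlation to the columns of the matrix. Practical analyses typically use dedicated hierarchical or k-means clustering implementations rather than this simplified grouping.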