Proteins can be compared in terms of sequence similarity or structural similarity, and there is one major difference between the two. Significant sequence similarity is usually an important indicator of an evolutionary relationship between sequences. In contrast, significant structural similarity is common even among proteins that do not share any sequence similarity or evolutionary relationship.
The similarity between two protein sequences can be assessed by sequence comparison. In protein sequence alignment, the problem of degeneracy in the genetic code (where multiple DNA triplets may code for the same amino acid) does not arise. In addition, because protein sequences are written with a 20-letter alphabet rather than the 4-letter alphabet of DNA, two unrelated proteins are much less likely to have the same amino acid (letter) at a given position by chance alone. Many of the alignment and comparison tools used for DNA sequences can also be applied to protein sequences. In protein sequence alignment, the amino acid sequence of one protein is aligned to another, allowing for insertions and deletions (represented as gaps), such that the distance between the two sequences is minimized or the similarity score is maximized.
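The alignment objective described above can be made concrete with a minimal sketch of global alignment by dynamic programming. It uses the Needleman-Wunsch algorithm, named here as one standard choice rather than a method the text prescribes. The function name `needleman_wunsch` and the flat scores (match +1, mismatch -1, gap -2) are illustrative assumptions; a protein aligner would normally take per-pair scores from a substitution matrix instead of a flat match/mismatch rule.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x, y):
        # Flat substitution score (a substitution matrix lookup would go here).
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,                         # gap in b
                score[i][j - 1] + gap,                         # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("MKVLA", "MKLA"))  # (2, 'MKVLA', 'MK-LA')
```

Aligning MKVLA with MKLA places a single gap opposite V, giving four matches and one gap penalty, for a score of 4 - 2 = 2.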
To align two protein sequences and assess their similarity, the varying degrees of similarity between amino acids need to be taken into account. These differences reflect how likely one amino acid is to be substituted for another during molecular evolution. The similarity between amino acids is quantified by means of scoring (substitution) matrices. These 20 × 20 matrices, which relate every amino acid to every amino acid, fall into two main families: PAM (percent, or point, accepted mutation) and BLOSUM (BLOcks SUbstitution Matrix).
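To show how a scoring matrix changes the picture, the sketch below scores two equal-length sequences position by position, without gaps. The dictionary holds only a small excerpt of BLOSUM62 covering five residues (I, L, V, K, R); the names `BLOSUM62_EXCERPT`, `pair_score`, and `ungapped_score` are hypothetical, and the full 20 × 20 matrix should be taken from a published source or a library such as Biopython's `Bio.Align.substitution_matrices` module.

```python
# Hypothetical excerpt of BLOSUM62 for five residues (the full matrix is 20 x 20).
# The matrix is symmetric, so each unordered pair is stored only once.
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4, ("K", "K"): 5, ("R", "R"): 5,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1, ("K", "R"): 2,
    ("I", "K"): -3, ("I", "R"): -3, ("L", "K"): -2, ("L", "R"): -2,
    ("V", "K"): -2, ("V", "R"): -3,
}


def pair_score(x, y, matrix=BLOSUM62_EXCERPT):
    """Look up a substitution score, trying both orderings of the pair."""
    return matrix[(x, y)] if (x, y) in matrix else matrix[(y, x)]


def ungapped_score(s1, s2, matrix=BLOSUM62_EXCERPT):
    """Sum substitution scores over aligned positions (no gaps allowed)."""
    if len(s1) != len(s2):
        raise ValueError("ungapped scoring needs equal-length sequences")
    return sum(pair_score(x, y, matrix) for x, y in zip(s1, s2))


print(ungapped_score("IKVL", "LRIV"))  # 2 + 2 + 3 + 1 = 8
print(ungapped_score("IKVL", "IKVL"))  # 4 + 5 + 4 + 4 = 17
```

IKVL and LRIV share no identical residues, so an identity-only score would be 0, yet every aligned pair is a conservative substitution (I/L, K/R, V/I, L/V) and receives a positive BLOSUM62 score.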
The 3D structure of a protein can be determined by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. X-ray crystallography requires the protein first to be crystallized. A narrow beam of X-rays is then directed at the protein crystal, and the atoms in the protein molecules scatter the incoming X-rays. The scattered waves either reinforce or cancel one another, producing a complex diffraction pattern. The position and intensity of each spot in this diffraction pattern are measured and interpreted by computer. By combining this information with the known amino acid sequence of the protein, an atomic model of the protein’s structure can be built. In NMR spectroscopy, a solution of purified protein is placed in a strong magnetic field and irradiated with radio waves of different frequencies. The hydrogen nuclei in the protein generate NMR signals that can be used to estimate the distances between atoms in different amino acids and between different parts of the protein. The NMR data, together with the known amino acid sequence, allow the 3D structure of the protein to be computed.
A major goal of bioinformatics and structural molecular biology is to understand the relationship between the amino acid sequence and the 3D structure of a protein, and to predict the fold from the amino acid sequence alone. This protein folding problem is often described as the most significant problem remaining in structural molecular biology, and solving it has been likened to breaking the second half of the genetic code. A solution would be key to rapid progress in protein engineering and rational drug design.
However, predicting a protein’s fold from its amino acid sequence is still a distant goal, and most current algorithms aim to predict only the secondary structure, that is, α-helices, β-strands, and loops (coils). Secondary structure prediction is an essential intermediate step toward predicting the full 3D structure of a protein: if the secondary structure is known, a comparatively small number of candidate tertiary (3D) structures can be derived using knowledge of the ways in which secondary structural elements pack together.
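Secondary structure is commonly written as one state per residue. A minimal sketch, assuming a three-state alphabet (H for helix, E for strand, C for coil/loop), shows how such a per-residue string can be collapsed into the secondary structural elements whose packing is then considered. The function name `sse_segments` is hypothetical.

```python
from itertools import groupby


def sse_segments(ss):
    """Collapse a per-residue H/E/C string into (state, start, end) elements.

    Positions are 0-based and `end` is exclusive; coil runs are skipped.
    """
    segments = []
    pos = 0
    for state, run in groupby(ss):
        length = len(list(run))
        if state in ("H", "E"):  # keep helices and strands only
            segments.append((state, pos, pos + length))
        pos += length
    return segments


print(sse_segments("CCHHHHHCCEEEECEEEC"))
# [('H', 2, 7), ('E', 9, 13), ('E', 14, 17)]
```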
All secondary structure prediction methods assume that there is a correlation between amino acid sequence and secondary structure, so that a short stretch of sequence may be more likely to form one type of secondary structure than another. The major computational approaches to secondary structure prediction are (1) statistical methods based on sequence features, (2) nearest-neighbor methods, and (3) neural network methods.
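The statistical approach can be sketched as follows, in the spirit of propensity-based methods such as Chou-Fasman but without their published parameters or rules. Propensities are estimated from a tiny, hypothetical training set as P(state | residue) / P(state), and each residue is assigned the state with the highest average propensity over a short window. The function names, the three-state alphabet (H, E, C), the toy training data, and the window size of 3 are all assumptions for illustration.

```python
from collections import Counter


def propensities(training):
    """Estimate P(state | residue) / P(state) from (sequence, ss_string) pairs."""
    pair_counts, aa_counts, ss_counts = Counter(), Counter(), Counter()
    total = 0
    for seq, ss in training:
        for aa, s in zip(seq, ss):
            pair_counts[(aa, s)] += 1
            aa_counts[aa] += 1
            ss_counts[s] += 1
            total += 1
    # A value above 1 means the residue favours that state in the training data.
    return {
        (aa, s): (pair_counts[(aa, s)] / aa_counts[aa]) / (ss_counts[s] / total)
        for aa in aa_counts
        for s in ss_counts
    }


def predict(seq, prop, states="HEC", window=3):
    """Assign each residue the state with the highest windowed mean propensity."""
    half = window // 2
    out = []
    for i in range(len(seq)):
        seg = seq[max(0, i - half): i + half + 1]
        # Residues absent from the training data default to a neutral 1.0.
        best = max(states, key=lambda s: sum(prop.get((aa, s), 1.0) for aa in seg) / len(seg))
        out.append(best)
    return "".join(out)


# Hypothetical toy training data: one helix, one strand, one coil segment.
toy_training = [("AELKA", "HHHHH"), ("VIVY", "EEEE"), ("GPNG", "CCCC")]
prop = propensities(toy_training)
print(predict("AEKVIYGPN", prop))  # HHHEEECCC
```

In the toy data each residue occurs in only one state, which makes the output clean; propensities learned from real solved structures overlap heavily between states, so practical methods use longer windows and additional rules.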