A portion of a sequence, called a subsequence, may occur repeatedly and have a distinctive feature. Such a portion is known as a motif, and it often serves as the starting point for the analysis of biological data. A motif may recur often in a set of protein sequences, with some variations. Generally, it has an important functional role; for example, it may contain information preserved by the evolutionary process.
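The idea of a subsequence that recurs across a set of sequences can be sketched in a few lines. The example below is a minimal illustration, not a motif-finding algorithm: it counts exact k-mers (subsequences of length k) and reports those present in at least a chosen number of sequences. The function name `recurring_kmers`, the toy sequences, and the thresholds are hypothetical. Exact matching deliberately ignores the "variations" mentioned above; practical motif finders allow mismatches or use probabilistic models.

```python
from collections import Counter


def recurring_kmers(sequences, k, min_sequences):
    """Return k-mers that occur in at least `min_sequences` of the input sequences.

    Each k-mer is counted at most once per sequence, so the count measures how
    widely it recurs across the set, not how often it repeats within one sequence.
    """
    support = Counter()
    for seq in sequences:
        # Collect the distinct k-mers of this sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return {kmer: n for kmer, n in support.items() if n >= min_sequences}


# Hypothetical toy protein fragments; "GKS" is the only 3-mer shared by all three.
seqs = ["MAGKSTL", "PGKSAV", "GKSWWA"]
print(recurring_kmers(seqs, k=3, min_sequences=3))  # → {'GKS': 3}
```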
The term pattern is generally used to denote anything that occurs repeatedly, and it is used in all walks of life. From a statistical point of view, a pattern is a repeated occurrence in sequential data. Both sequence patterns and structure patterns can be used to characterize proteins. Motifs are short patterns, and a motif may represent biological information such as features of the protein's tertiary structure. A pattern, on the other hand, is a statistical rather than a biological notion. A pattern can be called a motif if it is strongly conserved within a given set of sequences. Thus all motifs are patterns, but not all patterns are motifs.
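The distinction between a pattern and a motif can be made concrete by measuring how strongly a pattern is conserved across a set of sequences. The sketch below assumes patterns written as Python regular expressions (so `.` stands for any residue, allowing some variation) and treats a pattern as a motif when it occurs in at least a fraction `threshold` of the sequences. The functions `conservation` and `is_motif`, the 0.9 cut-off, and the toy "family" are all hypothetical choices for illustration, not a standard definition.

```python
import re


def conservation(pattern, sequences):
    """Fraction of sequences in which the regular-expression pattern occurs."""
    regex = re.compile(pattern)
    hits = sum(1 for seq in sequences if regex.search(seq))
    return hits / len(sequences)


def is_motif(pattern, sequences, threshold=0.9):
    """Call a pattern a motif if it is present in at least `threshold` of the set.

    The default cut-off is an arbitrary illustrative value.
    """
    return conservation(pattern, sequences) >= threshold


# Hypothetical sequence family: every member contains C, two residues, then C.
family = ["MKCAHCLL", "GCVLCQ", "TTCPPCA", "ACWWCK"]
print(conservation("C..C", family), is_motif("C..C", family))  # → 1.0 True
print(conservation("LL", family), is_motif("LL", family))      # → 0.25 False
```

Here "C..C" is conserved in every member and qualifies as a motif, whereas "LL" is merely a pattern that happens to occur in one sequence.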
The genomic code that determines when and where transcription occurs is written in the DNA sequence upstream and downstream of the transcription start site. Nuclear proteins that bind to these sequence elements in a sequence-specific manner determine the transcriptional activity of the locus. In general, our knowledge of the transcriptional control of any particular gene has been obtained empirically. At one time, experimenters cloned individual cDNAs and used them as probes to measure levels of mRNA one at a time. The availability of cDNA microarrays has permitted complete transcriptome profiling of individual tissues or cell types responding to a stimulus. However, if we could read the transcriptional code, such experiments would become partly redundant: we would be able to predict from the genomic DNA sequence that a particular gene will be transcribed in a certain location, together with other genes that might contribute to the same pathway.
The process of transcription is initiated by the binding of RNA polymerase and the associated proteins of the preinitiation complex at the transcription start point (TSP). Binding and initiation are controlled by transcription factors bound to elements in the vicinity of the TSP. Although transcriptional output is commonly measured experimentally as if transcription were an analog process, it is actually digital. In a higher eukaryote, each diploid cell carries two copies of each chromosome, and the DNA is packaged in a nucleoprotein complex called chromatin, which becomes visible as chromosomes when condensed. At the level of a single gene on one chromosome, there are two forms of regulation that are, essentially, all or nothing. Firstly, the locus can be silenced: as a cell commits to a particular lineage, it switches off transcription of genes that will not be required for its differentiated functions, and these loci are assembled into heterochromatin, which is transcriptionally silent. Secondly, aside from silent genes, a given cell contains many other genes that are not actively transcribed at any particular time but may be acutely and reversibly induced. These genes are in an active chromatin state, but signals from the environment determine whether mRNA is produced. There is a great deal of evidence that this, too, is a digital phenomenon at the level of individual DNA templates in single cells. Because transcription is digital, the transcriptional code and the binding of specific DNA-binding proteins work by determining the probability that a gene is available for transcription and, subsequently, whether it is actually transcribed at any particular time. The actual level of mRNA in cells, and of the protein it encodes, is also regulated at other levels: the rate of transcriptional elongation, splicing and processing, nuclear export, and translation into protein.
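The claim that an analog-looking population measurement can arise from all-or-nothing events at individual templates can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not a model from the text: each of the two gene copies in a diploid cell is independently "available" (open chromatin) with probability `p_available`, and an available copy is transcribed with probability `p_transcribed`. The function name, the independence assumption, and all parameter values are hypothetical.

```python
import random


def template_states(n_cells, p_available, p_transcribed, seed=0):
    """Simulate the on/off state of every gene copy in a population of diploid cells.

    Each of the two copies per cell is first 'available' with probability
    p_available; an available copy is then transcribed with probability
    p_transcribed. Every individual state is strictly 0 (off) or 1 (on).
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    states = []
    for _ in range(2 * n_cells):  # two copies of the gene per diploid cell
        available = rng.random() < p_available
        transcribed = available and rng.random() < p_transcribed
        states.append(1 if transcribed else 0)
    return states


# A stronger stimulus is modelled here as a higher p_transcribed. The population
# mean changes gradually even though each template is only ever on or off.
for p_tx in (0.2, 0.5, 0.8):
    states = template_states(10_000, p_available=0.7, p_transcribed=p_tx)
    mean = sum(states) / len(states)
    print(f"p_transcribed={p_tx}: states={sorted(set(states))}, "
          f"mean={mean:.3f} (expected about {0.7 * p_tx:.3f})")
```

The expected population mean is simply the product of the two probabilities, which is why a bulk mRNA measurement looks graded while telling us nothing directly about the state of any single template.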
Transcription factors (TFs) are DNA-binding proteins that bind to specific cis-acting regulatory elements in DNA to increase or decrease the probability of transcription. While genes of simpler eukaryotes contain relatively small numbers of binding sites for specific transcription factors, usually found in a small window of 200-400 bp 5′ (upstream) of the transcription start point, mammalian genes are more complex. Enhancer and silencer regions, which respectively activate and repress transcription, may be found tens of kilobases upstream of the TSP, within the introns of the gene, or tens of kilobases 3′ (downstream) of it. For example, the CSF-1 receptor (CSF-1R) gene contains regulatory elements in its first intron. Many of these enhancer and silencer regions, like those of the CSF-1R gene, are highly conserved across mammalian species.