By definition, a gene includes the entire nucleic acid sequence necessary for the expression of its product (a peptide or an RNA). This sequence can be divided into a regulatory region and a transcriptional region. The regulatory region may lie near or far from the transcriptional region. The transcriptional region consists of exons and introns: exons encode a peptide or functional RNA, whereas introns are removed after transcription. As shown in the following figure, a typical DNA molecule consists of genes, pseudogenes and extragenic regions. Pseudogenes are nonfunctional genes. They often originate from mutation of duplicated genes. Because duplicated genes are present in many copies, the organism can still survive even if a few of them become nonfunctional.
Figure. General organization of the DNA sequence. Only the exons encode a functional peptide or RNA. The coding region accounts for about 3% of the total DNA in a human cell.
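The exon and intron layout described above can be illustrated with a small Python sketch. It assumes the exon coordinates are already known. The function name `splice`, the toy transcript and the coordinates are all hypothetical and chosen only for illustration; this is not a model of the splicing machinery itself.

```python
def splice(pre_rna, exons):
    """Join the exon segments of a primary transcript, dropping the introns.

    exons: list of (start, end) pairs, 0-based, end-exclusive,
    given in transcript order and not overlapping.
    """
    previous_end = 0
    for start, end in exons:
        if start < previous_end or end <= start or end > len(pre_rna):
            raise ValueError(f"invalid exon interval: {(start, end)}")
        previous_end = end
    return "".join(pre_rna[start:end] for start, end in exons)


# Hypothetical toy transcript: exons in upper case, introns in lower case.
pre_rna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "guaucccuuuag" + "UGGUAA"
exons = [(0, 6), (18, 24), (36, 42)]  # three exons, two introns

mature = splice(pre_rna, exons)
print(mature)
```

Only the upper-case exon segments remain in `mature`; the two lower-case introns are gone.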
RNA Processing
RNA processing generates a mature mRNA (for protein-coding genes) or a functional tRNA or rRNA from the primary transcript. Processing of pre-mRNA involves the following steps:
5'-Capping
Capping occurs shortly after transcription begins. The chemical structure of the "cap" is shown in the following figure, where m7G is linked to the first nucleotide by a special 5'-5' triphosphate linkage. In most organisms, the first nucleotide is methylated at the 2'-hydroxyl of the ribose. In vertebrates, the second nucleotide is also methylated.
3'-Polyadenylation
A stretch of adenylate residues is added to the 3' end. The poly-A tail contains about 250 A residues in mammals and about 100 in yeasts.
Figure. Polyadenylation at the 3' end. The major signal for 3' cleavage is the sequence AAUAAA. Cleavage occurs 10-35 nucleotides downstream of this sequence. A second signal, a GU-rich or U-rich region, is located about 50 nucleotides downstream of the cleavage site.
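As a rough illustration of the cleavage rule in the caption above, the following Python sketch scans an RNA string for AAUAAA and reports the window 10-35 nucleotides downstream. It assumes offsets are counted from the end of the hexamer, with the nucleotide immediately after it as offset 1. The function name `cleavage_windows` and the toy sequence are hypothetical, and the sketch ignores the second GU-rich or U-rich signal.

```python
import re

POLY_A_SIGNAL = "AAUAAA"


def cleavage_windows(rna, min_offset=10, max_offset=35):
    """Return (signal_start, window_start, window_end) for each AAUAAA found.

    Assumption of this sketch: the nucleotide right after the hexamer is
    offset 1. Positions are 0-based, window_end is exclusive, and the
    window is clipped to the end of the transcript.
    """
    windows = []
    # A lookahead pattern also reports overlapping occurrences.
    for match in re.finditer(f"(?={POLY_A_SIGNAL})", rna):
        signal_end = match.start() + len(POLY_A_SIGNAL)
        window_start = signal_end + min_offset - 1
        window_end = min(signal_end + max_offset, len(rna))
        if window_start < window_end:
            windows.append((match.start(), window_start, window_end))
    return windows


# Hypothetical toy 3' region containing a single AAUAAA signal.
rna = "GCGC" + "AAUAAA" + "C" * 20 + "CA" + "GUGUGUUUGU" + "C" * 10
windows = cleavage_windows(rna)
print(windows)
```

Each tuple gives the start of the signal and the range of positions where cleavage could occur under the stated assumption.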
Duplicated Genes
Most protein-coding genes do not need to be duplicated, because the mRNA molecule transcribed from one gene can be translated into many copies of its protein product. However, rRNA and tRNA are the final gene products, so their output is limited by transcription alone. To accelerate production, all species contain arrays of tandemly repeated RNA genes. The number of repeats ranges from tens to 24,000.
Number of RNA genes
*In the fruit fly, the X chromosome contains 250 copies of the pre-rRNA gene and the Y chromosome contains 150 copies.
There are four types of rRNA in mammalian cells: 28S, 18S, 5.8S and 5S. In the human genome, the 28S, 5.8S and 18S genes are clustered together. They form a single transcription unit whose transcript is cut into the individual rRNAs by specific enzymes after transcription. "Pre-rRNA" refers to this precursor. In humans, a repeat unit for the pre-rRNA is about 40 kb long, including a 13-kb transcription unit and a 27-kb untranscribed spacer region. The transcription unit contains three spacers: ETS, ITS1 and ITS2. They are removed during RNA processing.
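The sizes quoted for the human pre-rRNA repeat unit can be checked with a few lines of Python. The class name `RepeatUnit` and the copy number passed to `array_length_kb` are hypothetical and serve only to show the arithmetic. The 13-kb and 27-kb values are the ones given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepeatUnit:
    transcribed_kb: float  # transcription unit, including ETS, ITS1 and ITS2
    spacer_kb: float       # untranscribed spacer

    @property
    def total_kb(self):
        return self.transcribed_kb + self.spacer_kb

    @property
    def transcribed_fraction(self):
        return self.transcribed_kb / self.total_kb

    def array_length_kb(self, copies):
        """Length of `copies` tandem repeats laid end to end."""
        return copies * self.total_kb


human_pre_rrna = RepeatUnit(transcribed_kb=13, spacer_kb=27)
print(human_pre_rrna.total_kb)                        # 40
print(f"{human_pre_rrna.transcribed_fraction:.1%}")   # 32.5%
print(human_pre_rrna.array_length_kb(10))             # 400 (10 copies is a hypothetical number)
```

About one third of each repeat unit is transcribed; the rest is spacer.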
β-globin gene
Figure. Graphic view of the β-globin gene, which consists of three exons and two introns, with a total length of 1.6 kb. This figure was obtained from NCBI.
Gene family
"Gene family" refers to a set of genes with homologous sequences. For example, H2A, H2B, H3 and H4 are in the same histone gene family. Their products have similar structures and functions. Another example is the β-globin gene family located on chromosome 11.
Figure. The β-globin gene family includes β, δ, Aγ, Gγ and ε. ψ is a pseudogene. HS1 to HS4 are regulatory elements.