Chapter 2

Structure and Behavior of Genes and Chromosomes

What will we study in this chapter
DNA structure: a brief review
The central dogma: DNA → RNA → protein
Structure of human chromosomes
Mitosis and Meiosis

In the next day or so Crick and I shall send a note to Nature proposing our structure (of DNA) as a possible model, at the same time emphasizing its provisional nature and the lack of proof in its favor. Even if wrong, I believe it to be interesting since it provides a concrete example of a structure composed of complementary chains. If, by chance, it is right, then I suspect we may be making a slight dent into the manner in which DNA can reproduce itself.
- James Watson, from a letter to Max Delbrück, March 12, 1953

James Watson, a Chicago native, was a child prodigy who entered college at the age of 15. By 23 he was a postdoctoral fellow at the University of Copenhagen studying genetics. It was there that he heard Maurice Wilkins of King’s College give a talk about investigations into the molecular structure of DNA. Watson was hooked, and he moved to Cambridge University’s Cavendish Laboratory to study DNA’s structure.

Watson worked well there with Francis Crick, and the two began using three-dimensional molecular models to test structural ideas. In 1953 Wilkins showed Watson and Crick X-ray crystallographic images of DNA taken by Wilkins’s coworker Rosalind Franklin. On the basis of these images—which had been shown without Franklin’s knowledge—Watson and Crick worked out a double-helical structural model within a few weeks. For this Watson, Crick, and Wilkins were awarded the Nobel Prize in physiology or medicine in 1962.

In addition to his work on DNA, Watson also discovered the molecular structure of the tobacco mosaic virus and helped uncover the role of messenger RNA in protein synthesis. Watson described the discovery of the structure of DNA in his highly popular book The Double Helix, though the book has been criticized for downplaying Franklin’s role in the discovery.

DNA structure: a brief review

DNA is a polymeric nucleic acid macromolecule composed of three types of units: a five-carbon sugar, deoxyribose; a nitrogen-containing base; and a phosphate group.

In DNA, there are two purine bases, adenine (A) and guanine (G), and two pyrimidine bases, thymine (T) and cytosine (C). Nucleotides, each composed of a base, a phosphate, and a sugar moiety, polymerize into long polynucleotide chains by 5’ – 3’ phosphodiester bonds formed between adjacent deoxyribose units.

In the human genome, these polynucleotide chains (in their double-helix form) are tens to hundreds of millions of nucleotides long, ranging in size from approximately 50 million base pairs (for the smallest chromosome, chromosome 21) to 250 million base pairs (for the largest chromosome, chromosome 1).

The anatomical structure of DNA carries the chemical information that allows the exact transmission of genetic information from one cell to its daughter cells and from one generation to the next. At the same time, the primary structure of DNA specifies the amino acid sequences of the polypeptide chains of proteins.

DNA replication

DNA replication is semiconservative and synthesis of DNA strands is semidiscontinuous
During the process of DNA synthesis (DNA replication), the two DNA strands of each chromosome are unwound by a helicase enzyme and each DNA strand directs the synthesis of a complementary DNA strand to generate two daughter DNA duplexes, each of which is identical to the parent molecule.

DNA replication is initiated at specific points, which have been termed origins of replication. Starting from such an origin, the initiation of DNA replication results in a replication fork, where the parental DNA duplex bifurcates into two daughter DNA duplexes.

Replication fork: the point of bifurcation when a DNA double helix is being replicated. Two replication forks proceeding outwards from a single starting point create a replication bubble.

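Because the two parental strands are antiparallel, the two daughter strands must run in opposite directions: at a replication fork the overall direction of chain growth is 5’→3’ for one daughter strand, the leading strand, but 3’→5’ for the other daughter strand, the lagging strand. DNA polymerases add nucleotides only in the 5’→3’ direction, so the lagging strand is made discontinuously as short pieces (Okazaki fragments), each synthesized 5’→3’ and then joined together. This is why DNA synthesis is described as semidiscontinuous.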
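The template-directed copying described in this section can be illustrated with a short Python sketch. It is a toy model under stated assumptions, not a description of the enzymology: strands are plain strings written 5’→3’, and the function names (`synthesize_complement`, `replicate`) and the example sequence are hypothetical.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_complement(template_5to3: str) -> str:
    """Return the new strand, written 5'->3', that pairs with the template.

    The new strand is antiparallel to its template, so the template is read
    in reverse while each base is replaced by its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template_5to3))


def replicate(duplex):
    """Semiconservative replication of a duplex given as (top, bottom).

    Both strands are written 5'->3'. Each daughter duplex keeps one parental
    strand and gains one newly synthesized strand.
    """
    top, bottom = duplex
    daughter_1 = (top, synthesize_complement(top))        # old top + new bottom
    daughter_2 = (synthesize_complement(bottom), bottom)  # new top + old bottom
    return daughter_1, daughter_2


parent_top = "ATGCCGTTA"  # made-up example sequence
parent = (parent_top, synthesize_complement(parent_top))
d1, d2 = replicate(parent)
print(parent)
print(d1 == parent, d2 == parent)
```

Both daughter duplexes come out identical to the parent, even though each contains only one of the original strands.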

The central dogma: DNA → RNA → protein

Genetic information is contained in DNA in the chromosomes within the cell nucleus, but protein synthesis, during which the information encoded in the DNA is used, takes place in the cytoplasm.

The molecular link between these two related types of information (the DNA code of genes and the amino acid code of proteins) is ribonucleic acid (RNA). The chemical structure of RNA is similar to that of DNA, except that each nucleotide in RNA has a ribose sugar component instead of a deoxyribose; in addition, uracil (U) replaces thymine as one of the pyrimidines of RNA. An additional difference between RNA and DNA is that RNA in most organisms exists as a single-stranded molecule, whereas DNA exists as a double helix.

The informational relationships among DNA, RNA, and protein are intertwined: DNA directs the synthesis and sequence of RNA, RNA directs the synthesis and sequence of polypeptides, and specific proteins are involved in the synthesis and metabolism of DNA and RNA. This flow of information is referred to as the “central dogma” of molecular biology.

Genetic information is stored in DNA by means of a code (the genetic code) in which the sequence of adjacent bases ultimately determines the sequence of amino acids in the encoded polypeptide. First, RNA is synthesized from the DNA template through a process known as transcription. The RNA, carrying the coded information in a form called messenger RNA (mRNA), is then transported from the nucleus to the cytoplasm, where the RNA sequence is decoded, or translated, to determine the sequence of amino acids in the protein being synthesized.

The process of translation occurs on ribosomes, which are cytoplasmic organelles with binding sites for all of the interacting molecules, including the mRNA, involved in protein synthesis. Ribosomes are themselves made up of many different structural proteins in association with a specialized type of RNA known as ribosomal RNA (rRNA). Translation involves yet a third type of RNA, transfer RNA (tRNA), which provides the molecular link between the coded base sequence of the mRNA and the amino acid sequence of the protein.
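The two decoding steps described above, transcription and translation, can be sketched in a few lines of Python. This is a simplified illustration, assuming the input is the coding (non-template) strand of an intron-free gene; the function names and the example sequence are hypothetical, and the lookup table is the standard genetic code written out in T, C, A, G order.

```python
from itertools import product

# Standard genetic code, one letter per amino acid, "*" for a stop codon.
# Codons are enumerated in T, C, A, G order at each of the three positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence: same as the coding strand, with U for T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed in DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCCTGGTAA"          # made-up coding sequence
mrna = transcribe(gene)
print(mrna)                    # AUGGCCUGGUAA
print(translate(mrna))         # MAW (Met-Ala-Trp)
```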

An organism may contain many types of somatic cells, each with a distinct shape and function. However, they all have the same genome. The genes in a genome do not have any effect on cellular functions until they are "expressed". Different types of cells express different sets of genes, thereby exhibiting various shapes and functions.

"Gene expression" means the production of a protein or a functional RNA from its gene. Several steps are required:
• Transcription: A DNA strand is used as the template to synthesize an RNA strand, which is called the primary transcript.
• RNA processing: This step involves modifications of the primary transcript to generate a mature mRNA (for protein genes) or a functional tRNA or rRNA.

For RNA genes (tRNA and rRNA), the expression is complete after a functional tRNA or rRNA is generated. However, protein genes require additional steps:
• Nuclear transport: mRNA has to be transported from the nucleus to the cytoplasm for protein synthesis.
• Protein synthesis: In the cytoplasm, mRNA binds to ribosomes, which can synthesize a polypeptide based on the sequence of mRNA.

Essential steps involved in the expression of protein genes.

Gene Structure and Organization
The human genome is the term used to describe the total genetic information (DNA content) in human cells. It comprises two genomes: a complex nuclear genome, which accounts for 99.9995% of the total genetic information, and a simple mitochondrial genome, which accounts for the remaining 0.0005%.
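The percentages above can be checked with a short calculation. The genome sizes used here are not given in the text; they are assumed, commonly cited approximate values (about 3.2 billion base pairs for the haploid nuclear genome and 16,569 base pairs for the mitochondrial genome), and the comparison counts one copy of each genome.

```python
# Assumed sizes (not stated in the text): approximate, commonly cited values.
NUCLEAR_GENOME_BP = 3_200_000_000   # haploid nuclear genome, roughly 3.2 Gb
MITOCHONDRIAL_GENOME_BP = 16_569    # one mitochondrial DNA molecule

total_bp = NUCLEAR_GENOME_BP + MITOCHONDRIAL_GENOME_BP
mito_percent = 100 * MITOCHONDRIAL_GENOME_BP / total_bp
nuclear_percent = 100 - mito_percent

print(f"nuclear: {nuclear_percent:.4f}%")
print(f"mitochondrial: {mito_percent:.4f}%")
```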

The nucleus of a human cell contains more than 99% of the cellular DNA. The nuclear genome is distributed between 24 different types of linear double-stranded DNA molecule, each of which has histones and other nonhistone proteins bound to it, constituting a chromosome. The 24 different chromosomes (22 types of autosome and two sex chromosomes, X and Y) can easily be differentiated by chromosome banding techniques.

In its simplest form, a gene can be visualized as a segment of a DNA molecule containing the code for the amino acid sequence of a polypeptide chain and the regulatory sequences necessary for expression. This description, however, is inadequate for genes in the human genome (and indeed in most eukaryotic genomes), because few genes exist as continuous coding sequences.

By definition, a gene includes the entire nucleic acid sequence necessary for the expression of its product (peptide or RNA). This sequence may be divided into a regulatory region and a transcriptional region. The regulatory region may be near or far from the transcriptional region. The transcriptional region consists of exons and introns. Exons encode a peptide or functional RNA. Introns are removed after transcription.

Introns are sections of DNA within a gene that do not encode part of the protein that the gene produces, and are spliced out of the mRNA that is transcribed from the gene before it is exported from the cell nucleus. Introns exist mainly (but not only) in eukaryotic cells. The regions of a gene that remain in the spliced mRNA are called exons.

Rather, the vast majority of genes are interrupted by one or more noncoding regions. These intervening sequences, called introns, are initially transcribed into RNA in the nucleus but are not present in the mature mRNA in the cytoplasm. Thus, information from the intronic sequences is not normally represented in the final protein product. Introns alternate with coding sequences, or exons, that ultimately encode the amino acid sequence of the protein.

Structural features of a typical human gene

A gene includes not only the actual coding sequences but also adjacent nucleotide sequences required for the proper expression of the gene, that is, for the production of a normal mRNA molecule, in the correct amount, in the correct place, and at the correct time during development or during the cell cycle.

The adjacent nucleotide sequences provide the molecular “start” and “stop” signals for the synthesis of mRNA transcribed from the gene. At the 5’ end of the gene lies a promoter region, which includes sequences responsible for the proper initiation of transcription. Within the 5’ region are several DNA elements whose sequence is conserved among many different genes. This conservation, together with functional studies of gene expression in many laboratories, indicates that these particular sequence elements play an important role in regulation.

Both promoters and other regulatory elements (located either 5’ or 3’ of a gene or in its introns) can be sites of mutation in genetic disease that interfere with the normal expression of a gene. These regulatory elements include enhancers, silencers, and locus control regions (LCRs).

At the 3’ end of the gene lies an untranslated region of importance that contains a signal for addition of a sequence of adenosine residues (the so-called polyA tail) to the end of the mature mRNA.

Organization of the human genome

Regions of the genome with similar characteristics of organization, replication, and expression are not arranged randomly but, rather, tend to be clustered together. This functional organization of the genome correlates remarkably well with its structural organization as revealed by metaphase chromosome banding.

The overall significance of this functional organization is that chromosomes are not just a random collection of different types of genes and other DNA sequences. Some chromosome regions, or even whole chromosomes, are quite high in gene content (“gene-rich”), whereas others are low (“gene-poor”). Certain types of sequence are characteristic of the different physical hallmarks of human chromosomes.

The clinical consequences of abnormalities of genome structure reflect the specific nature of the genes and sequences involved. Thus, abnormalities of gene-rich chromosomes or chromosomal regions tend to be much more severe clinically than similarly sized defects involving gene-poor parts of the genome.

Gene families
Many genes belong to families of closely related DNA sequences, recognized as families because of similarity of the nucleotide sequence of the genes themselves or of the amino acid sequence of the encoded polypeptides. "Gene family" refers to a set of genes with homologous sequences. For example, H2A, H2B, H3 and H4 are in the same histone gene family. Their products have similar structures and functions.

DNA sequences that closely resemble known genes but are nonfunctional are called pseudogenes. Pseudogenes are widespread in the genome and are thought to be byproducts of evolution, representing genes that were once functional but are now vestigial, having been inactivated by mutations in coding or regulatory sequences.

The human genome consists of three broad sequence components:
• Single copy, or at least very low copy number: This class accounts for 50-60% of mammalian DNA and reassociates very slowly. A single strand from a single-copy sequence will require some considerable time to find a complementary partner strand, given that the vast majority of DNA fragments are unrelated to it.
• Moderately repetitive: Roughly 25-40% of mammalian DNA reassociates at an intermediate rate. This class includes interspersed repeats.
• Highly repetitive: About 10-15% of mammalian DNA reassociates very rapidly. This class includes tandem repeats.

Several different categories of repetitive DNA are recognized. A useful distinguishing feature is whether the repeated sequences (“repeats”) are clustered in one or a few locations or whether they are dispersed throughout the genome, interspersed with single-copy sequences along the chromosome.

Depending on the average size of the arrays of repeat units, highly repetitive noncoding DNA belonging to this class can be grouped into three subclasses: satellite, minisatellite and microsatellite DNA. Satellite DNA is composed of very long arrays of tandem repeats which can be separated from bulk DNA by buoyant density gradient centrifugation.

The size of a satellite DNA ranges from 100 kb to over 1 Mb. In humans, a well-known example is the alphoid DNA located at the centromere of all chromosomes. Its repeat unit is 171 bp and the repetitive region accounts for 3-5% of the DNA in each chromosome. Other satellites have a shorter repeat unit. Most satellites in humans or in other organisms are located at the centromere.

Minisatellite DNA is composed of moderately sized arrays of tandem repeats and is often located at or close to telomeres.

Microsatellite DNA is defined by the presence of short arrays of tandem simple repeat units and is dispersed throughout the human genome.

The size of a minisatellite ranges from 1 kb to 20 kb. One type of minisatellite is called a variable number of tandem repeats (VNTR). Its repeat unit ranges from 9 bp to 80 bp. VNTRs are located in noncoding regions. The number of repeats for a given minisatellite may differ between individuals. This feature is the basis of DNA fingerprinting. Another type of minisatellite is the telomere. In a human germ cell, the size of a telomere is about 15 kb. In an aging somatic cell, the telomere is shorter. The telomere contains the tandemly repeated sequence GGGTTA.

Microsatellites are also known as short tandem repeats (STR), because a repeat unit consists of only 1 to 6 bp and the whole repetitive region spans less than 150 bp. Similar to minisatellites, the number of repeats for a given microsatellite may differ between individuals. Therefore, microsatellites can also be used for DNA fingerprinting.
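The idea that repeat number distinguishes individuals can be sketched as follows. This is a minimal illustration, not a forensic method: the repeat unit (GATA), the flanking sequences, and the two "individuals" are made up, and the helper name `longest_tandem_run` is hypothetical.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


# Made-up alleles at one microsatellite locus: same flanks, different repeat counts.
individual_a = "CCTT" + "GATA" * 7 + "GGCA"
individual_b = "CCTT" + "GATA" * 11 + "GGCA"

repeats_a = longest_tandem_run(individual_a, "GATA")
repeats_b = longest_tandem_run(individual_b, "GATA")
print(repeats_a, repeats_b)   # 7 11
```

Comparing the repeat counts at several such loci gives a profile that is very unlikely to match between unrelated individuals.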

In addition to satellite DNAs, another major class of repetitive DNA in the genome consists of related sequences that are dispersed throughout the genome rather than localized. Although many small DNA families meet this general description, two in particular warrant discussion because together they make up a significant proportion of the genome and because they have been implicated in genetic diseases.

The best-studied dispersed repetitive elements belong to the so-called Alu family. The members of this family are about 300 base pairs in length and are recognizably related to each other although not identical in sequence. In total, there are about 500,000 Alu family members in the genome, making up at least several percent of human DNA. In some regions of the genome, they make up a much higher percentage of the DNA.

A second major dispersed, repetitive DNA family is called the L1 family. L1 elements are long, repetitive sequences (up to 6 kb in length) that are found in about 100,000 copies per genome. They are plentiful in some regions of the genome but relatively sparse in others.

Structure of human chromosomes

human chromosome at metaphase

The composition of genes in the human genome, as well as the determinants of their expression, is specified in the DNA of the 46 human chromosomes. As we saw in an earlier section, each human chromosome is believed to consist of a single, continuous DNA double helix; that is, each chromosome in the nucleus is a long, linear double-stranded DNA molecule.

Chromatin in the nucleus

Chromosomes are not naked DNA double helices, however. The DNA molecule of a chromosome exists as a complex with a family of basic chromosomal proteins called histones and with a heterogeneous group of acidic, nonhistone proteins that are much less well characterized, but that appear to be critical for establishing a proper environment to ensure normal chromosome behavior and appropriate gene expression. Together, this complex of DNA and protein is called chromatin.

There are five major types of histones that play a critical role in the proper packaging of the chromatin fiber. Two copies each of the four core histones H2A, H2B, H3, and H4 constitute an octamer, around which a segment of DNA double helix winds, like thread around a spool.

Approximately 140 base pairs of DNA are associated with each histone core, making just under two turns around the octamer. After a short (20 to 60 base-pair) “spacer” segment of DNA, the next core DNA complex forms, and so on, giving chromatin the appearance of beads on a string. Each complex of DNA with core histones is called a nucleosome, which is the basic structural unit of chromatin.


The fifth histone, H1, appears to bind to DNA at the edge of each nucleosome, in the internucleosomal spacer region. The amount of DNA associated with a core nucleosome, together with the spacer region, is about 200 base pairs.
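As a rough worked example, the figure of about 200 base pairs per nucleosome (core plus spacer) can be combined with the chromosome sizes quoted earlier in this chapter to estimate how many nucleosomes package a chromosome. This is a back-of-the-envelope sketch; the variable and function names are illustrative only.

```python
BP_PER_NUCLEOSOME = 200  # core DNA plus spacer, as stated in the text

# Approximate chromosome sizes quoted earlier in this chapter.
CHROMOSOME_SIZES_BP = {
    "chromosome 1": 250_000_000,
    "chromosome 21": 50_000_000,
}


def estimated_nucleosomes(length_bp: int) -> int:
    """Estimate the nucleosome count for a DNA molecule of the given length."""
    return length_bp // BP_PER_NUCLEOSOME


for name, size in CHROMOSOME_SIZES_BP.items():
    print(name, estimated_nucleosomes(size))
# chromosome 1 1250000
# chromosome 21 250000
```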

The long strings of nucleosomes are themselves further compacted into a secondary helical chromatin structure that appears under the electron microscope as a thick, 30-nm-diameter fiber (about three times thicker than the nucleosomal fiber). This cylindrical “solenoid” fiber (from the Greek solenoeides, “pipe-shaped”) appears to be the fundamental unit of chromatin organization. The solenoids are themselves packed into loops or domains attached at intervals of about 100 kb or so to a nonhistone protein scaffold or matrix.

During the cell cycle, chromosomes pass through orderly stages of condensation and decondensation. In the interphase nucleus, chromosomes and chromatin are quite decondensed in relation to the highly condensed state of chromatin in metaphase. Nonetheless, even in interphase chromosomes, DNA in chromatin is substantially more condensed than it would be as a native, protein-free double helix.

Different levels of DNA condensation. (1) Single DNA strand. (2) Chromatin strand (DNA with histones). (3) Condensed chromatin during interphase with centromere. (4) Condensed chromatin during prophase. (Two copies of the DNA molecule are now present) (5) Chromosome during metaphase.

Mitosis and Meiosis

Two types of cell division:
Mitosis: occurs in somatic cells; its objective is to make identical body cells, each with a full set of identical chromosomes (2n).
Diploid: the number of chromosomes normally present in a somatic cell. A diploid cell contains two copies of each chromosome (2n).

Meiosis: the production of sex cells (gametes); its objective is to halve the chromosome number (n), hence "reduction division".
Haploid: the normal number of chromosomes present in an egg or sperm. A haploid cell contains a single set of chromosomes (n). In humans, the haploid number of chromosomes is 23.

Mitosis Stages

Interphase: the period between cell divisions; the cell spends most of its time in this stage, generally carrying out its normal activities. Nearing the end of this phase, the DNA and cell contents will replicate in preparation for division.

G1 Phase or the "Gap 1" Phase
The chromosomes decondense as they enter the G1 phase; this is a physiologically active time for the cell. The cell synthesizes the enzymes and proteins needed for cell growth. DNA consists of a single unreplicated helix (with histone and nonhistone proteins). In G1, the cell may be growing, active, and performing many intense biochemical activities.

S Phase or the "Synthesis" Phase
DNA and chromosomal proteins are replicated. This phase lasts a few hours. Each chromosome is now composed of two identical DNA molecules called sister chromatids.

G2 Phase or the "Gap 2" Phase
Between synthesis and mitosis. The mitotic spindle proteins are synthesized. The mitotic spindle is a structure that is involved with the movement of chromosomes during mitosis.

• Prophase: The chromatin, diffuse in interphase, condenses into chromosomes. Each chromosome has duplicated and now consists of two sister chromatids. At the end of prophase, the nuclear envelope breaks down into vesicles.
• Metaphase: The chromosomes align at the equatorial plate and are held in place by microtubules attached to the mitotic spindle and to part of the centromere.
• Anaphase: The centromeres divide. Sister chromatids separate and move toward the corresponding poles.
• Telophase: Daughter chromosomes arrive at the poles and the microtubules disappear. The condensed chromatin expands and the nuclear envelope reappears.
• Cytokinesis: The cytoplasm divides, and the cell membrane pinches inward, ultimately producing two daughter cells.

The Importance of Mitosis

Mitosis ensures that each new body cell has the same genetic makeup as its parent. Mutations can and do occur occasionally but, for the most part, all of your body cells have identical DNA. Mitosis not only replaces cells and makes new cells (growth); it also reduces cell size.

Meiosis: the process of nuclear division that reduces the number of chromosomes by half (2n → n). The two nuclear divisions in meiosis result in four daughter cells from one original parent cell, each with a haploid (n) set of chromosomes. In sexual reproduction, two haploid gametes (n + n) fuse to form a complete diploid (2n) zygote.

Homologous chromosomes: A pair of chromosomes, one from each parent, carrying genes for the same traits, in the same order.

Synapsis: the stage at which homologous chromosomes begin to pair closely along their entire length.

Meiosis is quite similar to mitosis, but the DNA is replicated once and then followed by two cell divisions; there is no extra DNA replication step before the second division. As a result, instead of having a pair of copies of each gene (as in a diploid cell), each resulting cell has only one copy of each gene (a haploid cell). These haploid cells become gametes, either sperm or eggs, so only one copy of a gene is passed on to each gamete. It is not until the sperm and egg join that the two halves of the genetic information come together. This process is the basis for all of Mendel's laws.

An extremely important feature of meiosis I is that during synapsis, when homologous chromosomes are paired together, crossovers occur.

Male meiosis produces four spermatozoa from a single germ cell precursor; female meiosis produces just one oocyte from a single germ cell precursor, discarding two polar bodies in the process.

The central dogma explains how information in DNA is converted into the sequence of amino acids in a protein. How do changes in DNA correlate with changes in the sequence of amino acids in a protein?

Mutations are permanent changes to the genetic material (usually DNA or RNA) of a cell; they are transmissible to offspring if the change occurs in a germ cell. Mutations can be caused by copying errors in the genetic material during cell division and by exposure to radiation, chemicals, or viruses.

In multicellular organisms, mutations can be subdivided into germline mutations, which can be passed on to progeny, and somatic mutations, which (when accidental) often lead to the malfunction or death of a cell and can cause cancer.

The simplest change is a substitution of one nucleotide for another, called a “point mutation”.

a. silent mutation:
a change in a base pair does not result in a change in the sequence of amino acids in a protein

b. missense mutation:
a mutation results in a change of amino acid, where the new amino acid has different properties from the old one. The protein with the new primary structure may have reduced or no activity.

c. nonsense mutations:
a mutation results in a new stop codon formed before the naturally occurring one. Translation is stopped prematurely and a shortened protein is made.

d. frameshift mutation:
a deletion or insertion of one base (or of any number of bases that is not a multiple of three) results in a change in the translational reading frame
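The four categories listed above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the gene is a made-up, intron-free coding sequence; any amino acid change is labeled "missense" without judging how different the new residue is; and the changed codon is assumed not to be the stop codon. The names `translate` and `classify_substitution` are hypothetical.

```python
from itertools import product

# Standard genetic code in T, C, A, G order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify_substitution(dna: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution at a 0-based position."""
    start = position - position % 3          # first base of the affected codon
    offset = position - start
    old_codon = dna[start:start + 3]
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if new_aa == old_aa:
        return "silent"
    if new_aa == "*":
        return "nonsense"
    return "missense"


gene = "ATGGCCTGGTAA"                            # Met-Ala-Trp-stop
print(classify_substitution(gene, 5, "T"))       # GCC -> GCT, still Ala: silent
print(classify_substitution(gene, 4, "T"))       # GCC -> GTC, Ala -> Val: missense
print(classify_substitution(gene, 8, "A"))       # TGG -> TGA, Trp -> stop: nonsense

frameshifted = gene[:3] + gene[4:]               # delete one base after ATG
print(translate(gene), translate(frameshifted))  # MAW MPG
```

The single-base deletion changes every codon downstream of it, which is why frameshifts usually alter the protein far more than a substitution does.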

A reading frame is a contiguous and nonoverlapping set of three-nucleotide codons in DNA or RNA. There are three possible reading frames in a strand, depending on whether reading starts at the first, second, or third nucleotide. A reading frame that contains no stop codon is called an open reading frame (ORF).
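The three reading frames of a strand can be enumerated directly. The sketch below follows the definition given here (a frame is "open" if it contains no stop codon); practical ORF finders usually also require a start codon and a minimum length, which this toy example ignores. The function names and the sequence are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reading_frames(seq: str):
    """Return the list of complete codons for each of the three frames."""
    return [
        [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        for offset in range(3)
    ]


def open_frames(seq: str):
    """Return the offsets (0, 1, 2) of the frames that contain no stop codon."""
    return [
        offset
        for offset, codons in enumerate(reading_frames(seq))
        if not STOP_CODONS & set(codons)
    ]


sequence = "ATGGCCTGGTAA"
for offset, codons in enumerate(reading_frames(sequence)):
    print(offset, codons)
# 0 ['ATG', 'GCC', 'TGG', 'TAA']
# 1 ['TGG', 'CCT', 'GGT']
# 2 ['GGC', 'CTG', 'GTA']
print(open_frames(sequence))   # [1, 2]
```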

Exon skipping

Splicing of an intron requires an essential signal: "GT........AG". If the splice acceptor site AG is mutated (e.g., A to C in this figure), the splicing machinery will look for the next acceptor site. As a result, the exon between two introns is also removed.
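Exon skipping can be mimicked with a toy model in Python. This is only a sketch under strong assumptions: exon and intron boundaries are given in advance, an intron is considered usable only if it ends in AG, and all sequences and the function name `splice` are invented for illustration. Real splicing depends on additional signals that are not modeled here.

```python
def splice(segments):
    """Return the mature transcript for a list of (kind, sequence) segments.

    `kind` is "exon" or "intron". When an intron starts, everything is removed
    up to and including the next intron that ends in an intact AG acceptor.
    """
    mature = []
    i = 0
    while i < len(segments):
        kind, seq = segments[i]
        if kind == "exon":
            mature.append(seq)
            i += 1
            continue
        # At a splice donor: scan forward for the next intact acceptor site.
        j = i
        while j < len(segments) and not (
            segments[j][0] == "intron" and segments[j][1].endswith("AG")
        ):
            j += 1
        i = j + 1  # skip everything through the acceptor that was found
    return "".join(mature)


normal_gene = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTTTCAG"),
    ("exon", "GATTAC"),
    ("intron", "GTGAGTCCTAG"),
    ("exon", "TGGTAA"),
]

# Same gene with the first acceptor site mutated (AG -> CG).
mutant_gene = list(normal_gene)
mutant_gene[1] = ("intron", "GTAAGTTTCCG")

print(splice(normal_gene))   # ATGGCCGATTACTGGTAA
print(splice(mutant_gene))   # ATGGCCTGGTAA (middle exon skipped)
```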
