The sequencing protocol is essentially a technical procedure with several variants of the basic procedure, but the most widely used techniques are based on the enzymatic method. Whatever the method, the desired result is to generate a set of overlapping fragments that terminate at different bases and differ in length by one nucleotide. This is known as a set of nested fragments. Assuming that the technique has generated a set of nested fragments, the detection step is the final stage of the sequencing procedure. This usually involves separation of the fragments on a polyacrylamide gel. Slab gels, in which fragments are radioactively labeled, generate an autoradiograph. Automated sequencing procedures tend to use fluorescent labels and a continuous electrophoresis to separate the fragments, which are identified as they pass a detector. There are two main methods for sequencing DNA. In one method, developed by Allan Maxam and Walter Gilbert, chemicals are used to cleave the DNA at certain positions, generating a set of fragments that differ by one nucleotide. The same result is achieved in a different way in the second method, developed by Fred Sanger and Alan Coulson, which involves enzymatic synthesis of DNA strands that terminate in a modified nucleotide. Analysis of fragments is similar for both methods and involves gel electrophoresis and autoradiography (assuming that a radioactive label has been used). The enzymatic method (and variants of the basic technique) has now almost completely replaced the chemical method as the technique of choice, although there are some situations where chemical sequencing can provide useful data to confirm information generated by the enzymatic method. Fluorographic detection methods are also used in place of radioactive isotopes. This is particularly important in DNA sequencing, as it speeds up the process and enables the technique to be automated. 
Nucleotide Sequencing
Traditionally, about 500 nucleotides can be sequenced at a time. Sequencing an entire genome therefore calls for cloning it in different vectors (plasmids up to 10 kb, cosmids up to 40 kb, BACs up to 300 kb and YACs above 300 kb). The uniformity of the DNA molecule and the seemingly monotonous repetition of the nucleotide bases may seem like impenetrable barriers to determining the precise order of the bases within a nucleic acid. The earliest sequencing methods were, however, impractical for DNA sequencing on a large scale. In 1975, Fred Sanger and Alan Coulson devised a method of direct DNA sequencing referred to as the plus-minus method (Sanger and Coulson, 1975). This method utilized a DNA polymerase, primed by synthetic radiolabeled oligonucleotides, to generate fragments of DNA that could be analyzed following electrophoresis and autoradiography. This technique was used to determine the entire 5386 bp sequence of the bacteriophage φX174 genome (Sanger et al., 1977).
1. Maxam-Gilbert (chemical) sequencing
The method requires radioactive labeling at one 5' end of the DNA (typically by a kinase reaction using gamma-32P ATP) and purification of the DNA fragment to be sequenced. Chemical treatment generates breaks at a small proportion of one or two of the four nucleotide bases in each of four reactions (G, A+G, C, C+T). For example, the purines (A+G) are depurinated using formic acid, the guanines (and to some extent the adenines) are methylated by dimethyl sulfate, and the pyrimidines (C+T) are hydrolysed using hydrazine. The addition of salt (sodium chloride) to the hydrazine reaction inhibits the reaction of thymine, giving the C-only reaction. The modified DNAs are then cleaved by hot piperidine at the position of the modified base. The concentration of the modifying chemicals is controlled to introduce on average one modification per DNA molecule. Thus a series of labeled fragments is generated, from the radiolabeled end to the first "cut" site in each molecule. The fragments in the four reactions are electrophoresed side by side in denaturing acrylamide gels for size separation. To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands each corresponding to a radiolabeled DNA fragment, from which the sequence may be inferred.
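The way a sequence is inferred from the four chemical lanes (G, A+G, C, C+T) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are hypothetical, every position is assumed to give a clean band, and a fragment is taken to contain all bases that precede the cleaved base, so the first base of the strand leaves no readable band.

```python
def lanes_from_sequence(seq):
    """Idealized band lengths per lane for a 5'-labeled strand (illustrative only)."""
    lanes = {"G": set(), "A+G": set(), "C": set(), "C+T": set()}
    for index, base in enumerate(seq):
        if index == 0:
            continue  # cleavage at the first base leaves no labeled fragment
        if base == "G":
            lanes["G"].add(index)
            lanes["A+G"].add(index)
        elif base == "A":
            lanes["A+G"].add(index)
        elif base == "C":
            lanes["C"].add(index)
            lanes["C+T"].add(index)
        elif base == "T":
            lanes["C+T"].add(index)
    return lanes


def read_maxam_gilbert(lanes):
    """Read bases from the shortest fragment (bottom of the gel) to the longest."""
    lengths = sorted(set().union(*lanes.values()))
    calls = []
    for n in lengths:
        if n in lanes["G"] and n in lanes["A+G"]:
            calls.append("G")      # band in both purine lanes
        elif n in lanes["A+G"]:
            calls.append("A")      # band in A+G only
        elif n in lanes["C"] and n in lanes["C+T"]:
            calls.append("C")      # band in both pyrimidine lanes
        elif n in lanes["C+T"]:
            calls.append("T")      # band in C+T only
        else:
            calls.append("N")      # ambiguous pattern
    return "".join(calls)


print(read_maxam_gilbert(lanes_from_sequence("GATTCAGC")))
```

A band present in both the G and A+G lanes is read as G, while a band in A+G alone is read as A; the pyrimidine lanes are interpreted in the same way.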
2. Sanger-Coulson (dideoxy or enzymatic) sequencing
Although the end result is similar to that attained by the chemical method, the Sanger-Coulson procedure is totally different from that of Maxam and Gilbert. In this case a copy of the DNA to be sequenced is made by the Klenow fragment of DNA polymerase I. The template for this reaction is single-stranded (SS) DNA, and a primer must be used to provide the 3' terminus for DNA polymerase to begin synthesizing the copy (Fig. 3.9). The production of nested fragments is achieved by the incorporation of a modified dNTP in each reaction. These dNTPs lack a hydroxyl group at the 3' position of deoxyribose, which is necessary for chain elongation to proceed. Such modified dNTPs are known as dideoxynucleoside triphosphates (ddNTPs). The four ddNTPs (A, G, T, and C forms) are included in a series of four reactions, each of which contains the four normal dNTPs. The concentration of the dideoxy form is such that it will be incorporated into the growing DNA chain infrequently. Each reaction, therefore, produces a series of fragments terminating at a specific nucleotide, and the four reactions together provide a set of nested fragments. The DNA chain is labelled by including a radioactive dNTP in the reaction mixture. This is usually [α-35S]dATP, which enables more sequence to be read from a single gel than the 32P-labelled dNTPs that were used previously. The generation of fragments for dideoxy sequencing is more complicated than for chemical sequencing and usually involves sub-cloning into different vectors. Many plasmid vectors are now available and some types can be used directly for DNA sequencing experiments. Another method is to clone the DNA into a vector such as the bacteriophage M13, which produces single-stranded DNA during infection. This provides a suitable substrate for the sequencing reactions.
Figure 9.6 depicts the dideoxy form of the nucleotide, whose incorporation terminates the chain. The gel used to separate the newly synthesized DNA fragments usually contains a high concentration of urea (7 M) and is run at a high power level to heat the gel to about 70 °C. Both of these have denaturing effects on DNA fragments and help reduce secondary structure that could form in the single-stranded molecules and make them run anomalously through the gel.
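The logic of the four dideoxy reactions can be illustrated with a small Python sketch. It is an idealized model, not a description of any laboratory protocol: the function names are hypothetical, primer length is ignored, and a terminated chain is assumed to be recovered at every position of the new strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3to5):
    """New strand (written 5'->3') copied from a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def dideoxy_lanes(new_strand):
    """Fragment lengths expected in each of the four ddNTP reactions."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # A chain that incorporated dd<base> at this position has this length.
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the ladder from the shortest band (bottom) to the longest (top)."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "TACGGTCA"                      # hypothetical template, 3'->5'
lanes = dideoxy_lanes(synthesized_strand(template))
print(lanes)
print(read_gel(lanes))
```

Each lane holds only the chain lengths ending in its own dideoxynucleotide, and merging the four lanes by length recovers the sequence of the newly synthesized strand, which is complementary to the template.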
A straightforward way to increase the throughput of DNA sequencing would be to combine the four individual sequencing reactions (each containing a different ddNTP) into a single reaction that could be analyzed on a single lane of a gel. This is not possible using radioactivity since each band is distinguishable only by the position in which it runs on the gel. Therefore, combining all four lanes would merely result in a series of bands differing in size by a single base (Figure 9.8). However, if the terminal base of each DNA fragment can be identified specifically then, since each band on the gel is a different size, the DNA sequence can be unambiguously assigned from a single gel lane. A set of dideoxynucleotides has been developed that are labeled with fluorescent dyes precisely for this purpose. The dideoxynucleotide can still be incorporated into DNA opposite its complementary base, which again results in the termination of DNA synthesis. The dye structures attached to the dideoxynucleotide contain a fluorescein donor dye linked to a dichlororhodamine (dRhodamine) acceptor dye via an aminobenzoic acid linker and are called Big Dye terminators. An argon ion laser is able to excite the fluorescein donor dye that efficiently transfers the energy to one of the four acceptor dyes, each of which has a distinctive emission spectrum (Figure 9.9). Each dideoxynucleotide is labeled with a different acceptor dye so that DNA fragments ending in a different ddNTP will fluoresce at a different wavelength. Sequencing reactions can therefore be performed in a single tube (or single well of a microtitre dish)
and the products separated either on a single lane of a gel, or using a capillary tube containing a gel matrix. The intensity and wavelength of the fluorescent emission is measured as the DNA fragments move past a laser and fluorescence detector located at the bottom of the gel. This information is fed directly into a computer so that the resulting sequence can be automatically assigned and stored.
Sophisticated base calling software is available to convert the fluorescent patterns obtained into a sequence of DNA bases (Figure 9.10). Sequencing in this way has massive speed advantages over manual sequencing methods. As many as 1000 bases can be read automatically from a single reaction, although the sequence obtained from within 500 bp of the primer is generally more reliable than that further away. Additionally, the detection methods used during automated sequencing are far more reliable than sequence interpretation from an autoradiograph.
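The principle behind base calling from four dye channels can be sketched as follows. This is a deliberately naive illustration, not the algorithm of any real base caller: the function name, the channel order and the `min_ratio` threshold are assumptions made for the example, and each peak is assumed to have already been reduced to four intensity values.

```python
DYE_CHANNELS = ("A", "C", "G", "T")   # assumed order of the four intensity values


def call_bases(peaks, min_ratio=2.0):
    """Call one base per peak; emit 'N' when the strongest channel is not clearly dominant."""
    calls = []
    for intensities in peaks:
        ranked = sorted(zip(intensities, DYE_CHANNELS), reverse=True)
        (top, base), (second, _) = ranked[0], ranked[1]
        if top <= 0:
            calls.append("N")          # no signal at all
        elif second > 0 and top / second < min_ratio:
            calls.append("N")          # two dyes are too close to separate
        else:
            calls.append(base)
    return "".join(calls)


# Hypothetical intensities (A, C, G, T) for four successive peaks.
peaks = [(900, 40, 30, 20), (10, 850, 50, 40), (30, 20, 700, 600), (5, 10, 20, 800)]
print(call_bases(peaks))
```

The third peak is reported as N because its two strongest channels are nearly equal; real software additionally models peak shape, spacing and dye mobility.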
Figure 6.1. Polyacrylamide gel electrophoresis can resolve single-stranded DNA molecules that differ in length by just one nucleotide. The banding pattern is produced after separation of single-stranded DNA molecules by denaturing polyacrylamide gel electrophoresis. The molecules are labeled with a radioactive marker and the bands visualized by autoradiography. The bands gradually get closer together towards the top of the ladder. In practice, molecules up to about 1500 nucleotides in length can be separated if the electrophoresis is continued for long enough.
Figure 6.2. Chain termination DNA sequencing. (A) Chain termination sequencing involves the synthesis of new strands of DNA that are complementary to a single-stranded template. (B) Strand synthesis does not proceed indefinitely because the reaction mixture contains small amounts of a dideoxynucleotide, which blocks further elongation because it has a hydrogen atom rather than a hydroxyl group attached to its 3'-carbon. (C) Strand synthesis in the presence of ddATP results in chains that are terminated opposite Ts in the template. This 'A' family of terminated chains is loaded into one lane of a polyacrylamide gel, alongside the families of terminated chains from the T, G and C reactions. (D) In the methodology shown here, the banding pattern is visualized by autoradiography, the terminated chains having become radioactively labeled by inclusion of a labeled dNTP in the strand synthesis reactions. The sequence, shown on the right, is read by noting which lane each band lies in, starting at the bottom of the autoradiograph and moving band by band towards the top.
Negligible or zero 5'→3' exonuclease activity. Most DNA polymerases also have exonuclease activities, meaning that they can degrade DNA polynucleotides as well as synthesize them. A 5'→3' exonuclease activity enables the polymerase to remove a DNA strand that is already attached to the template. This is a disadvantage in DNA sequencing because removal of nucleotides from the 5' ends of the newly synthesized strands alters the lengths of these strands, making it impossible to read the sequence from the banding pattern in the polyacrylamide gel. Negligible or zero 3'→5' exonuclease activity is also desirable so that the polymerase does not remove the chain termination nucleotide once it has been incorporated. These are stringent requirements and are not entirely met by any naturally occurring DNA polymerase. Instead, artificially modified enzymes are generally used. The first of these to be developed was the Klenow polymerase, which is a version of Escherichia coli DNA polymerase I from which the 5'→3' exonuclease activity of the standard enzyme has been removed, either by cleaving away the relevant part of the protein or by genetic engineering. The Klenow polymerase has relatively low processivity, limiting the length of sequence that can be obtained from a single experiment to about 250 bp, and giving non-specific bands on the sequencing gel, these 'shadow' bands representing strands that have terminated naturally rather than by incorporation of a ddNTP. The Klenow enzyme was therefore superseded by a modified version of the DNA polymerase encoded by bacteriophage T7, this enzyme going under the trade name 'Sequenase'. Sequenase has high processivity and no exonuclease activity, and also possesses other desirable features such as rapid reaction rate and the ability to use many modified nucleotides as substrates.
automated techniques that we can hope to generate sequence data rapidly enough to complete a genome project in a reasonable length of time.
Figure 6.7. Automated DNA sequencing with fluorescently labeled dideoxynucleotides. (A) The chain termination reactions are carried out in a single tube, with each dideoxynucleotide labeled with a different fluorophore. In the automated sequencer, the bands in the electrophoresis gel move past a fluorescence detector, which identifies which dideoxynucleotide is present in each band. The information is passed to the imaging system. (B) The printout from an automated sequencer. The sequence is represented by a series of peaks, one for each nucleotide position. In this example, a green peak is an 'A', blue is 'C', black is 'G', and red is 'T'.
RECONSTRUCTION OF SEQUENCES (FRAGMENT ASSEMBLY):
DEFINITION OF COVERAGE:
The number of genes in the human genome was unknown, with estimates ranging from 50,000 to 90,000 (refs 1, 2), and to more than 140,000 according to unpublished sources. A procedure named Exofish, based on homology searches, was followed to identify human genes quickly and reliably. This method relies on the sequence of another vertebrate, the pufferfish Tetraodon nigroviridis, to detect conserved sequences with a very low background. Similar to Fugu rubripes, a marine pufferfish proposed by Brenner et al. (ref. 3) as a model for genomic studies, T. nigroviridis is a more practical alternative (ref. 4) with a genome also eight times more compact than that of human. Many comparisons have been made between F. rubripes and human DNA that demonstrate the potential of comparative genomics using the pufferfish genome (ref. 5). Application of Exofish to the December version of the working draft sequence of the human genome and to Unigene showed that the human genome contains 28,000-34,000 genes, and that Unigene contains less than 40% of the protein-coding fraction of the human genome.
We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. We estimate that 60,000 SNPs fall within exons (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.
DNA sequencing using reversible terminators (Nature 456, 53-59 (6 November 2008) | doi:10.1038/nature07517; Received 24 June 2008; Accepted 2 October 2008)
Fig. a, DNA fragments are generated, for example, by random shearing and joined to a pair of oligonucleotides in a forked adaptor configuration. The ligated products are amplified using two oligonucleotide primers, resulting in double-stranded blunt-ended material with a different adaptor sequence on either end. b, Formation of clonal single-molecule array. DNA fragments prepared as in a are denatured and single strands are annealed to complementary oligonucleotides on the flow-cell surface (hatched). A new strand (dotted) is copied from the original strand in an extension reaction that is primed from the 3' end of the surface-bound oligonucleotide; the original strand is then removed by denaturation. The adaptor sequence at the 3' end of each copied strand is annealed to a new surface-bound complementary oligonucleotide, forming a bridge and generating a new site for synthesis of a second strand (dotted). Multiple cycles of annealing, extension and denaturation in isothermal conditions result in growth of clusters, each ~1 µm in physical diameter. This follows the basic method outlined in ref. 33. c, The DNA in each cluster is linearized by cleavage within one adaptor sequence (gap marked by an asterisk) and denatured, generating single-stranded template for sequencing by synthesis to obtain a sequence read (read 1; the sequencing product is dotted). To perform paired-read sequencing, the products of read 1 are removed by denaturation, the template is used to generate a bridge, the second strand is re-synthesized (shown dotted), and the opposite strand is then cleaved (gap marked by an asterisk) to provide the template for the second read (read 2). d, Long-range paired-end sample preparation. To sequence the ends of a long (for example, >1 kb) DNA fragment, the ends of each fragment are tagged by incorporation of biotinylated (B) nucleotide and then circularized, forming a junction between the two ends. Circularized DNA is randomly fragmented and the biotinylated junction fragments are recovered and used as starting material in the standard sample preparation procedure illustrated in a. The orientation of the sequence reads relative to the DNA fragment is shown (magenta arrows). When aligned to the reference sequence, these reads are oriented with their 5' ends towards each other (in contrast to the short insert paired reads produced as shown in a-c). See Supplementary Fig. 17a for examples of both. Turquoise and blue lines represent oligonucleotides and red lines represent genomic DNA. All surface-bound oligonucleotides are attached to the flow cell by their 5' ends. Dotted lines indicate newly synthesized strands during cluster formation or sequencing. (See Supplementary Methods for details.)
High-throughput sequencing
The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once.[20][21] High-throughput sequencing technologies are intended to lower the cost of DNA sequencing beyond what is possible with standard dye-terminator methods.[22]
The first of the "next-generation" sequencing technologies, MPSS, was developed in the 1990s at Lynx Therapeutics, a company founded in 1992 by Sidney Brenner and Sam Eletr. MPSS was a bead-based method that used a complex approach of adapter ligation followed by adapter decoding, reading the sequence in increments of four nucleotides; this made it susceptible to sequence-specific bias or loss of specific sequences. Because the technology was so complex, MPSS was only performed 'in-house' by Lynx Therapeutics and no machines were sold; when the merger with Solexa later led to the development of sequencing-by-synthesis, a simpler approach with numerous advantages, MPSS became obsolete. However, the essential properties of the MPSS output were typical of later "next-gen" data types, including hundreds of thousands of short DNA sequences. In the case of MPSS, these were typically used for sequencing cDNA for measurements of gene expression levels. Lynx Therapeutics merged with Solexa in 2004, and this company was later purchased by Illumina.[23]
Polony Sequencing
Polony sequencing, developed in George Church's lab at Harvard, was among the first next-generation sequencing systems used to sequence a full genome in 2005. It combined an in vitro paired-tag library with emulsion PCR, an automated microscope, and ligation-based sequencing chemistry to sequence an E. coli genome at an accuracy of > 99.9999% and a cost approximately 1/10th that of Sanger sequencing. The technology was licensed to Agencourt Biosciences, subsequently spun out into Agencourt Personal Genomics, and ultimately incorporated into the Applied Biosystems SOLiD platform.
Pyrosequencing
A parallelized version of pyrosequencing was developed by 454 Life Sciences. The method amplifies DNA inside water droplets in an oil solution (emulsion PCR), with each droplet containing a single DNA template attached to a single primer-coated bead that then forms a clonal colony. The sequencing machine contains many picolitre-volume wells each containing a single bead and sequencing enzymes. Pyrosequencing uses luciferase to generate light for detection of the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence read-outs.[16] This technology provides intermediate read length and price per base compared to Sanger sequencing on one end and Solexa and SOLiD on the other. [24] 454 Life Sciences has since been acquired by Roche Diagnostics.
SOLiD sequencing
Applied Biosystems' SOLiD technology employs sequencing by ligation. Here, a pool of all possible oligonucleotides of a fixed length is labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by DNA ligase for matching sequences results in a signal informative of the nucleotide at that position. Before sequencing, the DNA is amplified by emulsion PCR. The resulting beads, each containing only copies of the same DNA molecule, are deposited on a glass slide.[26] The result is sequences of quantities and lengths comparable to Illumina sequencing.[24]
Future methods
Sequencing by hybridization is a non-enzymatic method that uses a DNA microarray. A single pool of DNA whose sequence is to be determined is fluorescently labeled and hybridized to an array containing known sequences. Strong hybridization signals from a given spot on the array identify its sequence in the DNA being sequenced.[27] Mass spectrometry may be used to determine mass differences between DNA fragments produced in chain-termination reactions.[28]
DNA sequencing methods currently under development include labeling the DNA polymerase,[29] reading the sequence as a DNA strand transits through nanopores,[30][31] and microscopy-based techniques, such as AFM or electron microscopy, that are used to identify the positions of individual nucleotides within long DNA fragments (>5,000 bp) by nucleotide labeling with heavier elements (e.g., halogens) for visual detection and recording.[32][33] In microfluidic Sanger sequencing the entire thermocycling amplification of DNA fragments as well as their separation by electrophoresis is done on a single glass wafer (approximately 10 cm in diameter), thus reducing the reagent usage as well as cost. In some instances researchers have shown that they can increase the throughput of conventional sequencing through the use of microchips. Further research is still needed to make this use of the technology effective. In October 2006, the X Prize Foundation established an initiative to promote the development of full genome sequencing technologies, called the Archon X Prize, intending to award $10 million to "the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 (US) per genome."[34]
Each year NHGRI promotes grants for new research and developments in genomics. 2010 grants and 2011 candidates include continuing work in microfluidic, polony and base-heavy sequencing methodologies [35]
1953 Discovery of the structure of the DNA double helix.[36]
1972 Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA.
1977 The first complete DNA genome to be sequenced is that of bacteriophage φX174.[37]
1977 Allan Maxam and Walter Gilbert publish "DNA sequencing by chemical degradation".[5] Frederick Sanger, independently, publishes "DNA sequencing with chain-terminating inhibitors".[38]
1984 Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb.
1986 Leroy E. Hood's laboratory at the California Institute of Technology and Smith announce the first semi-automated DNA sequencing machine.
1987 Applied Biosystems markets first automated sequencing machine, the model ABI 370.
1990 The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on Mycoplasma capricolum, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae (at US$0.75/base).
1991 Sequencing of human expressed sequence tags begins in Craig Venter's lab, an attempt to capture the coding fraction of the human genome.[39]
1995 Craig Venter, Hamilton Smith, and colleagues at The Institute for Genomic Research (TIGR) publish the first complete genome of a free-living organism, the bacterium Haemophilus influenzae. The circular chromosome contains 1,830,137 bases and its publication in the journal Science[40] marks the first use of whole-genome shotgun sequencing, eliminating the need for initial mapping efforts.
1996 Pål Nyrén and his student Mostafa Ronaghi at the Royal Institute of Technology in Stockholm publish their method of pyrosequencing.[41]
1998 Phil Green and Brent Ewing of the University of Washington publish phred for sequencer data analysis.[42]
2000 Lynx Therapeutics publishes and markets "MPSS", a parallelized, adapter/ligation-mediated, bead-based sequencing technology, launching "next-generation" sequencing.[43]
2001 A draft sequence of the human genome is published.[44][45]
2004 454 Life Sciences markets a parallelized version of pyrosequencing.[46][47] The first version of their machine reduced sequencing costs 6-fold compared to automated Sanger sequencing, and was the second of a new generation of sequencing technologies, after MPSS.[24]
In Silico Biology : Bioinformatics and data mining Journal CABIOS (Computer applications in Biosciences) -1984
Sequencing: construction of PHYSICAL MAPS (the physical position of each clone on the chromosome, based on physical distance). [A linkage or chromosomal map is made from recombination frequency.]
Methods of whole-genome sequencing (started during the late 1980s):
1. Whole-genome shotgun (WGS) sequencing
Bacterium Haemophilus influenzae (1995): 140 sequence contigs, each of 2-20 overlapping clones and representing different non-overlapping portions of the genome.
A contig is a set of contiguous overlapping clones, each contig having from two to more than 25 clones; a singleton is a clone not incorporated into any contig.
The International Human Genome Project (HGP, headed by Francis Collins of the NIH; a federal, multi-state, multi-institution effort; IHGSC = International Human Genome Sequencing Consortium) and the private genomics company Celera Genomics (Craig Venter; Maryland, USA) released the rough draft of the human genome sequence during Feb. 2001.
Genomes of the following eukaryotes were sequenced:
Yeast (Saccharomyces cerevisiae)
Nematode (Caenorhabditis elegans)
Fruit fly (Drosophila melanogaster)
Higher plant (Arabidopsis thaliana)
Fission yeast (Schizosaccharomyces pombe), 2002
Later, the mouse and human genomes
Rough drafts of the rice genome by M/s Monsanto and M/s Syngenta, and also by the Chinese
Assembly packages for large-scale genome sequencing:
Shotgun sequences generated by automatic sequencers are assembled into contigs.
Contig: a set of contiguous overlapping clones.
PHRED: base-calling software with quality identification. It processes reads from the ABI 373, ABI 377 and ABI 3700, and documents sequences in PHRED files or FASTA format.
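The quality values attached to each called base are conventionally defined as Q = -10 log10(P), where P is the estimated probability that the call is wrong. The short Python sketch below shows that relation and how a run of high-quality bases might be picked out of a read; the function names and the cut-off of 20 are assumptions chosen for illustration, not settings of PHRED itself.

```python
import math


def phred_quality(error_probability):
    """Quality value for a given probability that the base call is wrong."""
    return -10.0 * math.log10(error_probability)


def error_probability(quality):
    """Inverse relation: probability of error for a given quality value."""
    return 10.0 ** (-quality / 10.0)


def high_quality_span(qualities, threshold=20):
    """(start, end) of the longest run of bases with quality >= threshold (end exclusive)."""
    best = (0, 0)
    start = None
    # A sentinel value below the threshold closes a run that reaches the end of the read.
    for i, q in enumerate(list(qualities) + [threshold - 1]):
        if q >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


print(phred_quality(0.001))                                   # 1 error in 1000 calls
print(high_quality_span([5, 12, 30, 35, 40, 8, 25, 28]))
```

On this scale a quality of 20 corresponds to one expected error in 100 calls and a quality of 30 to one in 1000.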
PHRAP (fragment assembly program, or Phil's Revised Assembly Program): for assembly of shotgun DNA sequence data. It builds each contig sequence as a mosaic of the high-quality parts of the reads and removes consensus sequence with low quality values.
DEMIGLACE: for identifying polymorphisms, including SNPs.
Consed: for viewing and editing genome sequence (sequence editor-viewer).
Seqman and Sequencher: for small-scale sequencing projects.
EST clustering packages: ESTs of > 250 species are in dbEST. EST = expressed sequence tag (a short sequence of an expressed gene).
CORPUS (Contig Semantics) refinement performed using
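How overlapping shotgun reads are merged into contigs, with unmerged reads left as singletons, can be illustrated with a toy greedy assembler in Python. This is a teaching sketch only: it is not how PHRAP works internally, the function names and the `min_len` parameter are hypothetical, and it assumes error-free reads from one strand with no read wholly contained in another.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))        # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs                      # nothing left to merge
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ATGCGT", "GCGTAC", "TACGGA", "TTTTCC"]   # hypothetical reads
print(sorted(greedy_assemble(reads)))
```

The first three reads collapse into a single contig, while the last read shares no sufficient overlap and remains a singleton.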
2. Clone-by-clone sequencing
Between 10,000 and 20,000 BACs were selected to generate a working draft of the human genome. A minimum tiling path (found with suitable algorithms [software]) is used so that as few BAC clones as possible cover the entire genome. BAC clones are used for subcloning of 500-800 bp fragments into cosmid or plasmid vectors. These are sequenced randomly. All parts of the genome are sequenced 4-5 times so that no part is left out. In the case of WGS sequencing, 8-10-fold sequencing is required for similar efficiency.
2.1. Construction of whole-genome BAC map (BAC-by-BAC approach)
Maps having landmarks based on molecular markers like RFLPs, STSs, SSRs and AFLPs are now available, allowing construction of whole-genome physical maps of ordered BACs. The steps involved are:
(i) Fingerprinting of BAC clones having 10-15 times coverage of the genome, e.g. 300,000 BACs (each 150 kb) = 45 billion bp (15-fold coverage of 3 billion bp).
BACs are arranged as contigs and singletons with software like FPC. Physical mapping on the chromosome is done by chromosome walking, FISH, deletion stocks and radiation hybrids. [In radiation hybrid mapping, human chromosomes are separated from one another and broken into several fragments using high doses of X rays. Similar to the underlying principle of mapping genes by linkage analysis based on recombination events, the farther apart two DNA markers are on a chromosome, the more likely a given dose of X rays will break the chromosome between them and thus place the two markers on two different chromosomal fragments. The order of markers on a chromosome can be determined by estimating the frequency of breakage that, in turn, depends on the distance between the markers. This technique has been used to construct whole-genome radiation hybrid maps.] {Technique: A rodent-human somatic cell hybrid ("artificial" cells with both rodent and human genetic material), which contains a single copy of the human chromosome of interest, is X-irradiated. This breaks the chromosome into several pieces, which are subsequently integrated into the rodent chromosomes. In addition, the dosage of radiation is sufficient to kill the somatic cell hybrid or donor cells, which are then rescued by fusing them with non-irradiated rodent recipient cells. The latter, however, lack an important enzyme and are also killed when grown in a specific medium. Therefore, the only cells that can survive the procedure are donor-recipient hybrids that have acquired a rodent gene for the essential enzyme from the irradiated rodent-human cell line.}
(ii) Contigs are joined by examining their extreme ends, using a gap-filling approach.
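The idea behind radiation hybrid mapping, that markers lying far apart are separated by X-ray breaks more often than close ones, can be illustrated with a toy calculation in Python. This is a rough sketch under stated assumptions: the retention data are invented, the function names are hypothetical, and the fraction of hybrid lines retaining only one of two markers is used as a crude stand-in for breakage frequency, whereas real analyses use statistical models that also account for fragment retention rates.

```python
from itertools import permutations


def breakage_proxy(retention, m1, m2):
    """Fraction of hybrid lines that retain exactly one of the two markers."""
    a, b = retention[m1], retention[m2]
    discordant = sum(1 for x, y in zip(a, b) if x != y)
    return discordant / len(a)


def order_markers(retention):
    """Marker order that minimizes the summed proxy between neighbours (brute force)."""
    best_order, best_cost = None, None
    for perm in permutations(sorted(retention)):
        if perm[0] > perm[-1]:
            continue                      # a map and its mirror image are equivalent
        cost = sum(breakage_proxy(retention, x, y) for x, y in zip(perm, perm[1:]))
        if best_cost is None or cost < best_cost:
            best_order, best_cost = perm, cost
    return best_order


# Hypothetical data: 1 = marker detected in that hybrid cell line, 0 = absent.
retention = {
    "A": [1, 1, 0, 0, 1, 0, 1, 0],
    "B": [1, 1, 0, 0, 1, 0, 0, 1],
    "C": [1, 0, 0, 1, 1, 0, 0, 1],
    "D": [1, 0, 0, 1, 0, 1, 0, 1],
}
print(breakage_proxy(retention, "A", "B"), breakage_proxy(retention, "A", "D"))
print(order_markers(retention))
```

Markers that are rarely separated are placed next to each other, so the order with the smallest total between neighbours is taken as the most likely map; the brute-force search is practical only for a handful of markers.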
2.2. Building of clone contigs (overlapping series of cloned DNA fragments)
(i) Hybridization approach: The first clone is selected by hybridization with mapped DNA markers. This is followed by CHROMOSOME WALKING on to the next clone, whose insert overlaps with that of the previous clone: the clone in question is used as a probe to screen the genomic library. The problem with this method is the presence of repetitive DNA, which gives non-specific hybridization. This can partly be reduced by prehybridization with an excess of genomic DNA. Subcloning the end of the clone and using it as a probe also eliminates the non-specific hybridization due to repetitive DNA.
(ii) PCR approach for building contigs: The end of a clone is sequenced, and a PCR primer pair designed from that sequence is tested on all other clones, so that overlapping clones can be identified. This process is then repeated. It can be sped up by combinatorial screening and by STS (sequence-tagged site) content mapping for overlapping YAC or BAC clones. With combinatorial screening, two BAC clones giving same-sized PCR products with the same STS primers are assumed to overlap. Combinatorial screening significantly reduces the number of PCR reactions required.
[An STS is a short unique sequence that identifies one or more specific loci and can be amplified through PCR. Each STS has a pair of PCR primers, which are designed by partial sequencing of an RFLP probe representing a mapped low-copy-number DNA sequence. A sequence-tagged site (or STS) is a short (200 to 500 base pair) DNA sequence that has a single occurrence in the genome and whose location and base sequence are known. STSs can be easily detected by the polymerase chain reaction (PCR) using specific primers. For this reason they are useful for constructing genetic and physical maps from sequence data reported from many different laboratories. They serve as landmarks on the developing physical map of a genome.
When STS loci contain genetic polymorphisms (e.g. simple sequence length polymorphisms, SSLPs, or single nucleotide polymorphisms), they become valuable genetic markers, i.e. loci which can be used to distinguish individuals. They are used in shotgun sequencing, specifically to aid sequence assembly. The STS concept was introduced by Olson et al. (1989). In assessing the likely impact of the polymerase chain reaction (PCR) on human genome research, they recognized that single-copy DNA sequences of known map location could serve as markers for genetic and physical mapping of genes along the chromosome. The advantage of STSs over other mapping landmarks is that the means of testing for the presence of a particular STS can be completely described as information in a database: anyone who wishes to make copies of the marker would simply look up the STS in the database, synthesize the specified primers, and run the PCR under specified conditions to amplify the STS from genomic DNA. In most cases STS markers are co-dominant, i.e., they allow heterozygotes to be distinguished from the two homozygotes. The DNA sequence of an STS may contain repetitive elements, sequences that appear elsewhere in the genome, but as long as the sequences at both ends of the site are unique and conserved, researchers can uniquely identify this portion of the genome using tools usually present in any laboratory. Thus, in a broad sense, STSs include such markers as microsatellites (SSRs, STMS or SSRPs), SCARs, CAPS, and ISSRs.]
(iii) Clone fingerprinting: Restriction fragments from all clones are electrophoresed and the banding patterns compared. Similar-sized fragments indicate overlaps, so overlaps are inferred from the banding patterns. This is suitable even for sequencing over long distances such as a whole genome (unlike the earlier two methods). Clone fingerprinting includes (a) restriction fingerprinting, (b) repetitive DNA probing, (c) repetitive DNA PCR fingerprinting and (d) STS content mapping.
(iv) Directed shotgun approach: This method sequences 10 times the genome size and covers 99.8% of the genome, leaving only a few gaps (which can be closed as in H. influenzae). For human, 70 million individual clones (each 500 bp) will give 35,000 Mb of sequence, which is 10 times the genome size of 3,500 Mb. The MegaBACE 4000 sequencer (2001) can sequence 1-2 Mb/day; 100 such machines can complete the human sequencing in about 1 year. With the ABI 3700 machine of 1998, 70 machines (each handling 1000 clones = 0.5 Mb/day) would take about 3 years.
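The coverage and throughput arithmetic used in such estimates can be checked with a few lines of Python. The sketch assumes 1 Mb = 1,000,000 bp, perfectly uniform machine output and no failed reads; the function names are hypothetical, and the input figures are illustrative values of the kind quoted in the text.

```python
MB = 1_000_000  # base pairs per megabase


def fold_coverage(n_clones, clone_length_bp, genome_bp):
    """Total sequenced bases divided by genome size."""
    return n_clones * clone_length_bp / genome_bp


def days_to_sequence(total_bp, machines, bp_per_machine_per_day):
    """Days needed if every machine delivers its daily output without interruption."""
    return total_bp / (machines * bp_per_machine_per_day)


genome = 3_500 * MB
total = 70_000_000 * 500                                   # 70 million clones of 500 bp
print(total / MB, "Mb of raw sequence")                     # 35,000 Mb
print(fold_coverage(70_000_000, 500, genome), "fold coverage")
print(days_to_sequence(total, 100, 1 * MB), "days with 100 machines at 1 Mb/day")
print(days_to_sequence(total, 70, MB // 2), "days with 70 machines at 0.5 Mb/day")
```

Under these assumptions, 100 machines at 1 Mb/day need roughly a year, and 70 machines at 0.5 Mb/day need about 1000 days, that is, close to three years.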
Electronic PCR (e-PCR) : Bridging the gap between mapping and sequencing of genome
From genome sequence to function (annotation): integrative biology
Methods of annotation of genome sequences
Annotation by sequence search
Loss of function by mutation approach
i. Insertion mutagenesis