Finishing the euchromatic sequence of the human genome

Collins F.S., Lander E.S., Rogers J., Waterson R.H.
Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers 99% of the euchromatic genome and is accurate to an error rate of 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The nearcomplete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human enome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead. Index Keywords: Biomedical engineering; Genes; Physiology; Proteins; Research; Biological analysis; Euchromatic sequence; Genome sequence; Nucleotides; Genetic engineering; DNA; genome; accuracy; analytic method; article; birth; death; euchromatin; evolution; gene duplication; gene number; gene sequence; human; human cell; human genome; international cooperation; medical research; nucleotide sequence; priority journal; process design; protein analysis; teamwork; vertebrate; Amino Acid Sequence; Base Sequence; Centromere; Chromosomes, Artificial, Bacterial; Chromosomes, Human; DNA, Complementary; Euchromatin; Gene Duplication; Genes; Genome, Human; Heterochromatin; Human Genome Project; Humans; Molecular Sequence Data; Multigene Family; Physical Chromosome Mapping; Plasmids; Pseudogenes; Research Design; Sensitivity and Specificity; Sequence Analysis, DNA; Telomere; Vertebrata Year: 2004 Source title: Nature Volume: 431 Issue: 7011 Page : 931-945 Cited by: 1209 Link: Scorpus Link Document Type: Article Source: Scopus Authors with affiliations:
1. Collins, F.S. 2. Lander, E.S. 3. Rogers, J.

Initial sequencing and analysis of the human genome (2001) Nature.. elegans: A platform for investigating biology (1998) Science. 380.. P. 258. 522-528 27. J. 423. W. pp. 744-746 7. 0079. pp. The DNA sequence and comparative analysis of human chromosome 10 (2004) Nature. pp... 429. A. The DNA sequence of human chromosome 21 (2000) Nature. pp.. pp. 428.000 human genes (1998) Science. 6. 291. 405. 2012-2018 14. A. 408.. 825-837 24.1-0079. pp.. 743-750 10. pp.. Mungall.D.J. J. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana (2000) Nature. A comprehensive genetic linkage map of the human genome (1992) Science. pp. J. A comprehensive genetic map of the mouse genome (1996) Nature... M. 865-871 22. Dunham. NCBI Reference Sequence Project: Update and current status (2003) Nucleic Acids Res.264 microsatellites (1996) Nature. A comprehensive genetic map of the human genome based on 5. T. A physical map of the human genome (2001) Nature. 152-154 5. 2049-2054 Dib... 934-941 Dietrich.J. 157-164 25. The DNA sequence and comparative analysis of human chromosome 20 (2001) Nature. Deloukas. 414. pp. 246-339 Murray. 149-152 9.. D. Guyer. Hattori.E. pp... 67-86 Gyapay. L. pp. Blattner. S. 418. 3. pp.. 546-567 13. Schmutz. 489-495 18. The DNA sequence and analysis of human chromosome 14 (2003) Nature. 365-368 33. J. pp. Generation and initial analysis of more than 15. 277. 601-607 23. 31.D. 270. G.. 429. 282. 2185-2195 15.F.R. Hudson. 796-815 19. Assessing the quality of the DNA sequence from the Human Genome Project (1999) Genome Res. Pruitt. 409.. I. Hillier. 369-374 29. 9. F. The DNA sequence of human chromosome 7 (2003) Nature. pp.. Peterson. 265..14 20..000 full-length human and mouse cDNA sequences (2002) . Schloss. T. Venter. M. 1453-1457 12.. J. R. A physical map of 30. C.. The sequence of the human genome (2001) Science. 274.H. pp. 79.C. pp. pp. 431. Tatusova.D.. The genome sequence of Drosophila melanogaster (2000) Science. The DNA sequence and comparative analysis of human chromosome 5 (2004) Nature. pp. 1304-1351 17. 529-535 28. pp.. Maglott. A. 4. Finishing a whole-genome shotgun: Release 3 of the Drosophila melanogaster euchromatic genome sequence (2002) Genome Biol.. Genome sequence of the nematode C.. 311-319 21. pp. Adams. S.. pp. pp.. 7. A comprehensive human linkage map with centimorgan density (1994) Science. 1-4 32. Gregory. pp.. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd (1995) Science.. 496-512 11.000 genes (1996) Science. Heilig. The DNA sequence of human chromosome 22 (1999) Nature. Dunham.. pp. Strausberg. The male-specific regions of the human Y chromosome is a mosaic of discrete sequence classes (2003) Nature. Deloukas. 375-381 30. Quality assessment of the human genome sequence (2004) Nature. R. J. Feisenfeld. 421. H. pp. 34-37 34. 380. The DNA sequence and analysis of human chromosome 13 (2004) Nature.4. The complete genome sequence of Escherichia coli K-12 (1997) Science. R. 401. pp. Grimwood. pp. Schmutz. 282. pp. R. Fleischmann. pp. DNA sequence and analysis of human chromosome 9 (2004) Nature. An STS-based map of the human genome (1995) Science. 425. A. pp. 268-274 31. Waterson. Goffeau. Celniker. 2. The DNA sequence and analysis of human chromosome 6 (2003) Nature.C. 805-811 26. 409. pp. J.J. pp. 269. 429.. 424. The DNA sequence and biology of human chromosome 19 (2004) Nature. P.W. 8. Life with 6. Humphray. A physical map of the mouse genome (2002) Nature.R. References: 1. 1945-1954 Deloukas.. S. 287.L. pp. The 1993-94 Genethon human genetic linkage map (1994) Nature Genet... Skaletsky.G. 428... P. 860-921 16. K. M.

. She. 27-36 48. D. Bailey. Re-evaluating human gene annotation: A second-generation analysis of chromosome 22 (2003) Genome Res. E. 235-238 52.. in the press 36. 16. 309-313 39. Suyama. Large-scale transcriptional activity in chromosomes 21 and 22 (2002) Science..A. Sequencing and comparison of yeast species to identify genes and regulatory elements (2003) Nature. 13. Endrizzi.K. X. J. 2559-2567 57. Burke. Storz. 18. Willard.. Padlock probes reveal single-nucleotide differences..F. pp. pp. H.. A genome-wide survey of human peudo genes (2003) Genome Res. Eichler.. Locke. Birren. 1260-1263 54.. pp. G.. Cloning human telomeric DNA fragments into Saccharomyces cerevisiae using a yeast artificial chromosome vector (1989) Proc.. Rudd.. E.. Acad. Analysis of human mRNAs with the reference genome sequence reveals potential errors. E.. 345-354 38. Nilsson. Eichler. mechanism. 5. Mol. Olson..E. Sci.P. An expanding universe of noncoding RNAs (2002) Science. pp. 301. biogenesis. J. pp. B. D.E.. Genet... M. 14.F. 789-801 45. Eichler...S. An assessment of the sequence gaps: Unfinished business in a finished human genome (2004) Nature Rev. 493-506 46.A. J. 514-519 44.. J. P.. pp... J. M. X.. pp. Lai. polymorphisms. M. Cliften. 23.E.... pp. She. Rocchi. and RNA editing Genome Res. Clark. 1175-1186 49.. Zdobnov. R. E. Lander. P. J..A.. Patterson. Genet.. Bailey..A.A... Horvath. Bork.. pp.. Torrents.C... Erdmann. M. Genome architecture. Acad.. E. M.. Z. 99. Johnson.M... Eichler. Genet. Positive selection of a novel gene family during the emergence of humans and great apes (2001) Nature. H. 281-297 53..P.. J.. . 252-255 42.. in the press 41. Cliften. 11.S. Ruvolo. N. M.. pp. J.. Stankiewicz. and function (2004) Cell. 86. Analysis of the centromeric regions of the human genome assembly Trends Genet. Natl. Collins. USA.. 25. Church. 31. P. Maston. E. Furey.. P.. D.K.T. 296. Moyzis. pp. pp. MicroRNAs: Genomics. Ventura... Szymanski. H.R. pp. pp.... Riethman.. The structure and evolution of centromeric transition regions within the human genome (2004) Nature. 10. M. Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence (2000) Nature Genet. M. Surveying Saccharomyces genomes to identify functional elements by comparative DNA sequence analysis (2001) Genome Res. Finding functional features in Saccharomyces genomes by phylogenetic footprinting (2003) Science. 430. V. 241-254 51. M. P. Bartel. Tuzun. D. Natl. pp. pp. Meyne. USA.E. 16899-16903 35.Proc... T.E. G. pp. Sci. R. 7176 50. Kellis. pp. Bailey.E. 916-919 56. 2215-2223 47.. 429-431 55.V. Lessons from the human genome: Transitions between euchromatin and heterochromatin (2001) Hum. parent of origin and in situ distribution of centromeric sequences in human chromosomes 13 and 21 (1997) Nature Genet. pp. pp.. E. pp.E. 423. 116. D. 14. M.A. .. 13. Kapranov. rearrangements and genomic disorders (2002) Trends Genet. 6240-6244 37. Lupski. 296. Roest Crollius.. Analysis of segmental duplications and genome assembly in the mouse (2004) Genome Res. A shotgun optical map of the entire Plasmodium falciparum genome (1999) Nature. 413.. Chorionic gonadotropin has a recent origin within primates and an evolutionary history of . 857864 40. Barciszewski. Recent segmental duplications in the working draft assembly of the brown Norway Rat (2004) Genome Res. M. 7482 43. Noncoding regulatory RNAs database (2003) Nucleic Acids Res.

Scanlan... 291-304 63. J.-T. S... 9 (1974) Ann. Guttmacher.M. 17. Chen.S. Furguson-Smith. J. 19. C. pp.O.. Glusman. A. Guyer. J. pp.. Genet.A. Gure. Desmarais. Lancet..... E.E. C. Genet.. 83-100 67. B. Lin. 7474-7476 64. Collins. The complete human olfactory subgenome (2001) Genome Res... Acad. Human-specific duplication and mosaic transcripts: The recent paralogous structure of chromosome 22 (2002) Am. Glass. 8. pp. Genome duplications and other features in 12 Mb of DNA sequence from human chromosome 16p and 16q (1999) Genomics. 11...... B. 88.J.S..selection (2002) Mol.B. Chen. Sci.. R. Bailey.. 11. 60. E.. 188.C. 295-308 68.I.. Automated finishing with autofinish (2001) Genome Res. pp. 1003-1007 66. Madan... Bobrow. Morton. pp. Bailey. Sequencing multimegabase-template DNA with BigDye terminator chemistry (1998) Genome Res. Green. K.A. Natl. J. Rev. M. McMurray. pp.E. 22-32 59. 8. 835-847 62. D. I.D. pp. 81-86 65. Genet. Structural variation in chromosome no. A.E. 685702 60. A vision for the future of genomics research (2003) Nature.A... Quail.. 614-625 69. A. 422. Heiner. USA. USA.R. pp.. pp. S.. Whole-genome shotgun assembly and comparison of human genome assemblies (2004) Proc. M.A. Parameters of the human genome (1991) Proc.. pp. Cancer/testis antigens: An expanding family of targets for cancer immunotherapy (2002) Immunol. Lee. M. Hum. K.L. F.. P. Acad. 557-561 . 320-355 58. Evol. pp. 297.A. Yanai. Chen. Natl. Fisher. D. pp.. pp. Istrail. I. Hunkapiller.J.. M. Sulston.J. 70.. Human centromeric DNAs (1997) Hum. Jungbluth. A. 101. Rubin.. Gordon. pp. J... Old. Weverick. Biol. Recent segmental duplications in the human genome (2002) Science. Initial sequencing and comparative analysis of the mouse genome (2002) Nature.. pp. 100. C. 562-566 71. G. N.. C.Y. Y. Short-insert libraries as a method of problem solving in genome sequencing (1998) Genome Res. 520-562 61. Sci. 1916-1921 70... M. Loftus..A. Green. 420. L...

Sign up to vote on this title
UsefulNot useful