Finishing the euchromatic sequence of the human genome

Collins F.S., Lander E.S., Rogers J., Waterston R.H.
Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers 99% of the euchromatic genome and is accurate to an error rate of 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
Index Keywords: Biomedical engineering; Genes; Physiology; Proteins; Research; Biological analysis; Euchromatic sequence; Genome sequence; Nucleotides; Genetic engineering; DNA; genome; accuracy; analytic method; article; birth; death; euchromatin; evolution; gene duplication; gene number; gene sequence; human; human cell; human genome; international cooperation; medical research; nucleotide sequence; priority journal; process design; protein analysis; teamwork; vertebrate; Amino Acid Sequence; Base Sequence; Centromere; Chromosomes, Artificial, Bacterial; Chromosomes, Human; DNA, Complementary; Euchromatin; Gene Duplication; Genes; Genome, Human; Heterochromatin; Human Genome Project; Humans; Molecular Sequence Data; Multigene Family; Physical Chromosome Mapping; Plasmids; Pseudogenes; Research Design; Sensitivity and Specificity; Sequence Analysis, DNA; Telomere; Vertebrata
Year: 2004
Source title: Nature
Volume: 431
Issue: 7011
Page: 931-945
Cited by: 1209
Document Type: Article
Source: Scopus
Authors with affiliations:
1. Collins, F.S.
2. Lander, E.S.
3. Rogers, J.

