Finishing the euchromatic sequence of the human genome

Collins F.S., Lander E.S., Rogers J., Waterston R.H.
Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers 99% of the euchromatic genome and is accurate to an error rate of 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
Index Keywords: Biomedical engineering; Genes; Physiology; Proteins; Research; Biological analysis; Euchromatic sequence; Genome sequence; Nucleotides; Genetic engineering; DNA; genome; accuracy; analytic method; article; birth; death; euchromatin; evolution; gene duplication; gene number; gene sequence; human; human cell; human genome; international cooperation; medical research; nucleotide sequence; priority journal; process design; protein analysis; teamwork; vertebrate; Amino Acid Sequence; Base Sequence; Centromere; Chromosomes, Artificial, Bacterial; Chromosomes, Human; DNA, Complementary; Euchromatin; Gene Duplication; Genes; Genome, Human; Heterochromatin; Human Genome Project; Humans; Molecular Sequence Data; Multigene Family; Physical Chromosome Mapping; Plasmids; Pseudogenes; Research Design; Sensitivity and Specificity; Sequence Analysis, DNA; Telomere; Vertebrata
Year: 2004
Source title: Nature
Volume: 431
Issue: 7011
Page: 931-945
Cited by: 1209
Document Type: Article
Source: Scopus
Authors with affiliations:
1. Collins, F.S. 2. Lander, E.S. 3. Rogers, J.

