2. Despite the steric protection of the carbene carbon atom, 2c,d undergo coupling reactions with tert-butyl isocyanide at room temperature and afford the corresponding ketenimines 4c,d in good yields (Scheme 3). Because this reaction, which is typical of transient singlet carbenes (27), is not observed for push-push carbenes II, we concluded that the isocyanide acts here as a Lewis base toward carbenes 2. This result demonstrates that, in contrast with II, the vacant carbene orbital of 2 remains accessible.

Up to now, the number and variety of stable carbenes have been limited by the perceived necessity for two strongly interacting substituents. Despite this perceived limitation, these species have found applications (9, 28–33) even on a large scale. This work establishes that only a single electron-active substituent is necessary to isolate a carbene. Therefore, a broad range of these species will soon be readily available, which will open the way for new synthetic developments and applications in various fields.

References and Notes
1. J. B. Dumas, E. Peligot, Ann. Chim. Phys. 58, 5 (1835).
2. E. Buchner, T. Curtius, Ber. Dtsch. Chem. Ges. 8, 2377 (1885).
3. H. Staudinger, O. Kupfer, Ber. Dtsch. Chem. Ges. 45, 501 (1912).
4. M. Jones, R. A. Moss, Eds., Carbenes (Wiley, New York, vols. I and II, 1973 and 1975).
5. M. Regitz, Ed., Carbene (Carbenoide), vol. E19b of Methoden der Organischen Chemie (Houben-Weyl) (Thieme, Stuttgart, 1989).
6. U. H. Brinker, Ed., Advances in Carbene Chemistry (JAI Press, Greenwich, CT, vols. 1 and 2, 1994 and 1998).
7. F. Z. Dörwald, Ed., Metal Carbenes in Organic Synthesis (Wiley, Weinheim, Germany, 1999).
8. H. Tomioka, Acc. Chem. Res. 30, 315 (1997).
9. W. A. Herrmann, C. Köcher, Angew. Chem. Int. Ed. Engl. 36, 2163 (1997).
10. A. J. Arduengo III, Acc. Chem. Res. 32, 913 (1999).
11. D. Bourissou, O. Guerret, F. P. Gabbaï, G. Bertrand, Chem. Rev. 100, 39 (2000).

matrix for data collection correspond to the orthorhombic space group Pbcm, with a = 6.0039(4) Å, b = 19.6483(13) Å, c = 12.4792(8) Å, and V (cell volume) = 1472.1(2) Å³. A half molecule of C14H15F6N per asymmetric unit (number of formula units per cell = 4), giving a formula weight of 311.27 and a calculated density (Dc) of 1.404 Mg m⁻³. The data of the structure were collected on a Bruker-AXS CCD 1000 diffractometer at a temperature of 173(2) K with graphite-monochromated Mo Kα radiation (wavelength = 0.71073 Å) by using phi- and omega-scans. We solved the structure by direct methods, using SHELXS-97 [G. M. Sheldrick, Acta Crystallogr. A46, 467 (1990)]. The linear absorption coefficient, μ, for Mo Kα radiation is 0.136 mm⁻¹.
1602 (2000).
31. C. W. Bielawski, R. H. Grubbs, Angew. Chem. Int. Ed. Engl. 39, 2903 (2000).
32. S. C. Schürer, S. Gessler, N. Buschmann, S. Blechert, Angew. Chem. Int. Ed. Engl. 39, 3898 (2000).
33. J. Louie, R. H. Grubbs, Angew. Chem. Int. Ed. Engl. 40, 247 (2001).
34. For supplementary data, see Science Online (www.
35. We are grateful to the Centre National de la Recherche Scientifique, Rhodia, and the Deutsche Forschungsgemeinschaft for financial support of this work.

9 March 2001; accepted 20 April 2001

SCIENCE VOL 292 8 JUNE 2001 1903

Microbial Genes in the Human Genome: Lateral Transfer or Gene Loss?

Steven L. Salzberg,* Owen White, Jeremy Peterson, Jonathan A. Eisen

The human genome was analyzed for evidence that genes had been laterally transferred into the genome from prokaryotic organisms. Protein sequence comparisons of the proteomes of human, fruit fly, nematode worm, yeast, mustard weed, eukaryotic parasites, and all completed prokaryote genomes were performed, and all genes shared between human and each of the other groups of organisms were collected. About 40 genes were found to be exclusively shared by humans and bacteria and are candidate examples of horizontal transfer from bacteria to vertebrates. Gene loss combined with sample size effects and evolutionary rate variation provide an alternative, more biologically plausible explanation.

Studies of the evolution of species long assumed that gene flow between species is a minor contributor to genetic makeup, generally thought to only occur between closely related species. This picture changed when researchers began to study the genetics of microorganisms. Genes, including those encoding antibiotic resistance, can be exchanged between even distantly related bacterial species (horizontal or lateral gene transfer). A growing body of evidence suggests that lateral gene transfer may be a much more important force in prokaryotic evolution than was previously

realized (1). Lateral gene transfers involving eukaryotes have also been well documented, in most cases involving transfers from organellar genomes into the eukaryotic nucleus (2). Although the possibility of lateral gene transfer has gained much support in recent years from analysis of complete genome sequences (1, 4, 5), the inference of such gene transfer events is still fraught with difficulty, because of problems with methods and with the data analyzed (6, 7).

Analysis of the rough draft of the human genome led to the suggestion recently (3) that 223 bacterial genes have been laterally transferred into the human genome sometime during vertebrate evolution. Such a possibility is of interest because it implies that bacterial infections have led to permanent transfer of genes into their hosts. One possible implication is that bacteria might be manipulating the human genome for their own benefit and that this process may be continuing. Such an event would require (i) that genes be transferred into the germ cell lineage, not just into any somatic cell, and (ii) that the transferred genes be stably maintained in the host cell, either by insertion into a chromosome or as extrachromosomal elements. For these genes to spread through the population, they need either to provide a selective advantage to their host or to exhibit some kind of "selfish" properties, such as the ability to duplicate and transpose.

As in the recent study (3), we focused on detecting possible gene transfers from bacteria to vertebrates by analysis of gene distribution patterns across taxa. Those genes found in bacteria and vertebrates but not in nonvertebrates are considered possible cases of lateral transfer (putative bacteria to vertebrate transfers, or BVTs). Our study differed in that it included the human proteome reported by Venter et al. (8) and it included proteins from parasite lineages not included in the previous study (9). We focused on analyzing complete genome sequences because the absence of a gene from a species cannot be inferred from incomplete genome sequences.

For our analysis of the human proteome, we used the Ensembl set, containing 31,780 proteins (3), and the Celera set, containing 26,544 proteins (8). Human genes for which homologs are found in completed prokaryotic genomes were identified by searching against all publicly available complete genome sequences. In the Ensembl proteome, 4388 genes have BlastP matches with E-values less than 10⁻¹⁰ to a protein from a complete prokaryotic genome. Likewise, 3915 genes from the Celera proteome match at least one prokaryotic gene with the same E-value threshold (Table 1). As in (3), transfers into vertebrates were ruled out if a homolog of a gene was found in a nonvertebrate eukaryotic genome. After removal of all genes found in complete nonvertebrate genomes, only 135 Ensembl genes and 89 Celera genes remained as possible BVTs.

If the pattern of genes shared between prokaryotic and eukaryotic species is a robust measure of lateral gene transfer, then we would expect that the total number of true BVTs would be independent of which and how many nonvertebrate genomes have been sampled. However, as the number of nonvertebrate proteomes screened against human increased, the number of BVTs decreased (Fig. 1). The two plots show comparable results for the Ensembl and Celera protein sets.

The downward trend of the plot in Fig. 1 suggests that the number of BVTs might decrease further if more nonvertebrate genomes are added to the analysis.

Fig. 1. Genes shared by humans and prokaryotes after removing successive proteome sets from five nonvertebrates and a collection of miscellaneous nonvertebrates ("Other"). (Top) Ensembl protein set, containing 31,780 proteins (3). (Bottom) Celera protein set, containing 26,544 proteins (8). Subsequent points on the plots show averages after removing one more proteome, and each line shows the effect with a different starting proteome; for example, the "fruit fly" line shows the average number of genes remaining in the BVT set after removing all Drosophila melanogaster genes plus one, two, three, and four additional protein sets.

The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850.
*To whom correspondence should be addressed. E-mail: salzberg@tigr.
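The screening procedure described in the text (collect human genes with a prokaryotic match below the E-value cutoff, then discard any gene that also matches a nonvertebrate eukaryote) amounts to successive set subtraction. The following is a minimal sketch of that logic only, not the authors' pipeline: the `Hit` tuple layout, the group labels, and both function names are hypothetical, and it assumes the same 10⁻¹⁰ cutoff is applied to nonvertebrate matches.

```python
from typing import List, Set, Tuple

E_VALUE_CUTOFF = 1e-10  # threshold quoted in the text

# Hypothetical record layout: (human gene ID, group of the matched protein, E-value)
Hit = Tuple[str, str, float]


def matched_genes(hits: List[Hit], group: str,
                  cutoff: float = E_VALUE_CUTOFF) -> Set[str]:
    """Human genes with at least one match to `group` below the cutoff."""
    return {gene for gene, hit_group, e_value in hits
            if hit_group == group and e_value < cutoff}


def remaining_bvts(hits: List[Hit], nonvertebrate_groups: List[str],
                   cutoff: float = E_VALUE_CUTOFF) -> List[int]:
    """Count candidate BVTs before and after removing each nonvertebrate group.

    The first count is the number of human genes shared with prokaryotes;
    each later count follows the removal of one more nonvertebrate proteome.
    """
    candidates = matched_genes(hits, "prokaryote", cutoff)
    counts = [len(candidates)]
    for group in nonvertebrate_groups:
        # Assumption: a nonvertebrate match is judged with the same cutoff.
        candidates -= matched_genes(hits, group, cutoff)
        counts.append(len(candidates))
    return counts
```

With real search output, the list returned by `remaining_bvts` would trace one line of the kind plotted in Fig. 1: a count that can only stay level or fall as further nonvertebrate proteomes are screened.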

is the phe. generation time (13).146 4. If this any fixed E-value SCIENCE VOL 292 8 JUNE 2001 1905 . soybean. will miss the annotation of the completed portions of common ancestor started with 10. we used that any one gene was lost from four lin. with a tion for fruit fly. and numerous genes have the data sets being analyzed. removal of the of the BVT set. There.606 5. 1). and the state of the art in eukary- “core proteome” sizes] and each lineage alignment scores may simply be the result otic gene finding is imperfect. First. respectively. shows a phylogenetic tree of three human interpreted with great caution. karyotic lineage into the vertebrate lineage.3)4 ! 0. including variation in be detectable in phylogenetic trees. then the probability of more rapid mutation in the invertebrate genes missing from the annotation. 1.080 15. or 81 genes lost ancestry on the basis of this evidence alone. duced the set to 72 BVTs. It is likely that many genes have been laterally transferred into BVT set reported in (3). the initial BVT sets against the nucleotide from all four of the nonvertebrate lineages. In cutoff of 10"10. Parasites 11. example. One pling effect shown in Fig.470 9.00081. any other prokaryotic lineage. any transferred gene should be more sion 8. For eukaryotic nonvertebrates (labeled “Other” DNA replication accuracy. and cyanobacteria. Another important aspect of the species. or any other tion illustrates the possible contribution of maximum E-value cutoff of 10"10 (i. This analysis resulted in two likely to be lost than others (e.. analyses of complete genomes. after most of the major been removed from the 31. nematode. iden.780 used for the eukaryotic complete genomes are from so. prokaryotic phyla were established. In addition. not eukaryotic genomes analyzed all resulted chance is less than 1 in 1010). thaliana 25. 
ated phylogenetic trees for genes from the Ensembl BVTs and 21 from the Celera should not be used to measure evolutionary BVT sets for which sufficient numbers of BVTs. and the reason nonessential. we reduced the size karyotes. 2. we lower. As a result of this relatedness (14. with low selective pressure such example is shown in Fig. the organism is not proof that the gene is miss- gene loss to the pattern. The October release (ver- called “crown” eukaryotes: animals.302 A.756 ies for different genes within a genome as www. This is particularly true in related genes were available and found that two sets. DNA of the Ensembl BVT set to 74 genes and the matches between Ensembl BVTs and A. Supposing that 20% of Celera BVT set to 56 genes. Because of the effects tochondrial or plastid genomes to eukary- tified matches to organisms such as of rate variation. all with E-values of 10"32 or translates into 65 genes lost in all four invertebrate mitochondrial genomes. The use of all of these genomes are complete. then 30% loss after comparing the 74 Ensembl BVTs to ditis elegans. has and fungi. ysis used to support the claim that 223 hyaluronan synthase paralogs. analysis in (3). phylogenetic trees built from in Fig. further limiting Table 1. which was chosen for genome spective. in the tree is consistent with normal vertical thaliana.081 The rate of nucleotide substitution var. similarity share a common ancestor.000 genes genes with slightly weaker similarity to some eukaryotic genomes is still in [see Rubin et al. other two left 70 Ensembl BVTs. ready been removed in the steps that re- alone could account for a large proportion reducing that BVT set to 72 genes. DNA repair.e. Suppose the five likelihood that any Blast hit was due to ing from that organism’s genome. respectively. the sampling of prokaryotic evo- lutionary diversity is much broader. REPORTS nomes are added to the analysis. Three of these five genes had al- lineages. 
leaving only 114 and 68 genes in the relatedness (15).0). Second. melanogaster 14. with a Blast the sample of evolutionary diversity. siae. 15). The phylogenetic genes shared by the eukaryotic common human (3). the Blast score for the bacterial match was not branch within any particular prokaryot- This seems especially likely in some of the at least 10"9-fold smaller than the nonver.388 3. It seems likely that the sequencing of Bacteria/Archaea 85. fore. genetic drift. where it can most did not show patterns consistent with One explanation for the species-sam. three of these (Cae- norhabditis elegans. D. some genes are probably less tebrate genes to 10"7. Instead.824 4. It appears likely that gene loss found two genes of mitochondrial origin. thaliana and three matches to Caenorhab- a proteome cannot be lost. the null hypothesis should be that inheritance. The Ensembl proteome set has been fur- sampling effect is the phylogenetic bias in this likely occurred within the past 400 to ther curated. and genes that have been transferred from mi- small number of characterized genes. If a gene was transferred from a pro. which would proteins in GenBank from numerous other nation of factors. Because the weaker progress. for which extensive gene loss has analysis used the same threshold for pro. containing 29. sequences of the genomes of complete Eu- Of course. closely related to its donor lineage than to ysis confirms this: Searching through all cies. Thus. lineage. The absence of a gene from the annota- been documented (10).915 a broader variety of eukaryotic genomes Yeast 9. and Homo sapiens) are animals.400 13. A simple computa.780 26. (11) for a discussion of nonvertebrate proteins. which why species distribution patterns must be allowing more rapid mutation.. melanogaster.151 9.sciencemag. In the anal. In addition. C.660 ber of BVTs. be expected that at least some genes will be bacterial to vertebrate gene transfer. D. most of which have a relatively lection. 
is not an accurate measure of evolutionary ferred genes branch with #-proteobacteria and Aspergillus terreus. karyotic and nonvertebrate matches. plants. con. though.544 (12).g. the absence of the gene from sequencing in part because of its small two genes with sufficiently high sequence nonvertebrate lineages may be due either to genome size. it is impossible to rule out common TBlastN to search the human proteins from eages is (0. well as for the same gene in different spe. Proteome sizes and number of genes shared with each of the human protein sets.770 12. ic lineage. We gener- filtering. all from the nomenon of gene loss. Blast E-values. By reducing the E-value cutoff for nonver. elegans 19. and Saccharomyces cerevi.508 7. Our gene loss or rate variation. Our anal. This rate variation is due to a combi. the placement of groups species analyzed here. a gene was considered a BVT if analysis reveals that the vertebrate genes do ancestor have been lost in some lineages. such as Arabidopsis tebrate match score. from a single adaptive radiation.030 7. All of the 500 million years. Number of Number matching Number matching Organisms proteins Ensembl proteome Celera proteome taining representatives from many widely divergent bacterial and Archaeal lineages Human – 31. To check for lost 30% of its genes.324 14. polymerase genes). From a statistical per. 21 genes were removed from the which are measures of sequence similarity. contrast.304 genes.103 will lead to a further reduction in the num. recombination. sequence similarity alone otic nuclei (16 –18) indicate that the trans- Suberites domuncula (sponge). se.
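The gene-loss computation in the text can be checked numerically. The sketch below assumes, as the text does, a common ancestor with 10,000 genes, a 30% loss in each lineage, and independent loss across the four nonvertebrate lineages; the function name and the `unlosable_fraction` parameter are hypothetical, and treating the 20% of the proteome that cannot be lost as simply excluded from the pool is one reading of the passage. Note that (0.3)^4 evaluates to 0.0081, which is the value consistent with the quoted counts of 81 and 65 genes.

```python
def expected_lost_in_all(ancestral_genes: int, loss_fraction: float,
                         lineages: int, unlosable_fraction: float = 0.0) -> float:
    """Expected number of genes lost independently from every lineage.

    Assumes each losable gene is lost from each lineage with the same
    probability, independently of the other lineages.
    """
    p_lost_everywhere = loss_fraction ** lineages
    losable_genes = ancestral_genes * (1.0 - unlosable_fraction)
    return losable_genes * p_lost_everywhere


# All 10,000 ancestral genes are losable.
print(round(expected_lost_in_all(10_000, 0.3, 4)))
# 20% of the proteome cannot be lost, leaving 8,000 losable genes.
print(round(expected_lost_in_all(10_000, 0.3, 4, unlosable_fraction=0.2)))
```

Under these assumptions, chance loss alone yields a few dozen genes absent from all four nonvertebrate lineages, which is the same order of magnitude as the BVT sets discussed in the text.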

elegans Sequencing Consortium. Andersson et al. X. REPORTS eliminated some genes (including possible proteins that in turn match and www. 21. were proposed as lateral transfers from bacteria to vertebrates (3). Phylogenetic tree of homologs of three human hyaluronan synthase (HAS) proteins that 23. and those hits with a BLAST E-value of 10"10 or less were used for the initial analysis.A. J. Nelson et al. interesting final reductions in the data set. are the transfer (3) is essentially a statistical one. as was a set of all avail- matches one or more nonvertebrates. Felsenstein. J. 6960 (2000). Fraser. yeast.S.1126/science. Nature Biotechnol. all statistical arguments. The (R01 LM06845 to S. D. against the 56 Celera BVTs yields some several plausible biological explanations 6. 1997). Lin et al. 2. Science 291. 3. MA. though. Paulsen. reducing the BVT set to After careful reexamination of the hu- 47 genes. D. Dev. C. Biol. 4. In most necessarily so because of the inherent im. Heidelberg. The presence of multiple HAS genes in different vertebrate species is likely due to duplication in 10. elegans) (21). 10 be exercised to confirm assumptions and (20). Altschul. Scale bar corresponds to estimated evolutionary distance units. 12. A. More distantly related proteins were used as outgroups to root the tree. accepted 4 May 2001 Swissprot) are indicated in the tree.celera. M. Opin. Gish. that comprise candidates for possible 3. 11. 24.780 and 26. for the presence of these genes in the hu. 2204 (2000).1061036 vertebrates. These gene sets are available as supplementary in- formation at Science Online at www. The Arabidopsis Genome Initiative. Nature 408. 416 (2000). L. Both sets contain genes not included in the man genome. I. 1049 (2000). Nature 409. J. The more probable expla. and J. Gogarten.824 pro- teins (www.. S. M. five genes match Celera nation for the existence of genes shared by tigr. Of the 56 Celera BVTs.. Li. 162 (1998). D. Halpern.L. 761 (1999). 22. 606 (2000). 
cerevisiae) tebrate genes. A. F. 290 (1993). Genet. Venter et al. the resulting 4388 match- es (for Ensembl) and 3915 matches (for Celera) formed the set of shared human-prokaryotic genes. Murphey. International Human Genome Sequencing Consor- 89 genes in the initial Celera BVT set. H. 10. 9. G. Nature 399. 97. from the Ensembl and Celera human genome sets cases. 796 (2000). available at www. Trypanosoma brucei. Martin et al. 163 (1998). C. worm. 10.ensembl. All matches were collect- ed. L. which and the differences in the gene models have occurred in the distant past. P. 36. this Celera protein set. The merged set of proteins from all com- pleted prokaryotic genomes comprises 85.. K. States. humans and prokaryotes. nematode worm (C. 1906 8 JUNE 2001 VOL 292 SCIENCE www. 13. D. 10. After searching all human genes against the complete prokaryotic sets. possibility of observing events that may (www.) and NSF (IIS-9902923 to tree was generated from the alignment (variable regions and gaps excluded) with the neighbor. This nonvertebrate genomes. The human proteomes were searched against all proteins from all of these data sets with BlastP (24). major evolutionary groupings. Curr. joining algorithm implemented by Phylip (25) with a PAM-based distance matrix. If the original 135 Ensembl BVTs man proteome.tigr. and sequence IDs (gi for Genpept and sp for 26 March 2001. M. Science 284. genes were identified with iterative Blastp searches of a low-redundancy protein database and 26. Nelson. 19. Species names. 133 (1998). 8). Doolittle. O. the small sample of We screened the 70 BVTs against the new. Trends Cell Biol. F. G. E. W. R. melano- gaster) (23) were collected. Sci. Of the 47 best explanation. S. J. tionary rate variation.S. 17. lateral transfer between bacteria and human tium. 9. 860 (2001). Nature 396. M. or any com- bination of those organisms’ proteomes. Plasmodium yoelii.. Braun. great care needs to sons. Evol. 
This work was funded in part by grants from NIH aligned with clustalW. W. Doolittle. A. Adams et al. notated on unfinished sequences. F. L. gene names if available.A. 164 (1989). Genome Res. Published online 17 May 2001. 2012 (1998).L.544 proteins genes shared between the two sets. Science 287. parasites. The evidence presented here provides 5.sciencemag. S. Nature 402. 7. is a combination of evolu- into one. 359 (1999). Ensembl BVTs. and otherwise improved the data. 16. we find only 46 genes in the References and Notes 1. 323 (1999). the nonvertebrate lineages. Comparing the 47 Ensembl BVTs (19). extraordinary events such as sequence several eukaryotic parasites (Plasmodium vertebrate genomes. J. which appears to be a contaminant. 2124 (1999).L.sciencemag. D. K. Proc.. Natl. Hits with larger E-values were collected and used for subsequent analyses. J. A. 8. and gene loss in er proteome and found that 23 genes had reduces the Ensembl BVT set to 41 genes. The complete sets of 31. M5 (1999). The C.E. W.. E. There were also set. 2185 (2000). able protein sequences from the ongoing projects to these match all four of the complete non. These databases were then compared with one another to determine the genes common to humans and prokaryotes but not found in fruit fly. and KDI-9980088 to S. other set. As with were the basis for the analyses of the human genome (3. 266 (1993). Nature 393. the sequences do not match exactly. The argument for lateral gene Bull. cgi/content/full/1061036/DC1.S. 1304 (2001). Homologs of the human HAS 25. are screened against the newer release. thaliana) (22). Natvig. Goffeau et al. F. and 41 in the Ensembl 2. Genome Res. been eliminated. Rubin et al. Science 282. Molecular Evolution (Sinauer Associates. Nature Genet. Science 287.. In cases weed ( . Similarly. Sunderland. U. Mol. and fruit fly (D. more interesting. Fig. Acad. collapsed multiple genes and one short (115 amino acid) protein falls nonvertebrates. 
on an 825– base pair unmapped contig. Include this information when citing this paper. but missing in contaminants). Science 274.. 196. 20. Nelson. the genes shared by humans and each of the other four organisms or groups of organisms were collected. set is reduced to 89 genes. J. 18. and Theileria parva. mustard genes match an Ensembl protein that in turn explore alternative hypotheses. A. Eisen. anisms exist. J.). T. 546 (1996). W. six of where equally if not more plausible mech. This reduces the horizontal gene transfer do not provide the falciparum. 18. Palmer et al.S. including preliminary genes an- Celera BVT set to 46 genes. W. Cladistics Eisen. J. 8. 15. The complete proteomes of yeast (S. mustard weed. E. were used for all human sequence compari- sometimes yield further matches to

