Eric Alm
Horizontal Gene Transfer is rampant
Closely related strains harbor lots of newly acquired DNA
HGT is a key mechanism for niche adaptation
Native genes are insulated from dynamics at the periphery of networks
Depts. of Biological Eng. and Civil and Environmental Eng., MIT; Broad Institute of MIT and Harvard
Pal, Papp & Lercher, Nature Genetics, 2005. *horizontal gene transfer into the E. coli lineage since its split from the Vibrio lineage.
[Figure labels: HGT; novel genes retained in genome; native genes; genes]
1. The rate of protein evolution follows a molecular clock
2. Less important proteins evolve faster
3. Conservative substitutions occur more frequently than disruptive ones
4. Gene duplication allows emergence of new functions
5. Positive Darwinian selection is less common than drift or purifying selection
Evolutionary distance (substitutions / site) = rate (substitutions / site / billion years) × time (billions of years)

$$\text{Evolutionary distance} = r\,t = (\text{gene family}) \times (\text{genome}) \times (\text{gene, genome}) \times t$$

- Principle #1: Molecular clock. Rate of change depends on mutation rate, population size, etc.
- Reject outliers
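The factorization above can be illustrated with a small sketch. It assumes a complete matrix of positive distances (one row per gene family, one column per genome) and fits the gene-family and genome factors as row and column means in log space; that fitting choice, the function name `decompose_rates`, and the toy numbers are assumptions made for illustration, not the procedure used in the talk. Whatever the two main effects do not explain is left as the (gene, genome) term.

```python
import math

def decompose_rates(dist):
    """Split log-distances into gene-family, genome, and residual parts.

    dist[i][j] is the evolutionary distance of gene family i in genome j
    (all entries must be positive). Returns (grand_mean, gene_effects,
    genome_effects, residuals), all in log space. The residuals play the
    role of the (gene, genome) term.
    """
    logs = [[math.log(d) for d in row] for row in dist]
    n_genes, n_genomes = len(logs), len(logs[0])
    grand = sum(sum(row) for row in logs) / (n_genes * n_genomes)
    # Row means: how fast each gene family evolves relative to the average.
    gene_eff = [sum(row) / n_genomes - grand for row in logs]
    # Column means: how far each genome has diverged relative to the average.
    genome_eff = [
        sum(logs[i][j] for i in range(n_genes)) / n_genes - grand
        for j in range(n_genomes)
    ]
    # What is left over after removing both main effects.
    resid = [
        [logs[i][j] - grand - gene_eff[i] - genome_eff[j] for j in range(n_genomes)]
        for i in range(n_genes)
    ]
    return grand, gene_eff, genome_eff, resid

# Toy data (hypothetical): distances that factor exactly, so residuals vanish.
toy = [[a * b for b in (1.0, 2.0, 4.0)] for a in (0.1, 0.2, 0.4)]
grand, gene_eff, genome_eff, resid = decompose_rates(toy)
print(max(abs(x) for row in resid for x in row))
```

Large positive or negative residuals would then be the candidates for unusually fast or slow evolution of a gene in a particular genome.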
[Figure labels: Negative Selection; FAST; Lost?; Selective Sweeps; Positive Selection?; P = 0.05]
Hypergeometric test for enrichment of COG functions in fast/slow (top 10% of genes)

| Species | COG Function | Enrichment | Bonferroni P-value |
|---|---|---|---|
| E. coli | Motility & Secretion | fast | <0.001 |
| Photorhabdus | Motility & Secretion | fast | <0.001 |
| V. parahaemolyticus | Amino acid metabolism | slow | <0.01 |
| B. aphid. APS | Ion Transport & metabolism | slow | <0.01 |
| Wigglesworthia | Coenzyme transport | slow | <0.01 |
| H. ducreyi | Cell Division | fast | <0.05 |
| V. vulnificus | Nucleic acid metabolism | slow | <0.05 |
| Yersinia pestis | Motility & Secretion | fast | <0.05 |
| Idiomarina loihi. | Carbohydrate metabolism | fast | <0.05 |
| Xylella fastidiosa | Energy production | slow | <0.05 |
| Idiomarina loihi. | Amino acid metabolism | fast | <0.05 |
| Photo. profundum | Cell Division | fast | <0.05 |
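As a rough illustration of the test named in the table title, the sketch below computes a one-sided hypergeometric tail probability and applies a Bonferroni correction. The function names and the example counts are hypothetical; the actual gene counts behind the table are not given in the slides.

```python
from math import comb

def hypergeom_enrichment_p(n_total, n_in_category, n_selected, n_hits):
    """P(X >= n_hits) for X ~ Hypergeometric.

    n_total       -- genes in the genome
    n_in_category -- genes annotated with the COG function of interest
    n_selected    -- genes in the fast (or slow) set, e.g. the top 10%
    n_hits        -- selected genes that carry the annotation
    """
    upper = min(n_in_category, n_selected)
    tail = sum(
        comb(n_in_category, k) * comb(n_total - n_in_category, n_selected - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / comb(n_total, n_selected)

def bonferroni(p_value, n_tests):
    """Bonferroni-corrected P-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical counts: 20 genes, 5 in the category, 4 selected, 2 of them hits.
p = hypergeom_enrichment_p(20, 5, 4, 2)
print(p, bonferroni(p, n_tests=3))
```

The number of tests passed to `bonferroni` would be the number of COG function categories (times the number of genomes, if corrected jointly); which of the two was used here is not stated.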
Metabolism of Idiomarina
[Figure label: eno]
Hou, Shaobin et al. (2004) Proc. Natl. Acad. Sci. USA 101, 18036-18041
| COG | Name | Description |
|---|---|---|
| 1377 | FlhB | Flagellar biosynthesis pathway |
| 1684 | FliR | Flagellar biosynthesis pathway |
| 3418 | FlgN | Flagellar biosynthesis/type III secretory pathway chaperone |
| 4787 | FlgF | Flagellar basal body rod protein |
| 1261 | FlgA | Flagellar basal body P-ring biosynthesis protein |
| 3190 | FliO | Flagellar biosynthesis protein |
| 4967 | PilV | Tfp pilus assembly protein |
| 1815 | FlgB | Flagellar basal body protein |
| 1345 | FliD | Flagellar capping protein |
| 1516 | FliS | Flagellin-specific chaperone |
| 4786 | FlgG | Flagellar basal body rod protein |
| 1706 | FlgI | Flagellar basal body P-ring protein |
| 1677 | FliE | Flagellar hook basal body protein |
| 2805 | PilT | Tfp pilus assembly protein, pilus retraction ATPase |
| 4969 | PilA | Tfp pilus assembly protein, major pilin |
| 1989 | PulO | Type II secretory pathway, prepilin signal peptidase PulO and related peptidases |

E. coli: * * * * * * * * * *
Photor.: * * * * *
Yersinia: * *
* * * * * * *
Correlation of the (gene, genome) term across all genes (orthologs) for each pair of genomes
Deep-branching clades show significant correlation in genome-wide selective patterns
[Figure labels: genes; flagellin-specific chaperone; HGT; novel genes retained in genome]
Front legs a puzzle: how Tyrannosaurus used its tiny front legs is a scientific puzzle; they were too short even to reach the mouth. They may have been used to help the animal rise from a lying position.
- Explanatory information, Museum of Science, Boston c. 1979
Native genes
P-value: <2.2e-16, 2.4e-10, 4.8e-5, <0.0001, <0.0001, ns, 0.02
HGT
Direct selection
v X dist
Indirect selection
Reconciliation: detail evolutionary events that explain discrepancies between gene and species phylogenies
Possible events:
- Horizontal gene transfer
- Gene loss
- Gene duplication
Reconciliation proceeds by labeling each node in the gene tree as HGT, Dup, or Speciation (loss is implied).
Pass the LCA (and score) of each subtree from leaves to root.
[Figure: species tree and gene tree with numbered nodes]
The Algorithm
[Figure: species tree]
For all LCAs at parent:
    For all LCAs at left child:
        For all LCAs at right child:
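The three nested loops can be sketched as a small dynamic program. This is a minimal illustration under stated assumptions, not the implementation from the talk: trees are nested 2-tuples of species names, each gene leaf is labelled with the species it was sampled from, losses are free, transfers are not checked for time consistency, and the event costs `dup_cost` and `hgt_cost` are arbitrary illustrative values. The names `index_species_tree`, `event_cost`, and `reconcile` are hypothetical. For every gene-tree node it tries every (parent, left, right) triple of species-tree nodes, which is where the cubic dependence on species-tree size comes from.

```python
from itertools import product

INF = float("inf")

def index_species_tree(tree):
    """Number species-tree nodes; return (labels, children, subtree node sets)."""
    labels, children, under = [], {}, {}

    def visit(node):
        nid = len(labels)
        labels.append(node)
        kids = [visit(c) for c in node] if isinstance(node, tuple) else []
        children[nid] = kids
        under[nid] = {nid}.union(*(under[k] for k in kids))
        return nid

    visit(tree)
    return labels, children, under

def event_cost(p, l, r, children, under, dup_cost, hgt_cost):
    """Cost of the event at a gene node mapped to p with children mapped to l, r."""
    if len(children[p]) == 2:
        a, b = children[p]
        if (l in under[a] and r in under[b]) or (l in under[b] and r in under[a]):
            return 0  # speciation: children fall in different child subtrees
    in_l, in_r = l in under[p], r in under[p]
    if in_l and in_r:
        return dup_cost  # duplication: both children stay below p
    # Transfer: one child stays below p, the other lands on a node that is
    # neither a descendant nor an ancestor of p.
    if in_l and p not in under[r]:
        return hgt_cost
    if in_r and p not in under[l]:
        return hgt_cost
    return INF

def reconcile(gene_tree, species_tree, dup_cost=2, hgt_cost=3):
    """Minimum total event cost over all mappings of the gene-tree root."""
    labels, children, under = index_species_tree(species_tree)
    leaf_id = {lab: i for i, lab in enumerate(labels) if not isinstance(lab, tuple)}
    ids = range(len(labels))

    def table(g):
        # Leaves map only to the species they were sampled from.
        if not isinstance(g, tuple):
            return {s: (0 if s == leaf_id[g] else INF) for s in ids}
        left, right = table(g[0]), table(g[1])
        best = {s: INF for s in ids}
        for p, l, r in product(ids, ids, ids):  # parent, left child, right child
            if left[l] == INF or right[r] == INF:
                continue
            cost = event_cost(p, l, r, children, under, dup_cost, hgt_cost)
            best[p] = min(best[p], cost + left[l] + right[r])
        return best

    return min(table(gene_tree).values())

species = (("A", "B"), "C")
print(reconcile((("A", "B"), "C"), species))              # congruent trees
print(reconcile((("A", "C"), "B"), species))              # conflicting gene tree
print(reconcile((("A", "C"), "B"), species, dup_cost=5))  # duplication made costly
```

Because losses are free in this sketch, a single duplication at the root can explain any conflict; raising `dup_cost` above `hgt_cost` makes the transfer explanation win instead.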
Real Data
$O(n_g n_s^3)$
COG100: 30S ribosomal subunit protein S11
[Figure: species tree; axis labels: Species, Gene]
32 transfers!!
Even with a good species phylogeny, gene trees may have significant uncertainty
Bootstrap trees are a convenient but very limited sample of different topologies
Consensus trees discard information
The Idea
The Algorithm
Each internal node of each bootstrap has three potential parents
For each node, three tables of potential LCAs must be maintained
1. Reconcile children
Bootstrap trees
It Gets Messy
3. Return best answer and merge tables
Link subtrees across bootstraps
Find a path through all bootstrap trees that optimizes the reconciliation
After all subtrees are reconciled, select the best reconciliation to represent the linked subtrees.
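One plausible reading of "linking subtrees across bootstraps" is to treat subtrees from different bootstrap trees as linked when they span the same set of leaves. The sketch below works under that assumption, which is not confirmed by the slides, and the name `link_subtrees` is hypothetical. It records, for each leaf set, the alternative ways it is split across the bootstrap sample, so a reconciliation could later choose among them.

```python
from collections import Counter, defaultdict

def leaf_set(tree):
    """Leaves below a node of a nested-tuple tree, as a frozenset."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaf_set(child) for child in tree))
    return frozenset([tree])

def link_subtrees(bootstrap_trees):
    """Map each clade (leaf set) to a Counter of its observed child splits.

    Trees are nested 2-tuples. A split is stored as a frozenset of the two
    child leaf sets, so left/right order does not matter.
    """
    splits = defaultdict(Counter)

    def visit(tree):
        if not isinstance(tree, tuple):
            return
        left, right = tree
        splits[leaf_set(tree)][frozenset([leaf_set(left), leaf_set(right)])] += 1
        visit(left)
        visit(right)

    for tree in bootstrap_trees:
        visit(tree)
    return splits

# Three hypothetical bootstrap trees; the third disagrees at the root.
trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
]
links = link_subtrees(trees)
for split, count in links[frozenset("ABCD")].items():
    print(sorted(sorted(side) for side in split), count)
```

Here the clade containing all four leaves has two alternative topologies below it, which is the situation in which different entries for the same linked subtree can carry different subtree topologies.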
Different entries in the same table can have different subtree topologies!
Reconciliation Species
7 transfers
Reconciliation events
Summary
Possible to reconcile gene and species trees efficiently
Uncertainty in gene trees can hamper reconciliation
Use bootstraps to sample reasonable subsets of tree space
Are there 7 transfers for COG100?

Next steps?
Wrong species phylogeny
Need more bootstraps
Gold-standard?
All metabolic genes
Co-evolution among genes with similar function?

Acknowledgements
Jesse Shapiro (Evolutionary rates)
Lawrence David (Reconciliation)
Sonia Timberlake (Evolution of regulation)
Sean Clarke (HGT in the laboratory)
Arne Materna (Experimental evolution)