Genome Dynamics and Environmental Adaptation in Bacteria

Eric Alm
Depts. of Biological Eng. and Civil and Environmental Eng., MIT
Broad Institute of MIT and Harvard

The Modern View of Bacterial Genome Dynamics

- Horizontal gene transfer (HGT) is rampant
- Closely related strains harbor lots of newly acquired DNA
- HGT is a key mechanism for niche adaptation
- Native genes are insulated from dynamics at the periphery of networks

Uptake of Foreign DNA


- Transformation
- Phage
- Conjugation
- Genomic islands as reservoirs of new DNA

How Common Is It?


Marine isolates of co-existing microdiversity

Thompson et al., Science 2005

Large variation in genome size among closely related strains


Coleman et al., Science 2006

Genome Dynamics (HGT) at the Periphery

Pal, Papp & Lercher, Nature Genetics, 2005. *horizontal gene transfer into the E. coli lineage since its split from the Vibrio lineage.

From: Lerat et al. (2005) PLoS Biol 3(5): e130

Responses to Natural Selection

Environment
- HGT: novel genes retained in genome
- Native genes: gene evolutionary rate variation

Comparing Rates Across Species


Microarray analogy: genomes as natural experiments on genes

[Figure: genes (rows) by genomes/experiments (columns)]

The System: Gammaproteobacteria


- Intracellular parasites
- Enterobacteria
- Human pathogens
- Marine heterotrophs
- Soil bacteria
- Plant associated

Motoo Kimura: Some Principles of Molecular Evolution (1974)

1. The rate of protein evolution follows a molecular clock
2. Less important proteins evolve faster
3. Conservative substitutions occur more frequently than disruptive ones
4. Gene duplication allows emergence of new functions
5. Positive Darwinian selection is less common than drift or purifying selection

Evolutionary distance = rate x time

(substitutions / site) = (substitutions / site / billion years) x (billions of years)

Evolutionary distance = r t = (gene family term) x (genome term) x (gene, genome term) x t

Overview of the Method


1. Seed possible orthologs: single-copy ubiquitous COGs
2. Align and build trees (~1000 gene families)
3. Compare to species phylogeny (KH-test)
4. Reject outliers (744 gene families)
5. Read out terminal branch lengths
6. Normalize against family rate and molecular clock

Evolutionary distance = r t = (gene family term) x (genome term) x (gene, genome term) x t

- (gene family): Principles #2 and #3. More important proteins evolve slowly (e.g. ribosome).
- (genome): Principle #1. Molecular clock. Rate of change depends on mutation rate, population size, etc.
- (gene, genome): Principle #5. Positive or negative selection?
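The normalization step can be illustrated with a small sketch. It assumes a complete matrix of terminal branch lengths (one row per gene family, one column per genome) and removes the gene-family and genome effects with row and column means in log2 space. The slides do not say how the fit was actually done, so this is only a stand-in; the function name `residual_log_rates` and the toy numbers are hypothetical.

```python
import math

def residual_log_rates(dist):
    """dist[f][g] = terminal branch length of gene family f in genome g (> 0).

    Assumption: a two-way additive model in log2 space,
    log2(distance) ~ family effect + genome effect + residual.
    With a complete matrix, the least-squares fit is
    row mean + column mean - grand mean.
    """
    logs = [[math.log2(x) for x in row] for row in dist]
    n_f, n_g = len(logs), len(logs[0])
    grand = sum(sum(row) for row in logs) / (n_f * n_g)
    fam = [sum(row) / n_g for row in logs]
    gen = [sum(logs[f][g] for f in range(n_f)) / n_f for g in range(n_g)]
    return [[logs[f][g] - (fam[f] + gen[g] - grand) for g in range(n_g)]
            for f in range(n_f)]

# Hypothetical toy data: distance = family rate x genome time exactly.
family_rate = [1.0, 2.0, 4.0]
genome_time = [0.5, 1.0]
clean = [[r * t for t in genome_time] for r in family_rate]
clean_resid = residual_log_rates(clean)          # every entry is ~0

# Speed up one gene in one genome; its residual becomes positive.
perturbed = [row[:] for row in clean]
perturbed[2][1] *= 4.0
perturbed_resid = residual_log_rates(perturbed)
```

A positive residual marks a gene that evolved faster in that genome than its family and genome would predict; a negative residual marks a slower one. In a small matrix, a single outlier also shifts the row and column means, so the residual understates the true deviation.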

Protein Family and Molecular Clock Explain Most Distance Variation


[Figure: observed branch length, log2(rt), plotted against predicted branch length (log2 scale)]

Residual variation is an estimate of the (gene, genome) term

What Can We Learn From Residual Variation?


- Noise?
- Environment-specific selective pressures
  - Positive selection
  - Negative selection
  - Relaxed negative selection
Outgroup

Negative Selection
FAST Lost ?

Fisher's exact test: Odds Ratio = 3.1, P = 2.4e-7

Fast: > 4.0 Slow: < 0.25
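As a rough illustration of the numbers quoted above, the sketch below classifies genes with the fast/slow cutoffs from the slide and runs a two-sided Fisher's exact test on a 2x2 table. The reading that the test compares fast-evolving genes with genes lost in a related lineage is an assumption, the counts are invented, and the function names are hypothetical.

```python
from math import comb

def classify(rate, fast=4.0, slow=0.25):
    """Label a normalized rate using the slide's cutoffs (> 4.0 fast, < 0.25 slow)."""
    if rate > fast:
        return "fast"
    if rate < slow:
        return "slow"
    return "neither"

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, P-value). The P-value sums the hypergeometric
    probabilities of all tables that are no more likely than the observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    odds = (a * d) / (b * c) if b * c else float("inf")
    return odds, min(p, 1.0)

# Invented counts: rows = fast / not fast, columns = lost / retained.
odds_ratio, p_value = fisher_exact_2x2(8, 2, 1, 5)
labels = [classify(r) for r in (5.2, 1.0, 0.1)]
```

An odds ratio above 1 in such a table would mean that fast-evolving genes are lost more often than other genes.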

Similar patterns in similar genes?

Odds Ratio = 0.55, P = 0.01

Selective Sweeps

Positive Selection?

P=0.05

Hypergeometric test for enrichment of COG functions in fast/slow (top 10% of genes)

Species               COG function                 Enrichment   Bonferroni P-value
E. coli               Motility & Secretion         fast         <0.001
Photorhabdus          Motility & Secretion         fast         <0.001
V. parahaemolyticus   Amino acid metabolism        slow         <0.01
B. aphid. APS         Ion Transport & metabolism   slow         <0.01
Wigglesworthia        Coenzyme transport           slow         <0.01
H. ducreyi            Cell Division                fast         <0.05
V. vulnificus         Nucleic acid metabolism      slow         <0.05
Yersinia pestis       Motility & Secretion         fast         <0.05
Idiomarina loihi.     Carbohydrate metabolism      fast         <0.05
Xylella fastidiosa    Energy production            slow         <0.05
Idiomarina loihi.     Amino acid metabolism        fast         <0.05
Photo. profundum      Cell Division                fast         <0.05
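The enrichment test named in the table heading can be sketched as follows. This is a minimal illustration: the gene names, category labels and the helper names `hypergeom_sf` and `enrichment` are hypothetical, and the Bonferroni factor is taken to be the number of categories tested, which the slides do not state.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the category."""
    upper = min(K, n)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)

def enrichment(gene_to_cog, selected):
    """Bonferroni-corrected enrichment P-value per category.

    gene_to_cog: dict gene -> functional category (all genes in the genome)
    selected:    set of genes in the tail of interest (e.g. fastest 10%)
    """
    categories = sorted(set(gene_to_cog.values()))
    N, n = len(gene_to_cog), len(selected)
    result = {}
    for cat in categories:
        K = sum(1 for c in gene_to_cog.values() if c == cat)
        k = sum(1 for g in selected if gene_to_cog[g] == cat)
        # Multiply by the number of tests, capped at 1 (Bonferroni).
        result[cat] = min(1.0, hypergeom_sf(k, N, K, n) * len(categories))
    return result

# Hypothetical genome: 20 genes, 4 of them motility genes.
genome = {f"gene{i}": ("Motility" if i < 4 else "Other") for i in range(20)}
fastest = {"gene0", "gene1"}          # top 10% of 20 genes, both motility
corrected = enrichment(genome, fastest)
```

The same function covers the slow tail by passing the slowest 10% of genes as `selected`.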

Metabolism of Idiomarina

Lost: sugar transporters, transaldolase, G6PD, tpi, pfk, pgi

eno

Hou, Shaobin et al. (2004) Proc. Natl. Acad. Sci. USA 101, 18036-18041

Analysis of Patterns of Selection

Genomes/Experiments

Do correlations between rows (genes) indicate similar functional roles?

Selection Acts Coherently Across Pathways/Functions

COG    Name   Description
1377   FlhB   Flagellar biosynthesis pathway
1684   FliR   Flagellar biosynthesis pathway
3418   FlgN   Flagellar biosynthesis/type III secretory pathway chaperone
4787   FlgF   Flagellar basal body rod protein
1261   FlgA   Flagellar basal body P-ring biosynthesis protein
3190   FliO   Flagellar biosynthesis protein
4967   PilV   Tfp pilus assembly protein
1815   FlgB   Flagellar basal body protein
1345   FliD   Flagellar capping protein
1516   FliS   Flagellin-specific chaperone
4786   FlgG   Flagellar basal body rod protein
1706   FlgI   Flagellar basal body P-ring protein
1677   FliE   Flagellar hook basal body protein
2805   PilT   Tfp pilus assembly protein, pilus retraction ATPase
4969   PilA   Tfp pilus assembly protein, major pilin
1989   PulO   Type II secretory pathway, prepilin signal peptidase PulO and related peptidases

Columns E. coli, Photor. and Yersinia mark individual genes with an asterisk (*).

Analysis of Patterns of Selection


Genomes/Experiments

Do correlations between columns (genomes) indicate similar ecology?

Evolution of Evolutionary Rates

No Correlation With Phylogeny Over Shorter Timespans

- Correlation across all genes (orthologs) for each pair of genomes
- Deep-branching clades show significant correlation in genome-wide selective patterns

Flagellin-specific chaperone

A critique of the adaptationist programme

Gould & Lewontin, 1979

"Front legs a puzzle: how Tyrannosaurus used its tiny front legs is a scientific puzzle; they were too short even to reach the mouth. They may have been used to help the animal rise from a lying position."
- Explanatory information, Museum of Science, Boston c. 1979

Direct vs. Indirect Selection

[Diagram labels: Environment; HGT; novel genes retained in genome; direct selection; indirect selection; gene evolutionary rate variation]

Gene Content Influences Selection on Genes?

Test                (partial) Spearman corr.   P-value
v X g-c             0.44                       <2.2e-16
v X dist            0.30                       2.4e-10
v X time            0.19                       4.8e-5
(v X g-c | dist)    0.34                       <0.0001
(v X g-c | time)    0.41                       <0.0001
(v X dist | g-c)    0.09                       ns
(v X time | g-c)    0.11                       0.02
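For readers unfamiliar with the statistic, a first-order partial Spearman correlation can be sketched as below. Reading `(v X g-c | dist)` as "correlation between v and gene content, controlling for distance" is an interpretation of the slide's notation, not something it spells out; the function names and the toy vectors are hypothetical.

```python
import math

def ranks(xs):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def partial_spearman(x, y, z):
    """Spearman correlation of x and y after controlling for z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical per-genome-pair values.
v_similarity = [0.10, 0.35, 0.20, 0.60, 0.45, 0.80]
gene_content = [0.15, 0.30, 0.25, 0.70, 0.50, 0.90]
distance     = [0.90, 0.40, 0.70, 0.30, 0.50, 0.10]
rho = partial_spearman(v_similarity, gene_content, distance)
```

If the correlation with gene content survives after controlling for distance while the reverse does not, that is the pattern the table above reports.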

Summary of Rate Variation


- Variation in evolutionary rates can inform studies of natural selection
- Co-evolution of lineage-specific rates may imply similar function
- What is the environment of a gene?
  - Direct vs. indirect selection

Inferring Genome Dynamics

Reconciliation: detail evolutionary events that explain discrepancies between gene and species phylogenies

Possible events:
- Horizontal gene transfer
- Gene loss
- Gene duplication

Background: The DownPass Algorithm in Phylogenetic Inference

What information is passed from leaves to parents?
- Sequence and score

A Downpass Algorithm for Reconciliation

- Reconciliation proceeds by labeling each node in the gene tree as HGT, Dup, or Speciation (loss is implied)
- Pass LCA (and score) of each subtree from leaves to root
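The idea of passing an LCA from the leaves toward the root can be shown with the classic LCA-mapping reconciliation. This is a deliberately simplified sketch: it labels nodes only as Dup or Speciation, ignores HGT and scores, and keeps a single LCA per node, so it is not the algorithm presented in this talk. The tree encoding and the function names are hypothetical.

```python
def lca(parent, a, b):
    """Lowest common ancestor of species-tree nodes a and b.

    parent maps each species-tree node to its parent (root maps to None).
    """
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent[a]
    while b not in ancestors:
        b = parent[b]
    return b

def downpass(gene_tree, parent, events):
    """Return the species-tree node this gene subtree maps to.

    gene_tree is a species name (leaf) or a pair (left, right).
    One (label, node) entry is appended to events per internal gene node,
    children before parents.
    """
    if isinstance(gene_tree, str):
        return gene_tree
    left = downpass(gene_tree[0], parent, events)
    right = downpass(gene_tree[1], parent, events)
    here = lca(parent, left, right)
    # A node mapping to the same place as one of its children is a duplication.
    events.append(("Dup" if here in (left, right) else "Speciation", here))
    return here

# Hypothetical species tree ((A,B),C) with internal nodes AB and ABC.
species_parent = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC", "ABC": None}

concordant = []
downpass((("A", "B"), "C"), species_parent, concordant)

discordant = []
downpass((("A", "C"), "B"), species_parent, discordant)
```

In the discordant case this duplication-only model has to infer a duplication at the root (plus implied losses); a model that also allows HGT could explain the same gene tree with a transfer instead, which is why the choice of LCA and score at each node matters.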
[Figure: species tree and gene tree with numbered nodes]

The Algorithm

- Downpass species tree
- Calculate optimal scenario resulting in each possible LCA

For all LCAs at parent:
  For all LCAs at left child:
    For all LCAs at right child:

O(n_g * n_s^3)

Real Data

COG100: 30S ribosomal subunit protein S11

[Figure: species tree and gene tree]
32 transfers!!

Uncertainty in Gene Trees

- Even with a good species phylogeny, gene trees may have significant uncertainty
- Bootstrap trees are a convenient but very limited sample of different topologies
- Consensus trees discard information

Love the Bootstrap

Don't fear the bootstrap, embrace it! Reconcile ALL bootstraps:
- For each subtree reconciliation, check other bootstraps for more efficient reconciliation

The Idea

Reconciliation meets construction

Reconciliation as a tool for tree construction


Incorporation of bootstrap subtrees explores a very large region of plausible tree space

Constructed tree is most parsimonious, plausible gene tree

The Algorithm

- Each internal node of each bootstrap has three potential parents
- For each node, three tables of potential LCAs must be maintained

1. Reconcile children
2. Reconcile same node in bootstrap trees
3. Return best answer and merge tables
4. Return table to parent

[Figure: bootstrap trees]

It Gets Messy

- Link subtrees across bootstraps
- Find path through all bootstrap trees optimizing reconciliation
- After all subtrees are reconciled, select best reconciliation to represent linked subtrees
- Different entries in the same table can have different subtree topologies!

Rooting trees is easy!

- Iterate through all branches
- Root at branch with best reconciliation

Real Data Revisited!

COG100: 30S ribosomal subunit protein S11

[Figure: reconciliation against the species tree, with reconciliation events]
7 transfers

Summary

- Possible to reconcile gene and species trees efficiently
- Uncertainty in gene trees can hamper reconciliation
- Use bootstraps to sample reasonable subsets of tree space
- Are there 7 transfers for COG100?
  - Wrong species phylogeny
  - Need more bootstraps

Next steps?

- Gold-standard?
- All metabolic genes
- Co-evolution among genes with similar function?

Acknowledgements

- Jesse Shapiro (Evolutionary rates)
- Lawrence David (Reconciliation)
- Sonia Timberlake (Evolution of regulation)
- Sean Clarke (HGT in the laboratory)
- Arne Materna (Experimental evolution)
