
Introduction to phylogenetics

Maria Anisimova
Applied Computational Genomics Team (ACGT)
Zürich University of Applied Sciences
Swiss Institute of Bioinformatics


September 2016, Hanoi
Evolution of
millions
of species

Described by
the Tree of Life?

History of phylogenetic inference

From anatomical observations, Aristotle (384-322 BC) found:
dolphins & whales are close to mammals and not fish
http://www.sciencedaily.com/releases/2005/02/050205103109.htm

Figure based on:


Boisserie et al 2005 PNAS
History of phylogenetic inference

Recent support from fossil data:


1579 drawing of
the great chain of being
from Didacus Valades,
Rhetorica Christiana
Binomial classification, Carl Linnaeus (1735)

It is not pleasing to me that I must place
humans among the primates, but man is
intimately familiar with himself. Let's not
quibble over words. It will be the same to me
whatever name is applied. But I desperately
seek from you and from the whole world a
general difference between men and simians
from the principles of Natural History. I certainly
know of none. If only someone might tell me
one! If I called man a simian or vice versa I
would bring together all the theologians against
me. Perhaps I ought to, in accordance with the
law of the discipline [of Natural History].
Descent with modification

Charles Darwin (1859)


On the origin of species by natural selection

over generations, natural selection promotes structural
variation in the organism's form or behavior
these variations accumulate to change a species over time
populations within a species tend to become different
from one another
structural change eventually produces new species
several species can arise over time from a single ancestral
species
a new genus can evolve from a line of new species
extinction is a natural part of the evolutionary process
all species are related to one another
clusters of similar species can form because they have a
common origin
Phylogeny became the dominating model

Figures from http://www.visionlearning.com/library/module_viewer.php?mid=112



Rise of cladistics
E.C. Zimmerman (30s) and W. Hennig (50s):
objective measures for reconstructing cladograms based on the analysis of
shared morphological ancestral characteristics of fossils & living organisms

From Carlson 1999


Discovery of DNA

Friedrich Miescher
discovers DNA
in 1869

Rosalind Franklin
obtains the evidence
for the structure
of DNA.
Francis Crick & James Watson, 1953
First molecular data & molecular clock

The origin of the molecular clock hypothesis:

Dissimilarity in protein fingerprints was approximately
proportional to the phylogenetic distance between species
From genes to phenotype

How does genotype

shape phenotype?
Homologous genes

Molecular data allow cross-species comparisons

Evolution can be studied at the molecular level

Molecular change: point mutations, indels, etc.

Sequence similarity: common ancestry, similar
function?

Sequence homologs: originate from a common
ancestral sequence by molecular changes over time
Orthology vs paralogy
Sequence alignment before trees
Species trees from multiple genes
Problems with the Tree of Life

Doolittle (2000) Uprooting the Tree of Life, Scientific American


Xenology
Homology via Lateral Gene Transfer (LGT):
novel gene acquisition
orthologous gene replacement
Which homologs?

LGT Differential gene loss


Phylogenies explain gene history
Gene trees vs species trees
Trees estimated from individual genes may differ from the species tree due
to estimation errors, horizontal gene transfers, or use of paralogous
sequences.
In closely related species, ancestral polymorphism (or lineage sorting) can
also cause such conflicts. Sequences from multiple neutral loci can be used
to estimate ancestral population sizes.

Slide from Ziheng Yang


[Figure: gene trees inside the species tree of human (H), chimpanzee (C) and gorilla (G); tHC and tHCG mark the two speciation times]
Takahata, et al. 1995. Theor. Popul. Biol. 48:198-221
Yang 2002. Genetics 162:1811-1823
Rannala & Yang 2003. Genetics 164:1645-1656
Burgess, R. and Z. Yang. 2008 Mol. Biol. Evol. 25: 1979-1994
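
To give a feel for how often ancestral polymorphism produces a gene tree that conflicts with the species tree, the sketch below evaluates the standard three-species result under the neutral coalescent: the probability of a mismatch is (2/3)·exp(-T/(2N)), where T is the number of generations between the two speciation events (tHCG minus tHC in the figure) and N is the effective size of the ancestral population. The function name, its parameters and every numerical value are assumptions made for illustration; they are not taken from the slides or from the cited papers.

```python
import math

def discordance_probability(t_hc, t_hcg, n_e, generation_time):
    """Probability that a gene tree differs from the species tree ((H,C),G).

    t_hc, t_hcg     : speciation times in years (t_hcg is the older one)
    n_e             : effective size of the H-C ancestral population (diploid)
    generation_time : years per generation
    """
    generations = (t_hcg - t_hc) / generation_time
    # The two lineages fail to coalesce in the ancestral population with
    # probability exp(-T/(2N)); two of the three possible gene trees that
    # can then arise disagree with the species tree.
    return (2.0 / 3.0) * math.exp(-generations / (2.0 * n_e))

# Hypothetical values, chosen only to exercise the formula.
p = discordance_probability(t_hc=6e6, t_hcg=8e6, n_e=50_000, generation_time=20)
print(f"P(gene tree differs from species tree) = {p:.3f}")
```

A short internal branch or a large ancestral population pushes the probability towards its upper limit of 2/3, which is why the conflict matters mostly for closely related species.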
How about Forest of Life?

Figure from Puigbo, Wolf, Koonin (2009) J. Biol


Tree representations
Figures from Yang (2006) Computational Molecular Evolution, OUP

[Figure: the same five-taxon tree (A to E) drawn as (a) a cladogram, (b) a phylogram, (c) an unrooted tree; scale bar 0.1]

(a): ((((A,B),C),D),E)
(b): ((((A:0.1,B:0.2):0.12,C:0.3):0.123,D:0.4):0.1234,E:0.5)
(c): (((A:0.1,B:0.2):0.12,C:0.3):0.123,D:0.4,E:0.6234)
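
The parenthetical strings above can be read mechanically. The following is a minimal sketch of a recursive Newick reader, written for this illustration only: it assumes plain labels (no quotes, comments or bootstrap values) and does no error handling, and the helper names `parse_newick`, `tips` and `tree_length` are hypothetical.

```python
import re

def parse_newick(text):
    """Parse a simple Newick string into nested dicts with
    keys 'name', 'length' (float or None) and 'children'."""
    text = re.sub(r"\s+", "", text)                  # tolerate "A: 0.1"
    tokens = re.findall(r"[(),;]|[^(),;]+", text)
    punctuation = {"(", ")", ",", ";"}
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                                 # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1                             # consume ","
                children.append(node())
            pos += 1                                 # consume ")"
        label = ""
        if pos < len(tokens) and tokens[pos] not in punctuation:
            label = tokens[pos]                      # "name", "name:len" or ":len"
            pos += 1
        name, _, length = label.partition(":")
        return {"name": name,
                "length": float(length) if length else None,
                "children": children}

    return node()

def tips(node):
    """Tip names in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [t for child in node["children"] for t in tips(child)]

def tree_length(node):
    """Sum of all branch lengths in the tree."""
    return (node["length"] or 0.0) + sum(tree_length(c) for c in node["children"])

phylogram = parse_newick("((((A: 0.1,B:0.2):0.12,C:0.3):0.123,D:0.4):0.1234,E:0.5)")
unrooted = parse_newick("(((A: 0.1,B:0.2):0.12,C:0.3):0.123,D:0.4,E:0.6234)")
print(tips(phylogram))
print(tree_length(phylogram), tree_length(unrooted))
```

Both trees have the same total length: removing the root of (b) merges its two root branches (0.1234 and 0.5) into the single branch of length 0.6234 leading to E in (c).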

Visualization software:
TreeViewX, Forester ATV, FigTree, ITOL (itol.embl.de), Dendroscope
Binary vs. multifurcating trees

Figures from Yang (2006) Computational Molecular Evolution, OUP

[Figure: star tree, partially resolved tree, fully resolved tree; the multifurcating nodes are labelled "polytomy"]

hard polytomy: simultaneous speciation events

soft polytomy (in estimated trees): lack of resolution in the data
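
The three categories in the figure can be told apart by counting children at each internal node. The sketch below does this for trees written as nested Python tuples of tip names; that representation and the function name `resolution` are assumptions made for this example. It treats the tree as rooted, so it would report an unrooted binary tree written with a basal trifurcation as partially resolved.

```python
def resolution(tree):
    """Classify a rooted tree given as nested tuples of tip names."""
    def internal_nodes(node):
        if isinstance(node, str):          # a tip
            return []
        return [node] + [n for child in node for n in internal_nodes(child)]

    nodes = internal_nodes(tree)
    if len(nodes) == 1:
        return "star tree"                 # every tip attaches to the root
    if any(len(node) > 2 for node in nodes):
        return "partially resolved"        # at least one polytomy remains
    return "fully resolved"                # strictly bifurcating

print(resolution(("A", "B", "C", "D", "E")))
print(resolution((("A", "B", "C"), ("D", "E"))))
print(resolution((((("A", "B"), "C"), "D"), "E")))
```

The code cannot say whether a polytomy is hard or soft: that distinction concerns the cause of the multifurcation, not the shape of the tree.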
Rooted trees

Which species is taxon A most closely related to?

Most methods
cannot infer the root:
they do not dis.nguish
the direc.on of change
(parsimony or model-based
approaches with reversibility)
figure by Caro-Beth Stewart
Rearrangements that leave tree intact

figure by Caro-Beth Stewart


Tree representations: exercise

Write down this tree as a NEWICK string


Tree representations: exercise

Correct answer: ((G,E),((C,((A,K,B),F)),((D,H),M)));
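
Because rotating the branches around a node leaves a tree intact, the answer is not unique: any string that lists the same groupings in a different order is equally correct. A minimal sketch of such a check follows. It assumes topology-only Newick strings with simple labels, uses a quoting shortcut to turn them into nested tuples, and compares the trees as rooted; the function names are hypothetical.

```python
import ast
import re

def to_tuples(newick):
    """Turn a topology-only Newick string into nested tuples of names."""
    body = newick.strip().rstrip(";")
    quoted = re.sub(r"[^(),\s]+", lambda m: repr(m.group()), body)
    return ast.literal_eval(quoted)

def canonical(tree):
    """Sort the children of every node so that child order no longer matters."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))

def same_rooted_topology(newick_a, newick_b):
    return canonical(to_tuples(newick_a)) == canonical(to_tuples(newick_b))

answer = "((G,E),((C,((A,K,B),F)),((D,H),M)));"
rotated = "(((M,(H,D)),(((B,A,K),F),C)),(E,G));"
regrouped = "((G,E),((C,((A,K),B,F)),((D,H),M)));"

print(same_rooted_topology(answer, rotated))
print(same_rooted_topology(answer, regrouped))
```

The second comparison fails because moving B out of the (A,K,B) group changes which taxa share a node, which is a real change of topology rather than a rotation.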


Spot the difference

[Figure: three drawings, A, B and C, of a primate phylogeny that also includes cow and pig; scale bar 0.1. Tip order from top to bottom:]
A: gorilla, human, common chimp, pygmy chimp, Sumatran orangutan, Bornean orangutan, gibbon, baboon, cow, pig
B: human, common chimp, pygmy chimp, gorilla, Sumatran orangutan, Bornean orangutan, gibbon, baboon, cow, pig
C: cow, pig, baboon, gibbon, Sumatran orangutan, Bornean orangutan, gorilla, human, common chimp, pygmy chimp
figure by Ziheng Yang
Molecular clock rooting

Under the clock, every tip is equidistant from the root

If the clock holds, the root can be inferred from sequence data

[Figure: two-taxon trees with root O and tips A and B. (a) no clock: branch lengths b1 and b2; (b) clock: both branches have length b]
figure by Ziheng Yang

When the clock does not hold, one needs a nonreversible model
to estimate b1 and b2 separately
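
The statement that every tip is equidistant from the root under the clock can be checked directly on a rooted tree with branch lengths. The sketch below stores a tree as a map from each node to its parent and branch length; this representation, the internal node names and the tolerance are assumptions made for the example. The first tree uses the branch lengths of the phylogram shown earlier under "Tree representations", the second is the two-taxon clock tree with an arbitrary value for b.

```python
def root_to_tip_distances(parent):
    """parent maps every non-root node to (its parent, branch length)."""
    internal = {up for up, _ in parent.values()}
    distances = {}
    for tip in parent:
        if tip in internal:
            continue                       # only tips are reported
        node, total = tip, 0.0
        while node in parent:              # walk up until the root is reached
            up, length = parent[node]
            total += length
            node = up
        distances[tip] = total
    return distances

def is_clocklike(parent, tol=1e-9):
    """True if all tips are equidistant from the root (ultrametric tree)."""
    distances = root_to_tip_distances(parent).values()
    return max(distances) - min(distances) <= tol

# ((((A:0.1,B:0.2):0.12,C:0.3):0.123,D:0.4):0.1234,E:0.5)
phylogram = {
    "A": ("n1", 0.1), "B": ("n1", 0.2), "n1": ("n2", 0.12),
    "C": ("n2", 0.3), "n2": ("n3", 0.123),
    "D": ("n3", 0.4), "n3": ("O", 0.1234), "E": ("O", 0.5),
}
# Two tips at the same distance b from the root O (b = 0.3 is arbitrary).
clock_tree = {"A": ("O", 0.3), "B": ("O", 0.3)}

print(root_to_tip_distances(phylogram))
print(is_clocklike(phylogram), is_clocklike(clock_tree))
```

With estimated branch lengths the distances are never exactly equal, so in practice the clock is assessed with a statistical test rather than a fixed tolerance.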
Outgroup rooting
The root is placed on the branch leading to the outgroup

Root of the mammalian tree

figure by Ziheng Yang
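
Placing the root on the outgroup branch is a simple graph operation. The sketch below takes an unrooted tree as a list of edges and writes it as a rooted Newick string; the edge-list representation, the internal node names x, y and z and the function name are assumptions made for this example. Outgroup rooting identifies the branch but not the position of the root along it, so the branch is split at its midpoint as an arbitrary choice.

```python
from collections import defaultdict

def root_on_outgroup(edges, outgroup):
    """edges: (node, node, length) triples of an unrooted tree.
    Returns a Newick string rooted on the branch leading to `outgroup`."""
    adjacent = defaultdict(dict)
    for u, v, length in edges:
        adjacent[u][v] = length
        adjacent[v][u] = length

    def subtree(node, came_from):
        # Describe everything reachable from `node` without going back.
        parts = [f"{subtree(nxt, node)}:{length:g}"
                 for nxt, length in adjacent[node].items() if nxt != came_from]
        return f"({','.join(parts)})" if parts else node

    # A tip has exactly one neighbour: the other end of the outgroup branch.
    neighbour, length = next(iter(adjacent[outgroup].items()))
    half = length / 2                      # arbitrary: root at the midpoint
    return f"({subtree(neighbour, outgroup)}:{half:g},{outgroup}:{half:g});"

# The unrooted tree (((A:0.1,B:0.2):0.12,C:0.3):0.123,D:0.4,E:0.6234)
edges = [
    ("A", "x", 0.1), ("B", "x", 0.2), ("x", "y", 0.12),
    ("C", "y", 0.3), ("y", "z", 0.123),
    ("D", "z", 0.4), ("E", "z", 0.6234),
]
rooted = root_on_outgroup(edges, "E")
print(rooted)
```

Using E as the outgroup gives the grouping ((((A,B),C),D),E), the same topology as the rooted example trees shown earlier; choosing a different outgroup from the same edge list yields a different rooted tree.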


Rooting using gene duplications

Root of the universal tree of life

Gogarten, J. P., et al. 1989. P.N.A.S


Iwabe, et al. 1989. P.N.A.S
Applications of phylogenies

Reconstruct molecular history
Study ancient proteins (ancestral reconstruction)
Molecular dating of speciation events
Study change of gene function
Find molecular changes that cause disease
Study host-pathogen dynamics
Choose model organism for drug design
Distribution and cohabitation in metagenomics
Diversity of birds (9993 species)

Jetz et al. 2012 (Nature)


Morgan, Xochitl et al. 2013 (Trends in Genetics)
Criteria for a publishable phylogenomic study

Strong biological motivation
Justification for methods choice
Use alternative methodologies
Account for uncertainty and data filtering
Reproducibility and data/code sharing
Reviews of the state of the art
From genome assembly and gene prediction

New experiments generate data to test new hypotheses

to population genomics, omics and aspects of data sharing and representation
Viewing phylogenetic trees
https://en.wikipedia.org/wiki/List_of_phylogenetic_tree_visualization_software

FigTree
TreeView
Archaeopteryx
Dendroscope
DensiTree
ETE toolkit (Python)
iTOL (online)
ggtree (R/Bioconductor)
Bio::Phylo (Perl)
Phylo.io

Exercise: View Tree with FigTree

1. Open tree file: VirusTree.tree


2. Adjust layout, fonts, branch thickness, etc.
3. Display branch lengths and branch supports
4. Color tips/branches by host (or highlight clades)
5. Re-root using whale virus as outgroup
View and compare phylogenetic trees
on the web - http://phylo.io

COMPARE mode
VIEW mode
Basic features - http://phylo.io
