
Phylogenetics and phylogeography
10 March 2020
Reading: Rowe, Sweet & Beebee,
Chapter 9
Phylogenetics + geography = phylogeography
Phylogeography
Understanding the processes that shaped the current
geographical distribution of alleles and populations

http://www.insectsingers.com
Tree of life

Delsuc et al. (2005) Nature Rev. Genet. 6: 361-375


Phylogenetics
Reconstruction of evolutionary history by tracing phylogenetic relationships of species, or
other units of interest, e.g. populations

Phylogenetic trees
https://www.cs.us.es/~fran/students/julian/phylogenetics/phylogenetics.html
Phylogenetics

Reconstruction of evolutionary history by tracing phylogenetic relationships

… based on:

o morphology

o or DNA
Phenotypical traits
Homology

Homology is similarity that results from inheritance from a common
ancestor

Identification of homologies is
essential to phylogenetics

https://en.wikipedia.org/wiki/Homology_(biology)
Homology
Homology is similarity that results from inheritance from a common
ancestor

Homologous genes

http://evolution.berkeley.edu/evolibrary/article/1_0_0/eyes_10
Molecular evolution and phylogenetics

1. Molecular evolution
studying the rates and patterns of changes in the DNA and its
products (RNA or protein) during evolutionary time

2. Molecular phylogenetics
reconstruction of the evolutionary history of organisms as inferred
from molecular data and the methodology of tree construction
Molecular evolution
Kinds of molecular evolution

Single base substitutions

Insertions/deletions/transpositions/inversions/duplications

Gene structure
Molecular evolution
Single base mutations and substitutions
…what is the difference?

Mutations: random changes in nucleotide sequences that occur due to
mistakes in replication or repair processes

Substitutions: (single nucleotide) mutations that have passed through
the filter of selection on at least some level

Note: germ line vs somatic mutations


Molecular evolution
Patterns of substitution within genes

The relative frequency of mutations, and what each kind is subject to:

1. Deleterious (disadvantageous): negative selection

2. Beneficial (advantageous): positive selection

3. Neutral (little or no effect on the fitness of an organism): genetic drift


Molecular evolution
Patterns of substitution within genes

Functional constraints
Changes to genes that are disadvantageous are removed by
negative natural selection

It follows that the portions of genes that are most functionally
important will accumulate changes more slowly during
evolutionary time
Molecular evolution

5’ UTR 3’ UTR

Evolutionary rates:
Introns and flanking sequences > regions transcribed but not
translated > coding sequences
Molecular evolution
Different types of genomic DNA will evolve differently due to differing
functional constraints

Coding DNA

Non coding DNA


Repeat DNA
Non functional DNA
Transposons
Pseudogenes (~ 4 × 10⁻⁹ substitutions per site per year)

Regulatory DNA
Molecular evolution
Coding sequences

Triplet codon positions can be put in three categories:

1. Non degenerate

2. Twofold degenerate

3. Fourfold degenerate
Molecular evolution
Coding sequences

Triplet codon positions can be put in three categories:

1. Non degenerate

Codon position where any mutation will result in an amino acid
change. E.g. the first position of UUU (phenylalanine): CUU leucine,
AUU isoleucine, GUU valine
Molecular evolution
Coding sequences

Triplet codon positions can be put in three categories:

2. Twofold degenerate

Codon position where one of the three possible substitutions codes
for the same amino acid, while the other two result in an amino acid
change.
E.g. the third position of GAU: GAU and GAC both code for aspartic acid,
whereas GAA and GAG code for glutamic acid.
Molecular evolution
Coding sequences

Triplet codon positions can be put in three categories:

3. Fourfold degenerate

Codon position where changing the nucleotide to any of the three
alternatives will have no effect on the amino acid.
E.g. third position of glycine: GGU, GGC, GGA, GGG, glycine.
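
The three categories can be checked mechanically against the standard genetic code. The following is a minimal Python sketch; the function name `degeneracy` and the compact table layout are illustrative assumptions and not part of the lecture.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

LABELS = {0: "non-degenerate", 1: "twofold", 2: "threefold", 3: "fourfold"}


def degeneracy(codon: str, position: int) -> str:
    """Classify one codon position (0, 1 or 2) by how many of the three
    alternative bases leave the encoded amino acid unchanged."""
    codon = codon.upper().replace("U", "T")  # accept RNA codons such as UUU
    original = CODON_TABLE[codon]
    synonymous = 0
    for base in BASES:
        if base == codon[position]:
            continue
        mutant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[mutant] == original:
            synonymous += 1
    return LABELS[synonymous]


print(degeneracy("UUU", 0))  # first position of phenylalanine
print(degeneracy("GAU", 2))  # third position of aspartic acid
print(degeneracy("GGU", 2))  # third position of glycine
```

The sketch also reports a "threefold" class, which arises only at the third position of the isoleucine codons (AUU, AUC, AUA) and is not among the three categories listed on the slide.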
Molecular evolution
Genetic or evolutionary distance

The observed number of (single nucleotide) substitutions between any
two sequences in an alignment is typically the most important
variable in any molecular evolution analysis
Molecular clocks
K: number of nucleotide substitutions two sequences have experienced
since they shared a last common ancestor

T: divergence time

r: rate of nucleotide substitution

r = K / (2T)

http://www.evolution.berkeley.edu/evosite/evo101/IIE1cMolecularclocks.shtml
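
The clock relation r = K / (2T) can be applied in either direction. The sketch below is illustrative only: the function names are assumptions, and the numbers in the example are hypothetical rather than taken from any data set.

```python
def substitution_rate(k: float, t_years: float) -> float:
    """r = K / (2T): K substitutions per site separate two sequences whose
    lineages have each evolved for T years since the common ancestor."""
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2 * t_years)


def divergence_time(k: float, rate: float) -> float:
    """The same relation rearranged: T = K / (2r)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2 * rate)


# Hypothetical calibration: K = 0.10 substitutions per site, T = 25 million years.
r = substitution_rate(k=0.10, t_years=25e6)
print(r)

# Using that rate to date another (hypothetical) pair with K = 0.04.
print(divergence_time(k=0.04, rate=r))
```

The factor 2 is there because substitutions accumulate independently along both lineages after the split.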
Molecular clocks

http://www.bio.miami.edu/
Molecular clocks
o Functional constraints
make molecular clocks
tick at different rates
for different proteins

o The rate differs between different organismal lineages

o The rate can speed up or slow down over evolutionary time for
any given gene

http://sandwalk.blogspot.no/2012/01/modern-molecular-clock.html
Molecular evolution
Models of molecular evolution

It is not only about the number of substitutions, but also the nature
or pattern of the substitutions. This leads us to looking at
models of molecular evolution

http://www.bio.miami.edu/
Molecular evolution
Uncorrected model of molecular evolution
p distance
The proportion (p) of nucleotide sites at which the two sequences
being compared are different. It is obtained by dividing the number
of nucleotide differences by the total number of nucleotides compared.

Seq 1: TAGTTGCAAC
Seq 2: TAGTCGCAGC

p = 2/10 = 0.2
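
The p distance for the two sequences on this slide can be computed in a few lines. This is a minimal sketch; the function name `p_distance` is an assumption, and the sequences are assumed to be aligned and free of gaps.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of sites at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


print(p_distance("TAGTTGCAAC", "TAGTCGCAGC"))  # 2 differences in 10 sites
```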
Molecular evolution
Corrected models of molecular evolution

These types of distance measures take into consideration at least
one of the following:
• multiple substitutions at the same site
• substitution rate biases (for instance between transitions
and transversions)
• base frequencies
• differences in substitution rate between sites
Molecular evolution
Corrected models of molecular evolution

Ancestral sequence
TGCCAGCTTAGCCA

Sp 1 TGCTAGCTTAACCA Sp 2 TGTCAACTTAACCT
Molecular evolution
Corrected models of molecular evolution

Ancestral sequence
TGCCAGCTTAGCCA

Sp 1 TGCTAGCTTAACCA    Sp 2 TGTCAACTTAGTCT
Differences: Sp 1: 2, Sp 2: 4

However, multiple substitutions can take place at the same site


Molecular evolution
Corrected models of molecular evolution
Multiple substitutions at the same site
Ancestral sequence
TGCCAGCTTAGCCA

Sp 1 TGCTAGCTTAACCA    Sp 2 TGTCAACTTAACCT
Differences:   Sp 1: 2, Sp 2: 4
Substitutions: Sp 1: 4, Sp 2: 4

Number of substitutions at a single site in Sp 1: G → A → C → A
(three substitutions, observed as one difference)
Molecular evolution
Corrected models of molecular evolution
Jukes-Cantor Model
$$ d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right) $$

Assumptions:
- equal base frequencies
- all substitutions are equally likely
(happen at rate α)
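
The Jukes-Cantor correction can be applied directly to an observed p distance. A minimal sketch follows; the function name is an assumption, and the input p = 0.2 is the value from the p distance example earlier in the lecture.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """JC69 distance d = -(3/4) * ln(1 - (4/3) * p)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


d = jukes_cantor_distance(0.2)
print(round(d, 3))  # somewhat larger than the uncorrected p = 0.2
```

The corrected distance exceeds p because some sites are assumed to have changed more than once. The correction is undefined for p of 0.75 or more, which is the divergence expected between random sequences under this model.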
Molecular evolution
Corrected models of molecular evolution

Transitions: purine ↔ purine (A ↔ G)
             pyrimidine ↔ pyrimidine (C ↔ T)

Transversions: purine ↔ pyrimidine


Molecular evolution

Empirically:
Transitions are more frequent
than transversions

http://openi.nlm.nih.gov/
Molecular evolution
Corrected models of molecular evolution
Kimura’s two-parameter model (K2P)
$$ d = -\frac{1}{2}\ln(1 - 2p - q) - \frac{1}{4}\ln(1 - 2q) $$

p: proportion of sites that show transitional differences
q: proportion of sites that show transversional differences

Assumptions:
- equal base frequencies
- transitions and transversions happen
at different rates (α and β respectively)
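
A K2P distance needs the differences split into transitions and transversions first. The sketch below does both steps; the function name is an assumption, and the sequences are assumed to be aligned and to contain only A, C, G and T (no gaps or ambiguity codes).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        # A<->G and C<->T stay within one base class: transitions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / len(seq1)    # proportion of transitional differences
    q = transversions / len(seq1)  # proportion of transversional differences
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Both differences between these sequences (T/C and A/G) are transitions.
print(round(k2p_distance("TAGTTGCAAC", "TAGTCGCAGC"), 3))
```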
Molecular evolution

A multitude of models of molecular evolution

Jukes-Cantor 1969 (JC69)
Kimura’s two-parameter 1980 (K2P)
Felsenstein 1981 (F81)
Hasegawa, Kishino and Yano 1985 (HKY85)
Tamura 1992 (T92)
Tamura and Nei 1993 (TN93)
Generalised time-reversible, Tavaré 1986 (GTR)
Molecular evolution
A multitude of models of molecular evolution
e.g. in MEGA 7 (or MEGA X)

How to choose a model?

Choose the simplest model that fits your data!
There are programs available for model choice (e.g. jModelTest)
Tracing the evolutionary history of mammals

Artiodactyla
Tracing the evolutionary history of mammals

Artiodactyla

Cetartiodactyla
Tracing the evolutionary history of mammals

Artiodactyla

Cetartiodactyla
Tracing the evolutionary history of mammals
Convergent evolution

https://online.science.psu.edu/biol011_sandbox_7239/node/7328
Molecular phylogenetics
Using DNA or protein sequences

Are we tracing the evolutionary history of:

o organisms

o or genes…
Molecular phylogenetics

Molecular phylogeny of the collembolan Acanthanura sp.
based on mt COI sequences

Emerson et al. 2011


Phylogenetic tree

A graphical representation of the evolutionary relationships among
three or more genes or organisms
Phylogenetics
Cladograms
Phylogenetics
Cladograms vs phylograms

cladogram phylogram

Distance scale
http://carrot.mcb.uconn.edu/~olgazh/bioinf2010/class16.html
https://www.cs.us.es/~fran/students/julian/phylogenetics/phylogenetics.html
Phylogenetics
Rooted and unrooted trees
Species trees vs gene trees

Problems associated with molecular phylogenetics due to

o Hybridization

o Lineage sorting

https://nothinginbiology.org
Homology

Orthologs and paralogs

http://beacon-center.org/
Phylogenetics

Autapomorphy:
Derived, but not shared
(unique to a taxon)
Phylogenetics

Plesiomorphy:
Shared ancestral trait,
but not derived
Phylogenetics

Synapomorphy:
Shared and derived
Phylogenetics

Homoplasy:
independently derived
(convergence, secondary loss,
reversion)
Sequence alignments

Cod   GCTATCGTAGCTTAATTAAAGTTTAATACTGAAGATATTAGGATGGACCCTAGAAAGTCCCGAAAGCA
Trout GTTGGCGTAGCTTAACTAAAGCATAACACTGAAGCTGTTAAGATGGACCCTAGAAAGTCCCGCGAGCA
Auk   GTCTTCGTAGCTTACCGATTAAAGCATGGCACTGAAGATGCCAAGATGGCTGCCATTCATGCACCCGAAGACA
Crow  GTCCATGTAGCTTACAACAAAGCATGACACTGAAGATGTCAAGACGGCTGCCACAAACACCCCATGGACA
Chimp GTTTATGTAGCTTACCCCCTCAAAGCAATACACTGAAAATGTTTCGACGGGTTTACATCACCCCATAAACA
Human GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTCACATCACCCCATAAACA
      *     ********                                        *            *

Mitochondrial tRNA-Phe genes in two fishes, two birds, chimpanzee and human.
Sequence alignments

Cod   GCTATCGTAGCTTA----ATTAAAGTTTAATACTGAAGATATTAGGATGGACCCTAGAAAGT--CCCGAAAGCA
Trout GTTGGCGTAGCTTA----ACTAAAGCATAACACTGAAGCTGTTAAGATGGACCCTAGAAAGT--CCCGCGAGCA
Auk   GTCTTCGTAGCTTACCG-ATTAAAGCATGGCACTGAAGATGCCAAGATGGCTGCCATTCATGCACCCGAAGACA
Crow  GTCCATGTAGCTTAC---AACAAAGCATGACACTGAAGATGTCAAGACGGCTGCCACAAACAC-CCCATGGACA
Chimp GTTTATGTAGCTTACCCCCTCAAAGCAATACACTGAAAATGTTTCGACGGGTTTACATCAC---CCCATAAACA
Human GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTCACATCAC---CCCATAAACA
      *     ********       ****      ******  *     ** **         *    ***     **

Nucleotide sequence alignment, produced by ClustalX, of mitochondrial
tRNA-Phe genes in two fishes, two birds, chimpanzee and human.
Sequence alignments

Cod   GCTATCGTAGCTTA----ATTAAAGTTTAATACTGAAGATATTAGGATGGACCCTAGAAAGT--CCCGAAAGCA
Trout .T.GG.........----.C.....CA...C.......C.G...A.................--....CG....
Auk   .TCT..........CCG-.......CA.GGC.........GCC.A.....CTG.C.TTC.TGCA......GA..
Crow  .TCCAT........C---.AC....CA.G.C.........G.C.A..C..CTG.C.C...CAC-...ATGGA..
Chimp .T.TAT........CCCCC.C....CAAT.C......A..G..TC..C..GTTTACATC.C---...AT..A..
Human .T.TAT........CCTCC.C....CAAT.C......A..G..TA..C..G.T.ACATC.C---...AT..A..
      *     ********       ****      ******  *     ** **         *    ***     **

Nucleotide sequence alignment, produced by ClustalX, of mitochondrial
tRNA-Phe genes in two fishes, two birds, chimpanzee and human.
Dots mark bases identical to the cod sequence; asterisks mark columns
that are identical in all six sequences.
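
Both displays of this alignment can be derived from the aligned sequences themselves. The sketch below is illustrative: the function names are assumptions, and only the alignment strings are taken from the slide. It marks fully conserved columns with asterisks and replaces bases identical to the cod reference with dots.

```python
ALIGNMENT = {
    "Cod":   "GCTATCGTAGCTTA----ATTAAAGTTTAATACTGAAGATATTAGGATGGACCCTAGAAAGT--CCCGAAAGCA",
    "Trout": "GTTGGCGTAGCTTA----ACTAAAGCATAACACTGAAGCTGTTAAGATGGACCCTAGAAAGT--CCCGCGAGCA",
    "Auk":   "GTCTTCGTAGCTTACCG-ATTAAAGCATGGCACTGAAGATGCCAAGATGGCTGCCATTCATGCACCCGAAGACA",
    "Crow":  "GTCCATGTAGCTTAC---AACAAAGCATGACACTGAAGATGTCAAGACGGCTGCCACAAACAC-CCCATGGACA",
    "Chimp": "GTTTATGTAGCTTACCCCCTCAAAGCAATACACTGAAAATGTTTCGACGGGTTTACATCAC---CCCATAAACA",
    "Human": "GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTCACATCAC---CCCATAAACA",
}


def conservation_line(sequences):
    """'*' under every column in which all sequences carry the same base."""
    return "".join(
        "*" if len(set(column)) == 1 and column[0] != "-" else " "
        for column in zip(*sequences)
    )


def dot_line(reference, sequence):
    """Replace bases identical to the reference with '.', keep gaps as '-'."""
    return "".join(
        "." if base == ref and base != "-" else base
        for ref, base in zip(reference, sequence)
    )


names = list(ALIGNMENT)
reference = ALIGNMENT[names[0]]
width = max(len(name) for name in names) + 1

print(names[0].ljust(width) + reference)
for name in names[1:]:
    print(name.ljust(width) + dot_line(reference, ALIGNMENT[name]))
print(" " * width + conservation_line(list(ALIGNMENT.values())))
```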
Sequence alignments

Sequence similarity

Human GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTCACATCACCCCATAAACA
Chimp ****************C***************************C******T*T*****************

Human GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGG-GCTCACATCACCCCATAAACA
Trout ***GGC********----ACT******TA********GC****A***T**AC*CT*G*AAGT***GCG*G**

Human/chimp: 94.4%

Human/trout: 63.2%
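
The human/chimp figure can be reproduced from the two sequences shown. This is a minimal sketch with an assumed function name. How gap columns enter the denominator is a matter of convention; this sketch simply skips them, so results for gapped alignments may differ from values computed with another convention.

```python
def percent_identity(aligned1: str, aligned2: str) -> float:
    """Percentage of identical bases among columns without a gap."""
    if len(aligned1) != len(aligned2):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(aligned1, aligned2):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no columns to compare")
    return 100 * identical / compared


human = "GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTCACATCACCCCATAAACA"
chimp = "GTTTATGTAGCTTACCCCCTCAAAGCAATACACTGAAAATGTTTCGACGGGTTTACATCACCCCATAAACA"
print(round(percent_identity(human, chimp), 1))  # 67 of 71 sites identical
```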
Sequence alignments

Sequence similarity vs. homology


DNA

Homo sapiens
Pan troglodytes
Corvus frugilegus
Alca torda
Salmo trutta
Gadus morhua
Sequence alignments

Homology
Orthology
Paralogy
DNA

Homo sapiens
Pan troglodytes
Corvus frugilegus
Alca torda
Salmo trutta
Gadus morhua
Sequence alignments

Sequence identity and sequence homology

Two protein sequences, 100 aa length

“Safe zone” > 30%

“Twilight zone” 20 – 30%

“Midnight zone” < 20%

protein
Sequence alignments

Sequence comparisons

Evolutionary inference

Structural and functional inferences


Sequence alignments and homologous positions
Cod   GCTATCGTAGCTTA----ATTAAAGTTTAATACTGAAGATATTAGGATGGACCCTAGAAAGT--CCCGAAAGCA
Trout GTTGGCGTAGCTTA----ACTAAAGCATAACACTGAAGCTGTTAAGATGGACCCTAGAAAGT--CCCGCGAGCA
Auk   GTCTTCGTAGCTTACCG-ATTAAAGCATGGCACTGAAGATGCCAAGATGGCTGCCATTCATGCACCCGAAGACA
Crow  GTCCATGTAGCTTAC---AACAAAGCATGACACTGAAGATGTCAAGACGGCTGCCACAAACAC-CCCATGGACA
Chimp GTTTATGTAGCTTACCCCCTCAAAGCAATACACTGAAAATGTTTCGACGGGTTTACATCAC---CCCATAAACA
Human GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTCACATCAC---CCCATAAACA
      *     ********       ****      ******  *     ** **         *    ***     **

Use secondary structure to guide alignments!

Ribosomal DNA

Protein coding genes


Tree building methods
Optimality criterion
Using the multiple alignment directly by comparing
characters within each column (each site) in the alignment
Maximum Parsimony (MP) looks for the tree with the minimum number of
changes

Maximum Likelihood (ML) looks for the tree that, under some model of
evolution, maximizes the likelihood of observing the data

Bayesian method (MB) is closely related to ML; it seeks the trees with
the greatest posterior probability given the data, the model of evolution
and the prior distributions
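
For maximum parsimony, the number of changes a given tree requires at one alignment column can be counted with Fitch's algorithm. The sketch below is illustrative only: the function name and both tree topologies are assumptions made for the example, while the column is site 38 of the tRNA-Phe alignment shown earlier in the lecture.

```python
def fitch_score(tree, states):
    """Minimum number of changes one column requires on a rooted binary tree.

    tree:   nested 2-tuples whose leaves are taxon names
    states: dict mapping each taxon name to its base in this column
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):  # leaf: its observed base
            return {states[node]}
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:
            return common
        changes += 1  # no shared state: one change is needed at this node
        return left | right

    state_set(tree)
    return changes


# Column 38 of the aligned tRNA-Phe sequences.
column = {"Cod": "G", "Trout": "G", "Auk": "G", "Crow": "G", "Chimp": "A", "Human": "A"}

# Assumed topologies, for illustration only.
grouped = (("Cod", "Trout"), (("Auk", "Crow"), ("Chimp", "Human")))
shuffled = (("Cod", "Chimp"), (("Trout", "Human"), ("Auk", "Crow")))

print(fitch_score(grouped, column))
print(fitch_score(shuffled, column))
```

A full parsimony analysis sums this score over all columns and searches for the topology with the smallest total; the search itself is not shown here.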
Tree building methods
Distance methods
UPGMA (Unweighted Pair-Group Method with Arithmetic Mean). Clustering
method. First finds the pair of taxa with the smallest distance between
them, then defines the branching between them by placing a node at the
midpoint of the branch

Minimum Evolution (ME): the sum (S) of all branch length estimates is
computed for all possible topologies, and the topology that has the smallest
S value is chosen as the best tree

Neighbour Joining (NJ) is based on the ME principle, but does not examine
all possible topologies
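
The UPGMA procedure described on this slide can be written compactly. This is a minimal sketch, not a production implementation: the function name, the input format and the distances in the example are all assumptions made for illustration.

```python
from itertools import combinations


def upgma(distances):
    """Cluster taxa by UPGMA.

    distances: dict {(a, b): d} with one entry per unordered pair of taxa.
    Returns nested tuples (left, right, node_height).
    """
    d = {}
    for (a, b), value in distances.items():
        d[a, b] = d[b, a] = value  # symmetric lookup
    active = sorted({name for pair in distances for name in pair})
    size = {name: 1 for name in active}  # number of leaves per cluster

    while len(active) > 1:
        # Find the pair of clusters with the smallest distance.
        a, b = min(combinations(active, 2), key=lambda pair: d[pair])
        merged = (a, b, d[a, b] / 2)  # node at the midpoint of the branch
        for other in active:
            if other not in (a, b):
                # Size-weighted mean = arithmetic mean over all leaf pairs.
                mean = (size[a] * d[a, other] + size[b] * d[b, other]) / (size[a] + size[b])
                d[merged, other] = d[other, merged] = mean
        size[merged] = size[a] + size[b]
        active = [c for c in active if c not in (a, b)] + [merged]
    return active[0]


# Hypothetical distance matrix for four taxa.
example = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
print(upgma(example))
```

Because every node is placed at half the distance between the clusters it joins, UPGMA implicitly assumes a constant rate of change along all branches.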
Tree building methods
Comparison of methods

Neither the distance- nor the character-based methods of phylogenetic
reconstruction make any guarantee that they yield the one true tree
that describes the evolutionary history of a set of aligned sequences

Results of simulations:
Data sets that allow one method to infer the correct
phylogenetic relationship generally work well with all currently
popular tree-building methods
Tree building methods
Comparison of methods
Neighbour Joining: fast – but some information is lost in compressing sequences
into distances

Parsimony: fast enough to run hundreds of sequences – but can perform poorly if
there is substantial variation in branch lengths

Maximum Likelihood: fully captures what the data tell us about the phylogeny
under a given model – but can be prohibitively slow

Bayesian: strong connection to the ML method and faster than ML – but the prior
distributions for parameters must be specified; it can be difficult to determine if
the Markov chain Monte Carlo (MCMC) approximation has run for long enough
Tree building methods
Comparison of methods

It is the number of sequences in the alignment that causes
computational problems, not their lengths
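
One way to see why the number of sequences matters is to count the candidate trees. For n taxa there are (2n - 5)!! distinct unrooted, strictly bifurcating topologies. The sketch below computes that number; the function name is an assumption.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating topologies: 3 * 5 * ... * (2n - 5)."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed")
    count = 1
    for factor in range(3, 2 * n_taxa - 4, 2):
        count *= factor
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
```

Adding a taxon multiplies the number of topologies, whereas adding alignment columns only lengthens the evaluation of each one.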
Tree building methods
Comparison of methods

There is no single best method!

General rule:
If a data set yields similar trees when analysed by fundamentally
different distance matrix, likelihood and parsimony methods, that
tree can be considered to be fairly reliable

Test significance: bootstrapping
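
The resampling step of bootstrapping draws alignment columns with replacement to build pseudo-alignments of the original length. The sketch below shows only that step; the function name, the fixed seed and the toy alignment are assumptions. Building a tree from each replicate and counting how often each group recurs is left out.

```python
import random


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Yield pseudo-alignments made by sampling columns with replacement.

    alignment: dict mapping taxon name to an aligned sequence (equal lengths).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    length = lengths.pop()
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    for _ in range(n_replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        yield {
            name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()
        }


# Toy alignment, for illustration only.
toy_alignment = {"a": "ACGTAC", "b": "AGGTAC", "c": "ACGAAT"}
for replicate in bootstrap_replicates(toy_alignment, n_replicates=3):
    print(replicate)
```

Each column is resampled as a unit across all taxa, so the homology of positions within a column is preserved in every replicate.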


Tree building methods
Guidelines

1. Choose a set of related sequences
2. Obtain a multiple sequence alignment
3. Is there strong sequence similarity?
   Yes: maximum parsimony methods
   No: go to step 4
4. Is there clearly recognizable sequence similarity?
   Yes: distance methods
   No: maximum likelihood methods
5. Analyse how well the data support the prediction
