and
phylogeography
10 March 2020
Reading: Rowe, Sweet & Beebee, Chapter 9
Phylogenetics + geography = phylogeography
Phylogeography
Understanding the processes that shaped the current
geographical distribution of alleles and populations
http://www.insectsingers.com
Tree of life
Phylogenetic trees
https://www.cs.us.es/~fran/students/julian/phylogenetics/phylogenetics.html
Phylogenetics
… based on:
o morphology
o or DNA
Phenotypical traits
Homology
Identification of homologies is
essential to phylogenetics
https://en.wikipedia.org/wiki/Homology_(biology)
Homology
Homology is similarity that results from inheritance from a common ancestor
(Figure: homologous genes)
http://evolution.berkeley.edu/evolibrary/article/1_0_0/eyes_10
Molecular evolution and phylogenetics
1. Molecular evolution
studying the rates and patterns of changes in the DNA and its
products (RNA or protein) during evolutionary time
2. Molecular phylogenetics
reconstruction of the evolutionary history of organisms as inferred
from molecular data and the methodology of tree construction
Molecular evolution
Kinds of molecular evolution
Insertions/deletions/transpositions/inversions/duplications
Gene structure
Molecular evolution
Single base mutations and substitutions
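…what is the difference? A mutation is a change that arises in the DNA of an individual; a substitution is a mutation that has spread through the population to fixation.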
Functional constraints
Changes to genes that are disadvantageous are removed by negative (purifying) natural selection
(Figure: 5' UTR, 3' UTR)
Evolutionary rates:
introns and flanking sequences > regions transcribed but not translated (UTRs) > coding sequences
Molecular evolution
Different types of genomic DNA will evolve differently due to differing
functional constraints
Coding DNA
Regulatory DNA
Molecular evolution
Coding sequences
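1. Nondegenerate: every possible nucleotide change at the site alters the amino acid
2. Twofold degenerate: one of the three possible changes is synonymous
3. Fourfold degenerate: all three possible changes are synonymous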
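A minimal Python sketch of how the three site classes can be identified, assuming the standard genetic code (NCBI translation table 1); the function name `degeneracy` and the `LABELS` mapping are illustrative. Each codon position is scored by how many of its three possible point changes keep the same amino acid (a change to a stop codon counts as nonsynonymous). The rare threefold case, such as the third position of isoleucine codons, falls outside the three classes above, so the sketch reports it separately.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of synonymous alternatives at a position -> site class
LABELS = {0: "nondegenerate", 1: "twofold", 2: "threefold", 3: "fourfold"}


def degeneracy(codon):
    """Classify each position of a codon by how many of its 3 possible
    point changes leave the encoded amino acid unchanged."""
    aa = CODE[codon]
    result = []
    for i, base in enumerate(codon):
        synonymous = sum(
            CODE[codon[:i] + alt + codon[i + 1:]] == aa
            for alt in BASES
            if alt != base
        )
        result.append(LABELS[synonymous])
    return result


print(degeneracy("GCT"))  # alanine: ['nondegenerate', 'nondegenerate', 'fourfold']
print(degeneracy("GAT"))  # aspartate: ['nondegenerate', 'nondegenerate', 'twofold']
```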
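$r = \frac{K}{2T}$
K: number of nucleotide substitutions per site between two sequences
T: divergence time
r: rate of nucleotide substitution (per site per unit time)
The factor 2 appears because substitutions accumulate independently in both lineages after they split.
http://www.evolution.berkeley.edu/evosite/evo101/IIE1cMolecularclocks.shtml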
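The clock equation can be used in both directions: estimating r from a dated split, or dating a split once r is known. A minimal sketch; the function names and the numbers in the example are hypothetical, chosen only for illustration.

```python
def substitution_rate(K, T):
    """r = K / (2T): K substitutions per site separate two lineages
    that split T time units ago; each lineage contributes half of K."""
    return K / (2 * T)


def divergence_time(K, r):
    """The same equation rearranged to date a split: T = K / (2r)."""
    return K / (2 * r)


# Hypothetical values, for illustration only
r = substitution_rate(K=0.1, T=5e6)   # about 1e-8 substitutions per site per year
T = divergence_time(K=0.02, r=1e-9)   # about 1e7 years
print(r, T)
```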
Molecular clocks
http://www.bio.miami.edu/
Molecular clocks
o Functional constraints make molecular clocks tick at different rates for different proteins
http://sandwalk.blogspot.no/2012/01/modern-molecular-clock.html
Molecular evolution
Models of molecular evolution
http://www.bio.miami.edu/
Molecular evolution
Uncorrected model of molecular evolution
p distance
the proportion (p) of nucleotide sites at which two sequences
being compared are different. It is simply obtained by dividing
the number of nucleotide differences by the total number of
nucleotides compared.
Ancestral sequence
TGCCAGCTTAGCCA
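A minimal sketch of the p distance in Python, assuming the two sequences are already aligned, have equal length and contain no gaps. It is applied here to the ancestral sequence and Sp 1 from the corrected-model slides.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned nucleotide sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


ancestor = "TGCCAGCTTAGCCA"
sp1 = "TGCTAGCTTAACCA"  # Sp 1 from the corrected-model slides
print(p_distance(ancestor, sp1))  # 2 differences / 14 sites, about 0.143
```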
Molecular evolution
Corrected models of molecular evolution
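Ancestral sequence
TGCCAGCTTAGCCA
Sp 1 TGCTAGCTTAACCA    Sp 2 TGTCAACTTAGTCT
Differences from the ancestor: Sp 1: 2, Sp 2: 4
Sp 1 TGCTAGCTTAACCA    Sp 2 TGTCAACTTAACCT
Differences from the ancestor: Sp 1: 2, Sp 2: 4
Substitutions: Sp 1: 4, Sp 2: 4
Substitutions at a single site in Sp 1: G → A → C → A
Multiple substitutions at the same site leave no trace in the observed differences, so the number of differences underestimates the number of substitutions; corrected models estimate these hidden changes.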
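The gap between differences and substitutions can be illustrated with a small simulation. This is a sketch under a Jukes-Cantor-like assumption (every site is equally likely to change, to any of the three other bases); the function name `evolve`, the seed and the numbers of substitutions are arbitrary. Because a site can be hit more than once, the observed number of differences never exceeds, and often falls below, the number of substitutions applied.

```python
import random

BASES = "ACGT"


def evolve(seq, n_substitutions, rng):
    """Apply substitutions one at a time: a random site changes to a random
    different base. Sites can be hit repeatedly, hiding earlier changes."""
    sites = list(seq)
    for _ in range(n_substitutions):
        i = rng.randrange(len(sites))
        sites[i] = rng.choice([b for b in BASES if b != sites[i]])
    return "".join(sites)


rng = random.Random(2020)  # arbitrary seed, for reproducibility
ancestor = "TGCCAGCTTAGCCA"
results = []
for n in (2, 4, 8, 16, 32):
    descendant = evolve(ancestor, n, rng)
    observed = sum(a != b for a, b in zip(ancestor, descendant))
    results.append((n, observed))
    print(f"{n:2d} substitutions -> {observed:2d} observed differences")
```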
Molecular evolution
Corrected models of molecular evolution
Jukes-Cantor Model
$d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$, where p is the p distance
Assumptions:
- equal base frequencies
- all substitutions are equally likely (happen at rate α)
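A minimal sketch of the Jukes-Cantor correction, applied to the Sp 1 comparison (2 differences in 14 sites). The corrected distance is never smaller than p, and it is undefined once p reaches 0.75, the expected proportion of differences between unrelated sequences under equal base frequencies.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance d = -3/4 ln(1 - 4p/3), computed from the p distance."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated, d is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


p = 2 / 14  # Sp 1 vs ancestral sequence
print(round(jukes_cantor(p), 4))  # 0.1585, larger than p (about 0.1429)
```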
Molecular evolution
Corrected models of molecular evolution
Empirically:
Transitions (A ↔ G, C ↔ T: purine to purine or pyrimidine to pyrimidine) are more frequent than transversions (purine ↔ pyrimidine)
http://openi.nlm.nih.gov/
Molecular evolution
Corrected models of molecular evolution
Kimura’s two-parameter model (K2P)
$d = -\frac{1}{2}\ln(1 - 2p - q) - \frac{1}{4}\ln(1 - 2q)$
p: proportion of sites that show transitional differences
q: proportion of sites that show transversional differences
Assumptions:
- equal base frequencies
- transitions and transversions happen at different rates (α and β respectively)
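A sketch of the K2P distance that first sorts each difference into a transition or a transversion, assuming aligned, gap-free, uppercase DNA sequences; the helper names are illustrative. It reuses the ancestral sequence, Sp 1 and the second listed version of Sp 2 (TGTCAACTTAACCT) from the corrected-model slides.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def is_transition(a, b):
    """A <-> G or C <-> T: both bases are purines or both are pyrimidines."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def kimura_2p(seq1, seq2):
    """K2P distance d = -1/2 ln(1 - 2p - q) - 1/4 ln(1 - 2q)."""
    n = len(seq1)
    transitions = sum(is_transition(a, b) for a, b in zip(seq1, seq2))
    transversions = sum(a != b for a, b in zip(seq1, seq2)) - transitions
    p, q = transitions / n, transversions / n
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("too many differences: K2P distance is undefined")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


ancestor = "TGCCAGCTTAGCCA"
print(round(kimura_2p(ancestor, "TGCTAGCTTAACCA"), 4))  # Sp 1, 2 transitions: 0.1682
print(round(kimura_2p(ancestor, "TGTCAACTTAACCT"), 4))  # Sp 2, 3 transitions + 1 transversion: 0.3851
```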
Molecular evolution
Artiodactyla
Tracing the evolutionary history of mammals
Artiodactyla
Cetartiodactyla
Tracing the evolutionary history of mammals
Convergent evolution
https://online.science.psu.edu/biol011_sandbox_7239/node/7328
Molecular phylogenetics
Using DNA or protein sequences to infer the evolutionary history of:
o organisms
o or genes
Molecular phylogenetics
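Molecular phylogeny of the collembolan Acanthanura sp. based on mitochondrial COI sequences
(Figure: cladogram and phylogram, with distance scale)
A cladogram shows only the branching order; in a phylogram, branch lengths are proportional to the amount of change.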
http://carrot.mcb.uconn.edu/~olgazh/bioinf2010/class16.html
https://www.cs.us.es/~fran/students/julian/phylogenetics/phylogenetics.html
Phylogenetics
Rooted and unrooted trees
Species trees vs gene trees
o Hybridization
o Lineage sorting
https://nothinginbiology.org
Homology
http://beacon-center.org/
Phylogenetics
Autapomorphy:
Derived, but not shared
(unique to a taxon)
Phylogenetics
Plesiomorphy:
Shared ancestral trait,
but not derived
Phylogenetics
Synapomorphy:
Shared and derived
Phylogenetics
Homoplasy:
independently derived
(convergence, secondary loss,
reversion)
Sequence alignments
Cod   GCTATCGTAGCTTAATTAAAGTTTAATACTGAAGATATTAGGATGGACCCTAGAAAGTCCCGAAAGCA
Trout GTTGGCGTAGCTTAACTAAAGCATAACACTGAAGCTGTTAAGATGGACCCTAGAAAGTCCCGCGAGCA
Auk   GTCTTCGTAGCTTACCGATTAAAGCATGGCACTGAAGATGCCAAGATGGCTGCCATTCATGCACCCGAAGACA
Crow  GTCCATGTAGCTTACAACAAAGCATGACACTGAAGATGTCAAGACGGCTGCCACAAACACCCCATGGACA
Chimp GTTTATGTAGCTTACCCCCTCAAAGCAATACACTGAAAATGTTTCGACGGGTTTACATCACCCCATAAACA
Human GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTCACATCACCCCATAAACA
      *     ********                                        *            *
Cod   GCTATCGTAGCTTA----ATTAAAGTTTAATACTGAAGATATTAGGATGGACCCTAGAAAGT--CCCGAAAGCA
Trout GTTGGCGTAGCTTA----ACTAAAGCATAACACTGAAGCTGTTAAGATGGACCCTAGAAAGT--CCCGCGAGCA
Auk   GTCTTCGTAGCTTACCG-ATTAAAGCATGGCACTGAAGATGCCAAGATGGCTGCCATTCATGCACCCGAAGACA
Crow  GTCCATGTAGCTTAC---AACAAAGCATGACACTGAAGATGTCAAGACGGCTGCCACAAACAC-CCCATGGACA
Chimp GTTTATGTAGCTTACCCCCTCAAAGCAATACACTGAAAATGTTTCGACGGGTTTACATCAC---CCCATAAACA
Human GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTCACATCAC---CCCATAAACA
      *     ********       ****      ******  *     ** **         *    ***     **
Cod   GCTATCGTAGCTTA----ATTAAAGTTTAATACTGAAGATATTAGGATGGACCCTAGAAAGT--CCCGAAAGCA
Trout .T.GG.........----.C.....CA...C.......C.G...A.................--....CG....
Auk   .TCT..........CCG-.......CA.GGC.........GCC.A.....CTG.C.TTC.TGCA......GA..
Crow  .TCCAT........C---.AC....CA.G.C.........G.C.A..C..CTG.C.C...CAC-...ATGGA..
Chimp .T.TAT........CCCCC.C....CAAT.C......A..G..TC..C..GTTTACATC.C---...AT..A..
Human .T.TAT........CCTCC.C....CAAT.C......A..G..TA..C..G.T.ACATC.C---...AT..A..
      *     ********       ****      ******  *     ** **         *    ***     **
Dots: same base as Cod; asterisks: columns identical in all six sequences
Sequence similarity
Human GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTCACATCACCCCATAAACA
Chimp ****************C***************************C******T*T*****************
Human GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGG-GCTCACATCACCCCATAAACA
Trout ***GGC********----ACT******TA********GC****A***T**AC*CT*G*AAGT***GCG*G**
Human/chimp: 94.4%
Human/trout: 63.2%
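The human/chimp value can be checked with a short sketch. How gaps are treated is a choice; here, as an assumption, columns with a gap in either sequence are ignored, so the same function is not guaranteed to reproduce the human/trout figure, whose denominator depends on the gap convention used.

```python
def percent_identity(seq1, seq2):
    """Percentage of aligned, gap-free columns at which two sequences share a base."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    identical = sum(a == b for a, b in pairs)
    return 100 * identical / len(pairs)


human = "GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTCACATCACCCCATAAACA"
chimp = "GTTTATGTAGCTTACCCCCTCAAAGCAATACACTGAAAATGTTTCGACGGGTTTACATCACCCCATAAACA"
print(round(percent_identity(human, chimp), 1))  # 94.4 (67 of 71 sites identical)
```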
Sequence alignments
Homo sapiens
Pan troglodytes
Corvus frugilegus
Alca torda
Salmo trutta
Gadus morhua
Sequence alignments
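Homology
Orthology: homologous genes separated by a speciation event
Paralogy: homologous genes separated by a gene duplication event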
DNA
Sequence alignments
protein
Sequence alignments
Sequence comparisons
Evolutionary inference
Ribosomal DNA
Maximum Likelihood (ML) looks for the tree that, under some model of
evolution, maximizes the likelihood of observing the data
Bayesian method (MB) is a more recent variant of ML: it seeks the trees with the highest posterior probability given the data, combining the likelihood with prior probabilities
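For the simplest case, two sequences joined by a single branch, the ML idea can be sketched directly: write the likelihood of the observed identical and differing sites under the Jukes-Cantor model, then search for the branch length that maximizes it. This is a toy grid search, not how tree-building programs optimize, and the function names are illustrative. For two sequences the ML estimate coincides with the Jukes-Cantor distance formula.

```python
import math


def jc_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of two aligned sequences separated by branch length d
    (substitutions per site) under the Jukes-Cantor model."""
    e = math.exp(-4 * d / 3)
    p_same = 0.25 + 0.75 * e       # probability that a site keeps its base
    p_each_diff = 0.25 - 0.25 * e  # probability of one particular other base
    # The factor 0.25 is the probability of the starting base (equal base frequencies)
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_each_diff))


def ml_distance(n_sites, n_diff, step=1e-4, d_max=3.0):
    """Crude grid search for the branch length with the highest likelihood."""
    grid = [i * step for i in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: jc_log_likelihood(d, n_sites, n_diff))


d_ml = ml_distance(14, 2)  # Sp 1 vs ancestral sequence: 2 differences in 14 sites
print(d_ml)  # close to the Jukes-Cantor distance, about 0.1585
```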
Tree building methods
Distance methods
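UPGMA (Unweighted Pair-Group Method with Arithmetic Mean). Clustering
method. First finds the pair of taxa with the smallest distance between
them and places the node joining them at half that distance. The pair is then
merged into one cluster, distances from the other taxa to the new cluster are
computed as arithmetic means, and the procedure is repeated until all taxa are joined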
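A minimal sketch of UPGMA in Python. The four taxa A to D and their distances are hypothetical; the output is a Newick string whose branch lengths come from placing each node at half the distance between the clusters it joins. Ties are broken by whichever pair is found first.

```python
from itertools import combinations


def upgma(names, dist):
    """UPGMA clustering. `dist` maps frozenset({x, y}) -> distance for every pair.
    Returns the tree as a Newick string with branch lengths."""
    clusters = {name: (1, 0.0) for name in names}  # label -> (number of taxa, node height)
    d = dict(dist)
    while len(clusters) > 1:
        # 1. find the pair of clusters with the smallest distance
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2  # node placed at half the distance
        (size_a, height_a), (size_b, height_b) = clusters.pop(a), clusters.pop(b)
        joined = f"({a}:{height - height_a:.3f},{b}:{height - height_b:.3f})"
        # 2. distance to the new cluster = average over all taxa it contains
        for k in clusters:
            d[frozenset((joined, k))] = (
                size_a * d[frozenset((a, k))] + size_b * d[frozenset((b, k))]
            ) / (size_a + size_b)
        clusters[joined] = (size_a + size_b, height)
    return next(iter(clusters)) + ";"


# Hypothetical distance matrix, for illustration only
pairs = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
         ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6}
dist = {frozenset(p): v for p, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))
# (D:3.000,(C:2.000,(A:1.000,B:1.000):1.000):1.000);
```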
Minimum Evolution (ME): the sum (S) of all branch length estimates is
computed for all possible topologies, and the topology that has the smallest
S value is chosen as the best tree
Neighbour Joining (NJ) is based on the ME principle, but does not examine
all possible topologies
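NJ avoids examining all topologies because their number grows super-exponentially with the number of taxa: (2n - 5)!! unrooted and (2n - 3)!! rooted bifurcating trees for n taxa. A minimal sketch of these counts; the function names are illustrative.

```python
def unrooted_topologies(n):
    """Number of unrooted bifurcating trees for n >= 3 taxa: (2n - 5)!!"""
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiplies 3 * 5 * ... * (2n - 5)
        count *= k
    return count


def rooted_topologies(n):
    """Rooted trees for n taxa: (2n - 3)!!, equal to unrooted trees for n + 1 taxa."""
    return unrooted_topologies(n + 1)


for n in (4, 10):
    print(n, unrooted_topologies(n), rooted_topologies(n))
# 4 3 15
# 10 2027025 34459425
```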
Tree building methods
Comparison of methods
Results of simulations:
Data sets that allow one method to infer the correct
phylogenetic relationship generally work well with all currently
popular tree-building methods
Tree building methods
Comparison of methods
Neighbour Joining: fast – but some information is lost in compressing sequences
into distances
Parsimony: fast enough to run hundreds of sequences – but can perform poorly if
there is substantial variation in branch lengths
Maximum Likelihood: fully captures what the data tell us about the phylogeny
under a given model – but can be prohibitively slow
Bayesian: strong connection to the ML method and faster than ML – but the prior
distributions for parameters must be specified; it can be difficult to determine if
the Markov chain Monte Carlo (MCMC) approximation has run for long enough
Tree building methods
Comparison of methods
General rule:
If a data set yields similar trees when analysed by fundamentally
different distance matrix, likelihood and parsimony methods, that
tree can be considered to be fairly reliable
(Figure: flowchart with the steps "Maximum likelihood methods" and "Analyse how well data support prediction", and branches labelled "No")
Literature
Graur, D. and Li, W.-H. (1999) Fundamentals of molecular evolution. 2nd edition. Sinauer Associates.
Hall, B.G. (2011) Phylogenetic trees made easy. A how-to manual. 4th edition. Sinauer Associates.
Page, R.D.M. and Holmes, E.C. (1998) Molecular evolution. A phylogenetic approach. Blackwell Science.
Lemey, P., Salemi, M. and Vandamme, A.-M. (2009) The phylogenetic handbook. A practical approach to phylogenetic analysis and hypothesis testing. 2nd edition. Cambridge University Press.