Lecture 8: Phylogenetics, 10 March 2020

Phylogenetics
and
phylogeography
10 March 2020
Reading: Rowe, Sweet & Beebee,
Chapter 9
Phylogenetics + geography = phylogeography
Phylogeography
Understanding the processes that shaped the current
geographical distribution of alleles and populations
http://www.insectsingers.com
Tree of life
Delsuc et al. (2005) Nature Rev. Genet. 6: 361-375

Phylogenetics
Reconstruction of evolutionary history by tracing phylogenetic relationships of species, or
other units of interest, e.g. populations
Phylogenetic trees
https://www.cs.us.es/~fran/students/julian/phylogenetics/phylogenetics.html
Phylogenetics
Reconstruction of evolutionary history by tracing phylogenetic relationships
… based on:
o morphology
o or DNA
Phenotypical traits
Homology
Homology is similarity that results from inheritance from a common ancestor
Identification of homologies is essential to phylogenetics
https://en.wikipedia.org/wiki/Homology_(biology)
Homology
Homology is similarity that results from inheritance from a common ancestor
Homologous genes
http://evolution.berkeley.edu/evolibrary/article/1_0_0/eyes_10
Molecular evolution and phylogenetics
1. Molecular evolution
studying the rates and patterns of changes in the DNA and its
products (RNA or protein) during evolutionary time
2. Molecular phylogenetics
reconstruction of the evolutionary history of organisms as inferred
from molecular data and the methodology of tree construction
Molecular evolution
Kinds of molecular evolution
Single base substitutions
Insertions/deletions/transpositions/inversions/duplications
Gene structure
Molecular evolution
Single base mutations and substitutions
…what is the difference?
Mutations: random changes in nucleotide sequences that occur due to mistakes in replication or repair processes
Substitutions: (single nucleotide) mutations that have passed through the filter of selection on at least some level
Note: germ line vs somatic mutations

Molecular evolution
Patterns of substitution within genes
The relative frequency of mutations, and what each type is subject to:
1. Deleterious (disadvantageous): negative selection
2. Beneficial (advantageous): positive selection
3. Neutral (little or no effect on the fitness of an organism): genetic drift

Molecular evolution
Patterns of substitution within genes
Functional constraints
Changes to genes that are disadvantageous are removed by
negative natural selection
It follows that the portions of genes that are functionally most important will accumulate changes more slowly during evolutionary time
Molecular evolution
Evolutionary rates:
Introns and flanking sequences > regions transcribed but not translated (5’ UTR, 3’ UTR) > coding sequences
Molecular evolution
Different types of genomic DNA will evolve differently due to differing
functional constraints
Coding DNA
Non coding DNA

Repeat DNA
Non functional DNA
Transposons
Pseudogenes (~ 4 substitutions per site per billion years)
Regulatory DNA
Molecular evolution
Coding sequences
Triplet codon positions can be put in three categories:
1. Non degenerate
2. Twofold degenerate
3. Fourfold degenerate
Molecular evolution
Coding sequences
1. Non degenerate
Codon position where any mutation will result in an amino acid change. E.g. the first position of UUU (phenylalanine): CUU leucine, AUU isoleucine, GUU valine
Molecular evolution
Coding sequences
2. Twofold degenerate
Codon position where one of the three possible substitutions codes for the same amino acid, while the other two result in an amino acid change.
E.g. the third position of GAU and GAC (aspartic acid) and of GAA and GAG (glutamic acid).
Molecular evolution
Coding sequences
3. Fourfold degenerate
Codon position where changing the nucleotide to any of the three alternatives will have no effect on the amino acid.
E.g. third position of glycine: GGU, GGC, GGA, GGG.
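The three categories can be checked mechanically against the standard genetic code. The sketch below is only an illustration: the function name `degeneracy` and the compact codon table layout are choices made here, not part of the lecture. It counts how many of the four bases at a codon position encode the same amino acid, so 1 means non degenerate, 2 twofold and 4 fourfold degenerate.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in the order
# UUU, UUC, UUA, UUG, UCU, ... (first base varies slowest); "*" is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy(codon, position):
    """Count the bases at `position` (0, 1 or 2) that keep the amino acid.

    The original base is included, so the result is 1 (non degenerate),
    2 (twofold), 3 (threefold, e.g. isoleucine) or 4 (fourfold).
    """
    amino_acid = CODON_TABLE[codon]
    count = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[variant] == amino_acid:
            count += 1
    return count


print(degeneracy("UUU", 0))  # first position of phenylalanine
print(degeneracy("GAU", 2))  # third position of aspartic acid
print(degeneracy("GGU", 2))  # third position of glycine
```

The third position of the isoleucine codons (AUU, AUC, AUA) returns 3, a case that does not fit any of the three categories on the slide.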
Molecular evolution
Genetic or evolutionary distance
The observed number of (single nucleotide) substitutions between any two sequences in an alignment is typically the most important variable in any molecular evolution analysis
Molecular clocks
K: number of nucleotide substitutions two sequences have experienced since they last shared a common ancestor
T: divergence time
r: rate of nucleotide substitution
$$ r = \frac{K}{2T} $$
http://www.evolution.berkeley.edu/evosite/evo101/IIE1cMolecularclocks.shtml
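A small sketch of the clock relation r = K/2T. The factor 2 is there because substitutions accumulate along both lineages since the common ancestor. The function names and the numbers in the example (K = 0.1 substitutions per site, T = 10 million years) are made up for illustration and do not come from the lecture.

```python
def substitution_rate(k, t):
    """Rate r (substitutions per site per year) from K and divergence time T."""
    return k / (2.0 * t)


def divergence_time(k, r):
    """Divergence time T (years) from K and a known rate r (the same relation rearranged)."""
    return k / (2.0 * r)


# Hypothetical values, for illustration only.
k = 0.1          # substitutions per site between the two sequences
t = 10_000_000   # years since the last common ancestor
r = substitution_rate(k, t)
print(r)                      # about 5e-09 substitutions per site per year
print(divergence_time(k, r))  # recovers about 10 million years
```

Using the relation to date a split assumes the rate is constant along both lineages, which is exactly the assumption that a molecular clock makes.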
Molecular clocks
http://www.bio.miami.edu/
Molecular clocks
o Functional constraints make molecular clocks tick at different rates for different proteins
o The rate differs between different organismal lineages
o The rate can speed up or slow down over evolutionary time for any given gene
http://sandwalk.blogspot.no/2012/01/modern-molecular-clock.html
Molecular evolution
Models of molecular evolution
It is not only about the number of substitutions, but also the nature or pattern of the substitutions. This leads us to look at models of molecular evolution
http://www.bio.miami.edu/
Molecular evolution
Uncorrected model of molecular evolution
p distance
The proportion (p) of nucleotide sites at which the two sequences being compared differ. It is obtained by dividing the number of nucleotide differences by the total number of nucleotides compared.
Seq 1: TAGTTGCAAC
Seq 2: TAGTCGCAGC
p = 2/10 = 0.2
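The p distance is simple enough to compute in a few lines. This is a minimal sketch that assumes two aligned sequences of equal length without gaps; the function name `p_distance` is chosen here for illustration.

```python
def p_distance(seq1, seq2):
    """Proportion of sites at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


# The two sequences from the slide differ at sites 5 and 9.
print(p_distance("TAGTTGCAAC", "TAGTCGCAGC"))  # 0.2
```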
Molecular evolution
Corrected models of molecular evolution
These types of distance measures take into consideration at least one of the following:
• multiple substitutions at the same site
• substitution rate biases (for instance between transitions
and transversions)
• base frequencies
• differences in substitution rate between sites
Molecular evolution
Ancestral sequence
TGCCAGCTTAGCCA
Sp 1 TGCTAGCTTAACCA Sp 2 TGTCAACTTAACCT
Sp 1 TGCTAGCTTAACCA Sp 2 TGTCAACTTAGTCT
Differences from the ancestral sequence: Sp 1: 2, Sp 2: 4
However, multiple substitutions can take place at the same site
Molecular evolution
Multiple substitutions at the same site
Ancestral sequence
TGCCAGCTTAGCCA
Differences: Sp 1: 2, Sp 2: 4
Substitutions: Sp 1: 4, Sp 2: 4
Example of multiple substitutions at one site in Sp 1: G → A → C → A
Molecular evolution
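Jukes-Cantor Model
$$ d = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right) $$
d: estimated number of substitutions per site; p: proportion of sites that differ (p distance)
Assumptions:
- equal base frequencies
- all substitutions are equally likely (happen at rate α)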
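As a rough illustration of how the Jukes-Cantor correction behaves, the sketch below evaluates the formula for a given p distance. The function name is an assumption made here; the correction is only defined for p < 0.75, the value expected for two random sequences with equal base frequencies.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance d for an observed p distance."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# For p = 0.2 the corrected distance is slightly larger than p (about 0.233),
# because some sites are assumed to have changed more than once.
print(jukes_cantor_distance(0.2))
```

The correction grows with p: for small p the corrected and uncorrected distances are nearly equal, while close to 0.75 the corrected distance increases without bound.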
Molecular evolution
Transitions: purine ↔ purine (A ↔ G) or pyrimidine ↔ pyrimidine (C ↔ T)
Transversions: purine ↔ pyrimidine

Molecular evolution
Empirically:
Transitions are more frequent
than transversions
http://openi.nlm.nih.gov/
Molecular evolution
Kimura’s two-parameter model (K2P)
$$ d = -\frac{1}{2} \ln(1 - 2p - q) - \frac{1}{4} \ln(1 - 2q) $$
p: proportion of sites that show transitional differences
q: proportion of sites that show transversional differences
Assumptions:
- equal base frequencies
- transitions and transversions happen at different rates (α and β respectively)
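A minimal sketch of the K2P distance, assuming two aligned DNA sequences in upper case containing only A, C, G and T (no gaps or ambiguity codes). The function name and the way differences are classified are illustrative choices, not taken from the lecture.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # A difference within purines or within pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / len(seq1)    # proportion of transitional differences
    q = transversions / len(seq1)  # proportion of transversional differences
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Both differences between these two sequences are transitions
# (T/C and A/G), so p = 0.2, q = 0 and d is about 0.255.
print(k2p_distance("TAGTTGCAAC", "TAGTCGCAGC"))
```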
Molecular evolution
A multitude of models of molecular evolution
Jukes-Cantor 1969 (JC69)

Kimura’s two-parameter 1980 (K2P)
Felsenstein 1981 (F81)
Hasegawa, Kishino and Yano 1985 (HKY85)
Tamura 1992 (T92)
Tamura and Nei 1993 (TN93)
Tavaré 1986 Generalised time-reversible (GTR)
These models are implemented in e.g. MEGA 7 (or MEGA X)
How to choose a model?

Choose the simplest model that fits your data!
There are programs available for model choice (e.g. jModelTest)
Tracing the evolutionary history of mammals
Artiodactyla
Cetartiodactyla
Convergent evolution
https://online.science.psu.edu/biol011_sandbox_7239/node/7328
Molecular phylogenetics
Using DNA or protein sequences
Are we tracing the evolutionary history of:
o organisms
o or genes…
Molecular phylogenetics
Molecular phylogeny of the collembolan Acanthanura sp. based on mt COI sequences (Emerson et al. 2011)

Phylogenetic tree
A graphical representation of the evolutionary relationships among three or more genes or organisms
Phylogenetics
Cladograms
Phylogenetics
Cladograms vs phylograms
cladogram phylogram
Distance scale
http://carrot.mcb.uconn.edu/~olgazh/bioinf2010/class16.html
https://www.cs.us.es/~fran/students/julian/phylogenetics/phylogenetics.html
Phylogenetics
Rooted and unrooted trees
Species trees vs gene trees
Problems associated with molecular phylogenetics due to:
o Hybridization
o Lineage sorting
https://nothinginbiology.org
Homology
Orthologs and paralogs
http://beacon-center.org/
Phylogenetics
Autapomorphy:
Derived, but not shared
(unique to a taxon)
Phylogenetics
Plesiomorphy:
Shared ancestral trait,
but not derived
Phylogenetics
Synapomorphy:
Shared and derived
Phylogenetics
Homoplasy:
independently derived
(convergence, secondary loss,
reversion)
Sequence alignments
Cod GCTATCGTAGCTTAATTAAAGTTTAATACTGAAGATATTAGGATGGACCCTAGAAAGTCCCGAAAGCA
Trout GTTGGCGTAGCTTAACTAAAGCATAACACTGAAGCTGTTAAGATGGACCCTAGAAAGTCCCGCGAGCA
Auk GTCTTCGTAGCTTACCGATTAAAGCATGGCACTGAAGATGCCAAGATGGCTGCCATTCATGCACCCGAAGACA
Crow GTCCATGTAGCTTACAACAAAGCATGACACTGAAGATGTCAAGACGGCTGCCACAAACACCCCATGGACA
Chimp GTTTATGTAGCTTACCCCCTCAAAGCAATACACTGAAAATGTTTCGACGGGTTTACATCACCCCATAAACA
Human GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTCACATCACCCCATAAACA
Mitochondrial tRNA-Phe genes in two fishes, two birds, chimpanzee and human.
Sequence alignments
Positions that are identical in the unaligned sequences:
* ******** * *
Sequence alignments
Cod GCTATCGTAGCTTA----ATTAAAGTTTAATACTGAAGATATTAGGATGGACCCTAGAAAGT--CCCGAAAGCA
Trout GTTGGCGTAGCTTA----ACTAAAGCATAACACTGAAGCTGTTAAGATGGACCCTAGAAAGT--CCCGCGAGCA
Auk GTCTTCGTAGCTTACCG-ATTAAAGCATGGCACTGAAGATGCCAAGATGGCTGCCATTCATGCACCCGAAGACA
Crow GTCCATGTAGCTTAC---AACAAAGCATGACACTGAAGATGTCAAGACGGCTGCCACAAACAC-CCCATGGACA
Chimp GTTTATGTAGCTTACCCCCTCAAAGCAATACACTGAAAATGTTTCGACGGGTTTACATCAC---CCCATAAACA
Human GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTCACATCAC---CCCATAAACA
* ******** **** ****** * ** ** * *** **
Nucleotide sequence alignment, produced by ClustalX, of the mitochondrial tRNA-Phe genes in two fishes, two birds, chimpanzee and human. Asterisks mark positions that are identical in all six sequences.
Sequence alignments
The same alignment with Cod as the reference: a dot marks a base identical to Cod, a letter marks a difference
Cod GCTATCGTAGCTTA----ATTAAAGTTTAATACTGAAGATATTAGGATGGACCCTAGAAAGT--CCCGAAAGCA
Trout .T.GG.........----.C.....CA...C.......C.G...A.................--....CG....
Auk .TCT..........CCG-.......CA.GGC.........GCC.A.....CTG.C.TTC.TGCA......GA..
Crow .TCCAT........C---.AC....CA.G.C.........G.C.A..C..CTG.C.C...CAC-...ATGGA..
Chimp .T.TAT........CCCCC.C....CAAT.C......A..G..TC..C..GTTTACATC.C---...AT..A..
Human .T.TAT........CCTCC.C....CAAT.C......A..G..TA..C..G.T.ACATC.C---...AT..A..
* ******** **** ****** * ** ** * *** **
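Both displays can be generated from an alignment with a few lines of code. The sketch below is an illustration with function names chosen here; it assumes the sequences are already aligned to equal length, uses "-" for gaps, and treats the first sequence as the reference for the dot display.

```python
def dot_representation(reference, query):
    """Show `query` relative to `reference`: '.' where the bases match,
    '-' where the query has a gap, otherwise the query base."""
    out = []
    for ref_base, query_base in zip(reference, query):
        if query_base == "-":
            out.append("-")
        elif query_base == ref_base:
            out.append(".")
        else:
            out.append(query_base)
    return "".join(out)


def conservation_line(sequences):
    """Mark columns in which all sequences carry the same base with '*'."""
    marks = []
    for column in zip(*sequences):
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)


# First 25 columns of the Cod and Trout sequences from the alignment.
cod = "GCTATCGTAGCTTA----ATTAAAG"
trout = "GTTGGCGTAGCTTA----ACTAAAG"
print(dot_representation(cod, trout))  # .T.GG.........----.C.....
print(conservation_line([cod, trout]))
```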

Sequence alignments
Sequence similarity
Chimp ****************C***************************C******T*T*****************
Human GTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGG-GCTCACATCACCCCATAAACA
Trout ***GGC********----ACT******TA********GC****A***T**AC*CT*G*AAGT***GCG*G**
Human/chimp: 94.4%
Human/trout: 63.2%
Sequence alignments
Sequence similarity vs. homology

DNA
Homo sapiens
Pan troglodytes
Corvus frugilegus
Alca torda
Salmo trutta
Gadus morhua
Sequence alignments
Homology
Orthology
Paralogy
Sequence alignments
Sequence identity and sequence homology
Two protein sequences, 100 aa length:
“Safe zone”: > 30% identity
“Twilight zone”: 20 – 30% identity
“Midnight zone”: < 20% identity
Sequence alignments
Sequence comparisons
Evolutionary inference
Structural and functional inferences

Sequence alignments and homologous positions
* ******** **** ****** * ** ** * *** **
Use secondary structure to guide alignments!
Ribosomal DNA
Protein coding genes

Tree building methods
Optimality criterion (character-based methods)
Use the multiple alignment directly by comparing characters within each column (each site) in the alignment
Maximum Parsimony (MP) looks for the tree with the minimum number of changes
Maximum Likelihood (ML) looks for the tree that, under some model of evolution, maximizes the likelihood of observing the data
Bayesian method (MB) is a more recent relative of ML; it seeks the trees with the highest posterior probability given the data, the model and the prior distributions
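To make "minimum number of changes" concrete, the sketch below counts the changes that one alignment column requires on a given tree. It uses Fitch's algorithm, which the slides do not name, so treat it as one possible way of scoring a tree under parsimony. The tree encoding (nested pairs of leaf names) and the example states are hypothetical.

```python
def fitch_score(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one site.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees;
    `states` maps each leaf name to its base at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_score(left, states)
    right_set, right_changes = fitch_score(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra change needed.
        return shared, left_changes + right_changes
    # No shared state: one change is required on this branch pair.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical states at a single site.
site = {"human": "A", "chimp": "A", "cod": "G", "trout": "G"}
tree_1 = (("human", "chimp"), ("cod", "trout"))
tree_2 = (("human", "cod"), ("chimp", "trout"))
print(fitch_score(tree_1, site)[1])  # 1 change
print(fitch_score(tree_2, site)[1])  # 2 changes
```

Summing this score over all columns gives the parsimony length of a tree; a parsimony search then compares that length across candidate topologies and prefers the shortest.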
Distance methods
UPGMA (Unweighted Pair-Group Method with Arithmetic Mean). Clustering method. First finds the pair of taxa with the smallest distance between them, then defines the branching between them by placing a node at the midpoint of the branch. The pair is then treated as a single cluster and the procedure is repeated
Minimum Evolution (ME): the sum (S) of all branch length estimates is computed for all possible topologies, and the topology that has the smallest S value is chosen as the best tree
Neighbour Joining (NJ) is based on the ME principle, but does not examine all possible topologies
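The UPGMA procedure described above can be sketched as follows. This is a simplified illustration, not a replacement for a phylogenetics package: the input format (a dictionary keyed by `frozenset` pairs), the function name and the example distances are assumptions made here.

```python
from itertools import combinations


def upgma(distances):
    """Cluster taxa with UPGMA.

    `distances` maps frozenset({a, b}) to the distance between taxa a and b.
    Returns the tree as nested pairs and a dict of node heights.
    """
    d = dict(distances)
    taxa = sorted({taxon for pair in d for taxon in pair})
    sizes = {taxon: 1 for taxon in taxa}  # cluster -> number of taxa in it
    heights = {}
    while len(sizes) > 1:
        # Find the pair of clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        # The new node sits at the midpoint between the two clusters.
        heights[merged] = d[frozenset((a, b))] / 2
        # Distance to every other cluster: size-weighted arithmetic mean.
        for c in sizes:
            if c != a and c != b:
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    (root,) = sizes
    return root, heights


# Hypothetical distances between three taxa.
example = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
tree, node_heights = upgma(example)
print(tree)          # ('C', ('A', 'B'))
print(node_heights)  # {('A', 'B'): 1.0, ('C', ('A', 'B')): 3.0}
```

Placing every node at the midpoint means that all tips end up equally far from the root, so UPGMA implicitly assumes a constant rate of change in all lineages.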
Comparison of methods
Neither the distance-based nor the character-based methods of phylogenetic reconstruction guarantee that they yield the one true tree that describes the evolutionary history of a set of aligned sequences
Results of simulations:
Data sets that allow one method to infer the correct
phylogenetic relationship generally work well with all currently
popular tree-building methods
Neighbour Joining: fast – but some information is lost in compressing sequences
into distances
Parsimony: fast enough to run hundreds of sequences – but can perform poorly if
there is substantial variation in branch lengths
Maximum Likelihood: fully captures what the data tell us about the phylogeny
under a given model – but can be prohibitively slow
Bayesian: strong connection to the ML method and faster than ML – but the prior
distributions for parameters must be specified; it can be difficult to determine if
the Markov chain Monte Carlo (MCMC) approximation has run for long enough
It is the number of sequences in the alignment that causes computational problems, not their lengths
There is no single best method!
General rule:
If a data set yields similar trees when analysed by fundamentally
different distance matrix, likelihood and parsimony methods, that
tree can be considered to be fairly reliable
Test significance: bootstrapping
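Bootstrapping resamples the columns of the alignment with replacement, rebuilds a tree from each pseudo-replicate, and reports how often each clade is recovered. The sketch below covers only the resampling step; the function name and the toy alignment are illustrative, and the tree-building step is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    `alignment` maps sequence names to aligned sequences of equal length.
    Columns are drawn with replacement, and the same columns are used for
    every sequence so that each column stays intact.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(sequence[i] for i in columns)
        for name, sequence in alignment.items()
    }


# Toy alignment, for illustration only.
toy = {"sp1": "TGCTAGCTTA", "sp2": "TGTCAACTTA", "sp3": "TGCCAGCTTA"}
rng = random.Random(1)  # fixed seed so the example is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(replicates[0])
```

A tree would then be built from each replicate with the same method as for the original alignment, and the bootstrap support of a clade is the percentage of replicate trees that contain it.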

Guidelines
1. Choose a set of related sequences
2. Obtain a multiple sequence alignment
3. Is there strong sequence similarity? Yes: maximum parsimony methods. No: go to 4
4. Is there clearly recognizable sequence similarity? Yes: distance methods. No: maximum likelihood methods
5. Analyse how well the data support the prediction
Literature
Graur, D. and Li, W.-H. (1999) Fundamentals of molecular evolution. 2nd edition. Sinauer Associates.
Hall, B.G. (2011) Phylogenetic trees made easy. A how-to manual. 4th edition. Sinauer Associates.
Page, R.D.M. and Holmes, E.C. (1998) Molecular evolution. A phylogenetic approach. Blackwell
Science.
Lemey, P., Salemi, M. and Vandamme, A.-M. (2009) The phylogenetic handbook. A practical approach to phylogenetic analysis and hypothesis testing. 2nd edition. Cambridge University Press.
