
PHYLOGENETIC ANALYSIS

(I)

Wellyzar Sjamsuridzal
4/10/2017 1
• A key goal of evolutionary biology: reconstruct the history of speciation events (i.e., build phylogenetic trees).

• Phylogenetic trees have been constructed for years using morphological (i.e., physical) features; DNA sequence data has led to wider interest in such trees.
(Haeckel, 1887)
• DNA, or the genome, is called the blueprint of life. It is more reliable (than morphological features).

• Even phenotypes are ultimately shaped by this blueprint.

• Thus molecular evolution based on molecular chronometers is widely accepted.

• Changes in DNA sequences:
- Insertion and deletion (indel)
- Transition, transversion

• Changes in the blueprint are translated into functional proteins:
DNA → RNA → Protein

• Molecules have changed over time, diversifying from a basic type into the diversity of life.

[Figure: basic set of life; change in molecules with time]


1. A phylogeny is a graphical summary of the evolutionary relationships among taxa, populations, or genes.
2. It is a hypothesis.
3. It shows the order in which species appeared and the relationships among species.

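[Figure: tree of species A, B and C on a time axis from past to present. Nodes are branch points (speciation events). One node marks the most recent common ancestor species of B and C; a deeper node marks the most recent common ancestor species of A, B and C.]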
The number of nucleic acid or amino acid differences between two organisms is proportional to the time since they diverged from a common ancestor.

Example:
1 AAGGCTA
2 AAGGGTA
3 AAGGATG

[Figure: tree of sequences 1, 2 and 3 with time marks at 100 years and 200 years]

Rate of evolution = 1 bp per 100 years
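
The counting behind this example can be sketched in a few lines of Python. This is only an illustration: the function name `count_differences` and the constant `YEARS_PER_BP` are made up here, and how the stated rate is split between the two diverging lineages is an assumption the slide does not spell out.

```python
from itertools import combinations

# Sequences from the example above.
seqs = {"1": "AAGGCTA", "2": "AAGGGTA", "3": "AAGGATG"}


def count_differences(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Rate stated in the example: 1 bp per 100 years (name is illustrative).
YEARS_PER_BP = 100

differences = {
    (i, j): count_differences(seqs[i], seqs[j])
    for i, j in combinations(sorted(seqs), 2)
}

for (i, j), d in differences.items():
    # Assumption: each observed difference corresponds to 100 years of change.
    print(f"{i} vs {j}: {d} difference(s), about {d * YEARS_PER_BP} years of change")
```

Sequences 1 and 2 differ at fewer positions than either does from sequence 3, which is why they are drawn as the more recently diverged pair.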
Tree thinking and phylogeny

Phylogenetic tree:
Nodes: branching points
Branches: lines
Topology: branching pattern

[Figure: a simple tree and associated terms]
Common Phylogenetic Tree Terminology

[Figure: rooted tree with terminal taxa A, B, C, D and E]

- Terminal nodes: represent the taxa (genes, populations, species, etc.) used to infer the phylogeny
- Branches or lineages
- Internal nodes or divergence points: represent hypothetical ancestors of the taxa
- Ancestral node or root of the tree

Terminals / Taxa

Types of trees

[Figure: taxa A, B, C and D drawn as a cladogram and as a phylogram; the phylogram branches carry the labels 6, 1, 1, 3, 1 and 5]

Cladogram: branch lengths have no meaning.
Phylogram: branch lengths represent genetic change.

All show the same evolutionary relationships, or branching orders, between the taxa.
1. In classification (taxonomy).
2. In grouping genes, proteins and other molecular sequences, including non-coding sequences.
3. In epidemiological investigations, mainly in relation to viruses.
4. In the analysis of parallel evolution between host and parasite.

• Past (in order of priority):
- Morphology
- Biochemical/physiological/cultural tests
- Chemotaxonomy
- DNA-DNA hybridization
- 16S rDNA sequence

• Present (in order of priority):
- Sequence
- Biochemical/physiological/cultural tests
- DNA-DNA hybridization
- Morphology
- Chemotaxonomy

IDENTIFICATION OF ORGANISMS THROUGH
SEQUENCES OF RIBOSOMAL DNA

Work scheme
Pure culture

DNA isolation

PCR reaction; electrophoresis

Cycle sequencing

Software analysis and rDNA databank

Phylogenetic tree
ID result
• Determination of sequences using a DNA sequencer
• Assembling and editing sequences; software: BioEdit, MEGA, ATCG, Auto-assembler
• Determining phylogenetic position

A number of different algorithms are used. The following are widely used:

1. Distance matrix methods
- unweighted pair-group method using arithmetic average (UPGMA)
- neighbor-joining method (NJ)
2. Maximum parsimony method (MP)
3. Maximum likelihood method (ML)
(1) Genetic distance/Distance method (fast)
- pairs up the closest sequences (lowest % difference) as
sister taxa, builds a tree from there
(2) Maximum parsimony (slow)
- uses only informative sites to draw the most
parsimonious tree
- discards lots of information
(3) Maximum likelihood (slow)
- uses a model of DNA sequence evolution to figure out the odds of getting a particular tree; chooses the most likely tree
• Neighbor-joining: ClustalX
• Maximum parsimony: MEGA, PAUP
• Maximum likelihood: BioEdit, PHYLIP

1. Distance methods - use pairs of sequences to get a dissimilarity measure
a. Common ones - UPGMA (Unweighted Pair-Group Method with Arithmetic mean) and NJ (neighbor joining)
b. Calculate the total number of changes, scored according to type, between every pair of sequences in the alignment
c. The distance represents the minimum number of changes required to convert one sequence into another
d. Results are written to a distance matrix, which can be used to generate a tree in several possible ways; branch lengths visually represent the amount of change
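
As a rough sketch of how such a distance matrix is filled in, the Python below computes the proportion of differing sites (the p-distance) for every pair of sequences. It scores all changes equally, which is the simplest scheme and not the type-weighted scoring mentioned above. The alignment, the sequence names and the function names are hypothetical and chosen only for illustration.

```python
def p_distance(a, b):
    """Proportion of compared positions at which two aligned sequences differ."""
    # Skip positions where either sequence has a gap ("-").
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return {(name_i, name_j): distance} for every pair of sequences."""
    names = list(alignment)
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical toy alignment, for illustration only.
alignment = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGAAC-T",
}

matrix = distance_matrix(alignment)
for pair, dist in matrix.items():
    print(pair, round(dist, 3))
```

The resulting dictionary plays the role of the lower triangle of the distance matrix that UPGMA or NJ then takes as input.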
• UPGMA is the simplest of these algorithms (Sneath & Sokal, 1973)
• From here on, each sequence used is referred to as an operational taxonomic unit (OTU)
• The distance among OTUs is calculated
• The evolutionarily closest pair is found

The most closely related group is found
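
The repeated "find the closest pair, merge, recompute" cycle of UPGMA might look like the following Python sketch. The four OTU names (W, X, Y, Z) and their distances are hypothetical, and the function returns only the branching order as nested tuples; branch lengths are left out to keep the example short.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names: list of OTU labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns the branching order as nested 2-tuples.
    """
    size = {name: 1 for name in names}  # number of OTUs in each cluster
    d = dict(dist)
    active = list(names)
    while len(active) > 1:
        # Find the closest pair among the current clusters.
        a, b = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # Distance from the merged cluster to every other cluster: the
        # size-weighted mean, i.e. the average over all original OTU pairs.
        for c in active:
            if c != a and c != b:
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))]
                    + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        active = [c for c in active if c != a and c != b] + [merged]
    return active[0]


# Hypothetical distances between four OTUs, for illustration only.
pairs = {
    ("W", "X"): 2.0,
    ("W", "Y"): 6.0, ("X", "Y"): 6.0,
    ("W", "Z"): 10.0, ("X", "Z"): 10.0, ("Y", "Z"): 10.0,
}
dist = {frozenset(p): v for p, v in pairs.items()}

tree = upgma(["W", "X", "Y", "Z"], dist)
print(tree)
```

Weighting by cluster size is what makes the average "unweighted" with respect to the original OTUs: every OTU pair contributes equally, however the clusters were formed.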

• Jukes & Cantor method: any change is scored equivalently.
• Kimura 2-parameter model: transitions and transversions are scored differently, since transitions are 2-20 times more common than transversions.
• Transition: change of a pyrimidine nucleotide into another pyrimidine, or of a purine nucleotide into another purine.
• Transversion: change of a pyrimidine nucleotide into a purine nucleotide, or vice versa. Transversions are 2-20 times rarer than transitions.
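
To make the two scoring schemes concrete, here is a small Python sketch that classifies each difference as a transition or a transversion and then applies the standard Jukes-Cantor and Kimura 2-parameter distance formulas. The function names are illustrative, the two sequences are sequences 1 and 3 from the earlier rate example, and the code assumes gap-free, upper-case A/C/G/T input.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(x, y):
    """Label a pair of aligned bases as match, transition or transversion."""
    if x == y:
        return "match"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def substitution_fractions(a, b):
    """Return (P, Q): fractions of sites showing transitions and transversions."""
    kinds = [classify(x, y) for x, y in zip(a, b)]
    n = len(kinds)
    return kinds.count("transition") / n, kinds.count("transversion") / n


def jukes_cantor(p):
    """Jukes-Cantor distance from the overall fraction p of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(P, Q):
    """Kimura 2-parameter distance from transition (P) and transversion (Q) fractions."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


seq1, seq3 = "AAGGCTA", "AAGGATG"
P, Q = substitution_fractions(seq1, seq3)
print("transitions:", P, "transversions:", Q)
print("Jukes-Cantor distance:", jukes_cantor(P + Q))
print("Kimura 2-parameter distance:", kimura_2p(P, Q))
```

Both corrected distances come out larger than the raw fraction of differing sites, because the models allow for multiple changes at the same position. On sequences this short the estimates are very rough.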
• Based on the minimum evolution concept

• Searches for the closest neighbor or sub-tree

• Starts with the search for the two closest taxa connected by a single node
Neighbor-joining
This is a least-squares distance-matrix method.

 A B C D E
A - - - - -
B 0.10 - - - -
C 0.19 0.21 - - -
D 0.25 0.25 0.25 - -
E 0.24 0.26 0.25 0.05 -

The closest neighbors in the distance matrix are D and E (0.05), so these branches are joined.

The distances from all other sequences to D and E are then averaged to reduce the distance matrix.

Now the closest neighbors are A and B, so join them.

That's it! If there were more sequences, you'd re-reduce the matrix as before and repeat the process over and over until all of the nodes were resolved.
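
The join-and-average steps described above can be replayed on the slide's own matrix with a short Python sketch. The helper names are made up for this illustration. For comparison, the last function computes the Q criterion that the standard neighbor-joining algorithm of Saitou and Nei uses to choose which pair to join; that criterion is not shown on the slide and is added here only as a cross-check.

```python
# Lower triangle of the distance matrix shown above.
d = {
    ("A", "B"): 0.10,
    ("A", "C"): 0.19, ("B", "C"): 0.21,
    ("A", "D"): 0.25, ("B", "D"): 0.25, ("C", "D"): 0.25,
    ("A", "E"): 0.24, ("B", "E"): 0.26, ("C", "E"): 0.25, ("D", "E"): 0.05,
}


def closest_pair(dm):
    """Pair with the smallest distance in the matrix."""
    return min(dm, key=dm.get)


def reduce_matrix(dm, pair):
    """Merge the joined pair into one cluster and average its distances."""
    x, y = pair
    merged = x + y  # e.g. "D" + "E" -> "DE"

    def get(a, b):
        return dm[(a, b)] if (a, b) in dm else dm[(b, a)]

    others = sorted({t for p in dm for t in p} - {x, y})
    reduced = {p: v for p, v in dm.items() if x not in p and y not in p}
    for t in others:
        reduced[(t, merged)] = (get(t, x) + get(t, y)) / 2
    return reduced


def q_criterion(dm):
    """Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    taxa = sorted({t for p in dm for t in p})
    n = len(taxa)
    total = {t: sum(v for p, v in dm.items() if t in p) for t in taxa}
    return {p: (n - 2) * v - total[p[0]] - total[p[1]] for p, v in dm.items()}


first = closest_pair(d)            # the D, E join
reduced = reduce_matrix(d, first)  # distances to the new DE cluster
second = closest_pair(reduced)     # the A, B join

q = q_criterion(d)
nj_first = min(q, key=q.get)       # pair chosen by the Q criterion

print(first, reduced, second, nj_first)
```

On this matrix the simple closest-pair rule and the Q criterion agree on the first join. They can disagree when evolutionary rates differ strongly between lineages, which is the situation the Q criterion is designed to handle.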
New cluster in Cystofilobasidium lineage (Class Hymenomycetes)
Based on ITS-D1/D2 region of LSU rDNA sequence data

[Figure: NJ tree, scale bar 0.1. Reference sequences: Cystofilobasidium infirmominiatum AB072226, Cryptococcus macerans AB032642, Cystofilobasidium ferigula AB032628, Cystofilobasidium bisporidii AB072225, Udeniomyces pseudopyricola AY841862, Cystofilobasidium capitatum AJ508233, Cystofilobasidium lari-marini AY052486, Cystofilobasidium infirmominiatum DQ645523. Isolates shown: ID05-Y033, ID05-Y075, ID05-Y064, ID05-Y050, ID05-Y031, ID05-Y076, ID05-Y048, ID05-Y034, ID05-Y046, ID05-Y045, ID05-Y025; the isolates from litter form the new cluster. Values at nodes: 62, 62, 76, 98, 100, 95, 100, 59, 88, 64.]
a. Parsimony means thrift or stinginess
b. Based on corresponding sequence positions
c. Uses only "informative" positions
d. Finds the tree that requires the fewest mutational events
e. Calculates branch order, not branch length
f. Advantage - calculations are rapid; can infer ancestral sequences
g. Disadvantage - a large amount of data is discarded; a problem with short sequences or ones without many informative sites
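
Which positions count as informative can be checked mechanically. The sketch below uses the usual definition of a parsimony-informative site: at least two different character states, each present in at least two sequences. The four-sequence alignment and the function name are hypothetical.

```python
from collections import Counter


def informative_sites(alignment):
    """Return the 0-based indices of parsimony-informative columns."""
    sites = []
    for i, column in enumerate(zip(*alignment)):
        counts = Counter(column)
        # Informative: at least two states that each occur at least twice.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append(i)
    return sites


# Hypothetical toy alignment, for illustration only.
alignment = [
    "AAGCTA",
    "AAGCTG",
    "AGGTTG",
    "ACGTAA",
]

sites = informative_sites(alignment)
print("informative columns:", sites)
print("columns discarded:", len(alignment[0]) - len(sites))
```

Constant columns and columns where only one sequence differs add the same number of changes to every tree, so they cannot help parsimony choose between trees. That is the sense in which a large amount of data is discarded.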
Parsimony

• The tree that requires the smallest number of sequence changes is taken to be the most likely tree.

• No distance matrix is calculated; instead, trees are searched, each ancestral sequence is calculated, and the number of "mutations" required is added up.

• Testing every possible tree is usually not possible, so a variety of search algorithms are used to examine only the most likely trees.
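
One common way to add up the required changes for a given tree is Fitch's algorithm, sketched below in Python. The slides do not name a specific counting method, so treating Fitch's algorithm as the counting step is an assumption. The alignment, the sequence names and the tree encoding (nested 2-tuples of leaf names) are also hypothetical. With four sequences there are only three possible groupings, so all of them can be scored.

```python
def fitch(tree, site):
    """Return (possible ancestral states, changes needed) for one site.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    site: dict mapping each leaf name to its base at this position.
    """
    if isinstance(tree, str):
        return {site[tree]}, 0
    left_states, left_changes = fitch(tree[0], site)
    right_states, right_changes = fitch(tree[1], site)
    common = left_states & right_states
    if common:
        # The children can share a state: no extra change at this node.
        return common, left_changes + right_changes
    # The children disagree: one extra change is required.
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Total number of changes the tree requires over all sites."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical toy alignment, for illustration only.
alignment = {"s1": "AACTG", "s2": "AACAG", "s3": "AGTTG", "s4": "AGTAC"}

topologies = [
    (("s1", "s2"), ("s3", "s4")),
    (("s1", "s3"), ("s2", "s4")),
    (("s1", "s4"), ("s2", "s3")),
]
scores = {t: parsimony_score(t, alignment) for t in topologies}
best = min(scores, key=scores.get)

for t, s in scores.items():
    print(t, s)
print("most parsimonious:", best)
```

With more sequences the number of possible trees grows too quickly to score every one, which is why heuristic searches are used in practice.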
a. Purely statistical method
b. Uses every site, unlike parsimony, since unchanged sites have a chance of having changed and then changed back
c. For each possible tree, the likelihood of the changes is calculated, and the probabilities for each aligned position are multiplied to get the tree likelihood
d. The tree with the maximum likelihood is the most probable tree
e. Disadvantage - very slow to calculate; only as good as the substitution model used
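
The idea of multiplying per-site probabilities can be illustrated on the smallest possible case: two sequences joined by a single branch, under the Jukes-Cantor model. This is a deliberately reduced sketch, not a full tree-likelihood calculation, which would also sum over the unknown ancestral states at internal nodes. The sequences, the function name and the simple grid search are assumptions made for illustration.

```python
import math


def jc69_log_likelihood(a, b, d):
    """Log-likelihood of two aligned sequences at distance d under Jukes-Cantor.

    d is the expected number of substitutions per site along the branch.
    """
    e = math.exp(-4.0 * d / 3.0)
    # Probability of a site showing the same base in both sequences:
    # P(base) * P(base is unchanged after distance d).
    p_same = 0.25 * (0.25 + 0.75 * e)
    # Probability of a site showing one specific pair of different bases.
    p_diff = 0.25 * (0.25 - 0.25 * e)
    # Multiplying per-site probabilities equals summing their logarithms.
    return sum(math.log(p_same if x == y else p_diff) for x, y in zip(a, b))


# Hypothetical sequences: 20 sites, 4 of which differ.
a = "ACGTACGTACGTACGTACGT"
b = "AGGTACGAACGTTCGTACCT"

# Simple grid search for the branch length with the highest likelihood.
grid = [i / 1000 for i in range(1, 2001)]
best_d = max(grid, key=lambda d: jc69_log_likelihood(a, b, d))

# Closed-form Jukes-Cantor estimate, for comparison.
p = sum(x != y for x, y in zip(a, b)) / len(a)
closed_form = -0.75 * math.log(1 - 4 * p / 3)

print("grid-search estimate:", best_d)
print("closed-form estimate:", closed_form)
```

Every site contributes to the likelihood, including the sites that look unchanged. Repeating such a calculation for every candidate tree and every set of branch lengths is what makes the method slow.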

• For a particular base position, different trees are obtained. The score for each tree is then calculated by comparing all the bases with its outermost nodes using the HKY model. The tree with the minimum score is the most likely.
