BIOSIS 2017-Phylogenetic Tree (I)

PHYLOGENETIC ANALYSIS
(I)
Wellyzar Sjamsuridzal
4/10/2017 1
• A key goal of evolutionary biology: reconstruct the history of speciation events (i.e., build phylogenetic trees).
• Phylogenetic trees have been constructed for years using morphological (i.e., physical) features; DNA sequence data has led to wider interest in such trees.
(Haeckel, 1887)
• DNA, or the genome, is called the blueprint of life. It is more reliable than morphological features.
• Even phenotypes are ultimately determined by these blueprints.
• Thus molecular evolution based on molecular chronometers is widely accepted.
• Changes in DNA sequences:
- Insertion and deletion (indel)
- Transition, transversion
• The changes in the blueprints are translated into functional proteins:
DNA → RNA → Protein
• Molecules have changed over time, giving rise to the diversity of life from a basic type.
[Figure: basic set of life; change in molecules with time]

1. A phylogeny is a graphical summary of the evolutionary relationships of taxa, populations, or genes.
2. It is a hypothesis.
3. It shows the sequence in which species appeared and the relationships among species.
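[Figure: tree of species A, B and C, with time running from Past (root) to Present (tips). Nodes are branch points (speciation events): one node is the most recent common ancestor species of B & C, and a deeper node is the most recent common ancestor species of A, B & C.]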
The number of nucleic acid or amino acid differences between two organisms is proportional to the time since they diverged from a common ancestor.
Example:
1 AAGGCTA
2 AAGGGTA
3 AAGGATG
[Figure: tree of sequences 1, 2 and 3 with branch points at 100 years and 200 years]
Rate of evolution = 1 bp per 100 years
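The arithmetic behind this example can be sketched in a few lines. This is only an illustration of the slide's simplified clock (one difference counted as 100 years); the function names are hypothetical and not part of the lecture.

```python
# Sequences and rate taken from the example above.
SEQUENCES = {1: "AAGGCTA", 2: "AAGGGTA", 3: "AAGGATG"}
YEARS_PER_DIFFERENCE = 100  # the slide's rate: 1 bp per 100 years


def count_differences(seq_a, seq_b):
    """Count aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def divergence_years(seq_a, seq_b, years_per_difference=YEARS_PER_DIFFERENCE):
    """Convert a difference count into time using the slide's simplified clock."""
    return count_differences(seq_a, seq_b) * years_per_difference


for i, j in [(1, 2), (1, 3), (2, 3)]:
    diffs = count_differences(SEQUENCES[i], SEQUENCES[j])
    years = divergence_years(SEQUENCES[i], SEQUENCES[j])
    print(f"sequences {i} and {j}: {diffs} difference(s), {years} years")
```

Sequences 1 and 2 differ at one position and both differ from sequence 3 at two positions, which matches the 100-year and 200-year branch points in the figure.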
Tree thinking and phylogeny
Phylogenetic tree:
Nodes: branching points
Branches: lines
Topology: branching pattern
[Figure: a simple tree and associated terms]
Common Phylogenetic Tree Terminology
[Figure: rooted tree with terminal nodes A, B, C, D and E]
Terminal nodes: represent the taxa (genes, populations, species, etc.) used to infer the phylogeny
Branches or lineages
Internal nodes or divergence points: represent hypothetical ancestors of the taxa
Ancestral node or ROOT of the tree
Terminals / Taxa
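Types of trees
Cladogram: branch lengths have no meaning.
Phylogram: branch lengths represent genetic change.
[Figure: the same four taxa (A, B, C, D) drawn as a cladogram and as a phylogram; the phylogram branches are labeled with lengths 6, 1, 1, 3, 1 and 5]
All show the same evolutionary relationships, or branching orders, between the taxa.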
1. In classification (taxonomy).
2. In grouping genes, proteins and other molecular sequences, including non-coding sequences.
3. In epidemiological investigations, mainly in relation to viruses.
4. In the analysis of parallel evolution between host and parasite.
Past (in order of priority):
1. Morphology
2. Biochemical/physiological/cultural tests
3. Chemotaxonomy
4. DNA-DNA hybridization
5. 16S rDNA sequence
Present (in order of priority):
1. Sequence
2. Biochemical/physiological/cultural tests
3. DNA-DNA hybridization
4. Morphology
5. Chemotaxonomy
IDENTIFICATION OF ORGANISMS THROUGH
SEQUENCES OF RIBOSOMAL DNA
Work scheme
1. Pure culture
2. DNA isolation
3. PCR reaction and electrophoresis
4. Cycle sequencing
5. Software analysis and rDNA databank
6. Phylogenetic tree
7. ID result
• Determination of sequences using a DNA sequencer
• Assembling and editing sequences (software: BioEdit, MEGA, ATCG, Auto-assembler)
• Determining phylogenetic position
Several different algorithms are used. The following are the most widely used:

1. Distance matrix methods
- unweighted pair-group method using arithmetic average (UPGMA)
- neighbor-joining method (NJ)
2. Maximum parsimony method (MP)
3. Maximum likelihood method (ML)
(1) Genetic distance/distance method (fast)
- pairs up the closest sequences (lowest % difference) as sister taxa and builds a tree from there
(2) Maximum parsimony (slow)
- uses only informative sites to draw the most parsimonious tree
- discards a lot of information
(3) Maximum likelihood (slow)
- uses a model of DNA sequence evolution to calculate how likely each candidate tree is; chooses the most likely tree
• Neighbor-joining: ClustalX
• Maximum parsimony: MEGA, PAUP
• Maximum likelihood: BioEdit, PHYLIP
1. Distance methods use pairs of sequences to obtain a dissimilarity measure.
a. Common ones: UPGMA (unweighted pair-group method with arithmetic mean) and NJ (neighbor-joining)
b. Calculate the total number of changes, scored according to type, between every pair of sequences in the alignment
c. The distance represents the minimum number of changes required to convert one sequence into another
d. Results are written to a distance matrix, which can be used to generate a tree in several possible ways; branch lengths visually represent the amount of change
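As a rough illustration of how such a distance matrix is filled in, the sketch below computes the simplest dissimilarity measure, the proportion of differing sites (p-distance), for every pair of sequences. The alignment and all names are hypothetical, and real programs usually apply a substitution model on top of this raw count.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ; gap positions are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return a symmetric nested dict: matrix[name_a][name_b] -> p-distance."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


# Hypothetical toy alignment of three OTUs.
alignment = {"otu1": "AAGGCTA", "otu2": "AAGGGTA", "otu3": "AAGGATG"}
matrix = distance_matrix(alignment)
for name in alignment:
    print(name, [round(matrix[name][other], 3) for other in alignment])
```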
• UPGMA is the simplest of these algorithms (Sneath & Sokal, 1973)
• From here on, each sequence used is referred to as an operational taxonomic unit (OTU)
• The distance between every pair of OTUs is calculated
• The evolutionarily closest pair is found
Most closely related group is found
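The clustering loop described here can be sketched as follows. This is a minimal UPGMA illustration under stated assumptions: the four-OTU distance table is made up, the tree is returned as nested tuples, and the function name `upgma` is a hypothetical choice rather than any package's API.

```python
def upgma(names, distances):
    """Cluster OTUs by repeatedly joining the closest pair.

    distances maps (name_a, name_b) -> distance for every pair of OTUs.
    Returns (tree, merges): tree is nested tuples, merges lists
    (cluster_a, cluster_b, node_height) in the order the joins happened.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of OTUs it holds
    d = {frozenset(pair): value for pair, value in distances.items()}
    merges = []
    while len(sizes) > 1:
        closest = min(d, key=d.get)           # closest pair of clusters
        a, b = sorted(closest, key=str)       # fixed order for readable output
        new = (a, b)
        merges.append((a, b, d[closest] / 2))  # node sits halfway between them
        for other in sizes:
            if other == a or other == b:
                continue
            da = d[frozenset((a, other))]
            db = d[frozenset((b, other))]
            # Arithmetic average weighted by cluster size ("unweighted" per OTU).
            d[frozenset((new, other))] = (sizes[a] * da + sizes[b] * db) / (
                sizes[a] + sizes[b]
            )
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        sizes[new] = sizes.pop(a) + sizes.pop(b)
    (tree,) = sizes
    return tree, merges


# Hypothetical distances between four OTUs.
toy = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}
tree, merges = upgma(["A", "B", "C", "D"], toy)
print(tree)
for a, b, height in merges:
    print(f"join {a} and {b} at height {height}")
```

With these toy distances A and B are joined first, then C, then D.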
• Jukes & Cantor method: any change is scored equivalently.
• Kimura 2-parameter model: transitions and transversions are scored differently, since transitions are 2-20 times more common than transversions.
• Transition: change of a pyrimidine nucleotide into another pyrimidine, or of a purine nucleotide into another purine.
• Transversion: change of a pyrimidine nucleotide into a purine nucleotide or vice versa. Transversions are 2-20 times rarer than transitions.
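The two corrections can be compared on a small example. The sketch below uses the standard textbook formulas, d = -(3/4) ln(1 - 4p/3) for Jukes-Cantor and d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)) for Kimura 2-parameter, where p is the proportion of differing sites, P the proportion of transitions and Q the proportion of transversions. The sequences and function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Label a pair of aligned bases as 'same', 'transition' or 'transversion'."""
    if a == b:
        return "same"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def proportions(seq_a, seq_b):
    """Return (P, Q): proportions of transition and transversion sites."""
    kinds = [classify(a, b) for a, b in zip(seq_a, seq_b)]
    n = len(kinds)
    return kinds.count("transition") / n, kinds.count("transversion") / n


def jukes_cantor(seq_a, seq_b):
    """Every kind of change is treated alike."""
    P, Q = proportions(seq_a, seq_b)
    p = P + Q
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq_a, seq_b):
    """Transitions and transversions are corrected separately."""
    P, Q = proportions(seq_a, seq_b)
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


# Hypothetical pair: one transition (C/T) and one transversion (G/C) in 10 sites.
seq_x = "AAGGCTAACG"
seq_y = "AAGGTTAACC"
print(proportions(seq_x, seq_y))
print(jukes_cantor(seq_x, seq_y), kimura_2p(seq_x, seq_y))
```

Both corrected distances come out larger than the raw proportion of differing sites (0.2), because the models allow for multiple changes at the same site.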
• Based on the minimum evolution concept
• Searches for the closest neighbor or sub-tree
• Starts with the search for the two closest taxa connected by a single node
Neighbor-joining
This method is a least-squares distance-matrix method.
     A     B     C     D     E
A    -     -     -     -     -
B    0.10  -     -     -     -
C    0.19  0.21  -     -     -
D    0.25  0.25  0.25  -     -
E    0.24  0.26  0.25  0.05  -
The closest neighbors in the distance matrix are D and E (0.05), so these branches are joined.
The distances from all other sequences to D and E are then averaged to reduce the distance matrix.
Now the closest neighbors are A and B, so join them.
That's it! If there were more sequences, you'd re-reduce the matrix as before and repeat the process over and over until all of the nodes were resolved.
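The walk-through can be reproduced with the matrix from this slide. The sketch below follows the slide's simplified description (join the closest pair, then average). As an assumption beyond the slide, it also evaluates the rate-corrected Q criterion that the full neighbor-joining algorithm uses to choose a pair, Q(i,j) = (n - 2) d(i,j) - r_i - r_j, where r_i is the sum of distances from i to all other taxa. All function names are hypothetical.

```python
# Distances copied from the matrix above.
DIST = {
    ("A", "B"): 0.10,
    ("A", "C"): 0.19, ("B", "C"): 0.21,
    ("A", "D"): 0.25, ("B", "D"): 0.25, ("C", "D"): 0.25,
    ("A", "E"): 0.24, ("B", "E"): 0.26, ("C", "E"): 0.25, ("D", "E"): 0.05,
}


def to_lookup(pairs):
    """Key each distance by an unordered pair of labels."""
    return {frozenset(k): v for k, v in pairs.items()}


def closest_pair(d):
    """Pair with the smallest distance (the slide's selection rule)."""
    return min(d, key=d.get)


def reduce_by_averaging(d, pair):
    """Merge a pair and average its distances to every other label."""
    a, b = sorted(pair)
    merged = a + b  # e.g. "DE"
    others = {x for p in d for x in p} - {a, b}
    reduced = {p: v for p, v in d.items() if not (p & pair)}
    for o in sorted(others):
        reduced[frozenset((merged, o))] = (
            d[frozenset((a, o))] + d[frozenset((b, o))]
        ) / 2
    return reduced


def nj_q_pick(d):
    """Pair minimizing the neighbor-joining Q criterion."""
    taxa = sorted({x for p in d for x in p})
    n = len(taxa)
    total = {t: sum(v for p, v in d.items() if t in p) for t in taxa}
    q = {p: (n - 2) * v - sum(total[t] for t in p) for p, v in d.items()}
    return min(q, key=q.get)


d0 = to_lookup(DIST)
first = closest_pair(d0)
d1 = reduce_by_averaging(d0, first)
second = closest_pair(d1)
print(sorted(first), sorted(second), sorted(nj_q_pick(d0)))
```

For this matrix the simplified rule and the Q criterion agree on the first join (D and E); on other data sets they can differ, which is what distinguishes neighbor-joining from simple closest-pair clustering.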
[Figure: NJ tree. New cluster in the Cystofilobasidium lineage (Class Hymenomycetes), based on ITS-D1/D2 region of LSU rDNA sequence data. Scale bar: 0.1. Numbers at the nodes (59 to 100) are support values.]
Reference sequences: Cystofilobasidium infirmominiatum AB072226, Cryptococcus macerans AB032642, Cystofilobasidium ferigula AB032628, Cystofilobasidium bisporidii AB072225, Udeniomyces pseudopyricola AY841862, Cystofilobasidium capitatum AJ508233, Cystofilobasidium lari-marini AY052486, Cystofilobasidium infirmominiatum DQ645523
New cluster (isolates from litter): ID05-Y033, ID05-Y075, ID05-Y064, ID05-Y050, ID05-Y031, ID05-Y076, ID05-Y048, ID05-Y034, ID05-Y046, ID05-Y045, ID05-Y025
a. Parsimony means thrift or stinginess
b. Based on corresponding (aligned) sequence positions
c. Uses only informative positions
d. Finds the tree that requires the fewest mutational events
e. Calculates branch order, not branch length
f. Advantage: calculations are rapid; can infer ancestral sequences
g. Disadvantage: a large amount of data is discarded, which is a problem for short sequences or sequences without many informative sites
Parsimony
• The tree that requires the smallest number of sequence changes is taken to be the most likely tree.
• No distance matrix is calculated; instead, trees are searched and each ancestral sequence is calculated, then the number of "mutations" required is added up.
• Testing every possible tree is not usually possible, so a variety of search algorithms are used to examine only the most likely trees.
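Counting the changes a given tree requires can be sketched with Fitch's algorithm, a standard way to score a tree under parsimony (the slides do not name a specific algorithm, so its use here is an assumption). The four-taxon alignment, the tree encoding as nested tuples and the function names are all hypothetical.

```python
from collections import Counter


def fitch_site(tree, site):
    """Return (possible ancestral states, changes needed) for one aligned site.

    tree is a taxon name or a pair of subtrees; site maps taxon -> base.
    """
    if isinstance(tree, str):
        return {site[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch_site(left, site)
    right_states, right_changes = fitch_site(right, site)
    common = left_states & right_states
    if common:  # children can share a state: no extra change here
        return common, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Total number of changes the tree requires over all sites."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


def is_informative(column):
    """A site is informative if at least two states each occur at least twice."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


# Hypothetical alignment of four taxa.
alignment = {"w": "AGCT", "x": "AGCA", "y": "TGTA", "z": "TGTT"}
trees = {
    "((w,x),(y,z))": (("w", "x"), ("y", "z")),
    "((w,y),(x,z))": (("w", "y"), ("x", "z")),
    "((w,z),(x,y))": (("w", "z"), ("x", "y")),
}
scores = {label: parsimony_score(t, alignment) for label, t in trees.items()}
informative = [is_informative(col) for col in zip(*alignment.values())]
print(scores, informative)
```

With only four taxa all three possible unrooted topologies can be scored exhaustively; for larger data sets the search algorithms mentioned above are needed.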
a. A purely statistical method
b. Uses every site, unlike parsimony, since unchanged sites have a chance of having changed and then changed back
c. For each possible tree, the likelihood of the changes is calculated, and the probabilities for each aligned position are multiplied to get the tree likelihood
d. The tree with the maximum likelihood is the most probable tree
e. Disadvantage: very slow to calculate; only as good as the substitution model used
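How per-site probabilities combine into a tree likelihood can be sketched as below. This is an illustration only: it assumes the Jukes-Cantor model (the simplest substitution model) with fixed, made-up branch lengths, and it evaluates two given trees rather than searching for the best one. The tree encoding and function names are hypothetical.

```python
import math

BASES = "ACGT"


def jc_prob(a, b, t):
    """Jukes-Cantor probability that base a is read as base b after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def partial(node, site):
    """Likelihood of the data below a node, for each possible base at that node.

    node is a taxon name or a pair ((child, branch_length), (child, branch_length)).
    """
    if isinstance(node, str):
        return {x: 1.0 if x == site[node] else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:
        below = partial(child, site)
        for x in BASES:
            result[x] *= sum(jc_prob(x, y, t) * below[y] for y in BASES)
    return result


def site_likelihood(tree, site):
    """Average over the four equally likely bases at the root."""
    return sum(0.25 * value for value in partial(tree, site).values())


def log_likelihood(tree, alignment):
    """Multiply site likelihoods across positions (done as a sum of logs)."""
    length = len(next(iter(alignment.values())))
    return sum(
        math.log(site_likelihood(tree, {n: s[i] for n, s in alignment.items()}))
        for i in range(length)
    )


# Hypothetical alignment and two candidate trees, all branch lengths 0.1.
alignment = {"w": "AACC", "x": "AACC", "y": "AAGG", "z": "AAGG"}
tree_1 = (((("w", 0.1), ("x", 0.1)), 0.1), ((("y", 0.1), ("z", 0.1)), 0.1))
tree_2 = (((("w", 0.1), ("y", 0.1)), 0.1), ((("x", 0.1), ("z", 0.1)), 0.1))
print(log_likelihood(tree_1, alignment), log_likelihood(tree_2, alignment))
```

The tree that groups the identical sequences together gets the higher log-likelihood. Real programs also optimize the branch lengths and model parameters for every candidate tree, which is a large part of why the method is slow.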
• For a particular base position, different candidate trees are obtained. The score for each tree is then calculated by comparing all the bases with its outermost node using the HKY model. The tree with the minimum score is the most likely.
