http://www.icp.ucl.ac.be/~opperd/private/parsimony.html
11/20/2013 6:18 AM
Maximum parsimony
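                    Site
           _________________________
Sequence   1  2  3  4  5  6  7  8  9
1          A  A  G  A  G  T  G  C  A
2          A  G  C  C  G  T  G  C  G
3          A  G  A  T  A  T  C  C  A
4          A  G  A  G  A  T  C  C  G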
For four OTUs there are three possible unrooted trees. The trees are then analysed by searching for the
ancestral sequences and by counting the number of mutations required to explain the respective trees as
shown below:
(1) AAGAGTGCA                           AGATATCCA (3)
             \4                       2/
              \           4           /
               AGCCGTGCG --- AGAGATCCG
              /                       \
             /0                       0\
(2) AGCCGTGCG                           AGAGATCCG (4)

Tree I: 11 mutations

(1) AAGAGTGCA                           AGCCGTGCG (2)
             \1                       3/
              \           5           /
               AGGAGTGCA --- AGAGGTCCG
              /                       \
             /4                       1\
(3) AGATATCCA                           AGAGATCCG (4)

Tree II: 14 mutations

(1) AAGAGTGCA                           AGCCGTGCG (2)
             \1                       3/
              \           5           /
               AGGAGTGCA --- AGATGTCCG
              /                       \
             /5                       2\
(4) AGAGATCCG                           AGATATCCA (3)

Tree III: 16 mutations
Tree I has the topology requiring the fewest mutations and is therefore the most parsimonious tree.
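The three unrooted topologies for four OTUs can be listed mechanically: OTU 1 is paired with each of the other three OTUs in turn, and the remaining two form the second pair. The sketch below is only an illustration; the function name `quartet_topologies` is hypothetical and not part of any program mentioned here.

```python
def quartet_topologies(otus):
    """Return the three unrooted topologies of four OTUs as two pairs each."""
    if len(otus) != 4:
        raise ValueError("expected exactly four OTUs")
    first = otus[0]
    topologies = []
    for partner in otus[1:]:
        # Pair the first OTU with one partner; the other two form the second pair.
        left = (first, partner)
        right = tuple(o for o in otus[1:] if o != partner)
        topologies.append((left, right))
    return topologies


for name, (left, right) in zip(["I", "II", "III"], quartet_topologies([1, 2, 3, 4])):
    print(f"Tree {name}: {left} | {right}")
# Tree I: (1, 2) | (3, 4)
# Tree II: (1, 3) | (2, 4)
# Tree III: (1, 4) | (2, 3)
```

The order of the output matches the numbering of trees I, II and III used in this example.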
NB: The above analysis is based on all the sites in the sequence alignment. However, a number of the
sites are non-informative and therefore do not have to be included in the analysis. When only
informative sites are included, far fewer sites need to be analysed, which for large datasets means a
considerable saving in CPU time.
Informative sites
The definition of an informative site is as follows. A site is informative only when there are at least two
different kinds of nucleotides at the site, each of which is represented in at least two of the sequences
under study.
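As a minimal sketch, this definition can be applied column by column to the four hypothetical sequences of the example. The names `SEQUENCES` and `informative_sites` are illustrative assumptions, not taken from any parsimony program.

```python
from collections import Counter

# The four hypothetical sequences of the example, OTUs 1 to 4.
SEQUENCES = ["AAGAGTGCA", "AGCCGTGCG", "AGATATCCA", "AGAGATCCG"]


def informative_sites(sequences):
    """Return the 1-based positions of the informative sites in an alignment."""
    sites = []
    for position, column in enumerate(zip(*sequences), start=1):
        counts = Counter(column)
        # Nucleotides that occur in at least two of the sequences at this site.
        shared = [nt for nt, n in counts.items() if n >= 2]
        if len(shared) >= 2:
            sites.append(position)
    return sites


print(informative_sites(SEQUENCES))  # [5, 7, 9]
```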
To illustrate the distinction between informative and non-informative sites, let's have a look at the same
four hypothetical sequences as above.
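                    Site
           _________________________
Sequence   1  2  3  4  5  6  7  8  9
1          A  A  G  A  G  T  G  C  A
2          A  G  C  C  G  T  G  C  G
3          A  G  A  T  A  T  C  C  A
4          A  G  A  G  A  T  C  C  G
                       *     *     *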
There are three possible unrooted trees for four OTUs (tree I, II and III, see figure below). Site 1 is not
informative because all sequences at this site have A, so that no change is required in any of the three
possible trees. At site 2, sequence 1 has A while all other sequences have G, and so a simple assumption is
that the nucleotide has changed from G to A in the lineage leading to sequence 1. Thus, this site is also not
informative, because each of the three possible trees requires 1 change. As shown in the figure, for site 3
each of the three possible trees requires 2 changes and so it is also not informative. Note that if we assume
that the nucleotide at the node connecting OTUs 1 and 2 in tree I is C (or A) instead of G, the number of
changes required for the tree remains 2. The figure shows that for site 4 each of the three trees requires 3
changes and thus site 4 is also non-informative. For site 5, tree I requires only 1 change, whereas trees II
and III require 2 changes each (Figure c). Therefore, this site is informative.
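The per-site counts above can be checked with a small brute-force sketch: for one site and one tree, try every pair of nucleotides at the two internal nodes and keep the assignment that needs the fewest changes. This is an illustration under assumed names (`SEQUENCES`, `TREES`, `min_changes`), and it only handles the four-OTU case.

```python
from itertools import product

SEQUENCES = {1: "AAGAGTGCA", 2: "AGCCGTGCG", 3: "AGATATCCA", 4: "AGAGATCCG"}
# Each tree is written as the two pairs of OTUs joined by the internal branch.
TREES = {"I": ((1, 2), (3, 4)), "II": ((1, 3), (2, 4)), "III": ((1, 4), (2, 3))}


def min_changes(site, tree):
    """Minimum number of changes at a 1-based site on a four-OTU tree."""
    (a, b), (c, d) = tree
    na, nb, nc, nd = (SEQUENCES[otu][site - 1] for otu in (a, b, c, d))
    # x and y are the candidate nucleotides at the two internal nodes;
    # each comparison counts 1 when a branch carries a change.
    return min(
        (na != x) + (nb != x) + (x != y) + (nc != y) + (nd != y)
        for x, y in product("ACGT", repeat=2)
    )


for site in range(1, 6):
    print(site, [min_changes(site, TREES[name]) for name in ("I", "II", "III")])
# 1 [0, 0, 0]
# 2 [1, 1, 1]
# 3 [2, 2, 2]
# 4 [3, 3, 3]
# 5 [1, 2, 2]
```

Only site 5 gives different counts for the three trees, which is what makes it informative.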
From these examples, we see that, as far as molecular data are concerned, a site is informative only when
there are at least two different kinds of nucleotides at the site, each of which is represented in at least two
of the sequences under study. In the above example, informative sites are indicated by an asterisk (*).
Below are the four sequences and the three corresponding possible trees, built from the informative
sites only:
Sequence   Sites 5, 7, 9
1          GGA
2          GGG
3          ACA
4          ACG
           ***

(1) GGA               ACA (3)
       \1           1/
        \     2     /
         GGG --- ACG
        /           \
       /0           0\
(2) GGG               ACG (4)

Tree I: 4 mutations

(1) GGA               GGG (2)
       \1           1/
        \     1     /
         GCA --- GCG
        /           \
       /1           1\
(3) ACA               ACG (4)

Tree II: 5 mutations

(1) GGA               GGG (2)
       \2           1/
        \     0     /
         GCG --- GCG
        /           \
       /1           2\
(4) ACG               ACA (3)

Tree III: 6 mutations
To infer a maximum parsimony tree, for each possible tree we calculate the minimum number of
substitutions at each informative site. In the above example, for sites 5, 7, and 9, tree I requires in total 4
changes, tree II requires 5 changes, and tree III requires 6 changes. In the final step, we sum the number
of changes over all the informative sites for each tree and choose the tree associated with the smallest
number of substitutions. In our case, tree I is chosen because it requires the smallest number of changes
(4) at the informative sites.
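The procedure just described can be sketched with Fitch's counting method, a standard way of obtaining the minimum number of changes per site; the text does not name a specific algorithm, so its use here is an assumption. Each unrooted tree is rooted on its internal branch purely for the recursion, which does not affect the count. The names `INFORMATIVE`, `TREES`, `fitch` and `tree_length` are illustrative.

```python
# Informative sites 5, 7 and 9 of OTUs 1 to 4.
INFORMATIVE = {1: "GGA", 2: "GGG", 3: "ACA", 4: "ACG"}
TREES = {"I": ((1, 2), (3, 4)), "II": ((1, 3), (2, 4)), "III": ((1, 4), (2, 3))}


def fitch(node, column):
    """Return (possible states, number of changes) for the subtree at node."""
    if not isinstance(node, tuple):
        return {column[node]}, 0  # a leaf: its observed nucleotide, no change
    left_states, left_cost = fitch(node[0], column)
    right_states, right_cost = fitch(node[1], column)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed somewhere below this node.
    return left_states | right_states, left_cost + right_cost + 1


def tree_length(tree, sequences):
    """Sum the minimum number of changes over all sites of the alignment."""
    n_sites = len(next(iter(sequences.values())))
    total = 0
    for i in range(n_sites):
        column = {otu: seq[i] for otu, seq in sequences.items()}
        total += fitch(tree, column)[1]
    return total


lengths = {name: tree_length(tree, INFORMATIVE) for name, tree in TREES.items()}
print(lengths)                          # {'I': 4, 'II': 5, 'III': 6}
print(min(lengths, key=lengths.get))    # I
```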
In the case of four OTUs, an informative site favours only one of the three possible alternative trees. For
example, site 5 favours tree I over trees II and III, and is said to support tree I. It is easy to see that the
tree supported by the largest number of informative sites is the maximum parsimony tree. For instance, in
the above example, tree I is supported by 2 sites, tree II by one site, and tree III by none.
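For four OTUs, the tree supported by an informative site can be read directly from the pattern of nucleotides: the two OTUs that share one nucleotide are grouped together, as are the two that share the other. The sketch below counts support this way; `supported_tree` and `COLUMNS` are hypothetical names used only for illustration.

```python
# Nucleotides of OTUs 1 to 4 at the informative sites of the example.
COLUMNS = {5: "GGAA", 7: "GGCC", 9: "AGAG"}


def supported_tree(column):
    """Return the tree ('I', 'II' or 'III') supported by a four-OTU site, or None."""
    n1, n2, n3, n4 = column
    if len(set(column)) != 2:
        return None  # an informative four-OTU site has exactly two nucleotides
    if n1 == n2 and n3 == n4:
        return "I"    # OTUs 1 and 2 together, 3 and 4 together
    if n1 == n3 and n2 == n4:
        return "II"   # OTUs 1 and 3 together, 2 and 4 together
    if n1 == n4 and n2 == n3:
        return "III"  # OTUs 1 and 4 together, 2 and 3 together
    return None       # a 3:1 pattern is not informative


support = {"I": 0, "II": 0, "III": 0}
for site, column in COLUMNS.items():
    tree = supported_tree(column)
    if tree is not None:
        support[tree] += 1

print(support)  # {'I': 2, 'II': 1, 'III': 0}
```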
Maximum parsimony searches for the optimal (minimal) tree. In this process more than one minimal tree
may be found. To guarantee finding the best possible tree, an exhaustive evaluation of all possible
tree topologies has to be carried out. However, this becomes impractical when there are more than 12
OTUs in a dataset, because the number of possible topologies grows very rapidly with the number of OTUs.
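To give a feeling for why exhaustive evaluation breaks down, the number of unrooted bifurcating trees for n OTUs is the product 3 x 5 x ... x (2n - 5). The short sketch below evaluates it; the function name `unrooted_tree_count` is an illustrative choice.

```python
def unrooted_tree_count(n_otus):
    """Number of unrooted bifurcating trees for n_otus labelled OTUs (n >= 3)."""
    if n_otus < 3:
        raise ValueError("at least three OTUs are needed")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_otus - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 12):
    print(n, unrooted_tree_count(n))
# 4 3
# 5 15
# 10 2027025
# 12 654729075
```

With four OTUs there are the three trees of the example; with 12 OTUs there are already more than 650 million.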
Branch and bound is a variation on maximum parsimony that guarantees finding the minimal tree without
having to evaluate all possible trees: any partial tree that already requires more changes than the best
complete tree found so far is discarded. This way a larger number of taxa can be evaluated, but the
method is still limited.
Heuristic search is a method that uses step-wise addition and rearrangement (branch swapping) of OTUs.
It is not guaranteed to find the best tree.
Since the size of the dataset often makes an exhaustive or other exact search for the best tree
impossible, it is advised to change the order of the taxa in the dataset and repeat the analysis, or to
instruct the program to do this for you by providing a so-called jumble factor.
Consensus tree
Since the Maximum Parsimony method may result in more than one equally parsimonious tree, a
consensus tree should be created. For the creation of a consensus tree see bootstrapping.
[Figure: unrooted trees for individual sites of the four OTUs, with the number of changes marked on each branch, as referred to in the section on informative sites.]