
TREE CONSTRUCTION

1. Definition of a phylogenetic tree
2. Features of a phylogenetic tree
   Branches
   Nodes (External & Internal)
3. Unrooted trees
4. Rooted trees
   Choice of an outgroup
5. Inferred and true trees
6. Gene trees are not the same as species trees
7. Tree reconstruction
7.1 Molecular sequences
7.2 Sequence alignment is the essential preliminary to tree reconstruction
7.3 Converting the alignment data into a phylogenetic tree
7.4 Assessing accuracy of a reconstructed tree
7.5 Molecular clocks enable the time of divergence of ancestral sequences to be estimated
8. The applications (examples) of molecular phylogenetics
   Clarifying evolutionary relationships between humans & other primates
   The oldest life on earth (rRNA and phylogeny)
   The origin of AIDS
   Problems with prions
   Molecular phylogeny as a tool for the study of human prehistory
   Intraspecific studies require highly variable genetic loci
   The origins of modern humans - out of Africa or not?
   The patterns of more recent migration into Europe are also controversial
   Prehistoric human migration into the New World
Exercises for drawing a phylogenetic tree
Software packages for reconstruction of phylogenetic trees
1. Definition of a phylogenetic tree
A tree is an acyclic connected graph that consists of a collection of nodes (internal and external)
and branches connecting them so that every node can be reached by a unique path from every
other node.
Figure: An unrooted phylogenetic tree joining 4 taxonomic units.
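The graph definition above can be checked mechanically. The following is a minimal sketch, assuming the tree is given as a list of node names and a list of branches (pairs of nodes); the internal node names X and Y and the topology are hypothetical, chosen only to give a 4-taxon unrooted tree.

```python
from collections import deque

def is_tree(nodes, branches):
    """Return True if the undirected graph is connected and acyclic."""
    nodes = list(nodes)
    # A connected graph on n nodes is acyclic exactly when it has n - 1 branches.
    if len(branches) != len(nodes) - 1:
        return False
    neighbours = {n: set() for n in nodes}
    for a, b in branches:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Breadth-first search from any node must reach every other node.
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)

# Hypothetical unrooted tree: external nodes A-D, internal nodes X and Y.
nodes = ["A", "B", "C", "D", "X", "Y"]
branches = [("A", "X"), ("B", "X"), ("X", "Y"), ("C", "Y"), ("D", "Y")]
print(is_tree(nodes, branches))
```

Adding a branch between two external nodes creates a cycle, and removing a branch disconnects the graph; in both cases the function reports that the graph is no longer a tree.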
2. Features of a phylogenetic tree
In the area of phylogenetic inference, trees are used as visual displays that represent
hypothetical, reconstructed evolutionary events. The tree in this case consists of:
Internal nodes, which represent taxonomic units such as species or genes; the external nodes,
those at the ends of the branches, represent living organisms.
The lengths of the branches usually represent an elapsed time, measured in years, or the
number of molecular changes (e.g. mutations) that have taken place between the two nodes.
This number is calculated from the degree of difference when sequences are compared
(refer to 'alignments' later).
Sometimes the lengths are irrelevant and the tree represents only the order of evolution. In
a dendrogram, only the lengths of horizontal (or vertical, as the case may be) branches
count.
Finally, the tree may be rooted or unrooted.
3. Unrooted trees
An unrooted tree simply represents phylogenetic relationships but does not provide an evolutionary
path. In an unrooted tree, an external node represents a contemporary organism. Internal nodes
represent common ancestors of some of the external nodes. In this case, the tree shows the
relationship between organisms A, B, C & D and does not tell us anything about the series of
evolutionary events that led to these genes (see figure above). There is also no way to tell
whether or not a given internal node is a common ancestor of any two external nodes.
4. Rooted trees
In a rooted tree, one internal node is designated as the root and, in essence, becomes the
common ancestor of all the external nodes. An outgroup enables the root of a tree to be
located and the correct evolutionary pathway to be identified. In the above case, five
different evolutionary pathways are possible using an outgroup, each depicted by a different
rooted tree.
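The count of five follows from the fact that an unrooted tree of four external nodes has five branches, and a root can be placed on any one of them. The sketch below enumerates those rootings as Newick strings. It assumes a hypothetical topology in which A and B join one internal node (X) and C and D join the other (Y); the function names are illustrative only.

```python
def subtree(adj, node, parent):
    """Newick text for the part of the tree reached from `node` when coming from `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return node  # external node
    return "(" + ",".join(sorted(subtree(adj, c, node) for c in children)) + ")"

def rootings(branches):
    """Place a root on each branch in turn and return one Newick string per rooted tree."""
    adj = {}
    for a, b in branches:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    trees = []
    for a, b in branches:
        # The new root splits the branch (a, b); each side becomes one child of the root.
        left, right = sorted([subtree(adj, a, b), subtree(adj, b, a)])
        trees.append("(" + left + "," + right + ");")
    return trees

# Hypothetical unrooted tree: A and B attach to X, C and D attach to Y.
branches = [("A", "X"), ("B", "X"), ("X", "Y"), ("C", "Y"), ("D", "Y")]
trees = rootings(branches)
for number, tree in enumerate(trees, start=1):
    print(number, tree)
```

Rooting on the internal branch gives the balanced tree ((A,B),(C,D)); rooting on any of the four external branches makes that external node the first lineage to diverge.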
[Figure: unrooted tree of A, B, C and D (box) with its branches, internal nodes and external nodes labelled, and the rooted trees numbered 1 to 5.]
Figure: The five rooted trees that can be drawn from the unrooted tree (box). The positions of
the roots are indicated by the numbers on the outline of the unrooted tree (box).
The criteria used to choose an outgroup depend very much on the type of analysis that is
carried out. Suppose that 4 homologous (orthologous) genes in a tree come from human,
chimpanzee, gorilla and orangutan. A useful homologous primate outgroup sequence is that
from baboon, as palaeontological evidence suggests that baboons branched away from the
lineage leading to human, chimpanzee, gorilla and orangutan before the time of the common
ancestor of the four species (figure below).
Figure: The use of an outgroup to root a phylogenetic tree (taxa shown: baboon, orangutan,
gorilla, human, chimpanzee).
5. Inferred and true trees
We refer to the rooted tree given above as an inferred tree. This is to emphasise that it depicts the
series of evolutionary events that are inferred from the data that were analysed, and may not
necessarily be the same as the true tree, the one that depicts the actual series of events that
occurred. Sometimes we can be fairly confident that the inferred tree is the true tree, but most
phylogenetic data analyses are prone to uncertainties. Degrees of confidence can be assigned to the
branching patterns in an inferred tree using bootstrap analysis (discussed in a later section). Due to
the imprecise nature of phylogenetic analysis, controversies have arisen.
6. Gene trees are not the same as species trees
The above tree is a gene tree, i.e. a tree derived by comparing orthologous sequences (those
derived from the same ancestral sequence). The assumption is that this gene tree is a more accurate
reflection of a species tree than the one that can be inferred from morphological data. This
assumption is generally correct, but it does not mean that the gene tree is the same as a species tree.
Mutation and speciation are not expected to occur at the same time. For example, the mutation
event could precede the speciation event. This would mean that, to begin with, both alleles will
still be present in the same unsplit population of the ancestral species. When the population split
occurs, it is likely that both alleles will be present in each of the resulting groups. After the split,
the new populations evolve independently. One possibility is that, as a result of random genetic
drift, one allele is lost from one population and the other allele is lost from the second population.
This establishes the two separate genetic lineages that were inferred from phylogenetic
analysis of the gene. How do these considerations affect the coincidence between a gene tree and a
species tree?
(a) If a molecular clock is used to date the time at which gene divergence took place, then it cannot
be assumed that this is also the time of the speciation event. A significant difference between the
time of a gene divergence and the time of a speciation event can exist even though the species tree
& gene tree look the same (see LHS figure (a) below).
(b) If the first speciation event is followed closely by a second speciation event in one of the two
populations, then the branching order of the gene tree might be different to that of the species
tree. This can occur if the genes in the modern species are derived from alleles that had already
appeared before the first of the two speciation events (RHS figure below).
[Figure, LHS: Gene tree & species tree look the same. However, mutation might precede speciation, giving an incorrect time for the latter if a molecular clock is used. RHS: A gene tree can have a different branching order from a species tree. Labels in both panels: mutation, speciation, allele loss; taxa A, B, C.]
7. Tree reconstruction
In any molecular phylogenetic reconstruction the following five points need to be addressed.
Molecular sequences
Sequence alignment is the essential preliminary to tree reconstruction
Converting the alignment data into a phylogenetic tree
Assessing accuracy of a reconstructed tree
Molecular clocks enable the time of divergence of ancestral sequences to be estimated
7.1 Molecular sequences
Nucleic acids (rRNA, DNA) and protein sequences are used in molecular phylogenetic tree
construction. DNA yields more phylogenetic information than protein and has become by far the
predominant molecule for phylogeny:
More statistical information from DNA data: The nucleotide sequences of a pair of
homologous genes have a higher information content than the amino acid sequences of the
corresponding proteins, because mutations that result in synonymous
changes affect only the DNA sequence. With DNA, coding as well as non-coding regions
of the genome can be examined. Write out the DNA sequences for the following two
amino acid sequences as an example of this. You can see that at the protein level there is only
1 difference but at the nucleic acid level there are 3 differences.

Protein1 -gly-ala-ile-leu-asp-arg-
DNA1 -gga-gcc-ata-tta-gat-aga-
DNA2 -gga-gca-att-ttt-gat-aga-
Protein2 -gly-ala-ile-phe-asp-arg-
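The counts in this example can be reproduced with a short script. This is a minimal sketch: the codon table is deliberately partial (it holds only the codons that occur in the two DNA sequences above, with their standard genetic code assignments), and the function names are illustrative.

```python
# Partial codon table: only the codons that occur in the example above.
CODONS = {
    "gga": "gly", "gcc": "ala", "gca": "ala",
    "ata": "ile", "att": "ile", "tta": "leu",
    "ttt": "phe", "gat": "asp", "aga": "arg",
}

def codons(dna):
    """Split a hyphen-separated sequence such as '-gga-gcc-' into its codons."""
    return dna.strip("-").split("-")

def translate(dna):
    """Translate each codon with the partial table."""
    return [CODONS[c] for c in codons(dna)]

def count_differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

dna1 = "-gga-gcc-ata-tta-gat-aga-"
dna2 = "-gga-gca-att-ttt-gat-aga-"

protein_diffs = count_differences(translate(dna1), translate(dna2))
nucleotide_diffs = count_differences("".join(codons(dna1)), "".join(codons(dna2)))
print(protein_diffs, nucleotide_diffs)
```

Two of the three nucleotide differences (gcc/gca and ata/att) are synonymous, which is why only the leu/phe change is visible at the protein level.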

Ease of sequencing DNA: Samples for DNA sequencing can be prepared by PCR,
which is an extremely easy technique.
Protein electrograms, restriction fragment length polymorphism (RFLP), simple sequence
length polymorphism (SSLP), single nucleotide polymorphism (SNP) and DNA-DNA
hybridization data have also been used for molecular phylogenetic reconstruction.
Immunological data from cross-reactivity studies were used in 1904 for such work as well.
7.2 Sequence alignment is the essential preliminary to tree reconstruction
This is the most important step in molecular phylogeny and a number of issues have to be
considered:
Sequence homologs: Sequences that are to be aligned should be homologs. An example of
this is the globin genes of different vertebrates. This is to satisfy the phylogenetic criterion,
which states that the sequences should be derived from a common ancestral sequence.
Non-homologous sequences: If the sequences are not homologous, and hence do not share a
common ancestor, phylogenetic construction methods will always produce a tree but the tree
will not be of any biological relevance. This type of error commonly occurs when
undertaking homology analysis to assign functions to newly generated gene sequences.
BLAST is used extensively as one of the homology analysis methods and hence interpretation
of the data arising from the analysis should be undertaken with care.
Easy alignments: Correctly aligning the homologous sequences is the next task. In some
cases it is an easy task. A simple sequence alignment is shown below:
Sequence 1 AGCAATGGCCAGACAATAATG
Sequence 2 AGCTATGGACAGACATTAATG
           *** **** ****** *****
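The row of asterisks under an alignment marks the columns where the two sequences carry the same nucleotide. A minimal sketch of how that line can be generated is given below; the function name match_line is illustrative, and the "-" character is assumed to denote a gap.

```python
def match_line(seq1, seq2):
    """Return a string with '*' under identical columns and ' ' elsewhere."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    # A gap character never counts as a match.
    return "".join(
        "*" if a == b and a != "-" else " "
        for a, b in zip(seq1, seq2)
    )

seq1 = "AGCAATGGCCAGACAATAATG"
seq2 = "AGCTATGGACAGACATTAATG"
line = match_line(seq1, seq2)
print(seq1)
print(seq2)
print(line)
print(line.count("*"), "identical positions out of", len(seq1))
```

For the pair above, 18 of the 21 positions are identical, so the alignment contains three point differences and no gaps.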
Difficult alignments: If sequences have evolved and diverged by accumulating insertions
and deletions as well as point mutations, then these sequences are not always easy to align.
Insertions and deletions cannot be distinguished when pairs of sequences are aligned, so we
refer to them as indels. Below is a pair of difficult sequences for alignment, where placing
the indel at the correct location can become a problem. Both placements give the same
number of identical positions (17).
Sequence 1 GACGACCATAGACCAGCATAG
Sequence 2 GACTACCATAGA-CTGCAAAG
           *** ******** * *** **

Sequence 1 GACGACCATAGACCAGCATAG
Sequence 2 GACTACCATAGACT-GCAAAG
           *** *********  *** **
The dot matrix technique for alignment: Some alignments can easily be done by "eyeballing"
the sequences, yet others may require a pen and paper. The simplest technique is known as the
dot matrix method. The two sequences are written out on the x- and y-axes of a sheet of graph
paper, and dots are placed at the positions corresponding to identical nucleotides in the two
sequences. The alignment is indicated by a diagonal series of dots, broken by empty squares
where the sequences have nucleotide differences, and shifting from one column to another where
indels occur.
Figure: The dot matrix technique for sequence alignments, comparing the sequences AGACATTTAGACCAA and AGACATTAGACCAA; two possible positions for the indel are marked.
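The dot matrix itself is simple to compute. The sketch below assumes that the two sequences are the ones recovered from the axis labels of the figure, and that the longer one runs along the x-axis; both points are assumptions, as are the function names.

```python
def dot_matrix(seq_x, seq_y):
    """One row per nucleotide of seq_y; '.' where it equals the nucleotide of seq_x."""
    return ["".join("." if x == y else " " for x in seq_x) for y in seq_y]

def print_dot_matrix(seq_x, seq_y):
    """Print the matrix with seq_x along the top and seq_y down the side."""
    print("  " + " ".join(seq_x))
    for y, row in zip(seq_y, dot_matrix(seq_x, seq_y)):
        print(y + " " + " ".join(row))

seq_x = "AGACATTTAGACCAA"  # assumed x-axis sequence
seq_y = "AGACATTAGACCAA"   # assumed y-axis sequence
rows = dot_matrix(seq_x, seq_y)
print_dot_matrix(seq_x, seq_y)
```

In the printed matrix the main diagonal is unbroken for the first seven nucleotides and then continues one column to the right. The shift happens inside the run of T nucleotides, which is why the indel can be placed in more than one position.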
The similarity approach is a mathematically based alignment technique: The similarity approach
(Needleman and Wunsch, 1970) aims to maximise the number of identical matched
nucleotides in the two sequences. The distance method (Waterman, 1976), on the other
hand, minimises the number of mismatches. Often the two approaches will identify the same
alignment as being the best one.
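To make the similarity idea concrete, here is a minimal sketch of a global alignment by dynamic programming in the style of Needleman and Wunsch. The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not values given in these notes, and real alignment programs use more elaborate scoring schemes.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # prefix of a aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # prefix of b aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right cell to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))

best, top, bottom = needleman_wunsch("GACGACCATAGACCAGCATAG", "GACTACCATAGACTGCAAAG")
print(best)
print(top)
print(bottom)
```

When several alignments share the best score, as with the two indel placements in the difficult example, the traceback reports only one of them; which one depends on the order in which the three moves are tried.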
Multiple alignments are generated for more than two sequences: Rarely can one do multiple
alignments with a pen and paper, and all the steps required for phylogenetic analysis are
undertaken on a computer. Several computer programs are available for automatically
generating multiple alignments (discussed later).
rRNA genes (also known as rDNA) and rRNA have been used as molecular chronometers in
phylogenetic studies. Refer to the section on rRNA for detailed notes on the
methods of aligning these types of nucleic acids.
7.3 Converting the alignment data into a phylogenetic tree
