Alignment
Alignment can be easy or
difficult
Easy:
GCGGCCCA TCAGGTAGTT GGTGG
GCGGCCCA TCAGGTAGTT GGTGG
GCGTTCCA TCAGCTGGTT GGTGG
GCGTCCCA TCAGCTAGTT GGTGG
GCGGCGCA TTAGCTAGTT GGTGA
******** ********** *****

Difficult due to insertions or deletions (indels):
TTGACATG CCGGGG---A AACCG
TTGACATG CCGGTG--GT AAGCC
TTGACATG -CTAGG---A ACGCG
TTGACATG -CTAGGGAAC ACGCG
TTGACATC -CTCTG---A ACGCG
******** ?????????? *****
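To make the contrast concrete, the sketch below labels each column of the difficult block (with the spaces between sub-blocks removed) as identical, gapped, or variable. The function name `classify_columns` and the label characters are illustrative assumptions, not part of any alignment tool, and the labels are stricter than the block markers drawn on the slide.

```python
def classify_columns(rows):
    """Label each alignment column.

    '*' = every sequence has the same residue,
    '-' = at least one sequence has a gap,
    '.' = no gap, but the residues differ.
    """
    labels = []
    for column in zip(*rows):
        if "-" in column:
            labels.append("-")
        elif len(set(column)) == 1:
            labels.append("*")
        else:
            labels.append(".")
    return "".join(labels)


# The "difficult" block from the slide, with the block-separating spaces removed.
difficult = [
    "TTGACATGCCGGGG---AAACCG",
    "TTGACATGCCGGTG--GTAAGCC",
    "TTGACATG-CTAGG---AACGCG",
    "TTGACATG-CTAGGGAACACGCG",
    "TTGACATC-CTCTG---AACGCG",
]

labels = classify_columns(difficult)
print(labels)
print("gapped columns:", labels.count("-"))
```

The gapped columns all fall in the middle block, which is the region the slide marks with question marks.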
Homology: Definition
• Homology: similarity that is the result of inheritance from a
common ancestor - identification and analysis of homologies is
central to phylogenetic systematics.
• An alignment is a hypothesis of positional homology between
bases/amino acids.
Multiple Sequence Alignment-
Goals
• To generate a concise, information-rich summary of
sequence data.
• Sometimes used to illustrate the dissimilarity
among a group of sequences.
• Alignments can be treated as models that can be
used to test hypotheses.
• Does this model of events accurately reflect known
biological evidence?
Alignment of 16S rRNA can be guided
by secondary structure
<---------------(--------------------HELIX 19---------------------)
<---------------(22222222-000000-111111-00000-111111-0000-22222222
Thermus ruber    UCCGAUGC-UAAAGA-CCGAAG=CUCAA=CUUCGG=GGGU=GCGUUGGA
Th. thermophilus UCCCAUGU-GAAAGA-CCACGG=CUCAA=CCGUGG=GGGA=GCGUGGGA
E.coli           UCAGAUGU-GAAAUC-CCCGGG=CUCAA=CCUGGG=AACU=GCAUCUGA
Ancyst.nidulans  UCUGUUGU-CAAAGC-GUGGGG=CUCAA=CCUCAU=ACAG=GCAAUGGA
B.subtilis       UCUGAUGU-GAAAGC-CCCCGG=CUCAA=CCGGGG=AGGG=UCAUUGGA
Chl.aurantiacus  UCGGCGCU-GAAAGC-GCCCCG=CUUAA=CGGGGC=GAGG=CGCGCCGA
match            **        ***        * ** ** *                 **
Progressive Alignment
• Devised by Feng and Doolittle in 1987.
• Essentially a heuristic method and as such
is not guaranteed to find the ‘optimal’
alignment.
• Requires (n-1)+(n-2)+...+1 = n(n-1)/2 pairwise
alignments as a starting point
• Most successful implementation is Clustal
(Des Higgins)
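As a quick check on the number of starting alignments, the sketch below enumerates every unordered pair of sequences and compares the count with n(n-1)/2. The helper name `pairwise_count` is an illustrative assumption; the five sequence names are the ones used in the ClustalW overview in this deck.

```python
from itertools import combinations


def pairwise_count(n):
    """Number of unordered pairs among n sequences: (n-1)+(n-2)+...+1."""
    return n * (n - 1) // 2


names = ["Hbb_Human", "Hbb_Horse", "Hba_Human", "Hba_Horse", "Myg_Whale"]

# Each pair listed here would need one pairwise alignment.
pairs = list(combinations(names, 2))

for first, second in pairs:
    print(first, "vs", second)
print(len(pairs), "pairwise alignments for", len(names), "sequences")
```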
Overview of ClustalW Procedure
CLUSTAL W

Quick pairwise alignment: calculate distance matrix

               1    2    3    4    5
Hbb_Human  1   -
Hbb_Horse  2  .17   -
Hba_Human  3  .59  .60   -
Hba_Horse  4  .59  .59  .13   -
Myg_Whale  5  .77  .77  .75  .75   -

Neighbor-joining tree (guide tree)
[Figure: guide tree with leaves Hbb_Human, Hbb_Horse, Hba_Human, Hba_Horse, Myg_Whale; internal nodes numbered 1-4]

Progressive alignment following guide tree
[Figure: alignment with alpha-helices marked, next to the guide tree]
1 PEEKSAVTALWGKVN--VDEVGG
2 GEEKAAVLALWDKVN--EEEVGG
3 PADKTNVKAAWGKVGAHAGEYGA
4 AADKTNVKAAWSKVGGHAGEYGA
5 EHEWQLVLHVWAKVEADVAGHGQ
ClustalW- Pairwise Alignments
[Figure: scoring-matrix paths; the per-step scores shown along each path are 1, 0 and -1]

Alignment using this path:
GATTC-
GAATTC

Optimal Alignment 1 (alignment using this path):
GA-TTC
GAATTC
Alignment score: 4

Optimal Alignment 2 (alignment using this path):
G-ATTC
GAATTC
Alignment score: 4
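The path-through-a-matrix idea can be sketched with a small global (Needleman-Wunsch style) dynamic-programming routine. This is a minimal illustration, not ClustalW's actual pairwise code. The scoring scheme (match +1, mismatch 0, gap -1) is an assumption inferred from the score of 4 reported for the optimal alignments, and the function name `global_alignment` is hypothetical.

```python
def global_alignment(a, b, match=1, mismatch=0, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the matrix: each cell is the best of diagonal, up, or left.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace one optimal path back from the bottom-right corner.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


best_score, aligned_a, aligned_b = global_alignment("GATTC", "GAATTC")
print(aligned_a)
print(aligned_b)
print("Alignment score:", best_score)
```

Several paths can share the best score, which is why the slide shows two optimal alignments; the traceback here reports only one of them, depending on its tie-breaking order.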
ClustalW- Guide Tree
• Generate a Neighbor-Joining
‘guide tree’ from these pairwise
distances.
• This guide tree gives the order
in which the progressive
alignment will be carried out.
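One way to read an alignment order off a guide tree is a post-order walk, so that every group is aligned before the group that contains it. The sketch below assumes a nested-tuple tree and a topology guessed from the overview distance matrix (the two beta globins together, the two alpha globins together, myoglobin last); both the representation and the function name `alignment_order` are illustrative assumptions.

```python
def alignment_order(tree, steps=None):
    """Post-order walk of a nested-tuple guide tree.

    Returns (label, steps), where steps lists the (left, right) pairs
    in an order that always aligns children before their parent.
    """
    if steps is None:
        steps = []
    if isinstance(tree, str):          # a leaf is a single sequence name
        return tree, steps
    left, _ = alignment_order(tree[0], steps)
    right, _ = alignment_order(tree[1], steps)
    steps.append((left, right))
    return f"({left},{right})", steps


# Assumed topology for the five globin sequences in the overview.
guide_tree = ((("Hbb_Human", "Hbb_Horse"), ("Hba_Human", "Hba_Horse")), "Myg_Whale")

label, steps = alignment_order(guide_tree)
for number, (left, right) in enumerate(steps, start=1):
    print(number, "align", left, "with", right)
```

The walk only guarantees that sub-groups come before their parents; it does not by itself sort sibling groups by distance.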
Neighbor joining method
• The neighbor-joining (NJ) method is a greedy heuristic which
joins, at each step, the two closest sub-trees that are not
already joined.
• It is based on the minimum evolution principle.
• One of the important concepts in the NJ method is
neighbors, which are defined as two taxa that are
connected by a single node in an unrooted tree.
[Figure: taxa A and B are neighbors, connected through Node 1]
What is required for the neighbor-joining method?
Distance matrix
Distance Matrix
PAM        Spinach    Rice  Mosquito  Monkey   Human
Spinach        0.0    84.9     105.6    90.8    86.3
Rice          84.9     0.0     117.8   122.4   122.6
Mosquito     105.6   117.8       0.0    84.7    80.8
Monkey        90.8   122.4      84.7     0.0     3.3
Human         86.3   122.6      80.8     3.3     0.0
First Step
PAM distance 3.3 (Human - Monkey) is the minimum, so we
join Human and Monkey into a single node, Mon-Hum, and
calculate the new distances from Mon-Hum to the other taxa.
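The first step can be sketched in code. The slides do not give the formula for the new distances, so this sketch assumes the standard neighbor-joining formulation: the pair to join minimizes Q(i,j) = (n-2)d(i,j) - sum_k d(i,k) - sum_k d(j,k), and the joined node u gets d(u,k) = (d(i,k) + d(j,k) - d(i,j)) / 2. The function names are hypothetical. For this matrix the Q criterion selects the same pair as the raw minimum distance.

```python
names = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
D = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]


def pick_pair(D):
    """Return indices (i, j) of the pair minimizing the NJ Q criterion."""
    n = len(D)
    totals = [sum(row) for row in D]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * D[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best[1], best[2]


def join_pair(D, names, i, j):
    """Replace taxa i and j by one node; return the reduced matrix and names."""
    keep = [k for k in range(len(D)) if k not in (i, j)]
    # Assumed standard NJ update for distances from the new node.
    new_row = [(D[i][k] + D[j][k] - D[i][j]) / 2 for k in keep]
    new_D = [[D[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
    new_D.append(new_row + [0.0])
    new_names = [names[k] for k in keep] + [f"{names[i]}-{names[j]}"]
    return new_D, new_names


i, j = pick_pair(D)
new_D, new_names = join_pair(D, names, i, j)
print("joined:", names[i], "and", names[j])
for name, distance in zip(new_names[:-1], new_D[-1]):
    print(f"{new_names[-1]} to {name}: {distance:.2f}")
```

Repeating `pick_pair` and `join_pair` on the reduced matrix until two nodes remain gives the remaining joins.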
Join order (from the tree-building figures):
1. Mon-Hum
2. Mos-(Mon-Hum)
3. Spin-Rice
4. (Spin-Rice)-(Mos-(Mon-Hum))
[Figure: final tree with leaves Human, Monkey, Mosquito, Spinach, Rice]
Multiple Alignment- First pair
• Align the two most closely-related
sequences first.
• This alignment is then ‘fixed’ and
will never change. If a gap is to be
introduced subsequently, then it will
be introduced in the same place in
both sequences, but their relative
alignment remains unchanged.
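The "fixed" behaviour can be illustrated with a helper that inserts a gap column into every row of an existing alignment at once, so the rows never shift relative to each other. The function name `insert_gap_column` and the example pair are illustrative assumptions.

```python
def insert_gap_column(aligned_rows, position):
    """Insert one gap at the same column in every row of a fixed alignment."""
    return [row[:position] + "-" + row[position:] for row in aligned_rows]


# A previously aligned (and now fixed) pair.
fixed_pair = ["GA-TTC", "GAATTC"]

# A later step needs an extra column after the third position.
widened = insert_gap_column(fixed_pair, 3)
for row in widened:
    print(row)

# Dropping the new column recovers the original pairwise alignment.
restored = [row[:3] + row[4:] for row in widened]
```

Because the gap goes into both rows at the same position, every residue pairing from the original alignment is preserved.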
ClustalW- Decision time
• Next, consult the guide tree to see which alignment is
performed next.
– Option 1: Align a third sequence to the first two
Or
– Option 2: Align two entirely different sequences to each other.
ClustalW- Alternative 1
• If a third sequence is
aligned to the first two,
then when a gap has to be
introduced to improve the
alignment, the two entities
(the fixed pair and the new
sequence) are each treated
as a single sequence.
ClustalW- Alternative 2
• If, on the other hand,
two separate sequences
have to be aligned
together, then the first
pairwise alignment is
placed to one side and the
pairwise alignment of the
other two is carried out.
ClustalW- Progression