
Practical no:

Dated:

Multiple Sequence Alignment using ClustalW

THEORY:
Clustal is a widely used computer program for multiple sequence alignment. The latest
version is 2.1. There are two main variations:

• ClustalW: command line interface

• ClustalX: This version has a graphical user interface. It is available for Windows, Mac OS,
and Unix/Linux.
The program is available from the Clustal homepage or the European Bioinformatics Institute FTP
server. It accepts a wide range of input formats, including NBRF/PIR, FASTA,
EMBL/Swiss-Prot, Clustal, GCG/MSF, GCG9/RSF, and GDE. The output can be in one or more
of the following formats: Clustal, NBRF/PIR, GCG/MSF, PHYLIP, GDE, or NEXUS. There are three main
steps:
1. Do a pairwise alignment

2. Create a phylogenetic tree (or use a user-defined tree)

3. Use the phylogenetic tree to carry out a multiple alignment

These steps are carried out automatically when you select "Do Complete Alignment". Other options
are "Do Alignment from guide tree" and "Produce guide tree only". Users can align the
sequences with the default settings, but it is occasionally useful to customize the
parameters. The main parameters are the gap opening penalty and the gap extension penalty.
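
The same two gap penalties can also be set when ClustalW is run from the command line. The sketch below only assembles the argument list and does not run anything. The executable name clustalw2 and the file names are assumptions (some installations call the program clustalw); -INFILE, -ALIGN, -TYPE, -GAPOPEN, -GAPEXT and -OUTFILE are standard ClustalW options.

```python
def build_clustalw_command(infile, outfile, seq_type="DNA",
                           gap_open=10.0, gap_ext=0.05,
                           executable="clustalw2"):
    """Return the argument list for a complete ClustalW alignment.

    The executable name is an assumption; adjust it to the local installation.
    """
    return [
        executable,
        f"-INFILE={infile}",      # input sequences (e.g. FASTA)
        "-ALIGN",                 # do a complete multiple alignment
        f"-TYPE={seq_type}",      # DNA or PROTEIN
        f"-GAPOPEN={gap_open}",   # multiple-alignment gap opening penalty
        f"-GAPEXT={gap_ext}",     # multiple-alignment gap extension penalty
        f"-OUTFILE={outfile}",    # where the alignment is written
    ]


# Hypothetical file names, used only for illustration.
cmd = build_clustalw_command("sequences.fasta", "clustalw.aln")
print(" ".join(cmd))
```

If ClustalW is installed, the list could be passed to subprocess.run(cmd, check=True) to perform the alignment.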

Steps:
• Fetch the FASTA format of the nucleotide or protein sequences (more than two) from
www.ncbi.nlm.nih.gov
1) Homo sapiens keratin associated protein 9-1 (KRTAP9-1), mRNA
NCBI Reference Sequence: NM_001190460.1

>gi|299473752|ref|NM_001190460.1| Homo sapiens keratin associated protein 9-1 (KRTAP9-1), mRNA
ATGACCCACTGCTGTTCCCCTTGCTGTCAGCCTACATGCTGCAGGACCACCTGCTGCAGGACAACCTGCT
GGAAGCCCACCACTGTGACCACCTGCAGCAGCACACCCTGTTGCCAGCCCTCCTGCTGTGTGCCCAGCTG
CTGCCAGCCTTGCTGCCACCCAACTTGCTGTCAAAACACCTGCTGCAGGACCACCTGCTGCCAGCCCACT
TGTGTGGCCAGCTGCTGCCAGCCTTCCTGCTGCAGCACACCCTGCTGCCAGCCCACCTGCTGTGGGTCCA
GCTGCTGTGGCCAAACCAGCTGTGGGTCCAGCTGCTGTCAGCCTATTTGTGGGTCCAGTTGCTGTCAGCC
TTGCTGTCACCCGACTTGCTATCAAACTATCTGCTTCAGGACCACCTGCTGCCAGCCTACCTGCTGCCAG
CCCACCTGCTGCAGGAACACCTCTTGCCAGCCCACCTGCTGTGGGTCCAGCTGCTGCCAGCCTTGCTGCC
ACCCAACATGCTGTCAAACCATTTGTAGATCCACCTGCTGCCAACCATCCTGTGTGACCAGATGCTGCAG
CACACCCTGTTGCCAGCCAACCTGTGGTGGGTCCAGCTGCTGTAGCCAAACCTGCAATGAGTCCAGCTAT
TGTCTGCCTTGCTGCCGTCCCACCTGCTGCCAGACCACCTGCTACAGGACCACCTGTTGCCGCCCCAGCT
GTTGCTGCAGTCCTTGCTGTGTCTCCAGCTGCTGCCAGCCTTCCTGCTGCTAA

2) Mus musculus microRNA 101a (Mir101a), microRNA
NCBI Reference Sequence: NR_029537.1

>gi|262205863|ref|NR_029537.1| Mus musculus microRNA 101a (Mir101a), microRNA
AGGCTGCCCTGGCTCAGTTATCACAGTGCTGATGCTGTCCATTCTAAAGGTACAGTACTGTGATAACTGA
AGGATGGCAGCCA

3) Mus musculus microRNA 380 (Mir380), microRNA
NCBI Reference Sequence: NR_029881.1

>gi|262206297|ref|NR_029881.1| Mus musculus microRNA 380 (Mir380), microRNA
AAGATGGTTGACCATAGAACATGCGCTACTTCTGTGTCGTATGTAGTATGGTCCACATCTT

4) Mus musculus microRNA 30b (Mir30b), microRNA
NCBI Reference Sequence: NR_029534.1

>gi|262205850|ref|NR_029534.1| Mus musculus microRNA 30b (Mir30b), microRNA
CTAAGCCAAGTTTCAGTTCATGTAAACATCCTACACTCAGCTGTCATACATGCGTTGGCTGGGATGTGGA
TGTTTACGTCAGCTGTCTTGGAGTAT
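
The records above can also be retrieved and checked programmatically. The following is a minimal sketch, not part of the original practical: it builds the NCBI E-utilities efetch URL for an accession (db, id, rettype and retmode are documented efetch parameters) and parses FASTA text into sequences so that their lengths can be checked. The function names are hypothetical, and the example does not contact the network; it parses two of the records shown above.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build the E-utilities URL that returns one nucleotide record as FASTA."""
    query = {"db": "nuccore", "id": accession,
             "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(query)


def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # join wrapped sequence lines
    return records


example = """>gi|262205863|ref|NR_029537.1| Mus musculus microRNA 101a (Mir101a), microRNA
AGGCTGCCCTGGCTCAGTTATCACAGTGCTGATGCTGTCCATTCTAAAGGTACAGTACTGTGATAACTGA
AGGATGGCAGCCA
>gi|262206297|ref|NR_029881.1| Mus musculus microRNA 380 (Mir380), microRNA
AAGATGGTTGACCATAGAACATGCGCTACTTCTGTGTCGTATGTAGTATGGTCCACATCTT
"""

records = parse_fasta(example)
for header, seq in records.items():
    print(header.split("|")[3], len(seq), "bp")

print(efetch_fasta_url("NR_029537.1"))
```

The lengths obtained this way (83 bp and 61 bp) agree with the sequence lengths that ClustalW reports in its log further down.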

• Open the ClustalW multiple sequence alignment web page (it can be found by searching on www.google.com)

The page "Multiple Sequence Alignment by CLUSTALW" shows the following settings; the values listed are those displayed on the form.

General Setting Parameters:
• Output Format: CLUSTAL
• Pairwise Alignment: FAST/APPROXIMATE or SLOW/ACCURATE
• Enter your sequences (with labels) by copy and paste, or give the name of the file containing your query; select PROTEIN or DNA
• Supported formats: FASTA (Pearson), NBRF/PIR, EMBL/Swiss Prot, GDE, CLUSTAL, and GCG/MSF

Pairwise Alignment Parameters:
• For FAST/APPROXIMATE: K-tuple (word) size: 1, Window size: 5, Gap Penalty: 3, Number of Top Diagonals: 5, Scoring Method: PERCENT
• For SLOW/ACCURATE: Gap Open Penalty: 10.0, Gap Extension Penalty: 0.1, Weight Matrix: BLOSUM (for PROTEIN)
(Note that only the parameters for the algorithm selected under "Pairwise Alignment" are valid.)

Multiple Alignment Parameters:
• Gap Open Penalty: 10, Gap Extension Penalty: 0.05
• Weight Transition: YES (Value: 0.5) or NO
• Hydrophilic Residues for Proteins: GPSNDQE
• Hydrophilic Gaps: YES or NO
• Weight Matrix: BLOSUM (for PROTEIN)

Additional options can be typed in a separate box, delimited by whitespace (-options for help).
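
The FAST/APPROXIMATE pairwise method is based on matching short words (k-tuples) between two sequences. The sketch below only illustrates what a k-tuple is and how the word size changes the number of shared words; it is not ClustalW's scoring procedure, which also uses the window size, the number of top diagonals and the gap penalty listed on the form. The function names and the two toy sequences are hypothetical.

```python
def ktuples(seq, k):
    """Return the set of all overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_ktuples(seq_a, seq_b, k=2):
    """Count the distinct k-tuples that occur in both sequences."""
    return len(ktuples(seq_a, k) & ktuples(seq_b, k))


# Toy sequences for illustration only.
a = "ATGCGC"
b = "TGCGTA"
for k in (1, 2, 3):
    print(k, shared_ktuples(a, b, k))
```

A larger word size gives fewer, more specific matches, which is why it makes the approximate method faster but less sensitive.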

• Paste the FASTA-format nucleotide or protein sequences fetched above into the sequence box of the ClustalW page, select DNA or PROTEIN to match the sequences, and keep CLUSTAL as the output format

• Click on "Execute Multiple Alignment"

CLUSTALW Result
[clustalw.aln][clustalw.dnd][readme]
CLUSTAL W (1.81) Multiple Sequence Alignments

Sequence type explicitly set to DNA


Sequence format is Pearson
Sequence 1: gi|262205850|ref|NR_029534.1|_ 96 bp
Sequence 2: gi|262206297|ref|NR_029881.1|_ 61 bp
Sequence 3: gi|299473752|ref|NM_001190460. 753 bp
Sequence 4: gi|262205863|ref|NR_029537.1|_ 83 bp
Start of Pairwise alignments

Aligning...
Sequences (1:2) Aligned. Score: 13
Sequences (2:3) Aligned. Score: 19
Sequences (2:4) Aligned. Score: 24
Sequences (3:4) Aligned. Score: 16
Sequences (1:3) Aligned. Score: 14
Sequences (1:4) Aligned. Score: 12
Guide tree file created: [clustalw.dnd]
Start of Multiple Alignment
There are 3 groups
Aligning...
Group 1: Delayed
Group 2: Delayed
Group 3: Delayed
Sequence:4 Score:590
Sequence:3 Score:691
Sequence:1 Score:734
Alignment Score 675
CLUSTAL-Alignment file created [clustalw.aln]
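
The pairwise scores in this log can be collected for later comparison. The following is a small sketch, assuming the log lines have exactly the form shown above; the function name is hypothetical.

```python
import re

log = """Sequences (1:2) Aligned. Score: 13
Sequences (2:3) Aligned. Score: 19
Sequences (2:4) Aligned. Score: 24
Sequences (3:4) Aligned. Score: 16
Sequences (1:3) Aligned. Score: 14
Sequences (1:4) Aligned. Score: 12
"""

PAIR_LINE = re.compile(r"Sequences \((\d+):(\d+)\) Aligned\. Score:\s*(\d+)")


def pairwise_scores(text):
    """Map each (i, j) sequence pair to its reported pairwise score."""
    return {(int(i), int(j)): int(s) for i, j, s in PAIR_LINE.findall(text)}


scores = pairwise_scores(log)
best_pair = max(scores, key=scores.get)
worst_pair = min(scores, key=scores.get)
print("highest:", best_pair, scores[best_pair])
print("lowest:", worst_pair, scores[worst_pair])
```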

clustalw.aln

CLUSTAL W (1.81) multiple sequence alignment

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. ATGACCCACTGCTGTTCCCCTTGCTGTCAGCCTACATGCTGCAGGACCAC
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. CTGCTGCAGGACAACCTGCTGGAAGCCCACCACTGTGACCACCTGCAGCA
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. GCACACCCTGTTGCCAGCCCTCCTGCTGTGTGCCCAGCTGCTGCCAGCCT
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. TGCTGCCACCCAACTTGCTGTCAAAACACCTGCTGCAGGACCACCTGCTG
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. CCAGCCCACTTGTGTGGCCAGCTGCTGCCAGCCTTCCTGCTGCAGCACAC
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. CCTGCTGCCAGCCCACCTGCTGTGGGTCCAGCTGCTGTGGCCAAACCAGC
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. TGTGGGTCCAGCTGCTGTCAGCCTATTTGTGGGTCCAGTTGCTGTCAGCC
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. TTGCTGTCACCCGACTTGCTATCAAACTATCTGCTTCAGGACCACCTGCT
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. GCCAGCCTACCTGCTGCCAGCCCACCTGCTGCAGGAACACCTCTTGCCAG
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. CCCACCTGCTGTGGGTCCAGCTGCTGCCAGCCTTGCTGCCACCCAACATG
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. CTGTCAAACCATTTGTAGATCCACCTGCTGCCAACCATCCTGTGTGACCA
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. GATGCTGCAGCACACCCTGTTGCCAGCCAACCTGTGGTGGGTCCAGCTGC
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------

gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. TGTAGCCAAACCTGCAATGAGTCCAGCTATTGTCTGCCTTGCTGCCGTCC
gi|262205850|ref|NR_029534.1|_ -----------------------------------------------CTA

gi|262206297|ref|NR_029881.1|_ ----------AAGATGGT-TGACCATAGAAC----ATGCGCTACTTCTGT
gi|262205863|ref|NR_029537.1|_ -AGGCTGCCCTGGCTCAG-TTATCACAGTGC----TGATGCTGTCCATTC
gi|299473752|ref|NM_001190460. CACCTGCTGCCAGACCAC-CTGCTACAGGACCACCTGTTGCCGCCCCAGC
gi|262205850|ref|NR_029534.1|_ AGCCAAGTTTCAGTTCATGTAAACATCCTACACTCAGCTGTCATACATGC
                                           *           *     *        *

gi|262206297|ref|NR_029881.1|_ GTCGTATGTAGTATGGTCCACATCTT------------------------
gi|262205863|ref|NR_029537.1|_ TAAAGGTACAGTACTGTG-ATAACTGAAG--GATGGCAGCCA--------
gi|299473752|ref|NM_001190460. TGTTGCTGCAGTCCTTGCTGTGTCTCCAGCTGCTGCCAGCCTTCCTGCTG
gi|262205850|ref|NR_029534.1|_ GTTGGCTGGGATGTGGATGTTTACGTCAGCTGTCTTGGAGTAT-------
                                     *    *           *

gi|262206297|ref|NR_029881.1|_ ----
gi|262205863|ref|NR_029537.1|_ ----
gi|299473752|ref|NM_001190460. CTAA
gi|262205850|ref|NR_029534.1|_ ----

clustalw.dnd

(
(
gi|262205850|ref|NR_029534.1|_:0.45552,
gi|299473752|ref|NM_001190460.:0.39864)
:0.04161,
gi|262206297|ref|NR_029881.1|_:0.36737,
gi|262205863|ref|NR_029537.1|_:0.38673);
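
The guide tree file is written in Newick format, where each leaf appears as name:branch_length. As a rough sketch (a regular expression, not a full Newick parser), the leaf branch lengths can be extracted as follows; the unnamed internal branch (0.04161) is deliberately skipped, and the function name is hypothetical.

```python
import re

guide_tree = """(
(
gi|262205850|ref|NR_029534.1|_:0.45552,
gi|299473752|ref|NM_001190460.:0.39864)
:0.04161,
gi|262206297|ref|NR_029881.1|_:0.36737,
gi|262205863|ref|NR_029537.1|_:0.38673);
"""

# A leaf is a run of characters that are not Newick punctuation or
# whitespace, followed by ':' and a number.
LEAF = re.compile(r"([^\s(),:;]+):([0-9.]+)")


def leaf_branch_lengths(newick):
    """Return a dict mapping each leaf name to its branch length."""
    return {name: float(length) for name, length in LEAF.findall(newick)}


lengths = leaf_branch_lengths(guide_tree)
for name, length in lengths.items():
    print(name, length)
```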

Result interpretation:

The data above show the multiple sequence alignment of four nucleotide sequences.
The sequence format is Pearson (FASTA), and sequences 1, 2, 3 and 4 are 96, 61, 753 and 83 bp long, respectively.
Pairwise alignment results (the highest score, 24, is for sequences 2 and 4; the lowest, 12, is for sequences 1 and 4):
Sequences (1:2) Aligned. Score: 13
Sequences (2:3) Aligned. Score: 19
Sequences (2:4) Aligned. Score: 24
Sequences (3:4) Aligned. Score: 16
Sequences (1:3) Aligned. Score: 14
Sequences (1:4) Aligned. Score: 12
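
To make the meaning of such a score concrete, the sketch below computes a percent identity for one pairwise alignment. It assumes that the reported pairwise score is a percent identity (identical positions divided by the positions compared, with gap positions excluded). The two aligned strings are a toy example, not taken from the run above, so the result is not expected to reproduce the logged scores.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical positions among columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap positions are not compared
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy alignment: 5 columns compared, 4 of them identical.
print(percent_identity("ATG-CA", "ATGTCC"))
```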

Multiple alignment results:


Sequence:4 Score:590
Sequence:3 Score:691
Sequence:1 Score:734
Alignment Score 675

An asterisk (*) marks a column in which all four sequences have the same nucleotide, e.g.:


----------AAGATGGT-TGACCATAGAAC----ATGCGCTACTTCTGT
-AGGCTGCCCTGGCTCAG-TTATCACAGTGC----TGATGCTGTCCATTC
CACCTGCTGCCAGACCAC-CTGCTACAGGACCACCTGTTGCCGCCCCAGC
AGCCAAGTTTCAGTTCATGTAAACATCCTACACTCAGCTGTCATACATGC
            *           *     *        *
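
The asterisk line can be recomputed from the aligned rows. The following is a minimal sketch of that rule (a column is marked only when every row carries the same nucleotide and none has a gap), applied to the block shown above; the function name is hypothetical.

```python
def conservation_line(rows):
    """Return a string with '*' under every fully conserved column."""
    marks = []
    for column in zip(*rows):
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


block = [
    "----------AAGATGGT-TGACCATAGAAC----ATGCGCTACTTCTGT",
    "-AGGCTGCCCTGGCTCAG-TTATCACAGTGC----TGATGCTGTCCATTC",
    "CACCTGCTGCCAGACCAC-CTGCTACAGGACCACCTGTTGCCGCCCCAGC",
    "AGCCAAGTTTCAGTTCATGTAAACATCCTACACTCAGCTGTCATACATGC",
]

line = conservation_line(block)
for row in block:
    print(row)
print(line)

# 1-based positions of the conserved columns in this block.
positions = [i + 1 for i, mark in enumerate(line) if mark == "*"]
print(positions)
```

For this block, four columns are conserved, which matches the four asterisks in the ClustalW output.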
