Dated:
THEORY:
Clustal is a widely used computer program for multiple sequence alignment. The latest
version is 2.1. There are two main variations:
• ClustalW: This version has a command-line interface.
• ClustalX: This version has a graphical user interface. It is available for Windows, Mac OS,
and Unix/Linux.
This program is available from the Clustal Homepage or the European Bioinformatics Institute FTP
server. It accepts a wide range of input formats, including NBRF/PIR, FASTA,
EMBL/Swissprot, Clustal, GCG/MSF, GCG9 RSF, and GDE. The output format can be one or more
of the following: Clustal, NBRF/PIR, GCG/MSF, PHYLIP, GDE, or NEXUS. There are three main
steps:
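1. Do a pairwise alignment
2. Create a guide tree (or use a user-defined tree)
3. Use the guide tree to carry out a multiple alignment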
These are done automatically when you select "Do Complete Alignment". Other options
are "Do Alignment from guide tree" and "Produce guide tree only". Users can align the
sequences using the default settings, but occasionally it may be useful to customize the
parameters. The main parameters are the gap opening penalty and the gap extension penalty.
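To make the two penalties concrete, here is a minimal sketch of an affine gap cost. It assumes the common convention in which the first gapped position costs the opening penalty and each further position costs the extension penalty. The function name `affine_gap_cost` is hypothetical, and Clustal itself adjusts these penalties further (for example by position in the alignment), so this illustrates the idea rather than the program's exact arithmetic. The defaults are the slow/accurate pairwise values used in this practical.

```python
def affine_gap_cost(gap_length: int,
                    gap_open: float = 10.0,
                    gap_extend: float = 0.1) -> float:
    """Cost of one contiguous gap under a simple affine model (assumed convention)."""
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0.0
    # The first gapped position pays the opening penalty;
    # every additional position pays only the extension penalty.
    return gap_open + gap_extend * (gap_length - 1)


# One long gap is much cheaper than several short ones of the same total length.
one_long_gap = affine_gap_cost(6)
three_short_gaps = 3 * affine_gap_cost(2)
print(one_long_gap, three_short_gaps)
```

With a high opening penalty and a low extension penalty, the aligner therefore prefers a few long gaps over many scattered short ones.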
Steps:
Fetch three or more nucleotide or protein sequences in FASTA format from
www.ncbi.nlm.nih.gov
1) Homo sapiens keratin associated protein 9-1 (KRTAP9-1), mRNA
NCBI Reference Sequence: NM_001190460.1
2) Mus musculus microRNA 101a (Mir101a), microRNA
NCBI Reference Sequence: NR_029537.1
3) Mus musculus microRNA 380 (Mir380), microRNA
NCBI Reference Sequence: NR_029881.1
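The same FASTA records can also be retrieved programmatically instead of through the website. The sketch below assumes the NCBI E-utilities `efetch` endpoint with the `nuccore` database; the helper names `efetch_url` and `fetch_fasta` are hypothetical, and the download itself needs network access, so it is left as a commented usage line.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build an efetch URL that returns one nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # accession with version, e.g. NR_029537.1
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_BASE + "?" + urlencode(params)


def fetch_fasta(accession: str) -> str:
    """Download the FASTA record for one accession (requires network access)."""
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")


# Accessions listed above.
accessions = ["NM_001190460.1", "NR_029537.1", "NR_029881.1"]
urls = [efetch_url(acc) for acc in accessions]

# Usage (uncomment to download and concatenate the records):
# fasta_text = "".join(fetch_fasta(acc) for acc in accessions)
```

The concatenated FASTA text can then be pasted into the sequence box of the ClustalW form.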
General Setting Parameters:
Output Format: CLUSTAL
Sequence type: PROTEIN or DNA
Supported Formats: FASTA (Pearson), NBRF/PIR, EMBL/Swiss Prot, GDE, CLUSTAL, and GCG/MSF
Pairwise Alignment Parameters:
For FAST/APPROXIMATE:
K-tuple (word) size: 1, Window size: 5, Gap Penalty: 3
Number of Top Diagonals: 5, Scoring Method: PERCENT
For SLOW/ACCURATE:
Gap Open Penalty: 10.0, Gap Extension Penalty: 0.1
Select Weight Matrix: BLOSUM (for PROTEIN)
(Note that only parameters for the algorithm specified by the above "Pairwise Alignment" are valid.)
Multiple Alignment Parameters:
Gap Open Penalty: 10, Gap Extension Penalty: 0.05
Weight Transition: YES (Value: 0.5), NO
Hydrophilic Residues for Proteins: GPSNDQE
Sequences entered (with labels):
>gi|262205863|ref|NR_029537.1| Mus musculus microRNA 101a
AGGCTGCCCTGGCTCAGTTATCACAGTGCTGATGCTGTCCATT
AGGATGGCAGCCA
CLUSTALW Result
CLUSTAL W (1.81) Multiple Sequence Alignments
clustalw.aln
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. ATGACCCACTGCTGTTCCCCTTGCTGTCAGCCTACATGCTGCAGGACCAC
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. CTGCTGCAGGACAACCTGCTGGAAGCCCACCACTGTGACCACCTGCAGCA
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. GCACACCCTGTTGCCAGCCCTCCTGCTGTGTGCCCAGCTGCTGCCAGCCT
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. TGCTGCCACCCAACTTGCTGTCAAAACACCTGCTGCAGGACCACCTGCTG
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. CCAGCCCACTTGTGTGGCCAGCTGCTGCCAGCCTTCCTGCTGCAGCACAC
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. CCTGCTGCCAGCCCACCTGCTGTGGGTCCAGCTGCTGTGGCCAAACCAGC
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. TGTGGGTCCAGCTGCTGTCAGCCTATTTGTGGGTCCAGTTGCTGTCAGCC
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. TTGCTGTCACCCGACTTGCTATCAAACTATCTGCTTCAGGACCACCTGCT
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. GCCAGCCTACCTGCTGCCAGCCCACCTGCTGCAGGAACACCTCTTGCCAG
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. CCCACCTGCTGTGGGTCCAGCTGCTGCCAGCCTTGCTGCCACCCAACATG
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. CTGTCAAACCATTTGTAGATCCACCTGCTGCCAACCATCCTGTGTGACCA
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. GATGCTGCAGCACACCCTGTTGCCAGCCAACCTGTGGTGGGTCCAGCTGC
gi|262205850|ref|NR_029534.1|_ --------------------------------------------------
gi|262206297|ref|NR_029881.1|_ --------------------------------------------------
gi|262205863|ref|NR_029537.1|_ --------------------------------------------------
gi|299473752|ref|NM_001190460. TGTAGCCAAACCTGCAATGAGTCCAGCTATTGTCTGCCTTGCTGCCGTCC
gi|262205850|ref|NR_029534.1|_ -----------------------------------------------CTA
gi|262206297|ref|NR_029881.1|_ ----------AAGATGGT-TGACCATAGAAC----ATGCGCTACTTCTGT
gi|262205863|ref|NR_029537.1|_ -AGGCTGCCCTGGCTCAG-TTATCACAGTGC----TGATGCTGTCCATTC
gi|299473752|ref|NM_001190460. CACCTGCTGCCAGACCAC-CTGCTACAGGACCACCTGTTGCCGCCCCAGC
gi|262205850|ref|NR_029534.1|_ AGCCAAGTTTCAGTTCATGTAAACATCCTACACTCAGCTGTCATACATGC
* * * *
gi|262206297|ref|NR_029881.1|_ GTCGTATGTAGTATGGTCCACATCTT------------------------
gi|262205863|ref|NR_029537.1|_ TAAAGGTACAGTACTGTG-ATAACTGAAG--GATGGCAGCCA--------
gi|299473752|ref|NM_001190460. TGTTGCTGCAGTCCTTGCTGTGTCTCCAGCTGCTGCCAGCCTTCCTGCTG
gi|262205850|ref|NR_029534.1|_ GTTGGCTGGGATGTGGATGTTTACGTCAGCTGTCTTGGAGTAT-------
* * *
gi|262206297|ref|NR_029881.1|_ ----
gi|262205863|ref|NR_029537.1|_ ----
gi|299473752|ref|NM_001190460. CTAA
gi|262205850|ref|NR_029534.1|_ ----
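In Clustal output, an asterisk under a column means that every sequence has the same base at that position. The following minimal sketch recomputes that conservation line for one block of the alignment above. The function name `conservation_line` is hypothetical, and only the fully conserved `*` mark is reproduced.

```python
def conservation_line(rows):
    """Return a string with '*' under every fully conserved column."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        first = column[0]
        # A column counts as conserved only if no sequence has a gap there
        # and every sequence carries the same base.
        conserved = first != "-" and all(base == first for base in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


# One 50-column block copied from the alignment above (labels removed).
block = [
    "----------AAGATGGT-TGACCATAGAAC----ATGCGCTACTTCTGT",  # NR_029881.1
    "-AGGCTGCCCTGGCTCAG-TTATCACAGTGC----TGATGCTGTCCATTC",  # NR_029537.1
    "CACCTGCTGCCAGACCAC-CTGCTACAGGACCACCTGTTGCCGCCCCAGC",  # NM_001190460
    "AGCCAAGTTTCAGTTCATGTAAACATCCTACACTCAGCTGTCATACATGC",  # NR_029534.1
]

line = conservation_line(block)
conserved_columns = [i for i, mark in enumerate(line) if mark == "*"]
print(len(conserved_columns), conserved_columns)
```

For this block the sketch finds four conserved columns, which agrees with the four asterisks printed beneath it in the output.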
clustalw.dnd
(
(
gi|262205850|ref|NR_029534.1|_:0.45552,
gi|299473752|ref|NM_001190460.:0.39864)
:0.04161,
gi|262206297|ref|NR_029881.1|_:0.36737,
gi|262205863|ref|NR_029537.1|_:0.38673);
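The clustalw.dnd file holds the guide tree in Newick format, where each leaf is written as label:branch_length. As a minimal sketch, the leaf branch lengths can be pulled out with a regular expression. The helper name `leaf_branch_lengths` is hypothetical, and the pattern only covers simple trees like this one (no quoted labels or comments), so it is not a general Newick parser.

```python
import re

# Guide tree from clustalw.dnd above, joined onto one line.
newick = (
    "((gi|262205850|ref|NR_029534.1|_:0.45552,"
    "gi|299473752|ref|NM_001190460.:0.39864):0.04161,"
    "gi|262206297|ref|NR_029881.1|_:0.36737,"
    "gi|262205863|ref|NR_029537.1|_:0.38673);"
)


def leaf_branch_lengths(tree: str) -> dict:
    """Map each leaf label to its branch length.

    Internal branches such as '):0.04161' have no label in front of the
    colon, so the pattern skips them.
    """
    pattern = re.compile(r"([^(),:;\s]+):([0-9.]+)")
    return {label: float(length) for label, length in pattern.findall(tree)}


leaves = leaf_branch_lengths(newick)
shortest = min(leaves, key=leaves.get)
for label, length in sorted(leaves.items(), key=lambda item: item[1]):
    print(f"{length:.5f}  {label}")
```

The parentheses show that NR_029534.1 and NM_001190460 are grouped on one internal branch, while the other two sequences attach directly to the root.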
Result interpretation:
The data above show the multiple sequence alignment of four nucleotide sequences.
The sequence format is Pearson (FASTA), and sequences 1, 2, 3 and 4 are 96, 61, 753 and 83 bp long, respectively.
Pairwise alignment results:
Sequences (1:2) Aligned. Score: 13
Sequences (2:3) Aligned. Score: 19
Sequences (2:4) Aligned. Score: 24
Sequences (3:4) Aligned. Score: 16
Sequences (1:3) Aligned. Score: 14
Sequences (1:4) Aligned. Score: 12
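The pairwise scores above can be ranked to see which pair of sequences is most similar. The sketch below parses the score lines exactly as reported and picks the highest-scoring pair; the helper name `parse_pairwise_scores` is hypothetical.

```python
import re

report = """\
Sequences (1:2) Aligned. Score: 13
Sequences (2:3) Aligned. Score: 19
Sequences (2:4) Aligned. Score: 24
Sequences (3:4) Aligned. Score: 16
Sequences (1:3) Aligned. Score: 14
Sequences (1:4) Aligned. Score: 12
"""


def parse_pairwise_scores(text: str) -> dict:
    """Map each (i, j) sequence pair to its reported pairwise score."""
    pattern = re.compile(r"Sequences \((\d+):(\d+)\) Aligned\. Score:\s*(\d+)")
    return {(int(i), int(j)): int(score) for i, j, score in pattern.findall(text)}


scores = parse_pairwise_scores(report)
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])
```

Here the highest score is 24 for sequences 2 and 4, and the lowest is 12 for sequences 1 and 4.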