
PRACTICAL: 01

/ / 201

TRANSCRIPTION AND TRANSLATION USING PERL

AIM:
To write a PERL program that finds the complement, reverse complement, transcription or translation of a DNA sequence, according to the user's choice.

SOFTWARE USED: Perl 5.16.2

SOURCE CODE:
    use strict;
    use warnings;

    # Standard genetic code: RNA codon => one-letter amino acid ('_' marks a stop codon)
    my %CodonMap = (
        'GCA'=>'A', 'GCC'=>'A', 'GCG'=>'A', 'GCU'=>'A', 'UGC'=>'C', 'UGU'=>'C',
        'GAC'=>'D', 'GAU'=>'D', 'GAA'=>'E', 'GAG'=>'E', 'UUC'=>'F', 'UUU'=>'F',
        'GGA'=>'G', 'GGC'=>'G', 'GGG'=>'G', 'GGU'=>'G', 'CAC'=>'H', 'CAU'=>'H',
        'AUA'=>'I', 'AUC'=>'I', 'AUU'=>'I', 'AAA'=>'K', 'AAG'=>'K', 'UUA'=>'L',
        'UUG'=>'L', 'CUA'=>'L', 'CUC'=>'L', 'CUG'=>'L', 'CUU'=>'L', 'AUG'=>'M',
        'AAC'=>'N', 'AAU'=>'N', 'CCA'=>'P', 'CCC'=>'P', 'CCG'=>'P', 'CCU'=>'P',
        'CAA'=>'Q', 'CAG'=>'Q', 'CGA'=>'R', 'CGC'=>'R', 'CGG'=>'R', 'CGU'=>'R',
        'AGA'=>'R', 'AGG'=>'R', 'UCA'=>'S', 'UCC'=>'S', 'UCG'=>'S', 'UCU'=>'S',
        'AGC'=>'S', 'AGU'=>'S', 'ACA'=>'T', 'ACC'=>'T', 'ACG'=>'T', 'ACU'=>'T',
        'GUA'=>'V', 'GUC'=>'V', 'GUG'=>'V', 'GUU'=>'V', 'UGG'=>'W', 'UAC'=>'Y',
        'UAU'=>'Y', 'UAA'=>'_', 'UAG'=>'_', 'UGA'=>'_',
    );

    # Clear the console ("cls" on Windows, "clear" elsewhere)
    sub clear_screen { system($^O eq 'MSWin32' ? 'cls' : 'clear'); }

    # Prompt for a DNA sequence and keep only the letters A, C, G and T
    sub read_dna {
        clear_screen();
        print "Enter the DNA sequence:\n";
        my $seq = <STDIN>;
        $seq = '' unless defined $seq;
        chomp($seq);
        $seq =~ s/[^acgt]//ig;
        return $seq;
    }

    sub Complement {
        my $seq = read_dna();
        $seq =~ tr/ATCGatcg/TAGCtagc/;
        print "\nComplement of the DNA sequence is:\n$seq\n";
    }

    sub RevComplement {
        my $seq = read_dna();
        $seq =~ tr/ATCGatcg/TAGCtagc/;
        $seq = reverse($seq);
        print "\nReverse complement of the DNA sequence is:\n$seq\n";
    }

    sub Transcription {
        my $seq = read_dna();
        $seq =~ tr/Tt/Uu/;
        print "\nTranscribed RNA sequence is:\n$seq\n";
    }

    sub Translation {
        my $seq = uc(read_dna());
        $seq =~ tr/T/U/;
        my $protein = "";
        # Read complete codons only; one or two trailing bases are ignored
        for (my $i = 0; $i < length($seq) - 2; $i += 3) {
            $protein .= $CodonMap{ substr($seq, $i, 3) };
        }
        print "\nTranslated protein sequence is:\n$protein\n";
    }

    my %action = (1 => \&Complement, 2 => \&RevComplement,
                  3 => \&Transcription, 4 => \&Translation);

    # Show the menu until the user chooses 0 (replaces the original goto loop)
    while (1) {
        clear_screen();
        print "\nCentral Dogma Menu:-\n";
        print "------------------\n";
        print "0. Exit\n";
        print "1. Complement\n";
        print "2. Reverse Complement\n";
        print "3. Transcription\n";
        print "4. Translation\n";
        print "\nEnter your choice: ";
        my $choice = <STDIN>;
        last unless defined $choice;
        $choice =~ s/^\s+|\s+$//g;
        last if $choice eq '0';
        if (exists $action{$choice}) {
            $action{$choice}->();
        } else {
            print "Enter a valid number !!!\n";
        }
        <STDIN>;    # wait for Enter before showing the menu again
    }

I M.Sc. Bioinformatics (2012-2014)
Lab in Programming in C, PERL and R
Department of Bioinformatics, Noorul Islam College of Arts and Science, Kumaracoil 629 180

INPUT/OUTPUT:
Central Dogma Menu:-
------------------
0. Exit
1. Complement
2. Reverse Complement
3. Transcription
4. Translation

Enter your choice: 4

Enter the DNA sequence:
ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCT

Translated protein sequence is:
TAVSILPGSGVMVHHQFSPSLLV

RESULT: A PERL program was written to find the complement, reverse complement, transcription or translation of a DNA sequence according to the user's choice, and it was executed successfully.
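The four sequence operations of this practical can also be written as small functions that take and return a string, which makes them easy to check without the menu. The following is only a sketch: the function names (clean_dna, complement, reverse_complement, transcribe) and the sample input are illustrative and are not part of the recorded program.

```perl
use strict;
use warnings;

# Keep only the DNA letters A, C, G and T (either case), as the menu program does.
sub clean_dna {
    my ($seq) = @_;
    $seq =~ s/[^acgt]//ig;
    return $seq;
}

# Complement: swap A<->T and C<->G, preserving case.
sub complement {
    my ($seq) = @_;
    $seq =~ tr/ATCGatcg/TAGCtagc/;
    return $seq;
}

# Reverse complement: complement the sequence, then read it backwards.
sub reverse_complement {
    my ($seq) = @_;
    my $comp = complement($seq);
    return scalar reverse $comp;
}

# Transcription as done in this practical: replace T with U.
sub transcribe {
    my ($seq) = @_;
    $seq =~ tr/Tt/Uu/;
    return $seq;
}

# Hypothetical input containing spaces, a dash and mixed case.
my $dna = clean_dna("ACC GCC-gtc\n");
print "Cleaned:            $dna\n";
print "Complement:         ", complement($dna), "\n";
print "Reverse complement: ", reverse_complement($dna), "\n";
print "Transcript:         ", transcribe($dna), "\n";
```

Keeping input, processing and printing apart in this way is a design choice rather than a requirement; the menu program could call these functions directly.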


PRACTICAL: 02

/ / 201

SIX READING FRAMES USING PERL

AIM:
To write a PERL program to translate a DNA sequence in all six reading frames.

SOFTWARE USED: Perl 5.16.2

SOURCE CODE:
    use strict;
    use warnings;

    system($^O eq 'MSWin32' ? 'cls' : 'clear');    # clear the console
    print "Six Reading Frames:-\n";
    print "------------------\n\n";
    print "Enter the DNA sequence:\n";
    my $seq = <STDIN>;
    $seq = '' unless defined $seq;
    chomp($seq);
    $seq =~ s/[^acgt]//ig;    # keep only A, C, G and T
    $seq = uc($seq);

    # The reverse frames are read from the reverse complement, i.e. the opposite
    # strand in its own 5'->3' direction (the earlier version only reversed the string)
    my $rev_seq = reverse($seq);
    $rev_seq =~ tr/ACGT/TGCA/;

    # Transcribe both strands
    $seq     =~ tr/T/U/;
    $rev_seq =~ tr/T/U/;

    # Standard genetic code: RNA codon => one-letter amino acid ('_' marks a stop codon)
    my %CodonMap = (
        'GCA'=>'A', 'GCC'=>'A', 'GCG'=>'A', 'GCU'=>'A', 'UGC'=>'C', 'UGU'=>'C',
        'GAC'=>'D', 'GAU'=>'D', 'GAA'=>'E', 'GAG'=>'E', 'UUC'=>'F', 'UUU'=>'F',
        'GGA'=>'G', 'GGC'=>'G', 'GGG'=>'G', 'GGU'=>'G', 'CAC'=>'H', 'CAU'=>'H',
        'AUA'=>'I', 'AUC'=>'I', 'AUU'=>'I', 'AAA'=>'K', 'AAG'=>'K', 'UUA'=>'L',
        'UUG'=>'L', 'CUA'=>'L', 'CUC'=>'L', 'CUG'=>'L', 'CUU'=>'L', 'AUG'=>'M',
        'AAC'=>'N', 'AAU'=>'N', 'CCA'=>'P', 'CCC'=>'P', 'CCG'=>'P', 'CCU'=>'P',
        'CAA'=>'Q', 'CAG'=>'Q', 'CGA'=>'R', 'CGC'=>'R', 'CGG'=>'R', 'CGU'=>'R',
        'AGA'=>'R', 'AGG'=>'R', 'UCA'=>'S', 'UCC'=>'S', 'UCG'=>'S', 'UCU'=>'S',
        'AGC'=>'S', 'AGU'=>'S', 'ACA'=>'T', 'ACC'=>'T', 'ACG'=>'T', 'ACU'=>'T',
        'GUA'=>'V', 'GUC'=>'V', 'GUG'=>'V', 'GUU'=>'V', 'UGG'=>'W', 'UAC'=>'Y',
        'UAU'=>'Y', 'UAA'=>'_', 'UAG'=>'_', 'UGA'=>'_',
    );

    # Translate $rna codon by codon, starting at the 0-based position $offset
    sub translate_frame {
        my ($rna, $offset) = @_;
        my $protein = "";
        for (my $i = $offset; $i < length($rna) - 2; $i += 3) {
            $protein .= $CodonMap{ substr($rna, $i, 3) };
        }
        return $protein;
    }

    for my $frame (1 .. 3) {
        print "\nForward Frame $frame:\n", translate_frame($seq, $frame - 1), "\n";
    }
    for my $frame (1 .. 3) {
        print "\nReverse Frame $frame:\n", translate_frame($rev_seq, $frame - 1), "\n";
    }
    <STDIN>;    # wait for Enter before the console window closes

INPUT/OUTPUT:
Six Reading Frames:-
------------------

Enter the DNA sequence:
ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCT

Forward Frame 1:
TAVSILPGSGVMVHHQFSPSLLV

Forward Frame 2:
PPSPFFQDPA_WCTTSFRPVFLS

Forward Frame 3:
RRLHSSRIRRNGAPPVFAQSSC

Reverse Frame 1:
RQEDWAKTGGAPLRRILEEWRRR

Reverse Frame 2:
DKKTGRKLVVHHYAGSWKNGDGG

Reverse Frame 3:
TRRLGENWWCTITPDPGRMETA

RESULT: A PERL program was written to translate a DNA sequence in all six reading frames, and it was executed successfully.
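The 64-entry codon table does not have to be typed by hand. A minimal sketch, assuming the standard genetic code listed in U, C, A, G order, builds it from a 64-letter string and then translates all six frames. The names six_frames, translate_frame and %codon_map are illustrative. Reading the reverse frames from the reverse complement strand is an assumption made here about what the three reverse frames are meant to represent.

```perl
use strict;
use warnings;

# Standard genetic code for codons UUU, UUC, UUA, UUG, UCU, ... GGG ('_' = stop).
my @bases = qw(U C A G);
my $amino = 'FFLLSSSSYY__CC_W' . 'LLLLPPPPHHQQRRRR'
          . 'IIIMTTTTNNKKSSRR' . 'VVVVAAAADDEEGGGG';
my %codon_map;
my $k = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_map{"$b1$b2$b3"} = substr($amino, $k++, 1);
        }
    }
}

# Translate one frame of an RNA string; $offset is 0, 1 or 2.
sub translate_frame {
    my ($rna, $offset) = @_;
    my $protein = '';
    for (my $i = $offset; $i + 3 <= length($rna); $i += 3) {
        $protein .= $codon_map{ substr($rna, $i, 3) };
    }
    return $protein;
}

# Return six translations: three forward frames, then three frames
# of the reverse complement strand.
sub six_frames {
    my ($dna) = @_;
    $dna = uc $dna;
    $dna =~ s/[^ACGT]//g;
    my $rc = reverse $dna;        # scalar assignment: reverses the string
    $rc =~ tr/ACGT/TGCA/;
    my @frames;
    for my $strand ($dna, $rc) {
        (my $rna = $strand) =~ tr/T/U/;
        push @frames, map { translate_frame($rna, $_) } 0 .. 2;
    }
    return @frames;
}

my @names  = ((map { "Forward Frame $_" } 1 .. 3), (map { "Reverse Frame $_" } 1 .. 3));
my @frames = six_frames('ATGGCCTAA');    # short made-up input
print "$names[$_]: $frames[$_]\n" for 0 .. 5;
```

For the made-up input ATGGCCTAA the forward frames are MA_, WP and GL, and the reverse-complement frames are LGH, _A and RP.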


PRACTICAL: 03

/ / 201

DOWNLOAD SEQUENCE FROM DATABASE USING BIOPERL

AIM:
To write a BioPERL program to download a nucleotide/protein sequence from a biological sequence database.

SOFTWARE USED: Perl 5.16.2, BioPerl 1.6.1

SOURCE CODE:
Gene sequence retrieval from GenBank database
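    use strict;
    use warnings;
    use Bio::SeqIO;
    use Bio::DB::GenBank;

    system($^O eq 'MSWin32' ? 'cls' : 'clear');    # clear the console

    my $genBank = Bio::DB::GenBank->new;

    print "\nGenBank Sequence Download:-";
    print "\n-------------------------\n";
    print "\nAccession No. (AF060485):\n";
    my $acc = <STDIN>;
    chomp($acc);

    # Fetch the record from GenBank by its accession number
    my $seq = $genBank->get_Seq_by_acc($acc);
    die "\nNo sequence found for accession '$acc'\n" unless defined $seq;

    # Write the sequence to <accession>.fasta in FASTA format
    my $seqOut = Bio::SeqIO->new(-file => ">$acc.fasta", -format => 'fasta');
    $seqOut->write_seq($seq);

    print "\nDownloaded Successfully!";
    <STDIN>;    # wait for Enter before the console window closes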

INPUT/OUTPUT:
Gene sequence retrieval from GenBank database (AF060490.fasta)

GenBank Sequence Download:-
-------------------------

Accession No. (AF060485):
AF060490

Downloaded Successfully!

>AF060490 Mus musculus TLS-associated protein TASR-2 mRNA, complete cds. GTGTGGTGTGAGTGGATGTGAGCCGCCGCCGGAGCTGCGGACGGTTTGCCCGAGCCCGTT AGCGCCGCCGGCCCAGAGTCCCGCCGCCACCATGTCCCGATACCTGCGCCCCCCTAACAC GTCTCTGTTCGTCAGGAACGTGGCGGACGACACCAGGTCTGAAGATTTACGTCGGGAATT TGGTCGTTATGGTCCAATAGTAGATGTTTATGTCCCACTTGATTTCTACACTCGGCGTCC AAGAGGATTTGCATATGTTCAATTTGAGGATGTTCGTGATGCTGAAGACGCTTTACATAA TTTGGACAGAAAATGGATTTGTGGGCGTCAGATTGAAATCCAGTTCGCACAGGGGGATCG GAAGACACCAAATCAAATGAAAGCCAAGGAAGGGAGGAATGTATACAGCTCTTCACGATA TGACGATTATGACCGATATAGACGCTCTCGAAGCCGGAGTTATGAAAGGAGAAGATCGAG GAGTCGCTCCTTTGATTATAACTATAGGAGATCTTACAGTCCTAGAAACAGTAGACCGAC TGGAAGACCACGGCGTAGCCGAAGCCATTCCGACAATGATAGATTCAAACACCGAAATCG ATCTTTTTCAAGATCTAAATCCAATTCAAGATCACGGTCCAAGTCCCAGCCCAAGAAAGA AATGAAGGCTAAATCACGTTCTAGGTCTGCATCTCACACCAAAACTAGAGGCACCTCTAA AACAGATTCCAAAACACATTATAAGTCTGGCTCAAGATATGAAAAGGAATCAAGGAAAAA AGAACCACCTAGATCCAAATCTCAGTCAAGATCACAGTCTAGGTCTAGGTCAAAATCTAG GTCAAGGTCTTGGACTAGTCCCAAGTCCAGTGGCCACTGATAGTATAAATTATGATACTT CTAGGCATGTATCATTCATTTACTCATAGTTTGGTATACTTAAATTATCAGGAATACAAT GTTGCAATGATGCGTTTTAAAAACAAACAAACTTAACTTGTTAGTTTTCCCTGTACTGGG


CAATGGTTATAATTAAAAAGATGCGCTGTTGAGAAGCCACTCTTAAGAGTCCAGTTTGTT TAATGTTATGGGCAGCTACCAATTTGTGGTGTCTCTGTATATTTTTGTAAAGATTCTCAT TTTTTATGCTTGAAGTATTTGGTGAAAAGATGTTGGTTGACCATAATTTGCAACATTGTC TTATTAGAAATAAATTTTCATATCCATATTTGGTAGAACTGTTAACCTAGAAATGTAGCT TGCTAATAAGATAGAATGATACAGAAGTGAAGTGGTAGCCACATTACAACACTGACTGCT CAGACACATTTAGGTTCAGGGTGGACTTTATGTCTTGTCAAGATGTCTAAGCCCATGATG ATTATTTATGATGCAATGTGGAATAGTTCTTTTGTTAAATCCACCATCTGGGGATTGATG CCAACTGGGTTAAATAGCGTTTTCAGGGAGAGTGCCCTTTTCACTGAAACATGGAGCCTT CACTGCTTTCCCCACCTCAATCCCTGCTGGTTTCTAAGATATGGAACATTAAAGCATAAG GGAAAACCCTCCCCCTTAAGTTGTGAGTGAGTCAGTGATCACAGAAACCATTGTAAGGGG AAAAGACTGTTCTTAGCATAGTTGCTCTAAATTTAACTATTGTTGATCATTGTTATTTAG GGGTTTTGTTTTGTTGTTTGTTTTTTCTGTTAGAAACAAGTGAACTGTTTGAAAATACAT TTTTGTTTGTTTATATGCATAGTGTAAAACAAACTGAATTTTGATGCTCACAGCACTTAC CATGTGCGTTTGTATCAAAATCTGCCTGTTCTTCATAGGGGAGGCTTGCTCTTCACACCT CAGTTTATTCATGTGAGACAGGCTGAGAAGATAACACTCCTAGGTGATTTTGTGGTGCCG TGGATTTTTGGGGAAAGTTGAGTTTTAAGCAAAAGCCACATCACTTAGTTTTTGGTAATG TAGGACATGACTAAAAAATAACGAAATGATACCCTTAAATATTTATAATTTCTAGTATTT CAAGATTGTTTTGGAGGCAATAAAATGACTTGAAATGTCCGGTGTCATTTCAGAATACAA AGCTAGTGTCTCTAAGATCTTAGATTCGTTGCTTACAGATGTGAGTGAAGATACTGTGGG GGACGATCCTCCTGGAGGATTACCTTATTTTTTTCCTTTCGATTTTGTTTTTAGAAATTT AGTCCTTGCTTGTAGACAACAAAAGATGGTTTTAAGAACTGTTTGTGGAATGTGTTTGGA GGGTTAATTCTAGAACCTTTGTATATTTAATAGTATTTCTAACTTTTATTTCTTTACTGT TTGCAGTTAATGTTCTTGTTCTGCTATGCAATCATTTATATGCACGTTTCTTTAATTTTT TTAGATTTTCCTGGATGTATAGTTTAAACAAAGTCTATTTAAAACTGTAGCGGTAGTTTG CAGTTCTAGCAAAGAGGAAAGTTGTGGGGTTAAACTTTGTATTTTCTTTCTTATAGAAGC TTCTAAAAAGGTATTTTTATATGTTCTTTTTAACAAATATTGTGTACAACCTTTAAAACA TCAATGTTTGGATCAAAACAAGACCCAGCTTATTTTCTGCTTGCTGTAAATTAAGCAAAG ATGCTATAATAAAAACAAAATGAAGGAAAAAAAAAAAAAAAAAAAAAAAAAAA

RESULT: A BioPERL program was written to download a nucleotide/protein sequence from a biological sequence database, and it was executed successfully.
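The downloaded file stores the record as a ">" header line followed by sequence lines of 60 characters. As an illustration of that layout only, the sketch below wraps a sequence in plain Perl without BioPerl. The function name format_fasta and the identifier demo_seq are hypothetical, and this is not how Bio::SeqIO is implemented.

```perl
use strict;
use warnings;

# Format one FASTA record: a ">" header line followed by the sequence
# wrapped at $width characters per line (60 by default).
sub format_fasta {
    my ($id, $desc, $seq, $width) = @_;
    $width ||= 60;
    $seq =~ s/\s+//g;                 # drop any whitespace inside the sequence
    my $record = ">$id $desc\n";
    for (my $i = 0; $i < length($seq); $i += $width) {
        $record .= substr($seq, $i, $width) . "\n";
    }
    return $record;
}

# Illustrative record; the 70-base sequence is the test input of Practical 01.
print format_fasta('demo_seq', 'example record',
    'ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCT');
```

The 70-base example prints as one header line, one line of 60 bases and one line of 10 bases.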


PRACTICAL: 04

/ / 201

REMOTEBLAST USING BIOPERL

AIM:
To write a BioPERL program that uses RemoteBLAST to find sequences homologous to a query sequence in a biological sequence database.

SOFTWARE USED: Perl 5.16.2, BioPerl 1.6.1

SOURCE CODE:
    use strict;
    use warnings;
    use Bio::Tools::Run::RemoteBlast;

    system($^O eq 'MSWin32' ? 'cls' : 'clear');    # clear the console
    print "+------------------------------------+\n";
    print "|        Remote BLAST Program        |\n";
    print "+------------------------------------+\n";
    print "\nEnter the following details:-\n";

    print "\nProgram (blastn|blastp|blastx|tblastn|tblastx):\n";
    my $prog = <STDIN>;
    chomp($prog);
    print "\nDataBase (nr|swissprot|pdb|month):\n";
    my $db = <STDIN>;
    chomp($db);
    print "\nE-value (Example: 1e-10):\n";
    my $e_val = <STDIN>;
    chomp($e_val);

    my @params = ('-prog' => $prog, '-data' => $db, '-expect' => $e_val,
                  '-readmethod' => 'SearchIO');
    my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

    print "\nFile name (.fasta format):\n";
    my $fname = <STDIN>;
    chomp($fname);

    # Submit the sequence(s) in the FASTA file to NCBI BLAST
    my $r = $factory->submit_blast($fname);

    # Poll NCBI until every request ID (RID) has been answered
    while ( my @rids = $factory->each_rid ) {
        for my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            if ( !ref($rc) ) {
                # No result object yet: a negative code means the request failed,
                # otherwise the search is still running
                $factory->remove_rid($rid) if $rc < 0;
                sleep 5;
            } else {
                my $result = $rc->next_result();
                $factory->save_output("Blast Output.txt");
                $factory->remove_rid($rid);
            }
        }
    }
    print "\nBlast output is generated successfully!";
    <STDIN>;    # wait for Enter before the console window closes


INPUT:
+------------------------------------+
|        Remote BLAST Program        |
+------------------------------------+

Enter the following details:-

Program (blastn|blastp|blastx|tblastn|tblastx):
blastn

DataBase (nr|swissprot|pdb|month):
nr

E-value (Example: 1e-10):
1e-5

File name (.fasta format):
dna.fasta

Blast output is generated successfully!

OUTPUT:
BLASTN 2.2.27+

Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

RID: F5FR6GCG015

Database: Nucleotide collection (nt)
           17,084,706 sequences; 43,890,479,962 total letters

Query= gi|440487466|gb|JH795076.1| Magnaporthe oryzae P131 unplaced genomic scaffold P131_scaffold00326, whole genome shotgun sequence
Length=980

Sequences producing significant alignments:                       Score (Bits)  E Value
ref|XM_003721193.1| Magnaporthe oryzae 70-15 initiation-speci...  1768          0.0
ref|XM_003711036.1| Magnaporthe oryzae 70-15 initiation-speci...  277           1e-70
ref|XM_003660234.1| Myceliophthora thermophila ATCC 42464 gly...  93.3          3e-15
gb|CP003002.1| Myceliophthora thermophila ATCC 42464 chromoso...  93.3          3e-15
ref|XM_001935551.1| Pyrenophora tritici-repentis Pt-1C-BFP al...  87.8          1e-13
ref|XM_003306105.1| Pyrenophora teres f. teres 0-1 hypothetic...  66.2          5e-07
ref|XM_003300282.1| Pyrenophora teres f. teres 0-1 hypothetic...  64.4          2e-06

ALIGNMENTS
>ref|XM_003721193.1| Magnaporthe oryzae 70-15 initiation-specific alpha-1,6-mannosyltransferase (MGG_02562) mRNA, complete cds
Length=1412

 Score = 1768 bits (1960), Expect = 0.0
 Identities = 980/980 (100%), Gaps = 0/980 (0%)
 Strand=Plus/Minus

Query  1     ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGT  60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1144  ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGT  1085

Query  61    CTTCTTGTCTCCATGCTGTTGATTCATCGTGTCCGCAAAGGCGTAGTCGGGCAACACCAG  120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1084  CTTCTTGTCTCCATGCTGTTGATTCATCGTGTCCGCAAAGGCGTAGTCGGGCAACACCAG  1025

Query  121   CACGTCGCCCAACAGCCTGGGCTCTTTGACGTTGGCTATCTCGCCGTCGCCCACGGTCTC  180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1024  CACGTCGCCCAACAGCCTGGGCTCTTTGACGTTGGCTATCTCGCCGTCGCCCACGGTCTC  965

Query  181   ATTCAGCGTGTTGCTCAGACTCTTCAAGATGCCCCTCGTCAACCTGCGCGGGCCCGAAAC  240
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  964   ATTCAGCGTGTTGCTCAGACTCTTCAAGATGCCCCTCGTCAACCTGCGCGGGCCCGAAAC  905

Query  241   ATCGACAATGTCGTCAACCATGTCGAGCCTGAGGTCCTGGATCCCGCCGACCTTGCGCTC  300
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  904   ATCGACAATGTCGTCAACCATGTCGAGCCTGAGGTCCTGGATCCCGCCGACCTTGCGCTC  845

Query  301   CTTGGCCTTGGCGACCAGTCCCTCGAGACCATCTTGGACGGCCATCATCATGTGCGGCGA  360
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  844   CTTGGCCTTGGCGACCAGTCCCTCGAGACCATCTTGGACGGCCATCATCATGTGCGGCGA  785

Query  361   TTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACCCACTGGTCCACATCGAA  420
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  784   TTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACCCACTGGTCCACATCGAA  725

Query  421   CTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTGGGAACCCATTCGCTGAT  480
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  724   CTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTGGGAACCCATTCGCTGAT  665

Query  481   CGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCGAACTCGTACAGGATCAG  540
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  664   CGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCGAACTCGTACAGGATCAG  605

Query  541   GTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCTAATGGCGAGGAAATTGGCGACCAC  600
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  604   GTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCTAATGGCGAGGAAATTGGCGACCAC  545

Query  601   GTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAAGGCGTCGCCTGATTCGTCCGTCAG  660
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  544   GTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAAGGCGTCGCCTGATTCGTCCGTCAG  485

Query  661   GAACTCGACCTTGAAGCCGGGGTTCTTGGACACACAGGAGTCGACGTGGGGCTTGAGGTC  720
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  484   GAACTCGACCTTGAAGCCGGGGTTCTTGGACACACAGGAGTCGACGTGGGGCTTGAGGTC  425

Query  721   GTCCTTCAAGCCTGCAGGCCCGAGTTTGTACCACAGCCTTTGTGGTAGTGCTGCGACGGC  780
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  424   GTCCTTCAAGCCTGCAGGCCCGAGTTTGTACCACAGCCTTTGTGGTAGTGCTGCGACGGC  365

Query  781   AGTCGGCCCCGAGCTCGACGCGGCAGACGTGGTGGTTTCTTGTGCCAGCAGCGGCGCGGG  840
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  364   AGTCGGCCCCGAGCTCGACGCGGCAGACGTGGTGGTTTCTTGTGCCAGCAGCGGCGCGGG  305

Query  841   CTTCATCCGGGGTGTCGCAAAGGTGGGGCCGGCCTTCCATTCCGAAGGCCTGTGGAAATT  900
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  304   CTTCATCCGGGGTGTCGCAAAGGTGGGGCCGGCCTTCCATTCCGAAGGCCTGTGGAAATT  245

Query  901   GAGAATGAGGAAGCATATTGTGAGAAAGCTCAAGGCAGCCGGCACTTTGGCTGTCAAACG  960
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  244   GAGAATGAGGAAGCATATTGTGAGAAAGCTCAAGGCAGCCGGCACTTTGGCTGTCAAACG  185

Query  961   ATTGTGAAATGCCAAAATCA  980
             ||||||||||||||||||||
Sbjct  184   ATTGTGAAATGCCAAAATCA  165

>ref|XM_003711036.1| Magnaporthe oryzae 70-15 initiation-specific alpha-1,6mannosyltransferase (MGG_08652) mRNA, complete cds Length=984 Score = 277 bits (306), Expect = 1e-70 Identities = 524/760 (69%), Gaps = 17/760 (2%)



Strand=Plus/Minus Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct 12 967 72 907 132 847 191 788 251 728 310 672 370 612 430 552 488 494 548 434 608 374 668 317 728 260


CATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCTC |||| |||||||| || |||||||||||||| || || || ||||| | | |||||| | CATTTTTCCAGGACCCTGCGTAATGGTGCACTACGAGCTTCCGCCCTGGCACCTTGTCCC CATGCTGTTGATTCATCGTGTCCGCAAAGGCGTAGTCGGGCAACACCAGCACGTCGCCCA | | ||||| | || || || | || ||| |||| | | || | ||||| CCCACCACATGTTCATGGAATCTGCGAACGAGTGGTCCGGCAGTATTAAAACATTGCCCA ACAGCCTGGGCTCTTTGACGTTGGCTATCTCGCCGTCGCCCACGGTCTC-ATTCAGCGTG ||||| || || | || ||| |||| || | ||||| |||| || || | GAAGCCTCGGTTCCGTAACATTGACTATGTCATTATTCCCCAC-CTCTCGGTTGAGTGAT TTGCTCAGACTCTTCAAGATGCCCCTCGTCAACCTGCGCGGGCCCGAAACATCGACAATG ||||| || ||||| || || ||||||||||| | ||||||||| | |||| || TTGCTGAGGCTCTTGAAAATCGACCTCGTCAACCGTCTCGGGCCCGACAAGTCGATAACA TCGTCAACCATGTCGAGCCTGAGGTCCTGGATCCCGCCG-ACCTTGCGCTCCTTGGCCTT ||||| | | || | || | || ||| ||| | | || || || ||||| | TCGTCGAGCTGGTTGCGCTTCAGCTCC--GATATTGGGGTTCCAAGC-CT-TTTGGCAGT GGCGACCAGTCCCTCGAGACCATCTTGGACGGCCATCATCATGTGCGGCGATTTTGGTTT ||| ||| | |||| | | |||| ||| ||||| | | |||||| ||| || || GGCCGCCACTTCCTCCAAGCAGTCTTCGACCGCCATTAAGAAATGCGGCTGTTTGGGCTT CGCCATGATAGTCCAACTGGCGAACTGCCGAACCCACTGGTCCACATCGAACTCCAGTCC |||||||| ||||| ||||||| ||| || | |||| |||| |||||||||| || GGCCATGATGGTCCAGCTGGCGATCTGTCGCAACCACCTGTCCGTGTCGAACTCCATCCC AACAACAATTTTGGCTTGGT--CTTTGTACTGCTTGGGAACCCATTCGCTGATCGGTGCC || || | || ||| ||||||||||| ||| | ||||||| ||| ||||| GACGACGGTAGCAGC--GGTAGATTTGTACTGCTCGGGGATCCATTCGTCGATGGGTGCT TCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCGAACTCGTACAGGATCAGGTAGCGG ||||| || |||||||| |||||||| | || || ||||| ||||| ||||||||| TCGCAAGACACGTCCAGATCGTTCCAGATTCCACCCTTTTCGTAGAGGATGAGGTAGCGG AGAAGATCGGCCTTGATGATTGGAATCCTAATGGCGAGGAAATTGGCGACCACGTCAGGG || || |||||||||||||||||||| || |||| |||||| || || | | || AGGAGGTCGGCCTTGATGATTGGAATGCTGATGGGGAGGAACCTGTTGATTATATTTGGA CGCAAGGCGGCAAAGTGTCTCTTGACAAAGGCGTCGCCTGATTCGTCCGTCAGGAACTCG | | | | |||| |||||||||| ||||||| || |||| ||||||||||| TTCCACGAG---TAGTGCTTCTTGACAAACTCGTCGCCCGAGACGTCGGTCAGGAACTCA ACCTTGAAGCCGGGGTTCTTGGACACACAGGAGTCGACGTGGGGCTTGAGGTCGTCCTTC || 
| | |||| || |||||| | || || | | || ||||||| | | | || ACATCGTAGCCTGGATTCTTG---AGGCAAGATTTTATGTATGGCTTGATATTCTTCCTC AAGCCTGCAGGCCCGAGTTTGTACCACAGCCTTTGTGGTA | || ||| | |||| ||| ||||||||| ||| ||||| ACCCCCGCACGTCCGACTTTATACCACAGCTTTTTTGGTA 767 221

71 908 131 848 190 789 250 729 309 673 369 613 429 553 487 495 547 435 607 375 667 318 727 261

>ref|XM_003660234.1| Myceliophthora thermophila ATCC 42464 glycosyltransferase family 32 protein (MYCTH_97899) mRNA, complete cds Length=699 Score = 93.3 bits (102), Expect = 3e-15 Identities = 158/229 (69%), Gaps = 9/229 (4%) Strand=Plus/Minus Query Sbjct Query Sbjct 353 344 409 284 TGCGGCGATTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACCCA----CTG |||||||| |||||||||||||| |||||| | |||| ||| | || | | | TGCGGCGACCCCGGTTTCGCCATGATGGTCCAAATCGCGAGCTGGTGGACAAAAGGCCGG GTCCA-----CATCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTG | ||| ||| |||||| || || || | |||| | ||| | |||||| GGCCAGCCGACATTAAACTCCCAGCCCACGACGACGTTGGTCTCGTCCTCATACTGCGGA 408 285 463 225



Query Sbjct Query Sbjct 464 224 524 164



523 165

GGAACCCATTCGCTGATCGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCG |||| |||||||| || || | ||||||||||||||||||||| | || || || GGAATCCATTCGCCGAAGGGCGTGTCGCACGAGACGTCCAGGTCGCAGTAGACGCCTCCT AACTCGTACAGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAA | | |||| |||||||||||||| ||| | |||| |||||||| TCGGAGAAGAGGAGGAGGTAGCGGAGAAGGTCGACTTTGAGGATTGGAA 572 116

>gb|CP003002.1| Myceliophthora thermophila ATCC 42464 chromosome 1, complete sequence Length=10931058 Features in this part of subject sequence: glycosyltransferase family 32 protein Score = 93.3 bits (102), Expect = 3e-15 Identities = 158/229 (69%), Gaps = 9/229 (4%) Strand=Plus/Plus Query Sbjct Query Sbjct Query Sbjct Query Sbjct 353 8013281 409 8013341 464 8013401 524 8013461 TGCGGCGATTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACCCA----CTG |||||||| |||||||||||||| |||||| | |||| ||| | || | | | TGCGGCGACCCCGGTTTCGCCATGATGGTCCAAATCGCGAGCTGGTGGACAAAAGGCCGG GTCCA-----CATCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTG | ||| ||| |||||| || || || | |||| | ||| | |||||| GGCCAGCCGACATTAAACTCCCAGCCCACGACGACGTTGGTCTCGTCCTCATACTGCGGA GGAACCCATTCGCTGATCGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCG |||| |||||||| || || | ||||||||||||||||||||| | || || || GGAATCCATTCGCCGAAGGGCGTGTCGCACGAGACGTCCAGGTCGCAGTAGACGCCTCCT AACTCGTACAGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAA | | |||| |||||||||||||| ||| | |||| |||||||| TCGGAGAAGAGGAGGAGGTAGCGGAGAAGGTCGACTTTGAGGATTGGAA 572 8013509 408 8013340 463 8013400 523 8013460

>ref|XM_001935551.1| Pyrenophora tritici-repentis Pt-1C-BFP alpha-1,6mannosyltransferase Och1, mRNA Length=846 Score = 87.8 bits (96), Expect = 1e-13 Identities = 217/327 (66%), Gaps = 12/327 (4%) Strand=Plus/Minus Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct 346 585 402 525 457 465 517 405 577 345 637 288 CATCATGTGCGGCGATTTTGGTTTCGCCATGATAGTCCAACTGGCGAA----CTGCCGAA |||||| || || || |||||| |||||||||||||| || ||||| | | || | CATCATATGTGGGGACCGTGGTTTAGCCATGATAGTCCAGCTAGCGAATTGTCGGACGTA CCCACTGGTCCA-----CATCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTA | ||| || ||| ||||||||| || ||||| | ||| | | | | ||| CACACCGGGCCAGCCTTGGTCGAACTCCCACCCTACAACGAGCGAGGCGTTGGCCTGGTA CTGCTTGGGAACCCATTCGCTGATCGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACAC | || |||||| || || |||| |||||| || ||||| || ||| ||| | TTCAGACGGCACCCATGTGCCAATAGGTGTCTCGCAGGATACGTCTAGATCGGACCATAT CCCGCCGAACTCGTACAGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCT |||||| || | |||| |||||||| || | ||| || ||| ||| || || ACCGCCGCGGTCCCAGAGGAGGAGGTAGCGCAGGAAATCTGCTTTGTAGATGGGGATGGG AATGGCGAGGAAATTGGCGACCACGTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAA | ||||||| | || ||||| | ||| ||||| | ||| |||| | | ||| | GAGGGCGAGGTAGTTTGCGACGATGTCCGGGCGGGAAGCG--AAAG-CTGTACGGACGTA GGCGTCGCCTGATTCGTCCGTCAGGAA | |||||| |||||| |||| ||| GTCGTCGCTGCTTTCGTCGGTCATGAA 663 262 401 526 456 466 516 406 576 346 636 289

>ref|XM_003306105.1| Pyrenophora teres f. teres 0-1 hypothetical protein, mRNA



Length=1044 Score = 66.2 bits (72), Expect = 5e-07 Identities = 167/251 (67%), Gaps = 6/251 (2%) Strand=Plus/Minus Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct 416 566 476 506 533 446 593 386 653 329


TCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTGGGAACCCATTCG ||||||||| ||| || |||| |||| | ||||||| ||| || ||||||| | TCGAACTCCCATCCCACGACAACACTGGCGTTTGCTTTGTATCGCTCCGGGACCCATTGG CTGATCGGTGCC---TCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCGAACTCGTAC || ||| | || || |||||||| |||||| | || || || ||| | TCCATGGGTACTCCTTCACAGGAGACGTCGAGGTCGGCGTAGACGCCACCCTGGTCGAAG AGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCTAATGGCGAGGAAATTG |||| |||||||||||| | ||||| || | ||| |||||| || |||| | || AGGAGCAGGTAGCGGAGCATGTCGGCTTTCAGGATGGGAATCGGAAGACCGAGATAGTTC GCGACCACGTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAAGGCGTCGCCTGATTCG |||| | || || |||| || | | | |||| || | ||||| | | |||| TCGACGATATCCGGACGCATCACG--TATGCCT-TCTTTACGTATTCGTCGGCAGTTTCG TCCGTCAGGAA ||||||| ||| TCCGTCATGAA 663 319

475 507 532 447 592 387 652 330

>ref|XM_003300282.1| Pyrenophora teres f. teres 0-1 hypothetical protein, mRNA Length=939 Score = 64.4 bits (70), Expect = 2e-06 Identities = 211/327 (65%), Gaps = 12/327 (4%) Strand=Plus/Minus Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct 346 582 404 522 457 462 517 402 577 342 637 285 CATCATGTGCGGCGATTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACC-||||||||| || || ||||| || | ||||||||| || || ||||| ||||| CATCATGTGTGGGGACCGGGGTTTGGCTAGGATAGTCCAGCTAGCAAACTGACGAACGTA --CACTGGTCCACA-----TCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTA ||| || ||| ||||| ||| || || || | || | | | | ||| GACACCGGGCCAGCCTTGGTCGAATTCCCAGCCTACCACCAGCGACGCGTTGGCCTGGTA CTGCTTGGGAACCCATTCGCTGATCGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACAC | || ||||| | ||| || ||||||| || ||||| || ||| ||| | TTCGGGCGGCACCCACGAGTCGATGGGCACCTCGCAGGATACGTCTAGATCGGACCATAT CCCGCCGAACTCGTACAGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCT ||||| || | |||| ||||||||||| | |||||| || ||| || | GCCGCCTTGGTCCCAGAGGAGGAGGTAGCGGAGGAAATCGGCTTTATAGATGGGGACGGG AATGGCGAGGAAATTGGCGACCACGTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAA | ||||||| | |||| ||| | ||| ||||| || |||||| | | ||| | GAGGGCGAGGTAGTTGGAGACGATGTCGGGGCGGAA---TGCAAAGGCTGTACGGACGTA GGCGTCGCCTGATTCGTCCGTCAGGAA | |||||| ||||||||||| ||| GCCGTCGCTGCTTTCGTCCGTCATGAA 663 259 403 523 456 463 516 403 576 343 636 286

Database: Nucleotide collection (nt)
Posted date: Jan 12, 2013 4:14 PM
Number of letters in database: 43,890,479,962
Number of sequences in database: 17,084,706

Lambda   K       H
0.634    0.408   0.912
Gapped
Lambda   K       H
0.625    0.410   0.780

Matrix: blastn matrix:2 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Sequences: 17084706
Number of Hits to DB: 15849831
Number of extensions: 96625
Number of successful extensions: 96625
Number of sequences better than 1e-05: 0
Number of HSP's better than 1e-05 without gapping: 0
Number of HSP's gapped: 96625
Number of HSP's successfully gapped: 0
Length of query: 980
Length of database: 43890479962
Length adjustment: 36
Effective length of query: 944
Effective length of database: 43275430546
Effective search space: 40852006435424
Effective search space used: 40852006435424
A: 0
X1: 22 (20.1 bits)
X2: 33 (29.8 bits)
X3: 110 (99.2 bits)
S1: 25 (23.8 bits)
S2: 68 (62.6 bits)

RESULT: A BioPERL program was written to find sequences homologous to a query sequence in a biological sequence database using RemoteBLAST, and it was executed successfully.
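The E-value entered at the prompt limits which hits are reported. The same kind of cutoff can be applied afterwards to the summary table. The sketch below is plain Perl and does not use BioPerl. The hit list is copied by hand from the table in the OUTPUT section, and the function name filter_by_evalue and the cutoff 1e-10 are illustrative choices.

```perl
use strict;
use warnings;

# Hits copied from the summary table of the report: [accession, bit score, E-value]
my @hits = (
    [ 'XM_003721193.1', 1768, '0.0'   ],
    [ 'XM_003711036.1',  277, '1e-70' ],
    [ 'XM_003660234.1', 93.3, '3e-15' ],
    [ 'CP003002.1',     93.3, '3e-15' ],
    [ 'XM_001935551.1', 87.8, '1e-13' ],
    [ 'XM_003306105.1', 66.2, '5e-07' ],
    [ 'XM_003300282.1', 64.4, '2e-06' ],
);

# Return the hits whose E-value is at or below the cutoff.
# Perl converts strings such as "1e-70" to numbers in numeric comparisons.
sub filter_by_evalue {
    my ($cutoff, @rows) = @_;
    my @kept = grep { $_->[2] <= $cutoff } @rows;
    return @kept;
}

for my $hit (filter_by_evalue(1e-10, @hits)) {
    printf "%-16s %7s bits  E = %s\n", @$hit;
}
```

With a cutoff of 1e-10, the first five hits of the table are kept and the two Pyrenophora teres hits are dropped.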


PRACTICAL: 05

/ / 201

SECONDARY STRUCTURE PREDICTION USING BIOPERL

AIM:
To write a BioPERL program to predict the secondary structure of a protein sequence.

SOFTWARE USED: Perl 5.16.2, BioPerl 1.6.1

SOURCE CODE:
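    use strict;
    use warnings;
    use Bio::PrimarySeq;
    use Bio::Tools::Analysis::Protein::Sopma;

    system($^O eq 'MSWin32' ? 'cls' : 'clear');    # clear the console
    print "Protein Secondary Structure Prediction (SOPMA):-";
    print "\n----------------------------------------------\n";
    print "\nEnter your query sequence:\n";
    my $query = <STDIN>;
    chomp($query);    # drop the trailing newline so it is not treated as sequence

    my $seqs = Bio::PrimarySeq->new(-seq => $query);
    my $tool = Bio::Tools::Analysis::Protein::Sopma->new(
        -seq => $seqs, -window_width => 15);
    $tool->run();     # sends the sequence to the remote SOPMA server

    my $raw = $tool->result('');                    # raw text of the prediction
    my @fts = $tool->result('Bio::SeqFeatureI');    # predicted regions as features

    print "\n Predicted Regions are below:\n";
    for my $ft (@fts) {
        print "From ", $ft->start, " to ", $ft->end,
              " struc: ", ($ft->each_tag_value('type'))[0], "\n";
    }
    <STDIN>;    # wait for Enter before the console window closes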

INPUT/OUTPUT:
Protein Secondary Structure Prediction (SOPMA):-
----------------------------------------------

Enter your query sequence:
EHIMELLIMVDALKRASAKTINIVIPYYGYARQDRKARSREPITAKLFANLLETAGATRVIALDLHAPQI

 Predicted Regions are below:
From 1 to 20 struc: H
From 43 to 54 struc: H
From 55 to 56 struc: T
From 25 to 42 struc: C
From 21 to 24 struc: E
From 59 to 64 struc: E

RESULT: A BioPERL program was written to predict the secondary structure of a protein sequence, and it was executed successfully.
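The predicted regions are printed grouped by state rather than in sequence order. In SOPMA output H stands for alpha helix, E for extended strand, T for beta turn and C for random coil. The sketch below is plain Perl with the regions copied by hand from the run above. It sorts them by start position and writes one state letter per residue of the 70-residue query. The function names sort_by_start and structure_string are illustrative.

```perl
use strict;
use warnings;

# Regions as printed in the run above: [start, end, state]
my @regions = (
    [  1, 20, 'H' ], [ 43, 54, 'H' ], [ 55, 56, 'T' ],
    [ 25, 42, 'C' ], [ 21, 24, 'E' ], [ 59, 64, 'E' ],
);

# Order the regions by their start position.
sub sort_by_start {
    my @sorted = sort { $a->[0] <=> $b->[0] } @_;
    return @sorted;
}

# Write one state letter per residue; positions that belong to no
# reported region stay as '-'.
sub structure_string {
    my ($length, @rows) = @_;
    my @state = ('-') x $length;
    for my $r (@rows) {
        $state[$_ - 1] = $r->[2] for $r->[0] .. $r->[1];
    }
    return join '', @state;
}

print "From $_->[0] to $_->[1] struc: $_->[2]\n" for sort_by_start(@regions);
print structure_string(70, @regions), "\n";
```

In the printed string, residues 57-58 and 65-70 appear as '-' because no region in the recorded output covers them.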


PRACTICAL: 06

/ / 201

GLOBAL ALIGNMENT USING R

AIM:
To write an R program to align a pair of sequences using the Needleman-Wunsch algorithm.

SOFTWARE USED: R 2.15.2
Biostrings 2.6.6: Module for string objects representing biological sequences, and matching algorithms in R

SOURCE CODE:
    library("seqinr")
    library("Biostrings")

    # Read the two protein sequences from the FASTA files listed under INPUT
    leprae   <- read.fasta(file = "E:/R Practical/Q9CD83.fasta")
    ulcerans <- read.fasta(file = "E:/R Practical/A0PQ23.fasta")
    lepraeseq   <- leprae[[1]]
    ulceransseq <- ulcerans[[1]]

    # Convert each vector of characters to a single upper-case string
    lepraeseqstring   <- toupper(c2s(lepraeseq))
    ulceransseqstring <- toupper(c2s(ulceransseq))

    # Global alignment (the default type of pairwiseAlignment) with BLOSUM50
    globalAlignLepraeUlcerans <- pairwiseAlignment(lepraeseqstring, ulceransseqstring,
        substitutionMatrix = "BLOSUM50", gapOpening = -2, gapExtension = -8,
        scoreOnly = FALSE)

    # Print a pairwise alignment in chunks of 'chunksize' columns
    printPairwiseAlignment <- function(alignment, chunksize = 60, returnlist = FALSE)
    {
        require(Biostrings)              # This function requires the Biostrings package
        seq1aln <- pattern(alignment)    # Get the alignment for the first sequence
        seq2aln <- subject(alignment)    # Get the alignment for the second sequence
        alnlen  <- nchar(seq1aln)        # Find the number of columns in the alignment
        starts  <- seq(1, alnlen, by = chunksize)
        n       <- length(starts)
        seq1alnresidues <- 0
        seq2alnresidues <- 0
        for (i in 1:n) {
            chunkseq1aln <- substring(seq1aln, starts[i], starts[i] + chunksize - 1)
            chunkseq2aln <- substring(seq2aln, starts[i], starts[i] + chunksize - 1)
            # Count the gaps in each chunk (countPattern() is from the Biostrings package)
            gaps1 <- countPattern("-", chunkseq1aln)
            gaps2 <- countPattern("-", chunkseq2aln)
            # Running count of residues printed so far for each sequence.
            # Note: the full chunksize is added even when the final chunk is shorter,
            # so the totals on the last row overshoot the true sequence lengths.
            seq1alnresidues <- seq1alnresidues + chunksize - gaps1
            seq2alnresidues <- seq2alnresidues + chunksize - gaps2
            if (!returnlist) {
                print(paste(chunkseq1aln, seq1alnresidues))
                print(paste(chunkseq2aln, seq2alnresidues))
                print(paste(' '))
            }
        }
        if (returnlist) {
            vector1 <- s2c(substring(seq1aln, 1, nchar(seq1aln)))
            vector2 <- s2c(substring(seq2aln, 1, nchar(seq2aln)))
            mylist  <- list(vector1, vector2)
            return(mylist)
        }
    }

    printPairwiseAlignment(globalAlignLepraeUlcerans, 60)

INPUT:
File 1: (E:\R Practical\Q9CD83.fasta)

>sp|Q9CD83|PHBS_MYCLE Chorismate--pyruvate lyase OS=Mycobacterium leprae (strain TN) GN=ML0133 PE=3 SV=1
MTNRTLSREEIRKLDRDLRILVATNGTLTRVLNVVANEEIVVDIINQQLLDVAPKIPELE
NLKIGRILQRDILLKGQKSGILFVAAESLIVIDLLPTAITTYLTKTHHPIGEIMAASRIE
TYKEDAQVWIGDLPCWLADYGYWDLPKRAVGRRYRIIAGGQPVIITTEYFLRSVFQDTPR
EELDRCQYSNDIDTRSGDRFVLHGRVFKNL

File 2: (E:\R Practical\A0PQ23.fasta)

>tr|A0PQ23|A0PQ23_MYCUA Chorismate pyruvate-lyase OS=Mycobacterium ulcerans (strain Agy99) GN=MUL_2003 PE=4 SV=1
MLAVLPEKREMTECHLSDEEIRKLNRDLRILIATNGTLTRILNVLANDEIVVEIVKQQIQ
DAAPEMDGCDHSSIGRVLRRDIVLKGRRSGIPFVAAESFIAIDLLPPEIVASLLETHRPI
GEVMAASCIETFKEEAKVWAGESPAWLELDRRRNLPPKVVGRQYRVIAEGRPVIIITEYF
LRSVFEDNSREEPIRHQRSVGTSARSGRSICT

OUTPUT:
[1] "MT-----NR--T---LSREEIRKLDRDLRILVATNGTLTRVLNVVANEEIVVDIINQQLL 50"
[1] "MLAVLPEKREMTECHLSDEEIRKLNRDLRILIATNGTLTRILNVLANDEIVVEIVKQQIQ 60"
[1] " "
[1] "DVAPKIPELENLKIGRILQRDILLKGQKSGILFVAAESLIVIDLLPTAITTYLTKTHHPI 110"
[1] "DAAPEMDGCDHSSIGRVLRRDIVLKGRRSGIPFVAAESFIAIDLLPPEIVASLLETHRPI 120"
[1] " "
[1] "GEIMAASRIETYKEDAQVWIGDLPCWLADYGYWDLPKRAVGRRYRIIAGGQPVIITTEYF 170"
[1] "GEVMAASCIETFKEEAKVWAGESPAWLELDRRRNLPPKVVGRQYRVIAEGRPVIIITEYF 180"
[1] " "
[1] "LRSVFQDTPREELDRCQYSNDIDTRSGDRFVLHGRVFKN 230"
[1] "LRSVFEDNSREEPIRHQRS--VGT-SA-R---SGRSICT 233"
[1] " "

RESULT: A program using R is written to align a pair of sequences using the Needleman-Wunsch algorithm and executed successfully.
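NOTE: The program above leaves the alignment itself to pairwiseAlignment() from Biostrings. As a rough illustration of the Needleman-Wunsch recurrence, the sketch below fills the dynamic-programming score matrix in base R. The function name nw_score and the scores (match +1, mismatch -1, linear gap -2) are assumptions made for this example only; it does not use BLOSUM50 or separate gap opening/extension penalties, so its scores are not comparable with the output above.

```r
# Minimal Needleman-Wunsch global alignment score (base R only).
# Assumed illustrative scoring: match = +1, mismatch = -1, linear gap = -2.
nw_score <- function(a, b, match = 1, mismatch = -1, gap = -2) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # score[i + 1, j + 1] = best score for aligning x[1..i] with y[1..j]
  score <- matrix(0, nrow = n + 1, ncol = m + 1)
  score[, 1] <- gap * (0:n)   # prefix of x aligned against gaps only
  score[1, ] <- gap * (0:m)   # prefix of y aligned against gaps only
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (x[i] == y[j]) match else mismatch
      score[i + 1, j + 1] <- max(score[i, j] + s,        # align x[i] with y[j]
                                 score[i, j + 1] + gap,  # gap in y
                                 score[i + 1, j] + gap)  # gap in x
    }
  }
  score[n + 1, m + 1]
}

identical_score <- nw_score("GATTACA", "GATTACA")   # seven matches
one_gap_score   <- nw_score("AC", "A")              # one match, one gap
```

The bottom-right cell of the matrix is the score of the best global alignment; recovering the alignment itself would need a traceback step, which is omitted here.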


PRACTICAL: 07

DOTPLOT USING R

Date: / / 201

AIM:
To write an R program to display a dot plot for a pair of sequences.

SOFTWARE USED:
R 2.15.2
Seqinr 3.0-7: Biological Sequences Retrieval and Analysis module of R

SOURCE CODE:

Online:


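library("seqinr")
choosebank("swissprot")
# query() creates an object named after its first argument (behaviour of
# seqinr 3.0-7; newer releases may require assigning the return value)
query("leprae", "AC=Q9CD83")
lepraeseq <- getSequence(leprae$req[[1]])
query("ulcerans", "AC=A0PQ23")
ulceransseq <- getSequence(ulcerans$req[[1]])
closebank()
dotPlot(lepraeseq, ulceransseq)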

Offline:
library("seqinr")
leprae <- read.fasta(file = "C:/Users/Ashok Kumar/Desktop/Q9CD83.fasta")
ulcerans <- read.fasta(file = "C:/Users/Ashok Kumar/Desktop/A0PQ23.fasta")
lepraeseq <- leprae[[1]]
ulceransseq <- ulcerans[[1]]
dotPlot(lepraeseq, ulceransseq)

INPUT:

Sequence 1: (SwissProt ID: Q9CD83)

>sp|Q9CD83|PHBS_MYCLE Chorismate--pyruvate lyase OS=Mycobacterium leprae (strain TN) GN=ML0133 PE=3 SV=1
MTNRTLSREEIRKLDRDLRILVATNGTLTRVLNVVANEEIVVDIINQQLLDVAPKIPELE
NLKIGRILQRDILLKGQKSGILFVAAESLIVIDLLPTAITTYLTKTHHPIGEIMAASRIE
TYKEDAQVWIGDLPCWLADYGYWDLPKRAVGRRYRIIAGGQPVIITTEYFLRSVFQDTPR
EELDRCQYSNDIDTRSGDRFVLHGRVFKNL

Sequence 2: (SwissProt ID: A0PQ23)

>tr|A0PQ23|A0PQ23_MYCUA Chorismate pyruvate-lyase OS=Mycobacterium ulcerans (strain Agy99) GN=MUL_2003 PE=4 SV=1
MLAVLPEKREMTECHLSDEEIRKLNRDLRILIATNGTLTRILNVLANDEIVVEIVKQQIQ
DAAPEMDGCDHSSIGRVLRRDIVLKGRRSGIPFVAAESFIAIDLLPPEIVASLLETHRPI
GEVMAASCIETFKEEAKVWAGESPAWLELDRRRNLPPKVVGRQYRVIAEGRPVIIITEYF
LRSVFEDNSREEPIRHQRSVGTSARSGRSICT


OUTPUT:

RESULT: A program using R is written to display a dot plot for a pair of sequences and executed successfully.
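NOTE: A dot plot marks every position where a residue of the first sequence equals a residue of the second. The base-R sketch below builds that comparison matrix directly; the function name dot_matrix is an assumption for this example, and the sketch compares single residues only (no window or match threshold). It is an illustration of the idea, not a description of how dotPlot() in seqinr is implemented.

```r
# Base-R sketch of the comparison behind a dot plot (no seqinr needed).
dot_matrix <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # TRUE in cell [i, j] where residue i of a equals residue j of b
  outer(x, y, "==")
}

# First ten residues of the two input sequences used in this practical
m <- dot_matrix("MTNRTLSREE", "MLAVLPEKRE")
n_dots <- sum(m)   # number of cells that would be drawn as dots
```

Each TRUE cell corresponds to one dot; runs of TRUE values along a diagonal indicate a region of similarity between the two sequences.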


PRACTICAL: 08

FILE FORMAT CONVERSION USING R

Date: / / 201

AIM:
To write a program to convert a file in GenBank file format to FASTA file format using R.

SOFTWARE USED:
R 2.15.2
Seqinr 3.0-7: Biological Sequences Retrieval and Analysis module of R

SOURCE CODE:
library("seqinr")
gb2fasta(source.file = "E:/R Practical/AF060490.gb",
         destination.file = "E:/R Practical/AF060490.fasta")

INPUT:
File Name: AF060490.gb
LOCUS       AF060490                2693 bp    mRNA    linear   ROD 02-MAY-2000
DEFINITION  Mus musculus TLS-associated protein TASR-2 mRNA, complete cds.
ACCESSION   AF060490
VERSION     AF060490.1  GI:3327956
KEYWORDS    .
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus; Mus.
REFERENCE   1  (bases 1 to 2693)
  AUTHORS   Yang,L., Embree,L.J. and Hickstein,D.D.
  TITLE     TLS-ERG leukemia fusion protein inhibits RNA splicing mediated by
            serine-arginine proteins
  JOURNAL   Mol. Cell. Biol. 20 (10), 3345-3354 (2000)
   PUBMED   10779324
REFERENCE   2  (bases 1 to 2693)
  AUTHORS   Yang,L., Embree,L., Tsai,S. and Hickstein,D.D.
  TITLE     Molecular cloning of TASR-2, a TLS-associated protein with Ser-Arg
            repeats
  JOURNAL   Unpublished
REFERENCE   3  (bases 1 to 2693)
  AUTHORS   Yang,L., Embree,L., Tsai,S. and Hickstein,D.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-APR-1998) Medicine/Oncology, University of
            Washington, 1660 S. Columbian Way, GMR 151, Seattle, WA 98108, USA
FEATURES             Location/Qualifiers
     source          1..2693
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /cell_line="EML"
                     /cell_type="hematopoietic"
                     /organism="Mus musculus"
     CDS             92..880
                     /db_xref="GI:3327957"
                     /codon_start=1
                     /protein_id="AAC26715.1"
                     /translation="MSRYLRPPNTSLFVRNVADDTRSEDLRREFGRYGPIVDVYVPLD
                     FYTRRPRGFAYVQFEDVRDAEDALHNLDRKWICGRQIEIQFAQGDRKTPNQMKAKEGR
                     NVYSSSRYDDYDRYRRSRSRSYERRRSRSRSFDYNYRRSYSPRNSRPTGRPRRSRSHS
                     DNDRFKHRNRSFSRSKSNSRSRSKSQPKKEMKAKSRSRSASHTKTRGTSKTDSKTHYK
                     SGSRYEKESRKKEPPRSKSQSRSQSRSRSKSRSRSWTSPKSSGH"
                     /product="TLS-associated protein TASR-2"
                     /note="contains Ser-Arg (SR) repeats"
ORIGIN
        1 gtgtggtgtg agtggatgtg agccgccgcc ggagctgcgg acggtttgcc cgagcccgtt
       61 agcgccgccg gcccagagtc ccgccgccac catgtcccga tacctgcgcc cccctaacac
      121 gtctctgttc gtcaggaacg tggcggacga caccaggtct gaagatttac gtcgggaatt
      181 tggtcgttat ggtccaatag tagatgttta tgtcccactt gatttctaca ctcggcgtcc
      241 aagaggattt gcatatgttc aatttgagga tgttcgtgat gctgaagacg ctttacataa
      301 tttggacaga aaatggattt gtgggcgtca gattgaaatc cagttcgcac agggggatcg
      361 gaagacacca aatcaaatga aagccaagga agggaggaat gtatacagct cttcacgata
      421 tgacgattat gaccgatata gacgctctcg aagccggagt tatgaaagga gaagatcgag
      481 gagtcgctcc tttgattata actataggag atcttacagt cctagaaaca gtagaccgac
      541 tggaagacca cggcgtagcc gaagccattc cgacaatgat agattcaaac accgaaatcg
      601 atctttttca agatctaaat ccaattcaag atcacggtcc aagtcccagc ccaagaaaga
      661 aatgaaggct aaatcacgtt ctaggtctgc atctcacacc aaaactagag gcacctctaa
      721 aacagattcc aaaacacatt ataagtctgg ctcaagatat gaaaaggaat caaggaaaaa
      781 agaaccacct agatccaaat ctcagtcaag atcacagtct aggtctaggt caaaatctag
      841 gtcaaggtct tggactagtc ccaagtccag tggccactga tagtataaat tatgatactt
      901 ctaggcatgt atcattcatt tactcatagt ttggtatact taaattatca ggaatacaat
      961 gttgcaatga tgcgttttaa aaacaaacaa acttaacttg ttagttttcc ctgtactggg
     1021 caatggttat aattaaaaag atgcgctgtt gagaagccac tcttaagagt ccagtttgtt
     1081 taatgttatg ggcagctacc aatttgtggt gtctctgtat atttttgtaa agattctcat
     1141 tttttatgct tgaagtattt ggtgaaaaga tgttggttga ccataatttg caacattgtc
     1201 ttattagaaa taaattttca tatccatatt tggtagaact gttaacctag aaatgtagct
     1261 tgctaataag atagaatgat acagaagtga agtggtagcc acattacaac actgactgct
     1321 cagacacatt taggttcagg gtggacttta tgtcttgtca agatgtctaa gcccatgatg
     1381 attatttatg atgcaatgtg gaatagttct tttgttaaat ccaccatctg gggattgatg
     1441 ccaactgggt taaatagcgt tttcagggag agtgcccttt tcactgaaac atggagcctt
     1501 cactgctttc cccacctcaa tccctgctgg tttctaagat atggaacatt aaagcataag
     1561 ggaaaaccct cccccttaag ttgtgagtga gtcagtgatc acagaaacca ttgtaagggg
     1621 aaaagactgt tcttagcata gttgctctaa atttaactat tgttgatcat tgttatttag
     1681 gggttttgtt ttgttgtttg ttttttctgt tagaaacaag tgaactgttt gaaaatacat
     1741 ttttgtttgt ttatatgcat agtgtaaaac aaactgaatt ttgatgctca cagcacttac
     1801 catgtgcgtt tgtatcaaaa tctgcctgtt cttcataggg gaggcttgct cttcacacct
     1861 cagtttattc atgtgagaca ggctgagaag ataacactcc taggtgattt tgtggtgccg
     1921 tggatttttg gggaaagttg agttttaagc aaaagccaca tcacttagtt tttggtaatg
     1981 taggacatga ctaaaaaata acgaaatgat acccttaaat atttataatt tctagtattt
     2041 caagattgtt ttggaggcaa taaaatgact tgaaatgtcc ggtgtcattt cagaatacaa
     2101 agctagtgtc tctaagatct tagattcgtt gcttacagat gtgagtgaag atactgtggg
     2161 ggacgatcct cctggaggat taccttattt ttttcctttc gattttgttt ttagaaattt
     2221 agtccttgct tgtagacaac aaaagatggt tttaagaact gtttgtggaa tgtgtttgga
     2281 gggttaattc tagaaccttt gtatatttaa tagtatttct aacttttatt tctttactgt
     2341 ttgcagttaa tgttcttgtt ctgctatgca atcatttata tgcacgtttc tttaattttt
     2401 ttagattttc ctggatgtat agtttaaaca aagtctattt aaaactgtag cggtagtttg
     2461 cagttctagc aaagaggaaa gttgtggggt taaactttgt attttctttc ttatagaagc
     2521 ttctaaaaag gtatttttat atgttctttt taacaaatat tgtgtacaac ctttaaaaca
     2581 tcaatgtttg gatcaaaaca agacccagct tattttctgc ttgctgtaaa ttaagcaaag
     2641 atgctataat aaaaacaaaa tgaaggaaaa aaaaaaaaaa aaaaaaaaaa aaa
//

OUTPUT:
File Name: AF060490.fasta
>AF060490 2693 bp
gtgtggtgtgagtggatgtgagccgccgccggagctgcggacggtttgcccgagcccgtt
agcgccgccggcccagagtcccgccgccaccatgtcccgatacctgcgcccccctaacac
gtctctgttcgtcaggaacgtggcggacgacaccaggtctgaagatttacgtcgggaatt
tggtcgttatggtccaatagtagatgtttatgtcccacttgatttctacactcggcgtcc
aagaggatttgcatatgttcaatttgaggatgttcgtgatgctgaagacgctttacataa
tttggacagaaaatggatttgtgggcgtcagattgaaatccagttcgcacagggggatcg
gaagacaccaaatcaaatgaaagccaaggaagggaggaatgtatacagctcttcacgata
tgacgattatgaccgatatagacgctctcgaagccggagttatgaaaggagaagatcgag
gagtcgctcctttgattataactataggagatcttacagtcctagaaacagtagaccgac
tggaagaccacggcgtagccgaagccattccgacaatgatagattcaaacaccgaaatcg
atctttttcaagatctaaatccaattcaagatcacggtccaagtcccagcccaagaaaga
aatgaaggctaaatcacgttctaggtctgcatctcacaccaaaactagaggcacctctaa
aacagattccaaaacacattataagtctggctcaagatatgaaaaggaatcaaggaaaaa
agaaccacctagatccaaatctcagtcaagatcacagtctaggtctaggtcaaaatctag
gtcaaggtcttggactagtcccaagtccagtggccactgatagtataaattatgatactt
ctaggcatgtatcattcatttactcatagtttggtatacttaaattatcaggaatacaat
gttgcaatgatgcgttttaaaaacaaacaaacttaacttgttagttttccctgtactggg
caatggttataattaaaaagatgcgctgttgagaagccactcttaagagtccagtttgtt
taatgttatgggcagctaccaatttgtggtgtctctgtatatttttgtaaagattctcat
tttttatgcttgaagtatttggtgaaaagatgttggttgaccataatttgcaacattgtc
ttattagaaataaattttcatatccatatttggtagaactgttaacctagaaatgtagct
tgctaataagatagaatgatacagaagtgaagtggtagccacattacaacactgactgct
cagacacatttaggttcagggtggactttatgtcttgtcaagatgtctaagcccatgatg
attatttatgatgcaatgtggaatagttcttttgttaaatccaccatctggggattgatg
ccaactgggttaaatagcgttttcagggagagtgcccttttcactgaaacatggagcctt
cactgctttccccacctcaatccctgctggtttctaagatatggaacattaaagcataag
ggaaaaccctcccccttaagttgtgagtgagtcagtgatcacagaaaccattgtaagggg
aaaagactgttcttagcatagttgctctaaatttaactattgttgatcattgttatttag
gggttttgttttgttgtttgttttttctgttagaaacaagtgaactgtttgaaaatacat
ttttgtttgtttatatgcatagtgtaaaacaaactgaattttgatgctcacagcacttac
catgtgcgtttgtatcaaaatctgcctgttcttcataggggaggcttgctcttcacacct
cagtttattcatgtgagacaggctgagaagataacactcctaggtgattttgtggtgccg
tggatttttggggaaagttgagttttaagcaaaagccacatcacttagtttttggtaatg
taggacatgactaaaaaataacgaaatgatacccttaaatatttataatttctagtattt
caagattgttttggaggcaataaaatgacttgaaatgtccggtgtcatttcagaatacaa
agctagtgtctctaagatcttagattcgttgcttacagatgtgagtgaagatactgtggg
ggacgatcctcctggaggattaccttatttttttcctttcgattttgtttttagaaattt
agtccttgcttgtagacaacaaaagatggttttaagaactgtttgtggaatgtgtttgga
gggttaattctagaacctttgtatatttaatagtatttctaacttttatttctttactgt
ttgcagttaatgttcttgttctgctatgcaatcatttatatgcacgtttctttaattttt
ttagattttcctggatgtatagtttaaacaaagtctatttaaaactgtagcggtagtttg
cagttctagcaaagaggaaagttgtggggttaaactttgtattttctttcttatagaagc
ttctaaaaaggtatttttatatgttctttttaacaaatattgtgtacaacctttaaaaca
tcaatgtttggatcaaaacaagacccagcttattttctgcttgctgtaaattaagcaaag
atgctataataaaaacaaaatgaaggaaaaaaaaaaaaaaaaaaaaaaaaaaa

RESULT: A program using R is written to convert a file in GenBank file format to FASTA file format and executed successfully.
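NOTE: The conversion amounts to taking the accession from the LOCUS line, joining the sequence letters between ORIGIN and the closing //, and rewriting them in fixed-width lines. The base-R sketch below shows that idea on a made-up three-line record; the function name gb_to_fasta and the toy record TOY0001 are assumptions for illustration. It is not the code of gb2fasta() and handles only a single, well-formed record.

```r
# Hedged sketch: extract the ORIGIN block of one GenBank record as FASTA lines.
gb_to_fasta <- function(gb_lines, width = 60) {
  # Second field of the LOCUS line is used as the sequence name
  locus <- strsplit(trimws(gb_lines[grep("^LOCUS", gb_lines)[1]]), "\\s+")[[1]]
  start <- grep("^ORIGIN", gb_lines)[1]
  end   <- grep("^//", gb_lines)[1]
  body  <- gb_lines[(start + 1):(end - 1)]
  # Drop position numbers and spaces, keeping only the sequence letters
  sequence <- paste(gsub("[^A-Za-z]", "", body), collapse = "")
  starts <- seq(1, nchar(sequence), by = width)
  c(paste0(">", locus[2], " ", nchar(sequence), " bp"),
    substring(sequence, starts, starts + width - 1))
}

# Toy record (hypothetical), wrapped at 20 columns to keep the example short
toy <- c("LOCUS       TOY0001   25 bp    DNA",
         "ORIGIN",
         "        1 gtgtggtgtg agtggatgtg agccg",
         "//")
fasta <- gb_to_fasta(toy, width = 20)
```

For a real file the lines could be read with readLines() and the result written with writeLines().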


PRACTICAL: 09

HYPOTHESIS t-TEST USING R

Date: / / 201

AIM:
To write an R program to compute the t-test value from two variables and draw a conclusion about the hypothesis.

SOFTWARE USED:
R 2.15.2

PROBLEM/SOURCE CODE:

1. One sample t-test

Problem: An outbreak of Salmonella-related illness was attributed to ice cream produced at a certain factory. Scientists measured the level of Salmonella in 9 randomly sampled batches of ice cream. The levels (in MPN/g) were:

0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418

Is there evidence that the mean level of Salmonella in the ice cream is greater than 0.3 MPN/g?

SourceCode:
x = c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
t.test(x, alternative="greater", mu=0.3)

Output:
        One Sample t-test

data:  x
t = 2.2051, df = 8, p-value = 0.02927
alternative hypothesis: true mean is greater than 0.3
95 percent confidence interval:
 0.3245133       Inf
sample estimates:
mean of x
0.4564444

Conclusion: From the output we see that the p-value = 0.029. Hence, there is moderately strong evidence that the mean Salmonella level in the ice cream is above 0.3 MPN/g.
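NOTE: The figures reported by t.test() can be checked by hand. The sketch below recomputes the one-sample statistic t = (mean - mu0) / (s / sqrt(n)), the one-sided p-value and the lower confidence limit using only base R functions; the variable names (mu0, t_stat, p_value, lower) are chosen for this example.

```r
# Hand computation of the one-sample, one-sided t-test (base R only).
x   <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3                       # hypothesised mean (MPN/g)
n   <- length(x)
se  <- sd(x) / sqrt(n)           # standard error of the mean

t_stat  <- (mean(x) - mu0) / se
# Upper-tail probability, because the alternative is "greater"
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)
# Lower limit of the one-sided 95 percent confidence interval
lower   <- mean(x) - qt(0.95, df = n - 1) * se
```

These values should agree with the t, p-value and confidence interval shown in the output above.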

2. Two sample t-test

Problem: Six subjects were given a drug (treatment group) and an additional 6 subjects a placebo (control group). Their reaction time to a stimulus was measured (in ms). We want to perform a two-sample t-test for comparing the means of the treatment and control groups.

Control (x)   Treatment (y)
    91            101
    87            110
    99            103
    77             93
    88             99
    91            104

SourceCode:
Control = c(91, 87, 99, 77, 88, 91)
Treat = c(101, 110, 103, 93, 99, 104)
t.test(Control,Treat,alternative="less", var.equal=TRUE)
t.test(Control,Treat,alternative="less")

Output:
        Two Sample t-test

data:  Control and Treat
t = -3.4456, df = 10, p-value = 0.003136
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -6.082744
sample estimates:
mean of x mean of y
 88.83333 101.66667

        Welch Two Sample t-test

data:  Control and Treat
t = -3.4456, df = 9.48, p-value = 0.003391
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -6.044949
sample estimates:
mean of x mean of y
 88.83333 101.66667

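Conclusion: Here the pooled t-test and the Welch t-test give roughly the same results (p-value = 0.003136 and 0.003391, respectively). Both p-values are well below 0.05, so there is strong evidence that the mean reaction time of the control group is less than that of the treatment group.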
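NOTE: The two tests differ only in how the standard error and the degrees of freedom are obtained. The sketch below recomputes both by hand in base R: the pooled test uses one combined variance with n1 + n2 - 2 degrees of freedom, while the Welch test keeps the two variances separate and uses the Welch-Satterthwaite approximation for the degrees of freedom. Variable names are chosen for this example.

```r
# Pooled versus Welch two-sample t-test, computed by hand (base R only).
control <- c(91, 87, 99, 77, 88, 91)
treat   <- c(101, 110, 103, 93, 99, 104)
n1 <- length(control); n2 <- length(treat)
v1 <- var(control);    v2 <- var(treat)
diff_means <- mean(control) - mean(treat)

# Pooled (equal-variance) version
sp2       <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled  <- diff_means / sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2
p_pooled  <- pt(t_pooled, df_pooled)      # lower tail: alternative is "less"

# Welch (unequal-variance) version
se2      <- v1 / n1 + v2 / n2
t_welch  <- diff_means / sqrt(se2)
df_welch <- se2^2 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
p_welch  <- pt(t_welch, df_welch)
```

Because the two groups have the same size, the two t statistics coincide; only the degrees of freedom, and therefore the p-values, differ slightly.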

3. Paired t-test

Problem: A study was performed to test whether cars get better mileage on premium gas than on regular gas. Each of 10 cars was first filled with either regular or premium gas, decided by a coin toss, and the mileage for that tank was recorded. The mileage was recorded again for the same cars using the other kind of gasoline. We use a paired t-test to determine whether cars get significantly better mileage with premium gas.

Regular (x)   Premium (y)
    16            19
    20            22
    21            24
    22            24
    23            25
    22            25
    27            26
    25            26
    27            28
    28            32

SourceCode:
reg = c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
prem = c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)
t.test(prem,reg,alternative="greater", paired=TRUE)


Output:
        Paired t-test

data:  prem and reg
t = 4.4721, df = 9, p-value = 0.0007749
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 1.180207      Inf
sample estimates:
mean of the differences
                      2

Conclusion: The results show that the t-statistic is equal to 4.47 and the p-value is 0.00077. Since the p-value is very low, we reject the null hypothesis. There is strong evidence of a mean increase in gas mileage from regular to premium gasoline.

RESULT: A program using R is written to compute the t-test value from two variables and draw a conclusion about the hypothesis, and executed successfully.
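NOTE: A paired t-test is a one-sample t-test on the per-car differences. The sketch below makes that explicit for the mileage data of problem 3 using base R only; the variable names (d, t_stat, p_value) are chosen for this example.

```r
# Paired t-test as a one-sample test on the differences (base R only).
reg  <- c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
prem <- c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)

d <- prem - reg                  # mileage gain of each car on premium gas
n <- length(d)
t_stat  <- mean(d) / (sd(d) / sqrt(n))
# Upper-tail probability, because the alternative is "greater"
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)
```

The mean of the differences and the resulting statistic should match the "mean of the differences" and t values in the paired t-test output.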


PRACTICAL: 10

RETRIEVE SEQUENCE FROM DATABASE USING R

Date: / / 201

AIM:
To write an R program to download a nucleotide/protein sequence from a biological sequence database.

SOFTWARE USED:
R 2.15.2
Seqinr 3.0-7: Biological Sequences Retrieval and Analysis module of R

SOURCE CODE:
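library("seqinr")
choosebank("swissprot")
# query() creates an object named after its first argument (behaviour of
# seqinr 3.0-7; newer releases may require assigning the return value)
query("seq_id", "AC=Q9CD82")
seqs <- getSequence(seq_id$req[[1]])
closebank()
write.fasta(names="Q9CD82", sequences=seqs, file.out="E:/R Practical/Q9CD82.fasta")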

INPUT/OUTPUT:
>Q9CD82
MRSENLAALLARQAAEAGWYDKPAYFAPDVVTHGQIHDGAVRLGEVLRNRGLSAGDRVLL
CLPDSPDLVQLLLACLARGIMAFLANPELHRDDYAFPERDTAAALVITNGSLRDRFQSSN
VVEPAELLSDATRVEPSDYEPVSGDAYAFATYTSGTTGKPKAAIHRHADPFTFVDAMCRK
ALRLTPQDIGLCSARMYFAYGLGNSVWFPLATGGSAVISSVPVSAESAAMLSTRFEPSVL
YGVPSFFARVVGACSPDSFRSLRCVVTAGEALEPALAERLVEFFGGIPILDGIGSSEVGQ
TFVSNSVDDWRVGTLGKVLPPYEIRVVAPDGATAGSGIEGNLWVRGPSIAQSYWNRPDSL
LENGDWLNTRDRVRIDGDGWVTYGCRADDTEIVGGVNINPREVERLIIEADAVAEAAVVG
VREFTGASTLQAFLVPAVGAFIDESVMRDVHRRLLTQLTAFKVPHRFAIIERLPRSTNGK
LLRNVLRAQSPTKPIWELSLTESQSATKAQLDGRPASNAHAQAAVGHAAGATLKQRLSAL
QQERERLVVEAVCAEAVKMLGESDPGLINRDLAFSDLGFDSQMTVTLCNRLAVVTGLRLP
ETVGWDYGSISGLSRYLEAELSGVRSRPETPLSANSGAKGLSPIDEELKKVEEMVVAIGA
SEKQRVADRLRALLGIIVDGEAGLSKRIQAASTPDEIFQLIDSELCE

RESULT: A program using R is written to download a nucleotide/protein sequence from a biological sequence database and executed successfully.
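NOTE: After a download it is worth confirming that the saved file holds the whole sequence. The sketch below writes a made-up 70-residue sequence to a temporary file and reads it back with base R, counting the residues; the helper name write_fasta_sketch, the name toy_protein and the sequence are assumptions for illustration and do not come from the database. It works offline and does not replace write.fasta().

```r
# Hedged round-trip check: write a FASTA file, read it back, count residues.
write_fasta_sketch <- function(name, sequence, file, width = 60) {
  starts <- seq(1, nchar(sequence), by = width)
  writeLines(c(paste0(">", name),
               substring(sequence, starts, starts + width - 1)),
             file)
}

toy_seq <- paste(rep("MRSENLAALL", 7), collapse = "")   # 70 made-up residues
out <- tempfile(fileext = ".fasta")
write_fasta_sketch("toy_protein", toy_seq, out)

lines     <- readLines(out)
header    <- lines[1]
# Join every non-header line and count the residues that were saved
n_residue <- nchar(paste(lines[-1], collapse = ""))
```

If n_residue equals the length of the sequence that was retrieved, no residues were lost when the file was written.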
