PRACTICAL: 01
AIM:
To write a Perl program that finds the complement, reverse complement, transcribed RNA or translated protein of a DNA sequence, according to the user's choice.
SOFTWARE USED: Perl 5.16.2
SOURCE CODE:
#!/usr/bin/perl
# Central Dogma menu: complement, reverse complement, transcription and
# translation of a DNA sequence entered by the user.
use strict;
use warnings;

my $clear = $^O eq 'MSWin32' ? 'cls' : 'clear';    # clear-screen command

# Standard genetic code on RNA codons; '_' marks a stop codon.
my %CodonMap = (
    'GCA'=>'A', 'GCC'=>'A', 'GCG'=>'A', 'GCU'=>'A', 'UGC'=>'C', 'UGU'=>'C',
    'GAC'=>'D', 'GAU'=>'D', 'GAA'=>'E', 'GAG'=>'E', 'UUC'=>'F', 'UUU'=>'F',
    'GGA'=>'G', 'GGC'=>'G', 'GGG'=>'G', 'GGU'=>'G', 'CAC'=>'H', 'CAU'=>'H',
    'AUA'=>'I', 'AUC'=>'I', 'AUU'=>'I', 'AAA'=>'K', 'AAG'=>'K', 'UUA'=>'L',
    'UUG'=>'L', 'CUA'=>'L', 'CUC'=>'L', 'CUG'=>'L', 'CUU'=>'L', 'AUG'=>'M',
    'AAC'=>'N', 'AAU'=>'N', 'CCA'=>'P', 'CCC'=>'P', 'CCG'=>'P', 'CCU'=>'P',
    'CAA'=>'Q', 'CAG'=>'Q', 'CGA'=>'R', 'CGC'=>'R', 'CGG'=>'R', 'CGU'=>'R',
    'AGA'=>'R', 'AGG'=>'R', 'UCA'=>'S', 'UCC'=>'S', 'UCG'=>'S', 'UCU'=>'S',
    'AGC'=>'S', 'AGU'=>'S', 'ACA'=>'T', 'ACC'=>'T', 'ACG'=>'T', 'ACU'=>'T',
    'GUA'=>'V', 'GUC'=>'V', 'GUG'=>'V', 'GUU'=>'V', 'UGG'=>'W', 'UAC'=>'Y',
    'UAU'=>'Y', 'UAA'=>'_', 'UAG'=>'_', 'UGA'=>'_',
);

# Prompt for a DNA sequence and keep only A, C, G and T (either case).
sub read_dna {
    print "Enter the DNA sequence:\n";
    my $seq = <STDIN> // '';
    chomp($seq);
    $seq =~ s/[^actg]//ig;
    return $seq;
}

sub Complement {
    my $seq = read_dna();
    $seq =~ tr/ATCGatcg/TAGCtagc/;
    print "\nComplement of the DNA sequence is:\n$seq\n";
}

sub RevComplement {
    my $seq = read_dna();
    $seq =~ tr/ATCGatcg/TAGCtagc/;
    $seq = reverse($seq);
    print "\nReverse complement of the DNA sequence is:\n$seq\n";
}

sub Transcription {
    my $seq = read_dna();
    $seq =~ tr/Tt/Uu/;
    print "\nTranscribed RNA sequence is:\n$seq\n";
}

sub Translation {
    my $seq = uc(read_dna());
    $seq =~ tr/T/U/;
    my $protein = "";
    # Non-overlapping codons from the first base; a trailing partial codon is ignored.
    for (my $i = 0; $i < length($seq) - 2; $i += 3) {
        $protein .= $CodonMap{ substr($seq, $i, 3) };
    }
    print "\nTranslated protein sequence is:\n$protein\n";
}

my %menu = (1 => \&Complement,    2 => \&RevComplement,
            3 => \&Transcription, 4 => \&Translation);

# Menu loop; replaces the original "goto x" jumps back to the menu label.
while (1) {
    system($clear);
    print "\nCentral Dogma Menu:-\n";
    print "------------------\n";
    print "0. Exit\n1. Complement\n2. Reverse Complement\n3. Transcription\n4. Translation\n";
    print "\nEnter your choice: ";
    my $choice = <STDIN>;
    last unless defined $choice;
    $choice =~ s/\s+//g;
    last if $choice eq '0';
    if (exists $menu{$choice}) {
        system($clear);
        $menu{$choice}->();
    } else {
        print "Enter a valid number !!!\n";
    }
    <STDIN>;    # wait for Enter before showing the menu again
}

I M.Sc. Bioinformatics (2012-2014) Lab in Programming in C, PERL and R
Department of Bioinformatics, Noorul Islam College of Arts and Science, Kumaracoil 629 180
INPUT/OUTPUT:
Central Dogma Menu:-
------------------
0. Exit
1. Complement
2. Reverse Complement
3. Transcription
4. Translation

Enter your choice: 4

Enter the DNA sequence:
ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCT

Translated protein sequence is:
TAVSILPGSGVMVHHQFSPSLLV
RESULT: A Perl program that finds the complement, reverse complement, transcribed RNA or translated protein of a DNA sequence, according to the user's choice, was written and executed successfully.
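The menu program mixes input/output with the sequence operations. As a minimal sketch, the four operations can also be written as pure subroutines that take and return strings, which makes them easy to check against the recorded run. The names `clean_dna`, `complement_dna`, `revcomp_dna`, `transcribe_dna` and `translate_dna` are hypothetical, and the codon table here is keyed on DNA codons (an assumption made to skip the T-to-U step before translation).

```perl
use strict;
use warnings;

# Standard genetic code on DNA codons; '_' marks a stop codon,
# matching the convention used in the practical.
my %CODON = (
    TTT=>'F', TTC=>'F', TTA=>'L', TTG=>'L', CTT=>'L', CTC=>'L', CTA=>'L', CTG=>'L',
    ATT=>'I', ATC=>'I', ATA=>'I', ATG=>'M', GTT=>'V', GTC=>'V', GTA=>'V', GTG=>'V',
    TCT=>'S', TCC=>'S', TCA=>'S', TCG=>'S', CCT=>'P', CCC=>'P', CCA=>'P', CCG=>'P',
    ACT=>'T', ACC=>'T', ACA=>'T', ACG=>'T', GCT=>'A', GCC=>'A', GCA=>'A', GCG=>'A',
    TAT=>'Y', TAC=>'Y', TAA=>'_', TAG=>'_', CAT=>'H', CAC=>'H', CAA=>'Q', CAG=>'Q',
    AAT=>'N', AAC=>'N', AAA=>'K', AAG=>'K', GAT=>'D', GAC=>'D', GAA=>'E', GAG=>'E',
    TGT=>'C', TGC=>'C', TGA=>'_', TGG=>'W', CGT=>'R', CGC=>'R', CGA=>'R', CGG=>'R',
    AGT=>'S', AGC=>'S', AGA=>'R', AGG=>'R', GGT=>'G', GGC=>'G', GGA=>'G', GGG=>'G',
);

# Keep only A/C/G/T in either case, like s/[^actg]//ig in the practical.
sub clean_dna { my ($s) = @_; $s =~ s/[^ACGTacgt]//g; return $s; }

# Base-pair complement, preserving case.
sub complement_dna { my ($s) = @_; (my $c = $s) =~ tr/ACGTacgt/TGCAtgca/; return $c; }

# Complement, then read the strand backwards.
sub revcomp_dna { return scalar reverse complement_dna($_[0]); }

# DNA to RNA: T becomes U.
sub transcribe_dna { my ($s) = @_; (my $r = $s) =~ tr/Tt/Uu/; return $r; }

# Translate codon by codon from the first base; a trailing partial codon is dropped.
sub translate_dna {
    my $s = uc $_[0];
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $s; $i += 3) {
        $protein .= $CODON{ substr($s, $i, 3) };
    }
    return $protein;
}

my $input = 'ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCT';
print translate_dna(clean_dna($input)), "\n";    # TAVSILPGSGVMVHHQFSPSLLV
```

Because each subroutine only transforms a string, the menu code could call them directly, and the same functions can be reused by the six-frame program.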
PRACTICAL: 02
AIM:
To write a Perl program to translate a DNA sequence in all six reading frames.
SOFTWARE USED: Perl 5.16.2
SOURCE CODE:
#!/usr/bin/perl
use strict;
use warnings;

system($^O eq 'MSWin32' ? 'cls' : 'clear');
print "Six Reading Frames:-\n";
print "------------------\n\n";
print "Enter the DNA sequence:\n";
my $seq = <STDIN> // '';
chomp($seq);
$seq =~ s/[^actg]//ig;
$seq = uc($seq);

# Standard genetic code on RNA codons; '_' marks a stop codon.
my %CodonMap = (
    'GCA'=>'A', 'GCC'=>'A', 'GCG'=>'A', 'GCU'=>'A', 'UGC'=>'C', 'UGU'=>'C',
    'GAC'=>'D', 'GAU'=>'D', 'GAA'=>'E', 'GAG'=>'E', 'UUC'=>'F', 'UUU'=>'F',
    'GGA'=>'G', 'GGC'=>'G', 'GGG'=>'G', 'GGU'=>'G', 'CAC'=>'H', 'CAU'=>'H',
    'AUA'=>'I', 'AUC'=>'I', 'AUU'=>'I', 'AAA'=>'K', 'AAG'=>'K', 'UUA'=>'L',
    'UUG'=>'L', 'CUA'=>'L', 'CUC'=>'L', 'CUG'=>'L', 'CUU'=>'L', 'AUG'=>'M',
    'AAC'=>'N', 'AAU'=>'N', 'CCA'=>'P', 'CCC'=>'P', 'CCG'=>'P', 'CCU'=>'P',
    'CAA'=>'Q', 'CAG'=>'Q', 'CGA'=>'R', 'CGC'=>'R', 'CGG'=>'R', 'CGU'=>'R',
    'AGA'=>'R', 'AGG'=>'R', 'UCA'=>'S', 'UCC'=>'S', 'UCG'=>'S', 'UCU'=>'S',
    'AGC'=>'S', 'AGU'=>'S', 'ACA'=>'T', 'ACC'=>'T', 'ACG'=>'T', 'ACU'=>'T',
    'GUA'=>'V', 'GUC'=>'V', 'GUG'=>'V', 'GUU'=>'V', 'UGG'=>'W', 'UAC'=>'Y',
    'UAU'=>'Y', 'UAA'=>'_', 'UAG'=>'_', 'UGA'=>'_',
);

# Translate an RNA string codon by codon, starting at 0-based offset $start.
sub translate_frame {
    my ($rna, $start) = @_;
    my $protein = "";
    for (my $i = $start; $i < length($rna) - 2; $i += 3) {
        $protein .= $CodonMap{ substr($rna, $i, 3) };
    }
    return $protein;
}

# Forward strand as RNA.
(my $fwd_rna = $seq) =~ tr/T/U/;
# Reverse strand: the reverse complement of the input, written as RNA
# (A->U, C->G, G->C, T->A applied to the reversed sequence).
(my $rev_rna = reverse $seq) =~ tr/ACGT/UGCA/;

for my $f (1 .. 3) {
    print "\nForward Frame $f:\n", translate_frame($fwd_rna, $f - 1), "\n";
}
for my $f (1 .. 3) {
    print "\nReverse Frame $f:\n", translate_frame($rev_rna, $f - 1), "\n";
}
<STDIN>;

INPUT/OUTPUT:
Six Reading Frames:-
------------------

Enter the DNA sequence:
ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCT

Forward Frame 1:
TAVSILPGSGVMVHHQFSPSLLV

Forward Frame 2:
PPSPFFQDPA_WCTTSFRPVFLS

Forward Frame 3:
RRLHSSRIRRNGAPPVFAQSSC

Reverse Frame 1:
RQEDWAKTGGAPLRRILEEWRRR

Reverse Frame 2:
DKKTGRKLVVHHYAGSWKNGDGG

Reverse Frame 3:
TRRLGENWWCTITPDPGRMETA
RESULT: A Perl program to translate a DNA sequence in all six reading frames was written and executed successfully.
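The six frames can also be produced by one helper applied at offsets 0, 1 and 2 to the sequence and to its reverse complement. The sketch below assumes the conventional definition that the three reverse frames are read from the reverse complement; `revcomp`, `translate_from` and `six_frames` are hypothetical names. Instead of listing all 64 codons, it builds the standard code from the usual T, C, A, G ordered table string (with `_` for stop, as in the practical).

```perl
use strict;
use warnings;

# Standard genetic code built from the TCAG-ordered table
# (first base outermost, third base innermost); '_' = stop.
my @bases = qw(T C A G);
my $aa    = 'FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %CODON;
my $k = 0;
for my $b1 (@bases) { for my $b2 (@bases) { for my $b3 (@bases) {
    $CODON{"$b1$b2$b3"} = substr($aa, $k++, 1);
} } }

# Reverse complement of an upper-case DNA string.
sub revcomp { (my $r = reverse $_[0]) =~ tr/ACGT/TGCA/; return $r; }

# Translate $seq codon by codon from 0-based offset $start.
sub translate_from {
    my ($seq, $start) = @_;
    my $protein = '';
    for (my $i = $start; $i + 3 <= length $seq; $i += 3) {
        $protein .= $CODON{ substr($seq, $i, 3) };
    }
    return $protein;
}

# Returns six translations in the order F1, F2, F3, R1, R2, R3.
sub six_frames {
    my $seq = uc $_[0];
    $seq =~ s/[^ACGT]//g;
    my $rc = revcomp($seq);
    return ((map { translate_from($seq, $_) } 0 .. 2),
            (map { translate_from($rc,  $_) } 0 .. 2));
}

my @frames = six_frames('ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCT');
my @names  = ('Forward Frame 1', 'Forward Frame 2', 'Forward Frame 3',
              'Reverse Frame 1', 'Reverse Frame 2', 'Reverse Frame 3');
print "$names[$_]:\n$frames[$_]\n" for 0 .. 5;
```

Keeping the frame offset as a parameter avoids six copies of the same loop, and the reverse frames only differ in which string is passed in.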
PRACTICAL: 03
AIM:
To write a BioPerl program to download a nucleotide or protein sequence from a biological sequence database.
SOFTWARE USED: Perl 5.16.2, BioPerl 1.6.1
SOURCE CODE (gene sequence retrieval from the GenBank database):
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::GenBank;

system($^O eq 'MSWin32' ? 'cls' : 'clear');
my $genBank = Bio::DB::GenBank->new;
print "\nGenBank Sequence Download:-";
print "\n-------------------------\n";
print "\nAccession No. (AF060485):\n";
my $acc = <STDIN>;
chomp($acc);

# Fetch the GenBank record for the accession over the network
my $seq = $genBank->get_Seq_by_acc($acc);

# Write it in FASTA format to <accession>.fasta in the current directory
my $seqOut = Bio::SeqIO->new(-file => ">$acc.fasta", -format => 'fasta');
$seqOut->write_seq($seq);
print "\nDownloaded successfully!";
<STDIN>;
OUTPUT:
>AF060490 Mus musculus TLS-associated protein TASR-2 mRNA, complete cds.
GTGTGGTGTGAGTGGATGTGAGCCGCCGCCGGAGCTGCGGACGGTTTGCCCGAGCCCGTT
AGCGCCGCCGGCCCAGAGTCCCGCCGCCACCATGTCCCGATACCTGCGCCCCCCTAACAC
GTCTCTGTTCGTCAGGAACGTGGCGGACGACACCAGGTCTGAAGATTTACGTCGGGAATT
TGGTCGTTATGGTCCAATAGTAGATGTTTATGTCCCACTTGATTTCTACACTCGGCGTCC
AAGAGGATTTGCATATGTTCAATTTGAGGATGTTCGTGATGCTGAAGACGCTTTACATAA
TTTGGACAGAAAATGGATTTGTGGGCGTCAGATTGAAATCCAGTTCGCACAGGGGGATCG
GAAGACACCAAATCAAATGAAAGCCAAGGAAGGGAGGAATGTATACAGCTCTTCACGATA
TGACGATTATGACCGATATAGACGCTCTCGAAGCCGGAGTTATGAAAGGAGAAGATCGAG
GAGTCGCTCCTTTGATTATAACTATAGGAGATCTTACAGTCCTAGAAACAGTAGACCGAC
TGGAAGACCACGGCGTAGCCGAAGCCATTCCGACAATGATAGATTCAAACACCGAAATCG
ATCTTTTTCAAGATCTAAATCCAATTCAAGATCACGGTCCAAGTCCCAGCCCAAGAAAGA
AATGAAGGCTAAATCACGTTCTAGGTCTGCATCTCACACCAAAACTAGAGGCACCTCTAA
AACAGATTCCAAAACACATTATAAGTCTGGCTCAAGATATGAAAAGGAATCAAGGAAAAA
AGAACCACCTAGATCCAAATCTCAGTCAAGATCACAGTCTAGGTCTAGGTCAAAATCTAG
GTCAAGGTCTTGGACTAGTCCCAAGTCCAGTGGCCACTGATAGTATAAATTATGATACTT
CTAGGCATGTATCATTCATTTACTCATAGTTTGGTATACTTAAATTATCAGGAATACAAT
GTTGCAATGATGCGTTTTAAAAACAAACAAACTTAACTTGTTAGTTTTCCCTGTACTGGG
CAATGGTTATAATTAAAAAGATGCGCTGTTGAGAAGCCACTCTTAAGAGTCCAGTTTGTT
TAATGTTATGGGCAGCTACCAATTTGTGGTGTCTCTGTATATTTTTGTAAAGATTCTCAT
TTTTTATGCTTGAAGTATTTGGTGAAAAGATGTTGGTTGACCATAATTTGCAACATTGTC
TTATTAGAAATAAATTTTCATATCCATATTTGGTAGAACTGTTAACCTAGAAATGTAGCT
TGCTAATAAGATAGAATGATACAGAAGTGAAGTGGTAGCCACATTACAACACTGACTGCT
CAGACACATTTAGGTTCAGGGTGGACTTTATGTCTTGTCAAGATGTCTAAGCCCATGATG
ATTATTTATGATGCAATGTGGAATAGTTCTTTTGTTAAATCCACCATCTGGGGATTGATG
CCAACTGGGTTAAATAGCGTTTTCAGGGAGAGTGCCCTTTTCACTGAAACATGGAGCCTT
CACTGCTTTCCCCACCTCAATCCCTGCTGGTTTCTAAGATATGGAACATTAAAGCATAAG
GGAAAACCCTCCCCCTTAAGTTGTGAGTGAGTCAGTGATCACAGAAACCATTGTAAGGGG
AAAAGACTGTTCTTAGCATAGTTGCTCTAAATTTAACTATTGTTGATCATTGTTATTTAG
GGGTTTTGTTTTGTTGTTTGTTTTTTCTGTTAGAAACAAGTGAACTGTTTGAAAATACAT
TTTTGTTTGTTTATATGCATAGTGTAAAACAAACTGAATTTTGATGCTCACAGCACTTAC
CATGTGCGTTTGTATCAAAATCTGCCTGTTCTTCATAGGGGAGGCTTGCTCTTCACACCT
CAGTTTATTCATGTGAGACAGGCTGAGAAGATAACACTCCTAGGTGATTTTGTGGTGCCG
TGGATTTTTGGGGAAAGTTGAGTTTTAAGCAAAAGCCACATCACTTAGTTTTTGGTAATG
TAGGACATGACTAAAAAATAACGAAATGATACCCTTAAATATTTATAATTTCTAGTATTT
CAAGATTGTTTTGGAGGCAATAAAATGACTTGAAATGTCCGGTGTCATTTCAGAATACAA
AGCTAGTGTCTCTAAGATCTTAGATTCGTTGCTTACAGATGTGAGTGAAGATACTGTGGG
GGACGATCCTCCTGGAGGATTACCTTATTTTTTTCCTTTCGATTTTGTTTTTAGAAATTT
AGTCCTTGCTTGTAGACAACAAAAGATGGTTTTAAGAACTGTTTGTGGAATGTGTTTGGA
GGGTTAATTCTAGAACCTTTGTATATTTAATAGTATTTCTAACTTTTATTTCTTTACTGT
TTGCAGTTAATGTTCTTGTTCTGCTATGCAATCATTTATATGCACGTTTCTTTAATTTTT
TTAGATTTTCCTGGATGTATAGTTTAAACAAAGTCTATTTAAAACTGTAGCGGTAGTTTG
CAGTTCTAGCAAAGAGGAAAGTTGTGGGGTTAAACTTTGTATTTTCTTTCTTATAGAAGC
TTCTAAAAAGGTATTTTTATATGTTCTTTTTAACAAATATTGTGTACAACCTTTAAAACA
TCAATGTTTGGATCAAAACAAGACCCAGCTTATTTTCTGCTTGCTGTAAATTAAGCAAAG
ATGCTATAATAAAAACAAAATGAAGGAAAAAAAAAAAAAAAAAAAAAAAAAAA
RESULT: A BioPerl program to download a nucleotide or protein sequence from a biological sequence database was written and executed successfully.
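BioPerl hides the web request. As a dependency-free sketch of one way to do the same thing (an assumption, not a description of how `Bio::DB::GenBank` works internally), a record can be requested from the NCBI E-utilities `efetch` service with `db=nucleotide`, `rettype=fasta` and `retmode=text`, and a sequence can be written as FASTA with 60 residues per line, the line width seen in the downloaded file. The helpers `efetch_url` and `format_fasta` are hypothetical, and the network call itself is left out so the sketch runs offline.

```perl
use strict;
use warnings;

# Build an NCBI E-utilities efetch URL for one accession (FASTA, plain text).
sub efetch_url {
    my ($acc) = @_;
    die "bad accession '$acc'\n" unless $acc =~ /^[A-Za-z0-9_.]+$/;
    return 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
         . "?db=nucleotide&id=$acc&rettype=fasta&retmode=text";
}

# Format one record as FASTA: a '>' header line, then fixed-width sequence lines.
sub format_fasta {
    my ($id, $desc, $seq, $width) = @_;
    $width ||= 60;
    my $out = '>' . $id . (defined $desc && length $desc ? " $desc" : '') . "\n";
    for (my $i = 0; $i < length $seq; $i += $width) {
        $out .= substr($seq, $i, $width) . "\n";
    }
    return $out;
}

# Example: the first 70 bases of AF060490 as shown in the downloaded file.
print efetch_url('AF060490'), "\n";
print format_fasta('AF060490', 'Mus musculus TLS-associated protein TASR-2 mRNA, complete cds.',
    'GTGTGGTGTGAGTGGATGTGAGCCGCCGCCGGAGCTGCGGACGGTTTGCCCGAGCCCGTTAGCGCCGCCG');
```

To actually download, the URL could be fetched with the core module HTTP::Tiny; HTTPS support in HTTP::Tiny additionally needs IO::Socket::SSL and Net::SSLeay to be installed.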
PRACTICAL: 04
AIM:
To write a BioPerl program that uses RemoteBlast to find homologous sequences for a query sequence in a biological sequence database.
SOFTWARE USED: Perl 5.16.2, BioPerl 1.6.1
SOURCE CODE:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::Run::RemoteBlast;

system($^O eq 'MSWin32' ? 'cls' : 'clear');
print "+------------------------------------+\n";
print "|        Remote BLAST Program        |\n";
print "+------------------------------------+\n";
print "\nEnter the following details:-\n";
print "\nProgram (blastn|blastp|blastx|tblastn|tblastx):\n";
chomp(my $prog = <STDIN>);
print "\nDataBase (nr|swissprot|pdb|month):\n";
chomp(my $db = <STDIN>);
print "\nE-value (Example: 1e-10):\n";
chomp(my $e_val = <STDIN>);

my @params = ('-prog'       => $prog,
              '-data'       => $db,
              '-expect'     => $e_val,
              '-readmethod' => 'SearchIO');
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

print "\nFile name (.fasta format):\n";
chomp(my $fname = <STDIN>);
$factory->submit_blast($fname);

# Poll NCBI until every submitted request ID (RID) has been handled
while (my @rids = $factory->each_rid) {
    for my $rid (@rids) {
        my $rc = $factory->retrieve_blast($rid);
        if (!ref $rc) {
            # Not a result object: 0 means still running, a negative value an error
            $factory->remove_rid($rid) if $rc < 0;
            sleep 5;
        } else {
            my $result = $rc->next_result();
            $factory->save_output("Blast Output.txt");
            $factory->remove_rid($rid);
        }
    }
}
print "\nBlast output is generated successfully!";
<STDIN>;
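The program passes the three typed answers straight to NCBI, so a typo only shows up as a failed request. A small guard could reject obvious mistakes before submission. The sketch below is a hypothetical helper, `check_blast_params`, that accepts only the program and database names offered in the program's own prompts and a positive numeric E-value; it does not check that the database type suits the chosen program.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Names offered by the program's own prompts.
my %PROGRAMS  = map { $_ => 1 } qw(blastn blastp blastx tblastn tblastx);
my %DATABASES = map { $_ => 1 } qw(nr swissprot pdb month);

# Returns a list of problems; an empty list means the parameters look usable.
sub check_blast_params {
    my ($prog, $db, $evalue) = @_;
    my @errors;
    push @errors, "unknown program '$prog'" unless $PROGRAMS{$prog};
    push @errors, "unknown database '$db'"  unless $DATABASES{$db};
    push @errors, "E-value must be a positive number"
        unless looks_like_number($evalue) && $evalue > 0;
    return @errors;
}

my @problems = check_blast_params('blastn', 'nr', '1e-5');
if (@problems) { print "$_\n" for @problems } else { print "parameters OK\n" }
```

In the program above, such a check would sit just before `Bio::Tools::Run::RemoteBlast->new(@params)`, re-prompting while the returned list is non-empty.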
INPUT:
+------------------------------------+
|        Remote BLAST Program        |
+------------------------------------+

Enter the following details:-

Program (blastn|blastp|blastx|tblastn|tblastx):
blastn

DataBase (nr|swissprot|pdb|month):
nr

E-value (Example: 1e-10):
1e-5

File name (.fasta format):
dna.fasta

Blast output is generated successfully!
OUTPUT:
BLASTN 2.2.27+

Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

RID: F5FR6GCG015

Database: Nucleotide collection (nt)
           17,084,706 sequences; 43,890,479,962 total letters

Query= gi|440487466|gb|JH795076.1| Magnaporthe oryzae P131 unplaced genomic scaffold P131_scaffold00326, whole genome shotgun sequence

Length=980

Sequences producing significant alignments:
ref|XM_003721193.1| Magnaporthe oryzae 70-15 initiation-speci...
ref|XM_003711036.1| Magnaporthe oryzae 70-15 initiation-speci...
ref|XM_003660234.1| Myceliophthora thermophila ATCC 42464 gly...
gb|CP003002.1| Myceliophthora thermophila ATCC 42464 chromoso...
ref|XM_001935551.1| Pyrenophora tritici-repentis Pt-1C-BFP al...
ref|XM_003306105.1| Pyrenophora teres f. teres 0-1 hypothetic...
ref|XM_003300282.1| Pyrenophora teres f. teres 0-1 hypothetic...
ALIGNMENTS
>ref|XM_003721193.1| Magnaporthe oryzae 70-15 initiation-specific alpha-1,6-mannosyltransferase (MGG_02562) mRNA, complete cds
Length=1412

 Score = 1768 bits (1960),  Expect = 0.0
 Identities = 980/980 (100%), Gaps = 0/980 (0%)
 Strand=Plus/Minus

Query  1     ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGT  60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1144  ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGT  1085
CTTCTTGTCTCCATGCTGTTGATTCATCGTGTCCGCAAAGGCGTAGTCGGGCAACACCAG |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CTTCTTGTCTCCATGCTGTTGATTCATCGTGTCCGCAAAGGCGTAGTCGGGCAACACCAG CACGTCGCCCAACAGCCTGGGCTCTTTGACGTTGGCTATCTCGCCGTCGCCCACGGTCTC |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CACGTCGCCCAACAGCCTGGGCTCTTTGACGTTGGCTATCTCGCCGTCGCCCACGGTCTC ATTCAGCGTGTTGCTCAGACTCTTCAAGATGCCCCTCGTCAACCTGCGCGGGCCCGAAAC |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ATTCAGCGTGTTGCTCAGACTCTTCAAGATGCCCCTCGTCAACCTGCGCGGGCCCGAAAC ATCGACAATGTCGTCAACCATGTCGAGCCTGAGGTCCTGGATCCCGCCGACCTTGCGCTC |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ATCGACAATGTCGTCAACCATGTCGAGCCTGAGGTCCTGGATCCCGCCGACCTTGCGCTC CTTGGCCTTGGCGACCAGTCCCTCGAGACCATCTTGGACGGCCATCATCATGTGCGGCGA |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CTTGGCCTTGGCGACCAGTCCCTCGAGACCATCTTGGACGGCCATCATCATGTGCGGCGA TTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACCCACTGGTCCACATCGAA |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| TTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACCCACTGGTCCACATCGAA CTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTGGGAACCCATTCGCTGAT |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTGGGAACCCATTCGCTGAT CGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCGAACTCGTACAGGATCAG |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCGAACTCGTACAGGATCAG GTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCTAATGGCGAGGAAATTGGCGACCAC |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| GTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCTAATGGCGAGGAAATTGGCGACCAC GTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAAGGCGTCGCCTGATTCGTCCGTCAG |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| GTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAAGGCGTCGCCTGATTCGTCCGTCAG GAACTCGACCTTGAAGCCGGGGTTCTTGGACACACAGGAGTCGACGTGGGGCTTGAGGTC |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
GAACTCGACCTTGAAGCCGGGGTTCTTGGACACACAGGAGTCGACGTGGGGCTTGAGGTC GTCCTTCAAGCCTGCAGGCCCGAGTTTGTACCACAGCCTTTGTGGTAGTGCTGCGACGGC |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| GTCCTTCAAGCCTGCAGGCCCGAGTTTGTACCACAGCCTTTGTGGTAGTGCTGCGACGGC AGTCGGCCCCGAGCTCGACGCGGCAGACGTGGTGGTTTCTTGTGCCAGCAGCGGCGCGGG |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| AGTCGGCCCCGAGCTCGACGCGGCAGACGTGGTGGTTTCTTGTGCCAGCAGCGGCGCGGG CTTCATCCGGGGTGTCGCAAAGGTGGGGCCGGCCTTCCATTCCGAAGGCCTGTGGAAATT |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CTTCATCCGGGGTGTCGCAAAGGTGGGGCCGGCCTTCCATTCCGAAGGCCTGTGGAAATT GAGAATGAGGAAGCATATTGTGAGAAAGCTCAAGGCAGCCGGCACTTTGGCTGTCAAACG |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| GAGAATGAGGAAGCATATTGTGAGAAAGCTCAAGGCAGCCGGCACTTTGGCTGTCAAACG ATTGTGAAATGCCAAAATCA |||||||||||||||||||| ATTGTGAAATGCCAAAATCA 980 165
>ref|XM_003711036.1| Magnaporthe oryzae 70-15 initiation-specific alpha-1,6-mannosyltransferase (MGG_08652) mRNA, complete cds
Length=984

 Score = 277 bits (306),  Expect = 1e-70
 Identities = 524/760 (69%), Gaps = 17/760 (2%)
CATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCTC |||| |||||||| || |||||||||||||| || || || ||||| | | |||||| | CATTTTTCCAGGACCCTGCGTAATGGTGCACTACGAGCTTCCGCCCTGGCACCTTGTCCC CATGCTGTTGATTCATCGTGTCCGCAAAGGCGTAGTCGGGCAACACCAGCACGTCGCCCA | | ||||| | || || || | || ||| |||| | | || | ||||| CCCACCACATGTTCATGGAATCTGCGAACGAGTGGTCCGGCAGTATTAAAACATTGCCCA ACAGCCTGGGCTCTTTGACGTTGGCTATCTCGCCGTCGCCCACGGTCTC-ATTCAGCGTG ||||| || || | || ||| |||| || | ||||| |||| || || | GAAGCCTCGGTTCCGTAACATTGACTATGTCATTATTCCCCAC-CTCTCGGTTGAGTGAT TTGCTCAGACTCTTCAAGATGCCCCTCGTCAACCTGCGCGGGCCCGAAACATCGACAATG ||||| || ||||| || || ||||||||||| | ||||||||| | |||| || TTGCTGAGGCTCTTGAAAATCGACCTCGTCAACCGTCTCGGGCCCGACAAGTCGATAACA TCGTCAACCATGTCGAGCCTGAGGTCCTGGATCCCGCCG-ACCTTGCGCTCCTTGGCCTT ||||| | | || | || | || ||| ||| | | || || || ||||| | TCGTCGAGCTGGTTGCGCTTCAGCTCC--GATATTGGGGTTCCAAGC-CT-TTTGGCAGT GGCGACCAGTCCCTCGAGACCATCTTGGACGGCCATCATCATGTGCGGCGATTTTGGTTT ||| ||| | |||| | | |||| ||| ||||| | | |||||| ||| || || GGCCGCCACTTCCTCCAAGCAGTCTTCGACCGCCATTAAGAAATGCGGCTGTTTGGGCTT CGCCATGATAGTCCAACTGGCGAACTGCCGAACCCACTGGTCCACATCGAACTCCAGTCC |||||||| ||||| ||||||| ||| || | |||| |||| |||||||||| || GGCCATGATGGTCCAGCTGGCGATCTGTCGCAACCACCTGTCCGTGTCGAACTCCATCCC AACAACAATTTTGGCTTGGT--CTTTGTACTGCTTGGGAACCCATTCGCTGATCGGTGCC || || | || ||| ||||||||||| ||| | ||||||| ||| ||||| GACGACGGTAGCAGC--GGTAGATTTGTACTGCTCGGGGATCCATTCGTCGATGGGTGCT TCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCGAACTCGTACAGGATCAGGTAGCGG ||||| || |||||||| |||||||| | || || ||||| ||||| ||||||||| TCGCAAGACACGTCCAGATCGTTCCAGATTCCACCCTTTTCGTAGAGGATGAGGTAGCGG AGAAGATCGGCCTTGATGATTGGAATCCTAATGGCGAGGAAATTGGCGACCACGTCAGGG || || |||||||||||||||||||| || |||| |||||| || || | | || AGGAGGTCGGCCTTGATGATTGGAATGCTGATGGGGAGGAACCTGTTGATTATATTTGGA CGCAAGGCGGCAAAGTGTCTCTTGACAAAGGCGTCGCCTGATTCGTCCGTCAGGAACTCG | | | | |||| |||||||||| ||||||| || |||| ||||||||||| TTCCACGAG---TAGTGCTTCTTGACAAACTCGTCGCCCGAGACGTCGGTCAGGAACTCA ACCTTGAAGCCGGGGTTCTTGGACACACAGGAGTCGACGTGGGGCTTGAGGTCGTCCTTC || 
| | |||| || |||||| | || || | | || ||||||| | | | || ACATCGTAGCCTGGATTCTTG---AGGCAAGATTTTATGTATGGCTTGATATTCTTCCTC AAGCCTGCAGGCCCGAGTTTGTACCACAGCCTTTGTGGTA | || ||| | |||| ||| ||||||||| ||| ||||| ACCCCCGCACGTCCGACTTTATACCACAGCTTTTTTGGTA 767 221
71 908 131 848 190 789 250 729 309 673 369 613 429 553 487 495 547 435 607 375 667 318 727 261
>ref|XM_003660234.1| Myceliophthora thermophila ATCC 42464 glycosyltransferase family 32 protein (MYCTH_97899) mRNA, complete cds Length=699 Score = 93.3 bits (102), Expect = 3e-15 Identities = 158/229 (69%), Gaps = 9/229 (4%) Strand=Plus/Minus Query Sbjct Query Sbjct 353 344 409 284 TGCGGCGATTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACCCA----CTG |||||||| |||||||||||||| |||||| | |||| ||| | || | | | TGCGGCGACCCCGGTTTCGCCATGATGGTCCAAATCGCGAGCTGGTGGACAAAAGGCCGG GTCCA-----CATCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTG | ||| ||| |||||| || || || | |||| | ||| | |||||| GGCCAGCCGACATTAAACTCCCAGCCCACGACGACGTTGGTCTCGTCCTCATACTGCGGA 408 285 463 225
GGAACCCATTCGCTGATCGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCG |||| |||||||| || || | ||||||||||||||||||||| | || || || GGAATCCATTCGCCGAAGGGCGTGTCGCACGAGACGTCCAGGTCGCAGTAGACGCCTCCT AACTCGTACAGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAA | | |||| |||||||||||||| ||| | |||| |||||||| TCGGAGAAGAGGAGGAGGTAGCGGAGAAGGTCGACTTTGAGGATTGGAA 572 116
>gb|CP003002.1| Myceliophthora thermophila ATCC 42464 chromosome 1, complete sequence Length=10931058 Features in this part of subject sequence: glycosyltransferase family 32 protein Score = 93.3 bits (102), Expect = 3e-15 Identities = 158/229 (69%), Gaps = 9/229 (4%) Strand=Plus/Plus Query Sbjct Query Sbjct Query Sbjct Query Sbjct 353 8013281 409 8013341 464 8013401 524 8013461 TGCGGCGATTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACCCA----CTG |||||||| |||||||||||||| |||||| | |||| ||| | || | | | TGCGGCGACCCCGGTTTCGCCATGATGGTCCAAATCGCGAGCTGGTGGACAAAAGGCCGG GTCCA-----CATCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTG | ||| ||| |||||| || || || | |||| | ||| | |||||| GGCCAGCCGACATTAAACTCCCAGCCCACGACGACGTTGGTCTCGTCCTCATACTGCGGA GGAACCCATTCGCTGATCGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCG |||| |||||||| || || | ||||||||||||||||||||| | || || || GGAATCCATTCGCCGAAGGGCGTGTCGCACGAGACGTCCAGGTCGCAGTAGACGCCTCCT AACTCGTACAGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAA | | |||| |||||||||||||| ||| | |||| |||||||| TCGGAGAAGAGGAGGAGGTAGCGGAGAAGGTCGACTTTGAGGATTGGAA 572 8013509 408 8013340 463 8013400 523 8013460
>ref|XM_001935551.1| Pyrenophora tritici-repentis Pt-1C-BFP alpha-1,6mannosyltransferase Och1, mRNA Length=846 Score = 87.8 bits (96), Expect = 1e-13 Identities = 217/327 (66%), Gaps = 12/327 (4%) Strand=Plus/Minus Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct 346 585 402 525 457 465 517 405 577 345 637 288 CATCATGTGCGGCGATTTTGGTTTCGCCATGATAGTCCAACTGGCGAA----CTGCCGAA |||||| || || || |||||| |||||||||||||| || ||||| | | || | CATCATATGTGGGGACCGTGGTTTAGCCATGATAGTCCAGCTAGCGAATTGTCGGACGTA CCCACTGGTCCA-----CATCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTA | ||| || ||| ||||||||| || ||||| | ||| | | | | ||| CACACCGGGCCAGCCTTGGTCGAACTCCCACCCTACAACGAGCGAGGCGTTGGCCTGGTA CTGCTTGGGAACCCATTCGCTGATCGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACAC | || |||||| || || |||| |||||| || ||||| || ||| ||| | TTCAGACGGCACCCATGTGCCAATAGGTGTCTCGCAGGATACGTCTAGATCGGACCATAT CCCGCCGAACTCGTACAGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCT |||||| || | |||| |||||||| || | ||| || ||| ||| || || ACCGCCGCGGTCCCAGAGGAGGAGGTAGCGCAGGAAATCTGCTTTGTAGATGGGGATGGG AATGGCGAGGAAATTGGCGACCACGTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAA | ||||||| | || ||||| | ||| ||||| | ||| |||| | | ||| | GAGGGCGAGGTAGTTTGCGACGATGTCCGGGCGGGAAGCG--AAAG-CTGTACGGACGTA GGCGTCGCCTGATTCGTCCGTCAGGAA | |||||| |||||| |||| ||| GTCGTCGCTGCTTTCGTCGGTCATGAA 663 262 401 526 456 466 516 406 576 346 636 289
TCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTGGGAACCCATTCG ||||||||| ||| || |||| |||| | ||||||| ||| || ||||||| | TCGAACTCCCATCCCACGACAACACTGGCGTTTGCTTTGTATCGCTCCGGGACCCATTGG CTGATCGGTGCC---TCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCGAACTCGTAC || ||| | || || |||||||| |||||| | || || || ||| | TCCATGGGTACTCCTTCACAGGAGACGTCGAGGTCGGCGTAGACGCCACCCTGGTCGAAG AGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCTAATGGCGAGGAAATTG |||| |||||||||||| | ||||| || | ||| |||||| || |||| | || AGGAGCAGGTAGCGGAGCATGTCGGCTTTCAGGATGGGAATCGGAAGACCGAGATAGTTC GCGACCACGTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAAGGCGTCGCCTGATTCG |||| | || || |||| || | | | |||| || | ||||| | | |||| TCGACGATATCCGGACGCATCACG--TATGCCT-TCTTTACGTATTCGTCGGCAGTTTCG TCCGTCAGGAA ||||||| ||| TCCGTCATGAA 663 319
>ref|XM_003300282.1| Pyrenophora teres f. teres 0-1 hypothetical protein, mRNA Length=939 Score = 64.4 bits (70), Expect = 2e-06 Identities = 211/327 (65%), Gaps = 12/327 (4%) Strand=Plus/Minus Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct Query Sbjct 346 582 404 522 457 462 517 402 577 342 637 285 CATCATGTGCGGCGATTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACC-||||||||| || || ||||| || | ||||||||| || || ||||| ||||| CATCATGTGTGGGGACCGGGGTTTGGCTAGGATAGTCCAGCTAGCAAACTGACGAACGTA --CACTGGTCCACA-----TCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTA ||| || ||| ||||| ||| || || || | || | | | | ||| GACACCGGGCCAGCCTTGGTCGAATTCCCAGCCTACCACCAGCGACGCGTTGGCCTGGTA CTGCTTGGGAACCCATTCGCTGATCGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACAC | || ||||| | ||| || ||||||| || ||||| || ||| ||| | TTCGGGCGGCACCCACGAGTCGATGGGCACCTCGCAGGATACGTCTAGATCGGACCATAT CCCGCCGAACTCGTACAGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCT ||||| || | |||| ||||||||||| | |||||| || ||| || | GCCGCCTTGGTCCCAGAGGAGGAGGTAGCGGAGGAAATCGGCTTTATAGATGGGGACGGG AATGGCGAGGAAATTGGCGACCACGTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAA | ||||||| | |||| ||| | ||| ||||| || |||||| | | ||| | GAGGGCGAGGTAGTTGGAGACGATGTCGGGGCGGAA---TGCAAAGGCTGTACGGACGTA GGCGTCGCCTGATTCGTCCGTCAGGAA | |||||| ||||||||||| ||| GCCGTCGCTGCTTTCGTCCGTCATGAA 663 259 403 523 456 463 516 403 576 343 636 286
  Database: Nucleotide collection (nt)
    Posted date:  Jan 12, 2013  4:14 PM
  Number of letters in database: 43,890,479,962
  Number of sequences in database:  17,084,706

Lambda      K        H
    0.634    0.408    0.912

Gapped
Lambda      K        H
    0.625    0.410    0.780

Matrix: blastn matrix:2 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Sequences: 17084706
RESULT: A BioPerl program that uses RemoteBlast to find homologous sequences for a query sequence in a biological sequence database was written and executed successfully.
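The Lambda and K values at the end of the BLAST report connect each raw score (the number in parentheses after the bit score, e.g. "1768 bits (1960)") to its bit score through the standard Karlin-Altschul normalisation S' = (lambda * S - ln K) / ln 2. The sketch below, with a hypothetical helper `bit_score`, plugs in the gapped parameters from this run (lambda = 0.625, K = 0.410) to reproduce the reported bit scores; formatting the result to one decimal place is an assumption about how the report displays small scores.

```perl
use strict;
use warnings;

# Karlin-Altschul normalisation of a raw alignment score into bits:
#   S' = (lambda * S - ln K) / ln 2
sub bit_score {
    my ($raw, $lambda, $k) = @_;
    return ($lambda * $raw - log($k)) / log(2);
}

# Gapped Lambda and K as printed at the end of the blastn report.
my ($lambda, $k) = (0.625, 0.410);

# Raw scores taken from the "Score = ... bits (raw)" lines of the report.
for my $raw (1960, 306, 102, 96, 70) {
    printf "raw %4d -> %.1f bits\n", $raw, bit_score($raw, $lambda, $k);
}
```

The computed values agree with the report: 93.3, 87.8 and 64.4 bits for the smaller hits, while for the two large scores the report shows whole numbers (1768 and 277).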
PRACTICAL: 05
AIM:
To write a BioPerl program to predict the secondary structure of a protein sequence.
SOFTWARE USED: Perl 5.16.2, BioPerl 1.6.1
SOURCE CODE:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::Tools::Analysis::Protein::Sopma;

system($^O eq 'MSWin32' ? 'cls' : 'clear');
print "Protein Secondary Structure Prediction (SOPMA):-";
print "\n----------------------------------------------\n";
print "\nEnter your query sequence:\n";
chomp(my $query = <STDIN>);
$query =~ s/\s+//g;    # a trailing newline or spaces would make the sequence invalid

my $seqs = Bio::PrimarySeq->new(-seq => $query, -alphabet => 'protein');
my $tool = Bio::Tools::Analysis::Protein::Sopma->new(-seq => $seqs, -window_width => 15);
$tool->run();
my $raw = $tool->result('');                   # raw SOPMA output
my @fts = $tool->result('Bio::SeqFeatureI');   # one feature per predicted region

print "\n Predicted Regions are below:\n";
for my $ft (@fts) {
    print "From ", $ft->start, " to ", $ft->end,
          " struc: ", ($ft->each_tag_value('type'))[0], "\n";
}
<STDIN>;
INPUT/OUTPUT:
Protein Secondary Structure Prediction (SOPMA):-
----------------------------------------------

Enter your query sequence:
EHIMELLIMVDALKRASAKTINIVIPYYGYARQDRKARSREPITAKLFANLLETAGATRVIALDLHAPQI

 Predicted Regions are below:
From 1 to 20 struc: H
From 43 to 54 struc: H
From 55 to 56 struc: T
From 25 to 42 struc: C
From 21 to 24 struc: E
From 59 to 64 struc: E
RESULT: A BioPerl program to predict the secondary structure of a protein sequence was written and executed successfully.
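Secondary-structure predictions are often read as one state letter per residue rather than as a list of regions. A minimal sketch, assuming the regions are available as (start, end, type) triples like the printed ones and that residues not covered by any region are shown as `-`; `regions_to_string` is a hypothetical helper, and the region list below is copied from the recorded run.

```perl
use strict;
use warnings;

# Turn (start, end, type) regions (1-based, inclusive) into one letter per residue.
sub regions_to_string {
    my ($length, @regions) = @_;
    my @state = ('-') x $length;    # '-' = no region reported for this residue
    for my $r (@regions) {
        my ($start, $end, $type) = @$r;
        @state[$start - 1 .. $end - 1] = ($type) x ($end - $start + 1);
    }
    return join '', @state;
}

my $query   = 'EHIMELLIMVDALKRASAKTINIVIPYYGYARQDRKARSREPITAKLFANLLETAGATRVIALDLHAPQI';
my @regions = ([1, 20, 'H'], [43, 54, 'H'], [55, 56, 'T'],
               [25, 42, 'C'], [21, 24, 'E'], [59, 64, 'E']);
print "$query\n", regions_to_string(length $query, @regions), "\n";
```

Printed under the query, the states line up residue by residue; positions 57-58 and 65-70 stay as `-` because no region was printed for them, and the sketch does not guess a state there.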
PRACTICAL: 06
AIM:
To write an R program to align a pair of sequences using the Needleman-Wunsch algorithm.
SOFTWARE USED: R 2.15.2, Biostrings 2.6.6 (string objects representing biological sequences, and matching algorithms, for R), Seqinr (for read.fasta(), c2s() and s2c())
SOURCE CODE:
library("seqinr")
library("Biostrings")

# Read the two protein sequences (UniProt Q9CD83 and A0PQ23) from FASTA files
leprae   <- read.fasta(file = "E:/R Practical/Q9CD83.fasta")
ulcerans <- read.fasta(file = "E:/R Practical/A0PQ23.fasta")
lepraeseq   <- leprae[[1]]
ulceransseq <- ulcerans[[1]]

# Convert the character vectors into upper-case strings
lepraeseqstring   <- toupper(c2s(lepraeseq))
ulceransseqstring <- toupper(c2s(ulceransseq))

# Global (Needleman-Wunsch) alignment, the default type of pairwiseAlignment()
globalAlignLepraeUlcerans <- pairwiseAlignment(lepraeseqstring, ulceransseqstring,
    substitutionMatrix = "BLOSUM50", gapOpening = -2, gapExtension = -8,
    scoreOnly = FALSE)

printPairwiseAlignment <- function(alignment, chunksize = 60, returnlist = FALSE) {
    require(Biostrings)              # countPattern() comes from Biostrings
    seq1aln <- pattern(alignment)    # aligned first sequence
    seq2aln <- subject(alignment)    # aligned second sequence
    alnlen  <- nchar(seq1aln)        # number of columns in the alignment
    starts  <- seq(1, alnlen, by = chunksize)
    seq1alnresidues <- 0
    seq2alnresidues <- 0
    for (i in seq_along(starts)) {
        chunkseq1aln <- substring(seq1aln, starts[i], starts[i] + chunksize - 1)
        chunkseq2aln <- substring(seq2aln, starts[i], starts[i] + chunksize - 1)
        # Residues printed so far: columns in this chunk minus its gaps
        # (nchar() rather than chunksize, so the shorter last chunk is counted correctly)
        seq1alnresidues <- seq1alnresidues + nchar(chunkseq1aln) - countPattern("-", chunkseq1aln)
        seq2alnresidues <- seq2alnresidues + nchar(chunkseq2aln) - countPattern("-", chunkseq2aln)
        if (!returnlist) {
            print(paste(chunkseq1aln, seq1alnresidues))
            print(paste(chunkseq2aln, seq2alnresidues))
            print(paste(' '))
        }
    }
    if (returnlist) {
        vector1 <- s2c(substring(seq1aln, 1, nchar(seq1aln)))
        vector2 <- s2c(substring(seq2aln, 1, nchar(seq2aln)))
        return(list(vector1, vector2))
    }
}

printPairwiseAlignment(globalAlignLepraeUlcerans, 60)

OUTPUT:
[1] "MT-----NR--T---LSREEIRKLDRDLRILVATNGTLTRVLNVVANEEIVVDIINQQLL 50"
[1] "MLAVLPEKREMTECHLSDEEIRKLNRDLRILIATNGTLTRILNVLANDEIVVEIVKQQIQ 60"
[1] " "
[1] "DVAPKIPELENLKIGRILQRDILLKGQKSGILFVAAESLIVIDLLPTAITTYLTKTHHPI 110"
[1] "DAAPEMDGCDHSSIGRVLRRDIVLKGRRSGIPFVAAESFIAIDLLPPEIVASLLETHRPI 120"
[1] " "
[1] "GEIMAASRIETYKEDAQVWIGDLPCWLADYGYWDLPKRAVGRRYRIIAGGQPVIITTEYF 170"
[1] "GEVMAASCIETFKEEAKVWAGESPAWLELDRRRNLPPKVVGRQYRVIAEGRPVIIITEYF 180"
[1] " "
[1] "LRSVFQDTPREELDRCQYSNDIDTRSGDRFVLHGRVFKN 209"
[1] "LRSVFEDNSREEPIRHQRS--VGT-SA-R---SGRSICT 212"
[1] " "
RESULT: An R program to align a pair of sequences using the Needleman-Wunsch algorithm was written and executed successfully.
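`pairwiseAlignment()` runs the dynamic programming internally. To show what the Needleman-Wunsch recurrence does, here is a minimal sketch in Perl, the language of the earlier practicals, with deliberate simplifications: a fixed match/mismatch score instead of BLOSUM50, and a linear gap penalty instead of separate opening and extension costs. `nw_align` is a hypothetical name.

```perl
use strict;
use warnings;

# Global alignment by Needleman-Wunsch with a linear gap penalty.
# Returns (score, aligned first sequence, aligned second sequence).
sub nw_align {
    my ($s1, $s2, $match, $mismatch, $gap) = @_;
    my ($n, $m) = (length $s1, length $s2);
    my (@F, @T);    # @F: best scores; @T: traceback ('D' diagonal, 'U' up, 'L' left)
    for my $i (0 .. $n) { $F[$i][0] = $i * $gap; $T[$i][0] = 'U'; }
    for my $j (0 .. $m) { $F[0][$j] = $j * $gap; $T[0][$j] = 'L'; }
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $pair = substr($s1, $i - 1, 1) eq substr($s2, $j - 1, 1) ? $match : $mismatch;
            my $diag = $F[$i - 1][$j - 1] + $pair;
            my $up   = $F[$i - 1][$j] + $gap;    # residue of $s1 against a gap
            my $left = $F[$i][$j - 1] + $gap;    # residue of $s2 against a gap
            if ($diag >= $up && $diag >= $left) { ($F[$i][$j], $T[$i][$j]) = ($diag, 'D') }
            elsif ($up >= $left)                { ($F[$i][$j], $T[$i][$j]) = ($up,   'U') }
            else                                { ($F[$i][$j], $T[$i][$j]) = ($left, 'L') }
        }
    }
    # Trace back from the bottom-right cell, building both rows in reverse.
    my ($a1, $a2) = ('', '');
    my ($i, $j) = ($n, $m);
    while ($i > 0 || $j > 0) {
        my $t = $T[$i][$j];
        if    ($t eq 'D') { $a1 .= substr($s1, --$i, 1); $a2 .= substr($s2, --$j, 1); }
        elsif ($t eq 'U') { $a1 .= substr($s1, --$i, 1); $a2 .= '-'; }
        else              { $a1 .= '-';                  $a2 .= substr($s2, --$j, 1); }
    }
    return ($F[$n][$m], scalar reverse($a1), scalar reverse($a2));
}

my ($score, $x, $y) = nw_align('ACGT', 'AGT', 1, -1, -1);
print "score $score\n$x\n$y\n";    # score 2: ACGT over A-GT
```

Supporting BLOSUM50 would mean replacing `$pair` with a matrix lookup, and affine gaps would need two extra matrices (Gotoh's extension), which is what Biostrings handles for the practical.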
PRACTICAL: 07
AIM:
To write an R program to display a dot plot of a pair of sequences.
SOFTWARE USED: R 2.15.2, Seqinr 3.0-7 (Biological Sequences Retrieval and Analysis package for R)
SOURCE CODE (offline):
library("seqinr")

# Read the two protein sequences from local FASTA files
leprae   <- read.fasta(file = "C:/Users/Ashok Kumar/Desktop/Q9CD83.fasta")
ulcerans <- read.fasta(file = "C:/Users/Ashok Kumar/Desktop/A0PQ23.fasta")
lepraeseq   <- leprae[[1]]
ulceransseq <- ulcerans[[1]]

# Draw the dot plot of the two sequences
dotPlot(lepraeseq, ulceransseq)
OUTPUT:
RESULT: An R program to display a dot plot of a pair of sequences was written and executed successfully.
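With a window size of 1 (which, assuming seqinr's defaults `wsize = 1` and `nmatch = 1`, is what `dotPlot()` uses when called as above), a dot is drawn wherever residue i of the first sequence equals residue j of the second. A text-only sketch of the same idea in Perl follows; `dot_plot` is a hypothetical helper that supports only that window size of 1, and the two short strings are toy inputs.

```perl
use strict;
use warnings;

# One row per residue of $s1, one column per residue of $s2;
# '*' where the residues are identical, '.' elsewhere (window size 1).
sub dot_plot {
    my ($s1, $s2) = @_;
    my @rows;
    for my $c1 (split //, $s1) {
        push @rows, join '', map { $_ eq $c1 ? '*' : '.' } split //, $s2;
    }
    return @rows;
}

print "$_\n" for dot_plot('ACGTAC', 'ACGGAC');
```

Runs of `*` along a diagonal mark stretches where the two sequences agree; a larger window with a match threshold, as dotPlot() allows, suppresses the isolated single-residue matches that dominate protein plots.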
PRACTICAL: 08
AIM:
To write an R program to convert a file in GenBank format to FASTA format.
SOFTWARE USED: R 2.15.2, Seqinr 3.0-7 (Biological Sequences Retrieval and Analysis package for R)
SOURCE CODE:
library("seqinr")

# Convert the GenBank flat file to a FASTA file
gb2fasta(source.file = "E:/R Practical/AF060490.gb",
         destination.file = "E:/R Practical/AF060490.fasta")
INPUT:
File Name: AF060490.gb
LOCUS       AF060490                2693 bp    mRNA    linear   ROD 02-MAY-2000
DEFINITION  Mus musculus TLS-associated protein TASR-2 mRNA, complete cds.
ACCESSION   AF060490
VERSION     AF060490.1  GI:3327956
KEYWORDS    .
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus; Mus.
REFERENCE   1  (bases 1 to 2693)
  AUTHORS   Yang,L., Embree,L.J. and Hickstein,D.D.
  TITLE     TLS-ERG leukemia fusion protein inhibits RNA splicing mediated by
            serine-arginine proteins
  JOURNAL   Mol. Cell. Biol. 20 (10), 3345-3354 (2000)
   PUBMED   10779324
REFERENCE   2  (bases 1 to 2693)
  AUTHORS   Yang,L., Embree,L., Tsai,S. and Hickstein,D.D.
  TITLE     Molecular cloning of TASR-2, a TLS-associated protein with Ser-Arg
            repeats
  JOURNAL   Unpublished
REFERENCE   3  (bases 1 to 2693)
  AUTHORS   Yang,L., Embree,L., Tsai,S. and Hickstein,D.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-APR-1998) Medicine/Oncology, University of
            Washington, 1660 S. Columbian Way, GMR 151, Seattle, WA 98108, USA
FEATURES             Location/Qualifiers
     source          1..2693
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /cell_line="EML"
                     /cell_type="hematopoietic"
                     /organism="Mus musculus"
     CDS             92..880
                     /db_xref="GI:3327957"
                     /codon_start=1
                     /protein_id="AAC26715.1"
                     /translation="MSRYLRPPNTSLFVRNVADDTRSEDLRREFGRYGPIVDVYVPLD
                     FYTRRPRGFAYVQFEDVRDAEDALHNLDRKWICGRQIEIQFAQGDRKTPNQMKAKEGR
                     NVYSSSRYDDYDRYRRSRSRSYERRRSRSRSFDYNYRRSYSPRNSRPTGRPRRSRSHS
                     DNDRFKHRNRSFSRSKSNSRSRSKSQPKKEMKAKSRSRSASHTKTRGTSKTDSKTHYK
                     SGSRYEKESRKKEPPRSKSQSRSQSRSRSKSRSRSWTSPKSSGH"
                     /product="TLS-associated protein TASR-2"
                     /note="contains Ser-Arg (SR) repeats"
ORIGIN
OUTPUT:
File Name: AF060490.fasta
>AF060490 2693 bp
gtgtggtgtgagtggatgtgagccgccgccggagctgcggacggtttgcccgagcccgtt
agcgccgccggcccagagtcccgccgccaccatgtcccgatacctgcgcccccctaacac
gtctctgttcgtcaggaacgtggcggacgacaccaggtctgaagatttacgtcgggaatt
tggtcgttatggtccaatagtagatgtttatgtcccacttgatttctacactcggcgtcc
aagaggatttgcatatgttcaatttgaggatgttcgtgatgctgaagacgctttacataa
tttggacagaaaatggatttgtgggcgtcagattgaaatccagttcgcacagggggatcg
gaagacaccaaatcaaatgaaagccaaggaagggaggaatgtatacagctcttcacgata
tgacgattatgaccgatatagacgctctcgaagccggagttatgaaaggagaagatcgag
gagtcgctcctttgattataactataggagatcttacagtcctagaaacagtagaccgac
tggaagaccacggcgtagccgaagccattccgacaatgatagattcaaacaccgaaatcg
atctttttcaagatctaaatccaattcaagatcacggtccaagtcccagcccaagaaaga
aatgaaggctaaatcacgttctaggtctgcatctcacaccaaaactagaggcacctctaa
aacagattccaaaacacattataagtctggctcaagatatgaaaaggaatcaaggaaaaa
agaaccacctagatccaaatctcagtcaagatcacagtctaggtctaggtcaaaatctag
gtcaaggtcttggactagtcccaagtccagtggccactgatagtataaattatgatactt
ctaggcatgtatcattcatttactcatagtttggtatacttaaattatcaggaatacaat
gttgcaatgatgcgttttaaaaacaaacaaacttaacttgttagttttccctgtactggg
caatggttataattaaaaagatgcgctgttgagaagccactcttaagagtccagtttgtt
taatgttatgggcagctaccaatttgtggtgtctctgtatatttttgtaaagattctcat
tttttatgcttgaagtatttggtgaaaagatgttggttgaccataatttgcaacattgtc
ttattagaaataaattttcatatccatatttggtagaactgttaacctagaaatgtagct
tgctaataagatagaatgatacagaagtgaagtggtagccacattacaacactgactgct
cagacacatttaggttcagggtggactttatgtcttgtcaagatgtctaagcccatgatg
attatttatgatgcaatgtggaatagttcttttgttaaatccaccatctggggattgatg
ccaactgggttaaatagcgttttcagggagagtgcccttttcactgaaacatggagcctt
cactgctttccccacctcaatccctgctggtttctaagatatggaacattaaagcataag
ggaaaaccctcccccttaagttgtgagtgagtcagtgatcacagaaaccattgtaagggg
aaaagactgttcttagcatagttgctctaaatttaactattgttgatcattgttatttag
gggttttgttttgttgtttgttttttctgttagaaacaagtgaactgtttgaaaatacat
ttttgtttgtttatatgcatagtgtaaaacaaactgaattttgatgctcacagcacttac
catgtgcgtttgtatcaaaatctgcctgttcttcataggggaggcttgctcttcacacct
cagtttattcatgtgagacaggctgagaagataacactcctaggtgattttgtggtgccg
tggatttttggggaaagttgagttttaagcaaaagccacatcacttagtttttggtaatg
taggacatgactaaaaaataacgaaatgatacccttaaatatttataatttctagtattt
caagattgttttggaggcaataaaatgacttgaaatgtccggtgtcatttcagaatacaa
agctagtgtctctaagatcttagattcgttgcttacagatgtgagtgaagatactgtggg
ggacgatcctcctggaggattaccttatttttttcctttcgattttgtttttagaaattt
agtccttgcttgtagacaacaaaagatggttttaagaactgtttgtggaatgtgtttgga
gggttaattctagaacctttgtatatttaatagtatttctaacttttatttctttactgt
ttgcagttaatgttcttgttctgctatgcaatcatttatatgcacgtttctttaattttt
ttagattttcctggatgtatagtttaaacaaagtctatttaaaactgtagcggtagtttg
cagttctagcaaagaggaaagttgtggggttaaactttgtattttctttcttatagaagc
ttctaaaaaggtatttttatatgttctttttaacaaatattgtgtacaacctttaaaaca
tcaatgtttggatcaaaacaagacccagcttattttctgcttgctgtaaattaagcaaag
atgctataataaaaacaaaatgaaggaaaaaaaaaaaaaaaaaaaaaaaaaaa
RESULT: A program in R was written to convert a GenBank-format file to FASTA format, and it executed successfully.
PRACTICAL: 09
/ / 201 AIM:
To write an R program to compute t-test values (one-sample, two-sample and paired) and draw conclusions about the hypotheses.
SOFTWARE USED: R 2.15.2
PROBLEM/SOURCE CODE:
1. One-sample t-test
Problem: An outbreak of Salmonella-related illness was attributed to ice cream produced at a certain factory. Scientists measured the level of Salmonella in 9 randomly sampled batches of ice cream. The levels (in MPN/g) were:
0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418
Is there evidence that the mean level of Salmonella in the ice cream is greater than 0.3 MPN/g?
Source Code:
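# Salmonella levels (MPN/g) in the nine sampled batches
x <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
# H0: mean = 0.3 versus H1: mean > 0.3 (one-sided, upper tail)
t.test(x, alternative = "greater", mu = 0.3)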
Output:
One Sample t-test

data:  x
t = 2.2051, df = 8, p-value = 0.02927
alternative hypothesis: true mean is greater than 0.3
95 percent confidence interval:
 0.3245133       Inf
sample estimates:
mean of x
0.4564444
Conclusion: The p-value is 0.029, which is below the 0.05 significance level, so we reject the null hypothesis that the mean is 0.3 MPN/g. There is moderately strong evidence that the mean Salmonella level in the ice cream is above 0.3 MPN/g.
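The one-sample t statistic is t = (mean(x) - mu0) / (sd(x) / sqrt(n)), with n - 1 degrees of freedom. The sketch below recomputes the statistic, the upper-tail p-value and the one-sided 95% lower confidence bound by hand, so they can be compared with the t.test() output above. The variable names are illustrative only.

```r
# Salmonella levels (MPN/g) from the nine sampled batches
x <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3                          # mean under the null hypothesis
n <- length(x)
dof <- n - 1                        # degrees of freedom
se <- sd(x) / sqrt(n)               # standard error of the mean
t_stat <- (mean(x) - mu0) / se      # one-sample t statistic
# Upper-tail p-value, because H1 is "mean > 0.3"
p_value <- pt(t_stat, dof, lower.tail = FALSE)
# One-sided 95% lower confidence bound for the mean
lower_bound <- mean(x) - qt(0.95, dof) * se
round(c(t = t_stat, df = dof, p = p_value, lower = lower_bound), 4)
```

The hand-computed values should agree with the t, p-value and confidence bound that t.test() prints.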
2. Two-sample t-test
Problem: Six subjects were given a drug (treatment group) and an additional 6 subjects a placebo (control group). Their reaction time to a stimulus was measured (in ms). We want to perform a two-sample t-test to compare the means of the treatment and control groups.
Control (x)   Treatment (y)
91            101
87            110
99            103
77            93
88            99
91            104
Source Code:
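Control <- c(91, 87, 99, 77, 88, 91)
Treat <- c(101, 110, 103, 93, 99, 104)
# H1: the mean reaction time of Control is less than that of Treat
# Pooled (Student) t-test, assuming equal variances
t.test(Control, Treat, alternative = "less", var.equal = TRUE)
# Welch t-test (the R default), without assuming equal variances
t.test(Control, Treat, alternative = "less")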
Output:
Two Sample t-test

data:  Control and Treat
t = -3.4456, df = 10, p-value = 0.003136
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -6.082744
sample estimates:
mean of x mean of y
 88.83333 101.66667
Welch Two Sample t-test

data:  Control and Treat
t = -3.4456, df = 9.48, p-value = 0.003391
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -6.044949
sample estimates:
mean of x mean of y
 88.83333 101.66667
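Conclusion: The pooled t-test and the Welch t-test give nearly the same results (p-value = 0.0031 and 0.0034, respectively). Both p-values are well below 0.05, so we reject the null hypothesis: the mean reaction time of the control group is significantly lower than that of the treatment group.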
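Both outputs report the same t statistic because, with equal group sizes (6 and 6), the pooled standard error sqrt(sp^2 * (1/n1 + 1/n2)) equals the Welch standard error sqrt(s1^2/n1 + s2^2/n2). Only the degrees of freedom differ: 10 for the pooled test versus the Welch-Satterthwaite approximation. The sketch below computes both tests by hand. The variable names are illustrative.

```r
Control <- c(91, 87, 99, 77, 88, 91)
Treat <- c(101, 110, 103, 93, 99, 104)
n1 <- length(Control); n2 <- length(Treat)
v1 <- var(Control); v2 <- var(Treat)
diff_means <- mean(Control) - mean(Treat)

# Pooled (Student) test: one common variance estimate, df = n1 + n2 - 2
sp2 <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled <- diff_means / sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2

# Welch test: separate variances, Welch-Satterthwaite degrees of freedom
se2_1 <- v1 / n1; se2_2 <- v2 / n2
t_welch <- diff_means / sqrt(se2_1 + se2_2)
df_welch <- (se2_1 + se2_2)^2 / (se2_1^2 / (n1 - 1) + se2_2^2 / (n2 - 1))

# One-sided (lower-tail) p-values for H1: mean(Control) < mean(Treat)
p_pooled <- pt(t_pooled, df_pooled)
p_welch <- pt(t_welch, df_welch)
round(c(t_pooled = t_pooled, t_welch = t_welch, df_welch = df_welch,
        p_pooled = p_pooled, p_welch = p_welch), 4)
```

With unequal group sizes, the two statistics would generally differ.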
3. Paired t-test
Problem: A study was performed to test whether cars get better mileage on premium gas than on regular gas. Each of 10 cars was first filled with either regular or premium gas, decided by a coin toss, and the mileage for that tank was recorded. The mileage was recorded again for the same cars using the other kind of gasoline. We use a paired t-test to determine whether cars get significantly better mileage with premium gas.
Regular (x)   Premium (y)
16            19
20            22
21            24
22            24
23            25
22            25
27            26
25            26
27            28
28            32
Source Code:
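# Mileage of the same 10 cars on regular and on premium gas
reg <- c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
prem <- c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)
# Paired test, H1: the mean of (prem - reg) is greater than 0
t.test(prem, reg, alternative = "greater", paired = TRUE)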
Output:
Paired t-test

data:  prem and reg
t = 4.4721, df = 9, p-value = 0.0007749
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 1.180207      Inf
sample estimates:
mean of the differences
                      2
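A paired t-test is equivalent to a one-sample t-test on the per-car differences (premium minus regular). This is why the output reports a single "mean of the differences". The short check below computes the statistic by hand and compares the paired call with a one-sample call on the differences. The name d is illustrative.

```r
reg <- c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
prem <- c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)
d <- prem - reg                     # mileage gain of each car on premium gas
# Manual paired statistic: mean difference divided by its standard error
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))
paired <- t.test(prem, reg, alternative = "greater", paired = TRUE)
one_sample <- t.test(d, alternative = "greater", mu = 0)  # same test on d
c(manual = t_manual, paired = unname(paired$statistic),
  one_sample = unname(one_sample$statistic))
```

All three values should equal the t = 4.4721 reported above.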
Conclusion: The t-statistic is 4.47 and the p-value is 0.00077. Since the p-value is very low, we reject the null hypothesis. There is strong evidence that mean gas mileage increases when switching from regular to premium gasoline.
RESULT: A program in R was written to compute t-test values and draw conclusions about the hypotheses, and it executed successfully.
PRACTICAL: 10
/ / 201 AIM:
To write an R program to download a nucleotide/protein sequence from a biological sequence database.
SOFTWARE USED: R 2.15.2; seqinr 3.0-7 (Biological Sequences Retrieval and Analysis package for R)
SOURCE CODE:
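library("seqinr")
choosebank("swissprot")                # open the Swiss-Prot bank on the remote ACNUC server
query("seq_id", "AC=Q9CD82")           # search by accession; the hits are stored in seq_id
seqs <- getSequence(seq_id$req[[1]])   # retrieve the sequence of the first hit
closebank()                            # close the connection to the bank
# Save the sequence in FASTA format (a space needs no backslash inside an R string)
write.fasta(names = "Q9CD82", sequences = seqs,
            file.out = "E:/R Practical/Q9CD82.fasta")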
INPUT/OUTPUT:
>Q9CD82
MRSENLAALLARQAAEAGWYDKPAYFAPDVVTHGQIHDGAVRLGEVLRNRGLSAGDRVLL
CLPDSPDLVQLLLACLARGIMAFLANPELHRDDYAFPERDTAAALVITNGSLRDRFQSSN
VVEPAELLSDATRVEPSDYEPVSGDAYAFATYTSGTTGKPKAAIHRHADPFTFVDAMCRK
ALRLTPQDIGLCSARMYFAYGLGNSVWFPLATGGSAVISSVPVSAESAAMLSTRFEPSVL
YGVPSFFARVVGACSPDSFRSLRCVVTAGEALEPALAERLVEFFGGIPILDGIGSSEVGQ
TFVSNSVDDWRVGTLGKVLPPYEIRVVAPDGATAGSGIEGNLWVRGPSIAQSYWNRPDSL
LENGDWLNTRDRVRIDGDGWVTYGCRADDTEIVGGVNINPREVERLIIEADAVAEAAVVG
VREFTGASTLQAFLVPAVGAFIDESVMRDVHRRLLTQLTAFKVPHRFAIIERLPRSTNGK
LLRNVLRAQSPTKPIWELSLTESQSATKAQLDGRPASNAHAQAAVGHAAGATLKQRLSAL
QQERERLVVEAVCAEAVKMLGESDPGLINRDLAFSDLGFDSQMTVTLCNRLAVVTGLRLP
ETVGWDYGSISGLSRYLEAELSGVRSRPETPLSANSGAKGLSPIDEELKKVEEMVVAIGA
SEKQRVADRLRALLGIIVDGEAGLSKRIQAASTPDEIFQLIDSELCE
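To check the saved file, it can be parsed back into R. The following is a minimal base-R FASTA reader. read_fasta_sketch() is a hypothetical helper written for illustration, not part of seqinr; seqinr's own read.fasta() is the usual choice. The sketch assumes each header line starts with ">" and that the lines after it, up to the next header, hold the sequence.

```r
# Hypothetical helper (not part of seqinr): parse a FASTA file into a named
# character vector, with one element per record and sequence lines joined.
read_fasta_sketch <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]             # ignore blank lines
  is_header <- grepl("^>", lines)
  record_id <- cumsum(is_header)            # record index of every line
  record_names <- sub("^>", "", lines[is_header])
  seqs <- vapply(seq_along(record_names), function(i) {
    paste(lines[record_id == i & !is_header], collapse = "")
  }, character(1))
  names(seqs) <- record_names
  seqs
}

# Usage on a small temporary file
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">Q1 test", "MRSEN", "LAALL", ">Q2", "ACDE"), tmp)
read_fasta_sketch(tmp)
```

Applied to the downloaded file, nchar(read_fasta_sketch("E:/R Practical/Q9CD82.fasta")) would give the length of the retrieved protein.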
RESULT: A program in R was written to download a nucleotide/protein sequence from a biological sequence database, and it executed successfully.