Application in Establishing Epidemiology and Variability: Genome & Protein "Sequence Analysis Programs"
RAJESH KUMAR
Ph.D 1st yr
Dairy Microbiology Division
N.D.R.I
Introduction
Bio-informatics/Computational Biology:-
Proteomics:- Large-scale study of proteins.
Genomics:- study of an organism's genome and the use of its genes.
Comparative Genomics:- comparison of genomes.
Structural Genomics:- determination of the three-dimensional structure of all proteins of a given organism.
Major Research efforts of Bio-informatics:-
Sequence analysis / alignment.
gene finding.
genome assembly.
protein structure alignment.
protein structure prediction.
prediction of gene expression and protein-protein interactions.
modeling of evolution.
Sequence Analysis
Encompasses the use of various bioinformatic methods to determine the biological function and structure of genes and the proteins they encode.
DNA sequences → Decoded → Stored in electronic databases → Analysis → Phylogenetic Tree, Comparative Genomics
Shotgun Sequencing
Used in genetics for sequencing long DNA strands.
DNA → small segments → sequenced → reassembled by computer programs
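The reassembly step can be illustrated with a toy greedy overlap merger. This is only a sketch under simplifying assumptions (error-free reads, a single strand, no repeats); the function names `overlap` and `greedy_assemble` and the `min_len` parameter are illustrative and do not come from any assembler named in these slides.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)]
        reads.append(merged)
    return reads

# Toy fragments of one short sequence
print(greedy_assemble(["ATGCGTAC", "GTACCTGA", "CTGATTAG"]))
```

Real assemblers must also cope with sequencing errors, reverse-complement reads, and repeated regions, which this sketch ignores.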
Sequence Alignment:-
Arrangement of two or more sequences so as to highlight their similarity.
tcctctgcctctgccatcat---caaccccaaagt
|||| ||| ||||| |||||   ||||||||||||
tcctgtgcatctgcaatcatgggcaaccccaaagt
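The row of bars in an alignment like the one above can be derived mechanically from the two gapped rows. The following is a minimal sketch, not code from any program discussed here; the names `match_line` and `percent_identity` are illustrative, and identity is computed over columns without a gap, which is only one of several conventions in use.

```python
def match_line(row_a, row_b, gap="-"):
    """Return '|' under identical columns and ' ' under mismatches or gaps."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if a == b and a != gap else " "
        for a, b in zip(row_a, row_b)
    )

def percent_identity(row_a, row_b, gap="-"):
    """Identical columns as a percentage of the columns that have no gap."""
    matches = match_line(row_a, row_b, gap).count("|")
    ungapped = sum(1 for a, b in zip(row_a, row_b) if gap not in (a, b))
    return 100.0 * matches / ungapped if ungapped else 0.0

top = "tcctctgcctctgccatcat---caaccccaaagt"
bottom = "tcctgtgcatctgcaatcatgggcaaccccaaagt"
print(top)
print(match_line(top, bottom))
print(bottom)
```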
Structural Alignment
More reliable over long evolutionary distances.
Useful in identifying structurally-conserved regions.
Multiple Alignment
Extension of pairwise alignment to incorporate more than two sequences into an alignment.
Helps in the identification of common regions between the sequences.
Programs
Clustal is used in cladistics to build phylogenetic trees
Framesearch
It is an extension of Smith-Waterman for pairwise alignment between a protein sequence and a nucleotide sequence.
It dynamically considers every possible single-nucleotide insertion or deletion to generate the translation that best matches the protein sequence.
Software:-
Ssearch
Smith-Waterman remains the gold standard for protein-protein or nucleotide-nucleotide pairwise alignment.
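As an illustration of the Smith-Waterman recurrence, here is a minimal scoring sketch. It assumes a simple match/mismatch score and a linear gap penalty; the function name and default parameter values are illustrative, and this is not the Ssearch implementation, which uses substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# Five matches and one gap (GTT-AC against GTTGAC): 5 * 3 - 2 = 13
print(smith_waterman_score("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))
```

Only two rows of the matrix are kept because the sketch reports the score alone; recovering the alignment itself would need the full matrix and a traceback.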
BLAST
An algorithm for comparing biological sequences.
One of the most widely used tools for searching protein and DNA databases for sequence similarities.
It helps answer questions such as:-
• Which bacterial species have a protein that is related in lineage to a certain protein whose amino-acid sequence I know?
• Where does the DNA that I've just sequenced come from?
• What other genes encode proteins that exhibit structures or motifs such as the one I've just determined?
To run, BLAST requires two inputs:
a query sequence (also called the target sequence)
a sequence database.
It then searches for high-scoring sequence alignments between the query and the database sequences.
Three stages of BLAST:-
• 1st stage: BLAST searches for exact matches (seeds) of a small fixed length W between the query and sequences in the database.
• 2nd stage: BLAST tries to extend the match in both directions, starting at the seed. If a high-scoring ungapped alignment is found, the database sequence is passed on to the 3rd stage.
• 3rd stage: BLAST performs a gapped alignment between the query sequence and the database sequence.
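The first two stages can be sketched as a seed-and-extend search. This is a simplified illustration, not BLAST itself: the function names, the scoring values, and the `x_drop` cut-off are assumptions made for the example, and the third (gapped) stage is left out.

```python
def find_seeds(query, subject, w):
    """Stage 1: positions (i, j) where query and subject share an exact word of length w."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds

def extend_ungapped(query, subject, qi, sj, w, match=1, mismatch=-2, x_drop=3):
    """Stage 2: extend a seed in both directions without gaps.

    Returns (score, query_start, query_end) of the best extension found."""
    def walk(q_pos, s_pos, step, start_score):
        score = best = start_score
        best_len = length = 0
        while 0 <= q_pos < len(query) and 0 <= s_pos < len(subject):
            score += match if query[q_pos] == subject[s_pos] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break  # score fell too far below the best seen; stop extending
            q_pos += step
            s_pos += step
        return best, best_len

    best, right = walk(qi + w, sj + w, +1, w * match)
    best, left = walk(qi - 1, sj - 1, -1, best)
    return best, qi - left, qi + w + right

query = "GGACGTACGTTT"
subject = "CCACGTACGTAA"
seeds = find_seeds(query, subject, w=4)
print(extend_ungapped(query, subject, *seeds[0], w=4))
```

A database sequence whose best ungapped score passes a threshold would then go on to a gapped alignment, for example with a Smith-Waterman style dynamic program.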
An alternative to BLAST is BLAT (BLAST-Like Alignment Tool).
FASTA:-
Slower but more sensitive than BLAST.
DNA and Protein sequence alignment software package.
The original FASTP program was designed for protein sequence similarity searching.
FASTA provided a more sophisticated shuffling program for evaluating statistical significance.
Programs in this package:-
FASTA is pronounced "FAST-Aye" and stands for "FAST-All".
"FAST-P" (protein) alignment.
"FAST-N" (nucleotide) alignment.
Current FASTA package contains programs for:-
protein:protein
DNA:DNA
protein:translated DNA
ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors when comparing nucleotide to protein sequence data.
Clustal
Clustal is a widely used multiple alignment computer program.
i) ClustalW ii) ClustalX
Sequence Analysis Programmes:-
EMBOSS
European Molecular Biology Open Software Suite (EMBOSS) is a program suite for nucleic acid and protein sequence analysis.
EMBOSS programs manipulate, analyze, and display nucleic acid and protein sequences.
Similar in functionality to the commercial GCG Wisconsin Software.

PhyloGibbs
Designed to identify where regulatory molecules bind to DNA.
PhyloGibbs compares DNA from multiple species to identify regions in which the sequence is statistically similar, and to pick out the segments most likely to be of interest to scientists.
AutoEditor: automated correction of sequencing and basecaller errors
A tool for correcting sequencing and basecaller errors using sequence alignment and chromatogram data.
On average, AutoEditor corrects 80% of erroneous base calls.
It also greatly improves the ability to discover SNPs between closely related strains and isolates of the same species.
MUMmer
A system for aligning whole genome sequences. Using an efficient data structure called a suffix tree, the system can rapidly align sequences containing millions of nucleotides.
MUMmer 3.0
Open source.
Improved efficiency.
Ability to find non-unique, repetitive matches as well as unique matches.
New graphical output modules.
Applications:-
• MUMmer 1.0 was used to detect numerous large-scale inversions in bacterial genomes.
• MUMmer 2.1 was used to align all human chromosomes to one another and to detect numerous large-scale.
• PROmer was used to compare the human and mouse malaria parasites P. falciparum and P. yoelii.
Current use of MUMmer 3.0:-
1) Identifying SNPs and other mutations in a large collection of Bacillus anthracis strains.
2) Comparing different assemblies of the same genome at different stages of sequencing and finishing.
E. coli K12 vs. E. coli O157:H7
S. cerevisiae vs. S. pombe
A. fumigatus vs. A. nidulans
P. falciparum vs. P. yoelii
PSORT WWW Server

PSORT is a computer program for the prediction of protein localization
sites in cells.
WoLF PSORT
PSORT II (recommended for animal/yeast sequences)
PSORT (old version; for bacterial/plant sequences)
PSORT-B (recommended for Gram-negative bacteria)
PSORT Prediction
Source of Input Sequence:
Gram-positive bacterium
Gram-negative bacterium
yeast
animal
plant
Sequence ID (default is MYSEQ).
Amino acid sequence, entered by copy and paste; characters other than the 20 standard amino acid codes are removed.
PHIRE
This Visual Basic program performs an algorithmic string-based search on bacteriophage genome sequences.
It discovers and extracts blocks displaying sequence similarity, without any prior experimental or predictive knowledge.
MB Advanced DNA Analysis
MB is a relatively small and easy-to-use program.
Main features of MB are:
restriction analysis
amino acid analysis
multiple sequence alignment tool
dot plot
calculation of molecular weights and chemical properties of proteins
prediction of 3D structures for short amino acid sequences.
UniPro DPview
This is a tool for finding and analyzing matches between
genomes.
SEQtools
Program package for routine handling and analysis of DNA
and protein sequences.
The package includes general facilities for sequence and
contig editing, restriction enzyme mapping, translation, and
repeat identification.
DNA Club
DNA analysis software.
Features:- remove vector sequence, find ORF, sequence
editing, translate to protein sequence, protein sequence
editing, RE map, RE map with translation, PCR primer
selection, primer or probe evaluation.
ZCURVE
A highly accurate system for recognizing protein-coding
genes in bacterial and archaeal genomes, based on the Z
curve theory of DNA sequence.
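The Z curve maps a DNA sequence to a three-dimensional curve whose coordinates after n bases are x = (A + G) - (C + T), y = (A + C) - (G + T), and z = (A + T) - (C + G), where A, C, G, and T are cumulative base counts. The sketch below shows only this coordinate transform; the function name `z_curve` is illustrative, and the gene-recognition step of the ZCURVE program is not reproduced here.

```python
def z_curve(sequence):
    """Return the cumulative Z-curve points (x, y, z) of a DNA sequence."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in sequence.upper():
        if base not in counts:
            continue  # skip ambiguous symbols such as N
        counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        points.append((
            (a + g) - (c + t),  # x: purines vs. pyrimidines
            (a + c) - (g + t),  # y: amino vs. keto bases
            (a + t) - (c + g),  # z: weak vs. strong hydrogen bonding
        ))
    return points

print(z_curve("AGCT"))
```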
DNA for Windows
A compact, easy-to-use DNA analysis program, ideal for small-scale sequencing projects.

Webcutter
A free online tool to help restriction-map nucleotide sequences.
Features:-
• a simple, customizable interface
• worldwide platform-independent accessibility via the web
• seamless interfaces to NCBI's GenBank DNA sequence database and to a restriction enzyme database.
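At its core, restriction mapping is a search for enzyme recognition sequences in the input. The following is a minimal sketch of that idea and is not Webcutter's code; the `ENZYMES` table holds three well-known enzymes chosen for the example, and the function reports 0-based start positions of recognition sites rather than exact cut positions.

```python
# Recognition sequences of three common enzymes (all palindromic,
# so one strand is enough to locate sites on both).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

def find_sites(sequence, enzymes=ENZYMES):
    """Return {enzyme name: [0-based start positions of its recognition site]}."""
    sequence = sequence.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        start = sequence.find(site)
        while start != -1:
            positions.append(start)
            start = sequence.find(site, start + 1)  # allow overlapping sites
        hits[name] = positions
    return hits

print(find_sites("AAGAATTCGGGGATCCTT"))
```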
Multilocus sequence typing (MLST)
Compares sequence variation in numerous housekeeping gene targets.
Developed for Neisseria gonorrhoeae, Streptococcus pneumoniae, and S. aureus.
Based on the classic multilocus enzyme electrophoresis (MLEE) method used to study the genetic variability of a species.
Drawbacks:-
labor-intensive, time-consuming, and costly.
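The comparison behind MLST can be pictured as reducing each isolate to an allelic profile: every distinct sequence at a locus gets an allele number, and isolates with identical profiles share a sequence type. The sketch below is a toy illustration under that assumption; the function name, the isolate names, and the three-base "sequences" are hypothetical, and real schemes assign allele numbers from curated reference databases rather than in order of appearance.

```python
def allelic_profiles(isolates, loci):
    """Map each isolate to a tuple of allele numbers, one per locus."""
    allele_ids = {locus: {} for locus in loci}
    profiles = {}
    for name, sequences in isolates.items():
        profile = []
        for locus in loci:
            known = allele_ids[locus]
            # A new sequence at this locus gets the next free allele number.
            profile.append(known.setdefault(sequences[locus], len(known) + 1))
        profiles[name] = tuple(profile)
    return profiles

# Hypothetical isolates with toy sequences at two housekeeping loci
isolates = {
    "isolate_1": {"arcC": "ATG", "aroE": "CCC"},
    "isolate_2": {"arcC": "ATG", "aroE": "CCT"},
    "isolate_3": {"arcC": "ATG", "aroE": "CCC"},
}
profiles = allelic_profiles(isolates, ["arcC", "aroE"])
print(profiles)
```

Here isolate_1 and isolate_3 end up with the same profile, so they would be assigned the same type, while isolate_2 differs at one locus.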
Single-locus sequence typing (SLST)
Compares sequence variation of a single target.
Provides an inexpensive, rapid, objective, and portable genotyping method to subspeciate bacteria.
Using a single target depends on finding a region for sequencing that is sufficiently polymorphic to provide useful strain resolution.
Loci with short sequence repeat (SSR) regions may have suitable variability for discriminating outbreaks.
Two S. aureus genes conserved within the species, protein A (spa) and
coagulase (coa), have variable SSR regions constructed from closely
related 24- and 81-bp tandem repeat units, respectively.
The genetic alterations in SSR regions include both point mutations and
intragenic recombination that arise by slipped-strand mispairing during
chromosomal replication and that result in a high degree of
polymorphism.
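Typing by an SSR region amounts to reading the region as an ordered series of repeat units and comparing that series between strains. The sketch below assumes fixed-length 24-bp units, as described for spa above, and uses made-up unit sequences; the function name `repeat_profile` and the numeric codes are illustrative and do not follow any published repeat nomenclature.

```python
def repeat_profile(region, unit_len=24):
    """Split an SSR region into fixed-length units and number each distinct unit."""
    if len(region) % unit_len != 0:
        raise ValueError("region length is not a multiple of the unit length")
    units = [region[i:i + unit_len] for i in range(0, len(region), unit_len)]
    codes = {}
    # Distinct units are numbered in order of first appearance.
    return [codes.setdefault(unit, len(codes) + 1) for unit in units]

# Two made-up 24-bp units that differ by a single point mutation
unit_a = "ACGT" * 6
unit_b = "ACGT" * 5 + "ACGA"
print(repeat_profile(unit_a + unit_b + unit_a))
```

Strains whose profiles differ in the number, order, or sequence of units would be separated by this kind of comparison.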
