
Genome & Protein "Sequence Analysis Programs": Application in Establishing Epidemiology and Variability

RAJESH KUMAR
Ph.D 1st yr
Dairy Microbiology Division
N.D.R.I
Introduction
Bio-informatics/Computational Biology:-

Proteomics:- Large-scale study of proteins.

Genomics:- study of an organism's genome and the use of its genes.

Comparative Genomics:- comparison of genomes.

Structural Genomics:- determination of the three-dimensional structure of all proteins of a given organism.
Major Research efforts of Bio-informatics:-

Sequence analysis / alignment.

gene finding.

genome assembly.

protein structure alignment.

protein structure prediction.

prediction of gene expression and protein-protein interactions.

modeling of evolution.
Sequence Analysis

Encompasses the use of various bioinformatic methods to determine the biological function and structure of genes and the proteins they encode.

DNA sequences → decoded → stored in electronic databases → analysis → phylogenetic tree, comparative genomics
Shotgun Sequencing

Used in genetics for sequencing long DNA strands.

DNA → small segments → sequenced → reassembled by computer programs
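The reassembly step can be illustrated with a toy greedy overlap merger. This is only a sketch under strong assumptions (error-free reads, a single strand, no repeats, no read contained in another); the function names and the example reads are hypothetical and not taken from any real assembler.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


# Toy reads (made up for illustration)
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

Real assemblers also handle sequencing errors, reverse complements, and repeats, which this sketch ignores.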
Sequence Alignment:-
Arrangement of two or more sequences so as to highlight their similarity.

tcctctgcctctgccatcat---caaccccaaagt
|||| ||| ||||| |||||   ||||||||||||
tcctgtgcatctgcaatcatgggcaaccccaaagt
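The match line between two already-aligned sequences can be derived mechanically. The following is a minimal sketch, assuming gaps are written as "-" and both rows have equal length; the function names are illustrative only.

```python
def match_line(top, bottom, gap="-"):
    """Return a line with '|' under identical columns and ' ' elsewhere."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "|" if a == b and a != gap else " " for a, b in zip(top, bottom)
    )


def percent_identity(top, bottom, gap="-"):
    """Identical columns as a percentage of the alignment length."""
    return 100.0 * match_line(top, bottom, gap).count("|") / len(top)


top = "tcctctgcctctgccatcat---caaccccaaagt"
bottom = "tcctgtgcatctgcaatcatgggcaaccccaaagt"
print(top)
print(match_line(top, bottom))
print(bottom)
print(round(percent_identity(top, bottom), 1))
```

Identity here is computed over all alignment columns, including gap columns; other conventions exist.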
Structural Alignment

More reliable than sequence alignment over long evolutionary distances.

Useful in identifying structurally-conserved regions.

Multiple Alignment

Extension of pairwise alignment to incorporate more than two sequences into an alignment.

Helps identify regions common to the sequences.

Programs
Clustal is used in cladistics to build phylogenetic trees.
Framesearch

It is an extension of Smith-Waterman for pairwise alignment between a protein sequence and a nucleotide sequence.

It dynamically considers every possible single-nucleotide insertion or deletion to generate the translation that best matches the protein sequence.

Software:-

Ssearch

Smith-Waterman remains the gold standard for protein-protein or nucleotide-nucleotide pairwise alignment.
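A compact sketch of Smith-Waterman local alignment is given below. The scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration; production tools such as Ssearch typically use substitution matrices and affine gap penalties instead.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # Fill the score matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + score:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("GGGACGTAAA", "CCCACGTCCC"))
```

The algorithm fills a full matrix, so time and memory grow with the product of the two sequence lengths; this cost is what heuristic tools like BLAST and FASTA try to avoid.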
BLAST

An algorithm for comparing biological sequences.

One of the most widely used tools for searching protein and DNA databases for sequence similarities.

It helps answer questions such as:-

• Which bacterial species have a protein that is related in lineage to a certain protein whose amino-acid sequence I know?

• Where does the DNA that I've just sequenced come from?

• What other genes encode proteins that exhibit structures or motifs such as the one I've just determined?
To run, BLAST requires two inputs:

a query sequence
a sequence database (the target sequences to be searched).

BLAST searches for high-scoring sequence alignments between the query and the database sequences.

Three stages of BLAST:-

• 1st stage: BLAST searches for exact matches of a small fixed length W between the query and sequences in the database. Each match is a seed.

• 2nd stage: BLAST tries to extend the match in both directions, starting at the seed. If a high-scoring ungapped alignment is found, the database sequence is passed on to the 3rd stage.

• 3rd stage: BLAST performs a gapped alignment between the query sequence and the database sequence.
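The first two stages can be sketched as a seed-and-extend search. This is a simplified illustration, not BLAST itself: the word length, scores, drop-off value, and threshold are assumed values, and the function names are hypothetical. The gapped third stage is not shown.

```python
from collections import defaultdict


def index_words(seq, w):
    """Map every word of length w in seq to its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def extend_ungapped(query, subject, q, s, w, match=1, mismatch=-2, drop=5):
    """Stage 2: extend a seed in both directions without gaps."""
    def walk(qi, si, step):
        score = best = best_len = n = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score >= drop:
                break  # score fell too far below the best seen so far
            qi += step
            si += step
        return best, best_len

    right, rlen = walk(q + w, s + w, 1)
    left, llen = walk(q - 1, s - 1, -1)
    return w * match + right + left, q - llen, s - llen, w + rlen + llen


def seed_and_extend(query, subject, w=4, threshold=6):
    """Stage 1 (exact word seeds) followed by stage 2 (ungapped extension)."""
    index = index_words(subject, w)
    hits = set()
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            score, q_start, s_start, length = extend_ungapped(query, subject, q, s, w)
            if score >= threshold:
                hits.add((score, q_start, s_start, length))
    return sorted(hits, reverse=True)


# Each hit is (score, query start, subject start, length), 0-based.
print(seed_and_extend("GATTACAGG", "CCCGATTACAGGCCC"))
```

Several seeds usually lead to the same extended segment, which is why the hits are collected in a set.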

An alternative to BLAST is BLAT (BLAST-Like Alignment Tool).

FASTA:-

Slower but more sensitive than BLAST.

DNA and protein sequence alignment software package.

The original FASTP program was designed for protein sequence similarity searching.

FASTA provided a more sophisticated shuffling program for evaluating statistical significance.
Programs in this package:-

FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet. It extends the earlier "FAST-P" (protein) and "FAST-N" (nucleotide) alignment tools.

Current FASTA package contains programs for:-

protein:protein
DNA:DNA
protein:translated DNA
ordered or unordered peptide searches.

Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors when comparing nucleotide to protein sequence data.
Clustal

Clustal is a widely used multiple alignment computer program.
i) ClustalW ii) ClustalX

Sequence Analysis Programmes:-

EMBOSS

European Molecular Biology Open Software Suite (EMBOSS) is a program suite for nucleic acid and protein sequence analysis.

EMBOSS programs manipulate, analyze, and display nucleic acid and protein sequences.

Similar in functionality to the commercial GCG Wisconsin software.


PhyloGibbs

Designed to identify where regulatory molecules bind to DNA.

PhyloGibbs compares DNA from multiple species in order to identify areas in which the DNA sequence is statistically similar and to filter out the segments that are most likely to be of interest to scientists.

AutoEditor: Automated correction of sequencing and basecaller errors

A tool for correcting sequencing and basecaller errors using sequence alignment and chromatogram data.

On average, AutoEditor corrects 80% of erroneous base calls.

It also greatly improves the ability to discover SNPs between closely related strains and isolates of the same species.
MUMmer
System for aligning whole genome sequences. Using an efficient data structure called a suffix tree, the system can rapidly align sequences containing millions of nucleotides.

MUMmer 3.0

Open source.

Improved efficiency.

Ability to find non-unique, repetitive matches as well as unique matches.

New graphical output modules.

Applications:-

• MUMmer 1.0 was used to detect numerous large-scale inversions in bacterial genomes.

• MUMmer 2.1 was used to align all human chromosomes to one another and to detect numerous large-scale.

• PROmer was used to compare the human and mouse malaria parasites P. falciparum and P. yoelii.

Current use of MUMmer 3.0:-

1) Identifying SNPs and other mutations in a large collection of Bacillus anthracis strains.

2) Comparing different assemblies of the same genome at different stages of sequencing and finishing.

E. coli K12 vs. E. coli O157:H7
S. cerevisiae vs. S. pombe
A. fumigatus vs. A. nidulans
P. falciparum vs. P. yoelii

PSORT WWW Server


PSORT is a computer program for the prediction of protein localization sites in cells.

WoLF PSORT
WoLF PSORT Prediction
PSORT II (Recommended for animal/yeast sequences)
PSORT II Users' Manual
PSORT II Prediction
PSORT (Old version; for bacterial/plant sequences)
PSORT-B (Recommended for Gram-negative bacteria)
PSORT-B Prediction
PSORT-B, a program applicable to the sequences of Gram-negative
bacteria.
PSORT Prediction
Source of Input Sequence:

Gram-positive bacterium
Gram-negative bacterium
yeast
animal
plant

Sequence ID (Default is MYSEQ):

Enter your Amino Acid sequence below (by copy & paste):

Characters other than the 20 standard amino acid codes will be removed.


To submit the query, press this button: Submit
PHIRE

This Visual Basic program performs an algorithmic string-based search on bacteriophage genome sequences.

It discovers and extracts blocks displaying sequence similarity, without any prior experimental or predictive knowledge.

MB Advanced DNA Analysis

MB is a relatively small and easy-to-use program.

Main features of MB are:

restriction analysis
amino acid analysis
multiple sequence alignment tool
dot plot
calculation of molecular weights and chemical properties of proteins
prediction of 3D structures for short amino acid sequences.
UniPro DPview
This is a tool for finding and analyzing matches between
genomes.

SEQtools
Program package for routine handling and analysis of DNA
and protein sequences.
The package includes general facilities for sequence and
contig editing, restriction enzyme mapping, translation, and
repeat identification.

DNA Club
DNA analysis software.
Features:- remove vector sequence, find ORF, sequence editing, translate to protein sequence, protein sequence editing, RE map, RE map with translation, PCR primer selection, primer or probe evaluation.
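As an illustration of the "find ORF" feature, here is a minimal open reading frame scan. It is a sketch under stated assumptions (forward strand only, ATG as the only start codon, the three standard stop codons) and does not describe how DNA Club itself works; the function name and test sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for ORFs on the forward strand.

    Coordinates are 0-based, end exclusive; min_codons counts the
    start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)


seq = "CCATGAAATAGGG"
for start, end, frame in find_orfs(seq):
    print(frame, seq[start:end])
```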
ZCURVE
A highly accurate system for recognizing protein-coding genes in bacterial and archaeal genomes, based on the Z curve theory of DNA sequence.
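The Z curve maps a DNA sequence to a three-dimensional curve whose coordinates after n bases are x = (A + G) - (C + T), y = (A + C) - (G + T), and z = (A + T) - (G + C), where A, C, G, T are cumulative base counts. The sketch below shows only this transform; how ZCURVE turns it into gene predictions is not covered, and the function name is illustrative.

```python
def z_curve(seq):
    """Return the list of (x, y, z) Z-curve points, one per counted base."""
    counts = dict.fromkeys("ACGT", 0)
    points = []
    for base in seq.upper():
        if base not in counts:
            continue  # skip ambiguity codes such as N
        counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        points.append((
            (a + g) - (c + t),  # purine vs. pyrimidine
            (a + c) - (g + t),  # amino vs. keto
            (a + t) - (g + c),  # weak vs. strong hydrogen bonding
        ))
    return points


print(z_curve("ACGT"))
```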

DNA for Windows
A compact, easy-to-use DNA analysis program, ideal for small-scale sequencing projects.

Webcutter
A free online tool to help restriction-map nucleotide sequences.
Features:-
• a simple, customizable interface
• worldwide platform-independent accessibility via the web
• seamless interfaces to NCBI's GenBank DNA sequence database and to a restriction enzyme database.
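Restriction mapping amounts to locating enzyme recognition sites in a sequence. The sketch below assumes a small hand-written enzyme table (EcoRI, BamHI, and HindIII with their standard recognition sequences) and reports 1-based site start positions rather than exact cut positions; it does not reflect how Webcutter is implemented, and the test sequence is made up.

```python
import re

# Recognition sequences for three common enzymes (all palindromic,
# so scanning one strand is enough for these).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def restriction_map(seq, enzymes=ENZYMES):
    """Return {enzyme name: [1-based start positions of its sites]}."""
    seq = seq.upper()
    sites = {}
    for name, site in enzymes.items():
        # A lookahead pattern also reports overlapping occurrences.
        sites[name] = [m.start() + 1 for m in re.finditer(f"(?={site})", seq)]
    return sites


print(restriction_map("TTGAATTCAAGGATCCTTGAATTC"))
```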
Multilocus sequence typing (MLST)

Compares sequence variation in numerous housekeeping gene targets.

Developed for Neisseria gonorrhoeae, Streptococcus pneumoniae, and Staphylococcus aureus.

Based on the classic multilocus enzyme electrophoresis (MLEE) method used to study the genetic variability of a species.

Drawbacks:-
Labor-intensive, time-consuming, and costly.
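The core bookkeeping of MLST can be sketched as follows: each distinct sequence at a locus gets an allele number, the tuple of allele numbers is the isolate's allelic profile, and each distinct profile gets a sequence type (ST). In this sketch the numbers are assigned locally in order of first appearance, whereas real schemes use curated reference databases; the isolate names and sequences are toy data, and the function name is hypothetical.

```python
def assign_sequence_types(isolates, loci):
    """Return {isolate: (allelic profile, sequence type)}."""
    alleles = {locus: {} for locus in loci}  # locus -> {sequence: allele number}
    profile_to_st = {}                       # profile -> sequence type
    result = {}
    for name, seqs in isolates.items():
        profile = tuple(
            alleles[locus].setdefault(seqs[locus], len(alleles[locus]) + 1)
            for locus in loci
        )
        st = profile_to_st.setdefault(profile, len(profile_to_st) + 1)
        result[name] = (profile, st)
    return result


# Toy data: two loci, three isolates (sequences are made up).
isolates = {
    "iso1": {"arcC": "ACGT", "aroE": "GGCC"},
    "iso2": {"arcC": "ACGT", "aroE": "GGCC"},
    "iso3": {"arcC": "ACGA", "aroE": "GGCC"},
}
for name, (profile, st) in assign_sequence_types(isolates, ("arcC", "aroE")).items():
    print(name, profile, "ST", st)
```

Isolates sharing an ST are indistinguishable at the loci examined; a single base difference at any locus produces a new allele and therefore a new ST.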
Single-locus sequence typing (SLST)

Compares sequence variation of a single target.

Provides an inexpensive, rapid, objective, and portable genotyping method to subspeciate bacteria.

Using a single target depends on finding a region for sequencing that is sufficiently polymorphic to provide useful strain resolution.

Loci with short sequence repeat (SSR) regions may have suitable variability for discriminating outbreaks.
Two S. aureus genes conserved within the species, protein A (spa) and
coagulase (coa), have variable SSR regions constructed from closely
related 24- and 81-bp tandem repeat units, respectively.

The genetic alterations in SSR regions include both point mutations and
intragenic recombination that arise by slipped-strand mispairing during
chromosomal replication and that result in a high degree of
polymorphism.
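One way to turn such an SSR region into a comparable genotype is to split it into repeat units and record the order of distinct units. The sketch below assumes the region has already been trimmed to a whole number of 24-bp units and assigns unit codes locally in order of first appearance; the function name and the sequences are toy examples, not real spa repeats.

```python
def repeat_profile(region, unit_len=24):
    """Split an SSR region into units and return (profile, unit codes)."""
    if len(region) % unit_len:
        raise ValueError("region length must be a multiple of unit_len")
    units = [region[i:i + unit_len] for i in range(0, len(region), unit_len)]
    codes = {}  # unit sequence -> locally assigned code
    profile = [codes.setdefault(u, len(codes) + 1) for u in units]
    return profile, codes


# Two toy 24-bp units that differ by a single point mutation.
unit_a = "ACGT" * 6
unit_b = "ACGT" * 5 + "ACGA"
profile, codes = repeat_profile(unit_a + unit_a + unit_b + unit_a)
print(profile)
```

Both the number of units and their order contribute to the profile, so duplications, deletions, and point mutations within units all yield distinguishable genotypes.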
