
BIOINFORMATICS:
Focus on Primer Designing and Sequence Analysis (BLAST)

G. Ravi Kumar,
Senior scientist
Division of Animal Biotechnology
Indian Veterinary Research Institute
Bioinformatics - Definition

Bioinformatics is the use of databases and computers to ask and answer biological questions.
Bioinformatics thus comprises two main activities:
• Computerized annotation of genomic and biological information and data (databases).
• Transformation and manipulation of the data (software tools).
Overall Aim of Bioinformatics:
• Provide biologically important predictions from annotated data and from the transformation / manipulation of these data.

Bioinformatics - Need and Use

• Bioinformatics plays a key role in modern biology and is especially important in:
• Molecular biology
• Genomics
• Functional genomics
• Systems biology
• Protein design and engineering
• Pharmaceutical development
• Medicine
• Ecology / population genetics

The trend of data growth

The 21st century is a century of biotechnology:
• Genomics: New sequence information is being produced at increasing rates (the contents of GenBank double approximately every 18 months).
• Microarray: Global expression analysis; RNA levels of every gene in the genome are analyzed in parallel.
• Proteomics: Global protein analysis generates large mass spectra libraries.
• Metabolomics: Global metabolite analysis; 25,000 secondary metabolites characterized.
• Glycomics: Global sugar metabolism analysis.

DNA (nucleotide sequences) databases

• These are large databases, and searching any one of the primary repositories should produce
similar results because they exchange information routinely.
• GenBank (NCBI): http://www.ncbi.nlm.nih.gov
• DDBJ (DNA DataBase of Japan): http://www.ddbj.nig.ac.jp
• TIGR: http://tigr.org/tdb/tgi
• Yeast: http://yeastgenome.org
• E. coli: http://colibase.bham.ac.uk/blast/
• Specialized databases: Tissues, species…
• ESTs (Expressed Sequence Tags)
~ at NCBI http://www.ncbi.nlm.nih.gov/dbEST
~ at TIGR http://tigr.org/tdb/tgi & ...many more!
Protein (amino acid) databases

• They are big databases too:
• Swiss-Prot (very high level of annotation) http://au.expasy.org/
• PIR (Protein Information Resource): the world's most
comprehensive catalog of information on proteins
http://www.pir.uniprot.org/
• Translated databases:
• TrEMBL (translated EMBL): includes entries that have not yet been
annotated into Swiss-Prot.
http://www.ebi.ac.uk/trembl/access.html
• GenPept (translation of coding regions in GenBank)
• PDB (sequences derived from the 3D structures in the
Brookhaven PDB) http://www.rcsb.org/pdb/

Bioinformatics tools sites

• Textbook website - http://www.bioinformaticsonline.org/
• Has useful links to material discussed in text
• Must purchase textbook to get password
• The following are large sites with many links to servers and software:
• CMS Molecular Biology Resource - http://restools.sdsc.edu/
• Baylor College of Medicine - http://searchlauncher.bcm.tmc.edu/
• European Bioinformatics Institute - http://www.ebi.ac.uk/
• National Center for Biotechnology Information - http://ncbi.nlm.nih.gov

Analysis of genomics data

You have just cloned a gene. Questions to ask:
• Is it already in databases? (Accession #?; Annotation?)
• Is there a similar sequence? (% identity?; Family member?)
• Are there conserved regions? (Alignments; Domains)
• Evolutionary relationship? (Phylogenetic tree)
• Protein characteristics? (Sub-localization; Soluble?; 3D fold)
• Other information (Expression profile; mutants)

A critical failure of current bioinformatics is the lack of a single software package that can perform all of these functions.
Primer Designing:
A primary requisite for any molecular biology experiment

Primer Designing

• Design your PCR primers to be 18-30 nucleotides in length.
The longer end of this range allows higher specificity and gives
you space to add restriction enzyme sites to the primer end for
cloning.
• Make sure the melting temperatures (Tm) of the two primers differ
by no more than 5°C. For short primers, you can estimate Tm
with this formula: Tm = 4(G + C) + 2(A + T) °C, where G, C, A and T
are the numbers of each base in the primer.
• Aim for a Tm between 55 and 65°C for each primer over the region
of hybridization.
• Use an annealing temperature (Ta) about 5°C lower than the Tm.
• The GC content of each primer should be in the range of 40-60% for
optimum PCR efficiency.
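
The rules of thumb above can be checked with a few lines of Python. This is a minimal sketch, not a replacement for a primer design package: the function names and the example primers are illustrative assumptions, and the 4(G + C) + 2(A + T) estimate is only a rough guide for short oligonucleotides.

```python
def wallace_tm(primer: str) -> int:
    """Estimate Tm in °C with the rule Tm = 4(G + C) + 2(A + T)."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    seq = primer.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def check_primer(primer: str) -> list:
    """Return a list of rule violations (empty list means all rules pass)."""
    problems = []
    if not 18 <= len(primer) <= 30:
        problems.append("length outside 18-30 nt")
    if not 55 <= wallace_tm(primer) <= 65:
        problems.append("Tm outside 55-65 °C")
    if not 40 <= gc_percent(primer) <= 60:
        problems.append("GC content outside 40-60%")
    return problems


def tm_compatible(forward: str, reverse: str, max_diff: int = 5) -> bool:
    """True if the two primers differ in Tm by no more than max_diff °C."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_diff


if __name__ == "__main__":
    # Hypothetical primers, used only to exercise the functions.
    fwd = "ATGGCTAGCTAGGCTAACGG"
    rev = "CCGTTAGCCTAGCTAGCCAT"
    for name, p in (("forward", fwd), ("reverse", rev)):
        print(name, wallace_tm(p), round(gc_percent(p), 1), check_primer(p))
    print("Tm compatible:", tm_compatible(fwd, rev))
    print("Suggested Ta:", min(wallace_tm(fwd), wallace_tm(rev)) - 5)
```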

Primer Designing

• Try to have a uniform distribution of G and C nucleotides, as
clusters of G's or C's can cause non-specific priming.
• Avoid long runs of the same nucleotide.
• Check that primers are not self-complementary or complementary
to the other primer in the reaction mixture, as this will encourage
formation of hairpins and primer dimers, which compete with the
template for the use of primer and reagent.
• If you can, make the 3′ end terminate in C. This will increase the
specificity.
• ΔG (Gibbs free energy) is the most important factor to be
taken into consideration.
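
Two of these checks, runs of a single nucleotide and complementarity between primers, are easy to sketch in code. The version below is a simplified illustration under stated assumptions: it only looks for perfect contiguous base pairing (no mismatches or bulges) and all function names are made up for this example. Dedicated tools score such structures by ΔG instead.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated nucleotide."""
    best = run = 0
    prev = ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def longest_complementary_stretch(primer_a: str, primer_b: str) -> int:
    """Longest contiguous stretch of primer_a that can base-pair with primer_b.

    Antiparallel pairing means a stretch of primer_a pairs with primer_b when
    it also occurs in the reverse complement of primer_b, so this is a
    longest-common-substring search.
    """
    a = primer_a.upper()
    b = reverse_complement(primer_b)
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # GAATTC (an EcoRI site) is its own reverse complement, so a primer
    # containing it can pair with a second copy of itself over 6 bases.
    print(longest_complementary_stretch("GAATTC", "GAATTC"))
    print(longest_run("ATGGGGGCTA"))
```

Passing the same primer as both arguments gives a rough self-complementarity check.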

Primer Designing and Δ G (Gibbs free energy)

• ΔG (Gibbs free energy) gives us an idea about the stability of
any reaction product.
• The more negative the value of ΔG, the more stable the product.

ΔG in PCR Amplification
• Primer dimer formation: ΔG for the primer dimer should not be
more negative than -6.0 kcal/mol.
• Efficiency of the binding of the primer to the template: ΔG for
the primer-template binding should be between -8.0 kcal/mol and
-10.0 kcal/mol.
• Hairpins: a 3′-end hairpin with a ΔG of -2 kcal/mol and
an internal hairpin with a ΔG of -3 kcal/mol are generally tolerated.
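
Once ΔG values have been obtained from an analysis tool, the thresholds listed above can be applied mechanically. The sketch below assumes the four ΔG values (in kcal/mol) are supplied by the user; it does not compute them, and the function and parameter names are hypothetical.

```python
def check_delta_g(dimer: float, template: float,
                  hairpin_3prime: float, hairpin_internal: float) -> list:
    """Compare ΔG values (kcal/mol) against the slide's thresholds.

    Returns a list of warnings; an empty list means every value is acceptable.
    """
    warnings = []
    if dimer < -6.0:
        warnings.append("primer dimer is too stable (ΔG below -6.0)")
    if not -10.0 <= template <= -8.0:
        warnings.append("primer-template ΔG outside -8.0 to -10.0")
    if hairpin_3prime < -2.0:
        warnings.append("3' end hairpin is too stable (ΔG below -2.0)")
    if hairpin_internal < -3.0:
        warnings.append("internal hairpin is too stable (ΔG below -3.0)")
    return warnings


if __name__ == "__main__":
    # Made-up values, only to show the call.
    print(check_delta_g(dimer=-4.5, template=-9.0,
                        hairpin_3prime=-1.0, hairpin_internal=-2.5))
    print(check_delta_g(dimer=-7.2, template=-9.0,
                        hairpin_3prime=-1.0, hairpin_internal=-2.5))
```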
Addressing a specific problem
• Sequence length in hand: 1.2 kb
• Requirement: the forward primer must lie before nucleotide 54
• Product length should be between 200 and 500 bp
Approach to the problem
• Initially select the set of primers using DNASTAR software
(within the region of your interest)
• Check whether the primers meet all the requirements
• If not, go to Gene Tool, walk forward and backward from
the desired primer, and select the set of primers
• The selected primers are then checked and authenticated
using OligoAnalyzer 1.0.2
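
The constraints of this problem can also be expressed as a brute-force search. The sketch below is not what DNASTAR or Gene Tool do internally; it is an illustration that assumes a fixed primer length of 20 nt, reads "before nucleotide 54" as "the forward primer lies entirely within the first 54 nucleotides", and uses the 4(G + C) + 2(A + T) estimate for Tm. The template is a synthetic repeat, not a real gene.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    seq = primer.upper()
    return 4 * (seq.count("G") + seq.count("C")) + 2 * (seq.count("A") + seq.count("T"))


def find_primer_pairs(template, fwd_region_end=54, primer_len=20,
                      min_product=200, max_product=500,
                      tm_range=(55, 65), max_tm_diff=5):
    """Enumerate (fwd_start, forward, reverse, product_len, fwd_tm, rev_tm)."""
    template = template.upper()
    pairs = []
    # Forward primer must fit entirely inside the first fwd_region_end bases.
    for fwd_start in range(0, fwd_region_end - primer_len + 1):
        fwd = template[fwd_start:fwd_start + primer_len]
        fwd_tm = wallace_tm(fwd)
        if not tm_range[0] <= fwd_tm <= tm_range[1]:
            continue
        for product in range(min_product, max_product + 1):
            rev_end = fwd_start + product
            if rev_end > len(template):
                break
            # Reverse primer is the reverse complement of the product's 3' end.
            rev = reverse_complement(template[rev_end - primer_len:rev_end])
            rev_tm = wallace_tm(rev)
            if (tm_range[0] <= rev_tm <= tm_range[1]
                    and abs(fwd_tm - rev_tm) <= max_tm_diff):
                pairs.append((fwd_start, fwd, rev, product, fwd_tm, rev_tm))
    return pairs


if __name__ == "__main__":
    synthetic_template = "ATGC" * 300  # 1.2 kb placeholder sequence
    pairs = find_primer_pairs(synthetic_template)
    print(len(pairs), "candidate pairs; first:", pairs[0])
```

Candidates found this way would still need the ΔG, hairpin and dimer checks described earlier.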
Approach to the problem contd… (PrimerSelect)

• Selection of primers within the first 54 nucleotides

Approach to the problem contd…

• As desired: the forward primer lies before nucleotide 54
• Against the requirement: huge difference in the annealing temperatures of the primers

Approach to the problem contd…
Gene Tool: facilitates walking up and down the sequence. By doing so, the Tm values of the two primers are
brought close together.

(Screenshot callout: any problem between the primers will be indicated here.)

Approach to the problem contd…

OligoAnalyzer main window

Approach to the problem contd…

ΔG between primers and template

Approach to the problem contd…
Δ G of primer dimers

Sequence Analysis:
BLAST - Multiple Sequence Alignment - Phylogenetic Tree

Tools to search databases

The dilemma: DNA or protein?
Search by similarity: using the nucleotide sequence, or using the amino acid sequence?

• Is the comparison of two nucleotide sequences accurate?
• By translating into amino acid sequence, are we losing information?
• The genetic code is degenerate (two or more codons can represent
the same amino acid).
• Very different DNA sequences may code for similar protein
sequences. We certainly do not want to miss those cases!
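
A small example makes the degeneracy point concrete. The sketch below builds the standard genetic code table and translates two short, made-up DNA sequences that share few bases but encode the same peptide. The sequences and function names are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA sequence in frame 1; trailing partial codons are ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    dna_1 = "TTAAGTCGT"  # hypothetical sequence: Leu-Ser-Arg
    dna_2 = "CTGTCCAGA"  # different codons, same amino acids
    print(translate(dna_1), translate(dna_2))
    print("DNA identity: %.0f%%" % percent_identity(dna_1, dna_2))
    print("Protein identity: %.0f%%"
          % percent_identity(translate(dna_1), translate(dna_2)))
```

A nucleotide search could easily miss the relationship between these two sequences, while a protein-level comparison sees them as identical.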

Reasons for translating
• Comparing DNA sequences gives more random matches.
(Figure: a good alignment with end-gaps compared with a very poor alignment)
• Conservation of protein in evolution (DNA similarity decays faster!)

Conclusion:
• It is almost always better to compare coding sequences in their
amino acid form, especially if they are very divergent.
• For very highly similar sequences, comparing nucleotides may give better results.
BLAST variants

• BLASTN: Compares a DNA query to a DNA database.
• BLASTP: Compares a protein query to a protein database.
• BLASTX: Compares the 6-frame translations of a DNA query to a
protein database.
• TBLASTN: Compares a protein query to the 6-frame translations of
a DNA database.
• TBLASTX: Compares the 6-frame translations of a DNA query to the
6-frame translations of a DNA database (each sequence is
comparable to BLASTP searches!)
• PSI-BLAST: Performs iterative database searches. The results from
each round are incorporated into a 'position-specific'
score matrix, which is used for further searching.
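
The choice between the first five variants depends only on the type of query, the type of database, and whether a nucleotide-versus-nucleotide search should be done at the protein level. A minimal sketch of that decision, with a hypothetical helper function and lowercase program names as used on the command line:

```python
def choose_blast_program(query: str, database: str, translated: bool = False) -> str:
    """Pick a BLAST variant from the query and database types.

    query and database are "nucleotide" or "protein". The translated flag only
    matters for nucleotide vs nucleotide: it selects tblastx over blastn.
    """
    if query == "protein" and database == "protein":
        return "blastp"
    if query == "nucleotide" and database == "protein":
        return "blastx"
    if query == "protein" and database == "nucleotide":
        return "tblastn"
    if query == "nucleotide" and database == "nucleotide":
        return "tblastx" if translated else "blastn"
    raise ValueError("query and database must be 'nucleotide' or 'protein'")


if __name__ == "__main__":
    print(choose_blast_program("nucleotide", "protein"))
    print(choose_blast_program("nucleotide", "nucleotide", translated=True))
```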
Database search methods: Sequence Alignment
• Two broad classes of sequence alignments exist:
• Global alignment (not sensitive): the full sequences are aligned end to end, e.g.
QKESGPSSSYC
VQQESGLVRTTC
• Local alignment (faster): only the best-matching region is aligned, e.g.
ESG
ESG

• The most widely used local similarity algorithms are:
• Basic Local Alignment Search Tool (BLAST)
(http://www.ncbi.nih.gov)
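
To show what a local alignment computes, here is a small Smith-Waterman implementation applied to the two example sequences. Note the assumptions: this is the exact dynamic-programming method, not the faster heuristic that BLAST uses, and the scoring (match +2, mismatch -2, gap -2) is a simple illustrative choice rather than a substitution matrix such as BLOSUM62.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -2, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores are never allowed to drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    best, top, bottom = smith_waterman("QKESGPSSSYC", "VQQESGLVRTTC")
    print(best)
    print(top)
    print(bottom)
```

With this scoring, the best local alignment of the two example sequences is the shared ESG region.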
A practical example of sequence alignment

http://www.ncbi.nlm.nih.gov
BLAST results

Detailed BLAST results

E < 0.05 is statistically significant, usually biologically interesting

E value: the expectation value, i.e. the number of hits with a similar or better score that
you would expect to find by chance in a search of this database. The lower the E value, the
more significant the score.
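
The relation between a hit's bit score and its E value can be sketched with the commonly quoted formula E = m × n × 2^(-S'), where S' is the bit score, m the query length and n the total database length. This is a simplified illustration: actual BLAST output uses effective (edge-corrected) lengths, so the reported numbers will differ somewhat, and the function names and example values below are assumptions.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def is_significant(e_value: float, threshold: float = 0.05) -> bool:
    """Apply the E < 0.05 rule of thumb."""
    return e_value < threshold


if __name__ == "__main__":
    # Made-up search: a 300-residue query against 100 million database residues.
    for bits in (20, 40, 60):
        e = expect_value(bits, 300, 100_000_000)
        print(bits, "bits -> E = %.3g" % e, "significant:", is_significant(e))
```

The example shows why the same alignment becomes less significant as the database grows: E scales linearly with the size of the search space.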
BIOINFORMATICS

From DNA to Cell Function

DNA sequence (split into genes)
  codes for
Amino acid sequence
  folds into
Protein, which has a 3D structure
  dictates
Protein function
  determines
Cell activity
