Kudipudi.Srinivas
Research Scholar, Dept of Computer Science, S.V.K.P & Dr. K.S Raju Arts & Science College, Penugonda-534320, India
Kudipudi_sri@yahoo.com
Introduction:
• Bioinformatics is the application of computational techniques to the management and analysis
of biological information.
• Bioinformatics describes using computational techniques to access, analyze, and interpret the
biological information in any of the available biological databases.
1. DATABASES:
1.1. Primary Databases
Sequences obtained by various sequencing approaches, such as
• EST: Expressed Sequence Tags
• GSS: Genome Survey Sequences
• STS: Sequence Tagged Sites and
• HTG: High Throughput Genomic Sequences
are deposited in different nucleic acid and protein databases, which can be accessed by
people all over the world through the World Wide Web. The major databases, called mother
databases, are the nucleic acid and protein sequence databases.
1.2. Secondary Databases
Derived databases, which are built from the sequence information available in the primary
databases, are called secondary databases. Fine examples of secondary databases are:
• CUTG: Codon Usage Database of Japan
• COGs: Clusters of Orthologous Groups of proteins, from NCBI
• PROSITE: motifs stored as regular expressions
• PRINTS: aligned motifs and
• BLOCKS: aligned motifs stored as blocks
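To illustrate what "regular expressions" means for PROSITE, the following is a minimal sketch that converts a PROSITE-style pattern into a Python regular expression and scans a protein sequence with it. The function names `prosite_to_regex` and `find_sites` and the test sequence are hypothetical, and only the common subset of the pattern syntax is handled (single residues, `x`, `[...]`, `{...}`, repeat counts, and the `<` and `>` anchors).

```python
import re

# One pattern element: optional '<', a residue / x / [set] / {excluded set},
# an optional repeat count such as (2) or (2,4), and an optional '>'.
ELEMENT = re.compile(r"(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":                      # x means any residue
            core = "."
        elif core.startswith("{"):           # {ABC} means any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        # [ABC] and single letters are already valid regular expressions.
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        pieces.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(pieces)


def find_sites(pattern, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence)]


# N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern;
# the sequence below is made up for the example.
print(prosite_to_regex("N-{P}-[ST]-{P}"))               # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}", "MNGTANPSQNKSA"))    # [1, 9]
```

The lookahead wrapper `(?=...)` is a design choice so that overlapping sites are all reported; a plain `finditer` would skip matches that start inside an earlier match.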
These databases provide an organized way to store the tremendous amount of sequence
information that accumulates from laboratories worldwide. Each database has its own specific
format. Three major database organizations around the world are responsible for maintaining most of
this data: GenBank at NCBI (United States), EMBL (Europe) and DDBJ (Japan). They largely ‘mirror’
one another.
2. The Central Dogma of Biology:
The central dogma of molecular biology states that the sequence of a strand of DNA
corresponds to the amino acid sequence of a protein. Information flows from DNA to mRNA
(transcription) and from mRNA to protein (translation).
2.1. Transcription
Transcription is the process by which messenger RNA (mRNA) molecules are synthesized
from DNA molecules. In eukaryotes, transcription takes place in the nucleus. During transcription
only one of the strands of DNA corresponding to a gene (the template strand) is copied into mRNA.
This mRNA molecule is complementary to the bases that compose the template strand. The mRNA
molecules have short lives. They travel out to the cytoplasm, where they direct the synthesis of a
protein, and then they are destroyed.
Transcription depends on complementary base pairing: A in the DNA template pairs with U in
the mRNA, T with A, C with G and G with C. Only one of the DNA strands is transcribed, and
therefore the resulting mRNA molecule is single stranded. The amount of transcription of any given
gene can be directly controlled by the cell. Once the mRNA molecules leave the nucleus and enter
the cytoplasm, they are loaded onto ribosomes. It is at the ribosomes that protein synthesis occurs,
by a process called translation. The ribosomes are composed of ribosomal RNA (rRNA) and ribosomal
proteins.
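The base-pairing rule used in transcription can be sketched in a few lines of Python. This is an illustrative toy, not a model of the cellular machinery: the function name `transcribe` and the example strand are hypothetical, and the template strand is assumed to be written in the 3'-to-5' direction so that the mRNA comes out 5'-to-3'.

```python
# Pairing rules: base on the DNA template strand -> base placed in the mRNA.
PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5):
    """Return the mRNA (5'->3') for a DNA template strand written 3'->5'."""
    try:
        return "".join(PAIRING[base] for base in template_3_to_5.upper())
    except KeyError as err:
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None


print(transcribe("TACGGATTC"))  # AUGCCUAAG
```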
2.2. Translation
The ribosome reads the mRNA three bases at a time; each triplet of bases is called a codon.
At each codon, a tRNA molecule with an anti-codon complementary to that codon attaches
to the mRNA. It brings with it the appropriate amino acid, which is then incorporated into the growing
polypeptide chain. Once the amino acid has been added, the tRNA molecule is released and the
ribosome moves on to read the next codon in the mRNA chain. This process continues until the
ribosome reads a stop codon. At this point the ribosome releases the mRNA molecule and the
completed protein. The tRNA molecule functions as an interpreter, reading codons in the mRNA
molecule and translating them into amino acids. In this way, the sequence of bases in a given
gene determines the amino acid sequence of the protein.
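The codon-by-codon reading described above can be sketched as a table lookup. The example below builds the standard genetic code from its conventional compact form (bases in the order U, C, A, G) and translates from the first AUG to the first stop codon. The function name `translate` and the input strings are hypothetical, and starting at the first AUG is a simplifying assumption.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string from the first AUG up to the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # stop codon: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGCCUAAGUAA"))  # MPK
```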
3. Alignment:
An alignment is a representation of two or more protein or nucleotide sequences in which
homologous amino acids or nucleotides are placed in the same columns, while missing amino acids or
nucleotides are replaced with gaps.
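As a small illustration of how gaps enter an alignment, here is a sketch of pairwise global alignment by dynamic programming (the Needleman-Wunsch method). It is not the algorithm used by any of the multiple alignment programs named in this section, and the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair_score = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1      # gap in b
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1      # gap in a
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


aligned_a, aligned_b, total = needleman_wunsch("ACGT", "AGT")
print(aligned_a)  # ACGT
print(aligned_b)  # A-GT
print(total)      # 2
```

The missing C in the second sequence is represented by a gap, so the three shared bases end up in the same columns.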
ClustalW: It is a general-purpose multiple sequence alignment program for DNA or protein
sequences. It produces biologically meaningful multiple sequence alignments of divergent sequences,
calculates the best match for the selected sequences, and lines them up so that the identities,
similarities and differences can be seen. The cladograms or phylograms obtained are used to see the
evolutionary relationships between species. ClustalW can be either downloaded or used online at
http://www.ebi.ac.uk/clustalW/. ClustalX is the X-window based, user-friendly version of ClustalW,
which can be downloaded and used locally. T-Coffee is more accurate than ClustalW
for sequences with less than 30% identity, but it is slower. It is available at
http://www.ch.embnet.org/software/TCoffee.html
FASTA:
FASTA was the first widely used program for database similarity searching. For nucleotide
searches, FASTA may be more sensitive than BLAST. FASTA can be very specific when identifying
long regions of low similarity, especially for highly diverged sequences. The FASTA submission form
is available at http://www.ebi.ac.uk/fasta33/
4. Phylogenetic Analysis:
Phylogenetic methods are used to reconstruct the relationships between macromolecular
sequences, and so to find the genetic connections and relationships between species. The results of
phylogenetic analysis may be depicted as a hierarchical branching diagram, a ‘cladogram’ or
‘phylogenetic tree’. Programs for phylogenetic analysis (PHYLIP) are available at
http://evolution.genetics.washington.edu/phylip.html. This software can be downloaded free of cost
and used locally, or it can be used online at http://bioportal.bic.nus.edu.sg/phylip/. TreeView and
PhyloDraw are the major user-friendly programs for displaying the hierarchical clustering in different
formats for publication and easy analysis. Besides PHYLIP, other popular programs for
phylogenetic analysis include PAUP, MEGA, TreeconW and Winboot.
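To show how a hierarchical branching diagram can be derived from sequences, the following is a minimal sketch of one simple distance-based method, UPGMA clustering on p-distances (the fraction of differing positions). It is not presented as the method of any package named above. The taxon names A to D and their sequences are invented for the example, the sequences are assumed to be already aligned to equal length, and the output is a Newick-style string giving the tree topology only.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)


def upgma(sequences):
    """Cluster aligned sequences by UPGMA; return the topology in Newick style."""
    sizes = {name: 1 for name in sequences}          # cluster label -> member count
    dist = {frozenset((a, b)): p_distance(sequences[a], sequences[b])
            for a, b in combinations(sequences, 2)}
    while len(sizes) > 1:
        a, b = sorted(min(dist, key=dist.get))       # closest pair of clusters
        merged = f"({a},{b})"
        new_size = sizes[a] + sizes[b]
        for other in sizes:
            if other in (a, b):
                continue
            # Size-weighted average of the distances from the two merged clusters.
            dist[frozenset((merged, other))] = (
                dist[frozenset((a, other))] * sizes[a]
                + dist[frozenset((b, other))] * sizes[b]
            ) / new_size
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes)) + ";"


toy_alignment = {            # hypothetical aligned sequences
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "TCGAACTA",
    "D": "TCGAAGTT",
}
print(upgma(toy_alignment))  # ((A,B),(C,D));
```

The resulting string can be read as a tree in which A and B form one group and C and D another, which is the kind of hierarchical clustering that tree-drawing programs display.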
5. Applications of Bioinformatics
5.1. Food Industry:
Functional genomics plays a major role in the food biotechnology industry. The complete
genome sequence information available in different databases can be used for finding metabolic
pathways and various digestive enzymes, improving cell factories, and developing novel
preservation methods. Information about the various microbes that assist in food digestion,
such as E. coli, also plays a vital role in the major achievements of the food
industry using bioinformatics.
5.2. Agriculture:
Crops are improved by producing plants that carry genes for resistance to pathogens
such as fungi and bacteria. Homology searches, the finding of conserved motifs, and molecular
modeling are useful in identifying disease resistance genes. Pesticides and insecticides that can
efficiently kill the pathogens and pests are designed by molecular modeling.
6. Bioinformatics in India
In India, various research and development units, centers and sub-centers, and
pharmaceutical industries are doing research on aspects of bioinformatics such as proteomics,
genomics, development of sequence analysis tools, molecular modeling and drug design. The Department
of Biotechnology (DBT), New Delhi, has emphasized starting bioinformatics centers with the help
of BTISnet (Biotechnology Information System) for the proper application of bioinformatics in various
sectors of science and technology for the benefit of researchers. DBT has sponsored various
Bioinformatics Distributed Information Centers (DICs) and Distributed Information Sub-Centers
(Sub-DICs) all over India.
The lists of the DICs and the Sub-DICs can be seen at the following websites.
http://dbtindia.nic.in/btis/dic.html
http://dbtindia.nic.in/bits/subdic.html