A TEXTBOOK OF BIOTECHNOLOGY

Chapter 5
Bioinformatics (in silico Biology)

In the 21st century, biology is being transformed from a purely laboratory-based science into an information science as well. The information refers to comprehensive views of DNA sequences, RNA expression and protein interactions. Owing to the explosion of sequence and structure information available to researchers, people have become optimistic about finding answers to fundamental biomedical problems. The key to biotechnology discoveries is locked in the genomes of organisms, and bioinformatics holds the key to unlocking these data for the next generation of innovations. Translating the billions of characters in the DNA sequences that make up a genome into biologically meaningful information has given birth to a new field of science called bioinformatics.

The term 'bioinformatics' has been derived by combining biology and informatics.

A. WHAT IS BIOINFORMATICS?

A more precise definition of bioinformatics is the application of the information sciences (mathematics, statistics and computer science) to increase our understanding of biology. Thus bioinformatics is a multidisciplinary science that aims to use the benefits of computer technologies in understanding the biology of life. As a subject, bioinformatics consists of three core areas: (i) molecular biology databases, (ii) sequence comparison and sequence analysis, and (iii) the emerging technology of microarrays. In brief, bioinformatics is the management and analysis of biological information stored in databases.

In 1987, when the Human Genome Project was conceived, the field of bioinformatics was in its infancy. Today, bioinformatics has become a recognised discipline in its own right, born out of the necessity to bring together the information sciences and the biological sciences in understanding the wealth of data that has been created through various projects around the world.

1. What is a Database?

A database is a repository of sequences (DNA or amino acids) that provides a centralised and homogeneous view of its contents. The repository is created and modified through a database management system (DBMS). Every data item in the database is structured according to a schema, defined as a set of prespecified rules through the data definition language. The contents of a database can be accessed through a graphical user interface (GUI) that allows browsing through the repository, much as one browses through the books in a library. Most databases also allow their contents to be queried through a specialised query language. Together, the data definition language and the query language form the data model.

B. HISTORICAL BACKGROUND

Historically, protein databases were prepared first, followed by nucleotide databases. In 1959, V.M. Ingram first compared sickle-cell haemoglobin with normal haemoglobin and demonstrated their homology. In due course, other proteins with similar biological functions were also compared. This led to more protein sequencing and the accumulation of vast amounts of information, and it was realised that databases were needed so that proteins could be compared quickly using computational software.

In 1962, using sequence variability, Zuckerkandl and Pauling proposed a new strategy, called 'molecular evolution', to study evolutionary relationships between organisms. This theory was based on the fact that similarity exists among functionally related (homologous) protein sequences.

Margaret O. Dayhoff found that during evolution, protein sequences change according to certain patterns, such as: (i) preferential alteration (replacement) of amino acids by amino acids with similar physico-chemical characteristics, rather than random replacement, and (ii) no replacement of some amino acids (e.g. tryptophan) by any other amino acid. On the basis of several homologous sequences, she developed the point accepted mutation (PAM) matrices. Further work on sequence comparison using quantitative strategies followed.

In 1965, Dayhoff and co-workers collected all the protein sequences known at that time and catalogued them as the Atlas of Protein Sequence and Structure, first published by the National Biomedical Research Foundation (Silver Spring, MD). Later collections of such macromolecular sequences were published under the same title from 1965 to 1978. This printed atlas laid the foundation for the resources on which the entire biotechnology community now depends for day-to-day work in computational biology. The computer methods pioneered by Dayhoff and her research group are applicable to (i) comparing protein sequences, (ii) detecting distantly related sequences and duplications within sequences, and (iii) deducing evolutionary histories from alignments of protein sequences.

In 1980, the advent of DNA sequence databases led to the next phase in sequence information, with the establishment of a data library by the European Molecular Biology Laboratory (EMBL). The purpose of the data library was to collect, organise and distribute data on nucleotide sequences and related information. Its successor, the European Bioinformatics Institute (EBI), is situated at Hinxton, Cambridge, United Kingdom.

In 1984, the National Biomedical Research Foundation (NBRF) established the Protein Information Resource (PIR). The NBRF helps scientists identify and interpret protein sequence information.

In 1988, the National Institutes of Health (NIH), U.S.A., set up the National Center for Biotechnology Information (NCBI) as a division of the National Library of Medicine (NLM) to develop information systems in molecular biology. The DNA Data Bank of Japan (DDBJ) at Mishima joined the data-collecting collaboration a few years later. The NCBI built GenBank, the NIH genetic sequence database. GenBank is an annotated collection of all publicly available nucleotide and protein sequences. Each record within GenBank represents a single contiguous stretch of DNA or RNA with annotations.

In 1988, the three partners of the International Nucleotide Sequence Database Collaboration (DDBJ, EMBL and GenBank) met and agreed to use a common format. Each of the three centres provides a separate point for data submission, yet they exchange this information daily, so the same database is available to all. Because every centre collects direct submissions and shares them, each holds copies of all the sequences and can therefore act as a primary distribution centre for them. Sequence data are now accumulating day by day.
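The database concepts described above (a schema fixed by a data definition language, contents retrieved through a query language) and the kind of comparison Ingram made can be sketched together in a few lines of Python. This is a minimal illustration, not how GenBank, PIR or the EBI actually store data: it assumes Python's built-in sqlite3 module as a stand-in DBMS, and the table layout, column names and DEMO_ accession labels are hypothetical. The two sequences are the first 18 residues of the mature human haemoglobin β chain in its normal (HbA) and sickle-cell (HbS) forms, which differ by a single glutamic acid to valine substitution at position 6.

```python
import sqlite3

# Data definition language (DDL): the schema every record must follow.
# Table and column names are hypothetical, chosen only for illustration.
SCHEMA = """
CREATE TABLE sequence (
    accession   TEXT PRIMARY KEY,
    description TEXT NOT NULL,
    molecule    TEXT NOT NULL CHECK (molecule IN ('DNA', 'RNA', 'protein')),
    residues    TEXT NOT NULL
)
"""

# First 18 residues of the mature human beta-globin chain: normal (HbA)
# and sickle-cell (HbS), differing by Glu -> Val at position 6.
RECORDS = [
    ("DEMO_HBA", "haemoglobin A beta chain, residues 1-18", "protein", "VHLTPEEKSAVTALWGKV"),
    ("DEMO_HBS", "haemoglobin S beta chain, residues 1-18", "protein", "VHLTPVEKSAVTALWGKV"),
]


def build_database() -> sqlite3.Connection:
    """Create an in-memory repository and load the demo records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO sequence VALUES (?, ?, ?, ?)", RECORDS)
    return conn


def fetch_residues(conn: sqlite3.Connection, accession: str) -> str:
    """Query language (SQL): retrieve one sequence by its accession."""
    row = conn.execute(
        "SELECT residues FROM sequence WHERE accession = ?", (accession,)
    ).fetchone()
    if row is None:
        raise KeyError(accession)
    return row[0]


def compare(a: str, b: str) -> tuple[float, list[tuple[int, str, str]]]:
    """Ungapped comparison of two already-aligned, equal-length sequences.

    Returns percent identity and a list of (1-based position, residue in a,
    residue in b) for every mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (gaps are not handled)")
    diffs = [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    identity = 100.0 * (len(a) - len(diffs)) / len(a)
    return identity, diffs


if __name__ == "__main__":
    db = build_database()
    identity, diffs = compare(fetch_residues(db, "DEMO_HBA"), fetch_residues(db, "DEMO_HBS"))
    print(f"identity: {identity:.1f}%")  # identity: 94.4%
    print(f"differences: {diffs}")       # differences: [(6, 'E', 'V')]
```

The comparison is deliberately ungapped and assumes the two sequences are already aligned; practical database searches instead align sequences with gaps and score substitutions with matrices such as Dayhoff's PAM matrices, so that conservative replacements score higher than random ones.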
The rapid growth of sequence data creates a need for powerful software with which the sequences can be analysed. Developing algorithms [any sequence of actions (e.g. computational steps) that performs a particular task] requires a firm basis in mathematics. Mathematicians, biologists and computer scientists are now taking a keen interest in bioinformatics. Moreover, biologists are eager to access this reservoir of information because the databases are widely interconnected through networks. Thus bioinformatics aims at (i) the development of powerful software for data analysis, and (ii) benefiting researchers by disseminating scientifically investigated knowledge.

The nucleotide and amino acid monomers are represented by limited alphabets. Biopolymers, i.e. macromolecules such as DNA, RNA and proteins, have properties that allow them to be transformed into sequences of digital symbols. This digital character distinguishes genetic data from other biological data, and it has driven the progress of bioinformatics.

C. SEQUENCES AND NOMENCLATURE

As mentioned earlier, sequences of digital symbols are transformed representations of biopolymers. Indirectly, sequence data describe the structure of a biopolymer, and structure determines function; this is a reductionist approach. Therefore, the sequence data can be used as context …

1. The IUPAC Symbols

The International Union of Pure and Applied Chemistry (IUPAC) has made certain recommendations on which the nomenclature system in bioinformatics is based.

• Laboratories around the world follow the IUPAC nomenclature system so that their data sets can be compared uniformly and easily.
• For rapid reproducibility and uniformity, database institutions and editors (who publish journals and research findings) also follow the IUPAC recommendations.

For routine work, the basic IUPAC nomenclature for nucleic acids and proteins is discussed in this section. For details, consult the IUPAC web site. The language used in bioinformatics is given in Box 5.1.

Box 5.1. Language used in bioinformatics

The following language is used in bioinformatics:
Alphabets = Nucleotides
Words = Genes (prokaryotes), Introns (eukaryotes)
Sentences = Operons (prokaryotes), Genes (eukaryotes)
Punctuation = Regulatory regions
Chapters = Chromosomes

2. Nomenclature of DNA Sequences

Nucleotides are the building blocks of DNA, and each nucleotide contains one of four bases (A, G, T and C). Table 5.1 shows the symbols for these bases, their meanings and the basis of their nomenclature in nucleic acid sequences.
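Because nucleotides form a limited alphabet, a sequence can be checked symbol by symbol. Beyond the four base symbols, the IUPAC recommendations also define single-letter codes for ambiguous positions, such as R for a purine (A or G) and N for any base. The sketch below is a minimal illustration assuming DNA only (U is omitted); the names IUPAC_DNA, validate, matches_at and find_sites are hypothetical helpers, not part of any library.

```python
# IUPAC single-letter nucleotide codes (DNA only; U omitted), each mapped to
# the set of concrete bases it can stand for.
IUPAC_DNA: dict[str, frozenset[str]] = {
    "A": frozenset("A"),
    "C": frozenset("C"),
    "G": frozenset("G"),
    "T": frozenset("T"),
    "R": frozenset("AG"),    # purine
    "Y": frozenset("CT"),    # pyrimidine
    "S": frozenset("CG"),    # strong (three hydrogen bonds)
    "W": frozenset("AT"),    # weak (two hydrogen bonds)
    "K": frozenset("GT"),    # keto
    "M": frozenset("AC"),    # amino
    "B": frozenset("CGT"),   # not A
    "D": frozenset("AGT"),   # not C
    "H": frozenset("ACT"),   # not G
    "V": frozenset("ACG"),   # not T
    "N": frozenset("ACGT"),  # any base
}


def validate(seq: str) -> str:
    """Upper-case a sequence and reject any symbol outside the IUPAC DNA alphabet."""
    seq = seq.upper()
    bad = sorted(set(seq).difference(IUPAC_DNA))
    if bad:
        raise ValueError(f"non-IUPAC symbols: {bad}")
    return seq


def matches_at(pattern: str, seq: str, start: int) -> bool:
    """True if every base in the window starting at `start` is allowed by the
    pattern symbol at the same position."""
    window = seq[start:start + len(pattern)]
    return len(window) == len(pattern) and all(
        base in IUPAC_DNA[symbol] for symbol, base in zip(pattern, window)
    )


def find_sites(pattern: str, seq: str) -> list[int]:
    """Return the 0-based start positions where the degenerate pattern matches seq."""
    pattern, seq = validate(pattern), validate(seq)
    return [i for i in range(len(seq) - len(pattern) + 1) if matches_at(pattern, seq, i)]


if __name__ == "__main__":
    print(find_sites("GANTC", "TTGAATCGGACTCA"))  # [2, 8]
```

Only the pattern is treated as degenerate here; an ambiguity code appearing in the sequence itself never matches, which keeps the search conservative.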
