Content

What is bioinformatics
What is computational biology
Data mining
Application of data mining
Net accessible resources
Sequence analysis
What can be done with sequence analysis
Identification of protein primary sequence from DNA sequence
Tips for searching databases
The process of evolution
Principles and their importance
Conclusion

What is bioinformatics?
Bioinformatics describes any use of computers to handle biological information. In practice, the definition used by most people is narrower: to them, bioinformatics is a synonym for computational molecular biology, the use of computers to characterize the molecular components of living things.

What is data mining?

Data mining is the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better-characterized organisms.

Application of data mining

Applications include fraud detection, credit card scoring, and personal profile marketing. Skillful interpretation of data can enhance customer relations, direct marketing, trend analysis, financial market forecasting, and international criminal investigations.

Net accessible resources

Two main World Wide Web sites provide information on data mining.

The Data Mine: This includes pointers to FTP-able papers and two large data mining bibliographies. It attempts to provide links to as much of the available data mining information on the net as possible. Run by Pryke at the University of Birmingham.

Knowledge Discovery Mine: This has the KDD FAQ, a comprehensive catalog of tools for discovery in data, and back issues of the KDD-Nugget mailing list. Run by leading KDD researcher Gregory Piatetsky-Shapiro.

What is sequence analysis?

Sequence analysis is the process of trying to find out something about a nucleotide or amino acid sequence, employing in silico biology techniques. You may have sequenced a gene yourself and wish to learn what the long string of letters representing bases actually codes for. You may want to confirm that you indeed cloned a gene successfully, or you might want to learn about a sequence of DNA that you know absolutely nothing about. You may want to know whether a worm has a protein similar to a human one.

What can be done now with sequence analysis

Given the pessimistic view of sequence analysis presented in the previous section, why do we even bother with it? In the first place, the attempt to find methods for successful sequence analysis is a research goal in its own right, one whose potential rewards are so vast as to make it of the first importance. In the second place, although there are many things that sequence analysis cannot yet do, there are many very worthwhile things that can currently be done with it, and these are summarized in this section.

Identification of protein sequence from DNA sequence


The computer programs used to infer protein sequence from DNA sequence provide information that can help you approach a solution. For example, if you are trying to find out where in a DNA sequence a protein is encoded, it is very useful to know what peptides would be encoded by all six reading frames. A stretch containing many stop codons is a poor candidate for encoding a protein. This will not tell you with certainty where the protein sequence starts and stops, but it will help you guess where that might occur. Programs exist for doing this. In fact, there are many factors you can use to guess where in a DNA sequence a protein sequence might reside: the expected codon bias, the presence of characteristic sequences representing regulatory signals in the DNA, and so forth. One family of programs integrates a variety of these approaches and, using either explicit algorithms or trained neural nets, makes a prediction.
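The six-frame idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard genetic code, marks stop codons with `*`, and the function names (`six_frame_translation`, `stop_counts`) are hypothetical helpers, not taken from any particular program.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands at offsets 0, 1, 2 (frames +1..+3 and -1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def stop_counts(dna):
    """Count stop codons per frame; many stops suggest a poor coding candidate."""
    return {name: pep.count("*") for name, pep in six_frame_translation(dna).items()}


if __name__ == "__main__":
    for frame, peptide in six_frame_translation("ATGGCCTAA").items():
        print(frame, peptide)
```

Counting stops per frame is the crudest of the signals mentioned above; codon bias and regulatory signals would need additional data that this sketch does not model.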

Tips for searching databases

Use the latest database version.
Use BLAST first, then a finer tool (FASTA, search, blitz, sweep, block, et al.).
Search both strands when using FASTA. This is done automatically in the GCG program.
Translate the sequence where relevant.
Search a 6-frame translation of the DNA database.
E() < 0.05 is statistically significant and usually biologically interesting.
Check also 0.05 < E() < 10, as you might find interesting matches.
Pay attention to abnormal composition, as it causes biased scoring.
Split large queries (if > 1000 for DNA, > 200 for protein).
If the query has repeated segments, delete them and repeat the search.
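Two of these tips are mechanical enough to sketch in Python. The thresholds (0.05, 10, 1000, 200) come from the list above; everything else is an assumption made for illustration, including the helper names, the category labels, the reading of the length limits as residue counts, and the choice to split without overlapping the pieces.

```python
# Length limits from the tips above, assumed here to be residue counts.
MAX_QUERY_LENGTH = {"dna": 1000, "protein": 200}


def split_query(sequence, max_len):
    """Cut a query into consecutive pieces of at most max_len residues."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [sequence[i:i + max_len] for i in range(0, len(sequence), max_len)]


def classify_e_value(e_value):
    """Bucket an expectation value using the thresholds quoted in the tips."""
    if e_value < 0.05:
        return "significant"      # statistically significant
    if e_value < 10:
        return "worth checking"   # may still be biologically interesting
    return "likely noise"


if __name__ == "__main__":
    pieces = split_query("ACGT" * 600, MAX_QUERY_LENGTH["dna"])
    print([len(p) for p in pieces])
    for e in (1e-20, 0.5, 42.0):
        print(e, classify_e_value(e))
```

A match that straddles the boundary between two pieces could be missed with this simple split, so in practice one might prefer overlapping pieces.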

The process of evolution.


Homologous proteins arise from mutations in a common ancestral coding gene. Through the process of gene divergence, some mutations have been accepted by natural selection because they preserved the folding and function of the encoded protein. This can be represented by a schematic tree in which several genes descend from a common ancestor gene.

Principles and their importance

Sensitivity versus specificity

There are different ways to estimate similarity between two sequences, allowing us to tune the sensitivity and specificity of the results when performing a sequence database search with a query sequence. If the sensitivity is high, more distantly related sequences, such as the S. griseus protease, will be retrieved.

However, unrelated sequences, such as the endochitinase, will also be returned. On the other hand, if the specificity is high, only closely related sequences will be returned, and distantly related ones will be missed. Thus, a researcher has to know how to manage this trade-off. This is one more reason why biologists should not treat software as a black box.
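The trade-off can be made concrete with the usual statistical definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below assumes a search that returns every database sequence scoring at or above a cutoff; the scores and the related/unrelated labels are invented purely for illustration.

```python
def sensitivity_specificity(hits, threshold):
    """hits: list of (score, is_related); a sequence is returned if score >= threshold."""
    tp = sum(1 for score, related in hits if score >= threshold and related)
    fn = sum(1 for score, related in hits if score < threshold and related)
    tn = sum(1 for score, related in hits if score < threshold and not related)
    fp = sum(1 for score, related in hits if score >= threshold and not related)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical database: three related sequences (one distant) and three unrelated.
HITS = [(95, True), (80, True), (42, True), (40, False), (20, False), (15, False)]

if __name__ == "__main__":
    for cutoff in (30, 50):
        sens, spec = sensitivity_specificity(HITS, cutoff)
        print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With the low cutoff the distant relative is found but an unrelated sequence comes back too; with the high cutoff the unrelated sequence is excluded and the distant relative is lost.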

Window approaches
When comparing two sequences, a dot matrix can be used in which one sequence is written out horizontally and the other vertically. A dot is placed at the intersection of a row and a column for each matched pair of letters. If the frequency of matched letters between two sequences is high, as is particularly the case for DNA sequences, which are composed of only four building blocks, the background noise is high. To reduce the noise, one can place a dot only when several adjacent letters match. The number of adjacent letters evaluated together is called the window size.
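A minimal Python sketch of such a dot matrix follows. It assumes the simplest rule, a dot only when every letter in the window matches exactly; real programs often allow a mismatch threshold within the window, which is not modeled here. The function names are illustrative.

```python
def dot_matrix(horizontal, vertical, window=1):
    """Return rows of booleans; cell [i][j] is True when the windows match.

    Rows follow the vertical sequence, columns the horizontal one.
    """
    rows = len(vertical) - window + 1
    cols = len(horizontal) - window + 1
    return [
        [vertical[i:i + window] == horizontal[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix):
    """Draw the matrix as text, one '.' per matching window."""
    return "\n".join("".join("." if cell else " " for cell in row) for row in matrix)


if __name__ == "__main__":
    seq = "GATTACA"
    print(render(dot_matrix(seq, seq, window=1)))  # diagonal plus background dots
    print("-" * len(seq))
    print(render(dot_matrix(seq, seq, window=3)))  # only the diagonal remains
```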

Efficient use of programs

When performing a database search, a researcher should know that the results can be improved. A researcher who knows the principles, such as the use of windows, can increase the sensitivity by decreasing the window size parameter. This improves the ability of the program to recognize distantly related sequences. Alternatively, the researcher can increase the specificity by increasing the window size parameter.
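The effect of the window size parameter can be observed by counting how many matches survive as the window grows. The sketch below uses exact window matches on two made-up sequences that differ at a single position; both the sequences and the `count_matches` helper are assumptions for illustration and do not reproduce how any specific search program scores windows.

```python
def count_matches(seq_a, seq_b, window):
    """Count position pairs whose windows of the given size are identical."""
    return sum(
        seq_a[j:j + window] == seq_b[i:i + window]
        for i in range(len(seq_b) - window + 1)
        for j in range(len(seq_a) - window + 1)
    )


if __name__ == "__main__":
    query = "GATTACAGATTACA"     # hypothetical query
    subject = "GATTACAGCTTACA"   # same sequence with one substitution
    for window in range(1, 7):
        print(window, count_matches(query, subject, window))
```

Small windows keep many matches, including chance ones (more sensitive, less specific); large windows keep only long exact stretches, so a single substitution can already break a true match (more specific, less sensitive).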

Conclusion
It is important for a researcher who wants to use the programs available for sequence analysis to acquire a reliable knowledge of biocomputing. Knowing the capabilities and the drawbacks of the programs will help us use them in a more accurate and efficient way.
