What is bioinformatics? What is computational biology? Data mining. Applications of data mining. Net-accessible resources. Sequence analysis. What can be done with sequence analysis? Identification of protein primary sequence from DNA sequence. Tips for searching databases. The process of evolution. Principles and their importance. Conclusion.
What is Bioinformatics?
Bioinformatics describes any use of computers to handle biological information. In practice, the definition used by most people is narrower: to them, bioinformatics is a synonym for computational molecular biology, the use of computers to characterize the molecular components of living things.
Data mining is the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better-characterized organisms.
Applications of data mining: These include fraud detection, credit card scoring, and personal profile marketing. Skillful interpretation of data can enhance customer relations, direct marketing, trend analysis, financial market forecasting, and international criminal investigations.
Net-accessible resources: Two main World Wide Web sites provide information on data mining.
The Data Mine: This includes pointers to FTP-able papers and two large data mining bibliographies. It attempts to provide links to as much of the available data mining information on the net as possible. It is run by Pryke at the University of Birmingham.
Knowledge Discovery Mine: This has the KDD FAQ, a comprehensive catalog of tools for discovery in data, and back issues of the KDD Nuggets mailing list. It is run by leading KDD researcher Gregory Piatetsky-Shapiro.
However, unrelated sequences such as the endochitinase will also be returned. On the other hand, if the specificity is high, only closely related sequences will be returned and distantly related ones will be missed. A researcher therefore has to know how to manage this trade-off between sensitivity and specificity. This is one more reason why biologists should not treat software as a black box.
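The trade-off can be illustrated with a minimal sketch. It is not how any particular search program works: the scoring function is a plain ungapped percent identity, and the sequences, entry names, and cutoffs are hypothetical toy values chosen only to show the effect of a permissive versus a strict cutoff.

```python
def percent_identity(a, b):
    """Fraction of positions carrying the same letter (toy, ungapped, equal length)."""
    if len(a) != len(b):
        raise ValueError("this toy comparison expects equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def search(query, database, cutoff):
    """Return the names of entries whose identity to the query is at least `cutoff`."""
    return [name for name, seq in database.items()
            if percent_identity(query, seq) >= cutoff]


# Hypothetical toy sequences, not real biological data.
query = "ACGTACGTAC"
database = {
    "close_relative": "ACGTACGTAT",
    "distant_relative": "ACGAACCTTC",
    "unrelated": "TAGCTCGGAA",
}

# Permissive cutoff: nothing related is missed, but the unrelated entry comes back too.
permissive_hits = search(query, database, cutoff=0.3)

# Strict cutoff: the unrelated entry is excluded, but so is the distant relative.
strict_hits = search(query, database, cutoff=0.8)

print("permissive:", permissive_hits)
print("strict:", strict_hits)
```

Moving the cutoff between these two extremes is the kind of decision the researcher has to make knowingly rather than leave to a program's defaults.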
Window approaches
In particular, when comparing two sequences, a dot matrix can be used in which one sequence is written out horizontally and the other vertically. A dot is placed at the intersection of a row and a column for each matched pair of letters. If the frequency of matched letters between the two sequences is high, as it is in DNA sequences, which are composed of only four building blocks, the background noise is high. To reduce the noise, one can place a dot only when several consecutive letters match. The number of consecutive letters evaluated together is called the window size.
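The window idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of any specific dot-plot program: it requires every letter in the window to match exactly, and the function names and example sequences are hypothetical.

```python
def dot_matrix(seq_h, seq_v, window=1):
    """Build a dot matrix as a list of rows of booleans.

    Rows follow seq_v (vertical), columns follow seq_h (horizontal).
    A dot at (i, j) means the `window` letters starting at seq_v[i]
    and seq_h[j] are identical.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    rows = len(seq_v) - window + 1
    cols = len(seq_h) - window + 1
    return [[seq_v[i:i + window] == seq_h[j:j + window] for j in range(cols)]
            for i in range(rows)]


def render(grid):
    """Draw the matrix as text: '*' for a dot, '.' for no dot."""
    return "\n".join("".join("*" if dot else "." for dot in row) for row in grid)


def count_dots(grid):
    """Total number of dots in the matrix."""
    return sum(sum(row) for row in grid)


# Hypothetical example sequences.
horizontal = "GATTACAGATTACA"
vertical = "GATTACA"

for size in (1, 3):
    grid = dot_matrix(horizontal, vertical, window=size)
    print(f"window size {size}: {count_dots(grid)} dots")
    print(render(grid))
    print()
```

With a window size of 1, every chance match between single letters produces a dot. With a larger window, only runs of consecutive matches survive, so the diagonals that mark real similarity stand out from the background.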
Conclusion
It is important for a researcher who wants to use the programs available for sequence analysis to acquire a reliable knowledge of biocomputing. Knowing the capabilities and the drawbacks of these programs helps us use them more accurately and efficiently.