
BLAST vs FASTA

BLAST and FASTA are two programs used to compare biological sequences (nucleotide sequences such as DNA, and amino-acid sequences such as proteins) from different species and to look for similarities. Both were written with speed in mind: as the sequence databases swelled from the mid-1980s onward, researchers needed a fast way to compare sequences and find identical or similar genes for further research. BLAST is an acronym for Basic Local Alignment Search Tool and takes a local approach to comparing two sequences. FASTA stands for "FAST-All" because it works with any alphabet; it extends the earlier FASTP (protein) and FASTN (nucleotide) programs. Both BLAST and FASTA search a sequence database quickly, which makes them economical in both money and time.

In brief: BLAST vs FASTA

BLAST is generally faster than FASTA.
BLAST is very accurate for closely matched sequences; for more dissimilar sequences FASTA is the better choice.
BLAST can be modified according to need, whereas FASTA cannot.
BLAST takes its input in FASTA format.
BLAST is more versatile and more widely used than FASTA.

5.2 BLAST

BLAST, the Basic Local Alignment Search Tool (Altschul et al., 1990), is perhaps the most widely used bioinformatics tool ever written. It is an alignment heuristic that determines local alignments between a query and a database, approximating the Smith-Waterman algorithm. BLAST consists of two components: a search algorithm and the computation of the statistical significance of solutions.

BLAST starts by locating substrings (so-called segment pairs or hits) in the two sequences that have a certain similarity score. The hits are the starting points for deriving HSPs (high-scoring segment pairs), locally optimal pairs that contain the hit: extending an HSP to the left or right would lower its score.

5.2.1 BLAST terminology

Definition 5.1. Let q be the query and d the database. A segment is simply a substring s of q or d.
A segment pair (s, t) (or hit) consists of two segments, one in q and one in d, of the same length. Example:

VALLAR
PAMMAR

We think of s and t as being aligned without gaps and score this alignment using a substitution score matrix, e.g. BLOSUM or PAM in the case of protein sequences. The alignment score for (s, t) is denoted by σ(s, t).
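
To make the scoring of a segment pair concrete, here is a minimal Python sketch that scores the VALLAR / PAMMAR example as a gapless alignment. The function name `segment_pair_score` and the toy scores (+2 for a match, -1 for a mismatch) are assumptions made for illustration; a real search would look each residue pair up in a BLOSUM or PAM matrix instead.

```python
def segment_pair_score(s, t, match=2, mismatch=-1):
    """Score two equal-length segments aligned without gaps.

    The match/mismatch values are a toy stand-in for a substitution
    matrix such as BLOSUM or PAM (assumption for illustration only).
    """
    if len(s) != len(t):
        raise ValueError("segments in a segment pair must have the same length")
    # Sum the per-column scores of the gapless alignment.
    return sum(match if a == b else mismatch for a, b in zip(s, t))


# Three matching columns (A, A, R) and three mismatching ones.
print(segment_pair_score("VALLAR", "PAMMAR"))  # prints 3
```

Swapping in a real substitution matrix only changes the per-column lookup; the column-by-column sum stays the same.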

Definition 5.2. A locally maximal segment pair (LMSP) is any segment pair (s, t) whose score cannot be improved by shortening or extending the segment pair. A maximum segment pair (MSP) is any segment pair (s, t) of maximal alignment score σ(s, t). Given a cutoff score S, a segment pair (s, t) is called a high-scoring segment pair (HSP) if it is locally maximal and σ(s, t) ≥ S. Finally, a word is simply a short substring of fixed length w.

Given S, the goal of BLAST is to compute all HSPs.

5.2.3 The BLAST family

There are a number of different variants of the BLAST program. Below, q is the query and s the database sequence; t1, t2, t3 denote translation in the three forward reading frames, and tc1, tc2, tc3 translation in the three reading frames of the reverse complement.

BLASTN: compares a DNA query sequence to a DNA sequence database; q_DNA ↔ s_DNA
BLASTP: compares a protein query sequence to a protein sequence database; q_protein ↔ s_protein
TBLASTN: compares a protein query sequence to a DNA sequence database (6-frame translation); q_protein ↔ s_t1(DNA), q_protein ↔ s_t2(DNA), q_protein ↔ s_t3(DNA), q_protein ↔ s_tc1(DNA), q_protein ↔ s_tc2(DNA), q_protein ↔ s_tc3(DNA)
BLASTX: compares a DNA query sequence (6-frame translation) to a protein sequence database; q_t1(DNA) ↔ s_protein, q_t2(DNA) ↔ s_protein, q_t3(DNA) ↔ s_protein, q_tc1(DNA) ↔ s_protein, q_tc2(DNA) ↔ s_protein, q_tc3(DNA) ↔ s_protein
TBLASTX: compares a DNA query sequence (6-frame translation) to a DNA sequence database (6-frame translation); q_t1(DNA) ↔ s_t1(DNA), ..., q_tc3(DNA) ↔ s_tc3(DNA)

5.4 FASTA

FASTA (pronounced "fast-ay") is a heuristic for finding significant matches between a query string q and a database string d. It is the older of the two heuristics introduced in the lecture. FASTA's general strategy is to find the most significant diagonals in the dot-plot or dynamic programming matrix. The performance of the algorithm is influenced by a word-size parameter k, usually 6 for DNA and 2 for amino acids. The algorithm consists of four phases: hashing, first scoring, second scoring, and alignment.

FASTA format

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length.
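
As an illustration of the format just described, the following Python sketch splits FASTA-formatted text into (description, sequence) records. The function name `parse_fasta` and the two example records are hypothetical; the sketch handles only the basic layout described here (a ">" description line followed by sequence lines) and is not a full-featured parser.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text."""
    records = []
    description, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A ">" in the first column starts a new record.
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        elif description is not None:
            # Sequence data may span several lines; join them later.
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Hypothetical example input with two records.
example = """>seq1 hypothetical example protein
VALLAR
PAMMAR
>seq2 hypothetical example DNA
ACGTACGT
"""

for description, sequence in parse_fasta(example):
    print(description, sequence)
```

Because sequence lines are simply concatenated, the recommended limit of 80 characters per line affects only how the file is laid out, not the parsed sequence.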

Summary of FASTA steps

1. The database is analyzed for contiguous identical matches (between 5 and 10 amino acids in length, sharing the same offset value).
2. The longest diagonals are scored again using the PAM matrix (or another matrix). The best scores are saved as init1 scores.
3. Short diagonals are removed.
4. Long diagonals that are neighbors are joined. The score for this joined region is initn. This score may be lower because of a gap penalty.
5. A Smith-Waterman (S-W) dynamic programming alignment is performed around the joined sequences to give an opt score. Thus, the time-consuming S-W step is performed only on the top-scoring sequences.
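
The first step can be sketched in Python as follows: every word of length k in the query is hashed to its start positions, and each identical word found in the database is counted on its diagonal, identified by the offset i - j. The function name `diagonal_hit_counts` and the example strings are assumptions for illustration; the rescoring, joining, and Smith-Waterman steps are not shown.

```python
from collections import defaultdict


def diagonal_hit_counts(query, database, k=2):
    """Count identical k-word matches on each diagonal (offset = i - j)."""
    # Hash every k-word of the query to the positions where it starts.
    positions = defaultdict(list)
    for i in range(len(query) - k + 1):
        positions[query[i:i + k]].append(i)

    # Look up every k-word of the database; matches with the same
    # offset i - j lie on the same diagonal of the dot-plot.
    counts = defaultdict(int)
    for j in range(len(database) - k + 1):
        for i in positions.get(database[j:j + k], ()):
            counts[i - j] += 1
    return dict(counts)


hits = diagonal_hit_counts("VALLAR", "GGVALLARGG", k=2)
best = max(hits, key=hits.get)
print(best, hits[best])  # prints: -2 5
```

In this example all five 2-letter words of the query match on the single diagonal with offset -2, which is the kind of dense diagonal the later steps would rescore.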

5.5 BLAST and FASTA

(a) In BLAST, individual seeds are found and then extended without indels. (b) In FASTA, individual seeds contained in the same diagonal are merged and the resulting segments are then connected using a banded Smith-Waterman alignment.
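
Point (a) can be illustrated with a small Python sketch of gapless seed extension. The toy scores (+2 match, -1 mismatch), the `x_drop` stopping rule, and the names `extend_seed` and `_best_extension` are assumptions made for this example and are not taken from the text above; the sketch shows the idea of extending without indels, not the actual BLAST implementation.

```python
def _best_extension(pairs, match, mismatch, x_drop):
    """Walk along aligned residue pairs; return (best gain, its length)."""
    best_gain = gain = best_len = 0
    for n, (a, b) in enumerate(pairs, start=1):
        gain += match if a == b else mismatch
        if gain > best_gain:
            best_gain, best_len = gain, n
        elif best_gain - gain > x_drop:
            break  # score fell too far below the best seen so far
    return best_gain, best_len


def extend_seed(q, d, i, j, w, match=2, mismatch=-1, x_drop=3):
    """Extend the seed q[i:i+w] / d[j:j+w] in both directions without gaps.

    Returns (score, q_start, d_start, length) of the extended segment pair.
    """
    seed = sum(match if a == b else mismatch
               for a, b in zip(q[i:i + w], d[j:j + w]))
    right_gain, right_len = _best_extension(
        zip(q[i + w:], d[j + w:]), match, mismatch, x_drop)
    left_gain, left_len = _best_extension(
        zip(reversed(q[:i]), reversed(d[:j])), match, mismatch, x_drop)
    return (seed + left_gain + right_gain,
            i - left_len, j - left_len, w + left_len + right_len)


# Seed "LL" at position 4 in both sequences, word length 2.
print(extend_seed("CCVALLARTT", "GGVALLARGG", 4, 4, 2))  # prints (12, 2, 2, 6)
```

Here the seed LL grows into the segment pair VALLAR / VALLAR; the flanking mismatches would only lower the score, so the extension stops there.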
