Académique Documents
Professionnel Documents
Culture Documents
Protein Informatics
7/23/03
NHLBI Symposium: From Genome to Disease
Patricia C. Babbitt
University of California, San Francisco
babbitt@cgl.ucsf.edu
• deduction of function
• tracing ancestral connections
• understanding enzyme mechanisms
• structural analysis of receptors, molecules involved
in cell signaling
• identification of molecular surfaces in protein-
protein, protein-DNA interactions
• protein engineering
• clustering of families, superfamilies
• metabolic computing/comparative genome analysis
Sequence
Conservation
Structure
Conservation
Function
Conservation
• The score should have the form: pab /qa qb , where pab is
the probability that residue a is substituted by residue b,
and qa and qb are the background probabilities for residue
a and b respectively.
• Handling gaps remains an incompletely solved problem...
<35 PAM-30
35-50 PAM-70
50-85 BLOSUM-80
>85 BLOSUM-62
SEQUENCEHOMOLOG
S• •
E • • •
Q •
U •
E • • •
N • •
C •
E • • •
A
N •
A
L •
O •
G• •
July 23, 2003 Patricia Babbitt, PhD 15
Univ. of Calif., San Francisco
• The signal-to-noise ratio can be improved using
filtering techniques designed to minimize the
composition- dependent background
• Example of common filters: over-lapping, fixed-length
"windows" for sequence comparison
• To be counted, a comparison must achieve a
minimum threshold score summed over the window,
derived empirically or from a statistical or evolutionary
model of sequence similarity
• The window size and minimum threshold score (often
termed "stringency") at which the score is counted can
be user-defined
SEQUENC
SEQUENCEANALOG (7/7 matches)
SEQUENC
SEQUENCEANALOG (0/7 matches)
...
CEHOMOL
SEQUENCEANALOG (2/7 matches)
...
1P R I M E5
11-1-2+0+4 = +2
1S E Q U E N C E A N A L Y S I S P R I M E R21
. . .
R I M E5 1P
16+6+5+6+4 = +27
1S E Q U E N C E A N A L Y S I S P R I M E R21
200
changes/100 amino acids
Fibrinopeptides
150
Hemoglobin
100
50
Cytochrome C
0
0 200 400 600 800 1000
millions of years since divergence
Proinsulin
C-peptide
r = 0.97 x 10-9/site/year
A-chain
r = 0.13 x 10-9/site/year
B-chain
Mature insulin
July 23, 2003 Patricia Babbitt, PhD 26
Univ. of Calif., San Francisco
FASTA
• Documentation
– Manual: http://www.ncbi.nlm.nih.gov/BLAST/blast_help.html
– FACS: http://www.ncbi.nlm.nih.gov/BLAST/blast_FAQs.html
– Tutorial: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
• Access at http://www.ncbi.nlm.nih.gov/BLAST/
• Tutorial at
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-2.html
• See also:
Park J et al “Sequence comparisons using multiple
sequences detect three times as many remote homologs as pairwise
methods,” JMB 284:1201-10, 1998