
Position Specific Scoring Matrix (PSSM)

Sandeep Kumar Gupta

Department of Computer Science


Jadavpur University

April, 2017
What is PSSM?

- A position specific scoring matrix (PSSM) is a matrix built from the amino acid (or nucleotide) frequencies at every position of a multiple alignment.
- The resulting matrix assigns higher scores to residues that appear at a given position more often than expected by chance.
Creating PSSM

- A PSSM is based on the frequency of each residue at each position of a multiple alignment.

Column 1: $f_{A,1} = 0$, $f_{C,1} = 1$, $f_{G,1} = 1$, $f_{T,1} = 1$
Column 2: $f_{A,2} = 2$, $f_{C,2} = 1$, $f_{G,2} = 0$, $f_{T,2} = 0$
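
As a minimal sketch of how such per-column counts can be obtained, the snippet below tallies residues column by column. The three-sequence alignment is hypothetical: the original alignment figure is not reproduced here, so the sequences were chosen only so that their two columns give the counts listed above. The names `ALPHABET`, `alignment`, and `column_counts` are illustrative.

```python
from collections import Counter

ALPHABET = "ACGT"

# Hypothetical alignment (assumption): picked only so that its two columns
# reproduce the counts listed for Column 1 and Column 2.
alignment = ["CA", "GA", "TC"]


def column_counts(alignment, alphabet=ALPHABET):
    """Return one {residue: count} dict per alignment column."""
    counts = []
    for column in zip(*alignment):  # zip(*rows) iterates over columns
        tally = Counter(column)
        # Counter returns 0 for residues that never occur in the column.
        counts.append({residue: tally[residue] for residue in alphabet})
    return counts


counts = column_counts(alignment)
print(counts[0])  # counts for column 1
print(counts[1])  # counts for column 2
```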
Pseudo-counts

- Amino acids that do not appear at a specific position of a multiple alignment must also be considered, so that every possible sequence can be modelled and every log-odds score can be calculated (the logarithm of zero is undefined).
- Pseudo-counts assign a minimal score to residues that do not appear at a certain position of the alignment.
- The equation used is:

$$f'_{i,j} = \frac{f_{i,j} + \text{pseudocount}}{N + \text{sum of pseudocounts}}$$

- $f_{i,j}$ = frequency of residue $i$ in column $j$
- $N$ = number of sequences in the MSA
- pseudocount = the constant added to every count
Pseudo-counts : Example

Let's assume pseudocount = 1.

From the example on the previous slide we have N = 3.
With four nucleotides, the sum of pseudo-counts = 4.
Each entry of the resulting matrix of relative frequencies is therefore (count + 1) / (3 + 4).

More sophisticated methods exist that produce more realistic pseudocounts; they are based on substitution matrices or Dirichlet mixtures.
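
The pseudo-count correction can be sketched as follows for the two columns whose counts were listed earlier, assuming pseudocount = 1 and N = 3 as in the example. Exact fractions are used so the arithmetic is easy to check by hand. The function name `corrected_frequencies` is illustrative, and adding the same pseudocount to every residue is the simple scheme from the equation, not one of the more sophisticated methods mentioned above.

```python
from fractions import Fraction

N = 3            # number of sequences in the alignment (from the example)
PSEUDOCOUNT = 1  # value assumed in the example

# Counts for the two columns listed earlier.
counts = [
    {"A": 0, "C": 1, "G": 1, "T": 1},
    {"A": 2, "C": 1, "G": 0, "T": 0},
]


def corrected_frequencies(column, n_sequences, pseudocount=PSEUDOCOUNT):
    """Apply f' = (f + pseudocount) / (N + sum of pseudocounts) to one column."""
    # One pseudocount per residue type, so the sum is pseudocount * alphabet size.
    total = n_sequences + pseudocount * len(column)
    return {res: Fraction(count + pseudocount, total) for res, count in column.items()}


freqs = [corrected_frequencies(column, N) for column in counts]
for j, column in enumerate(freqs, start=1):
    print(j, {res: str(value) for res, value in column.items()})
```

Each corrected column still sums to 1, and no residue has a frequency of zero.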
Computing PSSM

- The frequency of every residue observed at every position has to be compared with the frequency at which that residue is expected in a random sequence.
- For example, let's assume that each nucleotide is observed with an identical frequency in a random sequence, i.e. $q_i = 0.25$ for all $i$.
- The score is derived from the ratio of the observed to the expected frequencies. More precisely, the logarithm of this ratio is taken and referred to as the log-likelihood ratio:

$$\text{Score}_{i,j} = \log\left(\frac{f'_{i,j}}{q_i}\right)$$
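
A small sketch of the score computation, under two assumptions that the slides do not state: the natural logarithm is used as the base, and the input frequencies are the pseudocount-corrected values for column 1 derived from the earlier example (pseudocount = 1, N = 3). The names `BACKGROUND` and `log_odds` are illustrative.

```python
import math

BACKGROUND = 0.25  # q_i: uniform nucleotide background, as assumed above


def log_odds(corrected_frequency, background=BACKGROUND):
    """Score = log(f' / q). The natural log is an assumption of this sketch."""
    return math.log(corrected_frequency / background)


# Pseudocount-corrected frequencies for column 1 (derived, not quoted).
column1 = {"A": 1 / 7, "C": 2 / 7, "G": 2 / 7, "T": 2 / 7}

scores = {residue: log_odds(freq) for residue, freq in column1.items()}
for residue, score in scores.items():
    print(residue, round(score, 3))
```

A residue seen less often than the background (A, with 1/7 < 0.25) gets a negative score, while residues seen more often (2/7 > 0.25) get positive scores.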
Computing PSSM : Example

[Figure: the complete Position Specific Scoring Matrix computed from the previous example]

The matrix assigns positive scores to residues that appear more often than expected by chance and negative scores to residues that appear less often than expected by chance.
Using PSSM

- To search for matches, scan along the sequence using a window of the same length (L) as the PSSM.
- The search window is slid one residue at a time, and the scores of the residues in every region of length L are added.
- Scores higher than a threshold are reported.
The sequence score gives an indication of how different the sequence is from a random sequence.
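
The scanning procedure described above can be sketched as below. The two-column `toy_pssm` and its scores are invented purely for illustration and are not the matrix from the slides; the `scan` function and the threshold value are likewise assumptions.

```python
def scan(sequence, pssm, threshold):
    """Return (start, window, score) for every window scoring above threshold."""
    width = len(pssm)  # window length L equals the number of PSSM columns
    hits = []
    for start in range(len(sequence) - width + 1):  # slide one residue at a time
        window = sequence[start:start + width]
        # Add the score of each residue at its position within the window.
        score = sum(pssm[pos][residue] for pos, residue in enumerate(window))
        if score > threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical PSSM with made-up scores (assumption, for illustration only).
toy_pssm = [
    {"A": -1.0, "C": 0.5, "G": 0.5, "T": 0.5},
    {"A": 1.0, "C": 0.5, "G": -1.0, "T": -1.0},
]

hits = scan("ACTCAG", toy_pssm, threshold=1.0)
print(hits)
```

With these made-up scores, only the window `CA` at offset 3 exceeds the threshold; the window `TC` scores exactly 1.0 and is not reported because the comparison is strict.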
Using PSSM : Example

ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCC
The sequence score for the current window is calculated by adding the PSSM values for the residue at each position of the window.

Score = -1.454
The score is less than 0, so the window is more likely to be a random site than a functional one.
Conclusion
Advantages
- Good for short, conserved regions
- Relatively fast and simple to implement
- Produces match scores that can be interpreted based on statistical theory

Limitations
- Insertions and deletions are not allowed
- Relatively long sequence motifs therefore cannot be described with this method

When to use?
- To model small regions with high variability but constant length.
