
Motivation Methods Results

tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence

T M Lowe and S R Eddy

Minnie Lai and Jessica Stringham


April 7, 2011

tRNA

Figure: Secondary structure of tRNA


Yikrazuul. (25 April 2010), tRNA-Phe. Retrieved 6 April, 2011, from Wikipedia, the Free Encyclopedia http://en.wikipedia.org/wiki/File:TRNA-Phe_yeast_en.svg

Motivation

Knowing how many tRNA genes a genome has gives us information about codon bias.
Specialized tRNA-detection software can take advantage of secondary structure when searching.

“Current” tRNA detection methods

tRNAscan 1.3 (Fichant and Burks, 1991)

∼97.5% of true tRNAs detected (Fichant and Burks)
0.37 false positives per Mbp

Pavesi’s algorithm (Pavesi et al., 1994)

Similar to tRNAscan 1.3

Covariance model (Eddy and Durbin, 1994)

>99.98% of true tRNAs detected
<0.2 false positives per Mbp
Very slow: 9.5 CPU years to scan the human genome (on 1997 computers)

False Positives and tRNAscan 1.3

0.37 false positives per million base pairs (Mbp)

Not so bad for small genomes
The human genome is about 3000 Mbp
That gives ∼1100 false positives
Compare that with an estimated 1300 true tRNAs
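
The arithmetic behind these numbers can be sketched as follows. This is only an illustration using the figures quoted on the slide; the function name is made up here, and the fraction assumes (optimistically) that all 1300 true tRNAs are also reported.

```python
# Expected false positives when scanning a genome at a fixed per-Mbp rate.
FALSE_POSITIVES_PER_MBP = 0.37   # tRNAscan 1.3 rate quoted above
HUMAN_GENOME_MBP = 3000          # genome size quoted above
TRUE_TRNAS = 1300                # number of true tRNAs quoted above


def expected_false_positives(rate_per_mbp, genome_mbp):
    """Rate (per Mbp) times genome size (in Mbp)."""
    return rate_per_mbp * genome_mbp


false_hits = expected_false_positives(FALSE_POSITIVES_PER_MBP, HUMAN_GENOME_MBP)

# Assumption: every true tRNA is reported, so hits = true + false.
fraction_false = false_hits / (false_hits + TRUE_TRNAS)

print(f"{false_hits:.0f} expected false positives")
print(f"{fraction_false:.0%} of reported hits would be false")
```

With these inputs, nearly half of the reported hits in a human-sized genome would be false, which is why the per-Mbp rate matters so much more for large genomes.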

tRNAscan-SE
Combines these three methods

Phase 1 (Filtering)

Phase 1 Overview

Phase 1 (Filtering)

tRNAscan 1.3

Uses a series of seven sequential tests; if any one test fails, the next window is examined
A successfully identified tRNA gene passes all seven tests

Test breakdown:
Detection of invariant and semi-invariant bases with consensus matrices
Detection of potential base-pairing structures
Detection of the appropriate length and position of potential intron sequences
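
The "fail one test, move to the next window" control flow can be sketched as below. This is a minimal illustration, not the tRNAscan 1.3 code: the two tests, the window size, and the sequence are hypothetical stand-ins chosen only to show the early exit.

```python
from typing import Callable, Iterator, List, Tuple

Test = Callable[[str], bool]


def scan_windows(sequence: str, window: int,
                 tests: List[Test]) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) for every window that passes all tests in order."""
    for start in range(len(sequence) - window + 1):
        candidate = sequence[start:start + window]
        # all() short-circuits: the first failed test ends work on this
        # window and the loop moves on to the next one.
        if all(test(candidate) for test in tests):
            yield start, candidate


# Hypothetical stand-in tests (NOT the real tRNAscan 1.3 tests).
def starts_with_g(w: str) -> bool:
    return w.startswith("G")


def ends_pair_up(w: str) -> bool:
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    return (w[0], w[-1]) in pairs


hits = list(scan_windows("GATCCGAAC", window=4,
                         tests=[starts_with_g, ends_pair_up]))
print(hits)
```

Ordering cheap tests first means most windows are rejected after very little work, which is the point of a sequential filter.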

Phase 1 (Filtering)

Pavesi’s Algorithm

Searches subsequences
Detects A and B boxes (degenerate, so weight matrix analysis is used)
Detects the transcription termination signal
Considers the spacing between these elements
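
Weight matrix analysis of a degenerate motif can be sketched as follows. This is a generic log-odds position weight matrix, not Pavesi's actual matrices: the example motif instances, the uniform background, and the pseudocount are all assumptions made for illustration, and the spacing and terminator checks are left out.

```python
import math

# Hypothetical aligned examples of a 4-base motif (NOT real A-box or B-box data).
EXAMPLES = ["GGTT", "GGTC", "GATT", "GGTT"]
BASES = "ACGT"
BACKGROUND = 0.25    # assumed uniform base frequency
PSEUDOCOUNT = 1.0    # avoids log(0) for bases never seen at a position


def build_weight_matrix(examples):
    """Return one {base: log-odds score} dict per motif position."""
    matrix = []
    for column in zip(*examples):
        total = len(column) + PSEUDOCOUNT * len(BASES)
        matrix.append({
            b: math.log2(((column.count(b) + PSEUDOCOUNT) / total) / BACKGROUND)
            for b in BASES
        })
    return matrix


def best_match(sequence, matrix):
    """Slide the matrix along the sequence; return (best score, start)."""
    width = len(matrix)
    scored = [
        (sum(col[base] for col, base in zip(matrix, sequence[i:i + width])), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)


matrix = build_weight_matrix(EXAMPLES)
score, start = best_match("ACGGTTAC", matrix)
print(start, round(score, 2))
```

Because each position contributes a score rather than a yes/no match, a site that differs from the consensus at one base can still score well, which is what makes the approach suitable for degenerate boxes.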

Phase 1 (Filtering)

Schematic for Pavesi’s

Pavesi et al. (1994), p. 1249

Phase 1 (Filtering)

Phase 1

Runs tRNAscan 1.4

Optimized version of tRNAscan 1.3 (650-fold increase in speed)
Basically identical except for the handling of ambiguous nucleotides
tRNAscan 1.4 has a higher false positive rate, but later stages catch these

Runs EufindtRNA
Implementation of Pavesi’s algorithm

Results
Combined sensitivity exceeds 99%
False positive rate is about five times tRNAscan’s
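
One way to picture how two first-pass detectors raise combined sensitivity is to take the union of their hits, merging overlapping intervals into a single candidate. The sketch below assumes hits are (start, end) coordinates; the coordinates and the function name are hypothetical, and this is not presented as how tRNAscan-SE itself combines the two programs.

```python
def merge_candidates(*hit_lists):
    """Union hit intervals from several detectors, merging overlaps."""
    all_hits = sorted(hit for hit_list in hit_lists for hit in hit_list)
    merged = []
    for start, end in all_hits:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous candidate: extend it instead of adding.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical hit coordinates from the two first-pass programs.
trnascan_hits = [(100, 172), (900, 975)]
eufind_hits = [(105, 176), (2000, 2073)]

candidates = merge_candidates(trnascan_hits, eufind_hits)
print(candidates)
```

A union keeps anything either program finds, so sensitivity goes up, but so does the false positive count, which is why a stricter later stage is needed.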

Phase 2 (Covariance Model)

Phase 2 Overview

Phase 2 (Covariance Model)

Motivation

BLAST and FASTA are good at finding homologues of protein sequences.
RNA is selected more for its secondary structure than for its sequence.
We want to be able to search for “RNA motifs”.

Phase 2 (Covariance Model)

tRNA

Figure: Secondary structure of tRNA


Yikrazuul. (25 April 2010), tRNA-Phe. Retrieved 6 April, 2011, from Wikipedia, the Free Encyclopedia http://en.wikipedia.org/wiki/File:TRNA-Phe_yeast_en.svg

Phase 2 (Covariance Model)

Context-Free Grammars

Palindromes (sort of)

S → aSa|bSb|aa|bb

Example of a palindrome

S ⇒ aSa ⇒ aaSaa ⇒ aabbaa

So aabbaa is a palindrome!
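
The derivation above can be turned into a small recognizer. This is a minimal sketch that mirrors the four rules directly; the function name is made up here. Note that the grammar only produces even-length palindromes over a and b, which is presumably the "sort of".

```python
def derives(s: str) -> bool:
    """True if s can be derived from S -> aSa | bSb | aa | bb."""
    # Base rules: S -> aa | bb
    if s in ("aa", "bb"):
        return True
    # Recursive rules: S -> aSa | bSb (matching outer letters, S inside)
    if len(s) > 2 and s[0] == s[-1] and s[0] in "ab":
        return derives(s[1:-1])
    return False


for word in ["aabbaa", "abba", "aba", "abab"]:
    print(word, derives(word))
```

Each recursive call strips one matched outer pair, the same nesting that lets a grammar describe base pairs in a stem.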

Phase 2 (Covariance Model)

RNA secondary structure

Figure: Stem-loop

Sakurambo. (27 May 2006), stem-loop structure in RNA. Retrieved 6 April, 2011, from Wikipedia, the Free Encyclopedia http://en.wikipedia.org/wiki/File:Stem-loop.svg

Phase 2 (Covariance Model)

Context-Free Grammar

Stem loops with three base pairs

S → a W₁ u | c W₁ g | g W₁ c | u W₁ a
W₁ → a W₂ u | c W₂ g | g W₂ c | u W₂ a
W₂ → a W₃ u | c W₃ g | g W₃ c | u W₃ a
W₃ → gaaa | gcaa
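
To see what this grammar generates, the sketch below enumerates every string it derives: three nested Watson-Crick pairs around one of the two loops. The names are illustrative only.

```python
from itertools import product

# Each rule for S, W1 and W2 wraps the inner symbol in one of these pairs.
PAIRS = [("a", "u"), ("c", "g"), ("g", "c"), ("u", "a")]
# W3 ends the derivation with one of these loops.
LOOPS = ["gaaa", "gcaa"]


def stem_loops(n_pairs: int = 3):
    """Generate every string derived by the stem-loop grammar."""
    for stem in product(PAIRS, repeat=n_pairs):
        left = "".join(pair[0] for pair in stem)
        # The 3' side pairs with the 5' side in reverse order (nesting).
        right = "".join(pair[1] for pair in reversed(stem))
        for loop in LOOPS:
            yield left + loop + right


language = set(stem_loops())
print(len(language))
print("gcagaaaugc" in language)
```

The grammar accepts 4 × 4 × 4 × 2 strings, and any sequence whose outer bases do not pair in this nested way is rejected no matter how similar it looks base by base.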

Phase 2 (Covariance Model)

Stochastic Context-Free Grammars

Motivation
A plain grammar leaves out exceptions to the rules
Adding the exceptions as ordinary rules degrades the pattern
Instead, attach a probability to each rule
Implementation
Use the CYK algorithm (a parsing algorithm)
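
A probabilistic CYK parse can be sketched on a toy grammar. Everything here is an assumption for illustration: the grammar (nested a-u pairs, written in Chomsky normal form), its probabilities, and the function name. It is not the tRNA covariance model, and practical implementations usually work with log scores rather than raw probabilities.

```python
from collections import defaultdict

# Toy SCFG in Chomsky normal form (hypothetical rules and probabilities).
# It derives nested a-u pairs: au, aauu, aaauuu, ...
#   S -> A U (0.6) | A X (0.4),  X -> S U (1.0),  A -> a,  U -> u
BINARY = {            # (left, right) -> [(parent, probability)]
    ("A", "U"): [("S", 0.6)],
    ("A", "X"): [("S", 0.4)],
    ("S", "U"): [("X", 1.0)],
}
TERMINAL = {          # symbol -> [(parent, probability)]
    "a": [("A", 1.0)],
    "u": [("U", 1.0)],
}


def cyk_best_probability(sequence, start="S"):
    """Probability of the most likely parse of sequence (0.0 if none)."""
    n = len(sequence)
    if n == 0:
        return 0.0
    # best[i][j][N] = best probability that N derives sequence[i..j]
    best = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(sequence):
        for parent, p in TERMINAL.get(symbol, []):
            best[i][i][parent] = max(best[i][i][parent], p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split into [i..k] and [k+1..j]
                for left, p_left in list(best[i][k].items()):
                    for right, p_right in list(best[k + 1][j].items()):
                        for parent, p_rule in BINARY.get((left, right), []):
                            p = p_rule * p_left * p_right
                            if p > best[i][j][parent]:
                                best[i][j][parent] = p
    return best[0][n - 1][start]


for seq in ["au", "aauu", "aau"]:
    print(seq, cyk_best_probability(seq))
```

The table is filled from short spans to long ones, so the work grows roughly with the cube of the sequence length, which is consistent with covariance model searches being slow on whole genomes.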

Phase 2 (Covariance Model)

Covariance Model

Like a hidden Markov model, but built on stochastic context-free grammars.

Phase 3 (Clean up)

Phase 3 Overview

Phase 3 (Clean up)

Phase 3

Pseudogenes are sorted out
Receives information from Phase 2 to further filter gene sequences
Uses the anticodon in the secondary structure
Outputs results in secondary structure format

Phase 3 (Clean up)

Pseudogenes/false positives

Usually lack a secondary structure that is found in true tRNAs
Score comparison
tRNAscan and EufindtRNA complement each other in the pseudogene search
A high-scoring truncated tRNA may still be considered legitimate

Results

Usage

http://lowelab.ucsc.edu/cgi-bin/tRNAscan-SE.cgi

Usage

Figure: tRNA with cove score of 83.25



Usage

Figure: tRNA with cove score of 46.99



Problems

If neither tRNAscan 1.4 nor EufindtRNA catches a tRNA, it becomes a false negative
Limited to tRNA only (the covariance model is capable of more)

Lowe, Todd M., and Sean R. Eddy. “tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.” Nucleic Acids Research. 25.5 (1997): 955-964. Print.

Pavesi, Angelo, Franco Conterio, Angelo Bolchi, Giorgio Dieci, and Simone Ottonello. “Identification of new eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis of transcriptional control regions.” Nucleic Acids Research. 22.7 (1994): 1247-1256. Print.

Eddy, Sean R., and Richard Durbin. “RNA sequence analysis using covariance models.” Nucleic Acids Research. 22.11 (1994): 2079-2088. Print.

Fichant, Gwennaele A., and Christian Burks. “Identifying Potential tRNA Genes in Genomic DNA Sequences.” J. Mol. Biol. 220 (1991): 659-671. Print.

Durbin, R., S. Eddy, A. Krogh, G. Mitchison. Biological sequence analysis. Cambridge: Cambridge University Press, 2006. Print.
