Académique Documents
Professionnel Documents
Culture Documents
Research goal:
Identify motifs that enable Muller F element genes to
function within a heterochromatic environment
Annotation goals:
Define search regions enriched in regulatory motifs
Define precise location of TSS if possible
Define search regions where TSS could be found
Document the evidence used to support the TSS
annotations
Estimated evolutionary distances with
respect to D. melanogaster
D. melanogaster
D. simulans Species Substitutions per
D. sechellia
D. yakuba
neutral site
D. erecta D. ficusphila 0.80
D. ficusphila
D. eugracilis D. eugracilis 0.76
D. biarmipes
D. takahashii D. biarmipes 0.70
D. elegans D. takahashii 0.65
D. rhopaloa
D. kikkawai
D. elegans 0.72
D. bipectinata D. rhopaloa 0.66
D. ananassae
D. pseudoobscura D. kikkawai 0.89
D. persimilis D. bipectinata 0.99
D. willistoni
Data from table 1 of the modENCODE
D. mojavensis comparative genomics whitepaper
D. virilis
D. grimshawi
New species sequenced by modENCODE GEP annotation projects
TSS annotation resources
Walkthroughs:
Annotation of Transcription Start Sites in Drosophila
Reference:
TSS Annotation Workflow
2. Conservation
Type of TSS (peaked versus broad) in D. melanogaster
Sequence similarity to initial exon in D. melanogaster
Sequence similarity to other Drosophila species (Multiz)
cDNA fragmentation
RNA-Seq Read Count
RNA fragmentation
More uniform coverage
Miss the 5’ and 3’ ends
of the transcript
5’ 3’
Gene Span
Wang Z, et al. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009 Jan;10(1):57-63.
Techniques for finding TSS
Identify the 5’ cap at the beginning of the mRNA
Cap Analysis of Gene Expression (CAGE)
RNA Ligase Mediated Rapid Amplification of cDNA Ends
(RLM-RACE)
Cap-trapped Expressed Sequence Tags (5’ ESTs)
Juven-Gershon T and Kadonaga JT. Regulation of gene expression via the core promoter
and the basal transcriptional machinery. Dev Biol. 2010 Mar 15;339(2):225-9.
Peaked versus broad promoters
Peaked promoter Broad promoter
(Single strong TSS) (Multiple weak TSS)
50-300 bp
Kadonaga JT. Perspectives on the RNA polymerase II core promoter. Wiley Interdiscip
Rev Dev Biol. 2012 Jan-Feb;1(1):40-51.
Promoter architecture in Drosophila
Classify core promoter based on
the Shape Index (SI)
Determined by distribution of
CAGE and 5’ RLM-RACE reads
Shape index is a continuum
Most promoters in D.
melanogaster contain multiple TSS
Median width = 162 bp
Rach EA, et al. Motif composition, conservation and condition-specificity of single and
alternative transcription start sites in the Drosophila genome. Genome Biol. 2009;10(7):R73.
Using modENCODE data to classify the
type of core promoter in D. melanogaster
Only a subset of the modENCODE data are
available through FlyBase
Evidence tracks on the D. melanogaster GEP UCSC
Genome Browser [July 2014 (BDGP R6) assembly]:
FlyBase gene annotations (release 6.06)
modENCODE TSS (Celniker) annotations
DNase I hypersensitive sites (DHS)
9-state and 16-state chromatin models
RNA Pol II and CBP ChIP-Seq
Transcription factor binding site (TFBS) HOT spots
9-state chromatin model
Ho JW, et al.
Comparative analysis of
metazoan chromatin
organization. Nature.
2014 Aug
28;512(7515):449-52.
Classifying the D. melanogaster core
promoter for each unique TSS
Peaked
Single annotated TSS with a single DHS position
Intermediate
Single annotated TSS with multiple DHS positions within
a 300bp window
Broad
Multiple annotated TSS with multiple DHS positions
within a 300bp window
Insufficient evidence
No TSS annotations or DHS positions
DEMO
Classifying the core promoter of D. melanogaster Rad23
Additional DHS data from different
stages of embryonic development
DHS data
produced by the
BDTNP project
Evidence tracks:
Detected DHS Positions
(Embryos)
DHS Read Density
(Embryos)
States DJ, et al. Improved Sensitivity of Nucleic Acid Database Searches Using Application-
Specific Scoring Matrices. Methods 3:66-70.
DEMO
blastn search of the initial transcribed exon of Rad23 against D. biarmipes
contig19
RNA PolII ChIP-Seq tracks
(D. biarmipes only)
Show regions that are enriched in RNA Polymerase
II compared to input DNA
Gene Models
RNA PolII
Enrichment
RNA-Seq
Use core promoter motifs to
support TSS annotations
Some sequence motifs are enriched in the region
(~300 bp) surrounding the TSS
Some motifs (e.g., Inr, TATA) are well-characterized
Other motifs are identified based on computational analysis
Initiator (Inr)
D. mel Transcripts
RNA PolII
Enrichment
RNA-Seq
D. mel
Proteins
N-SCAN
Augustus
TransDecoder
RNA-Seq
DEMO
Genome browsers for other Drosophila species
TSS annotation summary
Most of the D. melanogaster core promoters have
multiple TSS
Classify the type of promoter (peaked/intermediate/broad)
based on the transcriptome evidence from D. melanogaster
Define search regions that might contain TSS