Thesis Statement
The DNA replication timing profile can be reconstructed efficiently and accurately from discrete time points.
(Glossary)
Presentation Outline
Biology background
Microarray technology
Experimental data
Challenges
Engineering
Gene therapy
Insertion, deletion, modification
[Figure: double-stranded DNA shown as base pairs (A-T, G-C, G-C, T-A, C-G, G-C, A-T, C-G, A-T, C-G)]
Human genome: > 3 billion bp
Replication rate: ~1000 bp/min
Serial replication would take 5.7 years
Actual replication takes 6 to 10 hours (speedup > 5000)
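The figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes exactly 3 billion bp and a 365.25-day year; the constant and function names are illustrative only.

```python
# Back-of-the-envelope check of the serial replication time.
GENOME_BP = 3_000_000_000        # assumed: exactly 3 billion bp (the slide says "> 3 billion")
RATE_BP_PER_MIN = 1000           # replication rate from the slide
MIN_PER_YEAR = 60 * 24 * 365.25  # assumed: Julian year

serial_minutes = GENOME_BP / RATE_BP_PER_MIN
serial_hours = serial_minutes / 60
serial_years = serial_minutes / MIN_PER_YEAR


def speedup(observed_hours):
    """Ratio of serial replication time to an observed replication time."""
    return serial_hours / observed_hours


print(f"serial replication: {serial_years:.1f} years")
print(f"speedup at 10 h: {speedup(10):.0f}x, at 6 h: {speedup(6):.0f}x")
```

With exactly 3 billion bp the 10-hour case gives a speedup of 5000, so a genome larger than 3 billion bp gives the "> 5000" quoted on the slide.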
Background
Prokaryotes
E. coli
DnaA binds to oriC
Eukaryotes
ORC
S. cerevisiae (yeast)
ARS 11 bp consensus
Mapping of origins
Human
No known consensus
Few origins characterized
Cross-hybridization
Repeats not tiled
Gaps in genome
PM probe MM probe
GAGTACATAGCATACCATGACTAGA A
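As a hedged illustration of the PM/MM probe pair: on Affymetrix-style arrays, the mismatch (MM) probe is commonly described as the perfect-match (PM) 25-mer with its middle base replaced by the complement. The sketch below assumes that convention; the function name `mismatch_probe` is hypothetical.

```python
# Assumed convention: the MM probe equals the PM probe with the middle
# base (13th of 25) swapped for its complement.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm):
    """Return the mismatch probe for a perfect-match probe sequence."""
    mid = len(pm) // 2  # 0-based index 12 for a 25-mer
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


pm = "GAGTACATAGCATACCATGACTAGA"
print(pm)
print(mismatch_probe(pm))
```

For the 25-mer on the slide, the middle base is T, so under this assumption the MM probe carries an A at that position.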
S-Phase
Allelic Variation
mf(chr, bp) = {rtime1, rtime2, ...}
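One way to read the mapping function is as a lookup from a genomic position to the set of replication times observed there, with more than one time when alleles replicate at different points in S-phase. The sketch below is an assumption about the data structure, not the thesis implementation; the class and method names are hypothetical.

```python
from collections import defaultdict


class ReplicationMap:
    """Maps (chromosome, base-pair position) to a set of replication times."""

    def __init__(self):
        self._times = defaultdict(set)

    def add(self, chrom, bp, rtime):
        """Record that position `bp` on `chrom` replicates at `rtime` (hours)."""
        self._times[(chrom, bp)].add(rtime)

    def mf(self, chrom, bp):
        """Return all replication times recorded for this position."""
        return frozenset(self._times.get((chrom, bp), ()))


rmap = ReplicationMap()
rmap.add("chr21", 33_500_000, 2)  # hypothetical values: one allele early
rmap.add("chr21", 33_500_000, 8)  # the other allele late
print(sorted(rmap.mf("chr21", 33_500_000)))
```

A position with a single replication time maps to a one-element set, and a position with no data maps to the empty set.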
Prof. Rushen Chahal
[Figure: allelic variation across S-phase time points at 0, 2, 4, 6, 8, and 10 hr]
efficiently
Genomic data (> 3 billion bp)
Initial Analysis
Tiling Analysis Software (TAS)
Wilcoxon Rank Sum test in sliding window
Assess enrichment of treatment over control
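The idea of a rank sum test in a sliding window can be sketched as follows. This is not the TAS implementation: it is a minimal pure-Python version that assumes one treatment and one control intensity per probe, uses the normal approximation without tie correction, and takes a hypothetical `half_width` parameter in bp.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p(treatment, control):
    """One-sided p-value that treatment exceeds control (normal approximation)."""
    n1, n2 = len(treatment), len(control)
    ranks = average_ranks(list(treatment) + list(control))
    w = sum(ranks[:n1])                       # rank sum of the treatment group
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability


def sliding_window_p(positions, treatment, control, half_width):
    """Compute a p-value per probe from all probes within half_width bp of it."""
    result = []
    for center in positions:
        idx = [i for i, p in enumerate(positions) if abs(p - center) <= half_width]
        result.append(rank_sum_p([treatment[i] for i in idx],
                                 [control[i] for i in idx]))
    return result
```

The window search here is quadratic in the number of probes, which keeps the sketch short; a production version over genome-scale data would advance two pointers over the sorted positions instead.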
New Analysis
Thesis Statement (revisited):
The DNA replication timing profile can be reconstructed efficiently and accurately from discrete time points.
Plotting TR50
[Plot: TR50 (hours) vs. chromosomal position (in millions of bp)]
Smoothed TR50 curve recovers replication pattern
Local minima: possible locations of replication origins
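Smoothing followed by a search for local minima can be sketched as below. The slides do not state which smoother was used, so a centered moving average stands in here as an assumption; the function names and the `window` parameter are illustrative.

```python
def moving_average(values, window):
    """Centered moving average; the window is truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def local_minima(values):
    """Indices strictly lower than both neighbors (candidate origins)."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]


tr50 = [5, 4, 3, 4, 5, 2, 6]  # hypothetical TR50 values in hours
print(local_minima(tr50))
print(moving_average([1, 2, 3, 4, 5], 3))
```

Only strict minima are reported, so a flat valley spanning several probes would need extra handling.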
Segregation Algorithm
[Flowchart: probes are classified as TNS, Early, Mid, or Late using a 2-to-1 ratio test and Avg thresholds at 3.4 and 3.9]
[Flowchart: TR50, Smoothing, Smoothed TR50 (STR50), Segregate JTS Regions into 1/3s based on STR50 (Early, Mid, Late), Join Intervals, Joined Early / Joined Mid / Joined Late]
Parameters to evaluate:
Segregation Algorithm: sliding window size, minimum probe density
Join Intervals: minimum interval size
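The role of the minimum interval size can be illustrated with a small sketch. It assumes intervals are `(start, end, label)` tuples, that touching or overlapping intervals with the same label are merged first, and that merged intervals shorter than `min_size` are then dropped; the actual order and rules of the Join Intervals step are not given in the slides.

```python
def join_intervals(intervals, min_size):
    """Merge touching same-label intervals, then drop those below min_size bp."""
    joined = []
    for start, end, label in sorted(intervals):
        if joined and joined[-1][2] == label and start <= joined[-1][1]:
            prev_start, prev_end, _ = joined[-1]
            joined[-1] = (prev_start, max(prev_end, end), label)
        else:
            joined.append((start, end, label))
    return [iv for iv in joined if iv[1] - iv[0] >= min_size]


# Hypothetical segregation output, coordinates in bp.
segments = [(0, 100, "Early"), (100, 250, "Early"),
            (250, 260, "Mid"), (260, 500, "Late")]
print(join_intervals(segments, min_size=50))
```

In this example the two Early intervals merge into one and the 10 bp Mid interval is discarded, which shows how the parameter trades resolution against noise.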
Evaluation
Concordance of biological phenomena
Segregation intervals: FISH
STR50 local minima: other origin methods
Correlation with other biological data
Gene density: early replication
AT content: late replication
Gene expression: early replication
Activating acetylation/methylation: early replication
Positional attributes
Replication timing
Proximity to genes
Else
Profile many 1% runs
Implementation Details
Java
Class representation of proprietary microarray files
Algorithms to process raw microarray data
Diagnostic tools
Perl
Scripts to process intermediate and final data
Correlations, data transformation, quality assurance
R statistical language
Smoothing, statistical plots, correlation studies
Shell scripts
Automated processing of microarray sets
Current/Expected Contributions
Algorithms, Software Infrastructure, Analysis
Probe-by-probe TR50 analysis
Temporal Specificity Algorithm
Combinatorial analysis of allele locations
Segregation Algorithm
TNS, Early, Mid, Late replicating areas
Used to design validation experiments
Publications
Completed:
ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004 Oct 22; 306(5696):636-40.
ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. {In Press, to appear in June 14, 2007 issue}
Karnani N., Taylor C., Malhotra A., Dutta A. Pan-S replication patterns and chromosomal domains defined by genome tiling arrays of ENCODE genomic areas. Genome Research. {In Press, to appear in June 2007 issue}
UCSC Browser Tracks: TR50, Smoothed TR50, Local Minima, Segregation
In Progress:
Multi-million dollar NIH grant for scale up to full human genome
Paper detailing origin methods, correlations, etc.
Collaboration outside of engineering disciplines enhances visibility, funding opportunities, and demand for CS work
Developed algorithms, time complexity analysis, combinatorial analysis, feedback to experimental design
Smoothing
Parameterization
Biology
I haven't had a course in biology since 10th grade
Microarrays
New, evolving technology we're still learning to deal with
Data size
Hundreds of GB of data to process
Replicates, failed experiments
Algorithms must be efficient
What kind of career are you aiming for after graduation, and why?
Teaching Computer Science (Small College)
I enjoyed learning in my undergraduate curriculum with meaningful interactions with professors
I taught Discrete Math at UVa in Fall '02 and Spring '03
Enjoyable, but 60-70 students is too large