OUTLINE
Scope of the Series Systems Biology to Systems Medicine Early Success Stories and Challenges Central Dogma Cancer: First Model
LECTURE SERIES
Introduction (DG)
Cancer Biology (MO)
Cell Signaling Networks (MO)
Genetic Variation (DG)
Massive Testing (LY)
Biomarker Discovery (LY)
Phenotype Prediction (DG)
Embedding Mechanism (DG)
INSIDE THE SERIES
Molecular:
As in molecular biology, so involving biomolecules in cells, mainly DNA, RNA and proteins.
Medicine:
Associating genetic variation with disease; predicting phenotypes from molecular concentrations; understanding disease in the context of molecular networks of genes and gene products.
OUTSIDE THE SERIES
Technology:
(Almost) nothing about advances in generating omics data. In particular, nothing about next-gen xyz, data acquisition, data preprocessing. For us, data are a matrix of numbers.
Software:
"Computational" does not stand for "computer-based." Nothing on packages for implementing specific algorithms, on tools for the community, or on R or any other programming language.
VARIABLES
Modern capture devices enable the simultaneous and quantitative assessment of many molecular states:
RNA (mRNA, miRNA, etc.) expression.
Metabolite quantification.
DNA polymorphism measurements (e.g., SNPs).
Protein quantification (e.g., via MassSpec).
Methylation arrays.
DNA-protein binding (e.g., ChIP-chip).
WHY STATISTICAL LEARNING?
The objective is then to analyze relationships among biomolecules in the context of a network.
BIOMARKERS
Something measurable which carries information about the occult state of the disease. Hence a surrogate for the state of interest. Types:
Diagnostic: screening biomarkers from serum, imaging, saliva or urine. Specificity dominates.
Prognostic: based on tumor tissue (e.g., expression, CNV). Sensitivity dominates.
Predictive of treatment outcome: based on tumor tissue.
SPECIFIC OBJECTIVES
Discover biomarkers and biomarker interactions for disease progression which are clinically useful for early diagnosis, risk assessment, prognosis and personalized treatment.
Elucidate the pathophysiological mechanisms of multifactorial/chronic diseases (e.g., cancer, diabetes, obesity, metabolic disorders, aging).
Develop strategies for combinatorial drug therapies and screenings, test effective treatment, and predict drug effects and side effects.
GOLUB ET AL 1999
In the 1960s, acute leukemias were divided into acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). Separation was based on histochemical testing, and later on antibody-based testing. In either case, there was no single established test to make this diagnosis. Golub et al used supervised machine learning to learn a predictor of ALL vs AML based on a signature of gene expression values. New leukemia cases could then be classified from microarray data extracted from tissue.
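A minimal sketch of a signature-based ALL vs AML predictor, in the spirit of the approach described above. It assumes a signal-to-noise gene score and a weighted vote over the top-scoring genes. The function names, the toy expression values and the choice of two signature genes are illustrative assumptions, not the published pipeline or data.

```python
from statistics import mean, stdev

def signal_to_noise(a, b):
    """Class-separation score for one gene: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))

def train(samples, labels, n_genes):
    """Return the n_genes most discriminative genes as (index, weight, midpoint)."""
    model = []
    for g in range(len(samples[0])):
        all_vals = [x[g] for x, y in zip(samples, labels) if y == "ALL"]
        aml_vals = [x[g] for x, y in zip(samples, labels) if y == "AML"]
        weight = signal_to_noise(all_vals, aml_vals)
        midpoint = (mean(all_vals) + mean(aml_vals)) / 2
        model.append((g, weight, midpoint))
    model.sort(key=lambda m: abs(m[1]), reverse=True)
    return model[:n_genes]

def predict(model, x):
    """Each signature gene casts a weighted vote; a positive total means ALL."""
    total = sum(w * (x[g] - mid) for g, w, mid in model)
    return "ALL" if total > 0 else "AML"

# Toy data (hypothetical): 3 genes, 3 patients per class.
samples = [[5.0, 1.0, 3.0], [6.0, 1.2, 2.9], [5.5, 0.8, 3.2],
           [1.0, 4.0, 3.1], [1.5, 4.5, 2.8], [0.8, 3.8, 3.0]]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
model = train(samples, labels, n_genes=2)
```

Calling `predict(model, new_sample)` on a new expression vector returns the predicted class. Gene 2 is uninformative in the toy data and is left out of the signature.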
ALIZADEH ET AL 2000
The authors measured gene expression in samples from patients diagnosed with diffuse large B-cell lymphoma, a cancer of B-lymphocytes. Previously, such cancers were divided into low-, intermediate- and high-grade categories based on growth patterns and immunohistochemistry. Alizadeh et al applied hierarchical clustering to these data, revealing a division of the B-cell lymphoma samples into two subtypes of equal size. Moreover, patients in the two subtypes retrospectively showed significant differences in survival (Kaplan-Meier analysis).
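Hierarchical clustering of expression profiles can be sketched as follows. This is a small illustration only: it assumes average linkage with a Pearson-correlation distance, and the four toy profiles are made up, not taken from the lymphoma study.

```python
from math import sqrt
from statistics import mean

def pearson_distance(a, b):
    """1 - Pearson correlation between two expression profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1 - cov / sqrt(va * vb)

def average_linkage(samples, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(samples))]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return mean(pearson_distance(samples[i], samples[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        _, a, b = min((dist(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy profiles (hypothetical): samples 0 and 1 rise across genes, 2 and 3 fall.
profiles = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2],
            [4.0, 3.0, 2.0, 1.0], [4.2, 2.9, 2.1, 1.0]]
subtypes = average_linkage(profiles, n_clusters=2)
```

Cutting the tree at two clusters gives the candidate subtypes. Whether they differ in survival is a separate, downstream analysis.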
THERAPEUTICS: BUTTE ET AL 2000
Generated hypotheses about functional relationships between pairs of genes and pharmaceuticals. Two databases:
Baseline microarray measurements of 6701 genes in a standardized set of 60 human cancer cell lines.
Drug susceptibility measurements for the same cell lines, across nearly 5000 anti-cancer agents.
Used mutual information to build a graph of associations between baseline RNA expression levels and inhibition of growth by thousands of anti-cancer agents. Discovered a previously unknown association between a gene and a measure of anti-cancer agent susceptibility.
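Mutual information between a gene's expression and a drug's growth inhibition can be estimated from discretized values. The sketch below assumes equal-width binning and a plug-in estimate in bits. The bin count and the function names are illustrative choices, not those of the original study.

```python
from collections import Counter
from math import log2

def discretize(values, n_bins):
    """Map real values to equal-width bins 0 .. n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())
```

An association graph would then link each gene-drug pair whose estimated mutual information exceeds a chosen threshold. With 60 cell lines the estimate is noisy, so such a threshold would normally be calibrated, for example by permuting the cell-line labels.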
HISTOPATHOLOGY: RAMASWAMY ET AL 2001
Histopathology: study of diseased tissue by sectioning, staining, and multi-resolution microscopy. Computational methods can determine whether such samples quantitatively resemble a disease. Using support vector machines, Ramaswamy et al, 2001, predicted the original source of a cancer given just a metastatic sample.
BUT NOT YET
But despite these promising beginnings and post-HGP technological advances, and with few exceptions, the results to date from computational learning are not sufficiently accurate or reproducible for clinical use. In particular, the effect of omics data on drug discovery and the identification of novel, more effective therapies has been limited. And it remains unclear how exactly to extract useful medical knowledge from experimental data, where "useful" means achieving higher precision and patient benefit than can be currently obtained with traditional clinical practice.
DATA
D: High-dimensional, high-throughput genomic data. The traditional approach (experimental and molecule-by-molecule) is not feasible for this level of complexity. A principled mathematical approach has become indispensable for extracting knowledge from D. For example, over the last decade statistical learning has emerged as a core methodology for the analysis of D.
BARRIER I: TECHNOLOGICAL
Collecting D usually requires an invasive procedure. D is often of low quality, degraded by lab and batch effects. The number of samples in D is usually insufficient to represent the populations under study, and too small to generate results which are statistically robust and consistent across studies and devices.
With off-the-shelf statistical learning techniques, the learned models and decision rules usually involve nonlinear functions of a great many variables. Consequently, it is difficult to "look under the hood," yet there is added value in transparency for knowledge discovery and treatment design.
The mathematical challenges are formidable because n, the number of samples, is very small relative to d, the number of molecular species assayed. Ideally, n ≫ d. In practice, n ≪ d. Hence, in view of trade-offs between bias and variance, incorporating rich a priori knowledge to constrain the representations of D may be unavoidable.
TYPICAL CASES
Detect disease phenotypes from microarray data with d = 10,000 transcripts and n = 100 (or fewer) patients.
Measure the degree of phenotypic-regulation in pathways with d = 100 genes for n = 100 samples.
Infer the statistics (and possibly the wiring diagram) of signaling and gene regulatory networks with d = 100 variables (genes, proteins) and n = 100 samples.
Standard lab report (e.g., from blood or urine) uses biomarkers X_1, X_2, ...:
If X_i > τ, check out diseases a, b.
Measuring interactions among genetic markers and molecular concentrations can reveal more information:
If g(X_i, X_j, X_k) > τ, check ...
But what is the right set {g} of functions, models or combinatorial logic? What will small n, large d allow?
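To make the contrast concrete, here is a hedged sketch of the two kinds of rule. The pairwise comparison g(X_i, X_j) = X_i - X_j is one hypothetical member of the family {g}, chosen here only because it has few parameters and is unaffected by a common rescaling of the measurements. The slides do not prescribe it, and the threshold and values are made up.

```python
def single_marker_rule(x, i, tau):
    """Flag a sample when marker i exceeds a fixed threshold tau."""
    return x[i] > tau

def pair_comparison_rule(x, i, j):
    """Flag a sample when marker i exceeds marker j, i.e. g = X_i - X_j > 0."""
    return x[i] - x[j] > 0

sample = [2.0, 5.0, 3.0]              # hypothetical marker concentrations
rescaled = [10 * v for v in sample]   # same sample under a multiplicative batch effect
```

The single-marker rule on marker 0 with tau = 4.0 changes its answer between `sample` and `rescaled`, while the pair comparison of markers 1 and 0 does not. Rules with so few free parameters are one way to cope with small n and large d.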
Standard treatment (prognosis, choice of drugs) is based on population statistics. But there is massive diversity among tumors of the same general category, due to differing mutational signatures. How rich a sub-categorization can we afford based on the available data?
Diseases are often the result of perturbed biomolecular networks, leading to differences in the abundances of biomolecules (e.g., mRNA, proteins, metabolites). Analyzing these differences enables learning predictors of disease presence, status and response to treatment. In particular, transcriptomics provides the global mRNA expression of a particular tissue, exposing transcriptional differences among diseases. What marker interactions do these data reveal?
DANGEROUS LIAISONS
PACKAGING IN THE NUCLEUS
The human genome is about 1.8 meters long and the nucleus of a cell is 6 × 10^-6 meters in diameter. DNA is carefully packaged into the nucleus in a regulated way; packaging is correlated with gene expression and therefore phenotype.
RNA
Ribonucleic acid Single-stranded sugar-phosphate backbone with nucleotides. Uracil(U) instead of Thymine(T). Can bind to DNA or RNA. Roles: information carrier, regulatory, enzymatic.
TYPES OF RNA
mRNA: takes info from DNA and encodes proteins
rRNA: platform and environment for protein synthesis
tRNA: brings amino acids in to form proteins
miRNA: regulatory and other functions
siRNA: regulatory and other functions
ribozymes: huge array of functions
CENTRAL DOGMA
TRANSCRIPTION: DNA TO RNA
Lots of regulation required. DNA in the nucleus is very compact. RNA polymerase needs associated factors in order to bind.
TRANSLATION: RNA TO PROTEIN
Amino acids are encoded by triplets of nucleotides called codons. The code is non-overlapping and comma-free. It is also redundant: there are 64 possible codons but only 20 amino acids, plus three stop codons (UAA, UAG, UGA). The start codon is AUG (methionine).
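The redundancy of the code is easy to check by building the standard codon table and translating a short message. A minimal sketch: the table uses the standard genetic code with one-letter amino acid symbols and `*` for stop, and the function name `translate` is just an illustrative choice.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna):
    """Translate from the first AUG, reading non-overlapping triplets until a stop."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("AUGGCCUAA")` reads AUG (M), GCC (A), then stops at UAA.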
PROTEINS
Polymers with 20 amino acids as building blocks. No complementary pairing. Perform virtually all work in the organism: enzymes, transport, signaling. Proteins come in many different shapes and sizes.
CHROMOSOMES
Each chromosome is a single DNA molecule. Species have different chromosome numbers and layouts.
Prokaryotes: one single circular chromosome, no nucleus.
Viruses: lots of little pieces.
Humans: 22 chromosomes plus sex chromosomes (X and Y) (haploid numbers); humans (like most eukaryotes) are diploid.
[Figure labels: mom, dad, kid]
EPIGENETICS
Means "outside the gene." The epigenome controls cell-type-specific behaviors. Epigenetic marks and modifications do not alter the DNA sequence. These marks have profound functional consequences, are heritable, and are responsible for imprinting. They can cause or be altered by disease. Examples:
Cytosine methylation (CpG)
Histone modifications (methylation, acetylation, etc.)
Other nucleotide modifications (hydroxy-A, hydroxy-C, etc.)
EPIGENETICS IN CANCER
Cancer cells show serious disruption in overall methylation; generally hypomethylated but extremely patchy. Methylation suppresses transcription and is important in gene regulation (and transposon regulation). Dysregulation of epigenetic marks causes large-scale changes in gene expression. Successful drugs have come from HDAC inhibitors and DNMT inhibitors. Methylation status is obtained from microarrays and sequencing technologies.
CANCER
A disease of the genes due to the accumulation of genetic alterations over time that leads to uncontrolled cell growth and proliferation. An acquired genetic disorder. Ninety percent of deaths result from metastasis, meaning that cancer cells migrate to distant organs and replace normal cells until the organ no longer functions. Differences in age at the onset of cancer reflect different latency periods of the various types of cancer.
FITNESS OF CANCER CELLS
Ability to proliferate.
Propensity to invade: break away from the tumor and enter surrounding tissues.
Ability to metastasize: spread to a non-adjacent body organ through the bloodstream.
Resistance to drugs and therapies, e.g., insensitivity to drug-induced apoptosis.
MUTATIONS
In a group of replicating cells, the probability of a mutation arising from DNA copying is 10^-9 per nucleotide per cell division. Should a cell acquire a mutation in a gene that confers a growth advantage, it will begin to outgrow its brethren. More replication events mean more mutation events. Subsequent hits can affect whether or not cells die when they are supposed to (a process called apoptosis), and cells that do not die can replicate more. When members of the clonal population have accrued enough mutations, they will experience gross changes in morphology and longevity, and they will be able to break away to lodge in other organs (a process called metastasis).
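A back-of-the-envelope sketch of what this rate implies. The genome size of about 3.1 × 10^9 base pairs per haploid copy is an assumption added here, not a number from the slides, and DNA repair and selection are ignored.

```python
MUTATION_RATE = 1e-9      # per nucleotide per cell division (from the slide)
GENOME_SIZE_BP = 3.1e9    # assumed haploid human genome size in base pairs

def expected_new_mutations(divisions, ploidy=2):
    """Expected number of copying errors accumulated along one cell lineage."""
    return MUTATION_RATE * GENOME_SIZE_BP * ploidy * divisions
```

Under these assumptions a diploid cell picks up on the order of six new mutations per division. The number of hits along a lineage therefore grows linearly with the number of replication events.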
PICTURES
Panels: Normal Colon, Adenoma, Carcinoma
Melstrom et al, doi:10.1158/1078-0432.CCR-07-4631
Cancer can be observed histologically as deviation from normal morphology and biochemically as deviation from normal gene expression. Here you can see both in the gradual dysplasia of cells progressing from normal colon cells to carcinoma. Cells are stained for expression of genes involved in cell growth and magnified 20X. More brown means higher expression of a growth gene.
PROGRESSION
Alberts et al, Molecular Biology of the Cell
During progression, many mutations may be acquired that neither help nor slow cancer development. These are passenger mutations: noise that adds to the difficulty of identifying cancer processes.
CANCER INVASION AND METASTASIS
Cell surface of a liver showing multiple metastatic nodules originating from pancreatic cancer
TYPES OF GENES IMPLICATED
Oncogenes: Protein-coding genes that are up-regulated in cancer. Mutations render these genes constitutively active.
Tumor Suppressor Genes: Protein-coding genes that are down-regulated in cancer. Mutations reduce the activity of their gene products.
Genetic Instability Genes: Responsible for repairing subtle mistakes due to DNA replication or exposure to carcinogens, e.g., mismatch repair, nucleotide-excision repair, base-excision repair.
TYPES (CONT)
Tumor Suppressor Genes
loss of function ~ no brakes
APC, p53, RB1, NF1
Oncogenes
gain of function ~ stuck gas pedal
K-ras, RET, KIT, MET
Growth Advantage
Mutation Rate
TUMORIGENESIS
Discussion
Genetic Progression and the Waiting Time to Cancer
Niko Beerenwinkel (1), Tibor Antal (1), David Dingli (1), Arne Traulsen (1), Kenneth W. Kinzler (2), Victor E. Velculescu (2), Bert Vogelstein (2, 3), Martin A. Nowak (1)
(1) Program for Evolutionary Dynamics, Harvard University, Cambridge, Massachusetts, United States of America; (2) Ludwig Center, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, Maryland, United States of America; (3) Howard Hughes Medical Institute, Johns Hopkins University, Baltimore, Maryland, United States of America
Cancer results from genetic alterations that disturb the normal cooperative behavior of cells. Recent high-throughput genomic studies of cancer cells have shown that the mutational landscape of cancer is complex and that individual cancers may evolve through mutations in as many as 20 different cancer-associated genes. We use data published by Sjoblom et al. (2006) to develop a new mathematical model for the somatic evolution of colorectal cancers. We employ the Wright-Fisher process for exploring the basic parameters of this evolutionary process and derive an analytical approximation for the expected waiting time to the cancer phenotype. Our results highlight the relative importance of selection over both the size of the cell population at risk and the mutation rate. The model predicts that the observed genetic diversity of cancer genomes can arise under a normal mutation rate if the average selective advantage per mutation is on the order of 1%. Increased mutation rates due to genetic instability would allow even smaller selective advantages during tumorigenesis. The complexity of cancer progression can be understood as the result of multiple sequential mutations, each of which has a relatively small but positive effect on net cell growth.
Balaji Veeramani & Sarah Richardson 550.635 Topics in Bioinformatics Monday September 13, 2010
PER GENE IN
A panel of 35 tumors and 78 genes ordered by frequency of mutation in colon cancers. The number of driver genes that must acquire mutations is small for colon cancer.
Figure 2. Mutational Patterns in 35 Late-Stage Colorectal Cancer Tumors from Sjoblom et al. (2006). Matrix rows are indexed by tumors, columns are indexed by cancer-associated genes as identified by Sjoblom et al. (2006). Dark spots indicate mutated genes. Both tumors and genes have been sorted by an increasing number of mutations. The three genes mutated most often are APC (in 24 tumors; last
[Figure labels: K-ras, p53, APC]
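WF (CONT)
Let $\mathbf{N}(t) = (N_0(t), \ldots, N_d(t))$, where $N_k(t)$ is the number of cells in state $k$ at generation $t$, and let $N(t) = \sum_{k=0}^{d} N_k(t)$ be the population size. All cells are generated independently at each generation. Assume that the population size follows a deterministic evolution. Let $\theta(k \mid \mathbf{N}(t))$ be the conditional probability that a cell is in state $k$ at time $t+1$ given $\mathbf{N}(t)$. Then
$$P(\mathbf{N}(t+1) \mid \mathbf{N}(t)) = \frac{N(t+1)!}{N_0(t+1)! \cdots N_d(t+1)!} \prod_{k=0}^{d} \theta(k \mid \mathbf{N}(t))^{N_k(t+1)}.$$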
WF (CONT)
Suppose each cell picks its parent prototype at random. Selective advantage is modeled by assigning weights $(w_0, \ldots, w_d)$ to the parents:
$$\theta(k \mid \mathbf{N}(t)) = \frac{w_k N_k(t)}{w_0 N_0(t) + \cdots + w_d N_d(t)}.$$
Let $w_k = (1+s)^k$, where $s$ is the selective advantage. The mutation rate $u$ is the probability for each locus to mutate from one generation to the next. Then
$$\theta(k \mid \mathbf{N}(t)) = \sum_{j=0}^{k} \cdots$$
The parameter $\theta_j$ is the probability that a cell in the next generation will have $j$ mutations. If the mutation rate is small, $u \ll 1$, we can neglect multiple mutations, and $\theta_j$ simplifies to
$$\theta_j = \frac{(1+s)^j x_j}{\sum_{\ell=0}^{d} (1+s)^{\ell} x_{\ell}} + u(d-j+1)\,\frac{(1+s)^{j-1} x_{j-1}}{\sum_{\ell=0}^{d} (1+s)^{\ell} x_{\ell}},$$
where $x_j = N_j(t)/N(t)$ is the fraction of cells with $j$ mutations.
The first term is the probability of producing an additional cell of type $j$ without mutation, while the second term is the probability that a cell of type $j-1$ is picked as parent and one of its $d-j+1$ unmutated loci mutates.
Portions of code omitted for illustration. The Matlab command 'mnrnd' is very sensitive to floating-point effects.
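One generation of this Wright-Fisher process can be sketched in plain Python rather than Matlab. This is an illustration under stated assumptions, not the omitted course code. It uses the small-u approximation: the probability of type j is (1+s)^j x_j plus u(d-j+1)(1+s)^(j-1) x_(j-1), both divided by the sum of (1+s)^l x_l over all types, with x_j the fraction of cells with j mutations. The function names are hypothetical. Because the approximate probabilities sum to slightly more than 1, the sampler relies on `random.choices` normalizing its weights.

```python
import random

def theta(counts, s, u):
    """Small-u approximation of the probability that a new cell has j mutations."""
    d = len(counts) - 1
    total = sum(counts)
    x = [c / total for c in counts]                       # type frequencies x_j
    z = sum((1 + s) ** l * x[l] for l in range(d + 1))    # mean fitness
    probs = []
    for j in range(d + 1):
        no_mutation = (1 + s) ** j * x[j] / z
        one_mutation = (u * (d - j + 1) * (1 + s) ** (j - 1) * x[j - 1] / z
                        if j > 0 else 0.0)
        probs.append(no_mutation + one_mutation)
    return probs

def next_generation(counts, s, u, n_next, rng):
    """Draw the next generation: n_next independent cells, i.e. a multinomial sample."""
    types = rng.choices(range(len(counts)), weights=theta(counts, s, u), k=n_next)
    new_counts = [0] * len(counts)
    for j in types:
        new_counts[j] += 1
    return new_counts
```

Passing `n_next` explicitly reflects the assumption that the population size evolves deterministically. Iterating `next_generation` with a growing `n_next` and an explicit `random.Random(seed)` gives a reproducible simulation of the mutation classes over time.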
WAITING TIME FOR CANCER
$$t_k = k\,\frac{\left(\log \frac{s}{ud}\right)^2}{s \,\log\left(N_{\mathrm{init}} N_{\mathrm{fin}}\right)}$$
N = number of cells in the population (N_init initially, N_fin at the end)
s = constant selective advantage, > 0
u = constant mutation rate per gene
d = number of driver genes considered
k = number of driver genes with mutations
TRAVELING WAVE
u = 10^-7, N_init = 10^7, N_fin = 10^9, d = 100, s = 0.1: for k = 20, t_k will be between 5 and 15 years.
u = 10^-7, N_init = 10^6, N_fin = 10^9, s = 0.01, d = 100 (Figure 3): for k = 20, t_k will be between 5 and 15 years.
A single simulation. The first mutations in a homogeneous wild-type population set off a traveling wave. Each class has a Gaussian distribution; turnover is 1 cell division per cell per day.
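As a rough numerical check of the waiting-time expression t_k = k (log(s/(ud)))^2 / (s log(N_init N_fin)), the sketch below evaluates it for the parameter set u = 10^-7, N_init = 10^6, N_fin = 10^9, s = 0.01, d = 100. Two points are assumptions made here: that log is the natural logarithm, and that one generation corresponds to one day (taken from the stated turnover of 1 cell division per cell per day). The function name is hypothetical.

```python
import math

def waiting_time_generations(k, s, u, d, n_init, n_fin):
    """Approximate number of generations until k driver mutations have accumulated."""
    return k * math.log(s / (u * d)) ** 2 / (s * math.log(n_init * n_fin))

generations = waiting_time_generations(k=20, s=0.01, u=1e-7, d=100,
                                       n_init=1e6, n_fin=1e9)
years = generations / 365  # assuming one generation per day
```

Under these assumptions the result is a few thousand generations, which falls inside the quoted range of 5 to 15 years.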