° Genome-wide association studies (GWASs)
° Single-nucleotide polymorphism (SNP)
° High-throughput genotyping technologies
° Alzheimer's disease (AD):
° AD afflicts about 10% of persons over 65 and almost half of those over 85
° ~5.5 million cases currently in the U.S.
° 95% of all AD cases are Late-Onset AD (LOAD)
° Source
TGEN dataset by Reiman et al.*
° Cases
° 1411 individuals
° 861 LOAD and 550 controls
° SNPs
° 312,316 SNPs
° Two additional SNPs (rs429358 and rs7412) genotyped separately (these determine APOE status)
____________________________________________________________________
* Reiman E, Webster J, Myers A, Hardy J, Dunckley T, Zismann V, et al. GAB2 alleles
modify Alzheimer's risk in APOE epsilon4 carriers. Neuron. 2007;54(5):713-20.
° Bayesian Model Averaging
° Represents uncertainty about the correctness of any given model
° Performs inference by weighting the prediction of each model by our uncertainty in that model
° Model-Averaged Naïve Bayes (MANB)
° MANB efficiently averages over all naive Bayes models (on a given set of variables) in making a prediction for an individual patient case
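To make the object being averaged concrete, here is a toy sketch of how a single naive Bayes model combines per-SNP likelihoods into a posterior for LOAD. All numbers and names are made up for illustration; they are not from the TGEN data or the authors' implementation.

```python
# Toy naive Bayes posterior for a binary class (LOAD) given binary SNP
# features. The probabilities below are illustrative, not estimated.

def nb_posterior(prior_load, likelihoods, genotypes):
    """P(LOAD=1 | genotypes) under one naive Bayes model.

    likelihoods[i] = (P(snp_i=1 | LOAD=1), P(snp_i=1 | LOAD=0))
    genotypes[i] in {0, 1}
    """
    joint1 = prior_load          # running product for LOAD = 1
    joint0 = 1.0 - prior_load    # running product for LOAD = 0
    for (p1, p0), g in zip(likelihoods, genotypes):
        joint1 *= p1 if g == 1 else 1.0 - p1
        joint0 *= p0 if g == 1 else 1.0 - p0
    return joint1 / (joint1 + joint0)   # normalize over the two classes

post = nb_posterior(0.5, [(0.8, 0.3), (0.6, 0.5)], [1, 0])
```

MANB averages this prediction over every subset of SNPs that could be included in the model, i.e. over 2^n such naive Bayes structures.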
@
LOAD
LOAD
LOAD
SNP 1 SNP 2
͙ SNP
312,318
@ @"
*
ÿ ÿ
6*
6
6
66
@ @"
@ @"
° We can take advantage of the conditional independence
relationships in NB models to make it efficient to model
average over all those many models.
° The computational ͞trick͟ is as follows*
° For each O we construct a model-averaged conditional
probability, (O | ), by averaging over whether or not
there is an arc from to O
° We use these model-averaged conditional probabilities to define a
new NB model M over which we now perform NB inference.
° Performing inference with M is the same as model averaging over
the exponential number of NB models discussed previously.
____________________________________________________________________
* Dash D, Cooper G. Exact model averaging with naive Bayesian classifiers.
International Conference on Machine Learning (2002) 91 - 98.
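The per-SNP averaging step can be sketched as follows. When the arc is absent, the SNP is independent of LOAD, so its conditional probability reduces to the marginal P(X); the averaged conditional is therefore a mixture of the "arc" and "no-arc" estimates, weighted by the posterior probability of the arc. The weights and probabilities below are hypothetical (in MANB the arc posteriors are computed from the data):

```python
# Sketch of the model-averaged conditional probability used by MANB
# (after Dash & Cooper, 2002). For each SNP, mix the "arc present"
# estimate P(snp | LOAD) with the "arc absent" estimate P(snp),
# weighted by the posterior probability of the arc.

def manb_conditional(arc_posterior, p_snp_given_load, p_snp_marginal):
    """Model-averaged P(snp=1 | LOAD) for each of the two class values."""
    return tuple(
        arc_posterior * p_cond + (1.0 - arc_posterior) * p_snp_marginal
        for p_cond in p_snp_given_load
    )

# Arc strongly supported: averaged table stays close to P(snp | LOAD).
strong = manb_conditional(0.95, (0.8, 0.3), 0.55)
# Arc weakly supported: averaged table collapses toward the marginal P(snp).
weak = manb_conditional(0.05, (0.8, 0.3), 0.55)
```

Running ordinary NB inference with these averaged tables reproduces the exact average over all 2^n structures, because the arc-independence prior lets the averaging factor per SNP.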
° Structure priors
° FSNB and MANB assume each arc is present with some probability p, independent of the status of other arcs in the model.
° Informed by the literature, we chose a value of p that yields an expected number of arcs of 20.
° Parameter priors
° If we think of P(SNP | LOAD) as defining a table of probabilities, then we assume that every way of filling in that table (consistent with the axioms of probability) is equally likely
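Both priors have simple concrete forms. The arc-inclusion probability follows from the expected arc count over the candidate arcs, and a uniform prior over probability tables is a Dirichlet(1, …, 1), whose posterior mean is the Laplace-smoothed estimate. A small sketch (the helper name is illustrative, not the authors' code):

```python
# Structure prior: choose p so that the expected number of arcs,
# out of 312,318 candidate LOAD -> SNP arcs, is 20.
N_SNPS = 312_318
EXPECTED_ARCS = 20
p_arc = EXPECTED_ARCS / N_SNPS   # one small inclusion probability per arc

# Parameter prior: "every way of filling in the table is equally likely"
# is a uniform Dirichlet prior, so the posterior-mean estimate of a
# conditional probability is Laplace-smoothed.
def laplace_estimate(count, total, n_values):
    """Posterior-mean P(value | parent) under a Dirichlet(1, ..., 1) prior."""
    return (count + 1) / (total + n_values)

# e.g. 30 of 100 LOAD cases carry one allele of a binary-coded SNP:
p_hat = laplace_estimate(30, 100, 2)   # (30 + 1) / (100 + 2)
```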
° Five-fold cross-validation
° Performance measures
° Area under the ROC curve (AUC) as a measure of discrimination
° Calibration plots and Hosmer-Lemeshow goodness-of-fit statistics
° Run time
° Control algorithms
° NB
° FSNB
[Figure: bar chart of run times — MANB: 1684.2; NB: 16.1; FSNB: 15.6 (p<0.00001)]
[Calibration plot] … with almost all the test cases having probability predictions near 0 or 1. Such extreme predictions occur because there are such a large number of features in the model.
[Calibration plot] FSNB was the best-calibrated algorithm among the three we evaluated. This result is likely due to the FSNB models containing only a few SNP features (< 4).
[Figure: ROC / AUC results for MANB, NB, and FSNB] … We believe this result may be due to FSNB having such a small number of features in its models.
Questions?