Vous êtes sur la page 1sur 12

Lecture 01: ROC curve analysis

and diagnostic test


Example 1.
(A matched-pair case-control study of lung cancer (or other
cancers))
(case: lung cancer)
(control: chest problem
or health control)
Design (retrospective or prospective??) case-control study
Matching? Sex- (and/or age-) matched case(s) and controls.
More matching? cell type genotype
Measure (marker): (continuous?)
X-ray film brightness
Trace element(s) in tissue biopsy or serum sample
A non-standard (but is the most possible)
two-sample relationship:

F(x) G(x)




1

2

3

4
possible cutoff points

Question:
To select a suitable cut-off value (say, choosing among

1
,
2
,
3
,
4
,) in order that future rapid screening criteria
can be established.




1





2





3





4

















ROC curve (analysis)
Definition:
A graphical representation of the trade-off between the
false negative (FN) and false positive (FP) rates for every
possible cut-off.

An ROC curve represents the relationship of two specific
distributions (under the same monotonic transformation
class).

Let the false positive be a variable , then 1-G(F
-1
(1-))
is the ROC curve (needs to be estimated)

The empirical function 1-G
m
(F
n
-1
(1- )) converges to
1-G(F
-1
(1-)) strongly.

Data and Implementation



1-SP 1.0 0.7 0.4 0.1
SN 1.0 1.0 0.6 0.3
Receivers operating characteristic curve
















Source:
http://www.anaesthetist.com/mnm/stats/roc/
A relation with logistic regression model

Mathematical requirement:
log(SN/(1-SN))=log((1-SP)/SP)+, or
y/(1-y)=x/(1-x); =exp() = odds ratio;
y=x(x-x+1)
Homogeneity effect models (class)
(vs. heterogeneity effect)
Area under curve (AUC)
(still using logistic regression)
Assume>0
AUC=ydx= x/(x-x+1) dx
==(-1-log)/(-1)
2
; =exp()
~ 0 ==> AUC~0.5
AUC vs.

The AUC is increasing in beta
The confidence interval (CI) of AUC can be constructed via
the CI of beta (in standard two-sample setting). However,
this is a parametric approach.
Multivariate logistic regression & CI of AUC? Complex!
To combine several markers??
Theoretical thinking and Implementation
Using logistic regression model:
log[p(Y=1|X)/1-p(Y=1|X)]=
0
+
1
X
1
+
2
X
2
++
p
X
p

MLEs: b
1
X
1
+b
2
X
2
++b
p
X
p
, where bs are MLEs ofs





Define a new variable T as
T= b
1
X
1
+b
2
X
2
++b
p
X
p

Then for the n observations,
T
i
= b
1
X
1i
+b
2
X
2i
++b
p
X
pi

(Y
i
,T
i
), i=1,2,n, forms a new two-sample problem with
only one (simple) explanatory variable.
Requirement: At least one variable of (X
1
,X
2
,,X
p
) is
continuous.
Linear discriminant function obtained from
logistic regression model and its companion
ML estimation.
Merit and Drawback
Unified model (Homogeneity model!)
Easy to implement using standard statistical package (SAS,
SPSS,)
Variables selection procedure can also be employed. (SAS,
SPSS)
However, configurations of arbitrary variables leads to
difficulty in interpretation about the mechanism behind
this configuration.

--------------------------------------------------------------------
Comment
1. Case-control design: overestimate the AUC of an ROC.
2. Disease progression is not synchronous with the markers.
3. A follow-up study is needed. (Large population, time
consuming, and great costs.)
Optimal cut-off point?

Y (Sens.)



X (1-spec.)
Loss=X*a+(1-Y)*b,
Or more generally, Loss=f(X)*a+g(Y)*b,
1-Y=false negative with loss b
X=false positive with loss a
X=0 implies Y=1-Loss/b (intercept)
To minimize the Loss is equivalent to
maximize the intercept.

Gold Standard?
1. Biopsy as a gold standard (cancer)
2. Compare with the best-ever diagnosis
3. To fix sensitivity or specificity, and then compare with
another rapid diagnosis, still under 1 or 2.
Example: Creutzfeldt-Jakob Disease (CJD) and
S100 protein (as a marker of diagnosis/early
detection?) [BMJ (1998) Vol. 316, 21, pp. 577-582]

Vous aimerez peut-être aussi