15 May 2008
Logit estimates                                   Number of obs   =        323
                                                  LR chi2(1)      =       4.46
                                                  Prob > chi2     =     0.0348
Log likelihood = -75.469757                       Pseudo R2       =     0.0287
------------------------------------------------------------------------------
       smoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |    .967966   .4547931     2.13   0.033     .0765879    1.859344
 (Intercept) |  -3.058707   .3235656    -9.45   0.000    -3.692884    -2.42453
------------------------------------------------------------------------------

For women, gender=0: $\ln\frac{p}{1-p} = -3.1 + 1.0(0) = -3.1$
For men, gender=1: $\ln\frac{p}{1-p} = -3.1 + 1.0(1) = -2.1$
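As a quick check on the two fitted log odds above, the sketch below plugs the unrounded coefficients from the Stata output into the model. The names `B0`, `B1`, and `log_odds` are illustrative and do not come from the lecture.

```python
# Coefficients copied from the Stata output above (outcome: smoke, predictor: gender).
B0 = -3.058707  # intercept: log odds of smoking for women (gender = 0)
B1 = 0.967966   # gender: difference in log odds, men vs. women


def log_odds(gender: int) -> float:
    """Fitted log odds of smoking; gender is 0 for women, 1 for men."""
    return B0 + B1 * gender


print(f"women: {log_odds(0):.1f}")  # agrees with the slide after rounding
print(f"men:   {log_odds(1):.1f}")
```

The slide values of -3.1 and -2.1 are these results rounded to one decimal place.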
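Logistic Regression
Interpretation 2: odds scale
$e^{\beta_0}$: the odds of smoking for women (when X=0)
$e^{\beta_0+\beta_1}$: the odds of smoking for men (when X=1)

Comparing odds
If we subtract the log odds, mathematically that's equivalent to dividing inside the log:
$\log(a) - \log(b) = \log(a/b)$
So, if $e^{\beta_0+\beta_1} = e^{-3.1+1.0} = e^{-2.1} = 0.12$ is the odds when X=1, and $e^{\beta_0} = e^{-3.1} \approx 0.05$ is the odds when X=0, then dividing the first by the second compares men with women on the odds scale.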
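Logistic Regression
Interpretation: the odds ratio
$\text{Odds Ratio} = \frac{\text{odds for men}}{\text{odds for women}} = \frac{e^{\beta_0+\beta_1}}{e^{\beta_0}} = \frac{0.12}{0.05} = 2.4$
(The 2.4 comes from dividing the rounded odds; with the unrounded coefficient, $e^{\beta_1} = e^{0.968} \approx 2.63$.)

Useful math – ratios of exponentiated terms
We can usually simplify an equation like this because $\frac{e^a}{e^b} = e^{a-b}$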
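To make the odds and the odds ratio concrete, here is a minimal sketch that exponentiates the unrounded coefficients from the smoking model. The variable names are illustrative. Because it avoids the intermediate rounding used on the slide, the ratio comes out near 2.63 rather than 2.4.

```python
import math

B0 = -3.058707  # intercept from the Stata output (women, gender = 0)
B1 = 0.967966   # gender coefficient from the Stata output

odds_women = math.exp(B0)       # odds of smoking when X = 0
odds_men = math.exp(B0 + B1)    # odds of smoking when X = 1
odds_ratio = odds_men / odds_women

# Dividing the two odds cancels e^{B0}, so the ratio equals e^{B1}.
assert math.isclose(odds_ratio, math.exp(B1))

print(f"odds (women) = {odds_women:.3f}")
print(f"odds (men)   = {odds_men:.3f}")
print(f"odds ratio   = {odds_ratio:.2f}")
```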
Taking a ratio of odds to get the odds ratio
$\frac{e^{\beta_0+\beta_1}}{e^{\beta_0}} = e^{\beta_1}$: the odds ratio comparing the odds when X=1 vs. X=0

Two interpretations of logistic regression slopes
$e^{\beta_1}$ = odds ratio
But we started with P(Y=1)
Can we find that?
More useful math – how to get the probability from the odds
$\text{odds} = \frac{\text{probability}}{1 - \text{probability}}$
$\text{probability} = \frac{\text{odds}}{1 + \text{odds}}$
so $P(Y=1 \mid X=1) = \frac{e^{\beta_0+\beta_1}}{1 + e^{\beta_0+\beta_1}}$

Finding the probability from the log odds
Find the log odds:
  For X=0: log(odds) = $\beta_0$
  For X=1: log(odds) = $\beta_0 + \beta_1$
Find odds:
  For X=0: odds = $e^{\beta_0}$
  For X=1: odds = $e^{\beta_0+\beta_1}$
Transform odds into probability: (next slide…)
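Finding the probability from the log odds, cont…
Transform odds into probability: $p = \frac{\text{odds}}{1 + \text{odds}}$
  For X=0: probability = $\frac{e^{\beta_0}}{1 + e^{\beta_0}}$
  For X=1: probability = $\frac{e^{\beta_0+\beta_1}}{1 + e^{\beta_0+\beta_1}}$

We could even go one step further
Relative Risk (RR) = $\frac{p_1}{p_2}$
  For X=1: $P(\text{smoke} \mid \text{male}) = \frac{e^{\beta_0+\beta_1}}{1 + e^{\beta_0+\beta_1}}$
  For X=0: $P(\text{smoke} \mid \text{female}) = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$
Relative Risk for Men vs. Women: $\frac{p_1}{p_2} = \frac{e^{\beta_0+\beta_1} / (1 + e^{\beta_0+\beta_1})}{e^{\beta_0} / (1 + e^{\beta_0})}$
  no way to simplify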
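The steps above (log odds, then odds, then probability, then relative risk) can be sketched numerically with the unrounded coefficients from the smoking model. The helper name `prob_from_log_odds` is an assumption for illustration only.

```python
import math

B0, B1 = -3.058707, 0.967966  # intercept and gender coefficient from the Stata output


def prob_from_log_odds(log_odds: float) -> float:
    """Convert log odds to a probability via odds / (1 + odds)."""
    odds = math.exp(log_odds)
    return odds / (1 + odds)


p_women = prob_from_log_odds(B0)        # P(smoke | female), X = 0
p_men = prob_from_log_odds(B0 + B1)     # P(smoke | male),   X = 1
relative_risk = p_men / p_women
odds_ratio = math.exp(B1)

print(f"P(smoke | female) = {p_women:.3f}")
print(f"P(smoke | male)   = {p_men:.3f}")
print(f"relative risk     = {relative_risk:.2f}")
print(f"odds ratio        = {odds_ratio:.2f}")
```

Unlike the odds ratio, the relative risk depends on both coefficients, which is why it has to be computed from the two probabilities rather than read off from $\beta_1$ alone.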
Remember to consider study design
We can always calculate the relative risk
The relative risk is not appropriate for case-control studies
  Again, because the investigators decide the number of cases and controls to study
The odds ratio is appropriate for all study designs

In General
Logistic regression for a binary outcome
Left side of equation is log odds
Can transform the equation to find
  odds
  the probability
Can compare two groups
  difference of log odds ≡ log odds ratio
  odds ratio
  relative risk
(Almost) everything we learned before applies
Summary: Useful math for logistic regression
If $\log(a) = b$, then $\exp(\log(a)) = a = e^b$
  X=1: log(odds) = $\beta_0 + \beta_1(1)$, so odds for (X=1) = $e^{\beta_0+\beta_1}$
$\log(a) - \log(b) = \log(a/b)$
  so log(odds|X=1) – log(odds|X=0) = log(OR for X=1 vs. X=0)
$\frac{e^a}{e^b} = e^{a-b}$, so $\frac{e^{\beta_0+\beta_1}}{e^{\beta_0}} = e^{\beta_1}$
  Also: $e^{a+b} = e^a \times e^b$, so $e^{2\beta_1} = e^{\beta_1} \times e^{\beta_1} = (e^{\beta_1})^2$
$\text{probability} = \frac{\text{odds}}{1 + \text{odds}}$
  so probability for (X=1) = $\frac{e^{\beta_0+\beta_1}}{1 + e^{\beta_0+\beta_1}}$

Another Example
Regular physical examination is an important preventative public health measure
We'll study this outcome using the public health graduate student dataset
Outcome: No physical exam in the past two years
Primary predictor: age (centered)
Secondary predictor and potential confounder: regularly taking a multivitamin
Problem with outcome variable:
The original "physician visit" variable (time since last physician visit) was meant to be continuous, but it was collected categorically
Since it is now categorical and we wish to use it as the outcome for a regression model, we will make it binary and use logistic regression
Phys = 1 if over 2 years
       0 if 2 years or less

Length of time since last |
                 check-up |      Freq.     Percent        Cum.
--------------------------+-----------------------------------
     Within the past year |        182       54.17       54.17
Within the past 1-2 years |         72       21.43       75.60
Within the past 2-5 years |         53       15.77       91.37
          5 or more years |         29        8.63      100.00
--------------------------+-----------------------------------
                    Total |        336      100.00

Goals
Predict Phys (no physician visit within the past two years=1) with centered Age (continuous)
After adjusting for age, is taking a multivitamin (1=yes) a statistically significant predictor for not regularly visiting a physician?
Is taking a multivitamin a confounder for the age-physician visit relationship?
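The dichotomization of the check-up variable can be sketched directly from the frequency table. The dictionary below simply re-enters the tabulated counts; the variable names are illustrative, and the original dataset is not reproduced here.

```python
import math

# Frequencies copied from the check-up table above.
counts = {
    "Within the past year": 182,
    "Within the past 1-2 years": 72,
    "Within the past 2-5 years": 53,
    "5 or more years": 29,
}
# Categories coded Phys = 1 (more than 2 years since the last check-up).
over_two_years = {"Within the past 2-5 years", "5 or more years"}

n_total = sum(counts.values())
n_phys1 = sum(n for label, n in counts.items() if label in over_two_years)
p_phys1 = n_phys1 / n_total                          # sample proportion with Phys = 1
sample_log_odds = math.log(p_phys1 / (1 - p_phys1))  # log odds of Phys = 1

print(f"Phys = 1: {n_phys1} of {n_total} ({p_phys1:.1%})")
print(f"sample log odds = {sample_log_odds:.2f}")
```

About a quarter of the students fall in the Phys = 1 group, which corresponds to a sample log odds of roughly -1.13.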
Results
Model 1: Intercept and Age
Note that agec = age-30 (centered age)

Logit estimates                                   Number of obs   =        336
                                                  LR chi2(1)      =       0.00
                                                  Prob > chi2     =     0.9567
Log likelihood = -186.71399                       Pseudo R2       =     0.0000
------------------------------------------------------------------------------
     phys_no |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        agec |  -.0009585   .0176509    -0.05   0.957    -.0355536    .0336365
 (Intercept) |  -1.130428   .1270539    -8.90   0.000    -1.379449   -.8814066
------------------------------------------------------------------------------

$\log\frac{p}{1-p} = \beta_0 + \beta_1(\text{Age} - 30) \Rightarrow \log\frac{p}{1-p} = -1.13 - 0.001(\text{Age} - 30)$

Model 1: Interpretation of coefficients on log odds scale
$\beta_0$: the log odds of not visiting a physician for a 30-year-old
$\beta_1$: the difference in the log odds of not visiting a physician for a one year increase in age
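A minimal sketch of how the fitted Model 1 equation turns an age into a predicted log odds and probability of not visiting a physician. The function names and the example ages are assumptions chosen for illustration; the coefficients are the unrounded values from the Stata output.

```python
import math

B0 = -1.130428    # intercept: log odds for a 30-year-old (agec = 0)
B1 = -0.0009585   # agec: change in log odds per additional year of age


def log_odds_no_visit(age: float) -> float:
    """Fitted log odds of no physician visit in the past two years."""
    return B0 + B1 * (age - 30)


def prob_no_visit(age: float) -> float:
    """Fitted probability, obtained by inverting the logit."""
    return 1 / (1 + math.exp(-log_odds_no_visit(age)))


for age in (25, 30, 40):  # example ages, chosen arbitrarily
    print(age, round(log_odds_no_visit(age), 3), round(prob_no_visit(age), 3))
```

Because the age coefficient is so close to zero, the predicted probability stays near 0.24 across these ages.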
Model 1: How did we get the difference in log odds interpretation of $\beta_1$?
Predictions by age
$\log\frac{p}{1-p} = \beta_0 + \beta_1(\text{Age} - 30) \Rightarrow \log\frac{p}{1-p} = -1.13 - 0.001(\text{Age} - 30)$
For a 30-year-old:
  $\log\frac{p}{1-p} = -1.13 - 0.001(30 - 30) = -1.13$
For a 31-year-old:
  $\log\frac{p}{1-p} = -1.13 - 0.001(31 - 30) = -1.13 - 0.001 = -1.131$
$\beta_1$ is the difference in the log odds associated with a 1 year increase in age

Model 1: Interpretation of $\beta_1$ (diff log odds = log OR)
$\log(a) - \log(b) = \log(a/b)$
so log(odds|X=31) – log(odds|X=30) = log(OR for X=31 vs. X=30)
difference of log odds = log odds ratio
Alternate interpretation for $\beta_1$:
  The log odds ratio of not visiting a physician associated with a one year increase in age
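The claim that the difference in log odds between adjacent ages is always $\beta_1$ can be checked numerically. This is a sketch using the unrounded Model 1 coefficients; the second pair of ages (45 and 46) is an arbitrary choice to show that the starting age does not matter.

```python
import math

B0, B1 = -1.130428, -0.0009585  # Model 1 intercept and agec coefficient


def log_odds(age: float) -> float:
    """Fitted log odds of no physician visit at a given age."""
    return B0 + B1 * (age - 30)


diff_31_vs_30 = log_odds(31) - log_odds(30)
diff_46_vs_45 = log_odds(46) - log_odds(45)

# The intercept cancels in the subtraction, leaving only the slope.
assert math.isclose(diff_31_vs_30, B1)
assert math.isclose(diff_46_vs_45, B1)

odds_ratio_per_year = math.exp(diff_31_vs_30)  # log OR -> OR
print(f"difference in log odds = {diff_31_vs_30:.4f}")
print(f"odds ratio per year    = {odds_ratio_per_year:.3f}")
```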
Logistic regression: Nested models
Adding a single new variable to the model
Model 1: $\log\frac{p}{1-p} = \beta_0 + \beta_1(\text{Age} - 30)$
Model 2: $\log\frac{p}{1-p} = \beta_0 + \beta_1(\text{Age} - 30) + \beta_2(\text{Multivitamin})$

Comparing nested models that differ by one variable
Compare models with p-value or CI
What method is this?
  The Wald test, a test that applies the CLT, like
    Z test comparing proportions in 2x2 table
    $X^2$ test for independence in 2x2 table
  analogous to the t test for linear regression
$H_0$: the new variable is not needed
Or, equivalently
$H_0$: $\beta_{new} = 0$ in the population
Model 2: Results

Logit estimates                                   Number of obs   =        317
                                                  LR chi2(2)      =       7.87
                                                  Prob > chi2     =     0.0195
Log likelihood = -171.80997                       Pseudo R2       =     0.0224
------------------------------------------------------------------------------
     phys_no |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        agec |   .0012855   .0192619     0.07   0.947    -.0364671    .0390381
    multivit |  -.7808889   .2871247    -2.72   0.007    -1.343643   -.2181349
 (Intercept) |  -.8571962    .159519    -5.37   0.000    -1.169848   -.5445446
------------------------------------------------------------------------------

$\log\frac{p}{1-p} = \beta_0 + \beta_1(\text{Age} - 30) + \beta_2(\text{Multivitamin})$
$\Rightarrow \log\frac{p}{1-p} = -0.86 + 0.001(\text{Age} - 30) - 0.78(\text{Multivitamin})$

Conclusion from the Wald test
The p-value for multivitamin is 0.007 (<0.05) and the CI for coefficient multivitamin does not include 0 (CI for OR doesn't include 1)
Reject $H_0$
Conclude that the larger model is better:
  after adjusting for age, multivitamin use is still an important predictor of physician visits in the population
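The Wald statistic, p-value, and confidence interval reported for multivitamin use can be reproduced from the coefficient and its standard error alone. This is a sketch under the usual large-sample normal approximation; it uses Python's standard-library `statistics.NormalDist`, not Stata's own routines.

```python
from statistics import NormalDist

coef, se = -0.7808889, 0.2871247  # multivit row of the Model 2 output

std_normal = NormalDist()                      # mean 0, standard deviation 1
z = coef / se                                  # Wald z statistic
p_value = 2 * (1 - std_normal.cdf(abs(z)))     # two-sided p-value
z_crit = std_normal.inv_cdf(0.975)             # about 1.96 for a 95% interval
ci_low, ci_high = coef - z_crit * se, coef + z_crit * se

print(f"z = {z:.2f}, p = {p_value:.4f}")
print(f"95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

The interval lies entirely below zero, which is the same information as the p-value being below 0.05.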
Model 2: Coefficient interpretation on the log odds scale
$\log\frac{p}{1-p} = \beta_0 + \beta_1(\text{Age} - 30) + \beta_2(\text{Multivitamin})$
$\Rightarrow \log\frac{p}{1-p} = -0.86 + 0.001(\text{Age} - 30) - 0.78(\text{Multivitamin})$
$\beta_0$: the log odds of not visiting a physician for a 30-year-old person who reports not regularly taking multivitamins
$\beta_1$: the log odds ratio of not visiting a physician for a one year increase in age, controlling for multivitamin use
$\beta_2$: the log odds ratio of not visiting a physician for those who take multivitamins compared with those who do not, adjusting for age

Model 2: Interpretation – odds and odds ratio scale
$\exp(\beta_0)$: the odds of not visiting a physician for a 30-year-old person who reports not regularly taking multivitamins
Model 2: Interpretation – odds and odds ratio scale
$\exp(\beta_1)$: after adjusting for multivitamin use, the odds of not visiting a physician change by a factor of $\exp(\beta_1) = 1.001$ for each additional year of age
  additional age is associated with lower frequency of physician visits in these students, but the association is not statistically significant (p>0.05)

Model 2: Interpretation – odds and odds ratio scale
$\exp(\beta_2)$: the odds ratio of not visiting a physician for those who take multivitamins compared with those who do not is $\exp(\beta_2) = 0.46$, adjusting for age
  taking multivitamins is associated with regular physician visits (p=0.007)

$\log\frac{p}{1-p} = \beta_0 + \beta_1(\text{Age} - 30) + \beta_2(\text{Multivitamin})$
$\Rightarrow \log\frac{p}{1-p} = -0.86 + 0.001(\text{Age} - 30) - 0.78(\text{Multivitamin})$
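Moving from the log odds scale to the odds ratio scale is just exponentiation of each estimate and its interval limits. The sketch below does this for the two Model 2 slopes; the dictionary layout is an assumption made for illustration.

```python
import math

# (coefficient, 95% CI lower, 95% CI upper), copied from the Model 2 output.
model2 = {
    "agec": (0.0012855, -0.0364671, 0.0390381),
    "multivit": (-0.7808889, -1.343643, -0.2181349),
}

# exp() is monotone increasing, so exponentiating the limits preserves their order.
odds_ratios = {
    name: tuple(math.exp(value) for value in row) for name, row in model2.items()
}

for name, (or_hat, low, high) in odds_ratios.items():
    print(f"{name}: OR = {or_hat:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

The interval for age straddles 1 while the interval for multivitamin use sits entirely below 1, matching the significance conclusions on the slides.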
Predict Phys (no physician visit within the past two years=1) with Age (continuous)
After adjusting for age, is taking a multivitamin (1=yes) a statistically significant predictor for not regularly visiting a physician?
Is taking a multivitamin a confounder for the age-physician visit relationship?

CI for $\beta_1$ in model 1: (-0.036, 0.034)
Estimate for $\beta_1$ in model 2: 0.001
CI for $\exp(\beta_1)$ in model 1: (exp(-0.036), exp(0.034)) → (0.97, 1.03)
Estimate for $\exp(\beta_1)$ in model 2: exp(0.001) = 1.001
Estimate from model 2 is in original CI: multivitamin use is not a statistically significant confounder
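The confounding check described above amounts to asking whether the adjusted age estimate falls inside the unadjusted confidence interval. A minimal sketch, using the unrounded values from the two Stata outputs (variable names are illustrative):

```python
import math

ci_model1 = (-0.0355536, 0.0336365)  # 95% CI for agec in Model 1 (age only)
beta1_model2 = 0.0012855             # agec estimate in Model 2 (adjusted for multivit)

# Check on the log odds scale.
inside_log_scale = ci_model1[0] <= beta1_model2 <= ci_model1[1]

# Same check on the odds ratio scale; exp() is monotone, so the answer cannot differ.
or_ci_model1 = tuple(math.exp(limit) for limit in ci_model1)
or_model2 = math.exp(beta1_model2)
inside_or_scale = or_ci_model1[0] <= or_model2 <= or_ci_model1[1]

print(f"OR CI (model 1): ({or_ci_model1[0]:.2f}, {or_ci_model1[1]:.2f})")
print(f"OR (model 2): {or_model2:.3f}, inside CI: {inside_or_scale}")
```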
Interpretation of lack of confounding result
The factor by which the odds of irregular physician visits changes for each additional year of age does not change appreciably when we adjust for multivitamin use

Goals: conclusion 1
Predict Phys (no physician visit within the past two years=1) with Age (continuous)
  There is no statistically significant effect of age on physician visits in the population
Summary of Lecture 14