Interpreting Logistic Regression Models

Lecture 14: Interpreting logistic regression models
Sandy Eckel
seckel@jhsph.edu
15 May 2008

Logistic regression
Framework and ideas of logistic regression are similar to linear regression
Still have a systematic and probabilistic part to any model
Coefficients have a new interpretation, based on log(odds) and log(odds ratios)
Recall from last time: The logit function
In logistic regression, we are always modelling the outcome $\log(p/(1-p))$
We define the function: $\text{logit}(p) = \log\left(\frac{p}{1-p}\right)$
We often use the name logit for convenience
In logistic regression, we have the logit on the left-hand side of the equation

Example: Public health graduate students
323 graduate students in introductory biostatistics took a health survey. Current smoking status was assessed, which we will predict with gender.
Associating demographics with smoking is vital to planning public health programs.
Information was also collected on age, exercise, and history of smoking: potential confounders of the association between gender and current smoking.
First we will focus only on the association between gender and current smoking status.
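The logit function recalled above, and its inverse, can be sketched in a few lines of Python. This is only an illustration: the names `logit` and `inv_logit` and the probability 0.25 are chosen here and do not come from the lecture.

```python
import math

def logit(p: float) -> float:
    """Log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    """Inverse of the logit: maps a log odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

p = 0.25                       # illustrative probability, not from the survey
log_odds = logit(p)            # log(0.25 / 0.75) = log(1/3)
p_back = inv_logit(log_odds)   # recovers 0.25 up to floating-point error
print(log_odds, p_back)
```

A probability of 0.5 corresponds to odds of 1 and therefore to a logit of 0; probabilities below 0.5 give negative logits.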
Coding our two variables for the first example
Outcome:
  smoking = 1 for current smokers
            0 for current nonsmokers
Primary predictor:
  gender = 1 for men
           0 for women

Recall: an analogous linear regression model
In linear regression, if we had only one binary X like gender, we would be predicting two means:
$E(Y) = \beta_0 + \beta_1(\text{Gender})$
$\beta_0$: the mean outcome when X=0
$\beta_0 + \beta_1$: the mean outcome when X=1
$\beta_1$: the difference in mean outcome when X=1 vs. when X=0
Logistic regression model and Results
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Gender}) \Rightarrow \log\left(\frac{p}{1-p}\right) = -3.1 + 1.0(\text{Gender})$

Logit estimates                                   Number of obs   =        323
                                                  LR chi2(1)      =       4.46
                                                  Prob > chi2     =     0.0348
Log likelihood = -75.469757                       Pseudo R2       =     0.0287
------------------------------------------------------------------------------
       smoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |    .967966   .4547931     2.13   0.033     .0765879    1.859344
 (Intercept) |  -3.058707   .3235656    -9.45   0.000    -3.692884    -2.42453
------------------------------------------------------------------------------
gender = 1 for men
         0 for women

Logistic Regression: Gender-specific results
$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Gender}) \Rightarrow \ln\left(\frac{p}{1-p}\right) = -3.1 + 1.0(\text{Gender})$
For women, gender=0: $\ln\left(\frac{p}{1-p}\right) = -3.1 + 1.0(0) = -3.1$
For men, gender=1: $\ln\left(\frac{p}{1-p}\right) = -3.1 + 1.0(1) = -2.1$
$\beta_1$ is the difference between men and women
$\beta_1$ is the change in log odds comparing men to women
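As a quick check of the gender-specific results, the log odds can be recomputed from the unrounded coefficients in the Stata output. The variable and function names below are illustrative only.

```python
# Coefficients copied from the Stata output for the smoking model.
b0 = -3.058707   # intercept: log odds of smoking for women (gender = 0)
b1 = 0.967966    # gender: difference in log odds, men vs. women

def log_odds(gender: int) -> float:
    """Fitted log odds of current smoking for gender = 0 (women) or 1 (men)."""
    return b0 + b1 * gender

log_odds_women = log_odds(0)
log_odds_men = log_odds(1)

# Rounded to one decimal place these match the -3.1 and -2.1 on the slide.
print(round(log_odds_women, 1), round(log_odds_men, 1))
```

The difference between the two fitted values is exactly the gender coefficient, which is why the slope is read as a difference in log odds.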
Logistic Regression
Interpretation 1: log(odds) scale
$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Gender}) \Rightarrow \ln\left(\frac{p}{1-p}\right) = -3.1 + 1.0(\text{Gender})$
gender = 1 for men
         0 for women
$\beta_0$: the log odds of smoking for women
$\beta_0 + \beta_1$: the log odds of smoking for men
$\beta_1$: the difference in the log odds of smoking for men compared to women

What if we wanted to get the odds interpretation, not the log odds…
We can start to "untransform" the equations
Recall: if $\log(a) = b$, then $\exp(\log(a)) = a = e^{b}$
For women, X=0: $\log(\text{odds}) = \beta_0 + \beta_1(0) = \beta_0$
  odds of smoking for women $= e^{\beta_0} = e^{-3.1} = 0.05$
For men, X=1: $\log(\text{odds}) = \beta_0 + \beta_1(1)$
  odds of smoking for men $= e^{\beta_0+\beta_1} = e^{-3.1+1.0} = e^{-2.1} = 0.12$
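The "untransform" step is a single exponentiation. A minimal sketch, using the rounded coefficients quoted on the slide (the variable names are assumptions for illustration):

```python
import math

# Rounded coefficients as quoted on the slide.
b0 = -3.1   # log odds of smoking for women
b1 = 1.0    # difference in log odds, men vs. women

odds_women = math.exp(b0)        # odds when X = 0
odds_men = math.exp(b0 + b1)     # odds when X = 1

# To two decimal places these are the 0.05 and 0.12 reported above.
print(round(odds_women, 2), round(odds_men, 2))
```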
Logistic Regression
Interpretation 2: odds scale
$e^{\beta_0}$: the odds of smoking for women (when X=0)
$e^{\beta_0+\beta_1}$: the odds of smoking for men (when X=1)
In the past, we've compared two sets of odds by dividing to find the odds ratio (OR)

Comparing odds
If we subtract the log odds, mathematically that's equivalent to dividing inside the log:
$\log(a) - \log(b) = \log(a/b)$
So, if
  $e^{\beta_0+\beta_1} = e^{-3.1+1.0} = e^{-2.1} = 0.12$ is the odds when X=1, and
  $e^{\beta_0} = e^{-3.1} = 0.05$ is the odds when X=0, then
we want to divide them in order to compare
$\text{Odds Ratio} = \frac{\text{odds for men}}{\text{odds for women}} = \frac{e^{\beta_0+\beta_1}}{e^{\beta_0}} = \frac{0.12}{0.05} = 2.4$
Logistic Regression
Interpretation: the odds ratio
$\text{Odds Ratio} = \frac{\text{odds for men}}{\text{odds for women}} = \frac{e^{\beta_0+\beta_1}}{e^{\beta_0}} = \frac{0.12}{0.05} = 2.4$
The odds of smoking are about 2 ½ times as high for men as for women.
Based on this study, perhaps smoking cessation programs should be targeted toward men

Useful math: ratios of exponentiated terms
We can usually simplify an equation like this
$\text{Odds Ratio} = \frac{e^{\beta_0+\beta_1}}{e^{\beta_0}} = e^{(\beta_0+\beta_1)-(\beta_0)} = e^{\beta_1}$
because $\frac{e^{a}}{e^{b}} = e^{a-b}$
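The simplification to $e^{\beta_1}$ can be checked numerically. The sketch below uses the unrounded coefficients from the Stata output; the variable names are illustrative. Note that the 2.4 on the slide comes from dividing odds that were already rounded to two decimals, so exponentiating the unrounded gender coefficient gives a slightly larger value (roughly 2.6). Both are in line with "about 2 ½".

```python
import math

b0 = -3.058707   # intercept from the Stata output
b1 = 0.967966    # gender coefficient from the Stata output

# Odds ratio computed two ways: as a ratio of odds, and directly as exp(b1).
or_ratio = math.exp(b0 + b1) / math.exp(b0)
or_direct = math.exp(b1)

# Ratio of the rounded odds shown on the slide.
or_from_rounded_odds = 0.12 / 0.05

print(or_ratio, or_direct, or_from_rounded_odds)
```

The first two values agree up to floating-point error, which is the identity $e^{a}/e^{b} = e^{a-b}$ in action; the intercept cancels, so the odds ratio depends only on the slope.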
Taking a ratio of odds to get the odds ratio
$e^{\beta_0}$: the odds when X=0
$e^{\beta_0+\beta_1}$: the odds when X=1
$\frac{e^{\beta_0+\beta_1}}{e^{\beta_0}} = e^{\beta_1}$: the odds ratio comparing the odds when X=1 vs. X=0

Two interpretations of logistic regression slopes
$\beta_0+\beta_1$ = log(odds) (for X=1)
$\beta_1$ = difference in log odds
$e^{\beta_0+\beta_1}$ = odds (for X=1)
$e^{\beta_1}$ = odds ratio
But we started with P(Y=1)
Can we find that?
More useful math: how to get the probability from the odds
$\text{odds} = \frac{\text{probability}}{1 - \text{probability}}$
$\text{probability} = \frac{\text{odds}}{1 + \text{odds}}$
so probability for (X = 1) $= \frac{e^{\beta_0+\beta_1}}{1 + e^{\beta_0+\beta_1}}$

Finding the probability from the log odds
Find the log odds:
  For X=0: $\log(\text{odds}) = \beta_0$
  For X=1: $\log(\text{odds}) = \beta_0 + \beta_1$
Find odds:
  For X=0: $\text{odds} = e^{\beta_0}$
  For X=1: $\text{odds} = e^{\beta_0+\beta_1}$
Transform odds into probability: (next slide…)
Finding the probability from the log odds, cont…
Transform odds into probability: $p = \frac{\text{odds}}{1 + \text{odds}}$
For X = 0: $\text{probability} = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$
For X = 1: $\text{probability} = \frac{e^{\beta_0+\beta_1}}{1 + e^{\beta_0+\beta_1}}$

We could even go one step further
$\text{Relative Risk (RR)} = \frac{p_1}{p_2}$
For X = 1: $P(\text{smoke} \mid \text{male}) = \frac{e^{\beta_0+\beta_1}}{1 + e^{\beta_0+\beta_1}}$
For X = 0: $P(\text{smoke} \mid \text{female}) = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$
Relative Risk for Men vs. Women: $\frac{p_1}{p_2} = \frac{e^{\beta_0+\beta_1} / (1 + e^{\beta_0+\beta_1})}{e^{\beta_0} / (1 + e^{\beta_0})}$
no way to simplify
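The steps from log odds to probability to relative risk can be sketched as follows, again with the unrounded coefficients from the smoking model. The helper name `prob_from_log_odds` is an assumption made for this example.

```python
import math

b0 = -3.058707   # intercept from the Stata output (women)
b1 = 0.967966    # gender coefficient from the Stata output

def prob_from_log_odds(log_odds: float) -> float:
    """probability = odds / (1 + odds), with odds = exp(log odds)."""
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)

p_women = prob_from_log_odds(b0)        # P(smoke | female), X = 0
p_men = prob_from_log_odds(b0 + b1)     # P(smoke | male),   X = 1

relative_risk = p_men / p_women
odds_ratio = math.exp(b1)

print(p_women, p_men, relative_risk, odds_ratio)
```

Unlike the odds ratio, the relative risk still depends on the intercept, which is why the expression does not simplify. Here the relative risk comes out somewhat smaller than the odds ratio, as it always does when the odds ratio exceeds 1.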
Remember to consider study design
We can always calculate the relative risk
The relative risk is not appropriate for case-control studies
  Again, because the investigators decide the number of cases and controls to study
The odds ratio is appropriate for all study designs

In General
Logistic regression for a binary outcome
Left side of equation is log odds
Can transform the equation to find
  odds
  probability
Can compare two groups
  difference of log odds ≡ log odds ratio
  odds ratio
  relative risk
(Almost) everything we learned before applies
Summary: Useful math for logistic regression
If $\log(a) = b$, then $\exp(\log(a)) = a = e^{b}$
  X=1: $\log(\text{odds}) = \beta_0 + \beta_1(1)$, so odds for (X = 1) $= e^{\beta_0+\beta_1}$
$\log(a) - \log(b) = \log(a/b)$
  so $\log(\text{odds} \mid X=1) - \log(\text{odds} \mid X=0) = \log(\text{OR for } X=1 \text{ vs. } X=0)$
$\frac{e^{a}}{e^{b}} = e^{a-b}$, so $\frac{e^{\beta_0+\beta_1}}{e^{\beta_0}} = e^{\beta_1}$
Also: $e^{a+b} = e^{a} \times e^{b}$, so $e^{2\beta_1} = e^{\beta_1} \times e^{\beta_1} = (e^{\beta_1})^{2}$
$\text{probability} = \frac{\text{odds}}{1 + \text{odds}}$, so probability for (X = 1) $= \frac{e^{\beta_0+\beta_1}}{1 + e^{\beta_0+\beta_1}}$

Another Example
Regular physical examination is an important preventative public health measure
We'll study this outcome using the public health graduate student dataset
Outcome: No physical exam in the past two years
Primary predictor: age (centered)
Secondary predictor and potential confounder: regularly taking a multivitamin
Problem with outcome variable:
The original "physician visit" variable was meant to be continuous, but it was collected categorically
  time since last physician visit
Since it is now categorical and we wish to use it as the outcome for a regression model, we will make it binary and use logistic regression
Phys = 1 if over 2 years
       0 if 2 years or less

Length of time since last |
                 check-up |      Freq.     Percent        Cum.
--------------------------+-----------------------------------
     Within the past year |        182       54.17       54.17
Within the past 1-2 years |         72       21.43       75.60
Within the past 2-5 years |         53       15.77       91.37
          5 or more years |         29        8.63      100.00
--------------------------+-----------------------------------
                    Total |        336      100.00

Goals
Predict Phys (no physician visit within the past two years=1) with centered Age (continuous)
After adjusting for age, is taking a multivitamin (1=yes) a statistically significant predictor for not regularly visiting a physician?
Is taking a multivitamin a confounder for the age-physician visit relationship?
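The dichotomized Phys outcome can be tallied directly from the frequency table. This is a small sketch: the dictionary simply restates the table, and the variable names are illustrative.

```python
# Frequencies copied from the check-up table.
freq = {
    "Within the past year": 182,
    "Within the past 1-2 years": 72,
    "Within the past 2-5 years": 53,
    "5 or more years": 29,
}

# Categories coded as Phys = 1 (last check-up more than 2 years ago).
over_two_years = {"Within the past 2-5 years", "5 or more years"}

n_total = sum(freq.values())
n_phys1 = sum(n for label, n in freq.items() if label in over_two_years)
prop_phys1 = n_phys1 / n_total

print(n_total, n_phys1, round(prop_phys1, 3))
```

The resulting proportion is the overall share of students with Phys = 1. It gives a useful sanity check on any fitted intercept for a model with a centered predictor.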
Results
Model 1: Intercept and Age
Note that agec = age-30 (centered age)

Logit estimates                                   Number of obs   =        336
                                                  LR chi2(1)      =       0.00
                                                  Prob > chi2     =     0.9567
Log likelihood = -186.71399                       Pseudo R2       =     0.0000
------------------------------------------------------------------------------
     phys_no |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        agec |  -.0009585   .0176509    -0.05   0.957    -.0355536    .0336365
 (Intercept) |  -1.130428   .1270539    -8.90   0.000    -1.379449   -.8814066
------------------------------------------------------------------------------
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30) \Rightarrow \log\left(\frac{p}{1-p}\right) = -1.13 - 0.001(\text{Age} - 30)$

Model 1: Interpretation of coefficients on log odds scale
$\beta_0$: the log odds of not visiting a physician for a 30-year-old
$\beta_1$: the difference in the log odds of not visiting a physician for a one year increase in age
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30) \Rightarrow \log\left(\frac{p}{1-p}\right) = -1.13 - 0.001(\text{Age} - 30)$
Model 1: How did we get the difference in log odds interpretation of $\beta_1$?
Predictions by age
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30) \Rightarrow \log\left(\frac{p}{1-p}\right) = -1.13 - 0.001(\text{Age} - 30)$
For a 30-year-old: $\log\left(\frac{p}{1-p}\right) = -1.13 - 0.001(30 - 30) = -1.13$
For a 31-year-old: $\log\left(\frac{p}{1-p}\right) = -1.13 - 0.001(31 - 30) = -1.13 - 0.001 = -1.131$
$\beta_1$ is the difference in the log odds associated with a 1 year increase in age

Model 1: Interpretation of $\beta_1$ (diff log odds = log OR)
$\log(a) - \log(b) = \log(a/b)$
so $\log(\text{odds} \mid X=31) - \log(\text{odds} \mid X=30) = \log(\text{OR for } X=31 \text{ vs. } X=30)$
difference of log odds = log odds ratio
Alternate interpretation for $\beta_1$: the log odds ratio of not visiting a physician associated with a one year increase in age
Model 1: Interpretation of $\beta_1$ (OR = ratio of odds) for one year age difference
odds of not visiting a physician $= \frac{p}{1-p} = e^{-1.13 - 0.001(\text{Age} - 30)}$
For a 31-year-old: $\frac{p}{1-p} = e^{-1.13 - 0.001(31 - 30)} = e^{-1.13 - 0.001} = e^{-1.131} = 0.3227$
For a 30-year-old: $\frac{p}{1-p} = e^{-1.13} = 0.3230$
$\text{Odds ratio} = \frac{0.3227}{0.3230} = 0.999 = \frac{e^{\beta_0+\beta_1}}{e^{\beta_0}} = e^{\beta_1}$

Model 1: Interpretation of $\beta_1$
$e^{\beta_0}$ is the odds of not visiting a physician for 30-year-olds
$e^{\beta_0+\beta_1}$ is the odds of not visiting a physician for 31-year-olds
$e^{\beta_1}$ is the odds ratio of not visiting a physician corresponding to a one year increase in age
Model 1: Interpretation of $\beta_1$
What is the OR for a two year age difference?
odds of not visiting a physician $= \frac{p}{1-p} = e^{-1.13 - 0.001(\text{Age} - 30)}$
For a 32-year-old: $\frac{p}{1-p} = e^{-1.13 - 0.001(32 - 30)} = e^{-1.13 - 0.001 \times 2} = e^{-1.132} = 0.3224$
For a 30-year-old: $\frac{p}{1-p} = e^{-1.13} = 0.3230$
$\text{Ratio} = \frac{0.3224}{0.3230} = 0.998 = \frac{e^{\beta_0+2\beta_1}}{e^{\beta_0}} = e^{2\beta_1} = (e^{\beta_1})^{2}$

Model 1: Interpretation of $\beta_1$
What is the OR for a ten year age difference?
odds of not visiting a physician $= \frac{p}{1-p} = e^{-1.13 - 0.001(\text{Age} - 30)}$
For a 40-year-old: $\frac{p}{1-p} = e^{-1.13 - 0.001(40 - 30)} = e^{-1.13 - 0.01} = e^{-1.14} = 0.3198$
For a 30-year-old: $\frac{p}{1-p} = e^{-1.13} = 0.3230$
$\text{Ratio} = \frac{0.3198}{0.3230} = 0.990 = \frac{e^{\beta_0+10\beta_1}}{e^{\beta_0}} = e^{10\beta_1} = (e^{\beta_1})^{10}$
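The one-, two-, and ten-year odds ratios all follow the same pattern, $e^{k\beta_1}$ for a k-year age difference. A minimal sketch using the unrounded Model 1 coefficients from the Stata output (the function names are assumptions for illustration):

```python
import math

b0 = -1.130428    # Model 1 intercept (log odds at age 30)
b1 = -0.0009585   # Model 1 coefficient for agec = age - 30

def odds(age: float) -> float:
    """Fitted odds of not visiting a physician at a given age."""
    return math.exp(b0 + b1 * (age - 30))

def odds_ratio(age_diff: float) -> float:
    """Odds ratio for an age difference of age_diff years: exp(age_diff * b1)."""
    return math.exp(b1 * age_diff)

for k in (1, 2, 10):
    # The ratio of fitted odds agrees with exp(k * b1) for any starting age.
    print(k, round(odds(30 + k) / odds(30), 3), round(odds_ratio(k), 3))
```

Because the model is linear on the log odds scale, the odds ratio for a k-year difference does not depend on the starting age, and it equals the one-year odds ratio raised to the power k.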
Model 1: Interpretation of $\beta_1$
What is the OR for any age difference?
$e^{\beta_1}$ is the factor by which the odds of not visiting a physician are multiplied for a one year increase in age
$(\text{odds for 30-yr-old}) \times \frac{\text{odds for 31-yr-old}}{\text{odds for 30-yr-old}} = (\text{odds for 31-yr-old})$
$(e^{\beta_1})^{10} = e^{10\beta_1}$ is the factor by which the odds of not visiting a physician are multiplied for a ten year increase in age

Model 1: How could we get a Relative Risk?
(if it was appropriate based on our study design)
probability of not visiting a physician $= p = \frac{e^{-1.13 - 0.001(\text{Age} - 30)}}{1 + e^{-1.13 - 0.001(\text{Age} - 30)}}$
For a 40-year-old: $p = \frac{e^{-1.13 - 0.001(40 - 30)}}{1 + e^{-1.13 - 0.001(40 - 30)}} = \frac{e^{-1.13 - 0.01}}{1 + e^{-1.13 - 0.01}} = \frac{e^{-1.14}}{1 + e^{-1.14}} = 0.2423$
For a 30-year-old: $p = \frac{e^{-1.13 - 0.001(0)}}{1 + e^{-1.13 - 0.001(0)}} = \frac{e^{-1.13}}{1 + e^{-1.13}} = 0.2442$
The relative risk (RR) is $\frac{p_1}{p_2} = \frac{0.2423}{0.2442} = 0.992$
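The relative risk calculation above can be reproduced with the rounded coefficients used on the slide. The function name `prob_no_visit` is hypothetical.

```python
import math

# Rounded Model 1 coefficients, as used on the slide.
b0 = -1.13
b1 = -0.001

def prob_no_visit(age: float) -> float:
    """Fitted probability of no physician visit: odds / (1 + odds)."""
    odds = math.exp(b0 + b1 * (age - 30))
    return odds / (1.0 + odds)

p40 = prob_no_visit(40)
p30 = prob_no_visit(30)
rr = p40 / p30

print(round(p40, 4), round(p30, 4), round(rr, 3))
```

The relative risk (0.992) is slightly closer to 1 than the ten-year odds ratio (0.990). The two measures coincide only when the outcome is rare.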
Model 1: Probabilities and Relative Risk for 10 year diff
$\frac{e^{\beta_0}}{1 + e^{\beta_0}}$ is the probability of not visiting a physician for 30-year-olds
$\frac{e^{\beta_0 + \beta_1 \times 10}}{1 + e^{\beta_0 + \beta_1 \times 10}}$ is the probability of not visiting a physician for 40-year-olds
$\frac{e^{\beta_0 + \beta_1 \times 10} / (1 + e^{\beta_0 + \beta_1 \times 10})}{e^{\beta_0} / (1 + e^{\beta_0})}$ is the relative risk of not visiting a physician for 40-year-olds vs. 30-year-olds

Remember those Goals?
Predict Phys (no physician visit within the past two years=1) with Age (continuous)
After adjusting for age, is taking a multivitamin (1=yes) a statistically significant predictor for not regularly visiting a physician?
Is taking a multivitamin a confounder for the age-physician visit relationship?
Logistic regression: Nested models
Adding a single new variable to the model
Model 1: $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30)$
Model 2: $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30) + \beta_2(\text{Multivitamin})$
$H_0$: the new variable is not needed
Or, equivalently
$H_0$: $\beta_{\text{new}} = 0$ in the population

Comparing nested models that differ by one variable
Compare models with p-value or CI
What method is this?
The Wald test, a test that applies the CLT, like
  Z test comparing proportions in 2x2 table
  $\chi^2$ test for independence in 2x2 table
  analogous to the t test for linear regression
Model 2: Results

Logit estimates                                   Number of obs   =        317
                                                  LR chi2(2)      =       7.87
                                                  Prob > chi2     =     0.0195
Log likelihood = -171.80997                       Pseudo R2       =     0.0224
------------------------------------------------------------------------------
     phys_no |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        agec |   .0012855   .0192619     0.07   0.947    -.0364671    .0390381
    multivit |  -.7808889   .2871247    -2.72   0.007    -1.343643   -.2181349
 (Intercept) |  -.8571962    .159519    -5.37   0.000    -1.169848   -.5445446
------------------------------------------------------------------------------
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30) + \beta_2(\text{Multivitamin})$
$\Rightarrow \log\left(\frac{p}{1-p}\right) = -0.86 + 0.001(\text{Age} - 30) - 0.78(\text{Multivitamin})$

Conclusion from the Wald test
The p-value for multivitamin is 0.007 (<0.05) and the CI for the multivitamin coefficient does not include 0 (CI for OR doesn't include 1)
Reject $H_0$
Conclude that the larger model is better: after adjusting for age, multivitamin use is still an important predictor of physician visits in the population
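The Wald z statistic, p-value, and confidence interval reported for multivitamin use can be recomputed from the coefficient and its standard error. This sketch assumes the usual normal approximation (z = coefficient / standard error, two-sided p-value, 95% interval using the 0.975 normal quantile). The function name `wald_test` is illustrative and is not a Stata or library routine.

```python
import math

def wald_test(coef: float, se: float, z_crit: float = 1.959964):
    """Wald z statistic, two-sided normal p-value, and 95% CI for one coefficient."""
    z = coef / se
    # Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (coef - z_crit * se, coef + z_crit * se)
    return z, p_value, ci

# Coefficient and standard error for multivit, copied from the Model 2 output.
z, p_value, (ci_low, ci_high) = wald_test(-0.7808889, 0.2871247)

print(round(z, 2), round(p_value, 3), ci_low, ci_high)
```

The whole interval lies below 0, which is the same statement as the p-value being below 0.05 and as the interval for the odds ratio excluding 1.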
Model 2: Coefficient interpretation on the log odds scale
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30) + \beta_2(\text{Multivitamin})$
$\Rightarrow \log\left(\frac{p}{1-p}\right) = -0.86 + 0.001(\text{Age} - 30) - 0.78(\text{Multivitamin})$
$\beta_0$: the log odds of not visiting a physician for a 30-year-old person who reports not regularly taking multivitamins
$\beta_1$: the log odds ratio of not visiting a physician for a one year increase in age, controlling for multivitamin use
$\beta_2$: the log odds ratio of not visiting a physician for those who take multivitamins compared with those who do not, adjusting for age

Model 2: Interpretation on the odds and odds ratio scale
$\exp(\beta_0)$: the odds of not visiting a physician for a 30-year-old person who reports not regularly taking multivitamins
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30) + \beta_2(\text{Multivitamin})$
$\Rightarrow \log\left(\frac{p}{1-p}\right) = -0.86 + 0.001(\text{Age} - 30) - 0.78(\text{Multivitamin})$
Model 2: Interpretation on the odds and odds ratio scale
$\exp(\beta_1)$: after adjusting for multivitamin use, the odds of not visiting a physician change by a factor of $\exp(\beta_1) = 1.001$ for each additional year of age
  additional age is associated with lower frequency of physician visits in these students, but the association is not statistically significant (p>0.05)
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30) + \beta_2(\text{Multivitamin})$
$\Rightarrow \log\left(\frac{p}{1-p}\right) = -0.86 + 0.001(\text{Age} - 30) - 0.78(\text{Multivitamin})$

Model 2: Interpretation on the odds and odds ratio scale
$\exp(\beta_2)$: the odds ratio of not visiting a physician for those who take multivitamins compared with those who do not is $\exp(\beta_2) = 0.46$, adjusting for age
  taking multivitamins is associated with regular physician visits (p=0.007)
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30) + \beta_2(\text{Multivitamin})$
$\Rightarrow \log\left(\frac{p}{1-p}\right) = -0.86 + 0.001(\text{Age} - 30) - 0.78(\text{Multivitamin})$
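The Model 2 odds ratios, and the matching confidence intervals on the odds ratio scale, come from exponentiating the coefficients and the interval endpoints. A short sketch using values copied from the Stata output (the dictionary names are illustrative):

```python
import math

# Coefficients and 95% confidence limits from the Model 2 output.
coef = {"agec": 0.0012855, "multivit": -0.7808889}
ci = {"agec": (-0.0364671, 0.0390381), "multivit": (-1.343643, -0.2181349)}

# Exponentiate to move from the log odds ratio scale to the odds ratio scale.
odds_ratios = {name: math.exp(b) for name, b in coef.items()}
or_ci = {name: (math.exp(lo), math.exp(hi)) for name, (lo, hi) in ci.items()}

for name in coef:
    print(name, round(odds_ratios[name], 3), or_ci[name])
```

A confidence interval that excludes 0 on the log odds scale becomes an interval that excludes 1 on the odds ratio scale. That is the case for multivitamin use but not for age.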
Goals
Predict Phys (no physician visit within the past two years=1) with Age (continuous)
After adjusting for age, is taking a multivitamin (1=yes) a statistically significant predictor for not regularly visiting a physician?
Is taking a multivitamin a confounder for the age-physician visit relationship?

Was multivitamin use a confounder?
CI for $\beta_1$ in model 1: (-0.036, 0.034)
Estimate for $\beta_1$ in model 2: 0.001
CI for $\exp(\beta_1)$ in model 1: (exp(-0.036), exp(0.034)) → (0.97, 1.03)
Estimate for $\exp(\beta_1)$ in model 2: exp(0.001) = 1.001
Estimate from model 2 is in original CI: multivitamin use is not a statistically significant confounder
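The confounding check used here is a rule of thumb: does the adjusted estimate of the age coefficient fall inside the confidence interval from the unadjusted model? It can be sketched as below. This only illustrates the comparison made in the lecture, not a formal test, and the variable names are assumptions.

```python
import math

# Model 1 (unadjusted): 95% CI for the age coefficient, from the Stata output.
ci_model1 = (-0.0355536, 0.0336365)

# Model 2 (adjusted for multivitamin use): estimated age coefficient.
b1_model2 = 0.0012855

# Rule of thumb from the lecture: is the adjusted estimate inside the unadjusted CI?
adjusted_inside_ci = ci_model1[0] <= b1_model2 <= ci_model1[1]

# The same comparison on the odds ratio scale.
or_ci_model1 = (math.exp(ci_model1[0]), math.exp(ci_model1[1]))
or_model2 = math.exp(b1_model2)

print(adjusted_inside_ci, or_ci_model1, or_model2)
```

Since exponentiation preserves order, the check gives the same answer on the log odds ratio scale and on the odds ratio scale.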
Interpretation of lack of confounding result
The factor by which the odds of irregular physician visits change for each additional year of age does not change appreciably when we adjust for multivitamin use
The "slope" is roughly the same before and after adjusting for multivitamin use.

Goals: conclusion 1
Predict Phys (no physician visit within the past two years=1) with Age (continuous)
  There is no statistically significant effect of age on physician visits in the population
Goals: conclusion 2
After adjusting for age, is taking a multivitamin (1=yes) a statistically significant predictor for not regularly visiting a physician?
  After adjusting for age, those who regularly take a multivitamin are also more likely to have visited a physician during the past two years (p=0.007)

Goals: conclusion 3
Is taking a multivitamin a confounder for the age-physician visit relationship?
  The effect of age on physician visit is still nonsignificant after adjusting for multivitamin use, and multivitamin use is not a confounder
Summary of Lecture 14
Logistic regression interpretation

Intercept – log odds when all X’s are 0
Slope
difference in log odds for a 1 unit increase in X,
controlling for other X’s
log odds ratio associated with a 1 unit increase in X,
controlling for other X’s
Transform log odds/ log odds ratio to odds/odds
ratio scale by exponentiating
For a continuous X, $e^{\beta}$ is the factor by which the odds
change (or odds ratio) for each unit change of X
Can also transform from log odds to probability
Nested models in Logistic regression that
differ by one variable
Use the Wald test (z-test) for the new variable