Part II
Overview of Association Methods
Dependent Variable        Independent Variable      Method
Categorical (Discrete)    Categorical (Discrete)    Relative Risk (C.I.)
                                                    Odds Ratio (C.I.)
                                                    Chi-square test
                                                    Test of proportions
Continuous                Continuous                Linear regression
                                                    Correlations
1. Identify H0 and HA
2. Identify a test statistic
3. Determine a significance level, e.g. α = 0.05 or α = 0.01
4. Critical value determines rejection / acceptance region
5. p-value
6. Interpret the result
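The six steps can be traced in a short Python sketch. It assumes a two-sided test whose statistic is standard normal under H0 (a z-test), which is only one possible choice of test statistic; the function name and the observed value 2.5 are hypothetical and chosen purely for illustration.

```python
from statistics import NormalDist


def two_sided_z_test(z, alpha=0.05):
    """Apply steps 3-6 to an observed z statistic (assumed N(0, 1) under H0)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Step 4: the critical value cuts off alpha/2 in each tail.
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Step 5: two-sided p-value, the chance of a statistic at least this extreme.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Step 6: reject H0 when the statistic falls in the rejection region.
    reject = abs(z) > critical
    return critical, p_value, reject


# Hypothetical observed statistic, for illustration only.
critical, p_value, reject = two_sided_z_test(2.5, alpha=0.05)
print(f"critical value = {critical:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Rerunning the same statistic with alpha=0.01 moves the critical value outward, so a result that is rejected at the 5% level need not be rejected at the 1% level.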
Measures of association for continuous dependent variables and continuous independent variables
Linear Regression
In linear regression one variable (X) is used to
predict another (Y).
X independent, predictor variable
Y dependent, response variable
We assume that we collect a sample of pairs
of observations,
(Xi, Yi) for i = 1, 2, ..., n
Modeling the relationship between X and Y
requires the specification of two components:
A Systematic Component and a Random Component
Systematic Component:
E(Yi | Xi) = α + β Xi
α = intercept
β = slope
Percent body fat = α + β (Abdomen Circum)
Percent body fat = -39.28 + 0.6313 (Abdomen Circum)
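As a rough illustration, the fitted line can be evaluated in Python, and a line of the same form can be estimated from data. The coefficients -39.28 and 0.6313 come from the slide; the slides do not say how the line was fitted, so ordinary least squares is assumed here, and the function names and toy data are hypothetical.

```python
def predict_body_fat(abdomen_cm, intercept=-39.28, slope=0.6313):
    """Predicted percent body fat from the fitted line on the slide."""
    return intercept + slope * abdomen_cm


def fit_line(xs, ys):
    """Ordinary least-squares estimates (intercept, slope) for E(Y | X) = a + b X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Prediction for an abdomen circumference of 100 cm.
print(predict_body_fat(100))

# Toy data lying exactly on y = 1 + 2x (hypothetical, not from the study).
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```

The slope is read as the expected change in percent body fat per additional centimetre of abdomen circumference.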
[Figure: percent body fat plotted against abdomen circumference (cm)]
Examples of Different Beta Values
[Figure: four panels plotting y1, y2, y3, and y4 against x, including a "beta positive" and a "beta negative" example]
For the simple linear model we can test hypotheses regarding the slope β:
H0 : β = 0
HA : β ≠ 0
where b
Student's t distribution - two tailed test
H0 : β = 0
HA : β ≠ 0
(two tailed because β > 0 or β < 0)
[Figure: density curve of the t distribution, with T on the horizontal axis]
Upper percentile of t distribution
[Figure: t density with upper-tail area = α beyond the percentile marked on the horizontal axis]
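Tail areas and percentiles of the t distribution are usually read from tables or statistical software. The sketch below is one way to approximate them with only the Python standard library, assuming numerical integration (Simpson's rule) of the t density and bisection for the percentile; all function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(alpha, df):
    """Value c with upper-tail area alpha, P(T > c) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def two_tailed_p(t, df):
    """Two-tailed p-value: the area beyond |t| in both tails."""
    return 2 * t_upper_tail(abs(t), df)


# Upper 2.5% point with 10 degrees of freedom, then the matching two-tailed p-value.
c = t_critical(0.025, 10)
print(round(c, 3), round(two_tailed_p(c, 10), 3))
```

For a two-tailed test at level α, the critical value is the upper α/2 percentile, which is why 0.025 is passed for a 5% test.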
Correlation measures the strength of association between two (quantitative) variables.
[Figure: scatterplot with knee circumference (cm) on the horizontal axis]
There are two common correlation
measures:
1. Pearson Correlation Coefficient:
Based on the actual data values.
Measure of linear association.
Natural when the variables have Gaussian distributions.
Related to linear regression and R².
2. Spearman Rank Correlation:
Based on ranks of each variable (ranks
assigned separately).
Useful measure of the monotone
association, which may not be linear.
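The difference between the two measures shows up on data that are monotone but not linear. The following sketch assumes Spearman's correlation is computed as the Pearson correlation of the ranks, with tied values given their average rank; the function names and the example data (y = exp(x)) are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation computed from the actual data values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the separately assigned ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical monotone but nonlinear relationship: y = exp(x).
xs = [0, 1, 2, 3, 4, 5]
ys = [math.exp(x) for x in xs]
print(f"Pearson  r = {pearson(xs, ys):.3f}")
print(f"Spearman r = {spearman(xs, ys):.3f}")
```

Because y increases whenever x does, the ranks agree perfectly and the Spearman value is 1, while the Pearson value falls below 1 because the points do not lie on a straight line.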
Pearson's Correlation Coefficient
The correlation between two variables X and Y is defined as:
$$\rho = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sqrt{V(X)\,V(Y)}}$$
Properties:
The correlation is constrained: -1 ≤ ρ ≤ +1
|ρ| = 1 means perfect linear relationship:
Y = a + bX
[Figure: two panels of y against x, one showing perfect positive correlation (ρ = 1) and one showing perfect negative correlation (ρ = -1)]
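We estimate ρ with the sample correlation coefficient r:
$$r = \frac{\frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})}{s_X\, s_Y} = \frac{1}{N-1}\sum_{i=1}^{N}\frac{(X_i - \bar{X})}{s_X}\,\frac{(Y_i - \bar{Y})}{s_Y}$$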
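The two ways of writing r can be checked against each other numerically. This is a minimal sketch assuming s_X and s_Y are the usual sample standard deviations with divisor N - 1; the function names and the paired data are hypothetical.

```python
from statistics import mean, stdev


def sample_correlation(xs, ys):
    """r = [1/(N-1)] * sum((Xi - Xbar)(Yi - Ybar)) / (sX * sY)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample SDs, divisor N - 1
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


def sample_correlation_standardized(xs, ys):
    """The same r, written as a sum of products of standardized values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical paired measurements, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(sample_correlation(xs, ys), sample_correlation_standardized(xs, ys))

# Points lying exactly on a decreasing line Y = a + bX give r = -1.
print(sample_correlation(xs, [3 - 2 * x for x in xs]))
```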
To test the hypothesis:
H0 : ρ = 0
HA : ρ ≠ 0
We use the statistic:
$$T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
$$T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{.799\sqrt{252-2}}{\sqrt{1-.799^2}} \approx 21$$
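The arithmetic of the worked example (r = .799, n = 252) can be reproduced in a few lines. The function name is hypothetical; under H0 the statistic is referred to a t distribution with n - 2 degrees of freedom.

```python
import math


def correlation_t_statistic(r, n):
    """T = r * sqrt(n - 2) / sqrt(1 - r^2), compared against t with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Values from the worked example: r = 0.799, n = 252.
t_stat = correlation_t_statistic(0.799, 252)
print(f"T = {t_stat:.1f} on {252 - 2} degrees of freedom")
```

A value near 21 lies far in the tail of the t distribution, so H0 : ρ = 0 would be rejected at any conventional significance level.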
[Figure: scatterplot matrix; labeled panels are percent body fat from Siri's (1956) equation, weight (lbs), height (inches), neck circumference (cm), chest circumference (cm), abdomen 2 circumference (cm), and knee circumference (cm)]
Measures of association for categorical dependent variables and continuous independent variables
Logistic Regression
Y a dichotomous (diseased/non-diseased)
outcome variable and
B1 = ln (Odds Ratio)
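The relation B1 = ln (Odds Ratio) can be illustrated with the simplest case. The sketch assumes a single binary predictor (exposed versus unexposed), for which the slope is the difference in log odds between the two groups; for a continuous predictor the same reading applies per one-unit increase. The risks used are hypothetical.

```python
import math


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1 - p))


def log_odds_ratio(p_exposed, p_unexposed):
    """Slope B1 for a single binary predictor: the difference in log odds."""
    return logit(p_exposed) - logit(p_unexposed)


# Hypothetical disease risks: 20% among the exposed, 10% among the unexposed.
b1 = log_odds_ratio(0.20, 0.10)
odds_ratio = math.exp(b1)  # inverts B1 = ln(Odds Ratio)
print(f"B1 = {b1:.4f}, odds ratio = {odds_ratio:.2f}")
```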
RR for CHD
Non-Smoker / Non-Drinking      1.0
Non-Smoker / Heavy Drinking    0.6
Smoker / Non-Drinking          2.1
Smoker / Heavy Drinking        9.3
intervention)
2. RR for incident disease
3. χ² test
Cohort Analysis
1. H0: P (Disease | Exposed) = P (Disease| Not
exposed)
Case Control Analysis
1. H0: P (Exposed | Disease) = P (Exposed | No disease)
2. OR
3. χ² test
Cross-sectional Analysis
1. H0: P (Disease | Exposed) = P (Disease| Not
exposed)
2. RR or OR for prevalent disease
3. χ² test
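The measures named for each design can all be computed from a 2x2 table of exposure by disease. The sketch below assumes the Pearson chi-square statistic without continuity correction; the function name and the counts are hypothetical. The relative risk is meaningful for cohort and cross-sectional data, while the odds ratio is the measure available in a case-control design.

```python
import math


def two_by_two_measures(a, b, c, d):
    """Association measures for a 2x2 table.

                   Disease   No disease
    Exposed           a          b
    Not exposed       c          d
    """
    n = a + b + c + d
    relative_risk = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square statistic (no continuity correction), 1 degree of freedom.
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return relative_risk, odds_ratio, chi_square, p_value


# Hypothetical counts, for illustration only.
rr, odds, chi2, p = two_by_two_measures(20, 80, 10, 90)
print(f"RR = {rr:.2f}, OR = {odds:.2f}, chi-square = {chi2:.2f}, p = {p:.3f}")
```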
Statistics aren't the only way to talk about association:
A non-statistical way to describe the strength of association between a risk factor and an outcome is to put it in terms of associations that are well known.
Tverdal et al. (1990) Coffee consumption and death from CHD in middle-aged
Norwegian men and women. BMJ 300:566-569.
Example Problem: Suppose that 1000 individuals agreed to participate in a study to test whether a new 8-week exercise program really reduces physiological and psychological stress. Participants are randomized into two exercise groups and given a questionnaire to collect demographic information, and their stress levels are followed for eight weeks.
characteristics (means, proportions) in one treatment group versus the other treatment
The descriptive statistics for background demographic characteristics are given below.
               Group 1    Group 2
% Females       44.0       51.0
$$Z = \frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{\dfrac{\hat{p}_0(1-\hat{p}_0)}{n_1} + \dfrac{\hat{p}_0(1-\hat{p}_0)}{n_2}}}$$
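The test of proportions can be applied to the percent females in the two groups. The source gives only the total of 1000 participants, so equal groups of 500 are an assumption made here, as is taking p0 to be the pooled proportion across both groups; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(p1, n1, p2, n2):
    """Z statistic and two-sided p-value for H0: p1 - p2 = 0, using pooled p0."""
    p0 = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled proportion under H0
    se = math.sqrt(p0 * (1 - p0) / n1 + p0 * (1 - p0) / n2)
    z = (p1 - p2 - 0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Percent females from the table; 500 per group is an assumption
# (the source gives only the total of 1000 participants).
z, p_value = two_proportion_z(0.44, 500, 0.51, 500)
print(f"Z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Under these assumed group sizes the difference in percent females would be significant at the 5% level, which is a reminder that randomized groups can still differ by chance on a baseline characteristic.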
Sample mean = $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
$$\bar{X} \pm t_{\alpha/2}\,\frac{S}{\sqrt{n}}$$
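The sample mean and the t-based interval around it can be computed as follows. This is a sketch assuming the interval is the usual two-sided confidence interval for a mean, with the t value taken on n - 1 degrees of freedom and supplied from a table; the function name and the scores are hypothetical.

```python
import math
from statistics import mean, stdev


def mean_confidence_interval(data, t_crit):
    """Interval X-bar +/- t_crit * S / sqrt(n); t_crit is t(alpha/2) on n - 1 df."""
    n = len(data)
    x_bar = mean(data)   # sample mean: sum of the X_i divided by n
    s = stdev(data)      # sample standard deviation S (divisor n - 1)
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical stress scores, for illustration only.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# 2.262 is the tabulated t value for alpha/2 = 0.025 with 9 degrees of freedom.
low, high = mean_confidence_interval(scores, t_crit=2.262)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```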