regression
Review of simple ANOVA
ANOVA: for comparing means between more than 2 groups
Hypotheses of One-Way ANOVA
$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$$
All population means are equal
i.e., no treatment effect (no variation in means among
groups)
H1 : Not all of the population means are the same
At least one population mean is different
i.e., there is a treatment effect
Does not mean that all population means are different
(some pairs may be the same)
The F-distribution
A ratio of variances follows an F-distribution:
$$\frac{\sigma^2_{between}}{\sigma^2_{within}} \sim F_{n,m}$$
$$H_a: \sigma^2_{between} \neq \sigma^2_{within}$$
How to calculate ANOVAs by hand…
Treatment 1   Treatment 2   Treatment 3   Treatment 4
y1,1          y2,1          y3,1          y4,1
y1,2          y2,2          y3,2          y4,2
y1,3          y2,3          y3,3          y4,3
y1,4          y2,4          y3,4          y4,4
y1,5          y2,5          y3,5          y4,5
y1,6          y2,6          y3,6          y4,6
y1,7          y2,7          y3,7          y4,7
y1,8          y2,8          y3,8          y4,8
y1,9          y2,9          y3,9          y4,9
y1,10         y2,10         y3,10         y4,10

n = 10 obs./group
k = 4 groups
The group means:
$$\bar{y}_i = \frac{\sum_{j=1}^{10} y_{ij}}{10}, \qquad i = 1, 2, 3, 4$$
The (within) group variances:
$$\frac{\sum_{j=1}^{10} (y_{ij} - \bar{y}_i)^2}{10 - 1}, \qquad i = 1, 2, 3, 4$$
Sum of Squares Within (SSW),
or Sum of Squares Error (SSE)
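The (within) group variances:
$$\frac{\sum_{j=1}^{10} (y_{1j} - \bar{y}_1)^2}{10 - 1}, \quad \frac{\sum_{j=1}^{10} (y_{2j} - \bar{y}_2)^2}{10 - 1}, \quad \frac{\sum_{j=1}^{10} (y_{3j} - \bar{y}_3)^2}{10 - 1}, \quad \frac{\sum_{j=1}^{10} (y_{4j} - \bar{y}_4)^2}{10 - 1}$$
Adding the four numerators gives the Sum of Squares Within (SSW) (or SSE, for chance error):
$$SSW = \sum_{j=1}^{10} (y_{1j} - \bar{y}_1)^2 + \sum_{j=1}^{10} (y_{2j} - \bar{y}_2)^2 + \sum_{j=1}^{10} (y_{3j} - \bar{y}_3)^2 + \sum_{j=1}^{10} (y_{4j} - \bar{y}_4)^2 = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_i)^2$$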
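The group means, within-group variances, and SSW can be sketched in a few lines of Python. The data below are hypothetical (they are not the slide's data set), and 5 observations per group are used instead of 10 only to keep the example short; the variable names are illustrative.

```python
from statistics import mean, variance

# Hypothetical data: 4 treatment groups, 5 observations each (assumption).
groups = [
    [60, 67, 62, 59, 62],
    [50, 52, 58, 61, 54],
    [48, 49, 53, 57, 53],
    [47, 67, 54, 66, 61],
]

# Group means: sum of the observations in a group divided by the group size.
group_means = [mean(g) for g in groups]

# Within-group sample variances (statistics.variance divides by n - 1).
group_vars = [variance(g) for g in groups]

# SSW: squared deviation of every observation from its own group mean.
ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

# Equivalent route: each variance times (n - 1) recovers that group's numerator.
ssw_from_vars = sum((len(g) - 1) * v for g, v in zip(groups, group_vars))

print("group means:", group_means)
print("group variances:", group_vars)
print("SSW:", ssw, "SSW from variances:", ssw_from_vars)
```

Both routes to SSW agree because each within-group variance is just that group's sum of squared deviations divided by n - 1.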
Sum of Squares Between (SSB), or
Sum of Squares Regression (SSR)
Overall mean of all 40 observations (“grand mean”):
$$\bar{y} = \frac{\sum_{i=1}^{4} \sum_{j=1}^{10} y_{ij}}{40}$$
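Sum of Squares Between (SSB):
$$SSB = 10 \times \sum_{i=1}^{4} (\bar{y}_i - \bar{y})^2$$
Total Sum of Squares (TSS): squared difference of every observation from the grand mean. The within and between sums of squares add up to the total:
$$\sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_i)^2 + 10 \times \sum_{i=1}^{4} (\bar{y}_i - \bar{y})^2 = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y})^2$$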
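The partition SSW + SSB = TSS can be checked numerically. This is a minimal sketch on hypothetical data (4 groups of 5 observations, not the slide's data); the names are illustrative.

```python
from statistics import mean

# Hypothetical data: 4 treatment groups, 5 observations each (assumption).
groups = [
    [60, 67, 62, 59, 62],
    [50, 52, 58, 61, 54],
    [48, 49, 53, 57, 53],
    [47, 67, 54, 66, 61],
]
n_per_group = len(groups[0])

all_obs = [y for g in groups for y in g]
grand_mean = mean(all_obs)
group_means = [mean(g) for g in groups]

# Within: each observation against its own group mean.
ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

# Between: each group mean against the grand mean, weighted by group size.
ssb = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)

# Total: each observation against the grand mean.
tss = sum((y - grand_mean) ** 2 for y in all_obs)

print("SSW =", ssw, "SSB =", ssb, "TSS =", tss)
print("SSW + SSB == TSS:", abs(ssw + ssb - tss) < 1e-9)
```

The multiplier on the between term is the number of observations per group, which is 10 in the slide's layout and 5 in this shortened example.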
Sum of Squares Within: (… − 59.7)² + (69 − 59.7)² + … (sum of 40 squared deviations) = 2060.6
Step 3) Fill in the ANOVA table
Source of variation   d.f.   Sum of squares   Mean Sum of Squares   F-statistic   p-value
Total                 39     2257.1
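The remaining cells of the table follow arithmetically from the sums of squares. The sketch below takes SSB = 196.5 and SSW = 2060.6 from this worked example, with k = 4 groups of n = 10, and derives the degrees of freedom, mean squares, and F-statistic. The p-value is left out because the F-distribution is not in the Python standard library; the variable names are illustrative.

```python
# Values taken from the worked example in the text.
k, n = 4, 10            # number of groups, observations per group
ssb, ssw = 196.5, 2060.6

df_between = k - 1          # 3
df_within = k * (n - 1)     # 36
df_total = k * n - 1        # 39

msb = ssb / df_between      # mean square between
msw = ssw / df_within       # mean square within
f_stat = msb / msw          # compare against an F(df_between, df_within) distribution
tss = ssb + ssw

print(f"{'Source':<10}{'d.f.':>6}{'SS':>10}{'MS':>10}{'F':>8}")
print(f"{'Between':<10}{df_between:>6}{ssb:>10.1f}{msb:>10.1f}{f_stat:>8.2f}")
print(f"{'Within':<10}{df_within:>6}{ssw:>10.1f}{msw:>10.1f}")
print(f"{'Total':<10}{df_total:>6}{tss:>10.1f}")
```

The degrees of freedom add up the same way the sums of squares do: 3 + 36 = 39.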
INTERPRETATION of ANOVA:
How much of the variance in height is explained by treatment group?
R² = “Coefficient of Determination” = SSB/TSS = 196.5/2257.1 = 9%
Coefficient of Determination
$$R^2 = \frac{SSB}{SSB + SSE} = \frac{SSB}{SST}$$
The amount of variation in the outcome variable (dependent
variable) that is explained by the predictor (independent variable).
ANOVA example
Table 6. Mean micronutrient intake from the school lunch by school

                      S1 (a), n=25   S2 (b), n=25   S3 (c), n=25   P-value (d)
Calcium (mg)   Mean   117.8          158.7          206.5          0.000
               SD     62.4           70.5           86.2
Iron (mg)      Mean   2.0            2.0            2.0            0.854
               SD     0.6            0.6            0.6
Folate (μg)    Mean   26.6           38.7           42.6           0.000
               SD     13.1           14.5           15.1
Zinc (mg)      Mean   1.9            1.5            1.3            0.055
               SD     1.0            1.2            0.4

a School 1 (most deprived; 40% subsidized lunches).
b School 2 (medium deprived; <10% subsidized).
c School 3 (least deprived; no subsidization, private school).
d ANOVA; significant differences are highlighted in bold (P<0.05).

FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England-are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.
Answer
Step 1) calculate the sum of squares between groups:
Mean for School 1 = 117.8
Mean for School 2 = 158.7
Mean for School 3 = 206.5
Total: d.f. = 74, sum of squares = 489,179
R² = 98,113/489,179 = 20%
School explains 20% of the variance in lunchtime calcium intake in these kids.
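The calcium calculation can be approximated from the published summary statistics alone. The sketch below assumes equal group sizes (n = 25 per school) and uses the rounded means and SDs for calcium from Table 6. With those rounded inputs the between-groups sum of squares comes out slightly higher than the 98,113 quoted above (roughly 98,500); one guess is that the quoted figure used different rounding, but that is an assumption. R² is about 20% either way.

```python
# Calcium (mg) summary statistics from Table 6; n = 25 per school.
means = [117.8, 158.7, 206.5]
sds = [62.4, 70.5, 86.2]
n = 25

# With equal group sizes, the grand mean is the average of the group means.
grand_mean = sum(means) / len(means)

# Between: group means against the grand mean, weighted by group size.
ssb = n * sum((m - grand_mean) ** 2 for m in means)

# Within: each SD squared times (n - 1) recovers that group's sum of squares.
ssw = sum((n - 1) * s ** 2 for s in sds)

tss = ssb + ssw
r_squared = ssb / tss

print(f"grand mean = {grand_mean:.1f}")
print(f"SSB = {ssb:.1f}, SSW = {ssw:.1f}, TSS = {tss:.1f}")
print(f"R^2 = {r_squared:.1%}")
```

The within-groups term reproduces the difference between the quoted total and between sums of squares (489,179 − 98,113 ≈ 391,066), which is a useful check that the SD-based route is the one being used.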
Beyond one-way ANOVA
Often, you may want to test more than 1
treatment. ANOVA can accommodate
more than 1 treatment or factor, so long
as they are independent. Again, the
variation partitions beautifully!
What’s Slope?
$$E(y_i \mid x_i) = \alpha + \beta x_i$$
Predicted value for an individual…
$$y_i = \alpha + \beta x_i + \text{random error}_i$$
[Figure: Sy/x, the standard deviation of y at a given x, marked at several values of x]
Regression Picture
[Figure: an observation yi, its fitted value ŷi at xi, and the mean of y, with the distances between them labeled A, B, and C]
1. Lee DM, Tajar A, Ulubaev A, et al. Association between 25-hydroxyvitamin D levels and cognitive performance in middle-aged
and older European men. J Neurol Neurosurg Psychiatry. 2009 Jul;80(7):722-9.
Distribution of vitamin D
Mean= 63 nmol/L
Standard deviation = 33 nmol/L
Distribution of DSST
Normally distributed
Mean = 28 points
Standard deviation = 10 points
Four hypothetical datasets
I generated four hypothetical datasets,
with increasing TRUE slopes (between
vit D and DSST):
0
0.5 points per 10 nmol/L
1.0 points per 10 nmol/L
1.5 points per 10 nmol/L
Dataset 1: no relationship
Dataset 2: weak relationship
Dataset 3: weak to moderate
relationship
Dataset 4: moderate
relationship
The “Best fit” line
Dataset 1 regression equation: E(Yi) = 28 + 0*vit Di (in 10 nmol/L)
Dataset 2 regression equation: E(Yi) = 26 + 0.5*vit Di (in 10 nmol/L)
Dataset 3 regression equation: E(Yi) = 22 + 1.0*vit Di (in 10 nmol/L)
Dataset 4 regression equation: E(Yi) = 20 + 1.5*vit Di (in 10 nmol/L)
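A "best fit" line of this kind can be obtained by ordinary least squares. The sketch below simulates one hypothetical dataset with the dataset 4 line (intercept 20, slope 1.5 per 10 nmol/L) and then recovers the line from the simulated points. The sample size of 100, the normal shape of the vitamin D distribution, and the error SD of 10 are assumptions made for illustration; `least_squares` is a helper written for this example.

```python
import random
from statistics import mean

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line for paired data."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Vitamin D in units of 10 nmol/L: mean 6.3, SD 3.3 (normality is an assumption).
vit_d = [rng.gauss(6.3, 3.3) for _ in range(100)]

# True line E(Y) = 20 + 1.5 * vit D, plus normal random error (SD 10 is an assumption).
dsst = [20 + 1.5 * x + rng.gauss(0, 10) for x in vit_d]

intercept, slope = least_squares(vit_d, dsst)
print(f"fitted line: E(Y) = {intercept:.1f} + {slope:.2f} * vit D (in 10 nmol/L)")

# Sanity check: with no random error, the fit recovers the true line.
exact_intercept, exact_slope = least_squares([1, 2, 3, 4], [21.5, 23.0, 24.5, 26.0])
print(exact_intercept, exact_slope)
```

The fitted slope from the noisy simulation will differ somewhat from 1.5 because of the random error, which is what the standard error of the slope quantifies.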
$$T_{n-2} = \frac{\hat{\beta} - 0}{s.e.(\hat{\beta})}$$
Example: dataset 4
Slope (beta) = 0.15 points per nmol/L (that is, 1.5 points per 10 nmol/L)
Standard error (beta) = 0.03
T98 = 0.15/0.03 = 5, p<.0001
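The test statistic for a slope can be computed directly from paired data. The sketch below uses the usual simple-regression formulas: the residual variance is the sum of squared residuals divided by n − 2, and the standard error of the slope is the square root of that variance divided by the sum of squared deviations of x. The function name `slope_t_test` and the small data set are hypothetical; the last lines reproduce the arithmetic of the example above.

```python
from math import sqrt
from statistics import mean

def slope_t_test(x, y):
    """Return (slope, standard error of slope, t statistic, degrees of freedom)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    se_slope = sqrt(sse / (n - 2) / sxx)
    return slope, se_slope, (slope - 0) / se_slope, n - 2

# Small hypothetical data set.
slope, se_slope, t_stat, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"slope = {slope:.2f}, s.e. = {se_slope:.3f}, t = {t_stat:.2f}, d.f. = {df}")

# The example in the text: estimated slope 0.15, standard error 0.03.
t_example = (0.15 - 0) / 0.03
print(f"T = {t_example:.1f}")
```

Turning the statistic into a p-value requires the t-distribution with n − 2 degrees of freedom, which is not in the Python standard library, so that step is omitted here.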
Sufficient vs.
Deficient
Results…
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Interpretation:
The deficient group has a mean DSST 9.87 points
lower than the reference (sufficient) group.
The insufficient group has a mean DSST 6.87
points lower than the reference (sufficient) group.
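The interpretation above reflects how indicator (dummy) coding works: with a reference group, the intercept is the reference group's mean and each coefficient is that group's mean minus the reference mean. The sketch below illustrates this on hypothetical DSST scores (not the study data) and checks the least-squares condition that the residuals are orthogonal to every column of the design matrix. All names and values are illustrative.

```python
from statistics import mean

# Hypothetical DSST scores by vitamin D category (not the study data).
scores = {
    "sufficient": [42, 38, 45, 35],
    "insufficient": [33, 36, 30, 37],
    "deficient": [28, 31, 29, 32],
}
reference = "sufficient"

# Intercept = mean of the reference group.
intercept = mean(scores[reference])

# Each coefficient = that group's mean minus the reference group's mean.
coef = {g: mean(ys) - intercept for g, ys in scores.items() if g != reference}

# Check the normal equations: residuals must sum to zero against the
# intercept column (all ones) and against each indicator column.
rows = [(g, y) for g, ys in scores.items() for y in ys]
residuals = [y - (intercept + coef.get(g, 0)) for g, y in rows]
normal_eqs = [sum(residuals)] + [
    sum(r for (g, _), r in zip(rows, residuals) if g == dummy) for dummy in coef
]

print("intercept (reference mean):", intercept)
print("coefficients:", coef)
print("normal equations (all ~0):", normal_eqs)
```

A negative coefficient therefore reads as "this group's mean is that many points lower than the reference group's mean", which is the form of the statements in the interpretation.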