Total Sum of Squares = Regression Sum of Squares + Error (or Residual) Sum of Squares
We can organize the results of a simple linear regression analysis in an ANOVA table.
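In general, the regression row of the table has the form

\begin{tabular}{llllll}
Source & df & SS & MS & $F$ & $p$-value \\
\hline
Regression & $df_{REG}$ & $SSREG$ & $MSREG$ & $\dfrac{MSREG}{MSE}$ & $P\left(T^2 \ge \dfrac{MSREG}{MSE}\right)$, where $T^2 \sim F(df_{REG}, df_E)$ \\
\end{tabular}

For simple linear regression, $df_{REG} = 1$ and $df_E = n - 2$, so the full table is

\begin{tabular}{llllll}
Source & df & SS & MS & $F$ & $p$-value \\
\hline
Regression & $1$ & $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$ & $\dfrac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{1}$ & $\dfrac{MSREG}{MSE}$ & $P\left(T^2 \ge \dfrac{MSREG}{MSE}\right)$, where $T^2 \sim F(1, n - 2)$ \\
Error & $n - 2$ & $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ & $\dfrac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}$ & & \\
Total & $n - 1$ & $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ & & & \\
\end{tabular}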
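As a rough numerical illustration of the table, the following Python sketch computes each entry from scratch. The data values and the helper name `anova_table` are hypothetical and chosen only for illustration; the p-value column is left out because the Python standard library has no F distribution.

```python
# Hypothetical data, chosen only for illustration (not from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def anova_table(x, y):
    """Return the ANOVA quantities for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares slope and intercept.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    # Sums of squares for the Regression, Error, and Total rows.
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_e = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_to = sum((yi - y_bar) ** 2 for yi in y)

    # Mean squares: each sum of squares divided by its degrees of freedom.
    ms_reg = ss_reg / 1
    ms_e = ss_e / (n - 2)

    return {
        "df": (1, n - 2, n - 1),
        "ss_reg": ss_reg,
        "ss_e": ss_e,
        "ss_to": ss_to,
        "ms_reg": ms_reg,
        "ms_e": ms_e,
        "f": ms_reg / ms_e,
    }


table = anova_table(x, y)
for name, value in table.items():
    print(name, value)
```

For these five points, SSREG = 3.6, SSE = 2.4, and SSTO = 6, so the decomposition SSTO = SSREG + SSE holds, and F = 3.6 / 0.8 = 4.5 (up to floating-point rounding).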
The $F$-statistic $F = \dfrac{MSREG}{MSE}$ is used to test $H_0: \beta_1 = 0$ vs. $H_A: \beta_1 \neq 0$. The test is equivalent to the $t$-test that we learned about previously because
(1) $F = \dfrac{MSREG}{MSE} = \dfrac{\hat{\beta}_1^2}{\left[ SE(\hat{\beta}_1) \right]^2} = t^2$ and (2) $T^2 \sim F$ with $1$ and $n - 2$ d.f. $\iff$ $T \sim t$ with $n - 2$ d.f.
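Identity (1) can be checked numerically. The sketch below uses hypothetical data and assumes the usual standard error of the slope, $SE(\hat{\beta}_1) = \sqrt{MSE / \sum_{i=1}^{n} (X_i - \bar{X})^2}$, which is presumably the formula behind the earlier $t$-test; all variable names are illustrative.

```python
import math

# Hypothetical data, chosen only for illustration (not from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# F-statistic from the ANOVA decomposition.
ms_reg = sum((yh - y_bar) ** 2 for yh in y_hat) / 1
ms_e = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / (n - 2)
f_stat = ms_reg / ms_e

# t-statistic for H0: beta_1 = 0, using the assumed standard error formula.
se_b1 = math.sqrt(ms_e / sxx)
t_stat = b1 / se_b1

print("F   =", f_stat)
print("t^2 =", t_stat ** 2)
```

The two printed values agree up to floating-point rounding, which is what identity (1) predicts.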
The $F$-statistic $\dfrac{MSREG}{MSE}$ is a special case of the $F$-statistic used to compare full and reduced models. The full model corresponds to the situation where $\beta_1$ can be any value. The reduced model forces $\beta_1$ to be $0$, just like $H_0$. Write down formulas for $RSS(\text{red.})$, $RSS(\text{full})$, $df_{RSS(\text{red.})}$, and $df_{RSS(\text{full})}$ for the special case of simple linear regression, and show that the resulting reduced vs. full model $F$-statistic is the same as $F = \dfrac{MSREG}{MSE}$.
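One possible way to work through this exercise is sketched below. It assumes the standard form of the reduced vs. full model $F$-statistic, which is not written out in this section, and uses the fact that under the reduced model ($\beta_1 = 0$) the least-squares fitted value for every observation is $\bar{Y}$.

```latex
\begin{align*}
RSS(\text{red.}) &= \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = SSTO,
  \qquad df_{RSS(\text{red.})} = n - 1, \\
RSS(\text{full}) &= \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = SSE,
  \qquad df_{RSS(\text{full})} = n - 2, \\
F &= \frac{\left[ RSS(\text{red.}) - RSS(\text{full}) \right] /
           \left[ df_{RSS(\text{red.})} - df_{RSS(\text{full})} \right]}
          {RSS(\text{full}) / df_{RSS(\text{full})}}
   = \frac{(SSTO - SSE) / 1}{SSE / (n - 2)}
   = \frac{SSREG / 1}{SSE / (n - 2)}
   = \frac{MSREG}{MSE}.
\end{align*}
```

The step from $SSTO - SSE$ to $SSREG$ uses the decomposition Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares.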
Because $SSTO = SSREG + SSE$, we may write $1 = \dfrac{SSREG}{SSTO} + \dfrac{SSE}{SSTO}$.
$\dfrac{SSE}{SSTO}$ is the proportion of total variation in the $Y$ values that was not explained by the regression of $Y$ on $X$.
The remaining proportion of variation in the $Y$ values is $1 - \dfrac{SSE}{SSTO} = \dfrac{SSREG}{SSTO}$. This quantity, known as the coefficient of determination, is the proportion of the variation in the $Y$ values that was explained by the regression of $Y$ on $X$.
It can be shown that the coefficient of determination is equal to the square of the sample linear correlation coefficient between $X$ and $Y$:
$$1 - \frac{SSE}{SSTO} = \frac{SSREG}{SSTO} = r^2$$
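The equality with $r^2$ can be checked on a small example. The sketch below uses hypothetical data and computes the sample correlation directly from its definition, $r = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}$; the variable names are illustrative.

```python
import math

# Hypothetical data, chosen only for illustration (not from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # this is SSTO
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares fit and the sums of squares.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_e = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_to = syy

# Coefficient of determination, computed two ways, and the squared correlation.
r = sxy / math.sqrt(sxx * syy)
r2_from_sse = 1 - ss_e / ss_to
r2_from_ssreg = ss_reg / ss_to

print("1 - SSE/SSTO =", r2_from_sse)
print("SSREG/SSTO   =", r2_from_ssreg)
print("r^2          =", r ** 2)
```

For these data all three quantities equal 0.6 up to floating-point rounding, so about 60% of the variation in the $Y$ values is explained by the regression of $Y$ on $X$.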