Académique Documents
Professionnel Documents
Culture Documents
Chap 12-1
Within
d.f.
1
2n-2
Sum of
squares
SSA
Squared
(squared difference
differenc in means
e in
means
multiplie
d by n)
SSW
Pooled
variance
equivalent to
numerator of
pooled
variance
Total
2n-1
variation
Mean
Sum of
Squares
TSS
F-statistic
n( X Y ) 2
sp2
Go to
(X Y )
sp
n
sp
n
p-value
) (t 2 n 2 )
F1,2n2
Chart
notice
values
are just
(t2n2)2
n
X n Yn 2
X Y
SSB n (X n (
)) n (Yn ( n n )) 2
2
2
i 1
i 1
n
X n Yn 2
Y X
n (
) n ( n n )2
2
2
2
i 1
i 1 2
SSB algebra
X n 2 Yn 2
X *Y
Y
X
X *Y
) ( ) 2 n n ( n )2 ( n )2 2 n n )
2
2
2
2
2
2
2
2
n( X n X n * Yn Yn ) n( X n Yn ) 2
n((
Source of
variation
Between
(2 groups)
Within
d.f.
1
2n-2
Sum of
squares
SSB
Squared
(squared difference
differenc in means
e in
times n
means
multiplie
d by n)
SSW
Pooled
variance
equivalent to
numerator of
pooled
variance
Total
2n-1
variation
Mean
Sum of
Squares
TSS
F-statistic
p-value
Go to
n( X Y )
sp
(X Y )
sp
n
sp
n
) 2 (t 2 n 2 ) 2
F1,2n2
Chart
notice
values
are just
(t2n2)2
Chapter Goals
After completing this chapter, you should be
able to:
Chapter Goals
(continued)
Introduction to
Regression Analysis
Types of Relationships
Linear relationships
Y
Curvilinear relationships
Y
X
Y
X
Y
Types of Relationships
(continued)
Strong relationships
Y
Weak relationships
Y
X
Y
X
Y
Types of Relationships
(continued)
No relationship
Y
X
Y
Population
Slope
Coefficient
Independent
Variable
Random
Error
term
Yi 0 1Xi i
Linear component
Random Error
component
(continued)
Yi 0 1Xi i
Observed Value
of Y for Xi
Predicted Value
of Y for Xi
Slope = 1
Random Error
for this Xi value
Intercept = 0
Xi
Estimate of
the regression
Estimate of the
regression slope
intercept
Yi b0 b1Xi
Value of X for
observation i
2
2
Interpretation of the
Slope and the Intercept
Square Feet
(X)
245
1400
312
1600
279
1700
308
1875
199
1100
219
1550
405
2350
324
2450
319
1425
255
1700
Graphical Presentation
Excel Output
Regression Statistics
Multiple R
0.76211
R Square
0.58082
Adjusted R Square
0.52842
Standard Error
41.33032
Observations
ANOVA
10
df
SS
MS
F
11.0848
Regression
18934.9348
18934.9348
Residual
13665.5652
1708.1957
Total
32600.5000
Coefficients
Intercept
Square Feet
Standard Error
t Stat
P-value
Significance F
0.01039
Lower 95%
Upper 95%
98.24833
58.03348
1.69296
0.12892
-35.57720
232.07386
0.10977
0.03297
3.32938
0.01039
0.03374
0.18580
Graphical Presentation
Intercept
= 98.248
Interpretation of the
Intercept, b0
house price 98.24833 0.10977 (square feet)
Interpretation of the
Slope Coefficient, b1
house price 98.24833 0.10977 (square feet)
Predictions using
Regression Analysis
Predict the price for a house
with 2000 square feet:
Do not try to
extrapolate
beyond the range
of observed Xs
Measures of Variation
SST
SSR
Total Sum of
Squares
Regression Sum
of Squares
SST ( Yi Y )2
SSR ( Yi Y )2
SSE
Error Sum of
Squares
SSE ( Yi Yi )2
where:
Y
i = Predicted value of Y for the given Xi value
Measures of Variation
(continued)
Measures of Variation
(continued)
Y
Yi
SSE = (Yi - Yi )
Xi
_
Y
Coefficient of Determination, r2
SST
total sum of squares
2
note:
0 r 1
2
Examples of Approximate
r2 Values
Y
r2 = 1
r2 = 1
r =1
2
Examples of Approximate
r2 Values
Y
0 < r2 < 1
Examples of Approximate
r2 Values
r2 = 0
No linear relationship
between X and Y:
r2 = 0
Excel Output
SSR 18934.9348
r
0.58082
SST 32600.5000
2
Regression Statistics
Multiple R
0.76211
R Square
0.58082
Adjusted R Square
0.52842
Standard Error
41.33032
Observations
ANOVA
10
df
SS
MS
F
11.0848
Regression
18934.9348
18934.9348
Residual
13665.5652
1708.1957
Total
32600.5000
Coefficients
Intercept
Square Feet
Standard Error
t Stat
P-value
Significance F
0.01039
Lower 95%
Upper 95%
98.24833
58.03348
1.69296
0.12892
-35.57720
232.07386
0.10977
0.03297
3.32938
0.01039
0.03374
0.18580
S YX
SSE
n2
(
Y
Y
)
i i
i1
Where
SSE = error sum of squares
n = sample size
n2
Excel Output
Regression Statistics
Multiple R
0.76211
R Square
0.58082
Adjusted R Square
0.52842
Standard Error
41.33032
Observations
ANOVA
S YX 41.33032
10
df
SS
MS
F
11.0848
Regression
18934.9348
18934.9348
Residual
13665.5652
1708.1957
Total
32600.5000
Coefficients
Intercept
Square Feet
Standard Error
t Stat
P-value
Significance F
0.01039
Lower 95%
Upper 95%
98.24833
58.03348
1.69296
0.12892
-35.57720
232.07386
0.10977
0.03297
3.32938
0.01039
0.03374
0.18580
small s YX
large s YX
Assumptions of Regression
Normality of Error
Homoscedasticity
Independence of Errors
Residual Analysis
ei Yi Yi
Not Linear
residuals
residuals
Linear
x
Non-constant variance
residuals
residuals
Constant variance
residuals
residuals
residuals
Independent
X
Residuals
251.92316
-6.923162
273.87671
38.12329
284.85348
-5.853484
304.06284
3.937162
218.99284
-19.99284
268.38832
-49.38832
356.20251
48.79749
367.17929
-43.17929
254.6674
64.33264
10
284.85348
-29.85348
Measuring Autocorrelation:
The Durbin-Watson Statistic
Autocorrelation
(e e
i 2
2
e
i
i1
i 1
Inconclusive
dL
Do not reject H0
dU
Excel/PHStat output:
Durbin-Watson Calculations
Sum of Squared
Difference of Residuals
3296.18
Sum of Squared
Residuals
3279.98
Durbin-Watson
Statistic
1.00494
n
(e e
i 2
ei
i 1
i1
)2
3296.18
1.00494
3279.98
(continued)
(continued)
Inconclusive
dL=1.29
Do not reject H0
dU=1.45
S YX
Sb1
SSX
S YX
(X X)
where:
Sb1
S YX
SSE
Excel Output
Regression Statistics
Multiple R
0.76211
R Square
0.58082
Adjusted R Square
0.52842
Standard Error
Observations
ANOVA
Sb1 0.03297
41.33032
10
df
SS
MS
F
11.0848
Regression
18934.9348
18934.9348
Residual
13665.5652
1708.1957
Total
32600.5000
Coefficients
Intercept
Square Feet
Standard Error
t Stat
P-value
Significance F
0.01039
Lower 95%
Upper 95%
98.24833
58.03348
1.69296
0.12892
-35.57720
232.07386
0.10977
0.03297
3.32938
0.01039
0.03374
0.18580
small Sb1
large Sb1
Test statistic
b1 1
t
Sb1
d.f. n 2
where:
b1 = regression slope
coefficient
1 = hypothesized slope
Sb1 = standard
error of the slope
Square Feet
(x)
245
1400
312
1600
279
1700
308
1875
199
1100
219
1550
405
2350
324
2450
319
1425
255
1700
H1: 1 0
Coefficients
Intercept
Square Feet
b1
Standard Error
Sb1
t Stat
P-value
98.24833
58.03348
1.69296
0.12892
0.10977
0.03297
3.32938
0.01039
b1 1 0.10977 0
t
3.32938
t
Sb1
0.03297
H1: 1 0
Coefficients
Intercept
Square Feet
d.f. = 10-2 = 8
a/2=.025
Reject H0
a/2=.025
Do not reject H0
-t/2
-2.3060
Reject H
0
t/2
2.3060 3.329
b1
Standard Error
Sb1
t Stat
P-value
98.24833
58.03348
1.69296
0.12892
0.10977
0.03297
3.32938
0.01039
Decision:
Reject H0
Conclusion:
There is sufficient evidence
that square footage affects
house price
P-value = 0.01039
H0: 1 = 0
H1: 1 0
Coefficients
Intercept
Square Feet
P-value
Standard Error
t Stat
P-value
98.24833
58.03348
1.69296
0.12892
0.10977
0.03297
3.32938
0.01039
F Test statistic:
where
MSR
F
MSE
MSR
SSR
k
MSE
SSE
n k 1
Excel Output
Regression Statistics
Multiple R
0.76211
R Square
0.58082
Adjusted R Square
0.52842
Standard Error
41.33032
Observations
ANOVA
MSR 18934.9348
F
11.0848
MSE 1708.1957
10
df
MS
F
11.0848
Regression
18934.9348
18934.9348
Residual
13665.5652
1708.1957
Total
32600.5000
Coefficients
Intercept
Square Feet
Standard Error
P-value for
the F-Test
t Stat
P-value
Significance F
0.01039
Lower 95%
Upper 95%
98.24833
58.03348
1.69296
0.12892
-35.57720
232.07386
0.10977
0.03297
3.32938
0.01039
0.03374
0.18580
Test Statistic:
H 0 : 1 = 0
MSR
F
11.08
MSE
H 1 : 1 0
= .05
df1= 1
df2 = 8
Decision:
Reject H0 at = 0.05
Critical
Value:
F = 5.32
Conclusion:
=.
05
Do not
reject H0
Reject H0
F.05 = 5.32
b1 t n2Sb1
d.f. = n - 2
Standard Error
t Stat
P-value
Lower 95%
Upper 95%
98.24833
58.03348
1.69296
0.12892
-35.57720
232.07386
0.10977
0.03297
3.32938
0.01039
0.03374
0.18580
(continued)
Coefficients
Intercept
Square Feet
Standard Error
t Stat
P-value
Lower 95%
Upper 95%
98.24833
58.03348
1.69296
0.12892
-35.57720
232.07386
0.10977
0.03297
3.32938
0.01039
0.03374
0.18580
Hypotheses
H0: = 0 (no correlation between X and Y)
HA: 0 (correlation exists)
Test statistic
r -
t (with
n 2 degrees of freedom)
2
where
1 r
r r
n2
if b1 0
r r 2 if b1 0
(No correlation)
H1: 0
(correlation exists)
=.05 , df = 10 - 2 = 8
r
1 r 2
n2
.762 0
1 .762 2
10 2
3.33
r
1 r 2
n2
.762 0
1 .762 2
10 2
3.33
Conclusion:
There is
evidence of a
linear association
at the 5% level of
significance
d.f. = 10-2 = 8
a/2=.025
Reject H0
-t/2
-2.3060
a/2=.025
Do not reject H0
Reject H0
t/2
2.3060
Decision:
Reject H0
3.33
Y = b0+b1Xi
Prediction Interval
for an individual Y,
given Xi
Xi
1 (Xi X)2 1
(Xi X)2
hi
n
SSX
n (Xi X)2
Y t n-2S YX
1
(Xi X)2
317.85 37.12
2
n (Xi X)
Y t n-1S YX
1
(Xi X)2
1
317.85 102.28
2
n (Xi X)
In Excel, use
PHStat | regression | simple linear regression
Check the
confidence and prediction interval for X=
box and enter the X-value and confidence level
desired
(continued)
Input values
Y
Confidence Interval Estimate for Y|X=Xi
Prediction Interval Estimate for YX=Xi
(continued)
Chapter Summary
Chapter Summary
(continued)
Chap 12-79
Chapter Goals
After completing this chapter, you should be
able to:
Population slopes
Random Error
Yi 0 1X1i 2 X 2i k Xki
Estimated
intercept
Yi b0 b1X1i b 2 X 2i bk Xki
In this chapter we will always use Excel to obtain the
regression slope coefficients and other regression
summary measures.
pe
o
Sl
e
bl
a
i
ar
v
r
fo
X1
Y b0 b1X1 b 2 X 2
X1
varia
r
o
f
e
lo p
X2
ble X 2
Example:
2 Independent Variables
Dependent variable:
Pie sales (units per week)
Independent variables: Price (in $)
Advertising ($100s)
Pie
Sales
Price
($)
Advertising
($100s)
350
5.50
3.3
460
7.50
3.3
350
8.00
3.0
430
8.00
4.5
350
6.80
3.0
380
7.50
4.0
430
4.50
3.0
470
6.40
3.7
450
7.00
3.5
10
490
5.00
4.0
11
340
7.20
3.5
12
300
7.90
3.2
13
440
5.90
4.0
14
450
5.00
3.5
15
300
7.00
2.7
Sales = b0 + b1 (Price)
+ b2 (Advertising)
Excel:
PHStat:
0.72213
R Square
0.52148
Adjusted R Square
0.44172
Standard Error
47.46341
Observations
ANOVA
Regression
15
df
SS
MS
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
555.46404
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
t Stat
6.53861
Significance F
P-value
0.01201
Lower 95%
Upper 95%
b1 = -24.975: sales
will decrease, on
average, by 24.975
pies per week for
each $1 increase in
selling price, net of
the effects of changes
due to advertising
Predicted sales
is 428.62 pies
Predictions in PHStat
Check the
confidence and
prediction interval
estimates box
Predictions in PHStat
(continued)
Input values
<
Predicted Y value
<
Coefficient of
Multiple Determination
2
Y .12..k
SST
total sum of squares
Multiple Coefficient of
Determination
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
0.44172
Standard Error
Regression
15
df
SSR 29460.0
.52148
SST 56493.3
52.1% of the variation in pie sales
is explained by the variation in
price and advertising
47.46341
Observations
ANOVA
2
Y.12
(continued)
SS
MS
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
555.46404
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
t Stat
6.53861
Significance F
P-value
0.01201
Lower 95%
Upper 95%
Adjusted r2
Adjusted r2
(continued)
n 1
r 1 (1 r
)
1
n kvariables)
k = number of independent
(where n = sample size,
2
adj
2
Y .12..k
Adjusted r2
(continued)
Regression Statistics
Multiple R
0.72213
R Square
0.52148
Adjusted R Square
0.44172
Standard Error
47.46341
Observations
ANOVA
Regression
15
df
2
radj
.44172
MS
29460.027
14730.013
Residual
12
27033.306
2252.776
Total
14
56493.333
Coefficients
Standard Error
Intercept
306.52619
114.25389
2.68285
0.01993
57.58835
555.46404
Price
-24.97509
10.83213
-2.30565
0.03979
-48.57626
-1.37392
74.13096
25.96732
2.85478
0.01449
17.55303
130.70888
Advertising
t Stat
6.53861
Significance F
P-value
0.01201
Lower 95%
Upper 95%
Y b0 b1X1 b 2 X 2
<
Residual =
ei = (Yi Yi)
Y
Yi
<
Yi
x2i
X1
<
x1i
X2