
Chapter 14

Simple Linear Regression

Simple Linear Regression Model
Least Squares Method
Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation for Estimation and Prediction
Computer Solution
Residual Analysis: Validating Model Assumptions

© 2005 Thomson/South-Western

Simple Linear Regression Model

The equation that describes how y is related to x and an error term is called the regression model.
The simple linear regression model is:

y = β0 + β1x + ε

where:
β0 and β1 are called parameters of the model,
ε is a random variable called the error term.


Simple Linear Regression Equation

The simple linear regression equation is:

E(y) = β0 + β1x

The graph of the regression equation is a straight line.
β0 is the y-intercept of the regression line.
β1 is the slope of the regression line.
E(y) is the expected value of y for a given x value.


Simple Linear Regression Equation

Positive Linear Relationship

[Graph: E(y) vs. x; regression line with intercept β0 and positive slope β1]


Simple Linear Regression Equation

Negative Linear Relationship

[Graph: E(y) vs. x; regression line with intercept β0 and negative slope β1]


Simple Linear Regression Equation

No Relationship

[Graph: E(y) vs. x; horizontal regression line with intercept β0 and slope β1 = 0]


Estimated Simple Linear Regression Equation

The estimated simple linear regression equation is:

ŷ = b0 + b1x

The graph is called the estimated regression line.
b0 is the y-intercept of the line.
b1 is the slope of the line.
ŷ is the estimated value of y for a given x value.


Estimation Process

Regression Model: y = β0 + β1x + ε
Regression Equation: E(y) = β0 + β1x
Unknown Parameters: β0, β1

Sample Data:
x: x1, ..., xn
y: y1, ..., yn

Estimated Regression Equation: ŷ = b0 + b1x
Sample Statistics: b0, b1

b0 and b1 provide estimates of β0 and β1.

Least Squares Method

Least Squares Criterion

min Σ(yi − ŷi)²

where:
yi = observed value of the dependent variable for the ith observation
ŷi = estimated value of the dependent variable for the ith observation


Least Squares Method

Slope for the Estimated Regression Equation

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

Least Squares Method

y-Intercept for the Estimated Regression Equation

b0 = ȳ − b1x̄

where:
xi = value of independent variable for ith observation
yi = value of dependent variable for ith observation
x̄ = mean value for independent variable
ȳ = mean value for dependent variable
n = total number of observations

Simple Linear Regression

Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.


Simple Linear Regression

Example: Reed Auto Sales

Number of TV Ads    Number of Cars Sold
       1                    14
       3                    24
       2                    18
       1                    17
       3                    27


Estimated Regression Equation

Slope for the Estimated Regression Equation

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 20/4 = 5

y-Intercept for the Estimated Regression Equation

b0 = ȳ − b1x̄ = 20 − 5(2) = 10

Estimated Regression Equation

ŷ = 10 + 5x

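The slope and intercept calculations above can be checked with a short script. This is a quick sketch of the least squares formulas applied to the Reed Auto data; it is not part of the original slides.

```python
# Least squares fit for the Reed Auto data (sketch for checking the arithmetic).
x = [1, 3, 2, 1, 3]       # number of TV ads
y = [14, 24, 18, 17, 27]  # number of cars sold

n = len(x)
x_bar = sum(x) / n        # x̄ = 2
y_bar = sum(y) / n        # ȳ = 20

# b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
# b0 = ȳ − b1·x̄
b0 = y_bar - b1 * x_bar

print(b1, b0)  # 5.0 10.0
```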

Scatter Diagram and Trend Line

[Scatter diagram: Cars Sold vs. TV Ads, with trend line y = 5x + 10]

Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE
Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error


Coefficient of Determination

The coefficient of determination is:

r² = SSR/SST

where:
SSR = sum of squares due to regression
SST = total sum of squares


Coefficient of Determination

r² = SSR/SST = 100/114 = .8772

The regression relationship is very strong; 87.7% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.

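A quick numeric check of the sums of squares and r² for the Reed Auto data (a sketch added for illustration, not from the original slides):

```python
# Decompose the total variation into regression and error components.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5                     # from the least squares fit
y_bar = sum(y) / len(y)

y_hat = [b0 + b1 * xi for xi in x]                      # fitted values
sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error sum of squares

r_squared = ssr / sst
print(sst, ssr, sse)        # 114 100 14  (and SST = SSR + SSE)
print(round(r_squared, 4))  # 0.8772
```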

Sample Correlation Coefficient

rxy = (sign of b1)√(Coefficient of Determination)

rxy = (sign of b1)√r²

where:
b1 = the slope of the estimated regression equation ŷ = b0 + b1x

Sample Correlation Coefficient

rxy = (sign of b1)√r²

The sign of b1 in the equation ŷ = 10 + 5x is "+".

rxy = +√.8772 = +.9366

Assumptions About the Error Term ε

1. The error ε is a random variable with mean of zero.
2. The variance of ε, denoted by σ², is the same for all values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable.


Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.

Two tests are commonly used: the t test and the F test.

Both the t test and the F test require an estimate of σ², the variance of ε in the regression model.


Testing for Significance

An Estimate of σ²

The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.

s² = MSE = SSE/(n − 2)

where:
SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)²


Testing for Significance

An Estimate of σ

To estimate σ we take the square root of s². The resulting s is called the standard error of the estimate.

s = √MSE = √(SSE/(n − 2))


Testing for Significance: t Test

Hypotheses

H0: β1 = 0
Ha: β1 ≠ 0

Test Statistic

t = b1/sb1


Testing for Significance: t Test

Rejection Rule

Reject H0 if p-value < α
or t < −tα/2 or t > tα/2

where:
tα/2 is based on a t distribution with n − 2 degrees of freedom


Testing for Significance: t Test

1. Determine the hypotheses.
   H0: β1 = 0
   Ha: β1 ≠ 0
2. Specify the level of significance.
   α = .05
3. Select the test statistic.
   t = b1/sb1
4. State the rejection rule.
   Reject H0 if p-value < .05 or |t| > 3.182 (with 3 degrees of freedom)


Testing for Significance: t Test

5. Compute the value of the test statistic.

   t = b1/sb1 = 5/1.08 = 4.63

6. Determine whether to reject H0.
   t = 4.541 provides an area of .01 in the upper tail. Hence, the p-value is less than .02. (Also, t = 4.63 > 3.182.) We can reject H0.

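Steps 5 and 6 can be reproduced numerically. The sketch below computes s, sb1, and the t statistic from the Reed Auto data; it is an illustration added here, not part of the original slides.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5
n = len(x)
x_bar = sum(x) / n

# SSE = Σ(yi − b0 − b1·xi)² = 14, so s² = MSE = SSE/(n − 2)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)        # ≈ 4.667
s = math.sqrt(mse)         # standard error of the estimate ≈ 2.16

# sb1 = s / √Σ(xi − x̄)²
s_b1 = s / math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
t = b1 / s_b1

print(round(s_b1, 2), round(t, 2))  # 1.08 4.63
```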

Confidence Interval for β1

We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.
H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.


Confidence Interval for β1

The form of a confidence interval for β1 is:

b1 ± tα/2·sb1

where:
b1 is the point estimator,
tα/2·sb1 is the margin of error,
tα/2 is the t value providing an area of α/2 in the upper tail of a t distribution with n − 2 degrees of freedom.


Confidence Interval for β1

Rejection Rule
Reject H0 if 0 is not included in the confidence interval for β1.

95% Confidence Interval for β1

b1 ± tα/2·sb1 = 5 ± 3.182(1.08) = 5 ± 3.44, or 1.56 to 8.44

Conclusion
0 is not included in the confidence interval. Reject H0.

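The interval arithmetic can be checked in a few lines. The values come from the t test slide; the critical value 3.182 is the tα/2 for α/2 = .025 with 3 degrees of freedom, read from a t table.

```python
b1, s_b1 = 5, 1.08   # slope and its standard error
t_025 = 3.182        # t value, 3 d.f., upper-tail area .025

margin = t_025 * s_b1
lower, upper = b1 - margin, b1 + margin
print(round(lower, 2), round(upper, 2))  # 1.56 8.44  — 0 is not in the interval
```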

Testing for Significance: F Test

Hypotheses

H0: β1 = 0
Ha: β1 ≠ 0

Test Statistic

F = MSR/MSE


Testing for Significance: F Test

Rejection Rule
Reject H0 if p-value < α or F > Fα

where:
Fα is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator


Testing for Significance: F Test

1. Determine the hypotheses.
   H0: β1 = 0
   Ha: β1 ≠ 0
2. Specify the level of significance.
   α = .05
3. Select the test statistic.
   F = MSR/MSE
4. State the rejection rule.
   Reject H0 if p-value < .05 or F > 10.13 (with 1 d.f. in numerator and 3 d.f. in denominator)


Testing for Significance: F Test

5. Compute the value of the test statistic.

   F = MSR/MSE = 100/4.667 = 21.43

6. Determine whether to reject H0.
   F = 17.44 provides an area of .025 in the upper tail. Thus, the p-value corresponding to F = 21.43 is less than 2(.025) = .05. Hence, we reject H0.

The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
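The F statistic can be recomputed from the sums of squares (a sketch added for checking the arithmetic, not from the original slides):

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5
n = len(x)

y_hat = [b0 + b1 * xi for xi in x]
y_bar = sum(y) / n
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # 100
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # 14

msr = ssr / 1        # 1 d.f. in the numerator (one independent variable)
mse = sse / (n - 2)  # 3 d.f. in the denominator
f = msr / mse
print(round(f, 2))  # 21.43
```

With one independent variable, F = t² up to rounding (21.43 ≈ 4.63²), which is why the two tests reach the same conclusion here.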

Some Cautions about the Interpretation of Significance Tests

Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.

Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.


Using the Estimated Regression Equation for Estimation and Prediction

Confidence Interval Estimate of E(yp)

ŷp ± tα/2·sŷp

Prediction Interval Estimate of yp

ŷp ± tα/2·sind

where:
the confidence coefficient is 1 − α, and
tα/2 is based on a t distribution with n − 2 degrees of freedom


Point Estimation

If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:

ŷ = 10 + 5(3) = 25 cars


Confidence Interval for E(yp)

Excel's Confidence Interval Output

CONFIDENCE INTERVAL
xp                  3
x̄                  2.0
xp − x̄             1.0
(xp − x̄)²          1.0
Σ(xi − x̄)²         4.0
Variance of ŷ       2.1000
Std. Dev. of ŷ      1.4491
t Value             3.1824
Margin of Error     4.6118
Point Estimate      25.0
Lower Limit         20.39
Upper Limit         29.61

Confidence Interval for E(yp)

The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:

25 ± 4.61 = 20.39 to 29.61 cars

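The quantities in the Excel output can be reproduced directly from the usual standard-error formula for the mean response, sŷp = s·√(1/n + (xp − x̄)²/Σ(xi − x̄)²). This sketch is added for illustration and is not part of the original slides.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5
n = len(x)
x_bar = sum(x) / n
t_025 = 3.1824  # t value, 3 d.f. (as in the Excel output)

# standard error of the estimate
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

xp = 3
y_p = b0 + b1 * xp  # point estimate: 25
# sŷp = s·√(1/n + (xp − x̄)² / Σ(xi − x̄)²)
s_yp = s * math.sqrt(1 / n + (xp - x_bar) ** 2
                     / sum((xi - x_bar) ** 2 for xi in x))
margin = t_025 * s_yp
print(round(y_p - margin, 2), round(y_p + margin, 2))  # 20.39 29.61
```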

Prediction Interval for yp

Excel's Prediction Interval Output

PREDICTION INTERVAL
Variance of yind     6.76667
Std. Dev. of yind    2.60128
Margin of Error      8.27845
Lower Limit          16.72
Upper Limit          33.28

Prediction Interval for yp

The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:

25 ± 8.28 = 16.72 to 33.28 cars

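The prediction interval differs from the confidence interval only by the extra "1" under the square root, which accounts for the variability of an individual observation around the mean response. A sketch of the computation (added for illustration, not from the original slides):

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5
n = len(x)
x_bar = sum(x) / n
t_025 = 3.1824  # t value, 3 d.f.

s2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

xp = 3
y_p = b0 + b1 * xp
# s²ind = s²·(1 + 1/n + (xp − x̄)² / Σ(xi − x̄)²) — the extra "1" covers the
# scatter of an individual week's sales around the mean
var_ind = s2 * (1 + 1 / n + (xp - x_bar) ** 2
                / sum((xi - x_bar) ** 2 for xi in x))
margin = t_025 * math.sqrt(var_ind)
print(round(y_p - margin, 2), round(y_p + margin, 2))  # 16.72 33.28
```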

Residual Analysis

If the assumptions about the error term ε appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
The residuals provide the best information about ε.

Residual for Observation i:

yi − ŷi

Much of the residual analysis is based on an examination of graphical plots.


Residual Plot Against x

If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.


Residual Plot Against x

[Residual plot of y − ŷ against x: Good Pattern]

Residual Plot Against x

[Residual plot of y − ŷ against x: Nonconstant Variance]

Residual Plot Against x

[Residual plot of y − ŷ against x: Model Form Not Adequate]

Residual Plot Against x

Residuals

Observation    Predicted Cars Sold    Residuals
     1                 15                 -1
     2                 25                 -1
     3                 20                 -2
     4                 15                  2
     5                 25                  2
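The residuals in the table can be recomputed in one line (a sketch, not part of the original slides):

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

# residual = observed − predicted, i.e. yi − (b0 + b1·xi)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(residuals)  # [-1, -1, -2, 2, 2]  — they sum to zero, as least squares requires
```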

Residual Plot Against x

[TV Ads Residual Plot: Residuals vs. TV Ads, ranging from about -3 to 2]

End of Chapter 14

