Course 6 Econometrics Regression

The Simple Linear Regression Model

Purpose of Regression and Correlation Analysis
Regression analysis is used primarily for prediction: a statistical model used to predict the values of a dependent or response variable based on values of at least one independent or explanatory variable.
Correlation analysis is used to measure the strength of the association between numerical variables.
Types of Regression Models
[Figure: four scatter-diagram panels titled Positive Linear Relationship, Negative Linear Relationship, Relationship NOT Linear, and No Relationship]

The Scatter Diagram
Plot of all $(X_i, Y_i)$ pairs.
[Figure: example scatter diagram]

Simple Linear Regression Model
The relationship between the variables is a linear function: the straight line that best fits the data.

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

$Y_i$ = dependent (response) variable
$X_i$ = independent (explanatory) variable
$\beta_0$ = Y intercept
$\beta_1$ = slope
$\varepsilon_i$ = random error

Error Variable: Required Conditions
The error $\varepsilon$ is a critical part of the regression model.
Four requirements involving the distribution of $\varepsilon$ must be satisfied:
The probability distribution of $\varepsilon$ is normal.
The mean of $\varepsilon$ is zero: $E(\varepsilon) = 0$.
The standard deviation of $\varepsilon$ is a constant $\sigma_\varepsilon$ for all values of x.
The errors associated with different values of y are all independent.
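Population Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\mu_{Y|X} = \beta_0 + \beta_1 X_i$
$Y_i$ = observed value of Y for observation i
$\varepsilon_i$ = random error
[Figure: population regression line $\mu_{Y|X}$ with one observed value and its random error marked]

Sample Linear Regression Model
$\hat{Y}_i = b_0 + b_1 X_i$
$\hat{Y}_i$ = predicted value of Y for observation i
$X_i$ = value of X for observation i
$b_0$ = sample Y intercept, used as an estimate of the population $\beta_0$
$b_1$ = sample slope, used as an estimate of the population $\beta_1$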
REGRESSION COEFFICIENTS
To calculate the estimates of the coefficients that minimize the sum of squared differences between the data points and the line, use the formulas (least squares method):

$b_1 = \frac{\mathrm{cov}(X, Y)}{s_x^2} = \frac{n \sum X_i Y_i - (\sum X_i)(\sum Y_i)}{n \sum X_i^2 - (\sum X_i)^2}$ and $b_0 = \bar{Y} - b_1 \bar{X}$

The regression equation that estimates the equation of the first order linear model is:

$\hat{y} = b_0 + b_1 x$

EXCEL offers several approaches to regression, including trendlines, regression functions and the regression analysis tool.
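The least squares formulas can be checked outside Excel. The following is a minimal sketch in plain Python, assuming the seven-store produce data used in this deck's example; the function name `least_squares` and the variable names are illustrative choices, not part of the slides.

```python
# Produce-store data from the deck's example: square feet and annual sales ($000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def least_squares(x, y):
    """Return (b0, b1) for the fitted line y = b0 + b1 * x.

    Uses the computational form of the least squares formulas:
    b1 = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
    b0 = mean(y) - b1 * mean(x)
    """
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(square_feet, annual_sales)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

On this data the sketch should agree with the coefficients reported in the Excel printout (intercept about 1636.415, slope about 1.487).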
Example: You wish to examine the relationship between the square footage of produce stores and their annual sales. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Scatter Diagram Example
[Figure: Excel output, scatter diagram of Annual Sales ($000) against Square Feet]

Simple Linear Regression Equation: Example
From Excel Printout:
               Coefficients
Intercept      1636.414726
X Variable 1   1.486633657

Equation for the Best Straight Line
$\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i$

Graph of the Best Straight Line
[Figure: fitted line drawn through the scatter diagram of Annual Sales ($000) against Square Feet]
Interpreting the Results
$\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of 1.487 means that for each increase of one unit in X, Y is estimated to increase by 1.487 units.
For each increase of 1 square foot in the size of the store, the model predicts that expected annual sales increase by $1,487 (1.487 in units of $000).

Inferences about the Slope: t Test
t Test for a Population Slope: is there a linear relationship between X and Y?
Null and Alternative Hypotheses
$H_0: \beta_1 = 0$ (No Linear Relationship)
$H_1: \beta_1 \neq 0$ (Linear Relationship)
Test Statistic:
$t = \frac{b_1 - \beta_1}{S_{b_1}}$, where $S_{b_1} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$ and df = n - 2
Graph of the Best Straight Line
[Figure: fitted line drawn through the scatter diagram of Annual Sales ($000) against Square Feet]

Standard Error of Estimate
$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$
The standard deviation of the variation of observations around the regression line.
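To make the standard error of the estimate concrete, here is a small sketch in plain Python that computes $S_{YX}$ and the standard error of the slope $S_{b_1}$ for the seven-store produce data. The helper names `fit_line` and `standard_errors` are hypothetical and introduced only for this illustration.

```python
import math

# Produce-store data from the deck's example: square feet and annual sales ($000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def fit_line(x, y):
    """Return (b0, b1, sxx), where sxx is the sum of squared deviations of x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1, sxx


def standard_errors(x, y):
    """Return (s_yx, s_b1): standard error of the estimate and of the slope."""
    n = len(x)
    b0, b1, sxx = fit_line(x, y)
    # SSE: squared vertical distances between observations and the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    s_b1 = s_yx / math.sqrt(sxx)
    return s_yx, s_b1


s_yx, s_b1 = standard_errors(square_feet, annual_sales)
print(f"S_YX = {s_yx:.2f}, S_b1 = {s_b1:.4f}")
```

The value of `s_yx` should be close to the "Standard Error" entry (about 611.75) in the deck's Excel regression statistics.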
Inferences about the Slope: t Test Example
Example: Produce Stores
Data for 7 Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Regression Model Obtained: $\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of this model is 1.487.
Is there a linear relationship between the square footage of a store and its annual sales?

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
$\alpha = .05$
df = 7 - 2 = 5
Critical Value(s): $\pm 2.5706$ (reject $H_0$ in either tail, with area .025 in each)

Test Statistic, from Excel Printout:
               t Stat       P-value
Intercept      3.6244333    0.0151488
X Variable 1   9.009944     0.0002812

Decision: Reject $H_0$, since 9.009944 > 2.5706.
Conclusion: There is evidence of a linear relationship.

Inferences about the Slope: Confidence Interval Example
Confidence Interval Estimate of the Slope: $b_1 \pm t_{n-2} S_{b_1}$
Excel Printout for Produce Stores:
               Lower 95%     Upper 95%
Intercept      475.810926    2797.01853
X Variable 1   1.06249037    1.91077694

At the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911). It does not include 0.
Conclusion: There is a significant linear relationship between annual sales and the size of the store.

Measures of Variation: The Sum of Squares
SST = Total Sum of Squares: measures the variation of the $Y_i$ values around their mean $\bar{Y}$.
$SST = \sum (Y_i - \bar{Y})^2$
SSR = Regression Sum of Squares: explained variation attributable to the relationship between X and Y.
$SSR = \sum (\hat{Y}_i - \bar{Y})^2$
SSE = Error Sum of Squares: variation attributable to factors other than the relationship between X and Y.
$SSE = \sum (Y_i - \hat{Y}_i)^2$
[Figure: SST, SSR and SSE shown as vertical distances at $X_i$ between the observation, the regression line and $\bar{Y}$]

Measures of Variation: The Sum of Squares: Example
Excel Output for Produce Stores:
             df   SS
Regression   1    30380456.12
Residual     5    1871199.595
Total        6    32251655.71
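The slope t test and its confidence interval can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the critical value 2.5706 is copied from the slides as a constant (`T_CRIT`) rather than computed from a t distribution, and the function name `slope_inference` is hypothetical.

```python
import math

# Produce-store data from the deck's example: square feet and annual sales ($000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

# Two-tailed critical value for alpha = .05 and df = 5, taken from the slides.
T_CRIT = 2.5706


def slope_inference(x, y, t_crit, beta1_null=0.0):
    """Return (t_stat, lower, upper) for the slope of a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
    s_b1 = s_yx / math.sqrt(sxx)      # standard error of the slope
    t_stat = (b1 - beta1_null) / s_b1
    return t_stat, b1 - t_crit * s_b1, b1 + t_crit * s_b1


t_stat, lower, upper = slope_inference(square_feet, annual_sales, T_CRIT)
reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.3f}, 95% CI for slope = ({lower:.3f}, {upper:.3f}), reject H0: {reject_h0}")
```

Because the interval excludes 0 exactly when |t| exceeds the critical value, the test and the confidence interval lead to the same conclusion about the slope.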
Testing the validity of the model
We pose the question: Is there at least one independent variable linearly related to the dependent variable?
To answer the question we test the hypotheses
$H_0: \beta_1 = 0$
$H_1$: At least one $\beta_i$ is not equal to 0
If at least one $\beta_i$ is not equal to zero, the model is valid.
To test these hypotheses we perform an analysis of variance procedure.

ANOVA - Summary Table

Source of Variation   Degrees of Freedom   Sum of Squares    Mean Square (Variance)   F Test Statistic
Explained (Factor)    k - 1                SSR               MSR = SSR/(k - 1)        F = MSR/MSE
Within (Error)        n - k                SSE               MSE = SSE/(n - k)
Total                 n - 1                SST = SSR + SSE

For simple linear regression k = 2 (intercept and slope), so the degrees of freedom are 1 and n - 2.

The F test
Construct the F statistic:
$F = \frac{MSR}{MSE}$, with $MSR = \frac{SSR}{k-1}$ and $MSE = \frac{SSE}{n-k}$
[Variation in y] = SSR + SSE.
A large F results from a large SSR. Then much of the variation in y is explained by the regression model; the null hypothesis should be rejected, and thus the model is valid.
Rejection region: $F > F_{\alpha,\,k-1,\,n-k}$
Required conditions must be satisfied.

The Coefficient of Determination
$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$
Measures the proportion of variation that is explained by the independent variable X in the regression model.

Coefficients of Determination (r2) and Correlation (r)
[Figure: four panels showing the line $\hat{Y}_i = b_0 + b_1 X_i$ with the data, labeled r2 = 1, r = +1; r2 = 1, r = -1; r2 = .8, r = +0.9; r2 = 0, r = 0]

Measures of Variation: Example
Excel Output for Produce Stores:
Regression Statistics
Multiple R          0.9705572
R Square            0.94198129
Adjusted R Square   0.93037754
Standard Error      611.751517
Observations        7

r2 = .94: 94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.
The Standard Error entry is $S_{YX}$.
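The sums of squares, the F statistic and the coefficient of determination can all be derived from the same fitted line. The sketch below, in plain Python, assumes the seven-store produce data and takes k = 2 (intercept and slope) in the ANOVA degrees of freedom; the function name `anova_table` and the dictionary keys are illustrative.

```python
import math

# Produce-store data from the deck's example: square feet and annual sales ($000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def anova_table(x, y, k=2):
    """Return the ANOVA quantities for a simple linear regression.

    k is the number of estimated coefficients (assumed 2: intercept and slope).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained variation
    msr = ssr / (k - 1)
    mse = sse / (n - k)
    r2 = ssr / sst
    # In simple regression r carries the sign of the slope.
    r = math.copysign(math.sqrt(r2), b1)
    return {"SST": sst, "SSR": ssr, "SSE": sse, "MSR": msr, "MSE": mse,
            "F": msr / mse, "r2": r2, "r": r}


table = anova_table(square_feet, annual_sales)
for name, value in table.items():
    print(f"{name:>3} = {value:.4f}")
```

In simple linear regression the F statistic equals the square of the slope's t statistic, so the F test and the two-tailed t test for the slope are equivalent here.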
Estimation of Predicted Values

Confidence Interval Estimate for $\mu_{Y|X}$, the Mean of Y given a particular $X_i$:

$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$

$S_{YX}$ = standard error of the estimate
$t_{n-2}$ = t value from table with df = n - 2
The size of the interval varies according to the distance of $X_i$ from the mean $\bar{X}$.

Confidence Interval Estimate for Individual Response $Y_i$ at a Particular $X_i$:

$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$

The addition of this 1 under the square root increases the width of the interval relative to that for the mean of Y.
Interval Estimates for Different Values of X
[Figure: regression line with the confidence interval for the mean of Y and the wider confidence interval for an individual $Y_i$; both bands are narrowest at $\bar{X}$ and widen away from it, shown at a given X]

Example: Produce Stores
Data for 7 Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Regression Model Obtained: $\hat{Y}_i = 1636.415 + 1.487 X_i$
Predict the annual sales for a store with 2000 square feet.

Estimation of Predicted Values: Example
Confidence Interval Estimate for $\mu_{Y|X}$
Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet.
Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i$ = 4610.45 ($000)
$\bar{X}$ = 2350.29, $S_{YX}$ = 611.75, $t_{n-2} = t_5$ = 2.5706

$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$ = 4610.45 $\pm$ 612.66
This is the confidence interval for the mean of Y.

Estimation of Predicted Values: Example
Confidence Interval Estimate for Individual Y
Find the 95% confidence interval for annual sales of one particular store of 2,000 square feet.
Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i$ = 4610.45 ($000)
$\bar{X}$ = 2350.29, $S_{YX}$ = 611.75, $t_{n-2} = t_5$ = 2.5706

$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$ = 4610.45 $\pm$ 1687.7
This is the confidence interval for an individual Y.
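The two interval estimates differ only by the extra 1 under the square root, which a short computation makes visible. The following plain-Python sketch assumes the seven-store produce data and a store of 2,000 square feet; the critical value 2.5706 is taken from the slides as a constant, and `interval_half_widths` is a hypothetical helper name.

```python
import math

# Produce-store data from the deck's example: square feet and annual sales ($000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

# Two-tailed critical value for 95% confidence and df = 5, taken from the slides.
T_CRIT = 2.5706


def interval_half_widths(x, y, x_new, t_crit):
    """Return (y_hat, half_mean, half_indiv) at x_new.

    half_mean  : half-width of the confidence interval for the mean of Y
    half_indiv : half-width of the interval for one individual Y
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    # Distance term: grows as x_new moves away from the mean of x.
    h = 1 / n + (x_new - x_bar) ** 2 / sxx
    y_hat = b0 + b1 * x_new
    return y_hat, t_crit * s_yx * math.sqrt(h), t_crit * s_yx * math.sqrt(1 + h)


y_hat, half_mean, half_indiv = interval_half_widths(square_feet, annual_sales, 2000, T_CRIT)
print(f"prediction = {y_hat:.2f}")
print(f"mean of Y:    +/- {half_mean:.2f}")
print(f"individual Y: +/- {half_indiv:.2f}")
```

The sketch uses unrounded coefficients, so its point prediction (about 4609.7) differs slightly from the 4610.45 obtained on the slide with the rounded slope 1.487; the half-widths are not affected by that rounding.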
