
Demand Estimation & Forecasting
Definition of Elasticity of Demand

Price elasticity of demand:   e_p = (Δq/q) / (Δp/p) = (Δq/Δp) · (p/q)

Income elasticity:   e_I = (Δq/q) / (ΔI/I) = (Δq/ΔI) · (I/q)

Cross-price elasticity:   e_pr = (Δq/q) / (Δp_r/p_r) = (Δq/Δp_r) · (p_r/q)
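As a minimal sketch of these formulas (the numbers are hypothetical, not from the slides), one function covers all three cases, since x can be the own price p, income I, or a related good's price p_r:

```python
def elasticity(q0, q1, x0, x1):
    """(Δq/q) / (Δx/x), evaluated at the initial point (q0, x0).
    x can be own price p, income I, or a related good's price p_r."""
    return ((q1 - q0) / q0) / ((x1 - x0) / x0)

# Hypothetical example: price rises from 100 to 110, quantity falls 50 -> 46
e_p = elasticity(50, 46, 100, 110)
print(e_p)  # -0.8, so demand is inelastic (|e_p| < 1)
```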
Interpreting the Price Elasticity of
Demand: How Elastic Is Elastic?

Demand is elastic if the price elasticity of
demand is greater than 1

Inelastic if the price elasticity of demand is
less than 1, and

Unit-elastic if the price elasticity of
demand is exactly 1.

Example: a highway department setting the charge for crossing a bridge needs to know the price elasticity of demand, since whether demand is elastic or inelastic determines whether a higher toll lowers or raises toll revenue.
Nature of goods according to income elasticity

e_I > 0 => normal goods
e_I < 0 => inferior goods
0 < e_I < 1 => necessities
e_I > 1 => luxury goods


Cross-Price Elasticity
Goods are substitutes when the cross-price
elasticity of demand is positive
e.g. Coke & Pepsi, Zen & Santro

Goods are complements when the cross-price
elasticity of demand is negative
e.g. tea & sugar, petrol & petrol-driven car

Alcoholic Beverages elasticities (e)
Many public policy issues are related to the consumption of alcoholic beverages.

Spirits refer to all beverages that contain alcohol other than beer & wine.

Price elasticity of demand for beer (e_pb):                -0.23
Cross-price elasticity of beer w.r.t. wine price (e_pb,pw): 0.31
Cross-price elasticity of beer w.r.t. spirits price (e_pb,ps): 0.15
Income elasticity of beer (e_Ib):                          -0.09
Income elasticity of wine (e_Iw):                           5.03
Income elasticity of spirits (e_Is):                        1.21
Alcoholic Beverages elasticities (e)
Demand for beer is inelastic: a 10% increase in the beer price will result in only a 2.3% decrease in beer demand.
Wine & spirits are substitutes for beer: a 10% increase in the wine price will result in a 3.1% increase in the quantity of beer demanded, and similarly a 10% increase in the spirits price will increase the quantity of beer demanded by 1.5%.
Beer is an inferior good: a 10% increase in income will result in a 0.9% decline in the quantity of beer demanded.
Both wine & spirits are luxury goods, as their income elasticities are > 1.
Information about demand is essential for
making pricing and production decisions

A knowledge of future demand conditions can
also be extremely useful for managers due to
huge financial implications

Large business firms pioneered the use of
empirical demand functions and forecasts for
future production-pricing decisions
Determinants of Demand
Consumer income (more purchasing power)
Price of the product
The prices of related goods:
  substitute goods (e.g. petrol vs. diesel)
  complementary goods (diesel cars & diesel sales)
Consumer expectations of future prices & income
Population & growth of the economy
Consumer tastes and preferences
Demand = f(Y, P, P_o, ...)
Methods of Demand Estimation
Interview and Experimental Methods
Expert Opinion

Consumer interviews/surveys
Interviews can solicit useful information when market data are scarce.
Sample selection to represent the consumer population & the skill of the surveyors are important.

Market experiments
Controlled experiments in test markets can generate useful insights.
They have an advantage over surveys in that they reflect actual consumer behavior.
Experiments can be expensive.
General Empirical Demand Specification
Q = f(P, M, P_R, N)
where
Q = quantity demanded
P = price of the good
M = consumers' income
P_R = price(s) of the related product(s)
N = number of buyers

The linear form of the demand function is
Q = a + bP + cM + dP_R + eN
We need to estimate the values of a, b, c, d and e. How? There are many ways, but the most common one is regression analysis, as in the sketch below.
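A hedged sketch of this estimation in Python with statsmodels; the file name demand_data.csv and its column names are hypothetical placeholders, not data from the slides:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("demand_data.csv")             # hypothetical data file
X = sm.add_constant(df[["P", "M", "PR", "N"]])  # adds the intercept term a
model = sm.OLS(df["Q"], X).fit()                # ordinary least squares
print(model.params)     # estimates of a, b, c, d, e
print(model.summary())  # standard errors, t-stats, R-squared, F-stat
```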


Regression Analysis
Regression analysis is concerned with the study of the relationship between one variable, called the explained or dependent variable (Y), and one or more other variables, called independent or explanatory variables (x1, x2, ..., xn):

Y = f(x1, x2, ..., xn)

Methodology for Regression Analysis
Theory
Mathematical model of theory
Econometric model of theory
Data collection
Estimation of econometric model
Hypothesis testing
Forecasting
Using the model for policy purpose
Specification of Mathematical & Econometric Model
Y = B1 + B2·X            (mathematical model, deterministic)
Y = B1 + B2·X + u        (econometric model; an example of the linear regression model)
Y: dependent variable; X: independent variable; u: error term
B1 & B2 are the parameters to be estimated

[Figure: scatter of Y against X with the fitted line; B1 is the intercept and B2 the slope.]


Econometric Model
Actual = systematic part + random error
Say, Consumption (C) = Function (f) of income
(I) with error (u)
C = f(I) + u
u represents the combined influence on
dependent variable of a large number of
independent variables that are not explicitly
introduced in the regression model
We hope that influence of those omitted or
neglected variables is small and at best
random
Assumptions
The relationship between X & Y is linear.
The Xs are non-stochastic variables whose values are fixed.
The error has zero expected value: E(u) = 0.
The error term has constant variance: E(u²) = σ² (homoscedasticity).
Errors are statistically independent: E(u_i·u_j) = 0 for all i ≠ j (no autocorrelation).
The error term is normally distributed: u ~ N(0, σ²).
Σ u_i·X_i = 0, i.e. u & X are uncorrelated.
Hence Y ~ N(B1 + B2·X, σ²).



Linearity Assumption
The term "linear" in a simple regression model does not mean a linear relationship between the variables, but a model in which the parameters enter in a linear way.

A function is said to be linear in a parameter if the parameter appears with a power of one and is not multiplied or divided by any other parameter. For example, Y = B1 + B2·X² is linear in the parameters, while Y = B1 + B2²·X is not.






Useful Functional Forms
Linear:      Y = B1 + B2·X
Reciprocal:  Y = B1 + B2·(1/X)
Log-log:     ln Y = B1 + B2·ln X

Useful Functional Forms
Log-linear:  ln Y = B1 + B2·X
Linear-log:  Y = B1 + B2·ln X
Log-inverse: ln Y = B1 - B2·(1/X)

Population Regression Function
Let Y represents weekly expenditure on
lottery &

X represents weekly personal disposable
income

For simplicity, we assume a hypothetical
population of 100 players, which has been
divided into 10 PDI classes in increments
of $25 starting with $150 and ending
with $375
Weekly exp on Lotto and weekly PDI

[Figure: weekly expenditure on Lotto (Y) plotted against weekly PDI (X = 150, 175, 200, 225, ...); the population regression line (PRL) passes through the conditional means, with deviations u_i around it.]

E(Y|X_i) = B1 + B2·X_i          (mathematical)

Y_i = B1 + B2·X_i + u_i          (stochastic; individual values differ from the mean values)

B1, B2: parameters

PRF
For any X value, there are 10 Y values.
Also, there is a general tendency for Y to increase as X increases: people with higher PDI are likely to spend more on the lottery.
This becomes clearer if we take the mean value of Y corresponding to the various Xs.
If we connect the various mean values of Y, the resulting line is called the PRL.
Sample Regression Function (SRF)
In practice, we rarely have population data, only a sample from the population.
Suppose we have a randomly selected sample of Y values corresponding to the X values; now we have only one Y for each X.
We cannot say which SRL represents the PRL. Can we estimate the PRF from sample data?

Sample 1:        Sample 2:
Y     X          Y     X
18    150        23    150
24    175        18    175
26    200        24    200
23    225        25    225
30    250        28    250
27    275        27    275
34    300        31    300
35    325        29    325

[Figure: scatter of the two samples with two different sample regression lines, SRL 1 and SRL 2.]
SRF

Here, the SRL is Ŷ_i = b1 + b2·X_i, where b1, b2 are estimators of B1 and B2 (and Ŷ_i estimates E(Y|X_i)).

An estimator is a formula that suggests how we can estimate a population parameter.

A particular numerical value obtained by the estimator in an application is an estimate.

Stochastic SRF: Y_i = b1 + b2·X_i + e_i, where e_i is the estimator of u_i.

SRF
Thus, e_i = Y_i - Ŷ_i.

Granted that the SRF is only an approximation of the PRF, can we find a method that will make this approximation as close as possible?

Or, how should we construct the SRF so that b1 & b2 are as close as possible to B1 & B2?


Population & Sample Regression Line
Suppose we would like to estimate the demand for rice in Gurgaon, where demand = f(income).

One way to estimate this is to go to each person in Gurgaon, collect data on income and rice consumption, and estimate the equation
C = B1 + B2·M, where B1 & B2 are the parameters to be estimated.

The other way is to collect data from a sample of, say, 100 people and estimate C = b1 + b2·M.
Population & Sample Regression Line
However, for another sample we may get C = c1 + c2·M, and so on.
We cannot say which SRL represents the PRL.
Can we estimate the PRF from sample data?
Granted that the SRF is only an approximation of the PRF, can we find a method that will make this approximation as close as possible?
Or, how should we construct the SRF so that b1 & b2 are as close as possible to B1 & B2?



Estimation of parameters:
Method of Ordinary Least Squares
We have e_i = Y_i - Ŷ_i = Y_i - b1 - b2·X_i.

The objective is to choose b1 & b2 so that the e_i are as small as possible.

OLS states that b1 & b2 should be chosen in such a way that the RSS is minimum.

Thus, minimise Σe_i² = Σ(Y_i - b1 - b2·X_i)².

The resulting estimators are
b2 = Σx_i·y_i / Σx_i² = Σ(X_t - X̄)(Y_t - Ȳ) / Σ(X_t - X̄)²    (with x_i = X_i - X̄, y_i = Y_i - Ȳ)
b1 = Ȳ - b2·X̄

Estimating coefficients
Consider a firm with a fixed capital stock that has been rented under a long-term lease for Rs 100 per production period. The other input of the firm's production process is labor, which can be increased or decreased depending on the firm's needs. So the cost of the capital input is fixed and the cost of labor is variable. The manager of the firm wants to know the relationship between output and cost; this will allow the manager to predict the cost of any specified rate of output for the next production period.

The manager is interested in estimating the coefficients b1 and b2 of the function Y = b1 + b2·X, where Y is total cost and X is total output.
Estimates

Cost (Y_t) | Output (X_t) | Y_t - Ȳ | X_t - X̄ | (X_t - X̄)² | (X_t - X̄)(Y_t - Ȳ)
100        | 0            | -137.14 | -12.29  | 151.04     | 1685.45
150        | 5            | -87.14  | -7.29   | 53.14      | 635.25
160        | 8            | -77.14  | -4.29   | 18.40      | 330.93
240        | 10           | 2.86    | -2.29   | 5.24       | -6.55
230        | 15           | -7.14   | 2.71    | 7.34       | -19.35
370        | 23           | 132.86  | 10.71   | 114.70     | 1422.93
410        | 25           | 172.86  | 12.71   | 161.54     | 2197.05

Ȳ = 237.14, X̄ = 12.29
Σ(X_t - X̄)² = 511.40
Σ(X_t - X̄)(Y_t - Ȳ) = 6245.71

Estimates

Ŷ = 87.08 + 12.21·X
A one-unit change in X results in a 12.21-unit change in Y.

b2 = Σ(X_t - X̄)(Y_t - Ȳ) / Σ(X_t - X̄)² = 6245.71/511.40 = 12.21
b1 = Ȳ - b2·X̄ = 237.14 - 12.21 × 12.29 = 87.08
These estimates can be reproduced in EViews or any other regression package; a short script is sketched below.
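This sketch reproduces the slide's calculation using only the data in the table above:

```python
import numpy as np

X = np.array([0, 5, 8, 10, 15, 23, 25], dtype=float)            # output
Y = np.array([100, 150, 160, 240, 230, 370, 410], dtype=float)  # cost

x, y = X - X.mean(), Y - Y.mean()      # deviations from the means
b2 = (x * y).sum() / (x ** 2).sum()    # = 6245.71 / 511.40 = 12.21
b1 = Y.mean() - b2 * X.mean()          # = 237.14 - 12.21(12.29) = 87.08
print(b1, b2)  # matches the slide up to rounding
```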
Estimates
So far we have estimated b1 & b2 using OLS.

It is evident that the least squares estimates are functions of the sample data.

Since the data are likely to change from sample to sample, the estimates will also change.

Therefore, what is needed is some measure of the reliability or precision of the estimators b1 & b2, and this is provided by the standard error.
Variances (& SEs) of OLS estimators
(T - 2) is called the degrees of freedom (dof), the number of independent observations; we lose 2 dof because b1 & b2 must be computed in estimating Ŷ.

Computing sources of variation

Y_t | Total variation (Y_t - Ȳ)² | Ŷ_t = b1 + b2·X_t | Explained variation (Ŷ_t - Ȳ)² | Unexplained variation (Y_t - Ŷ_t)²
100 | 18,807.38                  | 87.08             | 22,518.00                      | 166.93
150 | 7,593.38                   | 148.13            | 7,922.78                       | 3.50
160 | 5,950.58                   | 184.76            | 2,743.66                       | 613.06
240 | 8.18                       | 209.18            | 781.76                         | 949.87
230 | 50.98                      | 270.23            | 1,094.95                       | 1,618.45
370 | 17,651.78                  | 367.91            | 17,100.79                      | 4.37
410 | 29,880.58                  | 392.33            | 24,083.94                      | 312.23

Ȳ = 237.14
Σ(Y_t - Ȳ)² = 79,942.86
Σ(Ŷ_t - Ȳ)² = 76,245.88
Σ(Y_t - Ŷ_t)² = 3,668.41

Standard error of estimate
Var(b2) = [Σ(Y_t - Ŷ_t)² / (T - 2)] / Σ(X_t - X̄)² = [3668.41/(7 - 2)] / 511.40 = 1.4161
se(b2) = √1.4161 = 1.19

Ŷ = 87.08 + 12.21·X
     (***)    (1.19)
where the figures in parentheses are estimated standard errors, which measure the variability of the estimates from sample to sample.
The t-test is used to determine whether there is a significant relationship between the dependent variable and each independent variable.
The test requires that the s.e. of each estimated regression coefficient be computed.
Hypothesis testing
Say prior knowledge or expert opinion tells us that the true average price-to-earnings (P/E) ratio in the population of BSC stocks is 20.

Suppose a particular random sample of 30 stocks gives an estimate of 23.

Is the value 23 statistically different from 20?

Due to sampling fluctuations, 23 may not be statistically different from 20; in that case we may not reject the hypothesis that the true P/E value is 20.

This can be decided through hypothesis testing.

Hypothesis testing
Suppose someone suggests that X has no effect on Y.

Null hypothesis H0: B2 = 0.

If H0 is accepted, there is no point in including X in the model.

If X really belongs in the model, then one would expect H0 to be rejected against the alternative hypothesis H1: B2 ≠ 0 (B2 could be positive or negative).

Though in our analysis b2 ≠ 0, we should not rely on the numerical result alone, because of sampling fluctuations.


Statistical evaluation of regression results

This can be done through the t-test.
t-test: a test of the statistical significance of each estimated regression coefficient:

t = b / SE_b

where b is the estimated coefficient and SE_b is the standard error of the estimated coefficient.
Rule of 2: if the absolute value of t is greater than 2, the estimated coefficient is significant at the 5% level.
If a coefficient passes the t-test, the variable has a true impact on demand.
CI vs. TOS
In the confidence interval (CI) approach, we specify a plausible range of values for the true parameter and check whether the CI includes the hypothesized value of the parameter.
If it does, we do not reject H0; if the value lies outside the CI, we reject H0.
In the test of significance (TOS) approach, instead of specifying a range of values, we pick the specific value of the parameter suggested by H0.
In practice, whether we use the CI approach or the TOS approach to hypothesis testing is a matter of personal choice and convenience.
Test of significance
One property of normal distribution is that
any linear function of normally distributed
variables is itself normally distributed

Since b1 and b2 are linear functions of u, which is normally distributed, b1 and b2 are themselves normally distributed.
Test of significance
b1 ~ N(B1, σ²_b1)
b2 ~ N(B2, σ²_b2)

Z = (b2 - B2)/se(b2) = (b2 - B2) / (σ/√Σx_i²) ~ N(0, 1), where x_i = X_i - X̄.

Since we don't know σ, we have to use its estimate. In that case

t = (b2 - B2) / se(b2) ~ t(n-2)
  = [estimator (b2) - hypothesized value (B2*)] / se of the estimator (b2)

If the absolute value of this ratio is equal to or greater than the table value of t for (n - 2) dof, b2 is said to be statistically significant.
In our case, t = b2/[se(b2)] = 12.21/1.19 = 10.26, which exceeds the 5% critical value of t at 5 dof (2.015 for a one-tailed test; 2.571 two-tailed).
So H0: B2 = 0 is rejected.
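The quoted critical values can be checked with scipy; 2.015 is the one-tailed 5% value at 5 dof, and the two-tailed value is 2.571:

```python
from scipy import stats

print(stats.t.ppf(0.95, df=5))   # one-tailed 5% critical value: 2.015
print(stats.t.ppf(0.975, df=5))  # two-tailed 5% critical value: 2.571
# Either way, |t| = 10.26 falls deep in the rejection region.
```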
t-statistic
[Figure: hypothetical sampling distribution of b2 under H0: B2 = 0, showing the acceptance region around B2 and the rejection regions in the tails: 2.5% in each tail for a 5% significance test, and 0.5% in each tail (beyond ±2.58 sd) for a 1% test.]
Explanatory power of a model
[Figure: scatter plots of Y against X illustrating different degrees of explanatory power.]
Breakdown of total variation
[Figure: at a point (X_t, Y_t) around the SRF, the total deviation (Y_t - Ȳ) splits into (Ŷ_t - Ȳ), the variation in Y_t explained by the regression, and the residual e_t = (Y_t - Ŷ_t).]
Decomposition of Sum of Squares
(Y_t - Ȳ) = (Ŷ_t - Ȳ) + (Y_t - Ŷ_t)
After squaring both sides and some algebraic manipulation, we get
Σ(Y_t - Ȳ)² = Σ(Ŷ_t - Ȳ)² + Σ(Y_t - Ŷ_t)²
TSS = ESS + RSS

R² = Explained variation / Total variation = Σ(Ŷ_t - Ȳ)² / Σ(Y_t - Ȳ)²

Goodness of fit: R²

The value of R² ranges from 0 to 1.

If the regression equation explains none of the variation in Y_i (i.e. there is no relationship between X & Y), R² will be zero.

If the equation explains all the variation, R² will be one.
In general, the higher the R² value, the better the regression equation.
A low R² is indicative of a rather poor fit.
In our two-variable cost example, R² = 76,245.88 / 79,942.86 = 0.95.

Three Variable Regression
Model
Y_i = B1 + B2·X2i + B3·X3i           (nonstochastic form, PRF)

Y_i = B1 + B2·X2i + B3·X3i + u_i     (stochastic)

B2, B3 are called partial regression or partial slope coefficients.

B2 measures the change in the mean value of Y per unit change in X2, holding the value of X3 constant.

SRF: Y_i = b1 + b2·X2i + b3·X3i + e_i

Assumptions
Linear relationship.

The Xs are non-stochastic variables.

No linear relationship exists between two or more independent variables (no multicollinearity); e.g. X2i = 3 + 2·X3i must not hold.

The error has zero expected value, constant variance, and is normally distributed.

RSS = Σe² = Σ(Y_i - Ŷ_i)² = Σ(Y_i - b1 - b2·X2i - b3·X3i)²
2

Testing of hypothesis, t-test
Say,  Ŷ_i = -1336.09 + 12.7413·X2i + 85.7640·X3i
             (175.2725)  (0.9123)     (8.8019)
       p =    0.000       0.000        0.000
R² = 0.89, n = 32

H0: B1 = 0,    b1/se(b1) ~ t(n-3)
H0: B2 = 0,    b2/se(b2) ~ t(n-3)
H0: B3 = B3*,  (b3 - B3*)/se(b3) ~ t(n-3)



Testing Joint Hypothesis, F Test
H0: B2 = B3 = 0, or equivalently H0: R² = 0

X2 & X3 explain zero percent of the variation of Y.

H1: at least one B ≠ 0.

A test of either hypothesis is called a test of the overall significance of the estimated multiple regression.
We know TSS = ESS + RSS, and the test statistic is
F = [ESS/(k - 1)] / [RSS/(n - k)] = [R²/(k - 1)] / [(1 - R²)/(n - k)]
where k is the number of estimated parameters.
F test
If the computed F value exceeds the critical F value, we reject the null hypothesis that the impact of the explanatory variables is simultaneously equal to zero.

Otherwise we cannot reject the null hypothesis.

It may happen that not all the explanatory variables individually have much impact on the dependent variable (i.e., some of the t values may be statistically insignificant), yet all of them collectively influence the dependent variable (H0 is rejected in the F test).

This typically happens when we have the problem of multicollinearity; see the sketch below.
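A small sketch computes the overall F statistic directly from R², using the three-variable example above (R² = 0.89, n = 32, k = 3):

```python
R2, n, k = 0.89, 32, 3
F = (R2 / (k - 1)) / ((1 - R2) / (n - k))
print(F)  # roughly 117, far above the 5% critical F(2, 29) of about 3.33
```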
Specification error
In this example we have seen that both explanatory variables are, individually and collectively, significantly different from zero.

If we omit either of these explanatory variables from our model, there will be a specification error.

What would b1, b2 & R² be in the 2-variable models?
Specification error

Ŷ_i = -1336.09 + 12.7413·X2i + 85.7640·X3i
       (175.2725)  (0.9123)     (8.8019)
  p =   0.000       0.000        0.000
R² = 0.89, n = 32

Ŷ_i = -191.66 + 10.48·X2i
       (264.43)   (1.79)
R² = 0.53

Ŷ_i = 807.95 + 54.57·X3i
       (231.95)  (23.57)
R² = 0.15
R² versus Adjusted R²

The larger the number of explanatory variables in the model, the higher the R² will be.

However, R² does not take the degrees of freedom into account.

Therefore, comparing the R² values of two models with the same dependent variable but different numbers of explanatory variables is essentially like comparing apples and bananas.

We need a measure of fit that is adjusted for the number of explanatory variables in the model.
R² versus Adjusted R²
Such a measure is called the adjusted R²:

Adj R² = 1 - (1 - R²) · (n - 1)/(n - k)

If k > 1, Adj R² ≤ R²; as the number of explanatory variables increases in the model, Adj R² becomes increasingly smaller than R².

It enables us to compare two models that have the same dependent variable but different numbers of independent variables.

In our example, it can be shown that Adj R² = 0.88 < 0.89 (R²).
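A one-line check of the figure quoted above (R² = 0.89, n = 32, k = 3):

```python
R2, n, k = 0.89, 32, 3
adj_R2 = 1 - (1 - R2) * (n - 1) / (n - k)
print(adj_R2)  # about 0.88, slightly below R-squared = 0.89
```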

When to add an additional variable?


We are often faced with the problem of deciding among several competing explanatory variables.

Common practice is to add variables as long as Adj R² increases, even though its numerical value may be smaller than R².

Computer output & Reporting
The Chicken Consumption
Example

Explain US Consumption of Chicken

Time Series Observations - 1950-1984

Variable Definitions
CHCONS - Chicken consumption in the
US

LDY - Log of disposable income in the
US

PC/PB - Price of Chicken relative to the
Price of Best Red Meat
Data Time plots
Actual plots of the data over time follow.
Note the trends and cycles
What are the relationships between
the variables?
Are movements in CHCONS related to
movements in LDY and PC/PB?
[Figure: time plot of CHCONS (chicken consumption), actual data, 1950-1984.]

[Figure: time plot of LDY, actual data, 1950-1984.]

[Figure: time plot of PC/PB, actual data, 1950-1984.]
Chicken Consumption vs.
Income
There may be a relationship between
CHCONS and LDY

A simple plot of the two variables
seems to reveal this

Note the positive relationship
[Figure: scatter plot of CHCONS vs. LDY.]
Chicken Consumption vs.
Relative Price of Chicken
There may also be a relationship
between CHCONS and PC/PB

A plot of these two variables shows
the relationship

Note the negative relationship
[Figure: scatter plot of CHCONS vs. PC/PB.]
CHCONS = f(LDY)
Simple linear regression captures the
relationship between CHCONS and
LDY, assuming no other relationships

This regression explains much of the
change in CHCONS, but not everything

The plotted regression line shows the
hypothesized relationship and the
actual data
CHCONS = f(LDY)

          LDY     Const.
Coeff     15.86   -92.17
SE(b)     0.53    4.34

R² = 0.9641   SE(y) = 2.03
F = 879.05    df = 33
SSReg = 3639.12 (also called SSE)   SSResid = 136.61 (also called SSR)
[Figure: regression line CHCONS = f(LDY) plotted with the actual data.]
CHCONS = f(PC/PB)
Another simple regression examines the
relationship between CHCONS and
PC/PB

While the line explains some of the
variation of CHCONS, there is more
unexplained error
CHCONS = f(PC/PB)

          PC/PB    Const.
Coeff     -28.83   50.77
SE(b)     2.93     1.75

R² = 0.746   SE(y) = 5.39
F = 97.14    df = 33
SSReg = 2818.32 (also called ESS)   SSResid = 957.42 (also called RSS)
[Figure: regression line CHCONS = f(PC/PB) plotted with the actual data.]
CHCONS = f(LDY, PC/PB)

          LDY     PC/PB    Const.
Coeff     12.79   -8.08    -63.19
SE(b)     0.54    1.12     4.84

R² = 0.986   SE(y) = 1.27
F = 1149.89  df = 32
SSReg = 3723.92 (SSE)   SSResid = 51.82 (SSR)

[Figure: actual vs. predicted CHCONS over time from CHCONS = f(LDY, PC/PB).]
Table 7.8, Gujarati: US defense budget outlays, 1962-1981

Y_t = defense budget outlays for year t ($ bn)
X_2t = GNP for year t ($ bn)
X_3t = US military sales/assistance ($ bn)
X_4t = aerospace industry sales ($ bn)
X_5t = military conflicts involving troops (= 0 if troops < 100,000; = 1 if troops > 100,000)
Table 8.10, Gujarati
The table gives data used by a telephone cable manufacturer to predict sales to a major customer for the period 1968-1983.

Y = annual sales in MPF (million paired feet)
X2 = GNP ($ bn)
X3 = housing starts (thousands of units)
X4 = unemployment rate (%)
X5 = prime rate lagged 6 months
X6 = customer line gains (%) (to be introduced later)
Table 7.10, Gujarati
Consider the following demand function for money in the US for 1980-1998:

M_t = b1 · Y_t^b2 · r_t^b3 · e^(u_t)

where M = real money demand, Y = real GDP, and r = the interest rate
(LTRATE: long-term interest rate, the 30-year Treasury bond rate; TBRATE: the 3-month Treasury bill rate)
Regression Problems
Multicollinearity: two or more independent
variables are highly correlated, thus it is
difficult to separate the effect each has on the
dependent variable.

Passing the F-test as a whole, but failing the t-test for each coefficient, is a sign that multicollinearity exists.

A standard remedy is to drop one of the
closely related independent variables from the
regression
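One common diagnostic is the variance inflation factor (VIF). This is a sketch with synthetic data built to be nearly collinear; it is not output from the examples above:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x2 = rng.normal(size=100)
x3 = 3 + 2 * x2 + rng.normal(scale=0.1, size=100)  # nearly collinear with x2
X = sm.add_constant(pd.DataFrame({"X2": x2, "X3": x3}))

for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))
# Rule of thumb: a VIF above about 10 signals serious collinearity.
```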


Collinearity
[Figure: Venn diagrams of Y, X2 and X3, showing increasing overlap between X2 and X3 as collinearity rises.]
Table 10.7, Gujarati
Y = number of people employed (in 000)
X1 = GNP implicit price deflator
X2 = nominal GNP (million $)
X3 = number of people unemployed (in 000)
X4 = number of people in the armed forces
X5 = non-institutionalized population over 14 years
X6 = year (= 1 for 1947, 2 for 1948, ..., 16 for 1962)

Exercises: regress and explain the results; regress over a shorter time span; examine pairwise correlations; regress among the Xs; use real GNP = (X2/X1); drop X6, as X5 & X6 are correlated; drop X3.

Ex: petrol demand = f[(cars + two-wheelers), teledensity, price]
Regression Problem: Autocorrelation (Lebanon example)
Errors are correlated; typically observed in time-series data.
Say the output of a farm is regressed on capital and labor using quarterly data, and there is a labor strike in a particular quarter affecting output only in that quarter: no autocorrelation.
But if the strike affects output in subsequent quarters as well: autocorrelation.



Reasons
Inertia or sluggishness.
Agricultural commodities: Supply_t = B1 + B2·P_(t-1) + u_t. Say at the end of period t, P_t turns out to be lower than P_(t-1); the farmer may then decide to produce less in period (t + 1) than in t.
Data manipulation: converting monthly to quarterly data by averaging damps the fluctuations of the monthly data, and this smoothness leads to a systematic pattern in the disturbances.


Reasons
Model specification errors: omitting relevant variable(s).
Ex: the true model is Y_t = b1 + b2·X_2t + b3·X_3t + b4·X_4t + u_t, but we run the regression Y_t = b1 + b2·X_2t + b3·X_3t + v_t, where v_t = b4·X_4t + u_t.
If we plot v, it will not be random but will exhibit a systematic pattern, creating (false) autocorrelation.
Lags: consumption_t = b1 + b2·income_t + b3·consumption_(t-1) + u_t

Autocorrelation
[Figure: residuals u_t plotted against time, exhibiting a systematic pattern.]
Consequences of autocorrelation
Least squares estimates are linear and unbiased, but they no longer have the minimum-variance property.

t & F statistics are not reliable.

R² may be an unreliable measure of the true R².
Detection
Plot the OLS residuals against time: with no autocorrelation the errors are randomly distributed; in the presence of autocorrelation the errors exhibit a distinct pattern.
DW (Durbin-Watson) statistic, based on the estimated residuals; a sketch follows below.
Others.

To correct autocorrelation consider:
transforming the data into a different order of magnitude;
introducing lagged data.

Ex: carbon, petro-diesel
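A minimal sketch of the DW statistic computed from OLS residuals (statsmodels also provides statsmodels.stats.stattools.durbin_watson); the residuals shown are illustrative only:

```python
import numpy as np

def durbin_watson(e):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).
    Near 2: no autocorrelation; toward 0: positive; toward 4: negative."""
    e = np.asarray(e, dtype=float)
    return (np.diff(e) ** 2).sum() / (e ** 2).sum()

print(durbin_watson([1.2, -0.8, 0.5, -0.3, 0.9]))  # illustrative residuals
```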

Forecasting
Famous forecasting quotes
"I have seen the future and it is very much like the
present, only longer." - Kehlog Albran, The Profit

This nugget of pseudo-philosophy is actually a concise
description of statistical forecasting. We search for statistical
properties of a time series that are constant in time - levels,
trends, seasonal patterns, correlations and autocorrelations, etc.
We then predict that those properties will describe the future as
well as the present.

"Prediction is very difficult, especially if it's about the
future." Niels Bohr, Nobel laureate in Physics

This quote serves as a warning of the importance of validating a forecasting model out-of-sample. It's often easy to find a model that fits the past data well (perhaps too well!) but quite another matter to find a model that correctly identifies those patterns in the past data that will continue to hold in the future.
Precise forecasting (of demand, prices, etc.) should be an integral part of the planning process.

Billions in revenue can be lost if a company forecasts too low and its inventory is sold out.

Similarly, a company can incur significant losses if forecasts are too high and excess inventory builds up.

Thus, a comprehensive knowledge of the forecasting process is extremely important for a firm's success and an industry's sustenance.

There is an array of empirical methods that
are available today for forecasting

An appropriate method is chosen based on
the availability of the data and the desired
nature of the forecasts

Long term (several years ahead)
Medium-term (quarterly to monthly)
Short-term (daily to hourly to several minutes
ahead)

Forecasting Techniques
Qualitative Analysis
Expert Opinion:
personal insight or panel consensus on future
expectations
Subjective in nature

Survey Methods
Through interview or questionnaires ask firms,
government agencies or individuals about their future
plans
Frequently supplement quantitative forecasts

Forecasting Techniques
Econometric Forecasting
Combines economic theory and statistical tools to predict future relations.
Say you have estimated the following relationship:
DD_petrol = 2.08 + 1.95·GDP - 0.87·Pr_petrol - 0.78·teldn
and you want to forecast petrol demand for 2016 using this relationship. Then
[DD_petrol]_2016 = 2.08 + 1.95·GDP_2016 - 0.87·[Pr_petrol]_2016 - 0.78·[teldn]_2016
We do, however, need to know the future values of the independent variables, as in the sketch below.
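A sketch of the plug-in forecast; the three 2016 input values below are purely hypothetical placeholders, not forecasts from the slides:

```python
# Hypothetical future values of the regressors for 2016
gdp_2016, price_2016, teldn_2016 = 135.0, 68.0, 79.0

# Plug them into the estimated demand equation above
dd_petrol_2016 = 2.08 + 1.95 * gdp_2016 - 0.87 * price_2016 - 0.78 * teldn_2016
print(dd_petrol_2016)
```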
Forecasting Techniques
Time Series Techniques

A time series is a sequence of observations taken
sequentially in time

An intrinsic feature of a time series is that,
typically adjacent observations are dependent

The nature of this dependence among
observations of a time series is of considerable
practical interest

Time Series Analysis is concerned with techniques
for the analysis of this dependence

Time series data
Secular Trend: long run pattern

Cyclical Fluctuation: expansion and
contraction of overall economy (business
cycle)

Seasonality: annual sales patterns tied to
weather, traditions, customs

Irregular or random component
[Figure (a): sales ($) over 20 years, showing the secular trend with cyclical patterns around it.]

[Figure (b): monthly sales ($), showing the long-run trend (secular plus cyclical), a seasonal pattern with peaks, and random fluctuations.]
Forecasting Techniques
Time Series Techniques
Examine the past behavior of a time series in
order to infer something about its future behavior

A sophisticated and widely used technique to
forecast the future demand

Examples
Univariate: AR, MA, ARMA, ARIMA, Exponential
Smoothing, ARIMA-GARCH etc.

Multivariate: VAR, Cointegration etc.
Ex-Post vs. Ex-Ante Forecasts
How can we compare the forecast
performance of our model?

There are two ways.

Ex Ante: Forecast into the future, wait for the
future to arrive, and then compare the actual to
the predicted

Ex Post: Fit your model over a shortened sample

Then forecast over a range of observed data
Then compare actual and predicted.
Ex-Post and Ex-Ante
Estimation & Forecast Periods
Suppose you have data covering the period 1980.Q1-2001.Q4.
[Diagram: 1980.Q1-1999.Q4 is the ex-post estimation period; 1999.Q4-2001.Q4 is the ex-post forecast period; beyond 2001.Q4, the future, is the ex-ante forecast period.]
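A minimal sketch of the ex-post exercise with synthetic data (88 quarters standing in for 1980.Q1-2001.Q4; the series and split point are illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.arange(88, dtype=float)        # 88 quarters: 1980.Q1-2001.Q4
y = 5 + 0.3 * x + rng.normal(size=88) # synthetic series for illustration

split = 80                            # hold back the last two years
fit = sm.OLS(y[:split], sm.add_constant(x[:split])).fit()
pred = fit.predict(sm.add_constant(x[split:]))  # ex-post forecasts
print(y[split:] - pred)               # compare actual vs. predicted
```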
Examining the In-Sample Fit
One thing that can be done, once you have fit your model, is to examine the in-sample fit.
That is, over the period of estimation, you can compare the actual to the fitted data.
It can help to identify areas where your model is consistently under- or over-predicting, so that you can take appropriate measures.
Simply estimate the equation and look at the residuals.
Model Performance
RMSE = √[(1/n)·Σ(f_i - x_i)²], the root of the mean squared difference between forecast and actual: smaller is better.
MAE & MAPE: smaller is better.
The Theil inequality coefficient always lies between zero and one, where zero indicates a perfect fit.
Bias portion: should be zero. How far is the mean of the forecast from the mean of the actual series?


Model Performance
Variance portion: should be zero. How far is the variance of the forecast from the variance of the actual series?

Covariance portion: should be one. What portion of the forecast error is unsystematic (not predictable)?

If your forecast is "good", the bias and variance proportions should be small, so that most of the forecast error is concentrated in the covariance proportion. A sketch of these measures follows.
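A sketch of the accuracy measures above for actual values x and forecasts f (equal-length numpy arrays); the Theil coefficient here follows the common RMSE-based form:

```python
import numpy as np

def rmse(f, x):
    return np.sqrt(np.mean((f - x) ** 2))

def mae(f, x):
    return np.mean(np.abs(f - x))

def mape(f, x):
    return np.mean(np.abs((f - x) / x)) * 100  # x must be non-zero

def theil_u(f, x):
    """Theil inequality coefficient: 0 = perfect fit, 1 = worst."""
    return rmse(f, x) / (np.sqrt(np.mean(f ** 2)) + np.sqrt(np.mean(x ** 2)))
```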
