Regression 290611

INSTITUTE OF MANAGEMENT TECHNOLOGY, GHAZIABAD
Financial Econometrics Term IV June-Aug 2011
Course Instructor: Dr. Kakali Kanjilal
Contents
1. Course Overview
2. Data & Basic Statistics
3. Data Classification
4. Analysis of Cross-section Data
5. Analysis of Qualitative Cross-section Data
6. Analysis of Time Series Data
7. Analysis of Cross-section and Time Series Data
Steps involved in econometric analysis

1. Business problems
   1. Translate into a few hypotheses
   2. Identify parameters/attributes
   3. Inquire on the information/data
   4. Identify the most suitable model/analytical tool
2. Estimate the model
3. Check the model
   1. Is the estimation correct? (Specification tests & diagnostic checks)
   2. Is the model adequate?
   No: return to the earlier steps. Yes: continue.
4. Tests of the hypotheses
5. Leverage the model for predictions & strategic advice

Source: Based on Maddala 2001
Data Handling by SAS

1. How to start SAS?
2. How do we get access to/bring in the data? (Import data file)
3. How do we know the contents of the data? (Proc Contents)
4. How can we see the data layout? (Proc Print)
5. How do we make sure the data is in the desired form? (Proc Means, Proc Univariate, Proc Freq)
6. Do we have all the information that is required?
   1. Drop/Keep
   2. Add/Create variables
   3. Bring/Invoke additional data
   4. Append
   5. Merge/Join
Regression Analysis
What is Regression?
1. In everyday English it means drop, falling off, deterioration, etc.
2. F. Galton introduced the word "regression": the tendency to move (regress) towards the average.
3. In today's world, it means the statistical relationship of a dependent variable on independent variable(s) having fixed values. It predicts/estimates the average value of the dependent variable based on the known fixed values of the independent variables.
4. It does not necessarily mean causation. For example, movements in domestic stock returns could depend on movements in the global crude oil price; however, crude price movements do not necessarily cause the changes in stock returns.
A Regression Situation
Weekly consumption expenditure and income for 60 families, where for each fixed value of X we get a different set of values for Y.
X = weekly family income (USD), Y = weekly family consumption expenditure (USD)

X     Y                             Total   Conditional mean E(Y|X)
80    55 60 65 70 75                325     65
100   65 70 74 80 85 88             462     77
120   79 84 90 94 98                445     89
140   80 93 95 103 108 113 115      707     101
160   102 107 110 116 118 125       678     113
180   110 115 120 130 135 140       750     125
200   120 136 140 144 145           685     137
220   135 137 140 152 157 160 162   1043    149
240   137 145 155 165 175 189       966     161
260   150 152 175 178 180 185 191   1211    173

Can we estimate/predict a statistical relationship of Y (dependent) for a given X (independent)?
Some other Examples
1. The demand for a product with respect to its advertising expenditure. This relationship makes it possible to analyze the % change in demand with respect to a 1% change in advertising expenditure.
2. The stock return of a firm with respect to its sales and number of employees.
3. Price responsiveness/elasticity of the demand for a product.
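The conditional means in the table can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary name `population` and the helper `conditional_means` are invented here, and the numbers are the 60 observations listed above.

```python
# Weekly family income X (USD) mapped to the expenditures Y (USD)
# observed for the families at that income level (60 families in total).
population = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}


def conditional_means(groups):
    """Return E(Y|X): the mean of Y within each fixed level of X."""
    return {x: sum(ys) / len(ys) for x, ys in groups.items()}


means = conditional_means(population)
for x, mean_y in means.items():
    print(f"X = {x:3d}  total = {sum(population[x]):4d}  E(Y|X) = {mean_y:.0f}")
```

Each conditional mean rises by 12 for every 20 of extra income, which is why the conditional means lie on a straight line.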
Population Regression
The line $E(Y \mid X_i) = f(X_i) = \beta_1 + \beta_2 X_i$ is the population regression function.
$\beta_1$ (intercept) and $\beta_2$ (slope) are parameters; they represent the behaviour of the 60 families (the population) and are not observable.
The function assumes a linear relationship.
On average, as income increases, expenditure also goes up, although individual family expenditure deviates (WHY?) with a tendency to cluster around the mean.
Deviation: $u_i = Y_i - E(Y \mid X_i)$, so that $Y_i = E(Y \mid X_i) + u_i$ has two parts:
1) Systematic/deterministic component = $E(Y \mid X_i)$
2) Error/unsystematic/random component = $u_i$
Figure: Conditional distribution of expenditure for different levels of family income (X axis: income, 60 to 260; Y axis: expenditure, 0 to 200). The line E(Y|X) = f(X_i) passes through the conditional means, with intercept $\beta_1$ and slope $\beta_2$; the distribution of Y at an income of 240 USD is marked.
Sample Regression
Sample regression line: $Y_i = b_1 + b_2 X_i + \hat{e}_i$, with fitted value $\hat{Y}_i = b_1 + b_2 X_i$. $b_1$ and $b_2$ are statistics: sample characteristics that estimate the population parameters $\beta_1$ and $\beta_2$. Because they are based on samples, the regression estimates vary from sample to sample.
How can we make sure sample estimates are a true representation of the population?

Samples from the population of 60 families
X          80   100  120  140  160  180  200  220  240  260
Y (1st)    55   88   90   80   118  120  145  135  145  175
Y (2nd)    70   65   90   95   110  115  120  140  155  150
Mean       65   77   89   101  113  125  137  149  161  173

Figure: Sample regression lines for Y (1st) and Y (2nd), plotted together with the conditional means against X (60 to 260).
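To see how much the estimates move between samples, the two samples can be fitted separately. The sketch below uses the least-squares formulas that appear later in these slides; the function name `ols_fit` and the variable names are illustrative, not part of the course material.

```python
# Two samples of 10 families drawn from the population of 60 (from the slide).
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y_FIRST = [55, 88, 90, 80, 118, 120, 145, 135, 145, 175]
Y_SECOND = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]


def ols_fit(x, y):
    """Two-variable OLS: return (b1, b2) for the line y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


b1_first, b2_first = ols_fit(X, Y_FIRST)
b1_second, b2_second = ols_fit(X, Y_SECOND)
print(f"1st sample: Y(est) = {b1_first:.4f} + {b2_first:.4f} X")
print(f"2nd sample: Y(est) = {b1_second:.4f} + {b2_second:.4f} X")
```

The two fitted lines differ in both intercept and slope even though both samples come from the same population, which is the point of the slide.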
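Two-Variable Regression Estimation

Figure: Scatter of the observations around the sample regression line (SRL) $\hat{Y}_i = b_1 + b_2 X_i$; the residuals $\hat{e}_1, \dots, \hat{e}_4$ are the vertical distances of the points from the line.
PRL: $Y_i = \beta_1 + \beta_2 X_i + u_i$; latent, but it can be estimated from the SRL.
SRL: $Y_i = \hat{Y}_i + \hat{e}_i = b_1 + b_2 X_i + \hat{e}_i$, $i = 1, 2, 3, \dots, n$
i.e. $\hat{e}_i = Y_i - \hat{Y}_i$
i.e. $\sum \hat{e}_i^2 = \sum (Y_i - \hat{Y}_i)^2$
Estimate $b_1$ and $b_2$ so that $\sum \hat{e}_i^2$ (Residual Sum of Squares/RSS) is minimum.
$\partial \sum \hat{e}_i^2 / \partial b_1 = 0$ and $\partial \sum \hat{e}_i^2 / \partial b_2 = 0$ give the Ordinary Least Squares (OLS) estimates of $\beta_1$ and $\beta_2$.
$\beta_1$ and $\beta_2$ can also be estimated by the Maximum Likelihood Estimator (MLE). Under the normality assumption on the errors, MLE and OLS give the same estimates of $\beta_1$ and $\beta_2$.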
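The two first-order conditions can be written out as follows. This is the standard textbook derivation, sketched in the notation of this slide; RSS is treated as a function of $b_1$ and $b_2$.

```latex
\begin{aligned}
\mathrm{RSS}(b_1, b_2) &= \sum_{i=1}^{n} \hat{e}_i^2 = \sum_{i=1}^{n} (Y_i - b_1 - b_2 X_i)^2 \\
\frac{\partial\,\mathrm{RSS}}{\partial b_1} &= -2 \sum (Y_i - b_1 - b_2 X_i) = 0
  \;\Rightarrow\; \sum Y_i = n b_1 + b_2 \sum X_i \\
\frac{\partial\,\mathrm{RSS}}{\partial b_2} &= -2 \sum X_i (Y_i - b_1 - b_2 X_i) = 0
  \;\Rightarrow\; \sum X_i Y_i = b_1 \sum X_i + b_2 \sum X_i^2 \\
b_1 &= \bar{Y} - b_2 \bar{X}, \qquad
b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
\end{aligned}
```

The first condition also implies $\sum \hat{e}_i = 0$, so the fitted line passes through the point of sample means $(\bar{X}, \bar{Y})$.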
Assumptions of OLS estimation

1. The relationship between X & Y is linear; this implies linearity in the parameters, not necessarily in the variables. For example, $Y = \beta_1 + \beta_2 X^2$, $Y = \beta_1 + \beta_2 \ln(X)$ and $\ln(Y) = \beta_1 + \beta_2 \ln(X)$ are linear in the parameters, but NOT $Y = \beta_1 + \beta_2^2 X$.
2. The X's are non-stochastic variables whose values are fixed.
3. The error has zero expected value: $E(u) = 0$.
4. The error term has constant variance: $E(u^2) = \sigma^2$; homoscedastic.
5. Errors are statistically independent. Thus, $E(u_i u_j) = 0$ for all $i \neq j$; no autocorrelation.
6. The error term is normally distributed: $u \sim N(0, \sigma^2)$ and $Y \sim N(\beta_1 + \beta_2 X, \sigma^2)$.
7. $\sum u_i X_i = 0$; u & X are uncorrelated.
8. The X's are uncorrelated with each other; applicable in case of multiple regression (no multicollinearity).
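OLS Estimates & its properties

OLS estimates
$b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \frac{\sum x_i y_i}{\sum x_i^2}$, where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$
$b_1 = \bar{Y} - b_2 \bar{X}$
Variance of $b_1$, $b_2$ & $e_i$
$\mathrm{Var}(b_2) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$
$\mathrm{Var}(b_1) = \frac{\sigma^2 \sum X_i^2}{n \sum (X_i - \bar{X})^2}$
$\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{n - 2}$; $(n - 2)$ is the degrees of freedom (dof), the number of independent observations, as we lose 2 dof in computing $b_1$ and $b_2$ to estimate $Y_i$.
$\mathrm{Cov}(b_1, b_2) = -\bar{X}\,\mathrm{Var}(b_2) = -\bar{X} \left( \frac{\sigma^2}{\sum (X_i - \bar{X})^2} \right)$
By the assumption of normality in Y and u, $b_1$ and $b_2$ also follow normal distributions:
$b_1 \sim N(\beta_1, \mathrm{Var}(b_1))$, with $E(b_1) = \beta_1$; $Z = (b_1 - \beta_1) / \sqrt{\mathrm{Var}(b_1)} \sim N(0, 1)$
$b_2 \sim N(\beta_2, \mathrm{Var}(b_2))$, with $E(b_2) = \beta_2$; $Z = (b_2 - \beta_2) / \sqrt{\mathrm{Var}(b_2)} \sim N(0, 1)$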
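The formulas above can be applied directly to the second sample of 10 families used in the example slides. The following is a minimal sketch, assuming the true $\sigma^2$ is replaced by its estimate $\hat{\sigma}^2$; the function name `ols_with_variances` and the dictionary keys are invented for illustration.

```python
import math

# Second sample from the slides: X = weekly income, Y = weekly expenditure (USD).
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]


def ols_with_variances(x, y):
    """Two-variable OLS estimates with estimated variances and covariance."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)  # two dof are used up by b1 and b2
    var_b2 = sigma2_hat / sxx
    var_b1 = sigma2_hat * sum(xi ** 2 for xi in x) / (n * sxx)
    return {
        "b1": b1,
        "b2": b2,
        "sigma2_hat": sigma2_hat,
        "var_b1": var_b1,
        "se_b1": math.sqrt(var_b1),
        "var_b2": var_b2,
        "se_b2": math.sqrt(var_b2),
        "cov_b1_b2": -x_bar * var_b2,
    }


result = ols_with_variances(X, Y)
for name, value in result.items():
    print(f"{name:>10s} = {value:.6f}")
```

The printed values can be compared with the regression results and the E-views output shown in the example slides.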
Goodness of Fit of estimated model

$r^2$ (coefficient of determination, in the two-variable/bivariate case) and $R^2$ (with more than two variables, i.e. multiple regression) measure how closely the sample regression line fits the data.
Figure: Venn diagrams of the variation in Y and X; $r^2$ is the overlapping portion. Panels: $r^2 = 0$ (no overlap), $r^2$ between 0 and 1, $r^2$ close to 1, $r^2 = 1$ (Y = X, complete overlap).
$r^2$ lies between 0 and 1. The higher the $r^2$, the better the estimated model $\hat{Y}$ fits the data.
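Goodness of Fit: Decompose Variation

Total variation = Estimated/Explained part + Error part
Figure: For one observation $Y_i$ at $X_i$, the total deviation $(Y_i - \bar{Y})$ is split into the part due to regression $(\hat{Y}_i - \bar{Y})$ and the part due to error $\hat{e}_i = Y_i - \hat{Y}_i$.
$(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$
or, TSS = ESS + RSS
or, $\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$
or, $r^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{\sum \hat{e}_i^2}{\sum (Y_i - \bar{Y})^2}$
i.e. $r^2$ is the % of total variation in Y explained by the estimated regression model.
or, $r^2$ can be written as $r^2 = \frac{\left( \sum (X_i - \bar{X})(Y_i - \bar{Y}) \right)^2}{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2} = \frac{\mathrm{Cov}(X, Y)^2}{\mathrm{Var}(X)\,\mathrm{Var}(Y)}$
If the regression equation explains none of the variation of $Y_i$ (i.e. no relationship between X & Y), $r^2$ will be zero.
If the equation explains all the variation, $r^2$ will be one.
A low $r^2$ would be indicative of a rather poor fit.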
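The decomposition can be verified numerically on the second sample of 10 families from the example slides. This is an illustrative sketch with invented variable names; it computes $r^2$ in the three equivalent ways given above.

```python
# Second sample from the slides: X = weekly income, Y = weekly expenditure (USD).
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
syy = sum((yi - y_bar) ** 2 for yi in Y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))

# OLS estimates and fitted values.
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar
fitted = [b1 + b2 * xi for xi in X]

tss = syy                                             # total sum of squares
ess = sum((fi - y_bar) ** 2 for fi in fitted)         # explained by regression
rss = sum((yi - fi) ** 2 for yi, fi in zip(Y, fitted))  # residual

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss
r2_from_cov = sxy ** 2 / (sxx * syy)  # squared sample correlation of X and Y

print(f"TSS = {tss:.4f}, ESS = {ess:.4f}, RSS = {rss:.4f}")
print(f"r2 = {r2_from_ess:.6f} = {r2_from_rss:.6f} = {r2_from_cov:.6f}")
```

All three expressions agree, and the square root of $r^2$ is the correlation coefficient between X and Y for this sample.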
Example: Actual Plots

Data on weekly family consumption expenditure and weekly family income
Y = Weekly family consumption expenditure (USD)
X = Weekly family income (USD)

Y   70   65   90   95   110  115  120  140  155  150
X   80   100  120  140  160  180  200  220  240  260

Corr = 0.98
Figure: Scatter plot of family consumption expenditure (Y, 0 to 180) against weekly family income (X, 80 to 260).
The plot shows a positive relationship, so the coefficient of family income is expected to have a positive sign.
Regression Results
Y = Weekly family consumption expenditure (USD), X = Weekly family income (USD)

OLS estimates
b1                          24.45
b2                          0.51
Var(b1)                     41.14
SE(b1)                      6.41
Var(b2)                     0.0013
SE(b2)                      0.04
Estimated sigma^2           42.16

Goodness of fit and significance test (null hypothesis: beta2 = 0)
r^2                         0.9621
r                           0.9808
t = b2/SE(b2)               14.243
t = b1/SE(b1)               3.813
Degrees of freedom (n - 2)  8
Level of significance       0.05 (5%)
Critical t value            2.306

Both t statistics exceed the critical value of 2.306, so both coefficients are statistically significant at the 5% level.
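The significance test can be sketched as below. The coefficients and standard errors are the unrounded figures from the E-views output later in these slides, and 2.306 is the critical value quoted on this slide; the function `t_test` and its return layout are assumptions made for illustration.

```python
# Coefficient and standard error pairs, as reported in the E-views output.
estimates = {
    "b1": (24.45455, 6.413817),
    "b2": (0.509091, 0.035743),
}

# Two-tailed 5% critical value of t with 8 dof, as quoted on the slide.
T_CRITICAL = 2.306


def t_test(coef, se, hypothesised=0.0, critical=T_CRITICAL):
    """Return (t statistic, reject H0?, confidence interval) for one coefficient."""
    t_stat = (coef - hypothesised) / se
    interval = (coef - critical * se, coef + critical * se)
    return t_stat, abs(t_stat) > critical, interval


results = {name: t_test(coef, se) for name, (coef, se) in estimates.items()}
for name, (t_stat, reject, (low, high)) in results.items():
    print(f"{name}: t = {t_stat:.3f}, reject H0 = {reject}, "
          f"95% CI = ({low:.4f}, {high:.4f})")
```

The 95% interval for the slope does not contain zero, which is the same conclusion as the t test.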
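Actual vs Predicted Expenditure

Figure: Actual ($Y_i$) and predicted ($\hat{Y}_i$) expenditure plotted against weekly family income (X, 80 to 260; Y, 0 to 180). Estimated line: $\hat{Y} = 24.45 + 0.51 X$, with the sample means $\bar{X} = 170$ and $\bar{Y} = 111$ marked.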
Example: Software application: E-views

Dependent Variable: Y (Consumption Expenditure)
Method: Least Squares
Date: 05/20/11   Time: 15:01
Sample: 1 10
Included observations: 10

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             24.45455      6.413817     3.812791      0.0051
X/Income      0.509091      0.035743     14.24317      0.0000

R-squared            0.962062    Mean dependent var      111.0000
Adjusted R-squared   0.957319    S.D. dependent var      31.42893
S.E. of regression   6.493003    Akaike info criterion   6.756184
Sum squared resid    337.2727    Schwarz criterion       6.816701
Log likelihood       -31.78092   Hannan-Quinn criter.    6.689797
F-statistic          202.8679    Durbin-Watson stat      2.680127
Prob(F-statistic)    0.000001
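The summary statistics in this output can be reproduced from the ten observations. The sketch below assumes a Gaussian log likelihood and per-observation information criteria (each divided by n), which is the form that matches the reported figures; all variable names are illustrative.

```python
import math

# Second sample from the slides: X = weekly income, Y = weekly expenditure (USD).
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n, k = len(X), 2  # k = number of estimated coefficients (intercept and slope)

x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar

residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(X, Y)]
rss = sum(e ** 2 for e in residuals)
tss = sum((yi - y_bar) ** 2 for yi in Y)

r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
se_regression = math.sqrt(rss / (n - k))
f_stat = (tss - rss) / (k - 1) / (rss / (n - k))

# Gaussian log likelihood evaluated at the OLS estimates.
log_lik = -n / 2 * (1 + math.log(2 * math.pi) + math.log(rss / n))
aic = -2 * log_lik / n + 2 * k / n
schwarz = -2 * log_lik / n + k * math.log(n) / n
hannan_quinn = -2 * log_lik / n + 2 * k * math.log(math.log(n)) / n

# Durbin-Watson: squared successive residual differences over RSS.
dw = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n)) / rss

print(f"R2 = {r2:.6f}, adj R2 = {adj_r2:.6f}, SER = {se_regression:.6f}")
print(f"F = {f_stat:.4f}, logL = {log_lik:.5f}, DW = {dw:.6f}")
print(f"AIC = {aic:.6f}, SC = {schwarz:.6f}, HQ = {hannan_quinn:.6f}")
```

Other packages often report the information criteria without dividing by n, so the numbers may differ in scale from one tool to another while ranking models the same way.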
Example: Software application: SAS
Business Problem:
1. Understand the factors driving the performance of pharma companies
2. Predict their performance in future
