IMT
Contents
1. Course Overview
2. Data & Basic Statistics
3. Data Classification
4. Analysis of Cross-section Data
5. Analysis of Qualitative Cross-section Data
6. Analysis of Time Series Data
7. Analysis of Cross-section and Time Series Data
1. Translate the problem into a few hypotheses
2. Identify parameters/attributes
3. Inquire about the information/data
4. Identify the most suitable model/analytical tool
Then check: Is the estimation correct? (Specification tests & diagnostic checks) Is the model adequate?
No → revisit the steps above; Yes → proceed.
Regression Analysis
What is Regression?
1. In everyday English, it means drop, falling off, deterioration, etc.
2. F. Galton introduced the word "regression": a tendency to move (regress) towards the average.
3. Today, it means the statistical dependence of a dependent variable on one or more independent variables with fixed values. It predicts/estimates the average value of the dependent variable from the known, fixed values of the independent variables.
4. It does not necessarily mean causation. For example, movements in domestic stock returns could depend statistically on movements in the global crude oil price; however, this does not mean that crude price movements cause the changes in stock returns.
A Regression Situation
Weekly consumption expenditure & income for 60 families, where for fixed values of X we get different sets of values for Y.
Weekly family income X ($) and weekly family consumption expenditure Y ($), with the total and mean of Y for each income level:
X = 80: Y = 55, 60, 65, 70, 75 (total 325, mean 65)
X = 100: Y = 65, 70, 74, 80, 85, 88 (total 462, mean 77)
X = 120: Y = 79, 84, 90, 94, 98 (total 445, mean 89)
X = 140: Y = 7 families, individual values not recovered (total 707, mean 101)
X = 160: Y = 102, 107, 110, 116, 118, 125 (total 678, mean 113)
X = 180: Y = 110, 115, 120, 130, 135, 140 (total 750, mean 125)
X = 200: Y = 120, 136, 140, 144, 145 (total 685, mean 137)
X = 220: Y = 135, 137, 140, 152, 157, 160, 162 (total 1043, mean 149)
X = 240: Y = 137, 145, 155, 165, 175, 189 (total 966, mean 161)
X = 260: Y = 150, 152, 175, 178, 180, 185, 191 (total 1211, mean 173)
Can we estimate/predict a statistical relationship of Y (dependent) for given X (independent)?
Some Other Examples
1. The demand for a product with respect to its advertising expenditure. This relationship can be used to analyze the % change in demand for a 1% change in advertising expenditure.
2. The stock return of a firm with respect to its sales and number of employees.
3. Price responsiveness/elasticity of the demand for a product.
Population Regression
The line $E(Y \mid X_i) = f(X_i) = \beta_1 + \beta_2 X_i$ is a population regression function (PRF). $\beta_1$ and $\beta_2$ are parameters that represent the behaviour of the 60 families (the population); they are not observable. The PRF assumes a linear relationship. On average, as income increases, expenditure also goes up, although individual family expenditure deviates (why?) from the conditional mean, with a tendency to cluster around it.
Deviation: $u_i = Y_i - E(Y \mid X_i)$, so $Y_i = E(Y \mid X_i) + u_i$ has two components:
1) Systematic/deterministic component: $E(Y \mid X_i)$
2) Error/unsystematic/random component: $u_i$
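Because the table reports the conditional mean of Y for every income level, the PRF can be read off those means directly. A minimal Python sketch, using only the ten means from the table; the helper name `line_through_means` is hypothetical:

```python
# Conditional means E(Y | X) from the table of 60 families
# (weekly income X in $, mean weekly consumption expenditure in $).
incomes = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
cond_means = [65, 77, 89, 101, 113, 125, 137, 149, 161, 173]

def line_through_means(xs, ys):
    """Least-squares line through the points (X_i, E(Y | X_i))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta2 = sxy / sxx
    beta1 = y_bar - beta2 * x_bar
    return beta1, beta2

beta1, beta2 = line_through_means(incomes, cond_means)
print(f"E(Y|X) = {beta1:.2f} + {beta2:.2f} X")  # prints E(Y|X) = 17.00 + 0.60 X

# Largest distance of any conditional mean from the fitted line.
max_gap = max(abs(m - (beta1 + beta2 * x)) for x, m in zip(incomes, cond_means))
print(f"largest deviation of a conditional mean from the line: {max_gap:.2e}")
```

In this example the conditional means rise by exactly 12 for every 20 increase in income, so all of them lie on the line $E(Y \mid X) = 17 + 0.6X$; individual families still scatter around it.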
[Figure: population regression line $E(Y \mid X) = \beta_1 + \beta_2 X$, with $\beta_1$ = intercept and $\beta_2$ = slope]
Sample Regression
Sample regression line: $\hat Y_i = b_1 + b_2 X_i$, so that $Y_i = b_1 + b_2 X_i + e_i$. Here $b_1$ and $b_2$ are statistics: sample characteristics that estimate the population parameters $\beta_1$ and $\beta_2$. Because they are based on samples, the regression estimates vary from sample to sample.
How can we make sure sample estimates are true representation of population?
[Figure: two sample regression lines, Y (1st sample) and Y (2nd sample), plotted with the line of conditional means against weekly income X from 80 to 260]
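The point that estimates vary across samples can be illustrated with a small simulation. This sketch assumes the population is the line through the table's conditional means, $E(Y \mid X) = 17 + 0.6X$, and assumes normal errors with standard deviation 10 (an illustrative value, not from the slides). It draws two samples of one family per income level and fits OLS to each; `ols` and `draw_sample` are hypothetical helper names.

```python
import random

# Assumed population for illustration: E(Y|X) = 17 + 0.6 X, u ~ N(0, 10^2).
BETA1, BETA2, SIGMA = 17.0, 0.6, 10.0
incomes = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]

def ols(xs, ys):
    """Two-variable OLS: returns (b1, b2)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

def draw_sample(rng):
    """One family per income level: Y_i = BETA1 + BETA2 * X_i + u_i."""
    return [BETA1 + BETA2 * x + rng.gauss(0.0, SIGMA) for x in incomes]

rng = random.Random(42)  # fixed seed so the run is reproducible
estimates = [ols(incomes, draw_sample(rng)) for _ in range(2)]
for k, (b1, b2) in enumerate(estimates, start=1):
    print(f"sample {k}: b1 = {b1:.2f}, b2 = {b2:.3f}")
```

Each run with a different seed gives a different pair of lines; neither coincides exactly with the population line, which is why the sampling properties of $b_1$ and $b_2$ matter.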
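[Figure: sample observations scattered around the sample regression line, with residuals $e_1, e_2, e_3, e_4$ marked as vertical distances]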
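PRL: $Y_i = \beta_1 + \beta_2 X_i + u_i$; latent, but can be estimated from the SRL.
SRL: $Y_i = b_1 + b_2 X_i + e_i$, $i = 1, 2, \ldots, n$, with $\hat Y_i = b_1 + b_2 X_i$; i.e. $e_i = Y_i - \hat Y_i$ and $\sum e_i^2 = \sum (Y_i - \hat Y_i)^2 = \sum (Y_i - b_1 - b_2 X_i)^2$.
Estimate $b_1$ and $b_2$ so that $\sum e_i^2$ (the residual sum of squares, RSS) is minimum: $\partial \sum e_i^2 / \partial b_1 = 0$ and $\partial \sum e_i^2 / \partial b_2 = 0$ give the Ordinary Least Squares (OLS) estimates of $\beta_1$ and $\beta_2$.
$\beta_1$ and $\beta_2$ can also be estimated by Maximum Likelihood (MLE). Under the normality assumption, MLE gives the same estimates of $\beta_1$ and $\beta_2$ as OLS (although the ML estimator of $\sigma^2$ divides RSS by $n$ rather than $n - 2$).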
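Assumptions of the classical linear regression model:
1. The model is linear in the parameters (not necessarily in the variables): e.g. $Y = \beta_1 + \beta_2 X^2$, $Y = \beta_1 + \beta_2 \ln(X)$ and $\ln(Y) = \beta_1 + \beta_2 X$ are linear in the parameters, but NOT $Y = \beta_1 + \beta_2^2 X$ or $Y = \beta_1 + \beta_2^3 X$.
2. The error has zero expected value: $E(u_i) = 0$.
3. The error term has constant variance: $E(u_i^2) = \sigma^2$ (homoscedastic).
4. Errors are statistically independent: $E(u_i u_j) = 0$ for all $i \neq j$ (no autocorrelation).
5. The error term and the X's are uncorrelated: $E(u_i X_i) = 0$.
6. The error term is normally distributed: $u_i \sim N(0, \sigma^2)$; hence $Y_i \sim N(\beta_1 + \beta_2 X_i, \sigma^2)$.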
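$Var(b_2) = \sigma^2 / \sum (X_i - \bar X)^2$
$Var(b_1) = \sigma^2 \sum X_i^2 / \left(n \sum (X_i - \bar X)^2\right)$
where $\sigma^2$ is estimated by $\hat\sigma^2 = \sum e_i^2 / (n - 2)$; $(n - 2)$ is called the degrees of freedom (d.o.f.), the number of independent observations, as we lose 2 d.o.f. computing $b_1$ and $b_2$ to estimate $\hat Y_i$.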
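The variance formulas can be checked numerically. This Python sketch assumes the same ten-observation textbook sample as the consumption-income example (not listed on the slides); with it, the formulas reproduce the deck's reported $\hat\sigma^2 \approx 42.16$, $Var(b_1) \approx 41.14$ and $Var(b_2) \approx 0.0013$.

```python
import math

X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]  # assumed sample

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)  # sum of (X_i - X_bar)^2
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx
b1 = y_bar - b2 * x_bar
rss = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))

sigma2_hat = rss / (n - 2)                              # n - 2 degrees of freedom
var_b2 = sigma2_hat / sxx                               # Var(b2)
var_b1 = sigma2_hat * sum(x * x for x in X) / (n * sxx)  # Var(b1)

print(f"sigma^2_hat = {sigma2_hat:.2f}")
print(f"Var(b1) = {var_b1:.2f}, SE(b1) = {math.sqrt(var_b1):.2f}")
print(f"Var(b2) = {var_b2:.4f}, SE(b2) = {math.sqrt(var_b2):.4f}")
```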
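By the normality assumption on Y and u, $b_1$ and $b_2$ also follow normal distributions:
$b_1 \sim N(\beta_1, Var(b_1))$, so $Z = (b_1 - \beta_1) / \sqrt{Var(b_1)} \sim N(0, 1)$
$b_2 \sim N(\beta_2, Var(b_2))$, so $Z = (b_2 - \beta_2) / \sqrt{Var(b_2)} \sim N(0, 1)$
When $\sigma^2$ is replaced by $\hat\sigma^2$, $t = (b_k - \beta_k) / SE(b_k)$ follows a t distribution with $n - 2$ d.o.f.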
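A Monte Carlo sketch makes these sampling results concrete: with X held fixed and normal errors, the $b_2$ values from many samples should average about $\beta_2$, have variance about $\sigma^2 / \sum (X_i - \bar X)^2$, and about 95% of the standardized $Z$ values should fall within $\pm 1.96$. The population values ($\beta_1 = 17$, $\beta_2 = 0.6$, from the line through the table's conditional means) and $\sigma = 10$ are assumptions for illustration; `slope` is a hypothetical helper.

```python
import math
import random
import statistics

# Assumed population for the experiment: E(Y|X) = 17 + 0.6 X with u ~ N(0, 10^2).
BETA1, BETA2, SIGMA = 17.0, 0.6, 10.0
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]  # fixed regressor values
x_bar = sum(X) / len(X)
sxx = sum((x - x_bar) ** 2 for x in X)

def slope(ys):
    """OLS slope b2 for the fixed X values above."""
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(X, ys)) / sxx

rng = random.Random(0)  # fixed seed for reproducibility
draws = []
for _ in range(20000):
    ys = [BETA1 + BETA2 * x + rng.gauss(0.0, SIGMA) for x in X]
    draws.append(slope(ys))

sd_b2 = math.sqrt(SIGMA ** 2 / sxx)  # theoretical sqrt(Var(b2))
share = sum(abs((b - BETA2) / sd_b2) < 1.96 for b in draws) / len(draws)

print(f"mean of b2 draws:     {statistics.fmean(draws):.4f} (beta2 = {BETA2})")
print(f"variance of b2 draws: {statistics.variance(draws):.6f}")
print(f"sigma^2 / sum(x^2):   {SIGMA ** 2 / sxx:.6f}")
print(f"share of |Z| < 1.96:  {share:.3f} (about 0.95 expected)")
```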
$r^2$ (coefficient of determination, for two-variable/bivariate regression) and $R^2$ (for regressions with more than two variables/multivariate regression) measure how closely the sample regression line fits the data. $r^2$ is represented by the overlapping portion of the circles below.
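[Figure: Venn diagrams of Y and X: no overlap ($r^2 = 0$), partial overlap ($r^2$ between 0 and 1), large overlap ($r^2$ close to 1), complete overlap $Y = X$ ($r^2 = 1$)]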
$r^2$ lies between 0 and 1. The higher the $r^2$, the better the estimated model $\hat Y$ fits the data.
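$Y_i - \bar Y = (\hat Y_i - \bar Y) + (Y_i - \hat Y_i)$, i.e. total deviation = explained deviation + residual.
Squaring and summing (the cross-product term is zero for OLS with an intercept):
$\sum (Y_i - \bar Y)^2 = \sum (\hat Y_i - \bar Y)^2 + \sum (Y_i - \hat Y_i)^2$ (TSS = ESS + RSS)
or, $r^2 = \sum (\hat Y_i - \bar Y)^2 / \sum (Y_i - \bar Y)^2 = 1 - \sum e_i^2 / \sum (Y_i - \bar Y)^2$,
i.e. the share (%) of the total variation in Y explained by the estimated regression model;
or, equivalently, $r^2 = \left(\sum (X_i - \bar X)(Y_i - \bar Y)\right)^2 / \left(\sum (X_i - \bar X)^2 \sum (Y_i - \bar Y)^2\right)$.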
If the regression equation explains none of the variation in $Y_i$ (i.e. there is no relationship between X and Y), $r^2$ will be zero. If the equation explains all the variation, $r^2$ will be one. A low $r^2$ indicates a rather poor fit.
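The three expressions for $r^2$ should agree, and TSS should equal ESS + RSS. A Python check, assuming the same ten-observation textbook sample used for the deck's regression results (not listed on the slides):

```python
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]  # assumed sample

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sxx = sum((x - x_bar) ** 2 for x in X)
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar
fitted = [b1 + b2 * x for x in X]

tss = sum((y - y_bar) ** 2 for y in Y)              # total sum of squares
ess = sum((f - y_bar) ** 2 for f in fitted)         # explained sum of squares
rss = sum((y - f) ** 2 for y, f in zip(Y, fitted))  # residual sum of squares

r2_ratio = ess / tss                # ESS / TSS
r2_resid = 1 - rss / tss            # 1 - RSS / TSS
r2_cross = sxy ** 2 / (sxx * tss)   # squared cross-product form
print(f"TSS = {tss:.2f}, ESS + RSS = {ess + rss:.2f}")  # TSS = 8890.00, ESS + RSS = 8890.00
print(f"r^2 = {r2_ratio:.4f} = {r2_resid:.4f} = {r2_cross:.4f}")  # r^2 = 0.9621 = 0.9621 = 0.9621
```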
Corr = 0.98
[Figure: scatter plot of weekly family consumption expenditure, $ (Y) against weekly family income, $ (X)]
The scatter plot shows a positive relationship, so the coefficient on family income is expected to have a positive sign.
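The reported correlation can be recomputed with the Pearson formula; in the two-variable case its square equals $r^2$. The sample values are assumed (the textbook sample behind the deck's regression results), and `pearson_r` is a hypothetical helper name.

```python
import math

X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]  # assumed sample

def pearson_r(xs, ys):
    """Sample correlation coefficient between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(X, Y)
print(f"Corr = {r:.2f}, r^2 = {r * r:.4f}")  # Corr = 0.98, r^2 = 0.9621
```

The positive sign of `r` matches the expected positive sign of the income coefficient.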
Regression Results
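OLS estimates (formulas): $b_2 = \sum (X_i - \bar X)(Y_i - \bar Y) / \sum (X_i - \bar X)^2$, $b_1 = \bar Y - b_2 \bar X$, $SE(b_k) = \sqrt{Var(b_k)}$
Results: $b_1$ = 24.45, $b_2$ = 0.51, $Var(b_1)$ = 41.14, $SE(b_1)$ = 6.41, $Var(b_2)$ = 0.0013, $SE(b_2)$ = 0.04, $\hat\sigma^2$ = 42.16
Significance test, $H_0: \beta_2 = 0$: $t = b_2 / SE(b_2)$, compared with the critical value 2.306 (two-sided, 5% level, $n - 2 = 8$ d.o.f.).
Here $t(b_2) \approx 14$ (with $SE(b_2) = \sqrt{0.0013} \approx 0.036$) and $t(b_1) = 24.45 / 6.41 \approx 3.8$; both exceed 2.306, so $b_1$ and $b_2$ are significant at 5%.
Also, $r^2 = Cov(X, Y)^2 / (Var(X) \cdot Var(Y))$.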
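The t tests can be reproduced from the raw sample. This Python sketch assumes the ten-observation textbook sample behind the reported estimates (not listed on the slides) and hard-codes the slide's critical value 2.306 rather than computing a t quantile.

```python
import math

X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]  # assumed sample
T_CRIT = 2.306  # two-sided 5% critical value of t with 8 d.o.f. (from the slide)

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx
b1 = y_bar - b2 * x_bar
sigma2_hat = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y)) / (n - 2)
se_b1 = math.sqrt(sigma2_hat * sum(x * x for x in X) / (n * sxx))
se_b2 = math.sqrt(sigma2_hat / sxx)

for name, est, se in (("b1", b1, se_b1), ("b2", b2, se_b2)):
    t = est / se  # t statistic for H0: coefficient = 0
    verdict = "reject H0" if abs(t) > T_CRIT else "do not reject H0"
    print(f"{name}: t = {t:.2f} -> {verdict} at 5%")
# b1: t = 3.81 -> reject H0 at 5%
# b2: t = 14.24 -> reject H0 at 5%
```

Using the unrounded $SE(b_2) \approx 0.0357$ gives $t \approx 14.24$; dividing by the rounded 0.04 would understate it.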
[Figure: sample values $Y_i$ and fitted values $\hat Y_i$ against weekly family income; fitted line $\hat Y = 24.45 + 0.51X$, passing through the sample means $(\bar X, \bar Y) = (170, 111)$]
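Software regression output (dependent variable: weekly family consumption expenditure):
C (intercept): 24.45455
X/Income: 0.509091
Summary statistics listed (values not recovered): R-squared, Adjusted R-squared, S.E. of regression, Sum squared resid, Log likelihood, F-statistic, Prob(F-statistic), Mean dependent var, S.D. dependent var, Akaike info criterion, Schwarz criterion, Hannan-Quinn criter., Durbin-Watson stat.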
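Several of the listed summary statistics can be recomputed from the residuals using their usual textbook definitions. This is a sketch under assumptions: the sample is the ten-observation textbook sample (not listed on the slides), the package is assumed to use these standard definitions, and the log likelihood assumes normal errors with $\sigma^2$ estimated by RSS/n.

```python
import math

X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]  # assumed sample
n, k = len(X), 2  # observations, estimated coefficients (C and Income)

x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx
b1 = y_bar - b2 * x_bar
e = [y - b1 - b2 * x for x, y in zip(X, Y)]  # residuals in observation order

rss = sum(v * v for v in e)
tss = sum((y - y_bar) ** 2 for y in Y)
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
se_reg = math.sqrt(rss / (n - k))                        # S.E. of regression
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
log_lik = -n / 2 * (1 + math.log(2 * math.pi) + math.log(rss / n))
dw = sum((e[t] - e[t - 1]) ** 2 for t in range(1, n)) / rss  # Durbin-Watson

print(f"R-squared {r2:.4f}  Adjusted R-squared {adj_r2:.4f}")
print(f"S.E. of regression {se_reg:.4f}  Sum squared resid {rss:.4f}")
print(f"Log likelihood {log_lik:.4f}  F-statistic {f_stat:.4f}")
print(f"Durbin-Watson stat {dw:.4f}")
```

For the F-statistic, the two-variable case gives $F = t(b_2)^2$, so it carries the same information as the slope's t test.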
Business Problem: 1. Understand the factors driving the performance of pharma companies. 2. Predict their performance in the future.