Econometrics Assignment HW4

Group B
Group Members:
Aman Bansal (PGP15006)
M. Vikas (PGP15027)
Nikhil Sharma (PGP15094)
Sumit Agarwal (PGP15052)
Aman Chandila (PGP15066)
Vikas Srivastava (PGP15119)
Q1. We will work with the w.data dataset in this assignment. Bring this
data into R.
Ans. R script
Console:
Q2. Provide a table with summary statistics (number of observations,
mean, standard deviation, minimum, maximum) of all the variables in
the dataset.
Ans. R script
Console:
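The R script for this answer is not reproduced in the text. As a rough sketch of how such a table can be produced in base R, the function below reports the number of non-missing observations, mean, standard deviation, minimum and maximum of every numeric column. The data frame `toy` is made up and `summary_table` is a hypothetical helper; on the assignment data it would be called as `summary_table(mydata)`.

```r
# Summary statistics (N, mean, SD, min, max) for every numeric column.
summary_table <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  data.frame(
    variable = names(num),
    n    = vapply(num, function(x) sum(!is.na(x)), integer(1)),
    mean = vapply(num, mean, numeric(1), na.rm = TRUE),
    sd   = vapply(num, sd, numeric(1), na.rm = TRUE),
    min  = vapply(num, min, numeric(1), na.rm = TRUE),
    max  = vapply(num, max, numeric(1), na.rm = TRUE),
    row.names = NULL
  )
}

# Made-up data, for illustration only
toy <- data.frame(docvis = c(0, 2, 5, 1),
                  age    = c(25, 40, 61, 33),
                  male   = c(1, 0, 0, 1))
stats_tbl <- summary_table(toy)
print(stats_tbl, digits = 4)
```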
Q3. We want to know whether the probability that a person will visit a
doctor can be predicted by some of the demographic characteristics.
Create a binary variable docvisit which takes the value of 1 only if a
person has a non-zero number of visits to a doctor. Note: We are NOT
going to use the panel characteristics of this data.
Ans. R script
Console:
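A minimal sketch of the recoding, assuming the count of doctor visits is stored in a column called `docvis` (an assumed name; check `names(mydata)`). The helper `make_docvisit` is hypothetical and the example runs on made-up counts:

```r
# docvisit = 1 if the person made at least one doctor visit, 0 otherwise.
make_docvisit <- function(visits) {
  as.integer(visits > 0)
}

visits <- c(0, 3, 0, 1, 12)          # made-up visit counts
docvisit <- make_docvisit(visits)
print(docvisit)                       # 0 1 0 1 1
# On the assignment data (column name assumed):
# mydata$docvisit <- make_docvisit(mydata$docvis)
```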
Q4. Run a linear probability model where the docvisit variable depends
on the log of income, age, good health, male, and household size
variables. Interpret your results.
Ans. R script
From the above results, we can conclude that a person is more likely to visit a
doctor if:
His/her income is higher
He/she is older
The person is female
The household size is smaller
The person is in bad health
We have also used the describe function to compute summary statistics of the
fitted values, which show that the predicted probability of a person visiting a
doctor lies between 0.36 and 0.97, with a mean of 0.65.
With a binary dependent variable, a linear probability model can produce fitted
probabilities below 0 or above 1, which is its main drawback. In this case,
however, all fitted values lie inside [0, 1], so the model does not violate this rule.
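Whether an LPM's fitted values stay inside [0, 1] can be checked directly from `fitted()`. The sketch below does this on simulated data (all variable values and the data-generating probabilities are made up), not on the assignment data:

```r
# Linear probability model on simulated data, then check the fitted range.
set.seed(123)
n <- 1000
sim <- data.frame(
  loginc = rnorm(n, mean = 8, sd = 1),
  age    = runif(n, 20, 70),
  goodh  = rbinom(n, 1, 0.6),
  male   = rbinom(n, 1, 0.5),
  hsize  = sample(1:6, n, replace = TRUE)
)
# Made-up data-generating probabilities, clipped to [0, 1]
p_true <- with(sim, 0.45 + 0.05 * (loginc - 8) + 0.002 * (age - 45) -
                 0.20 * goodh - 0.13 * male - 0.01 * hsize)
p_true <- pmin(pmax(p_true, 0), 1)
sim$docvisit <- rbinom(n, 1, p_true)

lpm <- lm(docvisit ~ loginc + age + goodh + male + hsize, data = sim)
fitted_p <- fitted(lpm)
print(summary(fitted_p))   # min, quartiles, mean and max of the fitted values
share_outside <- mean(fitted_p < 0 | fitted_p > 1)
cat("Share of fitted values outside [0, 1]:", share_outside, "\n")
```

Because the model has an intercept, the mean of the fitted values always equals the sample share of docvisit = 1.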
Q5. Now add some employment variables to your linear probability
model: whether this person receives welfare payments or not, whether
this person is unemployed or not, whether this person has a full time
work or not. Run the model, obtain heteroscedasticity-robust standard
errors, and interpret your results.
Ans. R script
studentized Breusch-Pagan test

data: reg2
BP = 1733.6, df = 8, p-value < 2.2e-16
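As the p-value is far below 0.05, we reject the null hypothesis of homoscedasticity,
namely that all coefficients in the auxiliary regression of the squared residuals
on the regressors are zero. Hence heteroscedasticity is present (as it always is
in a linear probability model). It does not bias the coefficient estimates, but it
makes the usual standard errors invalid, so we compute heteroscedasticity-robust
standard errors by passing vcov=hccm to coeftest.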
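The printout above has the heading of `lmtest::bptest()`, which by default reports Koenker's studentized version of the test: n times the R-squared of a regression of the squared OLS residuals on the regressors, compared with a chi-squared distribution whose degrees of freedom equal the number of slope coefficients (8 here). A base-R sketch of that statistic follows, run on simulated binary data rather than the assignment data; `bp_studentized` is a hypothetical helper, and on the same model it should agree with `bptest()` up to rounding.

```r
# Koenker's studentized Breusch-Pagan statistic: n * R^2 from regressing the
# squared OLS residuals on the model's regressors; chi-squared with k - 1 df.
bp_studentized <- function(model) {
  X  <- model.matrix(model)            # includes the intercept column
  e2 <- residuals(model)^2
  aux <- lm.fit(X, e2)                 # auxiliary regression
  r2 <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  stat <- nrow(X) * r2
  dof  <- ncol(X) - 1
  list(statistic = stat, df = dof,
       p.value = pchisq(stat, dof, lower.tail = FALSE))
}

# Simulated binary outcome (an LPM is heteroscedastic by construction)
set.seed(7)
n  <- 800
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
y  <- rbinom(n, 1, pmin(pmax(0.5 + 0.15 * x1 - 0.2 * x2, 0.02), 0.98))
reg <- lm(y ~ x1 + x2)
bp <- bp_studentized(reg)
print(unlist(bp))
```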
The results are:
coeftest(reg2)

t test of coefficients:

              Estimate Std. Error  t value  Pr(>|t|)    
(Intercept)  0.4777448  0.0509612   9.3747 < 2.2e-16 ***
loginc       0.0503848  0.0065586   7.6823 1.606e-14 ***
age          0.0010003  0.0002361   4.2368 2.274e-05 ***
goodh       -0.2035693  0.0054209 -37.5524 < 2.2e-16 ***
male        -0.1308615  0.0057018 -22.9510 < 2.2e-16 ***
hsize       -0.0111345  0.0019470  -5.7187 1.083e-08 ***
unemp       -0.0382156  0.0097844  -3.9058 9.412e-05 ***
sozh         0.0092916  0.0139168   0.6677    0.5044    
ft          -0.0476117  0.0062582  -7.6079 2.861e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(reg2,vcov=hccm)

t test of coefficients:

               Estimate  Std. Error  t value  Pr(>|t|)    
(Intercept)  0.47774478  0.05077345   9.4093 < 2.2e-16 ***
loginc       0.05038479  0.00653149   7.7141 1.252e-14 ***
age          0.00100032  0.00023317   4.2900 1.792e-05 ***
goodh       -0.20356928  0.00536717 -37.9286 < 2.2e-16 ***
male        -0.13086147  0.00569630 -22.9730 < 2.2e-16 ***
hsize       -0.01113445  0.00198496  -5.6094 2.046e-08 ***
unemp       -0.03821556  0.00970430  -3.9380 8.233e-05 ***
sozh         0.00929160  0.01373041   0.6767    0.4986    
ft          -0.04761173  0.00614553  -7.7474 9.651e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
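Comparing the two tables, the coefficient estimates are identical and the robust
standard errors differ only slightly from the conventional ones, so the t statistics
and p-values barely change and every variable keeps its significance level. The
income, age, health, gender and household-size effects are interpreted as before.
Among the employment variables, holding everything else constant, being unemployed
lowers the probability of visiting a doctor by about 3.8 percentage points and
full-time work lowers it by about 4.8 percentage points (both significant at 0.1%),
while receiving welfare payments (sozh) is not significant (p = 0.4986).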
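The robust table uses `car::hccm()`, whose default is the HC3 estimator. A minimal base-R sketch of HC3 standard errors and the corresponding t tests follows, run on simulated data; `hc3_vcov` and `robust_table` are hypothetical helpers, not functions from a package. As in the comparison above, the estimates are untouched and only the standard errors change.

```r
# HC3 heteroscedasticity-robust covariance:
# V = (X'X)^-1 X' diag(e_i^2 / (1 - h_i)^2) X (X'X)^-1
hc3_vcov <- function(model) {
  X <- model.matrix(model)
  e <- residuals(model)
  h <- hatvalues(model)
  bread <- solve(crossprod(X))
  meat  <- crossprod(X * (e / (1 - h)))   # row i of X scaled by e_i / (1 - h_i)
  bread %*% meat %*% bread
}

# Coefficient table with robust standard errors and t-based p-values
robust_table <- function(model) {
  est  <- coef(model)
  se   <- sqrt(diag(hc3_vcov(model)))
  tval <- est / se
  pval <- 2 * pt(abs(tval), df.residual(model), lower.tail = FALSE)
  cbind(Estimate = est, `Std. Error` = se, `t value` = tval, `Pr(>|t|)` = pval)
}

# Simulated binary outcome, for illustration only
set.seed(11)
n <- 600
x <- rnorm(n)
y <- rbinom(n, 1, pmin(pmax(0.5 + 0.2 * x, 0.01), 0.99))
fit <- lm(y ~ x)
rob <- robust_table(fit)
print(round(rob, 5))
```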
Q6. Now test whether the employment related variables should belong
in your model or not.
Ans. R script
The employment-related variables (unemployment, full-time work and welfare
payments) are collected in the hypothesis vector q0, and a linear hypothesis test is
run for the joint null hypothesis that all three coefficients are zero. The p-value
is very low, so we reject the null hypothesis at the 5% level. The employment-related
variables are therefore jointly significant and should belong to our model, even
though sozh on its own is not significant.
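The q0 test is the kind of joint test run by `car::linearHypothesis()`, which for a linear model reports an F statistic. A base-R sketch of the same Wald F test follows, on simulated data and with a hypothetical helper `wald_joint`. With the default OLS covariance it reproduces the nested-model F test from `anova()`; passing a heteroscedasticity-robust covariance matrix as `V` would give a robust version of the test.

```r
# Joint test H0: the listed coefficients are all zero, with F = W / q and
# W = (Rb)' (R V R')^{-1} (Rb). V defaults to the usual OLS covariance.
wald_joint <- function(model, terms, V = vcov(model)) {
  b <- coef(model)
  R <- matrix(0, nrow = length(terms), ncol = length(b),
              dimnames = list(terms, names(b)))
  for (j in seq_along(terms)) R[j, terms[j]] <- 1
  rb <- R %*% b
  W  <- drop(t(rb) %*% solve(R %*% V %*% t(R), rb))
  q  <- length(terms)
  F_stat <- W / q
  c(F = F_stat, df1 = q, df2 = df.residual(model),
    p.value = pf(F_stat, q, df.residual(model), lower.tail = FALSE))
}

# Simulated data with the same employment dummies as in the assignment
set.seed(3)
n <- 900
d <- data.frame(loginc = rnorm(n, 8, 1), unemp = rbinom(n, 1, 0.1),
                sozh = rbinom(n, 1, 0.05), ft = rbinom(n, 1, 0.5))
p_sim <- pmin(pmax(0.3 + 0.05 * d$loginc - 0.04 * d$unemp - 0.05 * d$ft, 0), 1)
d$docvisit <- rbinom(n, 1, p_sim)
full <- lm(docvisit ~ loginc + unemp + sozh + ft, data = d)
res <- wald_joint(full, c("unemp", "sozh", "ft"))
print(res)
```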
Q7. Suppose we have two male individuals, both with good health, not
unemployed and the employment being full time, not receiving welfare
payments, with following other characteristics: log of income 5.02 and
10.03 respectively, age 20 and 60 respectively, and household size of 4.
Find out the likelihood of these two individuals visiting a doctor. What
do you think is going on here?
Ans. R script
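The likelihood of the two individuals visiting a doctor is 32.41% and 61.65% respectively.
The two individuals differ only in income and age. With the Q5 coefficients, the higher
log income (10.03 vs 5.02) adds 0.0504 × 5.01 ≈ 0.252 to the predicted probability and the
extra 40 years of age add 0.0010 × 40 ≈ 0.040, which together make up the gap of about
29 percentage points. Keeping everything else constant, higher income and older age
therefore increase the probability of a person visiting a doctor, with income accounting
for most of the difference.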
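The two LPM predictions, and the split of the gap into an income part and an age part, can be reproduced from the rounded coefficients reported in Q5. This is a minimal check rather than the original script:

```r
# LPM coefficients from the coeftest(reg2) output in Q5
b <- c(`(Intercept)` = 0.4777448, loginc = 0.0503848, age = 0.0010003,
       goodh = -0.2035693, male = -0.1308615, hsize = -0.0111345,
       unemp = -0.0382156, sozh = 0.0092916, ft = -0.0476117)

# The two individuals of Q7 (columns in the same order as b)
indiv <- rbind(c(1,  5.02, 20, 1, 1, 4, 0, 0, 1),
               c(1, 10.03, 60, 1, 1, 4, 0, 0, 1))
colnames(indiv) <- names(b)

p_lpm <- drop(indiv %*% b)
print(round(p_lpm, 4))                    # 0.3241 0.6165

# Contribution of the income and age differences to the gap
vars <- c("loginc", "age")
contrib <- b[vars] * (indiv[2, vars] - indiv[1, vars])
print(round(contrib, 4))                  # about 0.252 (loginc) and 0.040 (age)
```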
Q8. Run a logit model instead of a linear probability model in part 5.
What can we infer from the results?
Ans. R Script
glm(formula = docvisit ~ loginc + age + goodh + male + hsize + 
    unemp + sozh + ft, family = binomial(link = logit), data = mydata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1022  -1.1511   0.6393   0.9268   1.4913  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.166536   0.246044  -0.677    0.498    
loginc       0.241696   0.031736   7.616 2.62e-14 ***
age          0.005205   0.001144   4.551 5.35e-06 ***
goodh       -0.973606   0.026848 -36.263  < 2e-16 ***
male        -0.610408   0.027252 -22.399  < 2e-16 ***
hsize       -0.052709   0.009240  -5.705 1.17e-08 ***
unemp       -0.188675   0.047145  -4.002 6.28e-05 ***
sozh         0.044193   0.067146   0.658    0.510    
ft          -0.235469   0.030354  -7.757 8.67e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 42494  on 32836  degrees of freedom
Residual deviance: 39669  on 32828  degrees of freedom
AIC: 39687

Number of Fisher Scoring iterations: 4
Coefficients used for the predicted values:
Intercept  -0.166536
loginc      0.241696
age         0.005205
goodh      -0.973606
male       -0.610408
hsize      -0.052709
unemp      -0.188675
sozh        0.044193
ft         -0.235469

Y* is the linear index x'b and Probability = exp(Y*)/(1 + exp(Y*)).
Individual  loginc  age        Y*  Probability
1               10   30  0.292891     0.572704
2               10   60  0.449041     0.610411
3               10   30  -0.02336      0.49416
4               10   30  2.256856      0.90524
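These are some predicted values from the model. The signs of the coefficients suggest
that the probability of a person visiting a doctor is higher if:
The person's income is higher
He/she is older
The person has bad health
The person is female
The household size is smaller
The person is not unemployed
The person does not work full-time
The coefficient on welfare payments (sozh) is positive but not statistically significant
(p = 0.510), so the data give no evidence that receiving welfare payments changes the
probability of visiting a doctor.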
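The Probability column of the table of predicted values is the logistic transform of Y*. A minimal check in R, using the Y* values read from that table; the third index is taken as negative (-0.02336), since its probability of 0.49416 is below 0.5 and therefore requires Y* < 0:

```r
# In a logit model, P(docvisit = 1) = exp(Y*) / (1 + exp(Y*)), where Y* = x'b.
y_star <- c(0.292891, 0.449041, -0.02336, 2.256856)   # linear indices from the table
prob   <- plogis(y_star)                               # same as exp(y) / (1 + exp(y))
print(round(prob, 4))                                  # 0.5727 0.6104 0.4942 0.9052
```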
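Q9. Calculate McFadden's pseudo R-squared.
Ans. R Script
> logLik(logit)
'log Lik.' -19834.49 (df=9)
> 1 - logit$deviance/logit$null.deviance
[1] 0.06648934
McFadden's pseudo R-squared is therefore about 0.066. For a binary logit the deviance
equals -2 times the log-likelihood, so 1 - deviance/null deviance is the same as
McFadden's 1 - lnL(model)/lnL(intercept-only model).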
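The same number can be reproduced by hand from the reported log-likelihood and deviances, taking the intercept-only log-likelihood as minus half the null deviance. A minimal check:

```r
# McFadden's pseudo R^2 = 1 - lnL(model) / lnL(intercept-only model)
ll_model  <- -19834.49        # logLik(logit)
ll_null   <- -42494 / 2       # binary logit: null deviance = -2 * lnL(null)
mcfadden  <- 1 - ll_model / ll_null
dev_ratio <- 1 - 39669 / 42494   # 1 - residual deviance / null deviance
print(round(c(mcfadden = mcfadden, dev_ratio = dev_ratio), 4))   # both 0.0665
```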
Q10. Is the overall model significant?
Ans. R Script
lrtest(logit)
Likelihood ratio test

Model 1: docvisit ~ loginc + age + goodh + male + hsize + unemp + sozh + 
    ft
Model 2: docvisit ~ 1
  #Df LogLik Df  Chisq Pr(>Chisq)    
1   9 -19835                         
2   1 -21247 -8 2825.4  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The likelihood ratio test compares the fitted model (Model 1) with the intercept-only
model (Model 2).
Null hypothesis: all eight slope coefficients are zero, i.e. the intercept-only model
fits the data as well as the full model.
The test statistic is 2 × (LogLik of Model 1 - LogLik of Model 2) = 2825.4 with 8 degrees
of freedom, and its p-value is below 2.2e-16. We therefore reject the null hypothesis
and conclude that the model is jointly significant.
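The LR statistic and its p-value can be recomputed from the two log-likelihoods. Because the printed intercept-only log-likelihood is rounded, the result is about 2825.0 rather than exactly 2825.4:

```r
# LR test of the full logit against the intercept-only model
ll_model <- -19834.49         # logLik(logit)
ll_null  <- -21247            # LogLik of Model 2 (rounded in the printout)
lr_stat  <- 2 * (ll_model - ll_null)
lr_df    <- 9 - 1             # 9 estimated parameters vs 1
p_value  <- pchisq(lr_stat, lr_df, lower.tail = FALSE)
print(c(LR = lr_stat, df = lr_df, p = p_value))
```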
Q11. Get the predicted likelihood for the two individuals from part 7.
Ans. R script
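predict(logit, indvalues, type = "response")
        1         2 
0.2932925 0.6317290 
The predicted likelihoods are 29.33% and 63.17%. They are close to the LPM predictions
from Q7 (32.41% and 61.65%): slightly lower for the younger, lower-income individual and
slightly higher for the older, higher-income one.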
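These predictions follow directly from the rounded logit coefficients in Q8, since the predicted probability is exp(Y*)/(1 + exp(Y*)) with Y* = x'b. A minimal check, with the individuals' characteristics taken from Q7:

```r
# Logit coefficients from the glm() output in Q8
b_logit <- c(`(Intercept)` = -0.166536, loginc = 0.241696, age = 0.005205,
             goodh = -0.973606, male = -0.610408, hsize = -0.052709,
             unemp = -0.188675, sozh = 0.044193, ft = -0.235469)

# The two individuals of Q7 (columns in the same order as b_logit)
indiv <- rbind(c(1,  5.02, 20, 1, 1, 4, 0, 0, 1),
               c(1, 10.03, 60, 1, 1, 4, 0, 0, 1))
colnames(indiv) <- names(b_logit)

y_star  <- drop(indiv %*% b_logit)   # Y* = x'b
p_logit <- plogis(y_star)            # exp(Y*) / (1 + exp(Y*))
print(round(p_logit, 4))             # 0.2933 0.6317
```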
Q12. Test whether the employment variables should be in the model or
not.
Ans. R Script
lrtest(logit,logit1)
Likelihood ratio test

Model 1: docvisit ~ loginc + age + goodh + male + hsize + unemp + sozh + 
    ft
Model 2: docvisit ~ loginc + age + goodh + male + hsize
  #Df LogLik Df  Chisq Pr(>Chisq)    
1   9 -19835                         
2   6 -19866 -3 63.196  1.219e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The null hypothesis is that the coefficients on the three employment variables (unemp,
sozh and ft) are all zero, i.e. that the restricted Model 2 is adequate. The LR statistic
is 63.196 with 3 degrees of freedom and a p-value of 1.219e-13, so the null hypothesis is
rejected even at the 1% significance level, and we conclude that the employment-related
variables should belong to our model.
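The reported p-value, and the 1% critical value it is compared with, can be checked from the chi-squared distribution with 3 degrees of freedom:

```r
lr_emp <- 63.196                                  # Chisq from lrtest(logit, logit1)
p_emp  <- pchisq(lr_emp, df = 3, lower.tail = FALSE)
crit_1 <- qchisq(0.99, df = 3)                    # 1% critical value, about 11.34
print(c(p.value = p_emp, critical_1pct = crit_1))
```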
Q13. Calculate the average partial effects. Interpret your results.
Ans. R Script
The average partial effect (APE) of a variable is the change in the predicted probability
of visiting a doctor for a one-unit change in that variable (or a switch from 0 to 1 for a
dummy variable), averaged over all individuals in the sample.
From the results, we can infer that the goodh variable plays an important role in changing
the probability of a person visiting a doctor: on average, a person in bad health is
20.39 percentage points more likely to visit a doctor than a person in good health.
Similarly, gender plays an important role: a female is 12.9 percentage points more likely
to visit a doctor than a male. Unemployment and income also matter: being unemployed
lowers the probability by about 4 percentage points, and a one-unit increase in log income
raises it by about 5 percentage points. These APEs are close to the corresponding LPM
coefficients in Q5 (-0.2036 for goodh, -0.1309 for male, -0.0382 for unemp and 0.0504
for loginc).
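For a logit model, the APE of a continuous variable x_j is b_j times the sample average of the logistic density evaluated at each person's x'b, and for a dummy variable it is the average change in the predicted probability when the dummy is switched from 0 to 1. A base-R sketch of both follows, on simulated data; the variables, true coefficients and sample are made up, so the numbers are not those of the assignment.

```r
# Simulated data and a logit fit, for illustration only
set.seed(42)
n <- 2000
sim <- data.frame(loginc = rnorm(n, 8, 1), male = rbinom(n, 1, 0.5))
sim$docvisit <- rbinom(n, 1, plogis(-1 + 0.25 * sim$loginc - 0.6 * sim$male))
fit <- glm(docvisit ~ loginc + male, family = binomial(link = "logit"), data = sim)

b  <- coef(fit)
xb <- predict(fit, type = "link")   # x'b for every observation

# Continuous regressor: average of the derivative b_j * lambda(x'b)
ape_loginc <- b["loginc"] * mean(dlogis(xb))

# Dummy regressor: average discrete change in P(docvisit = 1) when male goes 0 -> 1
d1 <- transform(sim, male = 1)
d0 <- transform(sim, male = 0)
ape_male <- mean(predict(fit, d1, type = "response") -
                 predict(fit, d0, type = "response"))

print(c(loginc = unname(ape_loginc), male = ape_male))
```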
