
Econometrics Assignment - HW4

Group B

Group Members:
Aman Bansal (PGP15006)
M.Vikas (PGP15027)
Nikhil Sharma (PGP15094)
Sumit Agarwal (PGP15052)
Aman Chandila (PGP15066)
Vikas Srivastava (PGP15119)
Q1. We will work with the w.data dataset in this assignment. Bring this
data into R.
Ans. R script

Console:

Q2. Provide a table with summary statistics (number of observations,
mean, standard deviation, minimum, maximum) of all the variables in
the dataset.
Ans. R script

Console:
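One way to build such a table in base R is sketched below. It assumes the data are already in a data frame; the small frame `toy` and its values are hypothetical stand-ins for w.data, and `summary_table` is an illustrative helper name, not a function from any package.

```r
# Hypothetical stand-in for w.data; replace with the real data frame.
toy <- data.frame(
  age   = c(25, 40, 58, 33),
  male  = c(1, 0, 1, 0),
  hsize = c(3, 2, 4, 1)
)

# One row per variable: non-missing observations, mean, SD, min, max
summary_table <- function(df) {
  stats <- sapply(df, function(x) {
    c(N    = sum(!is.na(x)),
      Mean = mean(x, na.rm = TRUE),
      SD   = sd(x, na.rm = TRUE),
      Min  = min(x, na.rm = TRUE),
      Max  = max(x, na.rm = TRUE))
  })
  t(stats)  # sapply puts statistics in rows; transpose so variables are rows
}

print(round(summary_table(toy), 3))
```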

Q3. We want to know whether the probability that a person will visit a
doctor can be predicted by some of the demographic characteristics.
Create a binary variable docvisit which takes the value of 1 only if a
person has a non-zero number of visits to a doctor. Note: We are NOT
going to use the panel characteristics of this data.
Ans. R script

Console:
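A minimal sketch of the recoding, assuming the visit count is stored in a column called `docvis` (a hypothetical name; the actual column in w.data may differ) and using a toy data frame in place of the real data:

```r
# Toy data; `docvis` (number of doctor visits) is a hypothetical column name.
mydata <- data.frame(docvis = c(0, 3, 1, 0, 12))

# docvisit = 1 if the person visited a doctor at least once, 0 otherwise
mydata$docvisit <- as.integer(mydata$docvis > 0)

print(table(mydata$docvisit))
```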

Q4. Run a linear probability model where the docvisit variable depends
on the log of income, age, good health, male, and household size
variables. Interpret your results.
Ans. R script

From the above results, we can conclude that, holding the other variables
constant, a person is more likely to visit a doctor if:
His/her income is higher
He/she is older
The person is female
The household size is smaller
The person is in bad health

We have also used the describe function to obtain summary statistics of the
fitted values of the model, which show that the predicted probability of a person
visiting a doctor lies between 0.36 and 0.97, with a mean of 0.65.
A known drawback of the linear probability model with a binary dependent
variable is that its fitted values are not constrained to lie between 0 and 1, so
they cannot always be read as probabilities. In this case, however, all fitted
values fall within that range.
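The sketch below fits a linear probability model with `lm()` on simulated data and then checks how many fitted values fall outside [0, 1], which is the check described above. All values are made up; only the variable names mirror the assignment.

```r
set.seed(42)
n <- 2000

# Simulated regressors named after the assignment's variables (values are made up)
sim <- data.frame(
  loginc = rnorm(n, mean = 8, sd = 0.5),
  age    = sample(20:70, n, replace = TRUE),
  goodh  = rbinom(n, 1, 0.6),
  male   = rbinom(n, 1, 0.5),
  hsize  = sample(1:6, n, replace = TRUE)
)

# Hypothetical true visit probability, clipped to [0.01, 0.99] before drawing outcomes
p_true <- with(sim, 0.6 + 0.05 * (loginc - 8) + 0.002 * (age - 45) -
                    0.20 * goodh - 0.13 * male - 0.01 * (hsize - 3))
sim$docvisit <- rbinom(n, 1, pmin(pmax(p_true, 0.01), 0.99))

# Linear probability model: OLS on the 0/1 outcome
lpm <- lm(docvisit ~ loginc + age + goodh + male + hsize, data = sim)
print(round(coef(lpm), 4))

# Range and mean of the fitted "probabilities", and the share outside [0, 1]
fit_p <- fitted(lpm)
print(summary(fit_p))
cat("share outside [0, 1]:", mean(fit_p < 0 | fit_p > 1), "\n")
```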
Q5. Now add some employment variables to your linear probability
model: whether this person receives welfare payments or not, whether
this person is unemployed or not, whether this person has a full time
work or not. Run the model, obtain heteroscedasticity-robust standard
errors, and interpret your results.
Ans. R script

studentized Breusch-Pagan test

data: reg2
BP = 1733.6, df = 8, p-value < 2.2e-16

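The p-value is far below 0.05, so we reject the null hypothesis of
homoscedasticity, i.e. that the coefficients in the auxiliary regression of the
squared residuals on the regressors are all zero. Heteroscedasticity is in fact
built into a linear probability model, because the error variance p(x)(1 - p(x))
depends on the regressors. It does not bias the OLS coefficients, but it makes the
usual standard errors invalid, so we use heteroscedasticity-robust standard errors
by passing vcov=hccm to coeftest.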
The results are
coeftest(reg2)
t test of coefficients:

              Estimate Std. Error  t value  Pr(>|t|)
(Intercept)  0.4777448  0.0509612   9.3747 < 2.2e-16 ***
loginc       0.0503848  0.0065586   7.6823 1.606e-14 ***
age          0.0010003  0.0002361   4.2368 2.274e-05 ***
goodh       -0.2035693  0.0054209 -37.5524 < 2.2e-16 ***
male        -0.1308615  0.0057018 -22.9510 < 2.2e-16 ***
hsize       -0.0111345  0.0019470  -5.7187 1.083e-08 ***
unemp       -0.0382156  0.0097844  -3.9058 9.412e-05 ***
sozh         0.0092916  0.0139168   0.6677    0.5044
ft          -0.0476117  0.0062582  -7.6079 2.861e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(reg2,vcov=hccm)
t test of coefficients:

               Estimate Std. Error  t value  Pr(>|t|)
(Intercept)  0.47774478 0.05077345   9.4093 < 2.2e-16 ***
loginc       0.05038479 0.00653149   7.7141 1.252e-14 ***
age          0.00100032 0.00023317   4.2900 1.792e-05 ***
goodh       -0.20356928 0.00536717 -37.9286 < 2.2e-16 ***
male        -0.13086147 0.00569630 -22.9730 < 2.2e-16 ***
hsize       -0.01113445 0.00198496  -5.6094 2.046e-08 ***
unemp       -0.03821556 0.00970430  -3.9380 8.233e-05 ***
sozh         0.00929160 0.01373041   0.6767    0.4986
ft          -0.04761173 0.00614553  -7.7474 9.651e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

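The coefficient estimates are identical in both tables, as expected, since the
robust correction changes only the estimated variance, not the estimates
themselves. The robust standard errors and p-values differ only slightly from the
conventional ones and no variable changes significance, so the earlier
interpretation still holds. Among the new employment variables, being
unemployed lowers the probability of a doctor visit by about 3.8 percentage
points and working full time lowers it by about 4.8 percentage points (both
significant at the 0.1% level), while receiving welfare payments (sozh) has no
significant effect (p = 0.4986).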
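Both steps can be reproduced in base R without extra packages. The sketch below, on simulated data from a hypothetical data-generating process, computes the studentized (Koenker) Breusch-Pagan statistic as n * R^2 from regressing the squared residuals on the regressors, and HC3 robust standard errors; HC3 is assumed here to match the default `type` of `car::hccm`.

```r
set.seed(1)
n  <- 1500
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)

# Binary outcome from a hypothetical logistic process, fitted by OLS (an LPM)
y   <- rbinom(n, 1, plogis(-0.2 + 0.8 * x1 - 0.6 * x2))
fit <- lm(y ~ x1 + x2)
e   <- residuals(fit)

# Studentized (Koenker) Breusch-Pagan: n * R^2 of e^2 regressed on the regressors
aux   <- lm(I(e^2) ~ x1 + x2)
bp    <- n * summary(aux)$r.squared
bp_df <- length(coef(aux)) - 1
bp_p  <- pchisq(bp, df = bp_df, lower.tail = FALSE)
cat("BP =", round(bp, 2), " df =", bp_df, " p-value =", signif(bp_p, 3), "\n")

# HC3 robust covariance: (X'X)^-1 X' diag(e_i^2 / (1 - h_i)^2) X (X'X)^-1
X       <- model.matrix(fit)
w       <- e^2 / (1 - hatvalues(fit))^2
XtX_inv <- solve(crossprod(X))
V_hc3   <- XtX_inv %*% crossprod(X * w, X) %*% XtX_inv  # X * w scales row i by w_i

se_ols <- sqrt(diag(vcov(fit)))
se_hc3 <- sqrt(diag(V_hc3))

# Same point estimates, different standard errors
print(round(cbind(estimate = coef(fit), se_ols = se_ols, se_hc3 = se_hc3), 5))
```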

Q6. Now test whether the employment related variables should belong
in your model or not.
Ans. R script

The employment-related variables (unemployment, full-time work and welfare
payments) are specified in the object q0, and a linear hypothesis test of whether
their coefficients are jointly zero is carried out to check whether these variables
belong in our model. The p-value of the test is very low, so we reject the null
hypothesis at the 5% level. Thus, the employment-related variables jointly belong
in our model, even though sozh on its own is not significant.
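As a base-R illustration of a joint exclusion test, the sketch below compares a restricted and an unrestricted linear probability model with `anova()`, which gives the classical F-test of H0: the coefficients on unemp, sozh and ft are all zero. The data are simulated, and this conventional F-test is a stand-in for the linear hypothesis test used above; unlike a robust Wald test, it assumes homoscedastic errors.

```r
set.seed(7)
n <- 1200

# Simulated data; effect sizes are hypothetical
d <- data.frame(
  loginc = rnorm(n, 8, 0.5),
  unemp  = rbinom(n, 1, 0.10),
  sozh   = rbinom(n, 1, 0.05),
  ft     = rbinom(n, 1, 0.50)
)
p <- 0.6 + 0.05 * (d$loginc - 8) - 0.04 * d$unemp - 0.05 * d$ft
d$docvisit <- rbinom(n, 1, pmin(pmax(p, 0.01), 0.99))

full       <- lm(docvisit ~ loginc + unemp + sozh + ft, data = d)
restricted <- lm(docvisit ~ loginc, data = d)   # imposes unemp = sozh = ft = 0

# Classical F-test of the three exclusion restrictions jointly
ftest <- anova(restricted, full)
print(ftest)
```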

Q7. Suppose we have two male individuals, both with good health, not
unemployed and the employment being full time, not receiving welfare
payments, with following other characteristics: log of income 5.02 and
10.03 respectively, age 20 and 60 respectively, and household size of 4.
Find out the likelihood of these two individuals visiting a doctor. What
do you think is going on here?
Ans. R script

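The likelihood of the two individuals visiting a doctor is 32.41% and 61.65%
respectively. Because both age and income differ between them, the gap of about
29 percentage points combines two effects: about 25.2 points from the higher log
income (0.0504 x 5.01) and about 4.0 points from the higher age (0.0010 x 40).
Keeping everything else constant, higher income and older age both raise the
predicted probability of visiting a doctor, with income accounting for most of the
difference here.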
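The two predictions can be checked by hand from the coefficient estimates reported in part 5. The sketch below simply evaluates the fitted linear index for both individuals; the names `b`, `x1`, `x2` and `lin_pred` are illustrative.

```r
# LPM coefficients reported by coeftest(reg2) in part 5
b <- c(`(Intercept)` = 0.4777448, loginc = 0.0503848, age = 0.0010003,
       goodh = -0.2035693, male = -0.1308615, hsize = -0.0111345,
       unemp = -0.0382156, sozh = 0.0092916, ft = -0.0476117)

# Both individuals: male, good health, full time, not unemployed, no welfare, household of 4
x1 <- c(`(Intercept)` = 1, loginc = 5.02, age = 20, goodh = 1, male = 1,
        hsize = 4, unemp = 0, sozh = 0, ft = 1)
x2 <- replace(x1, c("loginc", "age"), c(10.03, 60))

# Linear index x'b, matching coefficients to regressors by name
lin_pred <- function(coefs, x) sum(coefs[names(x)] * x)
p_lpm <- c(lin_pred(b, x1), lin_pred(b, x2))
print(round(p_lpm, 4))

# Decomposition of the gap into the income and age contributions
print(round(c(income = b[["loginc"]] * (10.03 - 5.02),
              age    = b[["age"]] * (60 - 20)), 4))
```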

Q8. Run a logit model instead of a linear probability model in part 5.
What can we infer from the results?
Ans. R Script
glm(formula = docvisit ~ loginc + age + goodh + male + hsize +
    unemp + sozh + ft, family = binomial(link = logit), data = mydata)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1022  -1.1511   0.6393   0.9268   1.4913

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.166536   0.246044  -0.677    0.498
loginc       0.241696   0.031736   7.616 2.62e-14 ***
age          0.005205   0.001144   4.551 5.35e-06 ***
goodh       -0.973606   0.026848 -36.263  < 2e-16 ***
male        -0.610408   0.027252 -22.399  < 2e-16 ***
hsize       -0.052709   0.009240  -5.705 1.17e-08 ***
unemp       -0.188675   0.047145  -4.002 6.28e-05 ***
sozh         0.044193   0.067146   0.658    0.510
ft          -0.235469   0.030354  -7.757 8.67e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 42494  on 32836  degrees of freedom
Residual deviance: 39669  on 32828  degrees of freedom
AIC: 39687

Number of Fisher Scoring iterations: 4

Illustrative predictions computed from the logit coefficients above (Y* is the
linear index and Probability = exp(Y*)/(1 + exp(Y*)); the remaining covariates
differ across rows):

loginc  age  Y*         Probability
10      30    0.292891  0.572704
10      60    0.449041  0.610411
10      30   -0.02336   0.49416
10      30    2.256856  0.90524

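These are some predicted values from the model. Since the logistic function is
increasing, the sign of each coefficient gives the direction of its effect on the
probability. The coefficients indicate that the probability of a person visiting a
doctor increases if:
The person's income is higher
He/she is older
The person is in bad health
The person is female
The household size is smaller
The person is not unemployed
The person does not work full time
Receiving welfare payments (sozh) has no statistically significant effect on the
probability (p = 0.510).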
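The link between the Y* and Probability columns of the predictions can be seen directly in R: `predict()` on a fitted `glm` returns the linear index with `type = "link"` and the probability with `type = "response"`, and the latter is the logistic transform of the former. The sketch uses simulated data whose coefficients only loosely echo the estimates above.

```r
set.seed(3)
n <- 3000

# Simulated data; coefficients below are hypothetical
sim <- data.frame(
  goodh = rbinom(n, 1, 0.6),
  male  = rbinom(n, 1, 0.5),
  age   = sample(20:70, n, replace = TRUE)
)
eta <- with(sim, 0.8 - 0.97 * goodh - 0.61 * male + 0.005 * (age - 45))
sim$docvisit <- rbinom(n, 1, plogis(eta))

logit_fit <- glm(docvisit ~ goodh + male + age,
                 family = binomial(link = "logit"), data = sim)

ystar <- predict(logit_fit, type = "link")      # linear index Y* = x'b
prob  <- predict(logit_fit, type = "response")  # P(docvisit = 1 | x)

# The probability is the logistic transform of Y*: exp(Y*) / (1 + exp(Y*))
print(all.equal(unname(prob), unname(plogis(ystar))))
print(round(coef(logit_fit), 3))
```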

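Q9. Calculate McFadden's pseudo R-squared.
Ans. R Script
> logLik(logit)
'log Lik.' -19834.49 (df=9)
> 1 - logit$deviance/logit$null.deviance
[1] 0.06648934
For a binary logit the saturated model has a log-likelihood of zero, so the
deviance equals -2 times the log-likelihood and 1 - deviance/null deviance is
exactly McFadden's 1 - ln L(model)/ln L(intercept only). McFadden's pseudo
R-squared is therefore about 0.066.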
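The equivalence between the two formulas can be verified numerically. This sketch fits a logit and an intercept-only model on simulated data, computes the pseudo R-squared both ways, and then plugs in the log-likelihood and null deviance reported in this assignment.

```r
set.seed(11)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 0.7 * x))   # simulated binary outcome

m1 <- glm(y ~ x, family = binomial)   # model of interest
m0 <- glm(y ~ 1, family = binomial)   # intercept-only model

# McFadden: 1 - lnL(model) / lnL(intercept only)
r2_loglik   <- 1 - as.numeric(logLik(m1)) / as.numeric(logLik(m0))
# Deviance form (valid because the saturated lnL is 0 for 0/1 data)
r2_deviance <- 1 - m1$deviance / m1$null.deviance
print(c(loglik_form = r2_loglik, deviance_form = r2_deviance))

# Reported values: lnL = -19834.49, null lnL = -(null deviance 42494) / 2
r2_reported <- 1 - (-19834.49) / (-42494 / 2)
print(round(r2_reported, 4))
```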

Q10. Is the overall model significant?
Ans. R Script
lrtest(logit)
Likelihood ratio test

Model 1: docvisit ~ loginc + age + goodh + male + hsize + unemp + sozh + ft
Model 2: docvisit ~ 1
  #Df LogLik Df  Chisq Pr(>Chisq)
1   9 -19835
2   1 -21247 -8 2825.4  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

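The likelihood ratio test compares the log-likelihoods of two nested models: here
the full model (Model 1) and an intercept-only model (Model 2).
Null Hypothesis: all eight slope coefficients are jointly zero, i.e. the smaller
model fits the data as well as the full model.
The test statistic, twice the difference in log-likelihoods, is 2825.4 on 8 degrees
of freedom, with a p-value below 2.2e-16. The null hypothesis is therefore rejected
and we conclude that the model is significant overall.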
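The test statistic and its p-value follow directly from the reported log-likelihoods. The sketch below recovers the intercept-only log-likelihood as minus half the null deviance (42494) reported in part 8; because the printed values are rounded, the result matches the `lrtest` statistic only approximately.

```r
ll_full <- -19834.49       # logLik(logit), 9 parameters
ll_null <- -42494 / 2      # intercept-only lnL = -(null deviance) / 2, 1 parameter

lr    <- 2 * (ll_full - ll_null)   # likelihood ratio statistic
df_lr <- 9 - 1                     # number of slope restrictions
# The p-value underflows to 0 in double precision: it is far below 2.2e-16
p_lr  <- pchisq(lr, df = df_lr, lower.tail = FALSE)

cat("LR =", round(lr, 1), " df =", df_lr,
    " 1% critical value =", round(qchisq(0.99, df_lr), 2),
    " p-value =", p_lr, "\n")
```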

Q11. Get the predicted likelihood for the two individuals from part 7.
Ans. R script
predict(logit,indvalues,type="response")
        1         2
0.2932925 0.6317290

The predicted likelihoods are 29.33% and 63.17%, close to the 32.41% and 61.65%
obtained from the linear probability model in part 7.
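These predictions can be reproduced from the logit coefficients in part 8 by applying the logistic function to each individual's linear index. The names `b`, `x1` and `x2` below are illustrative.

```r
# Logit coefficients from the glm output in part 8
b <- c(`(Intercept)` = -0.166536, loginc = 0.241696, age = 0.005205,
       goodh = -0.973606, male = -0.610408, hsize = -0.052709,
       unemp = -0.188675, sozh = 0.044193, ft = -0.235469)

# The two individuals from part 7
x1 <- c(`(Intercept)` = 1, loginc = 5.02, age = 20, goodh = 1, male = 1,
        hsize = 4, unemp = 0, sozh = 0, ft = 1)
x2 <- replace(x1, c("loginc", "age"), c(10.03, 60))

# P = exp(x'b) / (1 + exp(x'b)), computed with plogis()
ystar   <- c(sum(b[names(x1)] * x1), sum(b[names(x2)] * x2))
p_logit <- plogis(ystar)
print(round(p_logit, 4))
```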

Q12. Test whether the employment variables should be in the model or
not.
Ans. R Script
lrtest(logit,logit1)
Likelihood ratio test

Model 1: docvisit ~ loginc + age + goodh + male + hsize + unemp + sozh + ft
Model 2: docvisit ~ loginc + age + goodh + male + hsize
  #Df LogLik Df  Chisq Pr(>Chisq)
1   9 -19835
2   6 -19866 -3 63.196  1.219e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value of the test (1.219e-13) is far below 0.01, so at the 1% significance
level we reject the null hypothesis that the coefficients on unemp, sozh and ft are
jointly zero, i.e. that the restricted Model 2 fits as well as Model 1. We conclude
that the employment-related variables belong in our model.
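The p-value reported by `lrtest` can be recovered from the chi-squared distribution with 3 degrees of freedom, one for each excluded employment variable. A short sketch:

```r
lr_emp <- 63.196           # Chisq reported by lrtest(logit, logit1)
df_emp <- 9 - 6            # unemp, sozh and ft are excluded in Model 2
p_emp  <- pchisq(lr_emp, df = df_emp, lower.tail = FALSE)
crit   <- qchisq(0.99, df = df_emp)   # 1% critical value

print(c(p_value = p_emp, critical_1pct = crit))
```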

Q13. Calculate the average partial effects. Interpret your results.
Ans. R Script

The average partial effect (APE) of a variable is the change in the predicted
probability of visiting a doctor from a one-unit change in that variable (from 0 to
1 for a dummy), computed for each person and then averaged over the sample.
From the results, we can infer that the goodh variable plays an important role in
changing the probability of a person visiting a doctor: on average, a person in bad
health is 20.39 percentage points more likely to visit a doctor than a person in
good health. Gender also plays an important role: a female is 12.9 percentage
points more likely to visit a doctor than a male. Unemployment and income matter
as well: being unemployed lowers the probability by about 4 percentage points,
while a one-unit increase in log income raises it by about 5 percentage points.
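For a logit, the APE of a continuous regressor x_j is the sample average of dlogis(x'b) * b_j, where dlogis is the logistic density, while for a dummy it is the average difference between the predicted probabilities with the dummy set to 1 and to 0. The base-R sketch below implements both on simulated data; it is a hand-rolled illustration of the technique, not the routine that produced the figures above, and `ape_dummy` is an illustrative name.

```r
set.seed(5)
n <- 6000

# Simulated data; coefficients are hypothetical and only loosely echo part 8
sim <- data.frame(
  loginc = rnorm(n, mean = 8, sd = 0.8),
  goodh  = rbinom(n, 1, 0.6),
  male   = rbinom(n, 1, 0.5)
)
eta <- with(sim, -1.2 + 0.24 * loginc - 0.97 * goodh - 0.61 * male)
sim$docvisit <- rbinom(n, 1, plogis(eta))

fit <- glm(docvisit ~ loginc + goodh + male,
           family = binomial(link = "logit"), data = sim)
xb  <- predict(fit, type = "link")

# Continuous regressor: average of dP/dx = dlogis(x'b) * beta
ape_loginc <- mean(dlogis(xb)) * coef(fit)[["loginc"]]

# Dummy regressor: average of P(d = 1) - P(d = 0), other variables at observed values
ape_dummy <- function(var) {
  d1 <- sim; d1[[var]] <- 1
  d0 <- sim; d0[[var]] <- 0
  mean(predict(fit, newdata = d1, type = "response") -
       predict(fit, newdata = d0, type = "response"))
}

ape <- c(loginc = ape_loginc, goodh = ape_dummy("goodh"), male = ape_dummy("male"))
print(round(ape, 4))
```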
