
Homework: Poisson regression

We are interested in the number of accidents per service month for a sample of ships. The data can be found in the file ships.wmf. The endogenous variable is called ACC. The explanatory variables are:

TYPE: there are 5 ship types, labelled A-B-C-D-E or 1-2-3-4-5. TYPE is a categorical variable, and 5 dummy variables can be created: TA, TB, TC, TD, TE.

CONSTRUCTION YEAR: the ships were constructed in one of four periods, leading to the dummy variables T6064, T6569, T7074, T7579.

SERVICE: a measure of the amount of service that the ship has already carried out.

1. Make a histogram of the variable ACC. Comment on its form. Is this the histogram for the conditional or unconditional distribution of ACC?

Solution

The histogram can be found in Figure 1 below:

Figure 1: A histogram of the variable ACC

At first sight, the shape of the histogram is more or less that of a Poisson distributed variable with a small parameter (around 2 or 3). This is the histogram of the unconditional distribution of ACC, since it pools all ships without conditioning on the explanatory variables.
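For reference, the shape of a Poisson distribution with parameter 2 or 3 can be sketched directly from its probability mass function. This is only an illustration of the theoretical shape; it does not use the ships data, and the helper name `poisson_pmf` is introduced here for the example.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Crude text histogram of the pmf for the two parameter values mentioned above.
for lam in (2, 3):
    print(f"lambda = {lam}")
    for k in range(9):
        p = poisson_pmf(k, lam)
        print(f"  {k}: {'#' * round(100 * p)} {p:.3f}")
```

Both distributions are right-skewed with most of the mass on small counts, which is the shape the comment on Figure 1 refers to.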

2. Estimate the Poisson regression model including all explanatory variables and a constant term. (Use estimation method: COUNT integer counting data)

Solution

Estimation of the full model, according to a Poisson Model, yields the following values for the coefficients (TA was chosen to be the reference category for TYPE, and T6064 was chosen to be the reference category for CONSTRUCTION YEAR):

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C             0.867530     0.241285      3.595456   0.0003
TB            0.989658     0.212252      4.662659   0.0000
TC           -1.219122     0.327417     -3.723450   0.0002
TD           -0.858781     0.287597     -2.986056   0.0028
TE           -0.242659     0.236351     -1.026689   0.3046
T6569         0.950927     0.176265      5.394863   0.0000
T7074         1.266906     0.227427      5.570601   0.0000
T7579         0.719230     0.277312      2.593581   0.0095
SERVICE       4.48E-05     7.42E-06      6.042115   0.0000

Note that if TA and/or T6064 were included in the model as well, the dummy variables of that group would sum to the constant term (perfect collinearity), so an error message would be generated and no estimation would be performed.
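As a quick consistency check on the Poisson output above, each t-statistic should equal the coefficient divided by its standard error. The sketch below recomputes them from the reported (rounded) values and adds two-sided p-values, assuming these are based on the standard normal distribution; the function name `two_sided_p` is introduced here for illustration.

```python
import math

# (coefficient, standard error) pairs copied from the Poisson output above
estimates = {
    "C":       (0.867530, 0.241285),
    "TB":      (0.989658, 0.212252),
    "TC":      (-1.219122, 0.327417),
    "TD":      (-0.858781, 0.287597),
    "TE":      (-0.242659, 0.236351),
    "T6569":   (0.950927, 0.176265),
    "T7074":   (1.266906, 0.227427),
    "T7579":   (0.719230, 0.277312),
    "SERVICE": (4.48e-05, 7.42e-06),
}

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution (assumption)."""
    return math.erfc(abs(z) / math.sqrt(2))

for name, (coef, se) in estimates.items():
    z = coef / se
    print(f"{name:8s} z = {z:9.4f}   p = {two_sided_p(z):.4f}")
```

Small differences from the printed t-statistics (most visibly for SERVICE) come from the rounding of the coefficients and standard errors in the output.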

3. Comment on the coefficient for the variable SERVICE. Is it significant?

Solution

The estimated value of the coefficient for the variable SERVICE is very small, as is its standard error. This is most likely due to the scale of the variable: it takes large values and has a large variance. Calling up the descriptive statistics for the variable supports this ($\sigma = 9644.166$). However, despite the small absolute value of the coefficient, it is still highly significant ($p < 5 \times 10^{-5}$).
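To see why a small coefficient need not mean a small effect, the coefficient can be rescaled by the spread of the variable. The sketch below assumes that 9644.166 is the sample standard deviation of SERVICE; the variable names are introduced for this example only.

```python
import math

b_service = 4.48e-05    # coefficient of SERVICE from the Poisson output
sd_service = 9644.166   # assumed: sample standard deviation of SERVICE

# Change in the linear predictor for a one-standard-deviation increase in SERVICE
effect_per_sd = b_service * sd_service

# In a Poisson model with log link this corresponds to a multiplicative
# change in the expected number of accidents.
rate_ratio = math.exp(effect_per_sd)

print(f"effect per standard deviation: {effect_per_sd:.3f}")
print(f"multiplicative change in expected count: {rate_ratio:.2f}")
```

Under this assumption, a one-standard-deviation increase in SERVICE raises the expected count by roughly half, which is far from negligible.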

4. Perform a Wald test for the joint significance of the construction year dummy variables.

Solution

Performing the Wald test on the hypothesis

\[ H_0 : \beta_{T6569} = \beta_{T7074} = \beta_{T7579} = 0 \]

yields a value of 40.56575 for the $\chi^2$-statistic, and hence a p-value smaller than $5 \times 10^{-7}$, indicating that the construction year dummy variables are jointly extremely strongly significant.
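The p-value can be checked by hand. With three restrictions, the Wald statistic is compared to a chi-square distribution with 3 degrees of freedom, whose upper-tail probability has a closed form. The sketch below uses only the standard library; the function name `chi2_sf_3df` is introduced here for illustration.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

wald = 40.56575          # reported chi-square statistic
p_value = chi2_sf_3df(wald)
print(f"p-value = {p_value:.3e}")
```

The result is several orders of magnitude below the usual significance levels, in line with the bound quoted in the solution.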

5. Given a ship of category A, constructed in the period 65-69, with SERVICE = 1000, predict the number of accidents per service month. Also estimate (a) the probability that no accident will occur for this ship, and (b) the probability that at most one accident will occur.

Solution

Denote by $x = (0, 0, 0, 0, 1, 0, 0, 1000)^t$ the encoding of the ship (the first four entries are the dummies for TYPE, where type A is the reference category as before; the next three are for CONSTRUCTION YEAR, with 60-64 the reference category; the last entry is the value of SERVICE), and let $\hat\lambda(x)$ be the predicted number of accidents per service month. Then

\[ \hat\lambda(x) = \hat{E}[ACC \mid x] = e^{x^t \hat\beta + \hat\beta_0} \]

where $\hat\beta_0$ is the estimate of the intercept. Filling in the estimated values for $\hat\beta$ and $\hat\beta_0$, it is found that $\hat\lambda(x) = 6.444693$. From this, the following two probabilities can be estimated:

(a)
\[ P(ACC = 0 \mid x) = e^{-\hat\lambda(x)} = 0.001588932 \]

(b)
\[ P(ACC \le 1 \mid x) = P(ACC = 0 \mid x) + P(ACC = 1 \mid x) = e^{-\hat\lambda(x)} + \hat\lambda(x)\, e^{-\hat\lambda(x)} = 0.01182911 \]
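The prediction can be reproduced from the reported coefficients. Only the intercept, the T6569 dummy and SERVICE contribute, since all other entries of x are zero. The sketch below uses the rounded values from the Poisson output, so the results agree with the solution only up to rounding.

```python
import math

b0 = 0.867530          # intercept
b_t6569 = 0.950927     # coefficient of T6569
b_service = 4.48e-05   # coefficient of SERVICE (rounded in the output)

# Predicted number of accidents for type A, built 65-69, SERVICE = 1000
lam = math.exp(b0 + b_t6569 + b_service * 1000)

p_zero = math.exp(-lam)             # (a) P(ACC = 0 | x)
p_at_most_one = p_zero * (1 + lam)  # (b) P(ACC = 0 | x) + P(ACC = 1 | x)

print(f"lambda = {lam:.4f}")
print(f"P(ACC = 0)  = {p_zero:.6f}")
print(f"P(ACC <= 1) = {p_at_most_one:.6f}")
```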

6. The computer output mentions: Convergence achieved after 9 iterations. What does this mean?

Solution

The ML estimators are computed with an iterative algorithm. The message indicates that at the 9th iteration the change in the solution, compared with the previous iteration, fell below the convergence criterion, so the algorithm stopped and reported that solution.
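The idea of a convergence criterion can be illustrated with a small Newton-Raphson iteration for a Poisson model with only an intercept. This is a sketch under stated assumptions: the counts are hypothetical toy data, the tolerance is arbitrary, and the algorithm and stopping rule actually used by the software are not specified in the output.

```python
import math

def fit_intercept_only_poisson(counts, tol=1e-8, max_iter=100):
    """Newton-Raphson for the intercept of a Poisson model without regressors.

    Stops as soon as two successive iterates differ by less than tol.
    Returns the estimate and the number of iterations used.
    """
    n = len(counts)
    mean = sum(counts) / n
    beta = 0.0  # starting value
    for iteration in range(1, max_iter + 1):
        score = n * (mean - math.exp(beta))   # first derivative of the log-likelihood
        hessian = -n * math.exp(beta)         # second derivative of the log-likelihood
        new_beta = beta - score / hessian     # Newton step
        if abs(new_beta - beta) < tol:        # convergence criterion
            return new_beta, iteration
        beta = new_beta
    raise RuntimeError("no convergence within max_iter iterations")

toy_counts = [0, 2, 3, 1, 4, 8]  # hypothetical data, not the ships sample
beta_hat, n_iter = fit_intercept_only_poisson(toy_counts)
print(f"estimate {beta_hat:.6f} after {n_iter} iterations")
```

A looser tolerance would stop the loop earlier and a stricter one later; the reported iteration count therefore depends on the criterion as well as on the data.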

7. What do we learn from the value Probability (LR stat)? What is the corresponding null hypothesis?

Solution

The value of Probability (LR stat) is very low here (smaller than $5 \times 10^{-7}$), meaning that the variables used to construct the model are jointly extremely significant. The corresponding null hypothesis is

\[ H_0 : \beta = 0 \]

with $\beta$ the vector of parameters (excluding the intercept). Given the very small p-value, this hypothesis can be very strongly rejected.
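For completeness, the likelihood ratio statistic behind this p-value has the standard form below. The notation is introduced here for illustration: $\hat\beta_0, \hat\beta$ denote the estimates of the full model and $\tilde\beta_0$ the intercept estimate of the model with only a constant. The model contains 8 slope coefficients (four TYPE dummies, three CONSTRUCTION YEAR dummies and SERVICE), so under $H_0$ the statistic is asymptotically chi-square with 8 degrees of freedom.

```latex
\[
  LR = 2\left\{ \log L(\hat\beta_0, \hat\beta) - \log L(\tilde\beta_0, 0) \right\}
  \;\overset{a}{\sim}\; \chi^2_8 \quad \text{under } H_0 : \beta = 0 .
\]
```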

8. Now estimate a Negative Binomial Model. EViews reports $\log(\eta^2)$ as the mixture parameter in the estimation output. (a) Compare the estimates of $\beta$ given by the two models. (b) Compare the pseudo $R^2$ values of the two models.

Solution

Estimation of the full model, according to a Negative Binomial Model, yields the following values for the coefficients (TA was chosen to be the reference category for TYPE, and T6064 was chosen to be the reference category for CONSTRUCTION YEAR):

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C               0.380217     0.469879      0.809181   0.4184
TB              0.997461     0.539695      1.848192   0.0646
TC             -1.112612     0.477577     -2.329703   0.0198
TD             -0.882003     0.444638     -1.983644   0.0473
TE             -0.147957     0.418851     -0.353246   0.7239
T6569           0.982162     0.419928      2.338883   0.0193
T7074           1.857623     0.443221      4.191192   0.0000
T7579           1.097159     0.498048      2.202918   0.0276
SERVICE         6.41E-05     2.42E-05      2.645928   0.0081
log(eta^2)     -1.104076     0.454214     -2.430740   0.0151

(a) As can be seen, some of the estimated coefficients have changed only slightly between models, while others (the intercept, $\beta_{TE}$, $\beta_{T7074}$, $\beta_{T7579}$, $\beta_{SERVICE}$) have changed (much) more.

(b) For the two estimated models, Poisson Model (PM) and Negative Binomial Model (NBM), the following values for pseudo $R^2$ have been found:

Model   Pseudo R2
PM      0.713334
NBM     0.757713

This indicates that the Negative Binomial Model is slightly preferable to the Poisson Model in terms of explanatory power.
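The comparison in (a) can be made concrete by tabulating the change in each coefficient between the two models. The sketch below uses the reported estimates; the dictionary names are introduced for this example.

```python
# Coefficient estimates copied from the two estimation outputs
poisson = {
    "C": 0.867530, "TB": 0.989658, "TC": -1.219122, "TD": -0.858781,
    "TE": -0.242659, "T6569": 0.950927, "T7074": 1.266906,
    "T7579": 0.719230, "SERVICE": 4.48e-05,
}
negbin = {
    "C": 0.380217, "TB": 0.997461, "TC": -1.112612, "TD": -0.882003,
    "TE": -0.147957, "T6569": 0.982162, "T7074": 1.857623,
    "T7579": 1.097159, "SERVICE": 6.41e-05,
}

for name in poisson:
    diff = negbin[name] - poisson[name]
    rel = diff / abs(poisson[name])  # change relative to the Poisson estimate
    print(f"{name:8s} Poisson {poisson[name]:12.6g}   NegBin {negbin[name]:12.6g}   change {rel:+.1%}")
```

Relative changes put SERVICE on the same footing as the dummy coefficients, which the raw differences do not, given its much smaller scale.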

9. Now estimate the Poisson model with only a constant term, so without explanatory variables (empty model). Derive mathematically a formula for this estimate of the constant term (in the empty model), using the first-order condition of the ML estimator.

Solution

Estimation of the empty model, according to a Poisson Model, yields the following value for the estimated intercept:

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C             2.348570     0.053000      44.31276   0.0000

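Analytically, this coefficient can be found in the following way. Given the data $\{acc_i\}$ (since no variables will be included), the total log-likelihood can be written as (using independence of the observations)

\[ \log L(acc_1, \ldots, acc_n \mid \beta_0) = \sum_{i=1}^{n} \log P(ACC = acc_i \mid \beta_0) \]

where $\beta_0$ is the intercept. Assuming a Poisson Model, and taking $\lambda_i = \lambda = e^{\beta_0}$, it is found that

\[ P(ACC = acc_i \mid \beta_0) = \frac{e^{-\lambda} \lambda^{acc_i}}{(acc_i)!} = \frac{e^{-e^{\beta_0}} e^{\beta_0\, acc_i}}{(acc_i)!} \]

and hence

\[ \log L(\beta_0) = \sum_{i=1}^{n} \left\{ -e^{\beta_0} + \beta_0\, acc_i - \log (acc_i)! \right\} \]

The maximum likelihood estimator of $\beta_0$ is then defined as

\[ \hat\beta_{0,ML} = \operatorname{argmax}_{\beta_0} \log L(\beta_0) \]

To get an analytic expression for this estimator, write down the first-order condition of the estimator:

\[ \left. \frac{d \log L(\beta_0)}{d \beta_0} \right|_{\beta_0 = \hat\beta_0} = \sum_{i=1}^{n} \left\{ -e^{\hat\beta_0} + acc_i \right\} = -n e^{\hat\beta_0} + \sum_{i=1}^{n} acc_i = n\left( \overline{acc} - e^{\hat\beta_0} \right) = 0 \]

or

\[ e^{\hat\beta_0} = \overline{acc}, \qquad \hat\beta_0 = \log \overline{acc} \]

where $\overline{acc}$ is the sample mean of ACC.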
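The formula can be checked against the reported output of the empty model. Exponentiating the estimated intercept gives the implied sample mean of ACC. In addition, if the reported standard error is based on the inverse of the second derivative of the log-likelihood (an assumption about the software, not stated in the output), its variance is one over the total number of accidents, so the implied total and sample size can be backed out as well.

```python
import math

b0_hat = 2.348570   # estimated intercept of the empty Poisson model
se_b0 = 0.053000    # its reported standard error

# First-order condition: exp(b0_hat) equals the sample mean of ACC
mean_acc = math.exp(b0_hat)

# Assumption: Var(b0_hat) = 1 / (n * exp(b0_hat)) = 1 / sum(acc_i)
total_acc = 1 / se_b0**2
n_obs = total_acc / mean_acc

print(f"implied sample mean of ACC: {mean_acc:.3f}")
print(f"implied total number of accidents: {total_acc:.1f}")
print(f"implied number of observations: {n_obs:.1f}")
```

These are implied values derived from two rounded numbers, not figures read from the data file, so they should be compared with the descriptive statistics of ACC before being relied on.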