
PANEL DATA METHODS FOR FRACTIONAL
RESPONSE VARIABLES WITH AN
APPLICATION TO TEST PASS RATES
Leslie E. Papke
Department of Economics
Michigan State University
East Lansing, MI 48824-1038
papke@msu.edu
Jeffrey M. Wooldridge
Department of Economics
Michigan State University
East Lansing, MI 48824-1038
wooldri1@msu.edu
This version: November 2005
Abstract
We develop methods for estimating panel data models for fractional response variables
with unobserved effects, and we apply the methods to the problem of estimating the effects of
school spending on test pass rates for school districts in Michigan. We consider two cases.
The first is when spending is assumed to be strictly exogenous conditional on an unobserved
effect. We also show how to account for correlation between a time-varying explanatory
variable and time-varying unobservables when instrumental variables are available. The
response function ensures that fitted values are between zero and one and that the marginal
effects are diminishing as the index argument increases. An important part of the paper is to
show how to estimate average partial effects, which can be compared across many different
models and estimation techniques.
1. Introduction
Empirical studies attempting to explain fractional response variables have become more
sophisticated in recent years. Just a few examples of fractional responses include pension plan
participation rates, industry market shares, television ratings, fraction of land area allocated to
agriculture, and test pass rates. Researchers have begun to take seriously the functional form
issues that arise with a fractional response: a linear functional form for the conditional mean
might miss potentially important nonlinearities. Further, the traditional solution of using the
log-odds transformation fails when we observe responses at the corners, zero and one. Even in
cases where the variable is strictly inside the unit interval, we cannot easily recover the
expected value of the fractional response from a linear model for the log-odds ratio.
Based on these observations, in Papke and Wooldridge (1996) we proposed models for the
conditional mean of the fractional response that keep the predicted values in the unit interval.
We applied the method of quasi-maximum likelihood estimation (QMLE) to obtain robust
estimators with satisfactory efficiency properties. The most common of those methods, where
the mean function takes the logistic form, has since been applied in numerous empirical
studies, including Hausman and Leonard (1997), Liu, Liu, and Hammitt (1999), and Wagner
(2001). [In a private communication shortly after the publication of Papke and Wooldridge (1996), in which he kindly provided Stata code, John Mullahy dubbed the method of quasi-MLE with a logistic mean function fractional logit, or flogit.]
Hausman and Leonard (1997) applied fractional logit to panel data on television ratings of
National Basketball Association games to estimate the effects of superstars on telecast ratings.
The extension from a pure cross section to pooling cross sections across time is relatively
straightforward. With panel data, the only extra complication in using pooled QMLE is in
ensuring that the standard errors are robust to arbitrary serial correlation (in addition to
misspecification of the conditional variance). But a more substantive issue arises with panel
data: How can we account for unobserved heterogeneity that is possibly correlated with the
explanatory variables?
Wagner (2003) analyzes a large panel data set of firms to explain the export-sales ratio as a
function of firm size. Wagner explicitly includes firm-specific intercepts in the fractional logit
model in the tradition of estimating fixed effects along with the parameters common across
units. While this allows unobserved heterogeneity to enter in a flexible way, it suffers from an
incidental parameters problem. Namely, with fixed T, the estimators of the fixed effects are
inconsistent, and this inconsistency transmits itself to the coefficients on the common slope
coefficients. Thus, the statistical properties of so-called fractional logit fixed effects are
largely unknown with small T. [Hausman and Leonard (1997) include team fixed effects in
their analysis, but these parameters can be estimated with precision because they have many
telecasts per team. Therefore, there is no incidental parameters problem in the Hausman and
Leonard setup.]
In this paper we extend our earlier work and show how to specify, and estimate, fractional
response models for panel data with a large cross-sectional dimension and relatively few time
periods. We explicitly allow for time-constant unobserved effects that can be correlated with
explanatory variables. We cover two cases. The first is when, conditional on an unobserved
effect, the explanatory variables are strictly exogenous. We then relax the strict exogeneity
assumption when we have available instrumental variables.
Rather than treating the unobserved effects as parameters to estimate, we employ the
Mundlak (1978) and Chamberlain (1980) device of modeling the distribution of the
unobserved effect conditional on the strictly exogenous variables. To accommodate this
approach, we exploit features of the normal distribution. Therefore, unlike in our early work,
where we focused mainly on the logistic response function, we use a probit response function.
In binary response contexts, the choice between the logistic and probit conditional mean
functions for the structural expectation is largely a matter of taste, although it has long been
recognized that, for handling certain kinds of endogenous explanatory variables, the probit
mean function has some advantages. We further exploit those advantages for panel data
models in this paper. As we will see, the probit response function results in very simple
estimation methods. While our focus is on fractional responses, our methods apply to the
binary response case with a continuous endogenous explanatory variable and unobserved
heterogeneity.
An important feature of our work is that we provide simple estimation of the partial effects averaged across the population, sometimes called the average partial effects (APEs) or population averaged effects. These turn out to be identified under no assumptions on the serial dependence in the response variable.
The rest of the paper is organized as follows. Section 2 introduces the model and
assumptions for the case of strictly exogenous explanatory variables, and shows how to
identify the APEs. Section 3 discusses estimation methods, including pooled QMLE and an
extension of the generalized estimating equation (GEE) approach. Section 4 relaxes the strict
exogeneity assumption, and shows how control function methods can be combined with the
Mundlak-Chamberlain device to produce consistent estimators. Section 5 contains our
application to studying the effects of spending on math test pass rates for Michigan, and
Section 6 contains a brief conclusion.
2. Models and Quantities of Interest for
Strictly Exogenous Explanatory Variables
We assume that a random sample in the cross section is available, and that we have available T observations, t = 1, ..., T, for each random draw i. For cross-sectional observation i and time period t, the response variable is y_it, 0 ≤ y_it ≤ 1, where outcomes at the endpoints, zero and one, are allowed. For a set of explanatory variables x_it, a 1 × K vector, we assume

E(y_it | x_it, c_i) = Φ(x_it β + c_i), t = 1, ..., T,   (2.1)

where Φ(·) is the standard normal cumulative distribution function (cdf). Assumption (2.1) is a convenient functional form assumption. Specifically, the conditional expectation is assumed to be of the index form, where the unobserved effect, c_i, appears additively inside the standard normal cdf, Φ(·).
The use of Φ(·) in (2.1) deserves comment. In Papke and Wooldridge (1996), we allowed a general function G(·) in place of Φ(·), but then, for our application to pension plan participation rates, we focused on the logistic function, Λ(z) ≡ exp(z)/[1 + exp(z)]. As we proceed, it will be clear that using Φ(·) in place of Λ(·) in (2.1) causes no conceptual or theoretical difficulties; rather, the probit function leads to computationally simple estimators in the presence of unobserved heterogeneity or endogenous explanatory variables. If we assume that a particular conditional expectation has a logistic form, then a fractional logit approach is just as straightforward; we discuss this briefly below.
Because Φ(·) is strictly monotonic, the elements of β give the directions of the partial effects. For example, dropping the observation index i, if x_tj is continuous, then

∂E(y_t | x_t, c)/∂x_tj = β_j φ(x_t β + c),   (2.2)

where φ(·) is the standard normal density. For discrete changes in one or more of the explanatory variables, we compute

Φ(x_t^(1) β + c) − Φ(x_t^(0) β + c),   (2.3)

where x_t^(0) and x_t^(1) are two different values of the covariates.

Equations (2.2) and (2.3) reveal that the partial effects depend on the level of covariates and the unobserved heterogeneity. Because x_t is observed, we have a pretty good idea about interesting values to plug in. Or, we can always average the partial effects across the sample {x_it : i = 1, ..., N} on x_t. But c is not observed. A popular measure of the importance of the observed covariates is to average the partial effects across the distribution of c, to obtain the average partial effects (APEs). For example, in the continuous case, the APE with respect to x_tj, evaluated at x_t, is

E_c[β_j φ(x_t β + c)] = β_j E_c[φ(x_t β + c)],   (2.4)

which depends on x_t (and, of course, β) but not on c. Similarly, we get APEs for discrete changes by averaging (2.3) across the distribution of c.
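As a quick numerical illustration of (2.2) and (2.3), the following sketch evaluates both kinds of partial effect at made-up parameter and covariate values (the β, x_t, and c below are purely illustrative, and c is treated as known only for the demonstration):

```python
import math

def norm_cdf(z):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# illustrative values: coefficients, covariates, and unobserved effect
beta = [0.8, -0.3]
x_t = [1.0, 2.0]
c = 0.5

index = sum(b * x for b, x in zip(beta, x_t)) + c

# (2.2): partial effect of a continuous x_tj is beta_j * phi(x_t*beta + c)
pe_continuous = beta[0] * norm_pdf(index)

# (2.3): discrete change in the first covariate from x_t^(0) to x_t^(1)
x_t0 = [1.0, 2.0]
x_t1 = [2.0, 2.0]
idx0 = sum(b * x for b, x in zip(beta, x_t0)) + c
idx1 = sum(b * x for b, x in zip(beta, x_t1)) + c
pe_discrete = norm_cdf(idx1) - norm_cdf(idx0)

print(pe_continuous, pe_discrete)
```

Note that both effects inherit their sign from β_1 and shrink as the index moves away from zero, which is the diminishing-marginal-effects property mentioned in the abstract.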
Without further assumptions, neither β nor the APEs are known to be identified. In this section, we add two assumptions to (2.1). The first concerns the exogeneity of {x_it : t = 1, ..., T}. We assume that, conditional on c_i, {x_it : t = 1, ..., T} is strictly exogenous:

E(y_it | x_i, c_i) = E(y_it | x_it, c_i), t = 1, ..., T,   (2.5)

where x_i ≡ (x_i1, ..., x_iT) is the set of covariates in all time periods. Assumption (2.5) is common in unobserved effects panel data models, but it rules out lagged dependent variables in x_it, as well as other explanatory variables that may react to past changes in y_it. Plus, it rules out traditional simultaneity and correlation between time-varying omitted variables and the covariates. [In Section 4, we will relax (2.5), provided we have valid instrumental variables.]

We also need to restrict the distribution of c_i given x_i in some way. While semiparametric methods are possible, in this paper we propose a conditional normality assumption:

c_i | (x_i1, x_i2, ..., x_iT) ~ Normal(ψ + x̄_i ξ, σ_a²),   (2.6)

where x̄_i ≡ T⁻¹ Σ_{t=1}^T x_it is the 1 × K vector of time averages. As we will see, (2.6) leads to straightforward estimation of the parameters β_j up to a common scale factor, as well as consistent estimators of the APEs. Adding nonlinear functions of x̄_i to the conditional mean, such as squares and cross products, is straightforward.
It is convenient to assume that only the time average appears in D(c_i | x_i) as a way of conserving on degrees of freedom. But an unrestricted Chamberlain (1980) device is also possible, where we allow each x_it to have a separate vector of coefficients. For some purposes, it is useful to write c_i = ψ + x̄_i ξ + a_i, where a_i | x_i ~ Normal(0, σ_a²). [Note that σ_a² = Var(c_i | x_i).] Naturally, if we include time-period dummies in x_it, as is usually desirable, we do not include the time averages of these in x̄_i.
Assumptions (2.1), (2.5), and (2.6) impose no additional distributional assumptions on D(y_it | x_i, c_i), and they place no restrictions on the serial dependence in {y_it} across time. Nevertheless, the elements of β are easily shown to be identified up to a positive scale factor, and the APEs are identified, too. A simple way to see this is to write

E(y_it | x_i, a_i) = Φ(ψ + x_it β + x̄_i ξ + a_i)   (2.7)

and so

E(y_it | x_i) = E[Φ(ψ + x_it β + x̄_i ξ + a_i) | x_i] = Φ[(ψ + x_it β + x̄_i ξ)/(1 + σ_a²)^{1/2}]   (2.8)

or

E(y_it | x_i) = Φ(ψ_a + x_it β_a + x̄_i ξ_a),   (2.9)

where the a subscript denotes division of the original coefficient by (1 + σ_a²)^{1/2}. The second equality in (2.8) follows from a well-known mixing property of the normal distribution. [See, for example, Wooldridge (2002, Section 15.8.2) in the case of binary response; the argument is essentially the same.] Because we observe a random sample on (y_it, x_it, x̄_i), (2.9) implies that the scaled coefficients, ψ_a, β_a, and ξ_a, are identified, provided there are no perfect linear relationships among the elements of x_it and that there is time variation in all elements of x_it. (The latter requirement ensures that x_it and x̄_i are not perfectly collinear for all t.) In addition, it follows from the same arguments in Wooldridge (2002, Section 15.8.2) that the average partial effects can be obtained by differentiating or differencing

E_{x̄_i}[Φ(ψ_a + x_t β_a + x̄_i ξ_a)]   (2.10)

with respect to the elements of x_t. But (2.10) is consistently estimated by

N⁻¹ Σ_{i=1}^N Φ(ψ̂_a + x_t β̂_a + x̄_i ξ̂_a).   (2.11)

Therefore, given consistent estimators of the scaled parameters, we can plug them into (2.11) and consistently estimate the APEs. In the next section we discuss different estimation strategies.
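The mechanics of (2.9) and (2.11) can be sketched in a few lines. In the toy code below (all parameter values, the value of σ_a², and the draws of x̄_i are invented for illustration, with a single covariate so K = 1), the scaled coefficients are formed by dividing by (1 + σ_a²)^{1/2}, and the APE of a change in x_t is the average of Φ evaluations over the sample of time averages:

```python
import math, random

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(0)
N = 500
sigma_a2 = 0.5                       # illustrative Var(a_i)
scale = math.sqrt(1.0 + sigma_a2)

# illustrative unscaled parameters and their scaled counterparts from (2.9)
psi, beta, xi = 0.1, 0.7, 0.4
psi_a, beta_a, xi_a = psi / scale, beta / scale, xi / scale

# hypothetical draws of the time-average covariate xbar_i (scalar case)
xbar = [random.gauss(0.0, 1.0) for _ in range(N)]

def ape_estimate(x_t):
    # (2.11): average Phi(psi_a + x_t*beta_a + xbar_i*xi_a) over the sample
    return sum(norm_cdf(psi_a + x_t * beta_a + xb * xi_a) for xb in xbar) / N

# APE of a discrete change in x_t from 0 to 1
print(ape_estimate(1.0) - ape_estimate(0.0))
```

The point of the sketch is that only the scaled parameters and the sample of x̄_i are needed; the unobserved c_i never has to be estimated.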
A logistic functional form can be used in place of a probit functional form if we do not specify an underlying model for E(y_it | x_i, c_i) and instead simply assert that

E(y_it | x_i) = Λ(ψ_a + x_it β_a + x̄_i ξ_a)   (2.12)

for parameters ψ_a, β_a, ξ_a. [The indexing by a here does not represent a particular scaling, but we use it to be comparable with (2.9).] Then, the average partial effects are consistently estimated by (2.11), but with Φ(·) replaced by Λ(·). While (2.12) cannot be easily derived from an underlying model for E(y_it | x_it, c_i) and D(c_i | x_i), as a practical matter (2.12) is likely to provide as good an approximation as the probit model. The idea of using convenient functional forms for expected values that depend on observables (or, at least, variables that can be estimated) stems from the recent semiparametric literature on estimating average partial effects [or average structural functions, as in Blundell and Powell (2004)]. See Petrin and Train (2003) for this kind of argument in a multinomial choice setting and Wooldridge (2005) in the context of a cross-sectional fractional response model with endogenous explanatory variables. Because we do have reasonable assumptions under which (2.9) follows from underlying models for E(y_it | x_it, c_i) and D(c_i | x_i), we focus on the probit response function in this paper.
3. Estimation Methods Under Strict
Exogeneity
Given (2.9), there are many consistent estimators of the scaled parameters. For simplicity, define w_it ≡ (1, x_it, x̄_i), a 1 × (1 + 2K) vector, and let θ ≡ (ψ_a, β_a′, ξ_a′)′. One simple estimator of θ is the pooled nonlinear least squares (PNLS) estimator with regression function Φ(w_it θ). The PNLS estimator, while consistent and √N-asymptotically normal (with fixed T), is almost certainly inefficient, for two reasons. First, it ignores the serial dependence in the y_it, which is likely to be substantial even after conditioning on x̄_i. (In effect, a_i is left in the error term, and represents one form of serial correlation.)
Second, Var(y_it | x_i) is probably not homoskedastic because of the fractional nature of y_it. One possible alternative is to model Var(y_it | x_i) and then to use weighted least squares. In some cases [see Papke and Wooldridge (1996) for the cross-sectional case] the conditional variance can be shown to be

Var(y_it | x_i) = τ² Φ(w_it θ)[1 − Φ(w_it θ)],   (3.1)

where 0 < τ² ≤ 1. Under (2.9) and (3.1), a natural estimator of θ is a pooled weighted nonlinear least squares estimator, where pooled NLS would be used in the first stage to estimate the weights. But there is an even simpler estimator that is asymptotically equivalent, the pooled Bernoulli quasi-MLE (QMLE), which is obtained from the pooled probit log-likelihood; we will call this the pooled fractional probit estimator. This estimator is trivial to obtain in econometrics packages that support standard probit estimation, provided, that is, the program allows for nonbinary response variables. The explanatory variables are specified as (1, x_it, x̄_i). Typically, a generalized linear models command is available, as in Stata. In applying the Bernoulli QMLE, one needs to adjust the standard errors and test statistics to allow for arbitrary serial dependence across t. The standard errors that are robust to violations of (3.1) but assume serial independence are likely to be off substantially; most of the time, they would tend to be too small. Typically, standard errors and test statistics computed to be robust to serial dependence are also robust to arbitrary violations of (3.1), as they should be. [The cluster option in Stata is a good example.]
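In a package without a generalized linear models command, the pooled Bernoulli quasi-log-likelihood can be coded directly. The sketch below uses invented toy data (the variable names and values are ours, not from any package) to show the objective that the pooled fractional probit estimator maximizes; note that it is well defined for fractional y_it, not just binary outcomes:

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pooled_bernoulli_qll(theta, W, Y):
    # Pooled Bernoulli quasi-log-likelihood: the sum over (i, t) of
    #   y*log(Phi(w*theta)) + (1 - y)*log(1 - Phi(w*theta)).
    total = 0.0
    for w, y in zip(W, Y):
        p = norm_cdf(sum(a * b for a, b in zip(w, theta)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard the logs at the corners
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total

# toy pooled data: each row is w_it = (1, x_it, xbar_i); y_it is fractional
W = [(1.0, 0.2, 0.5), (1.0, -0.4, 0.5), (1.0, 1.1, -0.3), (1.0, 0.0, -0.3)]
Y = [0.62, 0.35, 0.91, 0.48]

print(pooled_bernoulli_qll((0.1, 0.5, 0.2), W, Y))
```

Maximizing this objective over θ gives the pooled fractional probit estimates; inference would then use standard errors clustered by cross-sectional unit, as described above.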
A test of independence between the unobserved effect and x̄_i is easily obtained as a test of H_0: ξ_a = 0. Naturally, it is best to make this test fully robust to serial correlation and a misspecified conditional variance.
In estimating θ, the pooled Bernoulli QMLE ignores the serial dependence in the joint distribution D(y_i1, ..., y_iT | x_i), and this can be a major source of inefficiency. Yet modeling D(y_i1, ..., y_iT | x_i) and applying maximum likelihood methods, while possible, is hardly trivial, especially for fractional responses that can have outcomes at the endpoints. Aside from computational difficulties, full maximum likelihood estimation would produce nonrobust estimators of the parameters of the conditional mean and the APEs. In other words, if our model for D(y_i1, ..., y_iT | x_i) is misspecified but E(y_it | x_i) is correctly specified, the MLE will be inconsistent for the conditional mean parameters and resulting APEs. [Loudermilk (2005) uses a two-limit Tobit model in the case where a lagged dependent variable is included among the regressors. In such cases, a full joint distributional assumption is very difficult to relax. The two-limit Tobit model is ill-suited for our application because, although our response variable is bounded from below by zero, there are no observations at zero.] Our goal is to obtain consistent estimators under assumptions (2.1), (2.5), and (2.6) only. Nevertheless, we can gain some efficiency by exploiting serial dependence in estimation in a robust way.
Multivariate weighted nonlinear least squares (MWNLS) is ideally suited for estimating conditional means for panel data with strictly exogenous regressors in the presence of serial correlation and heteroskedasticity. What we require is a parametric model of Var(y_i | x_i), where y_i is the T × 1 vector of responses. The model in (3.1) is sensible for the conditional variances, but obtaining the covariances Cov(y_it, y_ir | x_i) is difficult, if not impossible, even if Var(y_i | x_i, c_i) has a fairly simple form (such as being diagonal). Therefore, rather than attempting to find Var(y_i | x_i), we use a "working" version of this variance, which we expect to be misspecified for Var(y_i | x_i). This is the approach underlying the generalized estimating equation (GEE) literature when applied to panel data, as described in Liang and Zeger (1986). In the current context, we apply this approach after having modeled D(c_i | x_i) to arrive at the conditional mean in (2.9).
It is important to understand that GEE and MWNLS are asymptotically equivalent whenever they use the same estimates of the matrix Var(y_i | x_i). In other words, GEE is quite familiar to economists once we allow the model of Var(y_i | x_i) to be misspecified. To this end, let V(x_i, γ) be a T × T positive definite matrix, which depends on a vector of parameters, γ, and on the entire history of the explanatory variables. Let m(x_i, θ) denote the conditional mean function for the vector y_i. Because we assume the mean function is correctly specified, let θ_o denote the value such that E(y_i | x_i) = m(x_i, θ_o). In order to apply MWNLS, we need to estimate the variance parameters. However, because this variance matrix is not assumed to be correctly specified, we simply assume that the estimator γ̂ converges, at the standard √N rate, to some value, say γ*. In other words, γ* is defined as the probability limit of γ̂ (which exists quite generally), and then we assume additionally that √N(γ̂ − γ*) is bounded in probability. This holds in regular parametric settings.
Because (3.1) is a sensible variance assumption, we follow the GEE literature and specify a working correlation matrix. The most convenient working correlation structures, and those that are programmed in popular software packages, assume correlations that are not a function of x_i. For our purposes, there are two structures that are attractive. The first is the so-called "exchangeability" correlation pattern, where we act as if the standardized errors have a constant correlation. To be precise, define, for each i, the errors as

u_it ≡ y_it − E(y_it | x_i) = y_it − m_t(x_i, θ_o), t = 1, ..., T,   (3.2)

where, in our application, m_t(x_i, θ) = Φ(w_it θ) = Φ(ψ_a + x_it β_a + x̄_i ξ_a). Generally, especially if y_it is not an unbounded, continuous variable, the conditional correlations, Corr(u_it, u_is | x_i), are a function of x_i. Even if they were not a function of x_i, they would generally depend on (t, s). A simple working assumption is that the correlations do not depend on x_i and, in fact, are the same for all (t, s) pairs. In the context of a linear model, this working assumption is identical to the standard assumption on the correlation matrix in a so-called random effects model; see, for example, Wooldridge (2002, Chapter 10).
If we believe the variance assumption (3.1), it makes sense to define standardized errors as

e_it ≡ u_it / {Φ(w_it θ_o)[1 − Φ(w_it θ_o)]}^{1/2};   (3.3)

under (3.1), Var(e_it | x_i) = τ². Then, the exchangeability assumption is that the pairwise correlation between pairs of standardized errors is constant, say ρ. Remember, this is a working assumption that leads to an estimated variance matrix to be used in MWNLS. Neither consistency of our estimator of θ_o, nor valid inference, will rest on exchangeability being true. To estimate a common correlation parameter, let θ̌ be a preliminary, consistent estimator of θ_o, probably the pooled Bernoulli QMLE. Define the residuals as ǔ_it ≡ y_it − m_t(x_i, θ̌) and the standardized residuals as ě_it ≡ ǔ_it / {Φ(w_it θ̌)[1 − Φ(w_it θ̌)]}^{1/2}. Then, a natural estimator of a common correlation coefficient is

ρ̂ = [NT(T − 1)]⁻¹ Σ_{i=1}^N Σ_{t=1}^T Σ_{s≠t} ě_it ě_is.   (3.4)

Under standard regularity conditions, without any substantive restrictions on Corr(e_it, e_is | x_i), the plim of ρ̂ is

plim ρ̂ = [T(T − 1)]⁻¹ Σ_{t=1}^T Σ_{s≠t} E(e_it e_is).   (3.5)

If Corr(e_it, e_is) happens to be the same for all t ≠ s, then ρ̂ consistently estimates this constant correlation. Generally, it consistently estimates the average of these correlations across all (t, s) pairs, which we simply define as ρ*.
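The estimator in (3.4) is just an average of cross products of standardized residuals within each unit. A direct transcription (with a small made-up array of standardized residuals standing in for the ě_it) is:

```python
def rho_hat(E):
    # E: list of N lists, each of length T, holding standardized residuals e_it.
    # Implements (3.4): rho_hat = [N*T*(T-1)]^{-1} * sum_i sum_t sum_{s != t} e_it*e_is
    N, T = len(E), len(E[0])
    total = 0.0
    for e_i in E:
        for t in range(T):
            for s in range(T):
                if s != t:
                    total += e_i[t] * e_i[s]
    return total / (N * T * (T - 1))

# toy standardized residuals for N = 3 units observed over T = 3 periods
E = [[0.5, 0.4, 0.6], [-0.3, -0.2, -0.4], [0.1, 0.0, 0.2]]
print(rho_hat(E))
```

A useful sanity check on the formula: if every unit's residuals are identical across periods, ρ̂ equals one, as a constant within-unit error should imply.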
Given the estimated T × T working correlation matrix, Ĉ, which has unity down its diagonal and ρ̂ everywhere else, we can construct the estimated working variance matrix:

V(x_i, γ̂) ≡ D(x_i, θ̌)^{1/2} Ĉ D(x_i, θ̌)^{1/2},   (3.6)

where D(x_i, θ̌) is the T × T diagonal matrix with Φ(w_it θ̌)[1 − Φ(w_it θ̌)] down its diagonal. [Note that dropping the variance scale factor, τ², has no effect on estimation or inference.] We can now proceed to the second-step estimation of θ_o by multivariate WNLS. The MWNLS estimator, say θ̂, solves

min_θ Σ_{i=1}^N [y_i − m(x_i, θ)]′ [V(x_i, γ̂)]⁻¹ [y_i − m(x_i, θ)],   (3.7)

where m(x_i, θ) is the T × 1 vector with t-th element Φ(w_it θ) = Φ(ψ_a + x_it β_a + x̄_i ξ_a). Rather than pose the estimation problem as one of minimizing a weighted sum of squared residuals, the GEE approach works directly off of the first-order conditions, but this leads to the same estimator.
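Construction of the working variance matrix in (3.6) is mechanical once the fitted means and ρ̂ are in hand. The sketch below builds V for one unit from hypothetical fitted means (the numbers are illustrative only):

```python
import math

def working_variance(mu, rho):
    # mu: length-T list of fitted means Phi(w_it * theta); rho: common correlation.
    # Builds V = D^{1/2} C D^{1/2} as in (3.6): D has mu_t*(1 - mu_t) on its
    # diagonal, and C is the exchangeable matrix (unity diagonal, rho elsewhere).
    T = len(mu)
    d = [math.sqrt(m * (1.0 - m)) for m in mu]
    return [[d[t] * d[s] * (1.0 if t == s else rho) for s in range(T)]
            for t in range(T)]

# hypothetical fitted means for T = 3 periods and an estimated rho of 0.3
V = working_variance([0.6, 0.7, 0.5], 0.3)
print(V)
```

The matrix is symmetric with Φ(w_it θ̌)[1 − Φ(w_it θ̌)] on the diagonal; inverting it supplies the weights in the MWNLS objective (3.7).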
Asymptotic inference without any assumptions on Var(y_i | x_i) is straightforward. As shown in Liang and Zeger (1986) [see also Wooldridge (2003, Problem 12.11) for the case where the regressors are explicitly allowed to be random], a consistent estimator of Avar[√N(θ̂ − θ_o)] has the sandwich form,

[N⁻¹ Σ_{i=1}^N ∇_θ m̂_i′ V̂_i⁻¹ ∇_θ m̂_i]⁻¹ [N⁻¹ Σ_{i=1}^N ∇_θ m̂_i′ V̂_i⁻¹ û_i û_i′ V̂_i⁻¹ ∇_θ m̂_i] [N⁻¹ Σ_{i=1}^N ∇_θ m̂_i′ V̂_i⁻¹ ∇_θ m̂_i]⁻¹,   (3.8)

where ∇_θ m̂_i is the T × P Jacobian of m(x_i, θ) (P is the dimension of θ) evaluated at θ̂, V̂_i ≡ V(x_i, γ̂), and û_i ≡ y_i − m(x_i, θ̂) is the T × 1 vector of residuals. The matrix used for inference about θ_o is simply (3.8) but without the terms N⁻¹.

Expression (3.8) is fully robust in the sense that only E(y_i | x_i) = m(x_i, θ_o) is assumed. For fractional responses with unobserved effects, there are no plausible assumptions under which V(x_i, γ) is correctly specified for Var(y_i | x_i) (up to the scale factor τ²), so we do not consider a nonrobust variance matrix estimator.
It should be clear that an entire class of MWNLS estimators can be defined by changing V(x_i, γ). The GEE literature maintains the form of the working variance, in this case given by (3.1), while varying the working correlation matrix. A less restrictive working correlation matrix estimates a separate correlation coefficient for each distinct (t, s) pair. So, ρ̂_ts = N⁻¹ Σ_{i=1}^N ě_it ě_is, and then the working correlation matrix Ĉ is just the matrix of correlation coefficients. V(x_i, γ̂) can then be constructed as in (3.6).

While it is tempting to think that using a less restrictive working correlation matrix should enhance efficiency, this cannot be proven when the matrix Corr(y_i | x_i) is a function of x_i. Then, both working variance matrices are misspecified and it is not possible to compare the asymptotic variances of the MWNLS estimators. In fact, it is not even possible to show that estimating an exchangeable or unrestricted correlation matrix is better than setting Ĉ = I_T, as is the case with the pooled Bernoulli quasi-MLE. Still, it seems reasonable that some accounting for correlation across time periods, even if incorrect, is better than not accounting for serial dependence. In some cases the unrestricted working correlation matrix will be preferred. In determining which estimator is likely more asymptotically efficient, it is legitimate to compare the robust standard errors obtained from different working correlation matrices.
Given any consistent estimator θ̂, we estimate the average partial effects by taking derivatives or changes with respect to the elements of x_t of

N⁻¹ Σ_{i=1}^N Φ(ψ̂_a + x_t β̂_a + x̄_i ξ̂_a).   (3.9)

For example, if x_t1 is continuous, the APE is

N⁻¹ Σ_{i=1}^N β̂_a1 φ(ψ̂_a + x_t β̂_a + x̄_i ξ̂_a),   (3.10)

and we can further average this across x_it, if desired, or plug in the average value of x_t. An asymptotic standard error for (3.10) is given in the appendix. For a quick comparison of the linear model estimates and the fractional probit estimates (whether estimated by pooled QMLE or GEE), it is useful to have a single scale factor (at least for the roughly continuous elements of x_t). This scale factor also averages out x_it:

(NT)⁻¹ Σ_{i=1}^N Σ_{t=1}^T φ(ψ̂_a + x_it β̂_a + x̄_i ξ̂_a),   (3.11)

and then we can multiply this factor by the fractional probit coefficients, β̂_aj (again, at least for explanatory variables where it makes sense to compute a derivative). We might obtain a different scale factor for each t, particularly if we have allowed estimated effects to change across time in estimating a linear model.
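The scale factor in (3.11) is simply the sample average of the standard normal density evaluated at the fitted indices. A minimal sketch, with a hypothetical vector of fitted indices standing in for ψ̂_a + x_it β̂_a + x̄_i ξ̂_a:

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def scale_factor(indices):
    # (3.11): average of phi(fitted index) over all (i, t) observations
    return sum(norm_pdf(v) for v in indices) / len(indices)

# hypothetical fitted indices for N*T = 6 observations
idx = [0.3, -0.1, 0.8, 0.5, 0.0, -0.4]
factor = scale_factor(idx)

# multiplying an illustrative fractional probit coefficient by the factor
# puts it roughly on the scale of a linear-model coefficient
beta_a1 = 0.7
print(factor * beta_a1)
```

Because φ(·) peaks at zero, the factor is necessarily below φ(0) ≈ 0.3989, so scaled probit coefficients are always attenuated relative to the raw ones.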
If x_t1 is, say, a binary variable, the average partial effect at time t can be estimated as

N⁻¹ Σ_{i=1}^N [Φ(ψ̂_a + β̂_a1 + x_it(1) β̂_a(1) + x̄_i ξ̂_a) − Φ(ψ̂_a + x_it(1) β̂_a(1) + x̄_i ξ̂_a)],   (3.12)

where β̂_a1 is the coefficient on x_t1, x_it(1) denotes all covariates except x_it1, and β̂_a(1) is the corresponding vector of coefficients. In other words, for each unit we predict the difference in mean responses with (x_t1 = 1) and without (x_t1 = 0) treatment, and then average the difference in these estimated mean responses across all units. Again, we can also average (3.12) across t if we want an effect averaged across time as well as cross-sectional unit.
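A compact rendering of (3.12): collapse everything except the binary covariate into a single per-unit index, then average the with-versus-without difference. The coefficient and indices below are invented for illustration:

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ape_binary(beta_a1, other_index):
    # (3.12): for each unit, the difference in predicted means with the binary
    # covariate on (x_t1 = 1) and off (x_t1 = 0), averaged over units.
    # other_index holds psi_a + x_it(1)*beta_a(1) + xbar_i*xi_a for each unit.
    N = len(other_index)
    return sum(norm_cdf(beta_a1 + v) - norm_cdf(v) for v in other_index) / N

# illustrative coefficient on the binary covariate and per-unit indices
print(ape_binary(0.4, [0.2, -0.5, 0.9, 0.0]))
```

When the coefficient is zero the effect is exactly zero, and for a positive coefficient each unit's contribution is a positive difference of two cdf values, so the APE lies strictly between zero and one.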
It is easily seen that everything about the previous analysis carries through if we replace Φ(·) with Λ(·) throughout. As we discussed at the end of Section 2, using the logistic function is tantamount to skipping over the task of modeling E(y_it | x_it, c_i) and D(c_i | x_i) and simply asserting that (2.12) holds.
4. Models with Endogenous Explanatory
Variables
In our application to studying the effects of spending on test pass rates, there are reasons to think spending could be correlated with time-varying unobservables, in addition to being correlated with district-level heterogeneity. For example, districts have latitude about how to spend rainy day funds, and this could depend on the abilities of the particular cohort of students. [After all, districts are under pressure to obtain high pass rates on standardized tests.] In this section, we propose a simple method for allowing a continuous endogenous explanatory variable, such as spending.

How should we represent endogeneity in a fractional response model? For simplicity, assume that we have a single endogenous explanatory variable, say y_it2; the extension to a vector is straightforward provided we have sufficient instruments.
We now express the conditional mean model as

E(y_it1 | y_it2, z_i, c_i1, v_it1) = E(y_it1 | y_it2, z_it1, c_i1, v_it1) = Φ(α_1 y_it2 + z_it1 δ_1 + c_i1 + v_it1),   (4.1)

where c_i1 is the time-constant unobserved effect and v_it1 is a time-varying omitted factor that can be correlated with y_it2, the potentially endogenous variable. Equations similar to (4.1) have been employed in related cross-sectional contexts, particularly by Rivers and Vuong (1988) for the binary response case. Wooldridge (2005) showed how the Rivers and Vuong approach extends readily to the fractional response case for cross-sectional data; in effect, we would set t = 1 and drop c_i1 from (4.1). The exogenous variables are z_it = (z_it1, z_it2), where we need some time-varying, strictly exogenous variables z_it2 to be excluded from (4.1). This is the same as the requirement for fixed effects two stage least squares estimation of a linear model.

As before, we model the heterogeneity as a linear function of all exogenous variables, including those omitted from (4.1). This allows the instruments to be systematically correlated with time-constant omitted factors:

c_i1 = ψ_1 + z̄_i ξ_1 + a_i1, a_i1 | z_i ~ Normal(0, σ_a1²).   (4.2)

[Actually, for our application in Section 5, we have more specific information about how our instrument is correlated with historical factors, and we will exploit that information. For the development here, we use (4.2).] Plugging into (4.1) we have
E(y_it1 | y_it2, z_i, a_i1, v_it1) = Φ(α_1 y_it2 + z_it1 δ_1 + ψ_1 + z̄_i ξ_1 + a_i1 + v_it1)
                                  ≡ Φ(α_1 y_it2 + z_it1 δ_1 + ψ_1 + z̄_i ξ_1 + r_it1).   (4.3)
Next, we assume a linear reduced form for y_it2:

y_it2 = ψ_2 + z_it δ_2 + z̄_i ξ_2 + v_it2, t = 1, ..., T,   (4.4)

where, if necessary, we can allow the coefficients in (4.4) to depend on t. The addition of the time average of the strictly exogenous variables in (4.4) follows from the Mundlak (1978) device. The nature of endogeneity of y_it2 is through correlation between r_it1 ≡ a_i1 + v_it1 and the reduced-form error, v_it2. Thus, y_it2 is allowed to be correlated with unobserved heterogeneity and the time-varying omitted factor. We also assume that r_it1 given v_it2 is conditionally normal, which we write as

r_it1 = η_1 v_it2 + e_it1,   (4.5)

e_it1 | (z_i, v_it2) ~ Normal(0, σ_e1²), t = 1, ..., T.   (4.6)
Because e_it1 is independent of (z_i, v_it2), it is also independent of y_it2. Again, using a standard mixing property of the normal distribution,

E(y_it1 | z_i, y_it2, v_it2) = Φ(α_e1 y_it2 + z_it1 δ_e1 + ψ_e1 + z̄_i ξ_e1 + η_e1 v_it2),   (4.7)

where the e subscript denotes division by (1 + σ_e1²)^{1/2}.

The assumptions used to obtain (4.7) would not typically hold for y_it2 having discreteness or substantively limited range in its distribution. In our application, y_it2 is the log of per-student, district-level spending, and so the assumptions are at least plausible, and might generally provide a good approximation. It is a straightforward matter to include powers of v_it2 in (4.7) to allow greater flexibility. Following Wooldridge (2005) for the cross-sectional case, we could even model r_it1 given v_it2 as a heteroskedastic normal. In this paper, we study (4.7).
In obtaining the estimating equation (4.7), there is an important difference between this
case and the case covered in Section 3. As can be seen from equation (2.9), which is the basis for the estimation methods discussed in Section 3, the explanatory variables are, by construction, strictly exogenous. That is, if w_{it} = (1, x_{it}, \bar{x}_i) as in Section 3, then E(y_{it} | w_{i1}, w_{i2}, ..., w_{iT}) = E(y_{it} | w_{it}). In (4.7) we make no assumption about how the expectation would change if we condition on y_{is2}, s \neq t. This is a strength of our procedure: we are accounting for any contemporaneous endogeneity in (4.7), but we still allow for possible feedback between unobserved idiosyncratic changes in y_{it1}, as captured by v_{it1}, and future spending, say y_{i,t+h,2} for h \geq 1. Because we do not assume strict exogeneity of {y_{it2} : t = 1, ..., T} in (4.7), the GEE approach to estimation is generally inconsistent.
Therefore, we focus on the pooled Bernoulli QMLE in the second stage estimation.
We can now summarize how equation (4.7) leads to a simple two-step estimation procedure for the scaled coefficients:

PROCEDURE 4.1: (i) Estimate the reduced form for y_{it2} (pooled across t, or maybe for each t separately; at a minimum, different time period intercepts should be allowed). Obtain the residuals, \hat{v}_{it2}, for all (i, t) pairs.

(ii) Use the pooled probit QMLE of y_{it1} on y_{it2}, z_{it1}, \bar{z}_i, \hat{v}_{it2} to estimate \alpha_{e1}, \delta_{e1}, \psi_{e1}, \xi_{e1}, and \eta_{e1}.
Because of the two-step nature of Procedure 4.1, the standard errors in the second stage should be adjusted for the first-stage estimation, regardless of the estimation method used; see the appendix. However, if \eta_{e1} = 0, the first-stage estimation can be ignored, at least using first-order asymptotics. This means a test for endogeneity of y_{it2} is easily obtained as an asymptotic t statistic on \hat{v}_{it2}; it should be made robust to arbitrary serial correlation and misspecified variance.
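To make the two steps concrete, here is a minimal sketch of Procedure 4.1 on simulated data, with the pooled Bernoulli QMLE fit by Fisher-scoring iterations on the probit mean function. The data-generating process, sample sizes, and coefficient values are all invented for illustration, and the robust (clustered) standard errors discussed above are omitted:

```python
# A minimal sketch of Procedure 4.1 on simulated data; all numbers are invented.
import math
import numpy as np

rng = np.random.default_rng(0)
N, T = 500, 5
Phi = np.vectorize(lambda u: 0.5 * (1.0 + math.erf(u / math.sqrt(2.0))))

# Simulated panel: z is the exogenous instrument, y2 the endogenous regressor.
z = rng.normal(size=(N, T))
zbar = np.tile(z.mean(axis=1, keepdims=True), (1, T))
v2 = rng.normal(size=(N, T))
y2 = 1.0 + 0.8 * z + 0.5 * zbar + v2            # reduced form, as in (4.4)
r1 = 0.4 * v2 + rng.normal(size=(N, T))          # r_it1 = eta1*v_it2 + e_it1
y1 = Phi(0.5 * y2 - 0.3 * zbar + r1)             # fractional response in (0,1)

# Step (i): pooled OLS reduced form; keep the residuals v2hat.
X2 = np.column_stack([np.ones(N * T), z.ravel(), zbar.ravel()])
v2hat = y2.ravel() - X2 @ np.linalg.lstsq(X2, y2.ravel(), rcond=None)[0]

# Step (ii): pooled fractional probit QMLE of y1 on (1, y2, zbar, v2hat),
# solved by scoring steps on the Bernoulli quasi-log-likelihood.
X1 = np.column_stack([np.ones(N * T), y2.ravel(), zbar.ravel(), v2hat])
y = y1.ravel()
b1 = np.zeros(X1.shape[1])
for _ in range(100):
    xb = X1 @ b1
    mu = np.clip(Phi(xb), 1e-10, 1 - 1e-10)
    pdf = np.exp(-0.5 * xb ** 2) / math.sqrt(2.0 * math.pi)
    w = pdf ** 2 / (mu * (1.0 - mu))             # expected-Hessian weights
    score = X1.T @ (pdf * (y - mu) / (mu * (1.0 - mu)))
    step = np.linalg.solve(X1.T @ (w[:, None] * X1), score)
    b1 = b1 + step
    if np.max(np.abs(step)) < 1e-10:
        break

# The coefficient on v2hat estimates the scaled eta; its robust t statistic
# is the test for endogeneity of y2 described in the text.
print(np.round(b1, 3))
```

The estimates are the scaled coefficients of (4.7); averaging the probit density over the sample, as in (4.10)-(4.12), turns them into APEs.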
How do we interpret the scaled estimates that are obtained from Procedure 4.1? They certainly give directions of effects and, because they are scaled by the same factor, relative effects of continuous variables are easily obtained. For magnitudes, we want to estimate the APEs. In model (4.1), the APEs are obtained by computing derivatives, or obtaining differences, in

E_{(c_{i1}, v_{it1})}[\Phi(\alpha_1 y_{t2} + z_{t1}\delta_1 + c_{i1} + v_{it1})]  (4.8)
with respect to the elements of (y_{t2}, z_{t1}). From Wooldridge (2002, Section 2.2.5), (4.8) can be obtained as

E_{(\bar{z}_i, v_{it2})}[\Phi(\alpha_{e1} y_{t2} + z_{t1}\delta_{e1} + \psi_{e1} + \bar{z}_i \xi_{e1} + \eta_{e1} v_{it2})];  (4.9)
that is, we integrate out (\bar{z}_i, v_{it2}) and then take derivatives or changes with respect to the elements of (y_{t2}, z_{t1}). Because we are not making a distributional assumption about (\bar{z}_i, v_{it2}), we instead estimate the APEs by averaging out (\bar{z}_i, \hat{v}_{it2}) across the sample, for a chosen t:

N^{-1} \sum_{i=1}^{N} \Phi(\hat\alpha_{e1} y_{t2} + z_{t1}\hat\delta_{e1} + \hat\psi_{e1} + \bar{z}_i \hat\xi_{e1} + \hat\eta_{e1} \hat{v}_{it2}).  (4.10)
For example, since y_{t2} is continuous, we can estimate its average partial effect as

\hat\alpha_{e1} [ N^{-1} \sum_{i=1}^{N} \phi(\hat\alpha_{e1} y_{t2} + z_{t1}\hat\delta_{e1} + \hat\psi_{e1} + \bar{z}_i \hat\xi_{e1} + \hat\eta_{e1} \hat{v}_{it2}) ].  (4.11)
If desired, we can further average this across the z_{it1} for selected values of y_{t2}. We can average across t, too, to obtain the effect averaged across time as well as across the cross section. As before, we can also compute a single scale factor,

(NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \phi(\hat\alpha_{e1} y_{it2} + z_{it1}\hat\delta_{e1} + \hat\psi_{e1} + \bar{z}_i \hat\xi_{e1} + \hat\eta_{e1} \hat{v}_{it2}),  (4.12)

which gives us a number to multiply the fractional probit estimates by to make them comparable to linear model estimates.
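As a small illustration of how (4.11) and (4.12) are computed once the scaled coefficients are in hand; the design matrix and coefficient values below are invented stand-ins, not estimates from the paper:

```python
# Scale factor and APE for a continuous regressor from scaled probit
# coefficients; all inputs are illustrative stand-ins.
import math
import numpy as np

rng = np.random.default_rng(1)

def phi(z):
    # standard normal pdf
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

# Stand-in regressors (constant, y_t2, zbar_i, v2hat_it) and scaled coefficients.
X = np.column_stack([np.ones(1000),
                     rng.normal(1.0, 0.5, 1000),
                     rng.normal(size=1000),
                     rng.normal(size=1000)])
b = np.array([0.2, 0.9, -0.2, 0.3])

scale = phi(X @ b).mean()      # average scale factor, the analogue of (4.12)
ape_spending = b[1] * scale    # APE of the continuous variable, as in (4.11)
print(round(scale, 4), round(ape_spending, 4))
```

Multiplying every scaled coefficient by the same averaged density is what makes the fractional probit estimates comparable to the linear model's.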
Obtaining a standard error for (4.11) is a challenge because of the two-step estimation,
averaging across i, and the nonlinear nature of the function. The appendix derives a valid
standard error using the delta method.
Before we move to our application, we note that the probit function can be replaced with the logit function in (4.7), step (ii) of Procedure 4.1, and (4.10). That is, we can just use the logit response function in estimating the parameters and then in computing the average partial effects. As discussed at the end of Section 2, the drawback of this approach is that it is not derivable from a model such as (4.1): we would simply be using the logistic function as an approximation to the estimable conditional mean, E(y_{it1} | y_{it2}, z_i, v_{it2}). [Of course, one can view the probit response function in this way, too, but we can actually derive (4.7) from underlying normality assumptions.]
5. Application to Test Pass Rates
We apply the methods described above to the problem of estimating the effects of spending
on math test outcomes for fourth graders in Michigan. Papke (2005) describes the policy
change in Michigan in 1994, where funding for schools was changed from a local, property-tax
based system to a statewide system supported primarily through a higher sales tax (and lottery profits). For her econometric analysis, Papke used building-level data for the years 1993 through 1998. Here we use district-level data, for a few reasons. First, we extended the sample through 2001, and, due to changes in the government agencies in charge of collecting and reporting the data, we were not able to obtain spending data at the building level. Second, the district-wide data had many fewer missing observations over the period of interest. The nonlinear models we apply are difficult to extend to unbalanced panel data, a topic for future research. Third, the instrumental variable that we use for spending, namely, the so-called foundation grant, which is allocated by a formula, varies only at the district level. Consequently, probably little is lost by using district-level data on all variables.
The data set we use in the estimation contains 501 school districts for the years 1995
through 2001. We use years prior to 1995 to obtain average spending measures. Because the
instrumental variable for spending, the foundation grant, is defined only starting in 1995, we
would not be able to add previous years for the instrumental variable analysis, anyway.
The response variable, math4, is the fraction of fourth graders passing the Michigan
Education Assessment Program (MEAP) fourth grade math test in the district. Papke (2005)
provides a discussion about why this is the most reliable measure of achievement: briefly, its
definition has remained the same over time and the nature of the math test has not radically
changed.
Papke (2005) found that lagged spending had as large an effect as, if not a larger effect than, current spending. In fact, one can imagine that, if spending in third and fourth grade can affect achievement in fourth grade, why not spending in earlier years? Because we have spending back to 1992 and have extended Papke's sample to the years 1999, 2000, and 2001, we can
obtain average per student spending in first, second, third, and fourth grade and still have seven
years of data for actual estimation. We convert spending into real dollars, and use a simple
average. In other words, our spending measure is avgrexp = (rexppp + rexppp_{-1} + rexppp_{-2} + rexppp_{-3})/4, where rexppp is real expenditures per pupil. We use log(avgrexp) as the explanatory variable. [We experimented with a weighted average with declining weights, as well as a two-year average, and the results were similar.]
In addition to the spending variable and year dummies, we include the fraction of students
eligible for the free and reduced-price lunch programs (lunch) and district enrollment [in
logarithmic form, logenroll]. Table 1 contains summary statistics for the key variables for
1995 and 2001.
The linear unobserved effects model estimated by Papke (2005) can be expressed as

math4_{it} = \theta_t + \beta_1 log(avgrexp_{it}) + \beta_2 lunch_{it} + \beta_3 log(enroll_{it}) + c_{i1} + u_{it1},  (5.1)
where i indexes district and t indexes year. Estimating this model by fixed effects is identical
to adding the time averages of the three explanatory variables and using pooled OLS. That
makes the fixed effects estimates directly comparable to the quasi-MLE estimates where we
add the time averages of the explanatory variables to control for correlation between c
i1
and
the explanatory variables. Let x_{it} = (log(avgrexp_{it}), lunch_{it}, log(enroll_{it})). Then the fractional probit model we estimate has the form

E(math4_{it} | x_{i1}, x_{i2}, ..., x_{iT}) = \Phi(\theta_{at} + x_{it}\beta_a + \bar{x}_i \xi_a),  (5.2)

where the \theta_{at} emphasize that we are allowing a different intercept in each year. Recall from Section 3 that we are able to only identify the scaled coefficients (indexed by a). Nevertheless, the APEs are identified and depend precisely on the scaled coefficients, and we compute these
to make them comparable to the linear model estimates.
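The fixed effects/Mundlak equivalence invoked above is exact in a balanced panel and easy to verify numerically; the panel below is simulated purely to check the algebra:

```python
# Check that fixed effects (within) estimates equal pooled OLS with unit time
# averages added (Mundlak, 1978). Simulated balanced panel; numbers invented.
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 200, 7, 3
beta = np.array([0.4, -0.1, 0.2])
x = rng.normal(size=(N, T, K)) + rng.normal(size=(N, 1, K))   # within-unit correlation
c = x.mean(axis=1) @ np.ones(K) + rng.normal(size=N)          # effect correlated with x
y = x @ beta + c[:, None] + rng.normal(size=(N, T))

# Within (fixed effects) estimator: demean by unit, then pooled OLS.
xd = (x - x.mean(axis=1, keepdims=True)).reshape(-1, K)
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
b_fe = np.linalg.lstsq(xd, yd, rcond=None)[0]

# Mundlak device: pooled OLS of y on (1, x_it, xbar_i); keep slopes on x_it.
xbar = np.repeat(x.mean(axis=1), T, axis=0)
Xp = np.column_stack([np.ones(N * T), x.reshape(-1, K), xbar])
b_mundlak = np.linalg.lstsq(Xp, y.ravel(), rcond=None)[0][1:K + 1]
print(np.round(b_fe, 4), np.round(b_mundlak, 4))
```

The two coefficient vectors agree to machine precision, which is exactly why the fixed effects estimates in Table 2 are directly comparable to the quasi-MLE estimates that include the time averages.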
Table 2 contains estimates of the linear model in (5.1) and the fractional probit model in
(5.2). For brevity, we report only the coefficients on the three explanatory variables that
change across district and time. We use two methods to estimate the fractional probit model.
The first is the pooled Bernoulli quasi-MLE, where we treat the leftover serial dependence as a
nuisance to be addressed through corrected standard errors. To possibly enhance efficiency,
and as an informal model specification check, we also estimate the fractional probit model
using the generalized estimating equation approach described in Section 3. We use an
exchangeable working correlation matrix; the results using an unrestricted correlation matrix
were similar. In all cases, the standard errors are robust to general heteroskedasticity and serial
correlation; in particular, in the GEE estimation we allow for misspecification of the
conditional variance matrix.
The three sets of estimates tell a consistent story: spending has a positive and statistically
significant effect on math pass rates. The advantage of the linear model is that we can easily
interpret the magnitude of the effect. If log spending increases by .10 (a 10 percent increase in spending), the pass rate is estimated to increase by about .038, or 3.8 percentage points, a practically important effect. The fully robust t statistic on this estimate is 4.95.
In the pooled fractional probit estimation, the estimated coefficient is .881 and it is very statistically significant (fully robust t statistic = 4.26). As we discussed in Section 3, the
magnitude of the coefficient is not directly comparable to the fixed effects estimate. The
adjustment factor in (3.11) is .337. Therefore, the average partial effect of spending on math
pass rates is about .297, which gives an effect on the pass rate almost one percentage point
below the linear model estimate. The estimates from the generalized estimating equation approach are very similar, with \hat\beta_{a,lavgrexp} = .885 and the rounded scale factor also equal to .337.
Interestingly, the fully robust standard errors for the pooled estimation (.207) and the fully
robust standard error for the GEE estimation (.206) are very close. In other words, using a
working correlation matrix in multivariate weighted nonlinear least squares does not appear to
enhance efficiency in this application.
We can also apply the instrumental variables methods from Section 4, but with a slight
modification to exploit our particular setup. The change in school funding in Michigan in
1994 brings with it a natural instrumental variable for spending. Starting in 1995, in place of
property tax revenues, each school was given a foundation grant. The grant amount in each
subsequent year is determined by per student spending in 1994. Low-spending districts were
given a minimum amount, and districts spending above the minimum were given increases that depended on how far their spending exceeded the floor. The result was a
nonlinear, nonsmooth relationship between the grant amount and spending in 1994. We can
use the foundation amount as an instrumental variable for spending provided (i) it is exogenous
in the math pass rate equation and (ii) it is partially correlated with spending. The second
condition is easily verified; we do so below. For the first requirement, we include log
spending in 1994, along with year dummy interactions with log spending in 1994, in the math
equation. The idea is that spending in 1994 might have a direct effect on test score performance. (This might be minimal because we are using four years of averaged spending data. Starting in 1998, no fourth graders would have been in school in 1994. Prior to that time, we account for spending in 1994 by including it in average spending.) So, we augment the model by including the new set of explanatory variables. Our identification assumption is that spending in 1994 would affect performance in a smooth manner, whereas the foundation
grant is a nonsmooth function of 1994 spending; see Papke (2005) for more discussion. We
also include the time averages of lunch_{it} and log(enroll_{it}) to allow them to be correlated with the district unobserved effect.
As the instruments for log spending we use the log of the foundation grant and we interact
this variable with a full set of year dummies. Therefore, we can write the first-stage regression
as
log(avgrexp_{it}) = \pi_t + \gamma_{t1} log(found_{it}) + \gamma_{t2} log(rexppp_{i,1994}) + \gamma_5 lunch_{it} + \gamma_6 log(enroll_{it}) + \gamma_7 \overline{lunch}_i + \gamma_8 \overline{log(enroll)}_i + v_{it2},  (5.3)
so that there are different year intercepts and different slopes on the foundation grant and 1994
spending variables. We need to test whether the coefficients on the foundation variable are
statistically different from zero. We do that via a pooled OLS regression using a fully robust
variance matrix. The Wald test gives a zero p-value to four decimal places. The coefficients
on logfound range from .031 to .334. The test that all coefficients are the same also rejects
with a p-value of zero to four decimal places, so we use the foundation grant interacted with all
year dummies as IVs.
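A sketch of this first-stage relevance check, with simulated data standing in for the Michigan panel and a single instrument in place of the full set of year interactions; the covariance matrix is clustered by district:

```python
# First-stage relevance check: pooled OLS of the spending variable on the
# instrument, with a Wald test using a district-clustered covariance matrix.
# Simulated stand-in data; the instrument coefficient (0.2) is invented.
import numpy as np

rng = np.random.default_rng(3)
N, T = 300, 7
found = rng.normal(size=(N, T))                     # stand-in for log(found_it)
u = rng.normal(size=(N, 1)) + rng.normal(size=(N, T))   # district effect + noise
y2 = 0.5 + 0.2 * found + u                          # first stage with a relevant IV

X = np.column_stack([np.ones(N * T), found.ravel()])
y = y2.ravel()
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

# Cluster-robust "meat": sum over districts of X_i' e_i e_i' X_i.
XX_inv = np.linalg.inv(X.T @ X)
meat = np.zeros((2, 2))
for i in range(N):
    Xi, ei = X[i * T:(i + 1) * T], e[i * T:(i + 1) * T]
    meat += Xi.T @ np.outer(ei, ei) @ Xi
V = XX_inv @ meat @ XX_inv

# Wald statistic for H0: instrument coefficient = 0 (single restriction here;
# with a full set of year interactions it is a joint test).
wald = b[1] ** 2 / V[1, 1]
print(round(b[1], 3), round(wald, 1))
```

With several instrument-year interactions, the same construction yields the joint Wald statistic whose p-value is reported in the text.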
Given the strength of the foundation grant as an instrument for spending, we estimate the
model
math4_{it} = \theta_t + \beta_1 log(avgrexp_{it}) + \beta_2 lunch_{it} + \beta_3 log(enroll_{it}) + \beta_{4t} log(rexppp_{i,1994}) + \xi_1 \overline{lunch}_i + \xi_2 \overline{log(enroll)}_i + v_{it1}  (5.4)
by instrumental variables. The results are reported in Column (1) of Table 3. Compared with the estimates that treat spending as strictly exogenous conditional on c_{i1}, the spending coefficient increases by a nontrivial amount, to .555 (robust t = 2.51). The effect of a 10% increase in spending is now an increase of about 5.5 percentage points in the math pass rate.
Papke (2005), using school-level data, and Roy (2003), using very similar district level data,
also found that the IV estimates were above the estimates that treat spending as strictly
exogenous, although the effects estimated by Papke are smaller. [Roy (2003) does not include
1994 spending in the model but uses fixed effects IV, and considers 1996 through 2001 and
1998 through 2001 separately. Roy's spending variable is spending lagged one year; here we
average current and three lags of spending.]
Finally, we estimate the effect of spending using the fractional probit model with spending endogenous. This means we simply add the reduced form residuals, \hat{v}_{it2}, to the pooled fractional probit model, along with the other explanatory variables in (5.4). [For the linear model in (5.4), this would be identical to the 2SLS estimates reported in Table 3.] Recall that the fully robust t statistic on \hat{v}_{it2}, which is -2.17 in this case, is a test of the null hypothesis of exogeneity of spending. Therefore, we find some evidence that spending is endogenous.
The coefficient estimate on the spending variable is 1.661 (robust t = 2.99), which is almost
double the coefficient estimate when we assumed spending was strictly exogenous conditional
on an unobserved effect. But we must be careful, because these estimates are implicitly scaled
by different factors. The only sensible comparison is to compute the scaling factor and adjust
the coefficient accordingly. The average scale factor across all i and t is .337, so the average
partial effect of a 10 percent increase in spending is about .560. This is very similar to the
linear IV estimate, and suggests that, as in many nonlinear contexts, the linear model does a
very good job of estimating the average partial effect. As with the linear model estimate, the
fractional probit estimate that instruments spending is substantially larger than the estimate that
treats spending as strictly exogenous.
In order to determine the importance of using a nonlinear model to allow for diminishing spending effects (beyond the diminishing effect of an extra dollar already built in by using the log of spending), we obtain the average partial effects at the 5th, 25th, 50th, 75th, and 95th percentiles of the spending distribution for the most recent year, 2001. In other words, we compute the scale factors for the different levels of spending and average out the other variables, using only the last time period. The results, using the pooled fractional probit estimates, are given in Table 4.
As expected, the largest APE is at the lowest level of spending. The scaled coefficient starting at the fifth percentile is .573, which is a large effect: a 10 percent increase in spending increases the math pass rate by .0573, or about 5.7 percentage points. From here, the APE decreases. It is .522 at the median level of avgrexp; at the 95th percentile, the effect has fallen to .357. Therefore, there is pretty strong evidence that the same percentage increase in spending has a larger effect for districts at low levels of spending, an effect that is lost in the linear model estimation. [Incidentally, for the pooled fractional probit estimation, the fitted values for 2001 range from .521 to .889; no district is predicted to have higher than a 90 percent pass rate.]
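The Table 4 calculation fixes log spending at a percentile value and averages the probit density over the remaining covariates for the chosen year. A schematic version, with invented coefficients and covariates rather than the paper's estimates:

```python
# Scale factors at spending percentiles, Table 4 style. The intercept, slope,
# covariate index, and (demeaned) log-spending draws are all invented.
import math
import numpy as np

rng = np.random.default_rng(4)

def phi(z):
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

b0, b_spend = 0.3, 0.9                      # invented scaled intercept and spending slope
other_index = rng.normal(0.0, 0.3, 500)     # stand-in for z_{t1}d + psi + zbar*xi + eta*v2hat
log_spend = rng.normal(0.0, 0.1, 500)       # stand-in for demeaned log spending in 2001

scales = []
for q in (5, 25, 50, 75, 95):
    s = np.percentile(log_spend, q)         # fix spending at the q-th percentile
    scales.append(phi(b0 + b_spend * s + other_index).mean())
print([round(s, 4) for s in scales])
```

Because the fitted pass rates sit above one half, the density, and hence the scale factor, falls as spending rises, mirroring the diminishing effects reported in Table 4.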
6. Concluding Remarks
We have provided methods for estimating unobserved effects panel data models for
fractional response variables. In addition to allowing the explanatory variables to be correlated
with unobserved heterogeneity, we have also considered the case where an explanatory
variable is correlated with time-varying unobservables. The resulting control function
approach, which is implemented in two stages, is computationally simple.
We applied the new methods to the problem of explaining math test pass rates for Michigan fourth graders, extending the work of Papke (2005) and Roy (2003) to allow for a nonlinear response. The models that allow spending to be endogenous, where we use the foundation grant amount as an instrument for spending, provide larger estimated spending effects than the models that assume spending is exogenous (conditional on a district unobserved effect). Using more recent data, and data at the district level rather than the building level, we find larger estimated effects than in Papke (2005). As often happens in nonlinear models, the average partial effect from the nonlinear model is quite close to the estimated effect from the linear model. However, when we look across the distribution of spending,
the effect of 10% more spending starting at the fifth percentile, about 5.7 percentage points, is
quite a bit larger than the effect at the 95th percentile, about 3.6 percentage points. Translated
in dollar terms, an additional $500 per student directed at lower spending districts is predicted
to have a much larger impact on test pass rates than another $500 at higher spending districts.
References
Blundell, R.W. and J.L. Powell (2004), Endogeneity in Semiparametric Binary Response
Models, Review of Economic Studies 71, 655-679.
Chamberlain, G. (1980), Analysis of Variance with Qualitative Data, Review of
Economic Studies 47, 225-238.
Hausman, J.A. and G.K. Leonard (1997), Superstars in the National Basketball
Association: Economic Value and Policy, Journal of Labor Economics 15, 586-624.
Liang, K.-Y., and S.L. Zeger (1986), Longitudinal Data Analysis Using Generalized
Linear Models, Biometrika 73, 13-22.
Liu, J.L., J.T. Liu, J.K. Hammitt, and S.Y. Chou (1999), The Price Elasticity of Opium in
Taiwan, 1914-1942, Journal of Health Economics 18, 795-810.
Loudermilk, M.S. (2005), Estimation of Fractional Dependent Variables in Dynamic
Panel Data Models with an Application to Firm Dividend Policy, forthcoming, Journal of
Business and Economic Statistics.
Mundlak, Y. (1978), On the Pooling of Time Series and Cross Section Data,
Econometrica 46, 69-85.
Papke, L.E. (2005), The Effects of Spending on Test Pass Rates: Evidence from
Michigan, Journal of Public Economics 89, 821-839.
Papke, L.E. and J.M. Wooldridge (1996), Econometric Methods for Fractional Response
Variables with an Application to 401(k) Plan Participation Rates, Journal of Applied
Econometrics 11, 619-632.
Petrin, A. and K. Train (2003), Omitted Product Attributes in Discrete Choice Models,
National Bureau of Economic Research Working Paper No. 9452.
Rivers, D. and Q.H. Vuong (1988), Limited Information Estimators and Exogeneity Tests
for Simultaneous Probit Models, Journal of Econometrics 39, 347-366.
Roy, J. (2003), Impact of School Finance Reform on Resource Equalization and
Academic Performance: Evidence from Michigan, Princeton University, Education Research
Section Working Paper No. 8.
Wagner, J. (2001), A Note on the Firm Size-Export Relationship, Small Business
Economics 17, 229-237.
Wagner, J. (2003), Unobserved Firm Heterogeneity and the Size-Exports Nexus: Evidence
from German Panel Data, Review of World Economics 139, 161-172.
White, H. (1980), A Heteroskedasticity-Consistent Covariance Matrix Estimator and a
Direct Test for Heteroskedasticity, Econometrica 48, 817-838.
Wooldridge, J.M. (2002), Econometric Analysis of Cross Section and Panel Data.
Cambridge, MA: MIT Press.
Wooldridge, J.M. (2003), Solutions Manual and Supplementary Materials for Econometric
Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
Wooldridge, J.M. (2005), Unobserved Heterogeneity and Estimation of Average Partial
Effects, in Identification and Inference for Econometric Models: Essays in Honor of Thomas
Rothenberg. D.W.K. Andrews and J.H. Stock (eds.), 27-55. Cambridge: Cambridge University
Press.
Appendix
In this appendix, we obtain asymptotic standard errors for the two-step estimators in Section 4 and the average partial effects. We use the pooled Bernoulli QMLE, as that is the estimator that remains consistent if, even after we account for contemporaneous endogeneity of spending, spending fails to be strictly exogenous. In the first stage, we assume that a linear reduced form for y_{it2} has been estimated by pooled OLS. A setup that covers (4.4) as well as the method we apply in Section 5 is

y_{it2} = h_{it}\psi_2 + v_{it2},  t = 1, ..., T,  (A.1)
where h_{it} can be any 1 x M vector of exogenous variables. Then, under standard regularity conditions,

\sqrt{N}(\hat\psi_2 - \psi_2) = N^{-1/2} \sum_{i=1}^{N} r_{i2} + o_p(1),  (A.2)

where

r_{i2} = A_2^{-1} H_i' v_{i2},  (A.3)

H_i is the T x M matrix with t-th row h_{it}, A_2 = E(H_i' H_i), and v_{i2} is the T x 1 vector of reduced form errors. Next, we write
E(y_{it1} | y_{it2}, h_{it}) = \Phi(\alpha_1 y_{it2} + w_{it}\kappa_1 + \eta_1(y_{it2} - h_{it}\psi_2)),  (A.4)

where we drop the "e" subscripting on the scaled parameters for notational simplicity. After all, it is only the scaled parameters we are estimating. The vector w_{it} is the subset of exogenous variables appearing in the structural equation. In (4.7), they would be (1, z_{it1}, \bar{z}_i). As usual, for identification we need w_{it} to be a strict subset of h_{it}. Our goal is to obtain the asymptotic variance for the second-step estimators of (\alpha_1, \kappa_1, \eta_1). We collect these into the parameter vector \theta_1. Then, using the pooled Bernoulli QMLE, the first order condition for \hat\theta_1 is
\sum_{i=1}^{N} \sum_{t=1}^{T} s_{it1}(\hat\theta_1; \hat\psi_2) \equiv \sum_{i=1}^{N} s_{i1}(\hat\theta_1; \hat\psi_2) = 0,  (A.5)

where \hat\psi_2 is the first-step pooled OLS estimator and s_{it1}(\theta_1; \psi_2) is the score of the Bernoulli quasi-log-likelihood for observation (i, t) with respect to \theta_1:
s_{it1}(\theta_1; \psi_2) = g_{it}' \phi(g_{it}\theta_1)[y_{it1} - \Phi(g_{it}\theta_1)] / {\Phi(g_{it}\theta_1)[1 - \Phi(g_{it}\theta_1)]},  (A.6)

where g_{it} = (y_{it2}, w_{it}, v_{it2}) and we suppress (for now) the dependence of v_{it2} on \psi_2. This is simply the score function for a probit likelihood, but y_{it1} is not (necessarily) a binary variable.
Following Wooldridge (2002, Section 12.5.2), we can obtain the first-order representation for
\sqrt{N}(\hat\theta_1 - \theta_1):

\sqrt{N}(\hat\theta_1 - \theta_1) = A_1^{-1} N^{-1/2} \sum_{i=1}^{N} r_{i1}(\theta_1; \psi_2) + o_p(1),  (A.7)
where

A_1 \equiv -E[\nabla_{\theta_1} s_{i1}(\theta_1; \psi_2)] = E[ \sum_{t=1}^{T} \phi(g_{it}\theta_1)^2 g_{it}'g_{it} / {\Phi(g_{it}\theta_1)[1 - \Phi(g_{it}\theta_1)]} ],  (A.8)

r_{i1}(\theta_1; \psi_2) = s_{i1}(\theta_1; \psi_2) + F_1 r_{i2}(\psi_2),  (A.9)
and

F_1 \equiv E[\nabla_{\psi_2} s_{i1}(\theta_1; \psi_2)] = \eta_1 E[ \sum_{t=1}^{T} \phi(g_{it}\theta_1)^2 g_{it}' h_{it} / {\Phi(g_{it}\theta_1)[1 - \Phi(g_{it}\theta_1)]} ].  (A.10)
Therefore,

Avar \sqrt{N}(\hat\theta_1 - \theta_1) = A_1^{-1} Var[r_{i1}(\theta_1; \psi_2)] A_1^{-1}.  (A.11)

Note that when y_{it2} is exogenous, \eta_1 = 0, and so, in that case, no adjustment is necessary for the first-stage estimation. This is typical when using control function methods to test for endogeneity: under the null, it is very easy to compute a valid test statistic.
Because y_{it2} is not necessarily strictly exogenous even after we include v_{it2} in the equation, the two scores s_{i1}(\theta_1; \psi_2) and r_{i2}(\psi_2) are not generally uncorrelated. (The scores for time period t are uncorrelated, but there can be correlation across different time periods.)
Generally, a valid estimator of Avar \sqrt{N}(\hat\theta_1 - \theta_1) is

\hat{A}_1^{-1} ( N^{-1} \sum_{i=1}^{N} \hat{r}_{i1}\hat{r}_{i1}' ) \hat{A}_1^{-1},  (A.12)
where

\hat{A}_1 = N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \phi(\hat{g}_{it}\hat\theta_1)^2 \hat{g}_{it}'\hat{g}_{it} / {\Phi(\hat{g}_{it}\hat\theta_1)[1 - \Phi(\hat{g}_{it}\hat\theta_1)]},  (A.13)

\hat{g}_{it} = (y_{it2}, w_{it}, \hat{v}_{it2}),  \hat{r}_{i1} = \hat{s}_{i1} + \hat{F}_1\hat{r}_{i2},  \hat{r}_{i2} = \hat{A}_2^{-1} H_i' \hat{v}_{i2},  (A.14)
and

\hat{F}_1 = \hat\eta_1 N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \phi(\hat{g}_{it}\hat\theta_1)^2 \hat{g}_{it}' h_{it} / {\Phi(\hat{g}_{it}\hat\theta_1)[1 - \Phi(\hat{g}_{it}\hat\theta_1)]}.  (A.15)
Note that \hat{A}_1 is just the usual Hessian from the pooled Bernoulli quasi-log-likelihood, that is, the Hessian with respect to \theta_1, divided by the cross-sectional sample size. The asymptotic variance of \hat\theta_1 is estimated as

Avar(\hat\theta_1) = \hat{A}_1^{-1} ( N^{-1} \sum_{i=1}^{N} \hat{r}_{i1}\hat{r}_{i1}' ) \hat{A}_1^{-1} / N.  (A.16)

Of course, the asymptotic standard errors are obtained as the square roots of the diagonal elements of this matrix.
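In matrix form, the sandwich calculation in (A.12) and (A.16) is only a few lines; the Hessian and district-level score vectors below are random stand-ins, since the point is the form of the computation:

```python
# Generic sandwich variance of the form Ahat^{-1} B Ahat^{-1} / N, as in (A.16).
# R stacks district-level adjusted scores r_i1 (already corrected for the
# first stage via F1*r_i2); Ahat stands in for the averaged Hessian term.
import numpy as np

rng = np.random.default_rng(5)
N, P = 400, 3
R = rng.normal(size=(N, P))            # stand-in r_i1 vectors, one row per district
M = rng.normal(size=(P, P))
Ahat = M @ M.T + P * np.eye(P)         # any symmetric positive definite stand-in

B = (R.T @ R) / N                      # N^{-1} sum_i r_i r_i'
Ainv = np.linalg.inv(Ahat)
avar = Ainv @ B @ Ainv / N             # the estimator in (A.16)
se = np.sqrt(np.diag(avar))            # asymptotic standard errors
print(se)
```

Summing each district's T score contributions into a single r_i before forming B is what makes the result robust to arbitrary within-district serial correlation.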
Next, we obtain a standard error for the average partial effects reported in Section 5. First, we obtain a standard error for the vector of scaled coefficients times the scale factor in (4.12), which we write generically as

\hat\delta_1 \equiv [ N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \phi(\hat{g}_{it}\hat\theta_1) ] \hat\theta_1,  (A.17)
where \hat{g}_{it} = (y_{it2}, w_{it}, \hat{v}_{it2}), as before. In the model with y_{it2} assumed exogenous, we drop the term \hat{v}_{it2}. Let \delta_1 denote the vector of scaled population coefficients, so

\delta_1 \equiv [ \sum_{t=1}^{T} E(\phi(g_{it}\theta_1)) ] \theta_1.  (A.18)
Then we need the asymptotic variance of \sqrt{N}(\hat\delta_1 - \delta_1). We use Problem 12.12 in Wooldridge (2002), recognizing that the full set of estimated parameters is \hat\gamma = (\hat\theta_1', \hat\psi_2')' (except when we assume y_{it2} is exogenous). Then, letting

j(g_i, h_i, \gamma) \equiv [ \sum_{t=1}^{T} \phi(\alpha_1 y_{it2} + w_{it}\kappa_1 + \eta_1(y_{it2} - h_{it}\psi_2)) ] \theta_1,

we have

\sqrt{N}(\hat\delta_1 - \delta_1) = N^{-1/2} \sum_{i=1}^{N} { [ \sum_{t=1}^{T} \phi(g_{it}\theta_1) ] \theta_1 - \delta_1 } + E[\nabla_{\gamma} j(g_i, h_i, \gamma)] \sqrt{N}(\hat\gamma - \gamma) + o_p(1).  (A.19)
From above, we have the first-order representation for \sqrt{N}(\hat\gamma - \gamma):

\sqrt{N}(\hat\gamma - \gamma) = N^{-1/2} \sum_{i=1}^{N} ( (A_1^{-1} r_{i1})', r_{i2}' )' + o_p(1) \equiv N^{-1/2} \sum_{i=1}^{N} k_i + o_p(1),  (A.20)
where A_1, r_{i1}, and r_{i2} are defined earlier. So the asymptotic variance of \sqrt{N}(\hat\delta_1 - \delta_1) is

Var[ ( \sum_{t=1}^{T} \phi(g_{it}\theta_1) ) \theta_1 - \delta_1 + J k_i ],  (A.21)
where J \equiv E[\nabla_{\gamma} j(g_i, h_i, \gamma)]. We only have left to find the Jacobian \nabla_{\gamma} j(g_i, h_i, \gamma), and then to propose obvious estimators for each term. We find the Jacobian first with respect to \theta_1 and then with respect to \psi_2. For both terms, we need the derivative of the standard normal pdf, \phi(z), which is simply -z\phi(z). Then
\nabla_{\theta_1} j(g_i, h_i, \gamma) = \sum_{t=1}^{T} \phi(g_{it}\theta_1)[ I_{P_1} - (g_{it}\theta_1)\theta_1 g_{it} ],  (A.22)
where I_{P_1} is the P_1 x P_1 identity matrix and P_1 is the dimension of \theta_1. Similarly,
\nabla_{\psi_2} j(g_i, h_i, \gamma) = \eta_1 \sum_{t=1}^{T} (g_{it}\theta_1)\phi(g_{it}\theta_1) \theta_1 h_{it}  (A.23)
is P_1 x P_2, where P_2 is the dimension of \psi_2. It follows that
\nabla_{\gamma} j(g_i, h_i, \gamma) = [ \sum_{t=1}^{T} \phi(g_{it}\theta_1)( I_{P_1} - (g_{it}\theta_1)\theta_1 g_{it} ),  \eta_1 \sum_{t=1}^{T} (g_{it}\theta_1)\phi(g_{it}\theta_1)\theta_1 h_{it} ],  (A.24)
and its expected value is easily estimated as

\hat{J} = N^{-1} \sum_{i=1}^{N} [ \sum_{t=1}^{T} \phi(\hat{g}_{it}\hat\theta_1)( I_{P_1} - (\hat{g}_{it}\hat\theta_1)\hat\theta_1\hat{g}_{it} ),  \hat\eta_1 \sum_{t=1}^{T} (\hat{g}_{it}\hat\theta_1)\phi(\hat{g}_{it}\hat\theta_1)\hat\theta_1 h_{it} ],  (A.25)
where \hat{g}_{it} = (y_{it2}, w_{it}, \hat{v}_{it2}). If y_{it2} is assumed to be exogenous, \hat{v}_{it2} is dropped and \hat{J} consists of only the first term. Finally, Avar \sqrt{N}(\hat\delta_1 - \delta_1) is consistently estimated as
N^{-1} \sum_{i=1}^{N} [ ( \sum_{t=1}^{T} \phi(\hat{g}_{it}\hat\theta_1) ) \hat\theta_1 - \hat\delta_1 + \hat{J}\hat{k}_i ][ ( \sum_{t=1}^{T} \phi(\hat{g}_{it}\hat\theta_1) ) \hat\theta_1 - \hat\delta_1 + \hat{J}\hat{k}_i ]',  (A.26)
where all quantities are evaluated at the estimators given previously. This is the full vector of APEs (assuming continuous explanatory variables). The asymptotic standard error for any particular APE is obtained as the square root of the corresponding diagonal element in (A.26), divided by \sqrt{N}.
A similar argument gives the APE when we fix one of the regressors at a specific value. In our case, this is the spending variable, y_{t2}. Fix this at the value y_{t2}^o, and let us consider the APE for the last time period, T. Then
\hat\tau_1 \equiv \hat\alpha_1 N^{-1} \sum_{i=1}^{N} \phi(\hat\alpha_1 y_{T2}^o + w_{iT}\hat\kappa_1 + \hat\eta_1 \hat{v}_{iT2}),  (A.27)
and it can be shown that a consistent estimator of Avar \sqrt{N}(\hat\tau_1 - \tau_1) is

N^{-1} \sum_{i=1}^{N} [ \hat\alpha_1 \phi(\hat{g}_{iT}\hat\theta_1) - \hat\tau_1 + \hat{D}\hat{k}_i ]^2,  (A.28)
where \hat{k}_i is defined in (A.20) and \hat{D} is now

\hat{D} = N^{-1} \sum_{i=1}^{N} [ \phi(\hat{g}_{iT}\hat\theta_1)e_1' - \hat\alpha_1(\hat{g}_{iT}\hat\theta_1)\phi(\hat{g}_{iT}\hat\theta_1)\hat{g}_{iT},  \hat\alpha_1\hat\eta_1(\hat{g}_{iT}\hat\theta_1)\phi(\hat{g}_{iT}\hat\theta_1)h_{iT} ],  (A.29)

where e_1' = (1, 0, ..., 0). Note that \hat{g}_{iT} = (y_{T2}^o, w_{iT}, \hat{v}_{iT2}).
Table 1: Sample Means and Standard Deviations, Selected Years

                                               1995            2001
Expenditure Per Pupil, 2000 $                  6,154 (959)     6,963 (907)
Foundation Grant, 2000 $                       5,797 (1,003)   6,173 (670)
Fraction Eligible for Free and Reduced Lunch   .280 (.152)     .308 (.170)
Enrollment                                     3,076 (8,156)   3,078 (7,293)
Number of Observations                         501             501
Table 2: Estimates Assuming Spending is Conditionally Strictly Exogenous, 1995-2001

                      Linear            Fractional Probit   Fractional Probit
                      (Fixed Effects)   (Pooled QMLE)       (GEE)
log(arexppp)          .377              .881                .885
                      (.076)            (.207)              (.206)
lunch                 -.042             -.219               -.237
                      (.079)            (.207)              (.209)
log(enroll)           .0021             .089                .088
                      (.0527)           (.138)              (.139)
Scale Factor                            .337                .337
Number of Districts   501               501                 501
Notes: (i) The variable arexppp is the average of real expenditures per pupil for the current
and previous three years. (ii) All models contain year dummies for 1996 through 2001. (iii)
The fractional probit estimation includes the time averages of the three explanatory variables.
(iv) The standard errors, in parentheses, are robust to general second moment misspecification
(conditional variance and serial correlation).
Table 3: Estimates Allowing Spending to be Endogenous, 1995-2001

                      Linear                    Fractional Probit
                      (Instrumental Variables)  (Pooled QMLE)
log(arexppp)          .555                      1.661
                      (.221)                    (.556)
lunch                 -.062                     -.281
                      (.074)                    (.210)
log(enroll)           .046                      .279
                      (.070)                    (.182)
\hat{v}_2                                       -1.325
                                                (.610)
Scale Factor                                    .337
Number of Districts   501                       501
Notes: (i) The variable arexppp is the average of real expenditures per pupil for the current
and previous three years. (ii) All models contain year dummies for 1996 through 2001, the log
of per pupil spending in 1994, interactions of this variable with a full set of time dummies, and
the time averages of the lunch and enrollment variables. (iii) The instrumental variables are the
log of the foundation grant and that variable interacted with a full set of year dummies. (iv)
The standard errors, in parentheses, are fully robust.
Table 4: Scale Factors at Different Levels of Spending, 2001

Percentile   Real Expenditures, 2000 $   Scale Factor
5th          5,943                       .345
25th         6,317                       .328
50th         6,623                       .314
75th         7,136                       .289
95th         8,744                       .215
Note: These scale factors are computed for the fractional probit estimates in Table 3.