Specification
This document clarifies the difference between identification and specification.
Terms such as simple difference, difference-in-differences, and multivariate regression (or OLS)
are typically used by evaluators to describe the impact evaluation method, that is, the
identification strategy. Because we sometimes use these same terms when talking about an
econometric specification, it is worth explaining the difference between identification and
specification. Broadly, an identification strategy is the overarching impact evaluation method
used to identify causal impact. Specification refers to the precise details of how an analysis
is done. Thus we can have several different specifications coming out of the same randomized
trial: in all of them the identification strategy is randomization, but the specification differs.
Specification
When we analyze data to estimate a relationship (for example, the wage gap between men and
women), we estimate a specific equation. This is known as the specification or functional form.
Sometimes the relationship between a characteristic (e.g. years of education) and an outcome (e.g.
income) is estimated to suggest a causal relationship, but not necessarily. It can also be used just
to identify a correlation, that is, a non-causal relationship. For example, there are many ways to look at
the difference between wages of males and wages of females, and many equations that could
estimate those differences.
It is possible to look at (a) the simple difference in wages between men and women (equation 1,
where $\beta_0$ equals the wage for men, and $\beta_1$ is the absolute difference for females), (b) how the wage
difference has changed over a period of time, the difference-in-differences (equation 2), or (c) the
difference holding differences in education and career choice constant, using a multivariate
regression for the estimate (equation 3).
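(1) $\text{wage}_i = \beta_0 + \beta_1 \text{female}_i + \varepsilon_i$, so that $\text{gap} = (\beta_0 + \beta_1) - \beta_0 = \beta_1$
(2) $\Delta \text{wage}_i = \beta_0 + \beta_1 \text{female}_i + \varepsilon_i$
(3) $\text{wage}_i = \beta_0 + \beta_1 \text{female}_i + \beta_2 \text{education}_i + \beta_3 \text{sector}_i + \varepsilon_i$
where $\text{female}_i$ equals 1 for women and 0 for men, $\Delta \text{wage}_i$ is the change in wages over the period, and $\varepsilon_i$ is the error term.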
Which specification we prefer depends on our question: are we interested in (1) the overall
difference between men's and women's wages, (2) how the growth of wages differs between men
and women, or (3) how wages differ for men and women in the same sector with the same amount
of education?
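The three specifications can be compared on simulated data. The sketch below is illustrative only: the data-generating process, the variable names (female, education, sector, wage_t0, wage_t1) and all numeric values are assumptions made up for the example, not figures from any study. It fits each specification by ordinary least squares with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process (every number here is invented).
female = rng.integers(0, 2, size=n)                          # 1 = female, 0 = male
education = 12 + 2 * rng.standard_normal(n) - 0.5 * female   # years of schooling
sector = rng.integers(0, 2, size=n)                          # 1 = higher-paying sector
wage_t0 = (10 + 1.0 * education + 3.0 * sector
           - 2.0 * female + rng.standard_normal(n))          # wage in the first period
growth = 1.0 - 0.3 * female + 0.5 * rng.standard_normal(n)   # wage change over time
wage_t1 = wage_t0 + growth                                   # wage in the second period


def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# (1) Simple difference: wage on a female indicator only.
b_simple = ols(wage_t1, female)
# (2) Difference-in-differences: change in wage on the female indicator.
b_did = ols(wage_t1 - wage_t0, female)
# (3) Multivariate regression: hold education and sector constant.
b_multi = ols(wage_t1, female, education, sector)

print("simple difference:        ", b_simple[1])
print("difference-in-differences:", b_did[1])
print("multivariate regression:  ", b_multi[1])
```

Each coefficient on the female indicator answers a different question, so the three numbers are not expected to agree. In this simulated setup the simple difference also picks up the assumed education gap, while the multivariate regression does not.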
When we are analyzing results from an impact evaluation we often have choices about the precise
specification to use in analyzing results. Just as in the example above, we can compare simple
differences (this time between treatment and control groups rather than men vs women), we can
look at changes in outcomes for treatment vs comparison, or we could compare differences after
controlling for baseline characteristics like education and sector.
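Written out, the treatment-versus-comparison analogues of the three specifications might look as follows. The notation is an assumption introduced for illustration: $T_i$ is a treatment indicator (1 for treatment, 0 for comparison), $Y_i$ is the outcome at endline, and $\Delta Y_i$ is the change in the outcome from baseline to endline.

```latex
\begin{align}
  % Simple difference between treatment and comparison groups
  Y_i &= \beta_0 + \beta_1 T_i + \varepsilon_i \\
  % Change in outcomes for treatment vs comparison
  \Delta Y_i &= \beta_0 + \beta_1 T_i + \varepsilon_i \\
  % Difference after controlling for baseline characteristics
  Y_i &= \beta_0 + \beta_1 T_i + \beta_2 \mathit{education}_i
        + \beta_3 \mathit{sector}_i + \varepsilon_i
\end{align}
```

In each case $\beta_1$ is the coefficient of interest; only the outcome variable and the set of controls change.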
Identification
When conducting an impact evaluation, we are attempting to test causal hypotheses. In doing
so, the choice of method reflects our identification strategy. As a general definition in economics,
the identification problem is the challenge of identifying the true value of a parameter. In impact
evaluation, Angrist and Krueger (1999) describe the problem as identifying the causal effects
from specific events or situations. They then define the identification strategy as how researchers
use non-experimental data to approximate the force of evidence generated by an actual
experiment. 1
Methods such as multivariate regression can also be used in conjunction with
randomized evaluations. In that case they are not used to identify or construct a statistically equivalent
comparison group; instead, random assignment is the identification strategy. In such cases, the
methods describe how the analysis is conducted, that is, the specification. Thus we might do a
randomized evaluation of microcredit in which our main identification strategy is randomization. The
main part of the analysis will be a comparison of outcomes for those randomized to receive
microcredit and those who don't receive microcredit. But our precise specification may also
include controls for other factors such as age and education. Thus we can use a multivariate
regression specification within a randomized evaluation, but the main identification comes from
randomization. Similarly, if we compare changes in profits for those with microcredit to changes
in profits for those without, we are using a difference-in-differences specification within a
randomized identification strategy. Different specifications within a single randomized evaluation
usually produce very similar estimates of impact.
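A small simulation can illustrate why this tends to happen. The sketch below assumes a hypothetical microcredit trial: the sample size, the true effect on profits, and the roles of age and education are all invented for the example. Because assignment is random, treatment is uncorrelated with baseline characteristics, so all three specifications target the same effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
true_effect = 5.0  # hypothetical effect of microcredit on profits

# Baseline characteristics (invented distributions).
age = rng.normal(35, 8, size=n)
education = rng.normal(8, 3, size=n)

# Identification strategy: random assignment to treatment.
treated = rng.integers(0, 2, size=n)

baseline_profit = 20 + 0.2 * age + 1.5 * education + rng.normal(0, 5, size=n)
endline_profit = (baseline_profit + 2.0 + true_effect * treated
                  + rng.normal(0, 5, size=n))


def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Three specifications, one identification strategy (randomization).
simple = ols(endline_profit, treated)[1]                    # simple difference
did = ols(endline_profit - baseline_profit, treated)[1]     # difference-in-differences
controlled = ols(endline_profit, treated, age, education)[1]  # with controls

print("simple difference:        ", simple)
print("difference-in-differences:", did)
print("with age and education:   ", controlled)
```

The estimates differ mainly in precision: specifications that absorb more of the baseline variation in profits tend to give less noisy estimates of the same effect.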
1 Angrist, Joshua D., and Alan B. Krueger (1999): "Empirical Strategies in Labor Economics," in Handbook of Labor Economics, ed. Orley C. Ashenfelter and David Card, vol. 3. North Holland, Amsterdam.
Conclusion
An equation (or series of equations) can describe the relationship between several factors. The
relationship can exist by definition (e.g. the gender wage gap equals the ratio of female wages to
male wages), or it can be estimated (using data); it can be linear or non-linear. For estimates, the
equation is called the specification. If we are trying to establish a causal relationship between two
factors, then we need to employ an identification strategy. Sometimes identification is achieved
through the evaluation design (as is the case in experimental methods). When comparing
evaluation methods, we discuss simple difference, difference-in-differences, regressions, etc.
Technically, these terms describe specifications, but when they are used to describe the method of
evaluation, they should be understood as non-experimental identification strategies.