
Identification vs. Specification
This document clarifies the difference between identification and specification.
Terms such as simple difference, difference-in-differences, and multivariate regression (or OLS)
are typically used by evaluators to describe the impact evaluation method, that is, the
identification strategy. Because we sometimes use these same terms when talking about an
econometric specification, it is worth explaining how the two concepts differ.
Broadly, an identification strategy is the overarching impact evaluation method
being used to identify causal impact. Specification refers to the precise details of how an analysis
is done. Thus we can have several different specifications coming out of the same randomized
trial: in all of them the identification strategy is randomization, but the specification differs.

Specification

When we analyze data to estimate a relationship (for example, the wage gap between men and
women), we estimate a specific equation. This is known as the specification or functional form.

Sometimes the relationship between a characteristic (e.g. years of education) and an outcome (e.g.
income) is estimated to suggest a causal relationship, but this is not always the case. An equation
can also be used just to describe a correlation, that is, a non-causal relationship. For example,
there are many ways to look at the difference between wages of males and wages of females, and
many equations that could estimate those differences.

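It is possible to look at (a) the simple difference in wages between men and women (equation 1,
where $\beta_0$ equals the wage for men, and $\beta_1$ is the absolute difference for females),
(b) how the wage difference has changed over a period of time, that is, the
difference-in-differences (equation 2), or (c) the difference, holding differences in education
and career choice constant, using a multivariate regression for the estimate (equation 3).

(1) $\text{wage}_i = \beta_0 + \beta_1\,\text{female}_i + \varepsilon_i$, so that the female-male gap is $(\beta_0 + \beta_1) - \beta_0 = \beta_1$
(2) $\Delta\text{wage}_i = \beta_0 + \beta_1\,\text{female}_i + \varepsilon_i$
(3) $\text{wage}_i = \beta_0 + \beta_1\,\text{female}_i + \beta_2\,\text{education}_i + \gamma\,\text{career}_i + \varepsilon_i$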

Which specification we prefer depends on our question: are we interested in (1) the overall
difference between men's and women's wages, (2) how the growth of wages differs between men
and women, or (3) how wages differ for men and women in the same sector with the same amount
of education?
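
As a rough illustration of how the three specifications answer different questions, the sketch below estimates each one on simulated data. Everything in it is an assumption made for the example: the data-generating numbers, the variable names, the small `ols` helper, and the use of education as the only control (career choice is left out for brevity). It is not drawn from any real wage data.

```python
import random

def ols(y, columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        pivot = a[p][p]
        a[p] = [v / pivot for v in a[p]]
        for r in range(k):
            if r != p:
                factor = a[r][p]
                a[r] = [vr - factor * vp for vr, vp in zip(a[r], a[p])]
    return [a[i][k] for i in range(k)]

# Simulated (hypothetical) workers observed in two periods.
random.seed(0)
female, educ, wage_then, wage_now = [], [], [], []
for _ in range(4000):
    f = float(random.randint(0, 1))
    e = random.gauss(12.0 - f, 2.0)               # assumed: women average one year less schooling
    w0 = 10.0 + e - 2.0 * f + random.gauss(0, 1)  # earlier wage
    w1 = w0 + 1.0 + 0.5 * f + random.gauss(0, 1)  # later wage; women's wages grow 0.5 faster
    female.append(f); educ.append(e); wage_then.append(w0); wage_now.append(w1)

change = [w1 - w0 for w0, w1 in zip(wage_then, wage_now)]

simple_diff = ols(wage_now, [female])[1]        # equation 1; -2.5 by construction
diff_in_diff = ols(change, [female])[1]         # equation 2; +0.5 by construction
controlled = ols(wage_now, [female, educ])[1]   # equation 3; -1.5 by construction

print("simple difference:        ", round(simple_diff, 2))
print("difference-in-differences:", round(diff_in_diff, 2))
print("controlling for education:", round(controlled, 2))
```

The three coefficients on `female` differ because each specification measures a different quantity, not because one of them is mistaken.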

When we analyze results from an impact evaluation, we often have choices about the precise
specification to use. Just as in the example above, we can compare simple
differences (this time between treatment and control groups rather than men vs women), we can
look at changes in outcomes for treatment vs comparison, or we could compare differences after
controlling for baseline characteristics like education and sector.
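
Written out, these three choices might look like the following. The notation is an assumption introduced here for illustration: $y_i$ is the outcome, $\Delta y_i$ its change since baseline, and $T_i$ equals 1 for the treatment group and 0 for the comparison group. The coefficients take different values in each equation.

```latex
\begin{align}
y_i        &= \beta_0 + \beta_1 T_i + \varepsilon_i
             && \text{(simple difference)} \\
\Delta y_i &= \beta_0 + \beta_1 T_i + \varepsilon_i
             && \text{(difference-in-differences)} \\
y_i        &= \beta_0 + \beta_1 T_i + \beta_2\,\text{education}_i
              + \gamma\,\text{sector}_i + \varepsilon_i
             && \text{(with baseline controls)}
\end{align}
```

In each case $\beta_1$ is the estimated difference associated with treatment; whether it can be read as a causal impact depends on the identification strategy, not on the equation.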

Identification

When conducting impact evaluations, we are attempting to test causal hypotheses, and the
choice of method reflects our identification strategy. As a general definition in economics,
the identification problem is the challenge of identifying the true value of a parameter. In impact
evaluation, Angrist and Krueger (1999) describe the problem as identifying the causal effects
from specific events or situations. They then define the identification strategy as how researchers
use non-experimental data to approximate the force of evidence generated by an actual
experiment. [1]

When methods such as difference-in-differences or OLS regression are described as impact
evaluation methods, and there is no explicit mention of random assignment, it can generally be
inferred that these methods are being used as part of a non-experimental (non-randomized)
evaluation. In other words, there is a comparison group that was not randomly assigned. In these
cases, the identification strategy reflects how the non-randomly assigned comparison group is
either identified or treated in the analysis. So, for example, if multivariate regression is used as an
identification strategy for evaluating the impact of microcredit on profits, we would compare profits
of women entrepreneurs with and without microcredit and control for the other explanatory variables
we have, such as the age and education of the women. Our identifying assumption is that everything
that differs between women with and without microcredit and that could affect profits is captured
by our control variables, so that the only remaining difference is take-up of microcredit.
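
To make the identifying assumption concrete, the sketch below simulates a non-experimental microcredit comparison in which it fails. All of it is hypothetical: the numbers, the variable names, the `ols` helper, and in particular the unobserved trait called `ability`, which is assumed to raise both take-up and profits. The true effect of microcredit is set to 5 by construction.

```python
import random

def ols(y, columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        pivot = a[p][p]
        a[p] = [v / pivot for v in a[p]]
        for r in range(k):
            if r != p:
                factor = a[r][p]
                a[r] = [vr - factor * vp for vr, vp in zip(a[r], a[p])]
    return [a[i][k] for i in range(k)]

random.seed(1)
TRUE_EFFECT = 5.0  # assumed true impact of microcredit on profits
age, educ, ability, credit, profit = [], [], [], [], []
for _ in range(4000):
    a_i = random.uniform(20, 60)
    e_i = random.gauss(8, 3)
    u_i = random.gauss(0, 1)  # hypothetical trait the evaluator cannot observe
    # Take-up is NOT random: educated, high-ability women borrow more often.
    takes = 1.0 if 0.3 * (e_i - 8) + u_i + random.gauss(0, 1) > 0 else 0.0
    p_i = 20 + 0.1 * a_i + e_i + TRUE_EFFECT * takes + 3.0 * u_i + random.gauss(0, 2)
    age.append(a_i); educ.append(e_i); ability.append(u_i)
    credit.append(takes); profit.append(p_i)

# Feasible regression: controls only for what we observe (age, education).
observed_only = ols(profit, [credit, age, educ])[1]
# Infeasible benchmark: also controls for the unobserved trait.
with_ability = ols(profit, [credit, age, educ, ability])[1]

print("controls for age and education only:", round(observed_only, 2))
print("also controlling for ability:       ", round(with_ability, 2))
```

In this setup the regression with observed controls overstates the effect, because part of the profit gap reflects ability rather than credit. The specification is the same multivariate regression in both cases; what changes is whether the identifying assumption holds.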

Specification and Identification

It is also possible that methods such as multivariate regression are used in conjunction with
randomized evaluations. Here they are not used to identify or construct a statistically equivalent
comparison group; instead, random assignment is the identification strategy. In such cases, the
methods describe how the analysis is conducted, that is, the specification. Thus we might do a
randomized evaluation of microcredit in which our main identification strategy is randomization. The
main part of the analysis will be a comparison of outcomes for those randomized to receive
microcredit and those who don't receive microcredit. But our precise specification may also
include controls for other factors such as age and education. Thus we can use a multivariate
regression specification within a randomized evaluation, but the main identification comes from
randomization. Similarly, if we compare changes in profits for those with microcredit to changes
in profits for those without, we are using a difference-in-differences specification within a
randomized identification strategy. Different specifications within a single randomized evaluation
usually produce very similar estimates of impact.
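
The sketch below illustrates why different specifications tend to agree once assignment is random. It is a simulation under stated assumptions, not a real evaluation: the numbers, the variable names, the `ols` helper, and the unobserved trait `u_i` are all hypothetical, and the true effect of being assigned microcredit is set to 5.

```python
import random

def ols(y, columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        pivot = a[p][p]
        a[p] = [v / pivot for v in a[p]]
        for r in range(k):
            if r != p:
                factor = a[r][p]
                a[r] = [vr - factor * vp for vr, vp in zip(a[r], a[p])]
    return [a[i][k] for i in range(k)]

random.seed(2)
TRUE_EFFECT = 5.0  # assumed true impact of being assigned microcredit
treat, age, educ, baseline, endline = [], [], [], [], []
for _ in range(4000):
    t_i = float(random.randint(0, 1))  # random assignment, unrelated to any trait
    a_i = random.uniform(20, 60)
    e_i = random.gauss(8, 3)
    u_i = random.gauss(0, 1)           # unobserved trait, balanced across arms on average
    level = 20 + 0.1 * a_i + e_i + 3.0 * u_i
    b_i = level + random.gauss(0, 2)                            # profit before the program
    y_i = level + 2.0 + TRUE_EFFECT * t_i + random.gauss(0, 2)  # profit after the program
    treat.append(t_i); age.append(a_i); educ.append(e_i)
    baseline.append(b_i); endline.append(y_i)

change = [y - b for b, y in zip(baseline, endline)]

simple_diff = ols(endline, [treat])[1]             # treatment vs control means
with_controls = ols(endline, [treat, age, educ])[1]  # multivariate regression
diff_in_diff = ols(change, [treat])[1]             # change in profits, treatment vs control

print("simple difference:        ", round(simple_diff, 2))
print("with age and education:   ", round(with_controls, 2))
print("difference-in-differences:", round(diff_in_diff, 2))
```

The three estimates differ mainly in precision: controls and baseline data absorb noise, but none of them is needed to remove bias, because randomization already balances the groups in expectation.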

[1] Angrist, Joshua D., and Alan B. Krueger (1999): Empirical Strategies in Labor Economics, in Handbook of Labor
Economics, ed. Orley C. Ashenfelter and David Card, vol. 3. North Holland, Amsterdam.

Conclusion

An equation (or series of equations) can describe the relationship between several factors. The
relationship can exist by definition (e.g. the gender wage gap equals the ratio of female wages to
male wages), or it can be estimated (using data); it can be linear or non-linear. For estimates, the
equation is called the specification. If we are trying to establish a causal relationship between two
factors, then we need to employ an identification strategy. Sometimes identification is achieved
through the evaluation design (as is the case in experimental methods). When comparing
evaluation methods, we discuss simple difference, difference-in-differences, regressions, etc.
Technically, these terms describe specifications. When they are used to name the method of
evaluation, however, they should be understood as the non-experimental identification strategy.
