JHE Vol. 42 (July 2015)

Volume 42, July 2015. ISSN: 0167-6296

JOURNAL OF HEALTH ECONOMICS

Publication information: Journal of Health Economics (ISSN 0167-6296). For 2015, volumes 39–44 are scheduled for publication. Journal of Health Economics is published bimonthly (January, March, May, July, September and November) by Elsevier B.V. (P.O. Box 211, 1000 AE Amsterdam, The Netherlands).

Aims and Scope
This Journal seeks articles related to the economics of health and medical care. Its scope will include the following topics: production of health and health services; demand and utilization of health services; financing of health services; measurement of health; behavioral models of demanders, suppliers and other health care agencies; health behaviors and policy interventions; efficiency and distributional aspects of health policy; and such other topics as the Editors may deem appropriate. Applications to problems in both developed and less-developed countries are welcomed.

Editors
J. CAWLEY, Department of Policy Analysis and Department of Economics, Cornell University, Ithaca, NY, USA. E-mail: johncawley@cornell.edu
M. CHALKLEY, Centre for Health Economics, University of York, Heslington, York, UK. E-mail: martin.chalkley@york.ac.uk
M.E. CHERNEW, Department of Health Care Policy, Harvard Medical School, Boston, MA, USA. E-mail: chernew@hcp.med.harvard.edu
D. CUTLER, Department of Economics, Harvard University, Cambridge, MA, USA. E-mail: dcutler@harvard.edu
E. MEARA, The Dartmouth Institute of Health Policy & Clinical Practice, Dartmouth College, Lebanon, NH, USA. E-mail: ellen.r.meara@dartmouth.edu
N. RICE, Centre for Health Economics, University of York, Heslington, York, UK. E-mail: nigel.rice@york.ac.uk
L. SICILIANI, Department of Economics and Related Studies, University of York, Heslington, York, UK. E-mail: luigi.siciliani@york.ac.uk
A.D. STREET, Centre for Health Economics, University of York, Heslington, York, UK. E-mail: andrew.street@york.ac.uk

Associate Editors
J.E. ASKILDSEN, University of Bergen, Bergen, Norway.
K. BAICKER, Harvard School of Public Health, Boston, MA, USA.
P.P. BARROS, Universidade Nova de Lisboa, Lisbon, Portugal.
A. BASU, University of Washington, Seattle, WA, USA.
H. BLEICHRODT, Erasmus University, Rotterdam, The Netherlands.
R.P. ELLIS, Boston University, Boston, MA, USA.
J. GLAZER, Tel Aviv University, Tel Aviv, Israel.
S. GLIED, Columbia University, New York, NY, USA.
M. GROSSMAN, National Bureau of Economic Research, New York, NY, USA.
J. GRUBER, Massachusetts Institute of Technology, Cambridge, MA, USA.
R. KAESTNER, University of Illinois at Chicago, Chicago, IL, USA.
M. KIFMANN, University of Hamburg, Hamburg, Germany.
A. LLERAS-MUNEY, University of California at Los Angeles, Los Angeles, CA, USA.
J. MULLAHY, University of Wisconsin-Madison, Madison, WI, USA.
O.A. O’DONNELL, Erasmus University, Rotterdam, The Netherlands.
P. OLIVELLA, Universitat Autònoma de Barcelona, Barcelona, Spain.
C. PROPPER, University of Bristol, Bristol, UK.
A. SCOTT, University of Melbourne, Victoria, Australia.

The Journal of Health Economics has no page charges.
For a full and complete Guide for Authors, please go to: http://www.elsevier.com/locate/Jhe
doi:10.1016/S0167-6296(15)00058-2
Journal of Health Economics 42 (2015) A1
Editorial statement on negative findings
The Editors of the health economics journals named below believe that well-designed, well-executed empirical studies that address interesting and important problems in health economics, utilize appropriate data in a sound and creative manner, and deploy innovative conceptual and methodological approaches compatible with each journal’s distinctive emphasis and scope have potential scientific and publication merit regardless of whether such studies’ empirical findings do or do not reject null hypotheses that may be specified. As such, the Editors wish to articulate clearly that the submission to our journals of studies that meet these standards is encouraged.

We believe that publication of such studies provides properly balanced perspectives on the empirical issues at hand. Moreover, we believe that this should reduce the incentives to engage in two forms of behavior that we feel ought to be discouraged in the spirit of scientific advancement:

1. Authors withholding from submission such studies that are otherwise meritorious but whose main empirical findings are highly likely “negative” (e.g. null hypotheses not rejected).
2. Authors engaging in “data mining,” “specification searching,” and other such empirical strategies with the goal of producing results that are ostensibly “positive” (e.g. null hypotheses reported as rejected).

Henceforth we will remind our referees of this editorial philosophy at the time they are invited to review papers. As always, the ultimate responsibility for acceptance or rejection of a submission rests with each journal’s Editors.

American Journal of Health Economics
European Journal of Health Economics
Forum for Health Economics & Policy
Health Economics Policy and Law
Health Economics Review
Health Economics
International Journal of Health Economics and Management
Journal of Health Economics

http://dx.doi.org/10.1016/j.jhealeco.2015.06.002
0167-6296/© 2015 Published by Elsevier B.V.
Journal of Health Economics 42 (2015) 1–16

Information disclosure and peer effects in the use of antibiotics

Illoong Kwon ∗ , Daesung Jun
Graduate School of Public Administration, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 151-742, Republic of Korea
Article info

Article history: Received 21 June 2013; Received in revised form 3 June 2014; Accepted 24 October 2014; Available online 24 February 2015
JEL classification: I1; L1; D8
Keywords: Information disclosure; Peer effects; Antibiotic overuse

Abstract

Mandatory information disclosure may allow sellers to observe and respond to other sellers’ attributes (seller peer effects) as well as informing consumers of the sellers’ attributes (consumer learning effect). Using the data from mandatory information disclosure of antibiotic prescription rates for the common cold in Korea, this paper shows that while average prescription rates decreased after the disclosure, more than 30% of the clinics increased their antibiotic prescriptions. Moreover, clinics that were prescribing relatively fewer antibiotics than other local clinics before the disclosure requirement were more likely to increase their prescription rate. The average prescription rates also declined less in markets with stronger clinic competition. These results are consistent with seller peer effects.

© 2015 Elsevier B.V. All rights reserved.
∗ Corresponding author. Tel.: +82 2 880 8551. E-mail addresses: ilkwon@snu.ac.kr (I. Kwon), dswin27@snu.ac.kr (D. Jun).

1. Introduction

When sellers have more information than buyers on the attributes of a product, the sellers can overstate the product’s quality and overcharge the buyers. Such an information asymmetry problem can lead to the collapse of markets (Stigler, 1961), distort investment decisions, and undermine the quality and safety of products and services including health care, foods, education, and the environment. Therefore, there is an increasing use of mandatory information disclosure as a regulatory mechanism to address this information asymmetry problem and to improve the quality of products and services.

However, mandatory information disclosure can reveal the attributes of products and services not only to consumers but also to other competing sellers. That is, even though the previous literature has largely focused on the effects of information disclosure to consumers, mandatory information disclosure can directly affect the interaction among sellers. In particular, when a seller learns that most other sellers were providing lower quality services, the seller may reduce its quality after the information disclosure.

In this paper, we consider a simple theoretical framework to distinguish whether (i) consumers learn the attributes of sellers from mandatory information disclosure and pressure the sellers to improve quality (called consumer learning effects) or (ii) sellers learn their competitors’ attributes from mandatory information disclosure and influence each other’s quality (called seller peer effects). Even though consumer learning effects suggest that mandatory information disclosure should increase the quality of all sellers, we show that seller peer effects may decrease the quality of some sellers.

Therefore, when introducing mandatory information disclosure, it is important for policy makers to understand the existence and the extent of seller peer effects. For example, in markets where sellers themselves do not know the attributes of other sellers, an information disclosure policy can introduce seller peer effects as well as consumer learning effects. Moreover, if the disclosed information is difficult for consumers to find or interpret, seller peer effects can dominate consumer learning effects.

Before we proceed further, it is worth clarifying the definition of peer effects in this paper. We define peer effects as a situation where an individual’s behavior or decisions are influenced by others’ behavior in a relevant peer group, called “endogenous peer effects” by Manski (1993). Such peer effects can arise from an intrinsic social preference for behaving like others. Such peer effects can also arise from rational decisions to obtain higher economic payoffs. For example, following the behavior of others can be cost-efficient and rational (Bikhchandani et al., 1992). While some may argue that peer effects arising from social preference are the true peer effects, we are more interested in whether consumers learn about sellers’ behavior from mandatory information disclosure or
the competing sellers do. Therefore, in this paper, we do not distinguish between peer effects based on social preference and peer effects based on rational or strategic choice.

Empirically, we examine the effects of the 2006 mandatory public disclosure of the antibiotic prescription rates for the common cold of every clinic and hospital in Korea. In 2012, the director general of the World Health Organization (WHO) warned that overuse of antibiotics has led to widespread drug-resistant pathogens that are more difficult, toxic, and costly to treat.¹ However, antibiotics are still frequently prescribed for the common cold, often because of patient demands and hospital competition, even though they are not useful for fighting infections caused by viruses like the common cold, most sore throats, and bronchitis (Bennett et al., 2011; Robohm and Ruff, 2012). Thus, on February 9th of 2006, the Ministry of Health and Welfare in Korea began disclosing antibiotic prescription rates for the common cold online through the public disclosure website of the Health Insurance Review and Assessment Service (HIRA).

On average, we find that the antibiotic prescription rates for the common cold have decreased from 60% to 51% after the information disclosure. Surprisingly, however, we uncover a large amount of heterogeneity among the clinics. More than 30% of clinics have increased their antibiotic prescription rates after the information disclosure. In particular, among clinics whose antibiotic prescription rates were in the lowest quartile of local clinics before disclosure, almost half of them increased their prescription rates after disclosure. This finding is more consistent with seller (or clinic) peer effects. That is, when a clinic finds out that other clinics were prescribing relatively more antibiotics than itself, it is more likely to increase its antibiotic prescriptions.

Alternatively, consumers may prefer higher antibiotic prescription rates, and may have pressured the lower-than-average antibiotic prescribing clinics to increase their prescription rates. However, the evidence shows that for those clinics that were prescribing antibiotics relatively more than other local clinics before the information disclosure, consumers started visiting those clinics less after the disclosure. Moreover, in townships where consumers responded more negatively to the antibiotic prescription rates, the average antibiotic prescription rates decreased more. These results suggest not only that consumers learned from the information disclosure, but also that informed consumers prefer lower antibiotic prescription rates for the common cold.

We also find that in townships with relatively more clinics, the average antibiotic prescription rates after the information disclosure decreased less. This result suggests that stronger competition led to relatively higher antibiotic prescription rates and that the clinic peer effects triggered by mandatory information disclosure have reinforced this competition effect.

Overall, the empirical evidence supports both consumer learning effects and seller peer effects. The previous literature has implicitly assumed that sellers can observe their competitors’ attributes even before mandatory information disclosure, and has focused on consumer learning effects. This paper contributes to the literature by showing that when sellers cannot observe the attributes of their competitors’ products and services, mandatory information disclosure can allow the sellers to learn their competitors’ attributes and potentially trigger perverse peer effects. Because the seller peer effects can cancel out some of the consumer learning effects, our results may also explain why some previous studies have found no significant effect of information disclosure.

2. Background and previous literature

2.1. Information disclosure

Most previous literature has focused on the effect of information disclosure on consumers, or consumer learning effects. For example, without mandatory information disclosure, consumers may not be able to observe the quality of a product. Then, as Akerlof (1970) shows, firms cannot benefit from high quality and may leave the market, which can lead to the collapse of the whole market, called the lemon problem. In this case, quality information disclosure to consumers would benefit high quality firms, and provide incentives to improve quality.

Also, quality information disclosure can allow consumers to identify high quality firms more easily, and make them more sensitive to differences in quality. Then, information disclosure can lead to more competition among firms and may improve the quality of products (see, e.g., Stigler, 1961; Butters, 1977; Salop and Stiglitz, 1977; Jin and Leslie, 2003).²

However, the empirical evidence on the effect of information disclosure on quality (or other performance measures) is generally mixed. For example, Chipty and Witte (1998) find that information availability on the quality of child care has no significant effect on the quality of the care. However, Jin and Leslie (2003) find that information disclosure on restaurants’ hygiene has significantly improved their hygiene.

In the health care industry, Vladeck et al. (1988) do not find any significant differences in the occupancy rates between high- and low-mortality rate hospitals after the release of the HCFA (Health Care Financing Administration) data on hospital-specific mortality, while Mennemeyer et al. (1997) do find a small but significant effect. Longo et al. (1977) examine the impact of an obstetrics consumer report on hospital behavior in Missouri, and find that half of the hospitals improved the quality of their hospital care. Shekelle et al. (2008) provide a systematic survey of more recent studies, but show mixed results as well. In the electricity industry, many states in the US require electricity providers to disclose price and fuel mix so that consumers can compare prices and environmental impacts. However, these disclosure policies have not induced much consumer switching (Bird, 2009).

Note that the previous literature on mandatory information disclosure has mainly focused on the changes in consumers’ behavior from learning new information on product quality (consumer learning effect), which can induce changes in firms’ behavior. Few studies, however, have considered the direct effect of information disclosure on firms’ behavior. Some exceptions include the studies on the effect of information disclosure on firms’ collusion (see, e.g., Albaek et al., 1997; Njoroge, 2003).³ However, these studies do not explain why firms often oppose mandatory information disclosure.⁴ Consequently, when the effect of mandatory information disclosure is insignificant, studies often blame consumers’ mistrust, disinterest, or lack of understanding (see, e.g., Hibbard and Jewett, 1997; Marshall et al., 2000).

Jun and Chung (2011) also analyze the effect of information disclosure on antibiotic prescription rates in Korea. However, they focus on the change in average prescription rates, and do not analyze clinic heterogeneity or consumer response. More importantly, they do not consider the difference between consumer learning effects and seller peer effects.

2.2. Peer effects

Information disclosure regulations can provide new information to competing firms as well as to consumers. In the presence of peer effects, learning competitors’ behavior can directly affect firms’ behavior.

As Fortin et al. (2007) summarize, peer effects can arise for several reasons. In the context of antibiotic prescription for the common cold, doctors may feel less guilt and prescribe more antibiotics when they find that other doctors are prescribing potentially ineffective antibiotics as well (the social conformity effect).⁵ Also, when doctors find out that their peers are prescribing antibiotics for the common cold, they can learn that prescribing antibiotics for the common cold may have benefits without strong side effects, may not lead to drug-resistant bacteria, and may not induce strong regulatory resistance (the social learning effect). Finally, when other doctors prescribe antibiotics to attract more patients instead of educating the patients on the ineffectiveness and potential harms of antibiotics, doctors may feel it is unfair and become more likely to prescribe antibiotics to restore equity (the fairness effect).⁶

As Schelling (1978) and Akerlof (1980) point out, these peer effects can generate a “social multiplier effect”, as observing other doctors’ antibiotic prescription rates would encourage and reinforce the increase in prescription rates even further (see also Glaeser et al., 2003; Fischer and Huddart, 2008). Malani et al. (2008) argue that overuse of antibiotics is due to a social norm established among doctors and patients, and is difficult to change in the short-term.

Empirically, there is increasing evidence of peer effects in economics.⁷ However, the observed correlation among clinics’ prescription rates does not necessarily imply peer effects, because clinics in a given market may be subject to similar unobserved characteristics and shocks (exogenous effects and correlated effects). As Manski (1993) shows, it is generally difficult to distinguish endogenous peer effects from exogenous or correlated effects.

In this paper, we exploit the introduction of an information disclosure regulation. Note that the regulation can disclose new information (on prescription rates in our context) not only to consumers but also to competing clinics. If the regulation does disclose new information to the competing clinics (that is, the competing clinics did not know the disclosed information before), in the presence of peer effects, the disclosed information would trigger social multiplier effects. However, the regulation would not change clinic or market characteristics, and consequently should not change the exogenous or correlated effects. Thus, the change in antibiotic prescription rates after the regulation would be due to the endogenous peer effects, not to the exogenous or correlated effects. On the other hand, the information disclosure would also change the consumers’ demand function. Therefore, we still need to distinguish between the changes due to clinic peer effects and the changes due to consumer learning effects.

Mas and Moretti (2009) distinguish between two types of peer effects. When a worker is observed by a high productivity worker, they find that the productivity of the worker being observed increases. However, when a worker observes another highly productive worker, the productivity of the first worker does not increase. In other words, they find a significant social pressure effect, but no social conformity effect. It is not clear whether this pattern will hold in other work environments, but their study shows that it is important to distinguish between the social pressure effect and the social conformity effect in discussing peer effects.

2.3. Antibiotic overuse and the common cold

The common cold, or Acute Upper Respiratory Tract Infection (ARTI), is one of the most common illnesses known to humans and one of the most common reasons patients visit hospitals. Annually, about $227 million is estimated to be spent on antibiotics for the treatment of the common cold in the United States.⁸ However, antibiotics do not treat upper respiratory infections caused by viruses like the common cold. Controlled clinical trials have consistently shown that antibiotic therapy does not treat the common cold. In addition, antibiotics may have caused many complications and side-effects. In particular, the overuse of antibiotics has led to a rise in antibiotic-resistant bacteria (Gonzales et al., 2001). Infections due to penicillin-resistant bacteria are especially difficult to treat. In the United States, 4–7 billion dollars are spent on the treatment of resistant infections each year (Lautenbach et al., 2001; Bennett et al., 2011).

The causes for the overuse of antibiotics, especially for the common cold, are controversial. Patients may demand antibiotics to mask symptoms and gain psychological comfort (Butler et al., 1998). Doctors may overprescribe antibiotics to retain their patients (Brody, 2005). Also, when it is not clear whether the infection is caused by a virus (that cannot be treated by antibiotics) or bacteria (that can be treated by antibiotics), doctors may simply prescribe antibiotics to avoid time-consuming medical tests to discern whether the infection is bacterial or viral. Or doctors may prescribe antibiotics in order to avoid potential lawsuits and conflicts for not providing antibiotics when it turns out to be a bacterial infection later. Currie et al. (2012) show that financial kickbacks (or rebates) from pharmaceutical companies for prescribing antibiotics are also important causes of overprescription in China.

According to OECD Health Data 2011, Korea has the sixth highest rate of antibiotic use among OECD countries. In particular, the average antibiotic prescription rates for the common cold in 2005 were over 60%. Consequently, the prevalence of S. pneumoniae with reduced susceptibility to penicillin is an alarming 70% in Korea, compared to 25% in the United States (Conly, 1998).

¹ Available from http://www.cbsnews.com/8301-504763_162-57398949-10391704/who-antibiotic-overuse-so-prevalent-scraped-knee-could-be-deadly/
² On the other hand, quality information disclosure may allow consumers to perceive the difference between firms, and increase product differentiation among firms. Then information disclosure would reduce competition (Nelson, 1974; Jin and Leslie, 2003).
³ There is also a theoretical literature that shows firms would disclose their quality voluntarily if they know each other’s quality, called the unraveling effect (Grossman and Hart, 1980; Milgrom, 1981). Therefore, it is a theoretical puzzle why firms in reality do not disclose their quality (see Board, 2009). Our empirical evidence suggests that firms may not know each other’s quality (see also Matthews and Postlewaite, 1985; Shavell, 1994).
⁴ For example, in 2000 the National Hospital Association opposed a proposal to impose mandatory information disclosure on fatal and other serious medical errors (CNN News, February 22, 2000). In 1998, the National Restaurant Association strongly opposed the mandatory display of hygiene “grade cards” (Food Council News, Vol. 5, Issue 1, January 2002). In 2006, the Korean Congress attempted to pass a bill mandating a notice for antibiotics on the prescription, but failed mostly due to opposition by the hospitals. (See http://www.yakup.co.kr/news/index.html?cat=11&cat2=51&cat3=&mode=view&nid=82641&num_start=5170&pmode=, in Korean.)
⁵ Doctors often unwillingly prescribe antibiotics due to pressure from patients, knowing that antibiotics are ineffective for the treatment of the common cold. In other words, prescribing antibiotics for the common cold can have psychic costs to the doctors. Gordon (1989) and Myles and Naylor (1996) argue that in the case of tax evasion, individuals can derive a psychic payoff, or feel less guilt, from adhering to the average pattern of their reference group.
⁶ Spicer and Becker (1980) show that those who believe that they are treated unfairly by the tax system are more likely to evade taxes.
⁷ See, for example, Gaviria and Raphael (2001) for drug use, Wilson (2007) for cigarette smoking, Sacerdote (2001) for GPA, Carrell et al. (2008) for academic cheating, Duflo and Saez (2002) for investment decisions, and Fortin et al. (2007) for tax evasion.
⁸ Cited from http://www.npcentral.net/ce/colds/cold.shtml.
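
The identification concern discussed in Section 2.2 (correlated effects can look like peer effects) can be illustrated with a small simulation. The sketch below is not the authors' method or data: the function names, the baseline rate of 0.6, and the standard deviations are hypothetical choices made only for illustration. Each simulated clinic's prescription rate depends on a shock shared by its market plus idiosyncratic noise, and no clinic responds to any other clinic.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_own_vs_peer_correlation(n_markets=200, clinics_per_market=10,
                                     market_sd=0.10, clinic_sd=0.05, seed=0):
    """Correlate each clinic's rate with the mean rate of the other clinics
    in its market, in a world with NO peer effects by construction.

    All parameter values are hypothetical; rates are not clipped to [0, 1],
    which is a simplification.
    """
    rng = random.Random(seed)
    own_rates, peer_means = [], []
    for _ in range(n_markets):
        # Unobserved shock shared by every clinic in the market
        # (a "correlated effect" in Manski's terminology).
        market_shock = rng.gauss(0.0, market_sd)
        rates = [0.6 + market_shock + rng.gauss(0.0, clinic_sd)
                 for _ in range(clinics_per_market)]
        total = sum(rates)
        for rate in rates:
            own_rates.append(rate)
            # Leave-one-out mean: average rate of the clinic's local peers.
            peer_means.append((total - rate) / (clinics_per_market - 1))
    return pearson(own_rates, peer_means)


if __name__ == "__main__":
    corr = simulate_own_vs_peer_correlation()
    print(f"own vs. peer-mean correlation without peer effects: {corr:.2f}")
```

Under these assumptions the correlation between a clinic's rate and its peers' mean is strongly positive even though clinics never react to one another, and setting `market_sd=0.0` removes it. A cross-sectional correlation alone therefore cannot separate endogenous peer effects from correlated effects, which is why the paper turns to the disclosure regulation as a source of variation.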
In part to reduce the overuse of antibiotic prescriptions, in 2000 the Korean government prohibited doctors from selling medications and allowed them to write prescriptions only. Before 2000, doctors were allowed to dispense/sell medications to their patients. As a result, under the old system doctors had financial incentives to prescribe and dispense more medications including antibiotics. In 2000, the Health Insurance Review and Assessment Service (HIRA) was established to monitor and encourage proper drug prescription. Despite these efforts to prevent overuse of antibiotics, antibiotic prescription rates have remained high (Ministry of Health and Welfare, 2006). On January 5th, 2006, the Seoul Administration Court gave a verdict mandating the disclosure of the antibiotic prescription rates for the common cold of every clinic and hospital in Korea.

3. Theoretical framework

3.1. Peer effects

On the supply side, we assume that antibiotic prescription rates are subject to peer effects among clinics. However, peer effects can arise only when the behavior (that is, antibiotic prescription rates) is observable by peers. Ali et al. (2011), for example, show that adolescents’ sleep habits and breakfast consumption (unobservable by their peers) are not influenced by peers, but that participation in sports or eating at fast food restaurants (observable by their peers) are influenced by peers.

Suppose that clinics could not observe other clinics’ antibiotic prescription rates before the information disclosure. Then mandatory information disclosure would allow clinics to observe the antibiotic prescription rates of other clinics, and trigger peer effects among the clinics.

According to Mas and Moretti (2009), there can be two types of peer effects. The first type is the social conformity effect. Before the information disclosure, when clinics could not observe other clinics’ antibiotic prescription rates, each clinic would prescribe antibiotics based on its own norm or expectations of other clinics’ antibiotic prescription rates. After the information disclosure, if a clinic finds out that many other local clinics were prescribing relatively more (fewer) antibiotics than itself, the social conformity effect would lead the clinic to increase (decrease) its antibiotic prescription rates.

Note that the social conformity effect may arise from a social preference for behaving like others or from rational decisions. For example, when a clinic finds out that it is prescribing more antibiotics than the average, it may reduce its antibiotic prescription rates because of possible social or regulatory pressure, or because it learns of the increased danger of drug-resistant bacteria or the potential side effects of antibiotics.

The second type of peer effect is the social pressure effect that would pressure clinics to do the socially ‘right’ thing. In the context of antibiotic prescription for the common cold, the socially right thing would be not to prescribe antibiotics.

Since the social conformity effects for individual clinics are likely to cancel each other out, mandatory information disclosure would lead to lower average antibiotic prescription rates among clinics due to the social pressure effect.

Then, in the case where clinics could not observe other clinics’ antibiotic prescription rates before the information disclosure, we can summarize the peer effects triggered by the information disclosure in the following proposition.

Proposition 1. Suppose that mandatory information disclosure triggers peer effects among clinics. Then,
(i) average antibiotic prescription rates would decrease;
(ii) an individual clinic’s antibiotic prescription rate can decrease if it is relatively higher than other clinics’ or if the social pressure effect is sufficiently strong;
(iii) an individual clinic’s antibiotic prescription rate can increase if it is relatively lower than other clinics’ and if the social pressure effect is sufficiently weak.

3.2. Consumer learning effects

On the demand side, from mandatory information disclosure, consumers may learn individual clinics’ actual antibiotic prescription rates, and update their belief on each clinic’s quality. We assume that average consumers regard lower antibiotic prescription rates for the common cold as a good signal for the clinic’s quality. Therefore, after the information disclosure, the market demand for relatively lower (higher) prescribing clinics would increase (decrease). Then consumer learning along with clinic competition would lead clinics to reduce their antibiotic prescription rates. This is called the consumer learning effect.

Note that consumers’ ideal levels of antibiotic prescription rates for the common cold can differ, and some consumers may want even higher antibiotic prescription rates for their clinics. However, market demand reflects the aggregate (or average) of individual demands. Thus, as long as consumers’ average ideal level of antibiotic prescription rates for the common cold is lower than clinics’ actual antibiotic prescription rates, the market would regard lower antibiotic prescription rates as a good signal for a clinic’s quality.

Yoo et al. (2009) show in a 2009 survey that 80% of Korean consumers think clinics are prescribing too many antibiotics. They also show that only 10.7% of consumers want antibiotic prescriptions for the common cold. Therefore, while consumers may want some level of antibiotic prescription for the common cold, it appears that most clinics are prescribing more antibiotics than the average consumer wants. For example, as discussed above, doctors may simply prescribe antibiotics to avoid time-consuming medical tests and consumer education to determine whether an infection is bacterial or viral. Or doctors may prescribe antibiotics in order to avoid potential lawsuits and conflicts for not providing antibiotics when it turns out to be a bacterial infection, or possibly to gain financial kickbacks from pharmaceutical companies.

In the context of antibiotic prescription rates for the common cold, mandatory information disclosure can also educate consumers about the difference between viral and bacterial infection and the ineffectiveness of antibiotics for viral infections such as the common cold. Thus, mandatory information disclosure can lower consumers’ ideal level of antibiotic prescription rates for the common cold as well. Then, low (high) antibiotic prescription rates for the common cold would become an even better (worse) signal for clinic quality, and this would put stronger market pressure on clinics to reduce their antibiotic prescription rates. Also, consumers would become less likely to visit clinics for the common cold in general, and more likely to get over-the-counter medicine from pharmacies instead. That is, the total market demand for clinic visits is likely to decrease.

Moreover, assuming that consumers prefer lower antibiotic prescription rates for the common cold (which we will test empirically below), the consumer learning effect should reduce both the average and individual antibiotic prescription rates of all clinics. In contrast, from Proposition 1, clinic peer effects predict that some clinics would increase antibiotic prescription rates, while the average prescription rates may fall.

Fig. 1. Effect of information disclosure and interaction with competition. Panels: (a) Clinic Peer Effect; (b) Consumer Learning Effect. Each panel plots the prescription rate against competition, before and after disclosure.

Then, in the case where consumers could not observe clinics’ antibiotic prescription rates before the information disclosure,
we can summarize the consumer learning effects triggered by the information disclosure as follows:

Proposition 2. Suppose that mandatory information disclosure triggers consumer learning effects. Then,
(i) market demand for relatively lower (higher) prescribing clinics would increase (decrease);
(ii) total market demand would decrease;
(iii) the average antibiotic prescription rates would decrease;
(iv) individual clinic’s antibiotic prescription rates would decrease.

3.3. Competition effects

Suppose that before the mandatory information disclosure regulation, neither the clinics nor the consumers could observe other clinics’ antibiotic prescription rates. As discussed earlier, clinics may prescribe antibiotics for cost saving or financial kickbacks. Also, when consumers cannot observe or compare antibiotic prescription rates, prescribing more antibiotics may increase demand as it can mask the symptoms, and reduce the number of tests. Then, with more clinics, stronger competition can lead to higher antibiotic prescription rates. For example, Fogelberg and Karlsson (2012) and Bennett et al. (2011) show that stronger competition has a positive effect on antibiotic prescription rates.

Suppose that after the mandatory information disclosure regulation, clinics observe other clinics’ antibiotic prescription rates but consumers cannot, possibly because the disclosed information is too difficult for consumers to find or interpret. In the presence of clinic peer effects, stronger competition would lead to even higher antibiotic prescription rates because if one clinic increases its prescription rates, other clinics would increase their prescription rates as well, in accordance with the social multiplier effect. That is, while Proposition 1 predicts that the average antibiotic prescription rates will fall due to the social pressure effect, if competition among clinics becomes stronger, the prescription rates after the disclosure would be relatively higher. Therefore, when competition is strong, information disclosure would decrease antibiotic prescription rates less in absolute value. (See Fig. 1(a) for an illustration.)

Now suppose that after mandatory information disclosure, consumers can observe and compare the antibiotic prescription rates of local clinics, and that they prefer lower prescription rates. With more clinics, consumers have more choices for clinics, and can switch clinics more easily. Then, after the information disclosure, stronger competition among clinics would lead to lower antibiotic prescription rates. That is, if information disclosure triggers consumer learning effects, the average antibiotic prescription rates would decrease even more when there is stronger clinic competition in the market. (See Fig. 1(b) for an illustration.)

To summarize,

Proposition 3.
(i) If mandatory information disclosure triggers clinic peer effects, it would reduce the average antibiotic prescription rates less (in absolute value) when competition is stronger.
(ii) If mandatory information disclosure triggers consumer learning effects, it would reduce the average antibiotic prescription rates even more (in absolute value) when competition is stronger.

As discussed in the beginning, one of the main justifications for the mandatory information disclosure policy has been based on the interaction between consumer learning effects and the competition effect. That is, when consumers are informed of the true quality of products and services, they would choose sellers with higher quality products and services. Therefore, with mandatory information disclosure, competition among sellers would force the sellers to increase the quality of products, and/or drive the low quality product sellers out of the market.

However, Proposition 3 suggests that if the sellers themselves, not the consumers, are informed of the true quality of their competitors’ products and services by mandatory information disclosure, stronger competition can reduce the effect of mandatory information disclosure, which undermines the conventional justification for a mandatory information disclosure policy.

It is also worth emphasizing that Proposition 3 provides a possible way to distinguish between consumer learning effects and clinic peer effects even when individual clinic level prescription data are not available. Propositions 1 and 2 show that individual clinics’ antibiotic prescription rates would respond to the information disclosure differently depending on whether the information disclosure triggers clinic peer effects or consumer learning effects. However, both effects predict that the market average antibiotic prescription rates will fall. Therefore, unless researchers have data on individual clinic level antibiotic prescription rates, it can be difficult to distinguish between clinic peer effects and consumer learning effects.

Proposition 3, however, shows that the interaction effect between competition and information disclosure on average antibiotic prescription rates would differ depending on whether the clinic peer effect or the consumer learning effect dominates. Therefore, even with market level data on average antibiotic prescription rates, one can potentially distinguish between the two effects.
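One way to read Proposition 3 is as a sign restriction on an interaction term. The expression below is only an illustrative sketch, not necessarily the specification estimated in this paper. The symbols are notation assumed here for exposition: $y_{it}$ is clinic $i$'s prescription rate in quarter $t$, $D_t$ is a post-disclosure indicator, $C_{j(i)}$ is a measure of competition in clinic $i$'s market $j$, and $\alpha_i$ is a clinic effect.

```latex
% Illustrative notation only (assumed, not taken from the paper)
y_{it} = \alpha_i + \beta\, D_t + \gamma\,\bigl(D_t \times C_{j(i)}\bigr) + \varepsilon_{it}

% Propositions 1(i) and 2(iii): \beta < 0 under either mechanism.
% Proposition 3(i), clinic peer effects:       \gamma > 0  (smaller reduction where competition is stronger)
% Proposition 3(ii), consumer learning effects: \gamma < 0  (larger reduction where competition is stronger)
```

Under this reading, the sign of $\gamma$ is what separates the two mechanisms, since both imply a negative $\beta$.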
4. Data

Empirically, we analyze individual clinics’ antibiotic prescription rates for the common cold before and after the mandatory information disclosure regulation in Korea. In March 2005, People’s Solidarity for Participatory Democracy (PSPD), one of the largest civil rights organizations in Korea, petitioned for information disclosure on individual clinics’ antibiotic prescription rates, but the petition was denied by the Ministry of Health and Welfare. In June 2005, PSPD filed a formal administrative litigation against the government. On January 5th, 2006, the Seoul Administration Court delivered a verdict mandating the disclosure of antibiotic prescription rates for acute upper respiratory tract infection, or the common cold

“in order to protect the rights of citizens in the matter of choosing their own treatment options and also to help consumers make better-informed choices on their health care providers by disseminating comparative information on antibiotic prescription rates.”

On February 9th, 2006, the Ministry of Health and Welfare began disclosing the antibiotic prescription rates for the common cold of every clinic and hospital which had written out more than one hundred prescriptions for antibiotics per quarter. This information was made available online through the public disclosure website of the Health Insurance Review and Assessment Service (HIRA) at http://www.hira.or.kr.

Note, however, that visiting the HIRA website online has been the only practical way for consumers to find out the disclosed antibiotic prescription rates. Even though Korea has the highest internet penetration rates among OECD countries, not all consumers are familiar with internet use. Moreover, not all consumers are aware of the existence of such a website. Therefore, it is not entirely clear how many consumers have gained new information through this disclosure. For example, a 2007 survey shows that only 21.5% of consumers knew about the information disclosure and only 7% of consumers have actually visited the HIRA website to check the antibiotic prescription rates. In contrast, 95% of doctors knew about the information disclosure.9

Fig. 2 shows how the antibiotic prescription rates are being presented online as of April 2014. Though some of the visual designs have changed since 2006, the main contents on the webpage have not changed much.10 From the HIRA main website, consumers can select a township and display a list of clinics in the selected township ranked by antibiotic prescription rates, as shown in Fig. 2(a). Then, consumers can select a clinic and display the antibiotic prescription rates for the common cold of the selected clinic, as shown in Fig. 2(b). Also, the same webpage shows warnings that antibiotics do not treat the common cold and that ‘good’ clinics prescribe antibiotics only when they are necessary. Note that clinics in the same township can be considered as a natural peer group. And consumers may learn not only the antibiotic prescription rates of their local clinics, but also the fact that antibiotics do not treat the common cold.11

Fig. 2. Online disclosure of antibiotic prescription rates for the common cold. Note: The English translations in the figure are made and inserted by the authors.

Our empirical analysis is based on the quarterly individual clinic level data on the antibiotic prescription rates for the common cold of every clinic in the city of Seoul, Korea between 2005.Q1 and 2009.Q2. The prescription rate is defined as the number of antibiotic prescriptions divided by the number of patient visits per quarter. The denominator and the numerator of the prescription rate are not reported separately. However, we were able to collect the information on the number of patient visits for the common cold between 2005.Q1 and 2006.Q3.

We merged the data with the 2007 clinic characteristics data that contain the medical specialty, location, number of doctors, and clinic age. The location shows both the township (“dong”) and the district (“gu”). There are 25 districts in Seoul, and within these districts there are 424 townships in total. We then merged the data with the 2009 township and district characteristics such as total population, area size, share of population with age over 65, and share of college graduates.12 Because clinic and township characteristics do not change much over time during our sample period, we do not measure them every quarter.

The data contain 19 tertiary general hospitals, 43 general hospitals, 77 hospitals, and 2,802 clinics. Tertiary general hospitals require referrals, except for emergencies. Also, for the common cold, most people go to local clinics within walking distance from their home or work. Therefore, we will restrict our analysis to the clinics only, resulting in 42,020 observations. However, the results do not change even when we include all the other hospital types.13

Recall that the prescription rates of the clinics that have written out fewer than one hundred prescriptions for antibiotics per quarter are not disclosed online. We exclude these observations in most of our analyses, but will include them as a comparison group in testing the robustness of our results. Also, clinics that are reported either before 2006.Q1 only or after 2006.Q1 only are excluded from the data. These restrictions raise concerns about changes in the composition of the clinics. Therefore, we will control for clinic fixed effects, and also replicate our analysis with the fully balanced sample only, which accounts for 63 percent of the observations.

Table 1 shows the summary statistics of the selected variables.

Table 1
Summary statistics of selected variables.

Variable                                  N        Mean     SD      Min     Max
Antibiotic prescription
  Antibiotic prescription (%)             42,020   52.97    28.72   0       100
  # of patient visits                     17,220   1419     1402    109     29,334
Clinic characteristics (2007)
  Medical speciality
  # of doctors                            41,293   1.18     0.64    0       13
  # of doctors and staff                  41,296   4.76     4.14    1       79
  Hospital age                            41,315   11.02    8.74    1       53
Township (dong) characteristics (2009)
  # of clinics                            42,020   8.74     3.75    1       19
  Population                              41,878   26,539   8649    1095    51,446
  # of clinics per 1000                   41,878   3.78     4.59    0.27    73.06
  Age over 65 (%)                         41,878   9.31     1.96    5.01    17.2
  Age under 9 (%)                         41,878   8.59     1.64    3.94    14.31
District (gu)
  College degree (%)                      42,020   23.19    7.12    11.74   38.20

9 Ministry of Health and Welfare 2007-1-12 press kit “Results from a Survey on Medical Service and Provision After Information Disclosure of Antibiotics Prescription Rates”.
10 As of April 2014, various new performance ratings for several diseases (e.g. diabetes), operations (e.g. breast cancer), and prescriptions (e.g. antibiotics for acute otitis media in children) can also be found and compared on the same website. However, this information was not disclosed during our sample period (2005–2009).
11 For some patients, such as infants or those with a prior history of lower respiratory infections, prescribing antibiotics for the common cold proactively can be the right decision. However, such risk adjustments were not made. On the other hand, the clinics can prescribe antibiotics to such patients under different diagnostic codes.
12 The township characteristics are available from Seoul Statistics Information.
13 Given that the data report the antibiotic prescription rates for the common cold, most clinics in the data are specialized in general medicine, internal medicine, pediatrics, and otorhinolaryngology. However, some clinics in the data are specialized in surgery, tuberculosis, and dermatology. Thus, we will include the clinic or the specialty fixed effects in our analysis. Also, focusing on general medicine and internal medicine specialties only (47% of the sample) does not change the results.

5. Clinic peer effects

5.1. Change in average antibiotic prescription rates

Fig. 3(a) shows the average prescription rates for the common cold over time. Recall that the prescription rates were disclosed to the public online on February 9th, 2006, or the first quarter of 2006. The average prescription rate fell by almost 10% from the second quarter of 2005 to the second quarter of 2006. To check whether this decrease in the prescription rate is due to a change in clinic composition, Fig. 3(b) shows the same graph for the balanced sample only, but the results do not change much.

In Table 2 we estimate the effect of information disclosure more formally. “Disclosure” is a dummy variable equal to one if the date is strictly after 2006.Q1 and zero otherwise. In column (1) of Table 2, we run a simple OLS model with the disclosure dummy, quarterly dummies, and 23 medical specialty dummies.14 The estimated effect of information disclosure is a 9.67 percentage point reduction in the average antibiotic prescription rate for the common cold. In column (2), we control for clinic characteristics and township characteristics. In column (3), we control for township fixed effects. In column (4), we control for clinic fixed effects. Finally, in column (5), we use the balanced sample only. These models show that the estimated effect of information disclosure is in the range of −9.67% to −8.98%. These estimates are close to the estimates from earlier studies. For example, Jun and Chung (2011) estimate the effect to be around −9.53% to −6.49%.

14 Including year dummies or a linear time trend does not change the results.

From Propositions 1(i) and 2(iii), the decline in the average antibiotic prescription rates is consistent with both clinic peer effects and consumer learning effects. That is, clinics may have reduced their antibiotic prescription rates either because of social pressure to do the right thing or because of market pressure to keep the informed patients.

However, we cannot rule out the possibility that the decline was driven by other concurrent unobserved shocks such as a sudden decrease in the more severe type of common cold. Jun and Chung (2011) provide difference-in-difference estimates using those clinics whose antibiotic prescription rates were not disclosed online as a comparison group. Recall that the prescription rates of clinics that have written out fewer than one hundred prescriptions for antibiotics per quarter were not disclosed online. In columns (1)–(3) of Table 3, we replicate the results from Jun and Chung (2011) with the clinic fixed effects. “Open” is a dummy variable to indicate whether the prescription rates are disclosed online or not. Note that column (2) of Table 3 shows that the effect of disclosure for the comparison group is much smaller. Alternatively, in column (3), we use the full sample, and estimate the effect of the interaction term between disclosure and the open dummy variable. From these difference-in-difference estimates, the effect of mandatory information disclosure is about −5.2%.

However, even when a clinic’s antibiotic prescription rate is not disclosed online in one quarter, the clinic’s prescription rates can be disclosed online in other quarters when the number of its prescriptions becomes more than one hundred. In fact, the prescription rates of every clinic were disclosed at least once during our sample period. Then, the difference-in-difference estimates are likely to underestimate the true effect of the mandatory information disclosure. Therefore, unless specified otherwise, we will focus on the disclosed samples with the clinic fixed effects as in column (4) of Table 2 for a base specification, and use the difference-in-difference model in column (3) of Table 3 to check the robustness of the results.

Also, clinics may have known about the mandatory information disclosure and changed their prescription rates even before the actual disclosure on February 9th, 2006. Recall that the litigation for the information disclosure was filed in June 2005, and that the verdict for the disclosure was delivered on January 5th, 2006. But our disclosure dummy variable in Tables 2 and 3 assumes that the event took place in the beginning of the second quarter in 2006. Then, our estimates are likely to underestimate the true effect of mandatory information disclosure.

However, the court verdict on January 5th, 2006 came as a surprise because the government had argued that prescription rates were a part of business secrets and were exempted from any
information disclosure requirement. Therefore, it is unlikely that clinics changed their prescription rates before January 2006. On the other hand, it is possible that some clinics started changing their prescription rates right after the verdict even before the actual disclosure. In fact, Fig. 3 shows that the average prescription rates started to decline in the first quarter of 2006. Thus, in column (4) of Table 3, we have excluded the sample from the first quarter of 2006. Alternatively, in column (5), we redefined the disclosure dummy to one if it is strictly after the fourth quarter of 2005. However, the results do not change much.

Table 2
Change in antibiotic prescription rates for the common cold (dependent variable = antibiotic prescription rate (%)).

                          (1)          (2)          (3)          (4)          (5)
Disclosure                −9.6725***   −9.6682***   −9.5442***   −9.3494***   −8.9836***
                          (0.3090)     (0.2992)     (0.2782)     (0.1397)     (0.1611)
No. of doctors                         −1.0189***   −1.1080***
                                       (0.3158)     (0.3193)
No. of doctors and staff               −0.1581***   −0.0802
                                       (0.0518)     (0.0517)
Clinic age                             0.0019       0.0232
                                       (0.0157)     (0.0160)
No. of clinics                         0.1124***
                                       (0.0404)
Population (1000)                      0.0223
                                       (0.0187)
Age 65 and older (%)                   0.4215***
                                       (0.0744)
Quarter = 2               0.7353*      0.6824*      0.6760**     0.6215***    0.5091***
                          (0.3760)     (0.3634)     (0.3375)     (0.1657)     (0.1929)
Quarter = 3               2.1484***    1.8128***    1.8223***    1.7544***    1.7320***
                          (0.3992)     (0.3860)     (0.3585)     (0.1764)     (0.2036)
Quarter = 4               −0.0702      −0.1454      −0.1786      −0.1767      −0.2094
                          (0.3943)     (0.3814)     (0.3541)     (0.1737)     (0.2030)
Medical specialities      Yes          Yes          Yes          No           No
Fixed effect              No           No           Township     Clinic       Clinic
Observations              42,020       41,160       41,286       42,020       28,035
R-squared                 0.0236       0.1053       0.1070       0.1057       0.1089

Notes: Column (3) controls for the township fixed effects. Columns (4) and (5) control for the clinic fixed effects. In column (5), we use the balanced sample only.
* Significant at 10%. ** Significant at 5%. *** Significant at 1%.

Fig. 3. Average antibiotic prescription rates. Panels: (a) full sample; (b) balanced sample. Each panel plots the prescription rate (%) against time (2005q1–2009q1).

Table 3
Change in antibiotic prescription rates for the common cold: robustness (dependent variable = antibiotic prescription rate (%)).

                          (1)          (2)          (3)          (4)          (5)
                          Open = 1     Open = 0     All          Open = 1     Open = 1
Disclosure                −9.3494***   −3.4096***   −4.0870***   −9.2608***
                          (0.1397)     (0.6693)     (0.3606)     (0.1418)
Open                                                3.4230***
                                                    (0.5727)
Disclosure × open                                   −5.2779***
                                                    (0.3899)
Disclosure 1                                                                  −9.3696***
                                                                              (0.1492)
Fixed effect              Clinic       Clinic       Clinic       Clinic       Clinic
Observations              42,020       7,594        49,614       39,400       42,020
R-squared                 0.1057       0.0049       0.0747       0.1068       0.0947

Notes: All regressions include quarterly dummies and clinic fixed effects. Column (1) uses the sample of clinics whose prescription rates are disclosed online (Open = 1). Column (2) uses the sample of clinics whose prescription rates are not disclosed online because they had fewer than 100 prescriptions (Open = 0). In column (3), we use the full sample and estimate the interaction effect between “Disclosure” and “Open” dummies. In column (4), we exclude the observations in 2006Q1. In column (5), we redefine the “Disclosure” dummy to be one if date is strictly after 2005Q4.
** Significant at 5%. *** Significant at 1%.

5.2. Clinic heterogeneity

Even though the average antibiotic prescription rates have decreased after the mandatory information disclosure, it turns out that there are large variations. For example, when we plot a histogram for the simple difference in the average antibiotic prescription rates before and after the information disclosure for each clinic, Fig. 4 shows that the changes in prescription rates vary widely across clinics. In particular, 30% of the clinics have increased their antibiotic prescription rates after the regulation.

Fig. 4. Distribution of the changes in prescription rates. The histogram plots density against the change in prescription rate (−100 to 100).

Proposition 2(iv) shows that assuming consumers prefer fewer antibiotic prescriptions, consumer learning effects should lead to a fall in the prescription rates of every clinic. However, Proposition 1(ii) and (iii) show that clinic peer effects can cause individual clinics’ prescription rates to either increase or decrease depending on whether their prescription rates are relatively higher or lower than their peers’. Therefore, Fig. 4 is more consistent with Proposition 1, suggesting the existence of clinic peer effects.

More specifically, Fig. 5 shows each individual clinic’s prescription rate over time in one particular township. Note that those clinics with higher-than-average pre-disclosure prescription rates are more likely to decrease their rates post-disclosure (e.g. ID = E, F). Also, those clinics with lower-than-average prescription rates pre-disclosure are more likely to increase their rates (e.g. ID = C, D). These patterns are consistent with the clinic peer effects as discussed in Proposition 1(ii) and (iii), even though not all clinics follow these patterns.

To check whether the patterns in Fig. 5 generalize to other townships, we first create dummy variables, P25, P50, P75, and P100 where P25 = 1 if a clinic’s antibiotic prescription rate in 2005.Q1 is lower than the 25th percentile in a township, P50 = 1 if a clinic’s antibiotic prescription rate is between the 25th percentile and the 50th percentile in a township, and so on. Then, for the townships with more than 100 observations, we estimate the following model by each township:

\[
\begin{aligned}
\text{prescription rate}_{it} = {} & \beta_0 + \beta_1\,\text{Disclosure}_t \times P25_i + \beta_2\,\text{Disclosure}_t \times P50_i + \beta_3\,\text{Disclosure}_t \times P75_i \\
& + \beta_4\,\text{Disclosure}_t \times P100_i + (\text{Quarter Dummies}_t) + \delta_i + \varepsilon_{it},
\end{aligned}
\tag{1}
\]

where $\text{Disclosure}_t = 1$ if date is after 2006.Q1, and $\delta_i$ is a clinic fixed effect.
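The grouping step behind Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the tuple layout of the inputs, the "inclusive" quantile convention, and the handling of ties at the cut points are all choices made here for exposition. The second function computes raw mean changes by quartile group, which is only a descriptive analogue of the coefficients in Eq. (1), without quarter dummies or clinic fixed effects.

```python
from collections import defaultdict
from statistics import mean, quantiles


def quartile_groups(baseline):
    """Map clinic -> 'P25'/'P50'/'P75'/'P100' by its baseline rate within its township.

    baseline: iterable of (clinic_id, township_id, rate) for the baseline quarter.
    The quantile convention and tie handling are assumptions of this sketch.
    """
    baseline = list(baseline)
    by_town = defaultdict(list)
    for _, town, rate in baseline:
        by_town[town].append(rate)

    groups = {}
    for clinic, town, rate in baseline:
        rates = by_town[town]
        if len(rates) < 2:
            continue  # statistics.quantiles needs at least two data points
        q1, q2, q3 = quantiles(rates, n=4, method="inclusive")
        if rate < q1:
            groups[clinic] = "P25"
        elif rate < q2:
            groups[clinic] = "P50"
        elif rate < q3:
            groups[clinic] = "P75"
        else:
            groups[clinic] = "P100"
    return groups


def mean_change_by_group(panel, groups, last_pre_quarter):
    """Average of (post-disclosure mean - pre-disclosure mean) within each group.

    panel: iterable of (clinic_id, quarter_index, rate); quarters with an index
    strictly greater than last_pre_quarter count as post-disclosure.
    """
    pre, post = defaultdict(list), defaultdict(list)
    for clinic, quarter, rate in panel:
        (post if quarter > last_pre_quarter else pre)[clinic].append(rate)

    changes = defaultdict(list)
    for clinic, group in groups.items():
        if pre[clinic] and post[clinic]:
            changes[group].append(mean(post[clinic]) - mean(pre[clinic]))
    return {group: mean(values) for group, values in changes.items()}
```

With quarters indexed so that 2005.Q1 is 0 and 2006.Q1 is 4, passing `last_pre_quarter=4` mirrors the definition of the disclosure dummy as strictly after 2006.Q1. A positive value for the P25 group alongside a negative value for the P100 group would be the descriptive counterpart of the pattern discussed for Fig. 5.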
Fig. 5. Change in prescription rate: by each clinic in a township. Panels: clinics A–J. Each panel plots the prescription rate (0–100) against time (2005q1–2009q1).
Fig. 6 shows the histograms for the estimated $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$ for each township. Note that $\beta_1$ measures the change in antibiotic prescription rates for those clinics whose prescription rates before the information disclosure were in the lowest 25% in the township. Likewise, $\beta_4$ measures the change in antibiotic prescription rates for those clinics whose pre-disclosure rates were in the top 25% in the township.

From Fig. 6(d), a majority (90%) of the township’s top 25% pre-disclosure clinics decreased their rates post-disclosure. However, from Fig. 6(a), almost a half (47%) of the lowest 25% pre-disclosure clinics increased their rates post-disclosure.

Fig. 6. Change in prescription rates: by prescription rate before information disclosure. Panels: (a) 0 < prescription < 25th; (b) 25th < prescription < 50th; (c) 50th < prescription < 75th; (d) 75th < prescription < 100. Each panel plots density against the change in prescription rate (−100 to 100).

Table 4
Increase in antibiotic prescription rates: probit analysis (dependent variable = 1 if a clinic has increased prescription rate after the information disclosure, = 0 otherwise).

                              (1)          (2)          (3)          (4)
P50                           −0.2753***   −0.2709***
                              (0.0843)     (0.1047)
P75                           −0.4398***   −0.3572***
                              (0.0913)     (0.1125)
P100                          −0.6571***   −0.6148***
                              (0.0881)     (0.1111)
Prescription rate ranking                               −0.8416***   −0.7941***
  (before disclosure)                                   (0.1074)     (0.1342)
Medical speciality            Yes          Yes          Yes          Yes
Random effect                 Township     Township     Township     Township
Observations                  2272         1538         2272         1538

Notes: All regressions control for number of doctors, clinic age, number of clinics, population, the share of age 60 and older, medical speciality dummies and the township random effects. In columns (2) and (4), we use the balanced sample only. P50 is equal to one if prescription rate before the information disclosure was between the 25th percentile and the 50th percentile, and zero otherwise. P75 and P100 are defined in a similar way.
** Significant at 5%. *** Significant at 1%.

Alternatively, in Table 4, we estimate a probit model where the dependent variable is one if a clinic has increased its antibiotic prescription rates after the information disclosure regulation, and zero
otherwise. Column (1) of Table 4 shows that clinics are relatively more likely to increase their rates if they were relatively lower within a township pre-disclosure. Also, when we control for the relative ranking of antibiotic prescription rates within a township before disclosure, column (3) shows similar results. In columns (2) and (4), we restrict the analysis to the balanced sample, but the results do not change.

Therefore, not all clinics have decreased antibiotic prescription rates after the information disclosure regulation. In particular, clinics with relatively lower pre-disclosure prescription rates are much more likely to increase their rates after the regulation. These results are consistent with our hypothesis that information disclosure regulation can allow the clinics to learn their competitors’ prescription rates and trigger perverse peer effects.

5.3. Regression to the mean

Regression to the mean can be an alternative explanation for the finding that those clinics with lower than average antibiotic prescription rates pre-disclosure are more likely to increase prescription rates post-disclosure. To evaluate the extent of regression to the mean bias, we focus on the sample after 2007.Q1. Then we hypothetically assume that there was an information disclosure in
2008.Q1. From Fig. 3, the market prescription rates appear to have reached a new steady state equilibrium by 2007Q1. Thus, a hypothetical disclosure in 2008Q1 should not have any further effect on the prescription rates.

In Table 5(a), we measure the relative ranking (or CDF) of each clinic’s prescription rates within its township in 2007.Q1 (before the hypothetical information disclosure), called “Ranking”. Then, we estimate the effect of the interaction term between the hypothetical disclosure dummy and the ranking. Column (1) in Table 5(a) shows that the hypothetical disclosure has no significant effect on the antibiotic prescription rates as expected. Column (2) in Table 5(a), however, shows that the interaction term between the hypothetical disclosure dummy and the prescription rate ranking has a negative and significant effect. That is, those clinics with relatively higher prescription rates in 2007.Q1 have decreased their prescription rates after the hypothetical information disclosure in 2008.Q1. Because there was no real disclosure in 2008.Q1, this effect is likely to be due to the regression to the mean bias.

For comparison, in Table 5(b), we estimate the same model for the sample before 2007.Q1. Because the real information disclosure was implemented in 2006.Q1, the effect of the interaction term between the (real) disclosure dummy and the prescription rate ranking (in 2005.Q1) would include both the regression to the mean bias and the real policy effect. From column (2) in Table 5(a), the coefficient of the interaction term for the hypothetical disclosure is −6.4, while from column (2) of Table 5(b), that for the real disclosure is −15.05. Therefore, it seems that even after taking into account the regression to the mean bias, those clinics with relatively higher antibiotic prescription rates before the (real) information disclosure are more likely to reduce their prescription

Table 5
Regression to the mean (dependent variable = antibiotic prescription rates (%)).

                              (1)            (2)            (3)
(a) 2007Q1–2009Q2
Disclosure (hypothetical)     0.1772         3.8382***      3.2614***
                              (0.1324)       (0.2923)       (0.3125)
Ranking (at 2007Q1)                          75.7823***     76.2029***
                                             (1.3518)       (1.6701)
Disclosure × ranking                         −6.4400***     −5.0534***
                                             (0.4431)       (0.4717)
Random effect                 Clinic         Clinic         Clinic
Observations                  21,854         21,223         15,547

(b) 2005Q1–2006Q4
Disclosure (actual)           −10.3428***    −1.4340***     −1.3909***
                              (0.1770)       (0.3762)       (0.4411)
Ranking (at 2005Q1)                          71.2326***     68.5265***
                                             (1.3069)       (1.6671)
Disclosure × ranking                         −15.0551***    −14.8017***
                                             (0.5615)       (0.6612)
Random effect                 Clinic         Clinic         Clinic
Observations                  19,306         17,363         12,240

Notes: All regressions control for the number of doctors, the number of doctors and staff, clinic age, number of clinics, population, share of age 65 and older, quarterly dummies, and medical speciality dummies. In column (3), only the balanced samples are included. In (a), “Ranking” is the relative ranking of antibiotic prescription rates within the township in 2007Q1 (=1 if the highest, =0 if the lowest). “Disclosure” is one if time is after 2008Q1. In (b), “Ranking” is the ranking of antibiotic prescription rates within the township in 2005Q1 (=1 if the highest, =0 if the lowest). “Disclosure” is one if time is after 2006Q1 and zero otherwise.
* Significant at 10%.
** Significant at 5%.
*** Significant at 1%.
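The comparison between the placebo and actual interaction terms in Table 5 can be made concrete with a back-of-the-envelope calculation. The Python sketch below is not a computation reported in the paper: it simply subtracts the placebo coefficient from the actual one, under the strong assumption that mean reversion is equally large in both sample periods, and it ignores the standard errors. The function name is hypothetical.

```python
# Coefficients on Disclosure x ranking, column (2) of Table 5.
placebo_interaction = -6.4400   # panel (a): hypothetical disclosure in 2008Q1
actual_interaction = -15.0551   # panel (b): actual disclosure in 2006Q1


def net_of_regression_to_mean(actual, placebo):
    """Actual minus placebo interaction.

    Assumption (not from the paper): the mean-reversion component is the
    same size in the placebo period as in the actual disclosure period.
    """
    return actual - placebo


net = net_of_regression_to_mean(actual_interaction, placebo_interaction)
placebo_share = placebo_interaction / actual_interaction

print(f"interaction net of placebo: {net:.4f}")
print(f"placebo as a share of actual: {placebo_share:.2f}")
```

Under that assumption, a little under half of the actual interaction would be attributable to mean reversion, and the remainder is still clearly negative, which is the qualitative point made in the text.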
Table 6
Competition and information disclosure (dependent variable = antibiotic prescription rates (%)).

                                    (1)           (2)           (3)           (4)
                                    Open=1        Open=1        All           All
Disclosure                          −9.9817***    −9.6624***    −4.0114***    −19.7177***
                                    (0.1800)      (0.2283)      (0.3618)      (2.7591)
Disclosure × competition            1.6304***     1.8257***
                                    (0.2998)      (0.4443)
Open                                                            3.4005***     −2.3096
                                                                (0.5738)      (1.5321)
Disclosure × open                                               −6.0444***    10.0319***
                                                                (0.4116)      (2.7685)
Disclosure × open × competition                                 1.7853***     1.8781***
                                                                (0.3385)      (0.4461)
Fixed effect                        Clinic        Clinic        Clinic        Clinic
Observations                        41,878        27,949        49,405        28,044
R-squared                           0.1067        0.1097        0.0756        0.1105

Notes: All regressions include quarterly dummies and clinic fixed effects. “Competition” is measured by the number of clinics per 1000 population. Columns (1) and (2) are estimated for those clinics whose antibiotic prescription rates are disclosed (Open=1). Columns (3) and (4) include those clinics whose antibiotic prescription rates are not disclosed (Open=0). Columns (2) and (4) use the balanced sample only.
** Significant at 5%.
*** Significant at 1%.
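To see what the interaction term in column (1) of Table 6 implies, the estimated change in the prescription rate can be evaluated at different levels of competition. The Python sketch below uses the two reported coefficients; the competition levels are hypothetical values chosen for illustration (the distribution of clinics per 1000 population is not reported in this section), and evaluating the linear specification near zero competition is an extrapolation.

```python
def disclosure_effect(competition, base=-9.9817, slope=1.6304):
    """Implied change in the prescription rate (percentage points) after disclosure.

    base and slope are the Disclosure and Disclosure x competition
    coefficients from column (1) of Table 6; competition is the number of
    clinics per 1000 population.
    """
    return base + slope * competition


# Hypothetical competition levels, for illustration only.
for clinics_per_1000 in (0.5, 1.0, 2.0):
    effect = disclosure_effect(clinics_per_1000)
    print(f"{clinics_per_1000:.1f} clinics per 1000: {effect:.4f}")
```

The implied decline shrinks in absolute value as competition rises, which is the pattern the text associates with clinic peer effects rather than consumer learning.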
rates after the disclosure.15 Again, these results are consistent with clinic peer effects.

5.4. Competition effects

As discussed in Proposition 3, competition can have different effects on the change in antibiotic prescription rates depending on who is learning from the information disclosure, that is, on whether information disclosure triggers clinic peer effects or consumer learning effects. If consumers learn from the disclosure, stronger competition would make the change in the prescription rate more negative. However, if mainly the clinics learn from the disclosure, stronger competition would make the change less negative. (See Fig. 1.)

Thus, in Table 6, we measure competition by the number of clinics per 1000 population, and estimate the effect of competition on the change in antibiotic prescription rates due to the information disclosure. Table 6 column (1) shows that the interaction between the disclosure dummy and competition has a positive and significant effect on the prescription rates. In column (2), we use the balanced sample only, but the results are robust. That is, when there are more competing clinics, the information disclosure regulation decreases the prescription rates less in absolute value.

In columns (3) and (4), we use the non-disclosed clinics as a comparison group (open = 0), and estimate the interaction effect between disclosure and competition. Despite our previous caveat that this comparison group is likely to underestimate the effect of information disclosure, the coefficient of the interaction term among disclosure, open, and competition is positive and significant. That is, higher competition reduces the effect of disclosure in absolute value. Overall, these results are more consistent with clinic peer effects as discussed in Proposition 3.

Alternatively, the effect of information disclosure may vary systematically across markets. And our competition measure may be correlated with some of the other market characteristics. Though our clinic fixed effects should control for other market characteristics, they do not control for the interaction effect between the disclosure dummy and other market characteristics. Therefore, we will first estimate the heterogeneity in market responses, and analyze the robustness of the competition effect when controlling for the interaction effects with other market characteristics.

6. Consumer learning effects

So far we have focused on the evidence for clinic peer effects on the supply side. In this section, we analyze the extent of consumer learning effects on the demand side.

6.1. Consumer preference and learning

Recall that from mandatory information disclosure, consumers can learn not only the antibiotic prescription rates of their local clinics but also the fact that antibiotics do not treat the common cold. Then, as discussed in Proposition 2, consumers would visit clinics less after the information disclosure. Also, if consumers prefer lower antibiotic prescription rates, they would visit those clinics with relatively high prescription rates even less frequently.

Therefore, in Table 7, we estimate how the number of patient visits for the common cold has changed after the information disclosure, especially depending on the clinics’ relative ranking by antibiotic prescription rates.16

Column (1) of Table 7 shows that after the information disclosure, the number of patient visits for the common cold decreased by 15.9%. That is, consumers may have learned that antibiotics do not treat the common cold, and stopped going to the clinics for the common cold. For example, as shown in Fig. 2(b), the information disclosure website displays a clear message that antibiotics do not treat the common cold.

Moreover, column (2) of Table 7 shows that the coefficient of the interaction between disclosure dummy and lag ranking of antibiotic prescription rates is negative and significant.17 That is, the number of patient visits to those clinics with relatively high antibiotic prescription rates has decreased even more. This result suggests that consumers did learn and compare local clinics’ antibiotic prescription rates. In particular, this result is consistent with

15 Alternatively, we have used the sample (2005Q1–2005Q4) before the actual disclosure, and repeated the placebo test pretending that there was information disclosure in 2005Q2. The results are essentially the same as Table 5(a). For example, the estimated coefficient for the interaction between disclosure and prescription rate ranking is −6.162.
16 As Fig. 2(a) shows, the clinics were disclosed in the reverse order of antibiotic prescription rate ranking.
17 We control for the lag ranking because only the prescription rates in the previous quarter are disclosed online.
Table 7
Patient visits and antibiotic prescription rates (dependent variable = log(number of patient visits for the common cold)).

                              (1)          (2)          (3)          (4)          (5)          (6)
                              All          All          Before       After        Before       After
Disclosure                    −0.1591***   −0.0872***
                              (0.0068)     (0.0134)
Ranking (t − 1)                            0.2145***    −0.0070      −0.1417***
                                           (0.0248)     (0.0320)     (0.0458)
Disclosure × Ranking (t − 1)               −0.1598***
                                           (0.0200)
Deviation+                                                                        −0.0010      −0.0020**
                                                                                  (0.0007)     (0.0008)
Deviation−                                                                        −0.0002      −0.0029***
                                                                                  (0.0006)     (0.0008)
Fixed effect                  Clinic       Clinic       Clinic       Clinic       Clinic       Clinic
Observations                  17,220       14,473       9550         4923         9550         4923
R-squared                     0.2848       0.3769       0.3317       0.6068       0.3319       0.6090

Notes: All regressions include quarterly dummies and clinic fixed effects. Columns (3) and (5) are estimated for data before the information disclosure. Columns (4) and (6) are estimated for data after the information disclosure. Ranking (t − 1) is the lag relative ranking of antibiotic prescription rates in a township where 0 is the lowest and 1 is the highest. Deviation is the lag difference between the clinic’s antibiotic prescription rate and the township median antibiotic prescription rate. Deviation+ is equal to Deviation if Deviation > 0 and zero otherwise. Deviation− is equal to Deviation if Deviation < 0 and zero otherwise.
** Significant at 5%.
*** Significant at 1%.
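The Deviation+ and Deviation− variables defined in the notes to Table 7 can be written out in a few lines of Python. This is a sketch under stated assumptions: the function names and the example rates are hypothetical, and the second function only combines the two column (6) coefficients, ignoring fixed effects, quarterly dummies, and standard errors.

```python
from statistics import median


def split_deviation(clinic_rate, township_rates):
    """Return (Deviation+, Deviation-) of a clinic's lagged rate from the township median."""
    deviation = clinic_rate - median(township_rates)
    dev_plus = deviation if deviation > 0 else 0.0
    dev_minus = deviation if deviation < 0 else 0.0
    return dev_plus, dev_minus


def implied_log_visit_shift(dev_plus, dev_minus, b_plus=-0.0020, b_minus=-0.0029):
    """Implied shift in log patient visits from the column (6) coefficients of Table 7."""
    return b_plus * dev_plus + b_minus * dev_minus


# Hypothetical lagged prescription rates (%) in one township; the median is 60.
township_rates = [40, 50, 60, 70, 80]
above = split_deviation(75, township_rates)   # clinic 15 points above the median
below = split_deviation(45, township_rates)   # clinic 15 points below the median

shift_above = implied_log_visit_shift(*above)
shift_below = implied_log_visit_shift(*below)
```

Because Deviation− takes negative values, its negative coefficient implies more visits for clinics below the median, while the negative coefficient on Deviation+ implies fewer visits for clinics above it. Both point toward a preference for lower prescription rates rather than a preference for clinics near the norm.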
our assumption that average consumers prefer lower antibiotic prescription rates for the common cold.

In columns (3) and (4), we estimate the model separately for before and after the information disclosure. From column (3), before the information disclosure, the relative ranking of a clinic’s antibiotic prescription rates has no significant effect on the number of patient visits. This result confirms that before the disclosure, consumers did not know individual clinics’ antibiotic prescription rates.

However, from column (4) of Table 7, after the information disclosure, the relative ranking of a clinic’s antibiotic prescription rates has a negative and significant effect. Again, this result is consistent with consumer learning effects and consumers’ preference for lower antibiotic prescription rates.

Alternatively, consumers may avoid clinics that deviate from the norm, including those clinics with much lower prescription rates than average. Thus, we measure the difference between each clinic’s prescription rate and the township median, called “Deviation”. Then, we define Deviation+ as equal to Deviation if Deviation is positive and zero otherwise. Likewise, we define Deviation− as equal to Deviation if Deviation is negative and zero otherwise. Columns (5) and (6) of Table 7 show that both Deviation+ and Deviation− have negative and significant effects on the number of patient visits after the information disclosure. And the coefficients of the two variables are not statistically different.18 That is, consumers seem to prefer lower antibiotic prescription rates for all clinics.

Note that we have interpreted the finding that pre-disclosure low-prescribing clinics are more likely to increase their rates post-disclosure as evidence for clinic peer effects. However, such a finding can also arise if the consumers prefer higher antibiotic prescription rates, and insist that low-prescribing clinics should increase their antibiotic prescription rates. In fact, patient demand for antibiotics has been blamed as the main reason for the overuse of antibiotics.19 However, our evidence suggests that consumers actually prefer lower prescribing clinics.20

6.2. Consumer learning vs. clinic manipulation

An important caveat for our analysis is that clinics may have manipulated the disclosed antibiotic prescription rates. For example, because the denominator for antibiotic prescription rate is the number of patient visits, clinics can reduce antibiotic prescription rates by asking patients to visit clinics more frequently. However, as column (1) of Table 7 shows, the number of patient visits has decreased, rather than increased, after the information disclosure.

Alternatively, clinics can change the patients’ diagnostic codes for those patients prescribed with antibiotics. Because only the antibiotic prescription rates for the common cold, or acute upper respiratory tract infection, are disclosed, clinics can change the diagnostic codes, for example, to a lower respiratory tract infection such as pneumonia, and continue to prescribe antibiotics. Then,

Table 8
Lag vs. contemporaneous ranking of antibiotic prescription rates (dependent variable = log(number of patient visits for the common cold)).

                    (1)          (2)
                    Before       After
Ranking (t − 1)     −0.0067      −0.1486***
                    (0.0320)     (0.0467)
Ranking (t)         −0.0168      −0.0471
                    (0.0320)     (0.0628)
Fixed effect        Clinic       Clinic
Observations        9550         4923
R-squared           0.3317       0.6069

Notes: All regressions include quarterly dummies and clinic fixed effects. Column (1) is estimated for data before the information disclosure. Column (2) is estimated for data after the information disclosure. Ranking (t − 1) is the lag relative ranking of antibiotic prescription rates in a township where 0 is the lowest and 1 is the highest. Ranking (t) is the contemporaneous ranking.
* Significant at 10%.
** Significant at 5%.
*** Significant at 1%.

18 The p-value is 0.45.
19 “A number of factors influenced the tendency to overprescribe [antibiotics], including mostly patient demand, but also time pressure to end patient visits sooner, fear of malpractice lawsuits if a prescription is denied,...” Forbes (July 9, 2012) available from http://www.forbes.com/sites/gerganakoleva/2012/07/09/private-physicians-drive-up-antibiotic-resistance-helped-along-by-patients/.
20 Currie et al. (2012) also find that the overuse of antibiotics in China is not demand driven but is largely a supply-side phenomenon.
Table 9
The effect of clinic and consumer characteristics on learning (dependent variable = log(number of patient visits for the common cold)).

                              (1)          (2)          (3)          (4)          (5)          (6)          (7)
Disclosure                    −0.0872***   −0.0847***   −0.0929***   −0.0866***   −0.0868***   −0.0864***   −0.0873***
                              (0.0134)     (0.0135)     (0.0140)     (0.0134)     (0.0135)     (0.0134)     (0.0140)
Ranking (t − 1)               0.2145***    0.2793***    0.1550**     0.4100***    −0.1748      0.2077***    0.0334
                              (0.0248)     (0.0441)     (0.0610)     (0.0826)     (0.1300)     (0.0278)     (0.1770)
INT (=disclosure × ranking)   −0.1598***   −0.2077***   −0.1076***   −0.2932***   −0.0589      −0.1480***   −0.2867***
                              (0.0200)     (0.0242)     (0.0290)     (0.0348)     (0.0488)     (0.0204)     (0.0655)
INT × clinic age                           0.0034***                                                        0.0032***
                                           (0.0010)                                                         (0.0010)
INT × pop density                                       −0.0019*                                            −0.0010
                                                        (0.0010)                                            (0.0010)
INT × college grad                                                   0.0057***                              0.0059***
                                                                     (0.0012)                               (0.0013)
INT × age 9 under                                                                 −0.0118**                 −0.0012
                                                                                  (0.0052)                  (0.0056)
INT × pediatric                                                                                −0.0673***   −0.0819***
                                                                                               (0.0220)     (0.0234)
Fixed effect                  Clinic       Clinic       Clinic       Clinic       Clinic       Clinic       Clinic
Observations                  14,473       14,251       13,410       14,473       14,425       14,473       13,206
R-squared                     0.3769       0.3791       0.3746       0.3783       0.3771       0.3774       0.3795

Note: All regressions include quarterly dummies and clinic fixed effects. Clinic age and pediatric dummy variable are measured by each clinic in 2007. Population density and share of age 9 under are measured by each township in 2009. The share of college graduates is measured by each district in 2009.
* Significant at 10%.
** Significant at 5%.
*** Significant at 1%.
the reported number of patient visits for the common cold would decline as shown in column (1) of Table 7.

However, if clinics have manipulated the patients’ diagnostic codes in such a way, both the reported number of patient visits for the common cold and the antibiotic prescription rates for the common cold would decline together at the same time at the clinic level. But both columns (2) and (4) of Table 7 show that when the ranking of antibiotic prescription rates of a clinic declined, the number of patient visits for that clinic increased.

Moreover, manipulation of diagnostic codes would affect the number of patient visits and the antibiotic prescription rates at the same time. On the other hand, because the antibiotic prescription rates in the previous quarter are disclosed online, consumer learning effects suggest that the number of patient visits would depend on the lag antibiotic prescription rates, not on the contemporaneous rates.

Thus, in columns (1) and (2) of Table 8, we control for both the lag ranking of antibiotic prescription rates and contemporaneous rates at the same time. Note that the contemporaneous rates are not significant either before or after the disclosure. However, the lag prescription rates are significant only after the disclosure. Therefore, while we cannot rule out the possible manipulation of disclosed prescription rates, our results appear to be driven by consumer learning effects rather than manipulation of disclosed prescription rates.21

6.3. Heterogeneity in consumer learning

We consider consumer learning as a process where consumers use the disclosed information to (Bayesian) update their belief on clinics’ quality. Because consumers’ priors and weights on the disclosed information are subjective, the extent of learning can be heterogeneous depending on consumer and clinic characteristics.

Since we do not have data on individual consumer characteristics, in Table 9, we estimate how the average consumer characteristics in each township and clinic characteristics affect consumer learning. As a benchmark, in column (1) of Table 9, we estimate the effect of an interaction term between disclosure dummy and clinics’ antibiotic prescription ranking. The coefficient of this interaction term measures how much consumers’ response to the antibiotic prescription ranking has changed after the information disclosure, and thus provides a measure for consumer learning.

In columns (2)–(6), we estimate how this measure for consumer learning depends on clinic and consumer characteristics. Column (2) shows that the interaction term with the clinic age is positive and significant. Because consumers respond negatively to the antibiotics ranking after the information disclosure, this result implies that the consumer learning is smaller in absolute value if clinics are older. One interpretation is that for older clinics, consumers have observed their quality for a long time and have a strong prior on their quality. Then, the newly disclosed information would not change consumers’ posterior belief on their quality much. Thus, when old clinics’ antibiotic prescription ranking increases, consumers may respond less negatively.

Column (3) in Table 9 shows that the interaction term with population density has a negative and significant effect, which implies that consumer learning is larger in absolute value when population density within a township is high. Recall that a 2007 government survey showed that only 7% of consumers have actually visited the information disclosure website. The significant evidence for consumer learning despite the low website visit rates suggests that consumers may have learned from some of their neighbors. Thus, our result is consistent with a hypothesis that in more densely populated areas, consumer learning through neighbors is likely to be more important.

Column (4) shows that the interaction term with the share of college graduates has a positive and significant effect. That is, the consumer learning effect is smaller in absolute value in townships with a more highly educated population. This result may seem counter-intuitive. However, a 2010 survey shows that more highly educated respondents are more likely to believe that antibiotics can

21 Alternatively, consumers may have interpreted high antibiotic prescription rates as a signal for more infectious patients and avoided that clinic. This is unlikely, however, because the awareness of hospital-acquired infection has unfortunately been very low in Korea, especially during our sample period.
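The Bayesian-updating interpretation in Section 6.3, in which a stronger prior on an older clinic's quality makes the same disclosed information move beliefs less, can be illustrated with a standard normal-normal update. The Python sketch below is an assumption made for illustration only: the paper does not specify a functional form here, and the function name and all numbers are hypothetical.

```python
def posterior_mean(prior_mean, prior_precision, signal, signal_precision):
    """Normal-normal Bayesian update: precision-weighted average of prior mean and signal."""
    total_precision = prior_precision + signal_precision
    return (prior_precision * prior_mean + signal_precision * signal) / total_precision


# The same unfavourable disclosed signal (-1.0) about perceived quality,
# received for a clinic with a weak prior and one with a strong prior.
young_clinic = posterior_mean(prior_mean=0.0, prior_precision=1.0,
                              signal=-1.0, signal_precision=1.0)
old_clinic = posterior_mean(prior_mean=0.0, prior_precision=9.0,
                            signal=-1.0, signal_precision=1.0)

print(f"belief shift, weak prior:   {young_clinic:.2f}")
print(f"belief shift, strong prior: {old_clinic:.2f}")
```

With a prior nine times as precise, the posterior moves one fifth as far in this example, which matches the direction of the positive interaction with clinic age in column (2) of Table 9.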
Table 10
The effect of clinic and consumer characteristics on learning (dependent variable = antibiotic prescription rates (%)).

                              (1)          (2)          (3)          (4)          (5)          (6)          (7)
D (=disclosure)               −9.3494***   −11.9383***  −7.4970***   −12.3032***  −5.6907***   −8.2533***   −8.9290***
                              (0.1397)     (0.2726)     (0.4109)     (0.4833)     (0.7831)     (0.1926)     (1.0503)
D × competition                            1.5377***    1.2226***    1.6463***    1.2601***    1.2523***    0.7942**
                                           (0.3000)     (0.3108)     (0.2998)     (0.3069)     (0.2981)     (0.3140)
D × clinic age                             0.1514***                                                        0.1505***
                                           (0.0159)                                                         (0.0165)
D × pop density                                         −0.1008***                                          −0.0668***
                                                        (0.0153)                                            (0.0157)
D × college grad                                                     0.1000***                              0.0809***
                                                                     (0.0193)                               (0.0198)
D × age 9 under                                                                   −0.4825***                −0.1705*
                                                                                  (0.0857)                  (0.0908)
D × pediatric                                                                                  −8.4123***   −8.3416***
                                                                                               (0.3495)     (0.3666)
Fixed effect                  Clinic       Clinic       Clinic       Clinic       Clinic       Clinic       Clinic
Observations                  42,020       41,189       38,956       41,878       41,878       41,878       38,311
R-squared                     0.1057       0.1092       0.1062       0.1073       0.1074       0.1197       0.1223

Note: All regressions include quarterly dummies and clinic fixed effects. Clinic age and pediatric dummy variable are measured by each clinic in 2007. Population density and share of age 9 under are measured by each township in 2009. The share of college graduates is measured by each district in 2009.
* Significant at 10%.
** Significant at 5%.
*** Significant at 1%.
treat the common cold.22 Also, more highly educated consumers may put more weight on their own prior belief and relatively less weight on the newly disclosed information by the government. Thus, the consumer learning effect can be smaller for more highly educated consumers.

Column (5) shows that the interaction term with the share of children has a negative and significant effect. That is, consumer learning is larger in absolute value in townships with more children. This result suggests that parents with children are either putting more weight on the disclosed information or are more sensitive to the perceived quality of clinics. Thus, they respond more sensitively to the disclosed antibiotics prescription ranking. Similarly, column (6) shows that patient visits to pediatric clinics have become more sensitive to the antibiotics ranking after the disclosure, compared with the other specialities.

In column (7), we control for all the interactions at the same time. The qualitative results do not change. But the interaction with the share of children becomes insignificant largely due to the control for the interaction with the pediatric dummy variable. The interaction with population density also becomes insignificant, but the sign of the coefficient does not change.

These results suggest that there is significant heterogeneity in consumer learning, depending on clinic and consumer characteristics.23

6.4. Heterogeneity in consumer learning and change in antibiotic prescription rates

The patterns of heterogeneity in consumer learning are important as they can help policy makers to predict when the information disclosure regulation will be more effective. For example, our evidence shows that for older clinics and more educated consumers, the extent of learning from the information disclosure is smaller. Thus, we can predict that the antibiotic prescription rates would fall less after information disclosure regulation for older clinics and for townships with more educated consumers.

To confirm such predictions, in Table 10, we estimate how the changes in antibiotic prescription rates vary with clinic and consumer characteristics. More specifically, as in Table 9, we interact clinic and consumer characteristics with the disclosure dummy variable. In our base specification, column (1), the average antibiotic prescription rates have decreased by about 9.35 percentage points after the mandatory information disclosure. Column (2) shows that the interaction term between the disclosure dummy and clinic age is positive and significant. That is, the antibiotic prescription rates decreased less for older clinics. Note that this pattern is consistent with the pattern of consumer learning found in column (2) of Table 9.

Likewise, the estimates from columns (3)–(7) of Table 10 are remarkably consistent with those found in the corresponding columns (3)–(7) in Table 9. That is, Table 9 shows that the extent of consumer learning from information disclosure is larger in more densely populated townships with relatively more children or pediatric clinics but with fewer college graduates. Then, given consumer preference for lower antibiotic prescription rates, antibiotic prescription rates should decrease more in such townships. Consistent with this prediction, columns (3)–(7) of Table 10 show that the antibiotic prescription rates have decreased more in more densely populated townships with relatively more children or pediatric clinics but with fewer college graduates. Also note that the positive effect of the interaction between disclosure and competition is robust in all specifications.

7. Conclusion

Mandatory information disclosure is an increasingly popular regulatory device to reduce the information asymmetry problem between sellers and buyers. Consequently, the previous literature has focused on whether consumers learn from the information disclosure and pressure sellers to increase their quality, called consumer learning effects. This paper, however, shows that mandatory information disclosure can also allow sellers to observe their competitors’ attributes, and trigger peer effects among them. More specifically, in the context of antibiotic prescription rates for the common cold, this paper shows that some clinics have increased, instead of decreasing, their antibiotic prescription rates when they

22 Korea Food & Drug Administration PressKit (2011-04-26).
23 We have also analyzed the effects of the share of elderly in the population and the size of local tax revenue. But the effects were not significant, and are not reported due to space constraints.
found out that other local clinics were prescribing more antibiotics than they were. Therefore, even though the average prescription rates have decreased after the mandatory information disclosure, the decline was smaller when there were more peer clinics.

In the literature on peer effects, this paper provides an alternative way of identifying peer effects. While most previous studies on peer effects have attempted to find an exogenous change in the behavior of peers, it is difficult to find an exogenous shock that affects some peers but not others. This paper suggests that the exogenous change in the observability of peer behavior can be an alternative way of identifying the peer effects, which can avoid the “reflection problem” discussed by Manski (1993).

This paper also finds significant evidence for consumer learning effects. After the information disclosure, consumers were less likely to visit clinics with higher antibiotic prescription rates for the common cold. This result suggests that overuse of antibiotics may not be driven by patient demands, but by hospital competition and an information asymmetry problem. We also find significant heterogeneity in consumer learning. In particular, after the information disclosure, consumers became more sensitive to the ranking of antibiotic prescription rates in townships with younger clinics, higher population density, lower education, and more children or pediatric clinics. Consequently, we find that antibiotic prescription rates have also declined more in such townships. These results may explain why some previous studies have found that the information disclosure policy had a significant effect while others have not. These results also suggest when the information disclosure policy can be most effective.

One of the limitations of this study is that we do not analyze what caused clinic peer effects. For example, such peer effects may arise from rational selfish decisions such as social learning or strategic interactions, or from intrinsic social preference. Such an analysis could be an interesting topic for future studies.

References

Akerlof, G.A., 1970. The market for ‘lemons’: quality uncertainty and the market mechanism. Quarterly Journal of Economics 84 (3), 488–500.
Akerlof, G.A., 1980. A theory of social custom, of which unemployment may be one consequence. Quarterly Journal of Economics 94 (4), 749–775.
Albaek, S., Mollgaard, P., Overgaard, P.B., 1997. Government-assisted oligopoly coordination? A concrete case. Journal of Industrial Economics 45 (4), 429–443.
Duflo, E., Saez, E., 2002. Participation and investment decisions in a retirement plan: the influence of colleagues’ choices. Journal of Public Economics 85, 121–148.
Fischer, P., Huddart, S., 2008. Optimal contracting with endogenous social norms. American Economic Review 98 (4), 1459–1475.
Sara, F., Karlsson, J., 2012. Competition and antibiotics prescription. IFN Working Paper No. 939.
Fortin, B., Lacroix, G., Villeval, M.-C., 2007. Tax evasion and social interactions. Journal of Public Economics 91 (11), 2089–2112.
Gaviria, A., Raphael, S., 2001. School based peer effects and juvenile behavior. Review of Economics and Statistics 83 (2), 257–268.
Glaeser, E., Shleifer, A., 2003. The rise of the regulatory state. Journal of Economic Literature 41, 401–425.
Gonzales, R., Malone, D.C., Maselli, J.H., Sande, M.A., 2001. Excessive antibiotic use for acute respiratory infections in the United States. Clinical Infectious Diseases 33 (6), 757–762.
Gordon, J.P.P., 1989. Individual morality and reputation costs as deterrents to tax evasion. European Economic Review 33 (4), 7–805.
Grossman, S.J., Hart, O.D., 1980. Disclosure laws and takeover bids. Journal of Finance 35 (2), 323–334.
Hibbard, J.H., Jewett, J.J., 1997. Will quality report cards help consumers? Health Affairs 16 (3), 218–228.
Jin, G.Z., Leslie, P., 2003. The effect of information on product quality: evidence from restaurant hygiene grade cards. Quarterly Journal of Economics 118 (2), 409–451.
Jun, D., Chung, G., 2011. Analysis on the effect of information disclosure – antibiotic prescription rates for the common cold in hospitals and clinics in Seoul. Korean Association for Policy Studies 20 (2), 109–142.
Lautenbach, E., Patel, J.B., Bilker, W.B., Edelstein, P.H., Fishman, N.O., 2001. Extended-spectrum β-lactamase-producing Escherichia coli and Klebsiella pneumoniae: risk factors for infection and impact of resistance on outcomes. Clinical Infectious Diseases 32 (8), 1162–1171.
Longo, D.R., Garland, L.G., Schramm, W., Fraas, J., Hoskins, B., Howell, V., 1997. Consumer reports in health care: do they make a difference in patient care? Journal of the American Medical Association 278 (19), 1579–1584.
Marshall, M.N., Shekelle, P.G., Leatherman, S., Brook, R.H., 2000. The public release of performance data: what do we expect to gain: a review of the evidence. Journal of the American Medical Association 283 (14), 1866–1874.
Malani, A., Buchman, T.G., Dushoff, J., Effron, M.B., 2008. Antibiotic overuse: the influence of social norms. Journal of the American College of Surgeons 265.
Manski, C.F., 1993. Identification of endogenous social effects: the reflection problem. Review of Economic Studies 60 (3), 531–542.
Mas, A., Moretti, E., 2009. Peers at work. American Economic Review 99 (1), 112–145.
Matthews, S., Postlewaite, A., 1985. Quality testing and disclosure. RAND Journal of Economics, 328–340.
Mennemeyer, S.T., Morrisey, M.A., Howard, L.Z., 1997. Death and reputation: how consumers acted upon HCFA mortality information. Inquiry 34 (2), 117–128.
Milgrom, P., 1981. Good news and bad news: representation theorems and applications. Bell Journal of Economics 12 (2), 380–391.
Myles, G.D., Naylor, R.A., 1996. A model of tax evasion with group conformity and social customs. European Journal of Political Economy 12 (1), 49–66.
Nelson, P., 1974. Advertising as information. Journal of Political Economy 81 (4), 729–754.
Njoroge, K., 2003. Information pooling and collusion: implications for the livestock mandatory reporting act. Journal of Agricultural and Food Industrial Organiza-
Ali, M.M., Heiland, F.W., 2011. Weight-related behavior among adolescents: the role tion 1.
of peer effects. PLoS ONE 6 (6), e21179. Robohm, C., Ruff, C., 2012. Diagnosis and treatment of the common cold in pedi-
Bennett, D., Che-Lun Hung, Tsai-Ling Lauderdale, 2011. Health care competition and atric patients. Journal of the American Academy of Physician Assistants 25 (12),
antibiotic use in Taiwan. Harris School of Policy 12. 43–47.
Bikhchandani, S., Hirshleifer, D., Welch, I., 1992. A theory of fads, fashion, custom, Sacerdote, B., 2001. Peer effects with random assignment: results for Dartmouth
and cultural change as informational cascades. Journal of Political Economy 100 roommates. Quarterly Journal of Economics 116 (20), 681–704.
(5), 992–1026. Salop, S., Stiglitz, J., 1977. Bargains and Ripoffs: a model of monopolistically com-
Bird, L., 2009. Disclosure Issues: Renewable Energy Purchasing. Mimeo. petitive price dispersion. Review of Economic Studies 44 (3), 493–510.
Board, O., 2009. Competition and disclosure. Journal of Industrial Economics 57 (1), Schelling, T.C., 1978. Micromotives and Macrobehavior. Norton, New York/London.
197–213. Shavell, S., 1994. Acquisition and disclosure of information prior to sale. RAND
Brody, H., 2005. Patient ethics and evidence-based medicine – the good healthcare Journal of Economics, 20–36.
citizen. Cambridge Quarterly of Healthcare Ethics 14, 141–146. Shekelle, P.G., Lim, Y.-W., Mattke, S., Damberg, C., 2008. Does public release of
Butler, C.C., Rollnick, S., Maggs-Rappaport, R.F., Slott, N., 1998. Understanding the performance results improve quality of care? A systematic review. The Health
culture of prescribing: qualitative study of general practitioners’ and patients’ Foundation, London, UK.
perception of antibiotics for sore throats. BMJ 317 (7159), 637–642. Spicer, M.W., Becker, L.A., 1980. Fiscal inequity and tax evasion: an experimental
Butters, G.R., 1977. Equilibrium distributions of sales and advertising prices. Review approach. National Tax Journal 33 (2), 171–175.
of Economic Studies 44 (3), 465–491. Stigler, G.J., 1961. The economics of information. Journal of Political Economy 69 (3),
Carrell, S.E., Malmstrom, F.V., West, J.E., 2008. Peer effects in academic cheating. 213–225.
Journal of Human Resources 43 (1), 173–207. Vladeck, B.C., Goodwin, E.J., Myers, L.P., Sinisi, M., 1988. Consumers and hospital
Chipty, T., Witte, A.N., 1998. “Effects of Information Provision in a Vertically Differ- use: the HCFA ‘death list’. Health Affair 7 (1), 122–125.
entiated Market,” NBER Working Paper No. 6493. Wilson, J., 2007. Peer effects and cigarette use among college students. Atlantic
Conly, J., 1998. Controlling antibiotic resistance by quelling the epidemic of overuse Economic Journal 35 (2), 233–247.
and misuse of antibiotics. Canadian Family Physician 44, 1769–1784. Yoo, H.J., Song, E., Lee, K.U., Lee, E.K., Lee, J.A., 2009. Misuse of antibiotics and related
Currie, J., Lin, W., Meng, J., 2012. “Antibiotic Abuse in China: Supply or Demand?,” awareness of consumers. Journal of Korean Association for Crisis and Emergency
Working Paper. Management 1, 98–122.

Recessions, healthy no more?

Christopher J. Ruhm ∗,1
University of Virginia and National Bureau of Economic Research, Charlottesville, VA 22904-4893, United States
Article history:
Received 24 September 2014
Received in revised form 10 March 2015
Accepted 11 March 2015
Available online 20 March 2015

Keywords:
Mortality
Health
Recessions
Macroeconomic conditions

Over the 1976–2010 period, total mortality shifted from strongly procyclical to being weakly related or unrelated to macroeconomic conditions. The association is likely to be poorly measured when using short (less than 15 year) analysis periods. Deaths from cardiovascular disease and transport accidents continue to be procyclical; however, countercyclical patterns have emerged for fatalities from cancer and external causes. Among the latter, non-transport accidents, particularly accidental poisonings, play an important role.
© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Health is usually thought to worsen when the economy weakens, but substantial recent research suggests that mortality actually declines during such periods. Following Ruhm (2000), most recent studies utilize longitudinal data and panel techniques to control for many confounding factors, including time-invariant area-specific determinants and characteristics that vary over time in a uniform manner across locations¹. Using data from a variety of countries and time periods, these investigations provide strong evidence of procyclical fluctuations in total mortality and several specific causes of death². In Ruhm's (2000) analysis of the U.S., covering 1972–1991, a one percentage point increase in the state unemployment rate was estimated to decrease total mortality by 0.5% and motor vehicle and cardiovascular disease (CVD) deaths by 3.0% and 0.5%, with reductions also observed for fatalities from influenza/pneumonia, liver disease, non-vehicle accidents and homicides. By contrast, cancer mortality was unaffected and suicides were estimated to rise by 1.3%³. Using similar empirical methods, the procyclicality of total mortality has been confirmed for Germany (Neumayer, 2004), Spain (Tapia Granados, 2005), France (Buchmueller et al., 2007), Mexico (Gonzalez and Quast, 2011), Canada (Ariizumi and Schirle, 2012), OECD countries (Gerdtham and Ruhm, 2006), and Pacific-Asian nations (Lin, 2009)⁴. Motor vehicle and CVD fatalities are procyclical in almost all studies, with more variation in mortality from other causes⁵.

∗ Tel.: +1 434 243 3729.
E-mail address: ruhm@virginia.edu
¹ I thank seminar participants at the College of William and Mary, New School for Social Research, National Bureau of Economic Research, Bureau of Economic Analysis, American Health Economics Conference and the Southeastern Health Economics Study Group for helpful comments and the University of Virginia Bankard Fund for financial support.
¹ Earlier investigations (e.g. Brenner, 1971, 1979) typically used time series data for a single geographic location. This research has been criticized on methodological grounds (e.g. Kasl, 1979; Gravelle et al., 1981) and suffers from the fundamental problem that any lengthy time-series may contain omitted confounding factors that are spuriously correlated with health. Ruhm (2012) provides a detailed discussion of these issues.
² Mortality rates are the most common proxy for health: they represent the most severe negative health outcome, are well measured and diagnosis generally does not depend on access to the medical system. However, changes in non-life-threatening health conditions are not accounted for. Due to limited data availability, few analyses examine how macroeconomic conditions affect morbidity. Exceptions include Ruhm (2003) and Charles and DeCicca (2008).
³ Thus, mental health and physical health may move in the opposite direction.
⁴ Economou et al. (2008) find that total mortality is negatively but insignificantly related to unemployment rates for 13 EU countries but that the unemployment coefficient reverses sign when controlling for health behaviors (smoking, drinking, calorie consumption) and other potential mechanisms (like pollution rates).
⁵ Stuckler et al. (2009) obtain evidence from 26 EU countries of positive, negative and neutral relationships between unemployment rates and suicides, deaths from transport accidents, and total mortality; however, the statistical methods focus on rates of changes in mortality and unemployment, making it difficult to compare the results with other related research. Analyses undertaken as early as the 1920s uncovered positive relationships between economic activity, total mortality and several specific causes of death (Ogburn and Thomas, 1922; Thomas, 1927; Eyer, 1977), as have some recent analyses using different methods (e.g. Fishback et al., 2007; Tapia Granados and Diez Roux, 2009).
18 C.J. Ruhm / Journal of Health Economics 42 (2015) 17–28
Some investigations suggest that mortality has become less procyclical or countercyclical in recent years. Using methods and data similar to Ruhm (2000), Stevens et al. (2011) find that a one percentage point increase in the state unemployment rate was associated with a 0.40% reduction in total mortality from 1978 to 1991, but a smaller 0.19% decrease when extending the analysis through 2006⁶. McInerney and Mellor (2012) estimate that a one-point rise in joblessness lowered the mortality rates of persons 65 and over by 0.27% during 1976–1991, but raised them 0.49% from 1994 to 2008. Svensson (2007) uncovers a positive relationship between Swedish unemployment rates and heart attack deaths from 1987 to 2003⁷.

Changes in health behaviors provide a potential mechanism for the mortality response. Consistent with this, reductions in drinking, obesity, smoking and physical inactivity during bad economic times have been demonstrated (Ruhm and Black, 2002; Ruhm, 2005; Gruber and Frakes, 2006; Freeman, 1999; Xu, 2013), and Edwards (2011) shows that individuals spend more time socializing and caring for relatives during such periods. However, research using recent data again raises questions about the strength and direction of these relationships. Charles and DeCicca (2008) indicate that male obesity is countercyclical; Arkes (2009) obtains a similar result for teenage girls (but not boys); Arkes (2007) shows that teenage drug use increases in bad times; Dávlos et al. (2012) uncover a countercyclical pattern for some types of alcohol abuse and dependence; Colman and Dave (2013) suggest that increased leisure-time exercise during periods of economic weakness is more than offset by reductions in work-related physical exertion. Such findings are provocative although, as shown below, they should be viewed with skepticism because the analysis periods are too short (eight years or less) to provide definitive results.

Using U.S. data covering 1976–2010, the present study examines whether the relationship between macroeconomic conditions and mortality has changed over time. Comparability with previous investigations is maximized by using empirical methods that conform closely to that research⁸. Three primary results emerge. First, total mortality has shifted from being strongly procyclical to being weakly related or unrelated to macroeconomic conditions. Evidence from prior research that deaths decline when the economy deteriorates largely reflects the inclusion of early sample years, when this was the case. Second, the results obtained using relatively short (less than 15 year) periods show considerable instability and should probably be viewed as unreliable. Third, fatalities due to cardiovascular disease and, to a smaller degree, transport accidents continue to be procyclical, whereas strong countercyclical patterns for cancer and some external sources of death (particularly accidental poisonings) have emerged.

2. Research design

This analysis uses variations of previously employed panel data methods (e.g. by Ruhm, 2000) to analyze the relationship between macroeconomic conditions and mortality rates. The estimating equation is:

$$\ln M_{kjt} = \alpha_{kj} + X_{jt}\beta + U_{jt}\gamma + \lambda_{kt} + T_{jkt} + \varepsilon_{kjt} \qquad (1)$$

where $M_{kjt}$ is the mortality rate from source $k$ in state $j$ at year $t$, $U$ is the state unemployment rate, $X$ a vector of covariates, $\alpha$ a state fixed-effect, $\lambda$ a general time effect, $T$ a state-specific linear time trend, $\varepsilon$ is the error term, and $\hat{\gamma}$ provides the estimated macroeconomic effect of key interest⁹.

The year effects ($\lambda_{kt}$) hold constant determinants of death that vary uniformly across locations over time (e.g. advances in widely used medical technologies or behavioral norms); the location fixed-effects ($\alpha_{kj}$) account for those that differ across states but are time-invariant (such as persistent lifestyle disparities between residents of Nevada and Utah). Since the supplementary time-varying state characteristics ($X_{jt}$) do not necessarily control for all time-varying determinants of death, the models also include state-specific trends ($T_{jkt}$)¹⁰. The 1976–2010 analysis period reflects the availability of consistent data on state unemployment and mortality rates. The macroeconomic impact is then identified from within-location variations in mortality rates, relative to changes in other states and after controlling for demographic characteristics and state-trends. Since the impact of national business cycles is absorbed by the time effects, discussions of macroeconomic effects refer to changes within-states rather than at the national level.

One way of investigating whether the impact of macroeconomic conditions on mortality has changed is to compare predicted effects across sub-periods. However, since such estimates are often sensitive to the choice of starting or ending years, two alternative strategies are employed. First, models for total mortality are estimated with differing starting and ending dates, and with varying lengths of the analysis period. The second, and main, method specifies analysis periods of fixed duration and then sequentially estimates models for all alternative sample windows permitted by the data. Most commonly, 20-year periods are used, with results obtained for 16 windows ranging from 1976–1995 to 1991–2010.

Figures are frequently provided with point estimates (and sometimes confidence intervals) on the unemployment rate coefficient presented for each analysis window. Tables are also supplied showing unemployment coefficients and standard errors for the first and last of the 20-year periods (1976–1995 and 1991–2010), denoted by $\hat{\gamma}_\tau$ and $\hat{s}_\tau$, respectively, where $\tau$ equals 1 (2) in the first (last) period. I test whether the macroeconomic effect has changed by providing estimates for $\Delta\hat{\gamma} = \hat{\gamma}_2 - \hat{\gamma}_1$.

Using Eq. (1), $(e^{\hat{\gamma}_k} - 1) \times 100\%$ provides the predicted percentage change in mortality from source $k$ resulting from a one percentage point increase in the unemployment rate. While these estimates show the relative size of the macroeconomic effect, they do not directly indicate changes in the absolute number of predicted fatalities because, for example, large relative effects may imply small absolute changes for sources that are responsible for few deaths. These relative effect sizes are translated into absolute numbers through estimates of:

$$\left(e^{\Delta\hat{\gamma}_k} - 1\right) \times \rho_k D \qquad (2)$$

where $\Delta\hat{\gamma}_k = \hat{\gamma}_{2k} - \hat{\gamma}_{1k}$, $D$ is the average annual number of deaths (2,222,313) and $\rho_k$ is the share of deaths due to source $k$ over the 1976–2010 period.
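
The two effect-size conversions used in this section (the percentage change implied by an unemployment coefficient, and the absolute change in deaths from Eq. (2)) can be illustrated with a short Python sketch. The function names `percent_effect` and `absolute_change` are hypothetical and not from the paper, and a death share of 1.0 for total mortality is an assumption made here for illustration; the coefficients and the average annual death count are the values reported in the text.

```python
import math

# Average annual number of U.S. deaths over 1976-2010, as reported in the text.
AVERAGE_ANNUAL_DEATHS = 2_222_313


def percent_effect(coef):
    """Predicted % change in mortality from a one-point rise in unemployment.

    Implements (exp(coef) - 1) * 100 for a log-mortality regression coefficient.
    """
    return (math.exp(coef) - 1.0) * 100.0


def absolute_change(delta_coef, death_share, total_deaths=AVERAGE_ANNUAL_DEATHS):
    """Eq. (2): (exp(delta_coef) - 1) * death_share * total_deaths.

    delta_coef is the change in the unemployment coefficient between the last
    and first 20-year windows; death_share is the fraction of all deaths
    attributable to the source of mortality being examined.
    """
    return (math.exp(delta_coef) - 1.0) * death_share * total_deaths


if __name__ == "__main__":
    # Total-mortality coefficients reported for 1976-1995 and 1991-2010.
    early, late = -0.0043, -0.0010
    print(round(percent_effect(early), 2))  # -0.43
    print(round(percent_effect(late), 2))   # -0.1
    # Assumption: total mortality accounts for all deaths, so the share is 1.0.
    print(absolute_change(late - early, death_share=1.0))
```

For coefficients this small the percentage effect is close to 100 times the coefficient, which is why a coefficient of −0.0043 is read as a 0.43% reduction.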
⁶ The estimated reduction rises to 0.33% over the 1978–2006 period when using age-adjusted mortality rates.
⁷ Using time-series methods for the U.S. from 1961 to 2010, Lam and Piérard (2014) also argue that total and cardiovascular mortality have become less procyclical over time, while motor vehicle fatalities remain strongly procyclical.
⁸ One exception is the use of an uncommonly detailed set of age controls.
⁹ Unemployment rates are used to proxy macroeconomic conditions; however, a procyclical variation in mortality does not imply that the loss of a job improves health. To the contrary, Sullivan and von Wachter (2009) show that job loss is associated with increases in individual mortality rates.
¹⁰ Mortality trends vary considerably across sources of death, with large secular reductions for total mortality and that from cardiovascular disease and external sources, a relatively flat trend for cancer, and an increase for other disease deaths. State-year population weights were also sometimes incorporated but unweighted estimates are generally preferred (Wooldridge, 1999; Butler, 2000; Solon et al., 2015) and so are focused upon below.
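
The fixed-duration sample windows described in Section 2 can be enumerated with a few lines of Python. This is only an illustrative sketch: the helper name `fixed_length_windows` is hypothetical, and the snippet lists the windows rather than estimating any regression.

```python
def fixed_length_windows(first_year, last_year, length):
    """Return every (start, end) window of `length` years within the data span.

    Windows are inclusive on both ends, so a 20-year window starting in 1976
    ends in 1995.
    """
    last_start = last_year - length + 1
    return [(start, start + length - 1)
            for start in range(first_year, last_start + 1)]


if __name__ == "__main__":
    windows = fixed_length_windows(1976, 2010, 20)
    print(len(windows))   # 16
    print(windows[0])     # (1976, 1995)
    print(windows[-1])    # (1991, 2010)
```

Shorter durations yield more windows from the same 1976–2010 span (for example, 31 five-year windows), each estimated on fewer observations.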
A potential concern is that calculations based on (2) do not account for the possibility that a portion of the trend in macroeconomic effects on overall mortality rates could reflect secular changes in the shares of deaths due to specific sources. This was examined using a variation of the Oaxaca (1973) and Blinder (1973) decomposition method. The method and results, which are summarized in the online appendix, indicate that almost all of the macroeconomic effect was due to changes in the coefficients, rather than in mortality shares, so that predictions obtained from (2) are useful.

3. Data and descriptive statistics

Annual average state unemployment rates, the main proxies for macroeconomic conditions, were obtained from the U.S. Department of Labor's Local Area Unemployment Statistics Database (www.bls.gov/lau/lauov.htm), which provides monthly estimates of total employment and unemployment rates for census regions and divisions, states, metropolitan statistical areas, counties, and some cities¹¹. Concepts and definitions underlying the LAUS data come from the Current Population Survey. Mortality data are from the Centers for Disease Control and Prevention's Compressed Mortality Files (CMF) (www.cdc.gov/nchs/data_access/cmf.htm), which contain information for every death of a U.S. resident including: state and county of residence, year of death, race and sex, Hispanic origin (after 1998), age group (16 categories), underlying cause of death (ICD codes and CDC recodes). Data prior to 1988 are publicly available; those from 1989 to 2010 were obtained by special agreement with the CDC. Population data (the denominator in the mortality rate calculations) from 1981 on come from the National Cancer Surveillance Epidemiology and End Results (SEER) program (http://www.seer.cancer.gov/data)¹². These were supplemented by census estimates, included in the CMF files, for 1976–1980.

In addition to total annual mortality rates, sex-specific death rates were constructed, as were fatality rates for five age groups (<25, 25–44, 45–64, 65–74, and ≥75 year olds) and deaths from major diseases and external causes¹³. The SEER data were additionally used to construct independent variables for the share of the state population who were female, nonwhite, Hispanic, and aged <1, 1–19, 45–54, 55–64, 65–74, 75–84 and ≥85 years old¹⁴.

The analysis of cause-specific mortality introduces complications. From 1976 to 1978, cause of death was categorized using the 8th revision of the International Classification of Diseases (ICD-8 codes). ICD-9 codes were used between 1979 and 1998, and ICD-10 categories since 1999. Crosswalks have been established between ICD-8 and ICD-9 and between ICD-9 and ICD-10 coding systems; however, the correspondence is imperfect. These issues are typically minor when looking at broad causes of death (e.g. those from cardiovascular disease) but are important for many specific sources of mortality. To provide information on this, the National Center for Health Statistics has calculated "estimated comparability ratios" indicating the relative number of deaths in 1996 attributed to a specific cause using ICD-9 and ICD-10 classifications (Anderson et al., 2001) and, similarly, for 1976 using ICD-8 versus ICD-9 codes (Klebba and Scott, 1980).

When the estimated comparability ratios are close to one (i.e. a similar number of deaths are reported using either ICD system), issues of data comparability are likely to be minor and well captured by the inclusion of regression year fixed-effects. For example, the estimated comparability ratios are 1.013 and 1.003 for CVD and cancer fatalities, when using ICD-8 and ICD-9 codes, and 0.998 and 1.007 for ICD-9 and ICD-10 categories. However, the potential problems are greater for some numerically important causes of death, and for others that have been analyzed in previous research¹⁵. Due to these concerns, the analysis of disease mortality is restricted to the major categories of CVD and malignant neoplasms, as well as a generic grouping of all other disease types¹⁶. A fuller investigation is provided for subcategories of external deaths, including those from transport accidents, non-transport (other) accidents, intentional self-harm (suicide), and homicide/legal intervention. Because non-transport accidents will be shown to be particularly important, separate analysis is conducted for the subcomponents: falls, drowning/submersion, smoke/fire/flames, and poisoning/exposure to noxious substances¹⁷.

Appendix Table A1 details the ICD codes used to classify causes of death. Means and sample standard errors for mortality rates (per 100,000 population) and state characteristics are detailed in the online appendix. Appendix Table A2 illustrates how the sources of death changed over the analysis period, showing numbers and shares of fatalities during 1976–1995 and 1991–2010. As expected, given increased life expectancy, the proportion of mortality accounted for by the elderly has grown substantially. Declines in the share of cardiovascular deaths have been offset by increases in mortality from other diseases. The fraction from external sources changed little, with reductions from fatal transport accidents and homicides being compensated for by increases in non-transport accidents, particularly poisoning deaths.

¹¹ Some recent studies of macroeconomic patterns of health behaviors have analyzed county-level or MSA data (e.g. Charles and DeCicca, 2008; An and Liu, 2012). This has potential advantages (e.g. examining smaller regional economies) and disadvantages (e.g. greater measurement error). For this investigation, the major disadvantage is that a consistent data series of county unemployment rates only begins in 1990 and the Department of Labor cautions against using county level data prior to that time. Also, Lindo (2015) provides evidence that the health effects of macroeconomic conditions are understated when using more disaggregated (e.g. county rather than state) data. Preliminary analysis revealed similar results using state and county data starting in 1990.
¹² The SEER data are designed to supply more accurate population estimates for intercensal years than standard census projections, and to adjust for population shifts in 2005, resulting from Hurricanes Katrina and Rita. Differences between the SEER and CMF population estimates are minuscule prior to 2000 but are sometimes reasonably large (up to 3%) after 2003.
¹³ I examined other age-specific death rates, including infant mortality rates, in preliminary analysis, but focus on these age groupings since the large majority of deaths occur to those who are relatively old.
¹⁴ Hispanic population shares are not provided prior to 1981. Therefore, shares for 1976–1980 were extrapolated as a linear trend for changes occurring between 1981 and 1986.
¹⁵ For instance, the ICD-10 to ICD-9 comparability ratios are 0.698, 1.232 and 1.554 for influenza/pneumonia, kidney disease (nephritis, nephrotic syndrome, nephrosis) and Alzheimer's disease.
¹⁶ Ruhm (2013) provides a preliminary analysis examining three sub-components of CVD, five categories of malignant neoplasms and five types of other diseases.
¹⁷ These account for 65% of deaths due to non-transport accidents. The most important remaining category, "other and unspecified transport accidents and their sequelae", is not comparable over time.

4. The declining procyclicality of total mortality

Fig. 1 supplies three ways of examining whether the procyclicality of total mortality has diminished over time, by estimating Eq. (1) for different time periods. Solid lines show point estimates and dotted lines the 95% confidence intervals.

Fig. 1. Unemployment coefficients for total mortality using different analysis samples. (A) Sample begins in 1976 and continues through specified year. (B) Sample begins in specified year and continues through 2010. (C) Analysis sample covers 20-year period.

Fig. 1A displays unemployment rate coefficients where the analysis period begins in 1976 and ends in years ranging between 1995 and 2010. The magnitude of the estimated macroeconomic effect declines monotonically but modestly as the sample is extended to more recent periods, ranging from −0.0043 when the last year is 1995 to −0.0034 when it is 2010. All of the coefficients are significantly different from zero and these results, which are largely
similar to those of previous research, do not alter the conclusion that mortality is procyclical.

The sensitivity of findings to the choice of sample periods can be seen more explicitly in Fig. 1B, where the sample always ends in 2010 but the starting year varies between 1976 and 1991. The unemployment coefficient attenuates from −0.0034 for the entire sample period to between −0.0029 and −0.0009 when the starting year is 1978 or later¹⁸. Perhaps more importantly, the data fail to reject the null hypothesis of no macroeconomic effect for periods beginning after 1988.

Fig. 1C displays results using 20-year sample windows beginning in the specified year. The left-most entry shows that the unemployment coefficient for 1976–1995 is −0.0043, while the farthest right result shows that it is −0.0010 for 1991–2010. Total mortality is significantly procyclical (negative unemployment rate coefficients) for all 20-year windows starting between 1976 and 1987, but the predicted effect diminishes steadily for windows beginning after 1982 and is small and insignificant for those starting in 1988 through 1991. This pattern is not caused by especially negative health consequences of the great recession of 2007–2009.

The choice of 20-year sample windows is arbitrary and may conceal an increased procyclical variation of mortality toward the end of the data period. This possibility is investigated in Fig. 2, which replicates Fig. 1C, but for periods of between 5 years and 20 years. Two findings deserve mention. First, at shorter durations, the estimates become more volatile and less precise. For instance, when using 5-year windows, the unemployment coefficients fluctuate wildly for even small changes in timing (e.g. from 0.0120 for 1996–2000 to −0.0077 for 1999–2003) but almost always fail to reject the null hypothesis of no macroeconomic effect. Second, the standard errors have typically increased for more recent samples. As a result, the estimates obtained using 10-year or 15-year analysis windows, while less volatile than those using 5-year periods, still lack sufficient precision to determine whether the possible partial reversion of the macroeconomic effects in recent years (toward more procyclical mortality) is real or reflects statistical noise¹⁹. An important implication is that the findings of many recent investigations of macroeconomic variations in health outcomes and behaviors should be viewed with extreme caution because the analysis periods are too short to provide reliable estimates²⁰.

Fig. 2. Unemployment coefficients for total mortality using different sample windows.

Fig. 3 shows that the estimated change in the relationship between macroeconomic conditions and total mortality is fairly insensitive to controlling for state-specific linear time trends or weighting the data. (This figure, and those that follow, show point estimates for 20-year analysis windows that begin in the year specified on the X-axis.) In all cases, the estimated procyclicality of mortality declined over time, with most of the change dating from the early to mid-1980s and with statistically insignificant unemployment rate coefficients for some or all recent analysis windows. Removing state-specific trends makes this effect more pronounced than in the "preferred" specifications, which use unweighted data and control for state trends; weighting the data and including trends makes it somewhat less so²¹. In all cases, the change in the unemployment coefficient between 1976–1995 and 1991–2010 is positive and statistically significant.

Fig. 3. Unemployment coefficients for total mortality using alternative estimation methods.

¹⁸ The full sample estimate is in line with previous results. Ruhm (2000) obtains a slightly larger 0.5% reduction in total mortality but the current estimate is close to the 0.3% decrease obtained in Stevens et al.'s (2011) preferred specification.
¹⁹ When using 10-year periods, the average standard error is 46% larger for analysis windows beginning between 1989 and 2001 than for those starting between 1976 and 1988 (0.0018 versus 0.0013).
²⁰ Charles and DeCicca's (2008) analysis of male obesity used data from 1997 to 2001; Arkes' (2007, 2009) investigation of teenage body weight utilized information from 1997 to 2004, Dávlos et al.'s (2012) study of alcohol abuse and dependence compared 2001–2002 and 2004–2005, Colman and Dave's (2013) research on work and leisure-time physical activity covered 2003–2010, Cotti and Tefft's (2011) analysis of alcohol-related vehicle fatalities used data from 2003 to 2009, and Tekin et al. (2013) investigated a variety of health outcomes and behaviors from 2005 to 2011.
²¹ For example, the unemployment rate coefficient (standard error) for the 1991–2010 analysis window is 0.0022 (0.0017), 0.0039 (0.0020), −0.0010 (0.0010) and −0.0019 (0.0009) for unweighted data without trends, weighted data without trends, unweighted data with trends and weighted data with trends.

Table 1 shows the unemployment coefficients, from estimating Eq. (1), for total, sex-specific and age-specific mortality. Results
Table 1
Estimated macroeconomic effects on specific sources of mortality.
Type of mortality 1976–1995 1991–2010 Difference
All −0.0043 (0.0009) ***

−0.0010 (0.0010) 0.0033 (0.0012)***
Sex-specific
Males −0.0044 (0.0010)*** −0.0001 (0.0012) 0.0043 (0.0015)***
Females −0.0041 (0.0010)*** −0.0018 (0.0011) 0.0022 (0.0014)
Age-specific (Years)
<25 −0.0165 (0.0025)*** 0.0024 (0.0035) 0.0189 (0.0037)***
25–44 −0.0078 (0.0028)*** 0.0016 (0.0035) 0.0094 (0.0042)**
45–64 −0.0020 (0.0011)* 0.0018 (0.0013) 0.0038 (0.0017)**
65–74 −0.0037 (0.0009)*** −0.0019 (0.0012) 0.0018 (0.0014)
≥75 −0.0041 (0.0010)*** −0.0019 (0.0013) 0.0022 (0.0012)*
Note: Dependent variable is the natural log of the specified state mortality rate, obtained from the Compressed Mortality Files, for 1976 to 2010 (n = 1785). The first two
columns show the coefficient on the state unemployment rate for 20-year subsamples (n = 1020) covering 1976–1995 and 1991–2010. The regressions also include vectors
of state and year dummy variables, state-specific linear time trends, and controls for the share of the state population who are: female, nonwhite, Hispanic, and aged <1,
1–19, 45–54, 55–64, 65–74, 75–84 and ≥85 years old. The third column shows the difference between the unemployment coefficients for the 1991–2010 and 1976–1995
subsamples. Robust standard errors, clustered at the state level, are shown in parentheses. *** p < 0.01, ** p < 0.05, * p < 0.1.
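Because the dependent variable is the natural log of the mortality rate, a coefficient β on the unemployment rate corresponds to a 100 × (e^β − 1) percent change in mortality per percentage point of unemployment. The minimal sketch below applies that conversion to the "All" row of Table 1. The helper names and the simple coefficient-to-standard-error ratio are assumptions made for illustration; they are not the estimation procedure itself, which relies on state-clustered standard errors.

```python
import math

def percent_effect(beta):
    """Percent change in mortality implied by a log-point coefficient."""
    return 100.0 * (math.exp(beta) - 1.0)

def z_ratio(beta, se):
    """Coefficient divided by its standard error (normal approximation, assumed)."""
    return beta / se

# "All" row of Table 1: coefficient and standard error for each column.
rows = {
    "1976-1995": (-0.0043, 0.0009),
    "1991-2010": (-0.0010, 0.0010),
    "Difference": (0.0033, 0.0012),
}

for label, (beta, se) in rows.items():
    print(f"{label}: {percent_effect(beta):+.2f}% per point, "
          f"z = {z_ratio(beta, se):.2f}")
```

For coefficients this small the percent effect is almost identical to 100 × β, which is why the text reads −0.0043 as a 0.43% reduction.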
in the first column refer to 1976–1995, those in the second to 1991–2010, with the final column showing the difference between the two. A one point rise in unemployment predicts a statistically significant 0.43% reduction in total mortality during 1976–1995 compared to a small and insignificant 0.10% decrease in 1991–2010 (see the first row). The 0.33% difference between these two periods is statistically significant and indicates that the procyclicality of mortality has largely disappeared in recent years²².

To address the possibility that the observed secular trends reflect a change in the relationship between unemployment rates and macroeconomic conditions, rather than in the health effects of economic conditions, I estimated specifications that controlled for nonemployment (the percentage of civilians aged 16 and over who are unemployed or out of the labor force) rather than unemployment rates²³. In these models a one percentage point increase in the nonemployment rate predicted a statistically significant 0.39% reduction in total mortality during 1976–1995 and an insignificant 0.02% decrease in 1991–2010²⁴. The highly significant 0.37% difference is slightly larger than that obtained using unemployment rates.

The remainder of Table 1 and Fig. 4 summarize subgroup analyses, stratified by gender and age. In Fig. 4, and subsequently, thicker lines indicate sources with relatively high mortality shares²⁵. The evidence suggests larger secular changes in macroeconomic effects for men than women. In 1976–1995, a one point rise in unemployment predicted a 0.44% reduction in male mortality and a 0.41% decrease for females. This effect completely disappeared by 1991–2010 for men but fell only half as much for women. The declining procyclicality of mortality has been particularly pronounced for males since 1982, while showing a steadier reduction for females (see Fig. 4A). The trends also appear to be relatively pronounced for the young and middle-aged. Specifically, a one percentage point increase in joblessness reduced the predicted death rates of <25, 25–44, 45–64, 65–74, and ≥75 year olds by 1.6%, 0.8%, 0.2%, 0.4%, and 0.4% in 1976–1995 but increased them by 0.2%, 0.2%, 0.2%, −0.2%, and −0.2% in 1991–2010. We are unable to reject the null hypothesis of a zero unemployment rate effect in 1991–2010 for all age groups. The patterns for all possible 20-year windows are qualitatively similar (see Fig. 4B). The 95% confidence intervals (not shown) exclude positive unemployment rate coefficients for all two-decade periods beginning prior to 1986 for ≥75 year olds, before 1989 for <25 year olds, and earlier than 1983 for those aged 65–74. Conversely, a zero coefficient is rarely rejected for 45–64 year olds.

22 Coefficients on the other time-varying state level covariates are provided in the online appendix.
23 For instance, declines in labor force participation rates were particularly pronounced during the “great recession” that began in 2007, when compared to other economic downturns (Shierholz, 2012).
24 A one percentage point increase in the unemployment rate predicts a 0.73 percentage point rise in nonemployment over the full period, in models controlling for state demographic characteristics, time trends, and state and year dummy variables. Changes in interstate migration are also unlikely to explain the results. Migrants tend to be healthy and to move from areas of higher to lower unemployment rates (Halliday, 2007), introducing a countercyclical mortality effect. Mortality might therefore have become less procyclical if migration rates were increasing over time. However, migration rates instead peaked around 1980 and have fallen sharply since then (Malloy et al., 2011).
25 For example, in Fig. 4B, the line for ≥75 year olds is thick because they account for 51% of mortality, from 1976 to 2010, whereas that for <25 year olds is thin because they are responsible for less than 4% of deaths.

5. Heterogeneous effects across sources of death

Table 2 and Figs. 5 and 6 stratify disease versus external sources of death, and then separately examine three disease and four external causes. The three disease categories: cardiovascular, cancer and other diseases, accounted for 42%, 23% and 29% of deaths over the 1976–2010 period. The four external sources: transport accidents, other (non-transport) accidents, suicides and homicides were responsible for 2.2%, 2.4%, 1.4% and 0.9%. Finally, four specific types of non-transport accidents are considered – falls, drowning/submersion, smoke/fires/flames and poisoning/exposure to noxious substances – which constituted 0.7%, 0.2%, 0.2% and 0.5% of fatalities.

Levels and trends of the macroeconomic effects differ markedly for mortality from disease versus external causes. A one point rise in joblessness lowered predicted disease mortality by 0.33% in 1976–1995 versus 0.14% in 1991–2010, a statistically insignificant change. By contrast, a much larger 1.5% reduction in external deaths was estimated for the earlier years versus a statistically significant 0.8% increase in the later ones, and the difference is highly significant. Fig. 5 suggests an almost monotonic but modest attenuation over time in the unemployment coefficient for deaths from disease, with statistically significant negative estimates obtained for all two-decade periods starting before 1986. Conversely, the predicted effect for external causes was negative and fairly stable for 20-year periods beginning between 1976 and 1982, but attenuated steadily for analysis windows starting from 1982 to 1990, with a statistically insignificant effect for those with first years between 1984 and 1989, and significantly positive unemployment rate coefficients obtained for those initiating in 1990 or 1991. These results help to explain the sharp reversal in the effects
Fig. 4. Unemployment coefficients for sex-specific and age-specific mortality. (A) Sex-specific mortality. (B) Age-specific mortality.
Table 2
Estimated macroeconomic effects on cause-specific mortality.
Cause of death 1976–1995 1991–2010 Difference
Diseases −0.0033 (0.0010)*** −0.0014 (0.0010) 0.0019 (0.0013)

Cardiovascular disease −0.0036 (0.0013)*** −0.0041 (0.0017)** −0.0005 (0.0019)
Cancer 0.0002 (0.0013) 0.0027 (0.0011)** 0.0024 (0.0011)**
Other diseases −0.0060 (0.0020)*** −0.0011 (0.0036) 0.0048 (0.0027)*
External causes −0.0148 (0.0020)*** 0.0078 (0.0028)*** 0.0226 (0.0023)***

Transport accidents −0.0265 (0.0035)*** −0.0086 (0.0048)* 0.0180 (0.0030)***
Other accidents −0.0173 (0.0033)*** 0.0086 (0.0049)* 0.0259 (0.0048)***
Suicides 0.0041 (0.0031) 0.0171 (0.0046)*** 0.0130 (0.0054)**
Homicides −0.0063 (0.0065) 0.0160 (0.0115) 0.0223 (0.0100)**
Other accidents −0.0173 (0.0033)*** 0.0086 (0.0049)* 0.0259 (0.0048)***

Falls −0.0119 (0.0053)** 0.0016 (0.0084) 0.0135 (0.0095)
Drowning/submersion −0.0015 (0.0065) 0.0143 (0.0155) 0.0159 (0.0147)
Smoke/fire/flames −0.0308 (0.0099)*** −0.0019 (0.0179) 0.0289 (0.0169)*
Poisoning/noxious −0.0148 (0.0161) 0.0422 (0.0251)* 0.0570 (0.0290)**
Note: See note on Table 1. *** p < 0.01, ** p < 0.05, * p < 0.1.
Fig. 5. Unemployment coefficients for disease versus external sources of deaths.

Fig. 6. Unemployment coefficients for deaths from specific diseases and external causes. (A) Specific diseases. (B) External causes. (C) Other accidents.
of macroeconomic conditions on deaths of younger persons, who disproportionately die from external causes²⁶.

5.1. Diseases

There are striking disparities across types of diseases. Cancer mortality was unrelated to the economy in 1976–1995 but strongly countercyclical by 1991–2010, whereas CVD mortality remained strongly procyclical throughout (see the top panel of Table 2 and Fig. 6A). The procyclicality of other disease mortality declined over time but the change between 1976–1995 and 1991–2010 was not quite significant at the 0.05 level²⁷.

Research for earlier time periods (e.g. Ruhm, 2000; Neumayer, 2004; Miller et al., 2009) documents a strong procyclicality of cardiovascular deaths but with little macroeconomic variation in cancer fatalities, and attributes this to the likelihood that short-term behavior changes (e.g. smoking, diet and exercise) more strongly influence the risk of CVD than cancer deaths. A conceivable explanation for the findings just described is that the relationship between macroeconomic conditions and health behaviors has remained relatively stable, while cancer mortality has become more sensitive to the availability of financial resources and access to (procyclical) health care due to improvements in expensive medical treatments and technologies²⁸.

26 For example, <45 year olds accounted for 56% of external deaths (from 1976 to 2010) but less than 10% of total mortality.
27 This result is sensitive to weighting. Using weighted data, the unemployment rate coefficient was −0.0029 in 1976–1995 and −0.0020 in 1991–2010. The difference was 0.0009 with a standard error of 0.0031.
28 The cost per cancer case rose from $47,000 in 1983 to $70,000 in 1999 (Philipson et al., 2012), with many expensive new medical treatments and chemotherapy agents coming into use in the 1990s and early 2000s (Cutler, 2008). The continued procyclicality of CVD mortality could also occur for other reasons, such as a (stable) deleterious health effect of air pollution (Heutel and Ruhm, 2013).
Table 3
Change in effect of macroeconomic conditions on predicted number of deaths.
Type of mortality Share of deaths Predicted Δ in no. of deaths
Point estimate 95% Conf. Interval
All deaths 1.0000 7253*** 1882–12,637
Sex-specific
Males 0.5121 4900*** 1625–8185
Females 0.4879 2402 −487–5298
Age-specific (years)
<25 0.0389 1655*** 1009–2305
25–44 0.0572 1202** 138–2276
45–64 0.1865 1586** 177–3000
65–74 0.2031 799 −454–2054
≥75 0.5140 2565* −206–5343
Diseases 0.9293 3988 −1221–9210

CVD 0.4155 −466 −3846–2927
Cancer 0.2263 1277** 128–2329
Other diseases 0.2875 3104* −326–6552
External causes 0.0707 3582*** 2845–4322

Transport accidents 0.0218 880*** 593–1168
Other accidents 0.0243 1418*** 899–1942
Suicide 0.0138 402** 75–733
Homicide 0.0094 469** 56–891
Other accidents 0.0243 1418*** 899–1942

Falls 0.0069 207 −78–497
Drowning 0.0019 67 −54–191
Fires 0.0019 124* −18–271
Poisoning 0.0054 709** 1–1459
Note: Predicted changes are for a one-percentage point increase in unemployment. Share of deaths is for 1976–2010. “Predicted Δ in # of deaths” is calculated as $(e^{\hat{\Delta}_k} - 1) \times \rho_k D$, where $\hat{\Delta}_k = \hat{\beta}_{k2} - \hat{\beta}_{k1}$, for $\hat{\beta}_{kt}$ the predicted unemployment coefficient in period $t$, with $t = 1$ in 1976–1995 and $t = 2$ in 1991–2010; $D$ is the average annual number of deaths during 1976–2010 (2,222,313); $\rho_k$ is the share of deaths from source $k$. 95% confidence intervals are estimated as $(e^{\hat{\Delta}_k \pm 1.96 \times s_k} - 1)\rho_k D$, for $s_k$ the standard error on $\hat{\Delta}_k$. *** p < 0.01, ** p < 0.05, * p < 0.1.
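The calculation described in the note to Table 3 and in footnote 31 can be sketched as follows. The function names are hypothetical, and the confidence interval uses the rounded standard error of 0.0012 from Table 1, so it only approximates the published interval of 1882–12,637; the point estimate uses the unrounded coefficients quoted in footnote 31.

```python
import math

D = 2_222_313  # average annual number of deaths, 1976-2010 (Table 3 note)

def predicted_change(delta, share=1.0, deaths=D):
    """(exp(delta) - 1) * share * deaths: extra annual deaths per point of unemployment."""
    return (math.exp(delta) - 1.0) * share * deaths

def conf_interval(delta, se, share=1.0, deaths=D, z=1.96):
    """Interval obtained by shifting delta by +/- z standard errors."""
    return (predicted_change(delta - z * se, share, deaths),
            predicted_change(delta + z * se, share, deaths))

# Unrounded total-mortality coefficients from footnote 31.
beta_early = -0.00428101  # 1976-1995
beta_late = -0.00102266   # 1991-2010
delta_all = beta_late - beta_early

total = predicted_change(delta_all)
low, high = conf_interval(delta_all, 0.0012)  # rounded SE: approximate only
print(f"total change: {total:.0f} deaths, approx. 95% CI {low:.0f} to {high:.0f}")

# Contribution of selected causes, using point estimates reported in Table 3.
table3 = {"External causes": 3582, "Other accidents": 1418, "Poisoning": 709}
for cause, extra in table3.items():
    print(f"{cause}: {100 * extra / total:.0f}% of the change in total mortality")
```

Because separate models are estimated for each source of death, the cause-specific changes need not sum exactly to the total, so the percentages are descriptive shares rather than an exact decomposition.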
5.2. External causes

There is considerable heterogeneity in the effects for specific sources of external deaths (see the second panel of Table 2 and Fig. 6B). One of the most consistent previous research findings is that transport fatalities are procyclical²⁹. This effect persists but has weakened recently, with a one percentage point rise in the unemployment rate predicting a 2.6% decrease in 1976–1995 versus a 0.9% reduction in 1991–2010. Suicides increase with joblessness, consistent with most prior studies, and this effect has strengthened over time: a one point growth in unemployment was associated with an insignificant 0.4% rise in suicides during 1976–1995 versus a highly significant 1.7% increase in 1991–2010³⁰.

The most noteworthy finding is that fatal non-transport accidents have switched from being strongly procyclical to sharply countercyclical: a one point rise in unemployment reduced predicted mortality rates by 1.7% in 1976–1995 but increased them by 0.9% in 1991–2010, with nearly monotonic growth over time. The parameter estimates were negative and significant for all 20-year periods starting prior to 1985 but insignificantly positive for those beginning after 1987.

The bottom panel of Table 2 and Fig. 6C provide additional specificity on non-transport accident deaths, showing that the secular trends are dominated by changes for accidental poisonings, where the unemployment coefficient rose from −0.0148 in 1976–1995 to 0.0422 in 1991–2010, with a strong countercyclical pattern emerging for 20-year windows beginning after the early 1980s. There are more modest changes for deaths from falls or drownings and the procyclicality of fatalities from fires largely disappears in recent years but, as shown later, this is always a relatively minor source of mortality. Given these results, accidental poisonings receive special attention below.

29 Previous analyses have often examined motor vehicle deaths, which constituted 94% of transport accident fatalities from 1976 to 2010. Transport deaths are considered here because they are coded more consistently across time.
30 The unemployment coefficient was positive in all 20-year windows and statistically significant for those beginning after 1987.
31 The unemployment coefficient was −0.00428101 for 1976–1995 and −0.00102266 in 1991–2010. The resulting difference of 0.00325834 implies around 0.33% more fatalities, or 7253 additional deaths per year, based on 2,222,313 fatalities annually: (exp[−0.00102266 − (−0.00428101)] − 1) × 2,222,313 = 7252.86.

6. Predicted changes in number of deaths

I next demonstrate that external sources of deaths, especially those for non-transport accidents and among these accidental poisonings, play a key role in explaining the declining macroeconomic responsiveness of total mortality. All numerical calculations are based on Eq. (2) and refer to a one percentage point increase in unemployment. The discussion focuses on predicted secular changes in macroeconomic effects, rather than levels for a single analysis period.

The first row of Table 3 shows that a one point increase in unemployment predicts 7253 more fatalities in 1991–2010 than in 1976–1995³¹. As noted, the procyclicality of mortality weakened over time more for men than women, so that males account for two-thirds of the overall change in the macroeconomic effect (see
Fig. 7. Trends in accidental poisoning mortality, by age.
the second and third rows of Table 3)³². When decomposing by age, the most striking finding is the extent to which the declining procyclicality is concentrated among the relatively young: persons under 25 are responsible for less than 4% of deaths but 23% of the secular trend in the macroeconomic impact; those under 45 (65) comprise less than 10% (30%) of fatalities but almost two-fifths (over three-fifths) of the predicted change over time.

The remainder of Table 3 examines specific causes of mortality. A one percentage point rise in unemployment predicts almost 3600 more annual deaths from external sources in 1991–2010 than 1976–1995, or 49% of the change in total mortality. This occurs even though only 7% of all deaths are due to such causes, and helps to explain the large changes for males and <45 year olds (for whom external deaths account for over 40% of mortality). Cancer and other diseases also play a role, although the estimates are imprecise for the former and sensitive to the use of sampling weights for the latter³³. By contrast, cardiovascular disease – the number one killer – explains none of the secular change, as the unemployment rate coefficient actually becomes more negative in later years.

The third panel of Table 3 provides a more detailed decomposition of external deaths. Non-transport accidents are of special interest because the unemployment coefficient switches from large and negative to positive, accounting for over 1400 additional deaths annually, or 40% of the predicted rise in external cause mortality. This increase is 11% larger than for all cancers, even though non-transport accidents constitute only 2% of all fatalities, versus 22% for malignant neoplasms. Transport accidents are also important, explaining 880 extra deaths per year. The effects on suicides and homicides are in the same direction but of considerably smaller magnitude.

The bottom panel of Table 3 presents separate results for four sources of non-transport accidents, which together explain around two-thirds of deaths from this cause. The role of accidental poisonings is remarkable. Although accounting for just 0.5% of deaths, a one point rise in unemployment is predicted to result in 709 more annual poisoning fatalities in 1991–2010 than in 1976–1995, or half the change in non-transport accident fatalities, 20% of that for external deaths and almost 10% of the overall mortality effect.

32 Since separate (unconstrained) models are estimated for different sources of death, the total contribution of changes in predicted group-specific mortality can sum to more or less than the effect predicted for total mortality.
33 With weighted data, a one point unemployment increase predicts 580 more deaths from other diseases in 1991–2010 than in 1976–1995.
34 Productivity also shifted from procyclical to acyclical or slightly countercyclical at about the same time (Galí and van Rens, 2010).

7. Discussion

The strong procyclical pattern of mortality present in the 1970s and 1980s has been largely eliminated in recent years. The pattern varies across sources of deaths, with much larger secular changes observed for external than disease causes. All types of external deaths became less procyclical or more countercyclical during the analysis period, with particularly large changes for non-transport accidents, and within this category, for accidental poisonings. Among diseases, cardiovascular mortality continues to fall sharply when the economy deteriorates, whereas cancer deaths have become substantially countercyclical. These findings are relevant not only for understanding the production of health but also for measuring the size and effects of business cycle fluctuations. Egan et al. (2013) argue that procyclical mortality implies that business cycle fluctuations are milder than when calculated using standard GDP measures, but this may have become less true in recent years.

Some estimates are sensitive to changing the starting and ending dates of analysis. Such parameter instability is particularly problematic when the sample window is short – probably anything less than 15 years – raising concerns about the findings of many recent related investigations that have used brief (often less than 10 year) timespans. One contribution of this study is to provide parsimonious methods of illustrating the sensitivity of the results to the length of the analysis window, and to the first and last years examined. Another caveat is that specific sources of death are implicitly treated here as being independent of each other, although some prior research (Yeung et al., 2014) identifies potential correlations between them. This is not an issue for the analysis of total mortality but may be important when considering specific causes of death as competing risks.

Mechanisms for the previously observed procyclical variation in mortality remain poorly understood, so it is speculative as to why the relationship has changed in recent years. Two possibilities are intriguing. First, the change dates to (20-year) analysis periods beginning in the early 1980s, which precisely coincides with the reduction in macroeconomic volatility that has been referred to as the “Great Moderation” (Stock and Watson, 2003; Bernanke, 2004)³⁴. This raises the possibility that the mortality patterns here are part of a broader change in the effects of short-term changes
in macroeconomic performance, or in the role of unemployment rates as a proxy for macroeconomic conditions. With regards to the latter, it is notable that the residual variation in state-year unemployment rates after including controls (one minus the R-squared from regressing unemployment rates on state and year dummy variables, state-specific time trends, and the time-varying state demographic characteristics) fell from 0.177 in 1976–1995 to 0.094 in 1991–2010. There is also suggestive evidence that the procyclicality of mortality might have increased slightly in the most recent analysis periods, which include the severe 2007–2009 recession.

Second, the emerging importance of accidental poisoning fatalities occurred at the same time that deaths from this source increased dramatically for young and middle-aged adults (see Fig. 7). Over 90% of poisoning fatalities are now due to drug overdoses, with particularly important roles for prescription opioids (such as hydrocodone and oxycodone) and benzodiazepines (Warner et al., 2011; Ruhm, 2015). The higher death rates reflect greater availability of these drugs, raising the ease of self-injury and accidental death during bad economic times³⁵. Moreover, economic weakness has long been associated with diminished mental health (Ruhm, 2000, 2003; Charles and DeCicca, 2008; Bradford and Lastrapes, 2014) and, to the extent these drugs are now being taken to address this, the increased countercyclicality of poisoning deaths may be a physical manifestation of what was previously a mental health problem³⁶.

Appendix A. Appendix

Tables A1 and A2.

Appendix B. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jhealeco.2015.03.004.
Table A1
Definitions of specific causes of mortality.
Variable Description ICD-8 (1976–1978) ICD-9 (1979–1998) ICD-10 (1999–2010)
Cancer Malignant neoplasms 140–209 140–208 C00–C97

CVD Major cardiovascular diseases 390–448 390–448 I00–I78
Heart Diseases of the heart 390–398, 402, 404, 410–429 390–398, 402, 404–429 I00–I09, I11, I13, I20–I51
Transport Transport accidents 800–848, 940–941 800–848, 929.0, 929.1 V02–V99, Y85
Other Ac Other (non-transport) accidents 850–939, 942–949 850–928, 929.2–949 W00–X59, Y86
Falls Accidents: falls 880–887 880–888 W00–W19
Drowning Accidents: drowning/submersion 910 910 W65–W74
Fires Accidents: smoke/fire/flames 890–899 890–899 X00–X09
Poison Accidents: poisoning/noxious substances 850–879, 924 850–869, 924.1 X40–X49
Suicide Suicide (intentional self-harm) 950–959 950–959 X60–X84, Y87.0
Homicide Homicide and legal intervention 960–978 960–978 X85–Y09, Y87.1, Y35, Y89.0
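The ICD-10 column of Table A1 can be expressed as a lookup structure. The sketch below is illustrative only: the dictionary layout and the function name `categories` are assumptions, and it covers the 1999–2010 (ICD-10) codes alone. Categories overlap by construction, since poisonings are a subset of other (non-transport) accidents and heart disease is a subset of major cardiovascular diseases, so the function returns every matching category.

```python
# ICD-10 code ranges taken from the 1999-2010 column of Table A1.
ICD10_RANGES = {
    "Cancer": [("C00", "C97")],
    "CVD": [("I00", "I78")],
    "Heart": [("I00", "I09"), ("I11", "I11"), ("I13", "I13"), ("I20", "I51")],
    "Transport": [("V02", "V99"), ("Y85", "Y85")],
    "Other Ac": [("W00", "X59"), ("Y86", "Y86")],
    "Falls": [("W00", "W19")],
    "Drowning": [("W65", "W74")],
    "Fires": [("X00", "X09")],
    "Poison": [("X40", "X49")],
    "Suicide": [("X60", "X84"), ("Y87.0", "Y87.0")],
    "Homicide": [("X85", "Y09"), ("Y87.1", "Y87.1"),
                 ("Y35", "Y35"), ("Y89.0", "Y89.0")],
}

def _in_range(code, low, high):
    # Compare at the precision of the bound: 3 characters for "X40",
    # 5 characters for "Y87.0". Codes are a letter followed by digits of
    # fixed width, so plain string comparison orders them correctly.
    key = code[:len(low)]
    return len(key) == len(low) and low <= key <= high

def categories(code):
    """Return all Table A1 categories whose ICD-10 ranges contain the code."""
    code = code.strip().upper()
    return [name for name, ranges in ICD10_RANGES.items()
            if any(_in_range(code, low, high) for low, high in ranges)]

print(categories("X42"))    # an accidental poisoning code
print(categories("I21.9"))  # a heart disease code
```

A full implementation would also need the ICD-8 and ICD-9 columns, selected by year of death, which this sketch omits.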
Table A2
Sources of death by time period.
Source of death All Years 1976–1995 1991–2010
# % # % # %
All deaths 2,222,313 100.0 2,081,936 100.0 2,367,352 100.0
Males 1,138,011 51.2 1,097,329 52.7 1,181,266 49.9
Females 1,084,302 48.8 984,607 47.3 1,186,086 50.1
Age of death (years)
<25 86,549 3.9 97,528 4.7 74,871 3.2
25–44 127,161 5.7 126,536 6.1 134,595 5.7
45–64 414,460 18.6 399,677 19.2 418,713 17.7
65–74 451,301 20.3 472,501 22.7 437,723 18.5
≥75 1,142,362 51.4 985,091 47.3 1,301,093 55.0
Cause of death
Cardiovascular 923,419 41.6 957,085 46.0 892,420 37.7
Cancer 502,882 22.6 463,278 22.3 548,661 23.2
Other diseases 638,973 28.8 510,290 24.5 765,354 32.3
External causes 157,039 7.1 151,283 7.3 160,917 6.8
Transport accidents 48,545 2.2 50,581 2.4 45,750 1.9
Other accidents 54,044 2.4 45,419 2.2 60,350 2.5
Falls 15,225 0.7 12,721 0.6 17,217 0.7
Drowning/submersion 4181 0.2 4676 0.2 3565 0.2
Smoke/fires/flame 4229 0.2 4984 0.2 3402 0.1
Poison/noxious substance 12,090 0.5 5938 0.3 17,225 0.7
Suicide 30,755 1.4 29,391 1.4 32,171 1.4
Homicide 22,560 1.1 22,560 1.1 20,117 0.8
Note: Table shows average deaths per year for the specified age group or cause.
35 For example, per capita opioid sales more than tripled between 1999 and 2010 (Paulozzi, 2012).
36 See Ruhm (2013) for a more extensive discussion of these issues.
References

An, Ruopeng, Liu, Junyi, 2012. Local labor market fluctuations and physical activity among adults in the United States 1990–2009. In: ISRN Public Health, 2012, Article 318610, pp. 1–7.
Anderson, Robert N., Miniño, Arialdi M., Hoyert, Donna L., Rosenberg, Harry M., 2001. Comparability of cause of death between ICD-9 and ICD-10: preliminary estimates. National Vital Statistics Reports 49 (2), 1–32.
Ariizumi, Hideki, Schirle, Tammy, 2012. Are recessions really good for your health? Evidence from Canada. Social Science and Medicine 74, 1224–1231.
Arkes, Jeremy, 2007. Does the economy affect teenage substance use? Health Economics 16 (1), 19–36.
Arkes, Jeremy, 2009. How the economy affects teenage weight. Social Science and Medicine 68 (11), 1943–1947.
Bernanke, Ben, 2004. The Great Moderation, www.federalreserve.gov/boarddocs/speeches/2004/20040220/ (retrieved 6 March 2015).
Blinder, Alan S., 1973. Wage discrimination: reduced form and structural estimates. Journal of Human Resources 8 (4), 436–455.
Bradford, W. David, Lastrapes, William D., 2014. A prescription for unemployment? Recessions and the demand for mental health drugs. Health Economics 23 (11), 1301–1325.
Brenner, M. Harvey, 1979. Mortality and the national economy. The Lancet 314 (8142), 568–573.
Brenner, M. Harvey, 1971. Economic changes and heart disease mortality. American Journal of Public Health 61 (3), 606–611.
Buchmueller, Tom, Grignon, Michel, Jusot, Florence, 2007. Unemployment and mortality in France, 1982–2002. In: Center for Health Economics and Policy Analysis Working Paper 07-04. McMaster University.
Butler, J.S., 2000. Efficiency results of MLE and GMM estimation with sampling weights. Journal of Econometrics 96 (1), 25–37.
Charles, Kerwin Kofi, DeCicca, Philip, 2008. Local labor market fluctuations and health: is there a connection and for whom? Journal of Health Economics 27 (6), 1532–1550.
Colman, Gregory J., Dave, Dhaval M., 2013. Exercise, physical activity, and exertion over the business cycle. Social Science and Medicine 93 (September), 11–20.
Cotti, Chad, Tefft, Nathan, 2011. Decomposing the relationship between macroeconomic conditions and fatal car crashes during the great recession: alcohol- and non-alcohol-related accidents. B. E. Journal of Economic Analysis and Policy 11 (1), 1–48.
Cutler, David M., 2008. Are we finally winning the war on cancer? Journal of Economic Perspectives 22 (4), 3–26.
Dávlos, María E., Fang, Hai, French, Michael T., 2012. Easing the pain of an economic downturn: macroeconomic conditions and excessive alcohol consumption. Health Economics 21 (11), 1318–1335.
Economou, Athina, Nikolau, Agelike, Theodossiou, Ioannis, 2008. Are recessions harmful to health after all? Evidence from the European Union. Journal of Economic Studies 35 (5), 368–384.
Edwards, Ryan, 2011. American Time Use Over the Business Cycle. City University of New York, Mimeo.
Egan, Mark L., Mulligan, Casey B., Philipson, Tomas J., 2013. Adjusting measures of economic output for health: is the business cycle countercyclical? In: National Bureau of Economic Research Working Paper No. 19058.
Eyer, Joseph, 1977. Prosperity as a cause of death. International Journal of Health Services 7 (1), 125–150.
Fishback, Price V., Haines, Michael R., Kantor, Shawn, 2007. Births, deaths and new deal relief during the great depression. Review of Economics and Statistics 89 (1), 1–14.
Freeman, Donald G., 1999. A note on economic conditions and alcohol problems. Journal of Health Economics 18 (5), 661–670.
Galí, Jordi, van Rens, Thijs, 2010. The vanishing procyclicality of labor productivity. In: Kiel Institute Working Paper No. 1641 (August).
Gerdtham, Ulf-G., Ruhm, Christopher J., 2006. Deaths rise in good economic times: evidence from the OECD. Economics and Human Biology 43 (3), 298–316.
Gonzalez, Fidel, Quast, Troy, 2011. Macroeconomic changes and mortality in Mexico. Empirical Economics 40 (2), 305–319.
Gravelle, H.S.E., Hutchinson, G., Stern, J., 1981. Mortality and unemployment: a critique of Brenner’s time-series analysis. The Lancet 318 (8248), 675–679.
Gruber, Jonathan, Frakes, Michael, 2006. Does falling smoking lead to rising obesity? Journal of Health Economics 25 (2), 183–197.
Halliday, Timothy J., 2007. Business cycles, migration and health. Social Science and Medicine 64 (7), 1420–1424.
Heutel, Garth, Ruhm, Christopher J., 2013. Air pollution and procyclical mortality. In: National Bureau of Economic Research Working Paper No. 18959.
Lam, Jean-Paul, Piérard, Emmanuelle, 2014. The Time-Varying Relationship Between Mortality and Business Cycles in the U.S. University of Waterloo, Mimeo (June).
Lin, Shin-Jong, 2009. Economic fluctuations and health outcome: a panel analysis of Asia-Pacific countries. Applied Economics 41 (4), 519–530.
Lindo, Jason M., 2015. Aggregation and the relationship between unemployment and health. Journal of Health Economics 40 (2), 83–96.
Malloy, Raven, Smith, Christopher L., Wozniak, Abigail, 2011. Internal migration in the United States. Journal of Economic Perspectives 25 (3), 173–196, Summer.
McInerney, Melissa, Mellor, Jennifer M., 2012. Recessions and seniors’ health, health behaviors, and healthcare use: analysis of the medicare beneficiary survey. Journal of Health Economics 31 (5), 744–751.
Miller, Douglas L., Page, Marianne E., Stevens, Ann Huff, Filipski, Mateusz, 2009. Why are recessions good for your health? American Economic Review 99 (2), 122–127.
Neumayer, Eric, 2004. Recessions lower (some) mortality rates. Social Science & Medicine 58 (6), 1037–1047.
Ogburn, William F., Thomas, Dorothy S., 1922. The influence of the business cycle on certain social conditions. Journal of the American Statistical Association 18 (139), 324–340.
Oaxaca, Ronald, 1973. Male–female wage differentials in urban labor markets. International Economic Review 14 (3), 693–709.
Paulozzi, Leonard J., 2012. Prescription drug overdoses: a review. Journal of Safety Research 43 (4), 283–289.
Philipson, Tomas, Ebner, Michael, Lakdawalla, Darius N., Corral, Mitra, Conti, Rena, Goldman, Dana P., 2012. An analysis of whether higher health care spending in the United States versus Europe is ‘worth it’ in the case of cancer. Health Affairs 31 (4), 667–675.
Ruhm, Christopher J., 2000. Are recessions good for your health? Quarterly Journal of Economics 115 (2), 617–650.
Ruhm, Christopher J., 2003. Good times make you sick. Journal of Health Economics 22 (4), 637–658.
Ruhm, Christopher J., 2005. Healthy living in hard times. Journal of Health Economics 24 (2), 341–363.
Ruhm, Christopher J., 2012. Understanding the relationship between macroeconomic conditions and health. In: Jones, Andrew M. (Ed.), Elgar Companion to Health Economics, second ed. Edward Elgar, Cheltenham, UK, pp. 5–14.
Ruhm, Christopher J., 2013. Recessions, healthy no more? In: NBER Working Paper No. 19287 (August).
Ruhm, Christopher J., 2015. Drug Poisoning Deaths in the United States 1999–2012. University of Virginia, Mimeo.
Ruhm, Christopher J., Black, William E., 2002. Does drinking really decrease in bad times? Journal of Health Economics 21 (4), 659–678.
Shierholz, Heidi, 2012. Labor force participation: cyclical versus structural changes since the start of the great recession. In: Economic Policy Institute Issue Brief No. 333 (May 24).
Solon, Gary, Haider, Steven J., Wooldridge, Jeffrey, 2015. What are we weighting for? Journal of Human Resources (forthcoming).
Stevens, Ann Huff, Miller, Douglas L., Page, Marianne, Filipski, Mateusz, 2011. The best of times, the worst of times: understanding procyclical mortality. In: NBER Working Paper No. 17657.
Stock, James H., Watson, Mark W., 2003. Has the business cycle changed and why? In: Gertler, M., Rogoff, K. (Eds.), NBER Macroeconomics Annual 2002. MIT Press, Cambridge, pp. 159–218.
Sullivan, Daniel, von Wachter, Till, 2009. Job displacement and mortality: an analysis using administrative data. Quarterly Journal of Economics 124 (3), 1265–1306.
Stuckler, David, Basu, Sanjay, Suhrcke, Marc, Coutts, Adam, McKee, Martin, 2009. The public health effect of economic crisis and alternative policy responses in Europe: an empirical analysis. The Lancet 374 (9686), 315–323.
Svensson, Mikael, 2007. Do not go breaking your heart: do economic upturns really increase heart attack mortality? Social Science and Medicine 65 (4), 833–841.
Tapia Granados, José A., 2005. Recessions and mortality in Spain, 1980–1997. European Journal of Population 21 (4), 393–422.
Tapia Granados, José A., Diez Roux, Ana V., 2009. Life and death during the great depression. Proceedings of the National Academy of Sciences 106 (41), 17290–17295.
Tekin, Erdal, McClellan, Chandler, Minyard, Karen Jean, 2013. Health and health behaviors during the worst of times: evidence from the great recession. In: NBER Working Paper No. 19234.
Thomas, Dorothy Swaine, 1927. Social Aspects of the Business Cycle. Alfred A. Knopf, New York, NY.
Warner, Margaret, Chen, Hui Li, Makuc, Diane M., Anderson, Robert N., Miniño, Arialdi M., 2011. Drug poisoning deaths in the United States, 1980–2008. In: NCHS Data Brief No. 81. National Center for Health Statistics, Hyattsville, MD.
Kasl, Stanislav V., 1979. Mortality and the business cycle: some questions about Wooldridge, Jeffrey, 1999. Asymptotic properties of weighted M-estimators for vari-
research strategies when utilizing macro-social and ecological data. American able probability samples. Econometrica 67 (6), 1385–1406.
Journal of Public Health 69 (8), 784–788. Xu, Xin, 2013. The business cycle and health behaviors. Social Science and Medicine
Klebba, A. Joan, Scott, Joyce H., 1980. Estimates of selected comparability ratios based 77 (January), 126–136.
on dual coding of 1976 death certificates by the eighth and ninth revisions of Yeung, Gary Y.C., van den Berg, Gerard J., Lindeboom, Marrten, Portrait., France R.M,
the international classifications of diseases. Monthly Vital Statistics Report 28 2014. The impact of early-life economic conditions on cause-specific mortality
(11), 1–19. during adulthood. Journal of Population Economics 27 (3), 895–919.

Education and health: The role of cognitive ability☆

Govert E. Bijwaard a,b, Hans van Kippersluis c,d,e,∗, Justus Veenman c,f

a Netherlands Interdisciplinary Demographic Institute (NIDI-KNAW/University of Groningen), PO Box 11650, 2502 AR The Hague, The Netherlands
b IZA, Bonn, Germany
c Erasmus School of Economics, Erasmus University Rotterdam, PO Box 1738, 3000 DR Rotterdam, The Netherlands
d Tinbergen Institute, Rotterdam, The Netherlands
e Netspar, Tilburg, The Netherlands
f ERCOMER, Utrecht/Rotterdam, The Netherlands
Article history:
Received 17 September 2013
Received in revised form 21 July 2014

JEL classification:
C41
I14
I24

Keywords:
Education
Cognitive ability
Mortality
Structural equation model
Duration model

Abstract:
We aim to disentangle the relative impact of (i) cognitive ability and (ii) education on health and mortality using a structural equation model suggested by Conti et al. (2010). We extend their model by allowing for a duration dependent variable (mortality), and an ordinal educational variable. Data come from a Dutch cohort born between 1937 and 1941, including detailed measures of cognitive ability and family background in the final grade of primary school. The data are linked to the mortality register 1995–2011, such that we observe mortality between ages 55 and 75. The results suggest that at least half of the unconditional survival differences between educational groups are due to a ‘selection effect’, primarily on the basis of cognitive ability. Conditional survival differences across those having finished just primary school and those entering secondary education are still substantial, and amount to a 4 years gain in life expectancy, on average.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction

Disparities in health and life expectancy across educational groups are striking and pervasive, and are considered one of the most compelling and well established facts in social science research (Mazumder, 2012). Even in an egalitarian country such as the Netherlands, with a very accessible health care system, the difference in life expectancy between the university educated and those who finished only primary school is 6–7 years (CBS, 2008). It is commonly assumed that a large part of this association derives from the causal effect of education on health outcomes. An abundant list of possible mechanisms was proposed, among which occupational demands, health behavior, and the ability to process information are the most commonly mentioned (Ross and Wu, 1995; Cutler and Lleras-Muney, 2008).

Yet, the association between education and health could also stem from (i) ‘reverse causality’, in which childhood ill-health constrains educational attainment (Behrman and Rosenzweig, 2004; Case et al., 2005) and (ii) confounding ‘third factors’ such as ability, parental background and time preference that influence both education and health outcomes (Fuchs, 1982; Auld and Sidhu, 2005; Deary, 2008).

☆ Van Kippersluis gratefully acknowledges funding from the National Institute on Aging (NIA) under grant R01AG037398, from NETSPAR under the project “Income and health, work and care across the life cycle II”, and from the Netherlands Organization of Scientific Research (NWO Veni grant 016.145.082). The authors acknowledge access to linked data resources (DO 1995–2011) by Statistics Netherlands (CBS). We thank Mars Cramer and Mirjam van Praag for help in accessing the original data (see https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:39042), and are grateful to Mars Cramer, and attendants from the Empirical Health Economics conference in Munich 2013, the International Health Economics Association conference in Sydney 2013, the Multistate Event History Analysis in Hangzhou 2012, and seminar participants at the Chinese University of Hong Kong, Cornell University, Erasmus University Rotterdam, and the University of Southern California for helpful comments.
∗ Corresponding author at: Erasmus School of Economics, Erasmus University Rotterdam, PO Box 1738, 3000 DR Rotterdam, The Netherlands. Tel.: +31 10 4088837.
E-mail addresses: bijwaard@nidi.nl (G.E. Bijwaard), hvankippersluis@ese.eur.nl (H. van Kippersluis), veenman@ese.eur.nl (J. Veenman).

Studies based on natural experiments in education, such as changes in compulsory schooling laws, overcome the difficulty of
30 G.E. Bijwaard et al. / Journal of Health Economics 42 (2015) 29–43
separating the direct causal effect of education from third factor effects. The estimates based on these studies point towards a small effect (Lleras-Muney, 2005; Oreopoulos, 2006; Van Kippersluis et al., 2011; Meghir et al., 2013), or even insignificant effect of education on health and mortality (Arendt, 2005; Albouy and Lequien, 2008; Mazumder, 2008; Braakmann, 2011; Clark and Royer, 2013). This suggests that confounding factors may well play an important role in shaping the strong association between education and health.

Surprisingly little research in economics has investigated the contribution of early childhood abilities and childhood social background in shaping the association between education and health.[1] Some recent economic studies report associations between childhood cognitive and non-cognitive abilities, and health outcomes at ages 30–40 using the British Cohort Study (Murasko, 2007), the U.K. National Child Development Study (Carneiro et al., 2007), the U.S. National Longitudinal Study of Youth 1979 (Auld and Sidhu, 2005; Kaestner and Callison, 2011), or the Dutch ‘Brabant data’ (Cramer, 2012). It is established that cognitive ability and some non-cognitive factors such as self-esteem and conscientiousness are associated with health outcomes. Nonetheless, hardly anything is known about (i) the relative impact of education and childhood abilities on health outcomes, and in turn (ii) how much of the association between education and health is explained by these cognitive and non-cognitive abilities.

A notable contribution to the literature is a recent series of papers by Conti and Heckman (2010), Conti et al. (2010, 2011), and Heckman et al. (2014) who, using the British Cohort Study and the National Longitudinal Study of Youth (NLSY79), estimate a structural equation model in which the interdependence between education, health, and two latent factors capturing cognitive and non-cognitive abilities is explicitly modeled. The authors show that for most health outcomes around half of the association between education and health is driven by cognitive and non-cognitive abilities and early childhood social background. The other half is interpreted as the causal effect of education on health.

While the series of papers by Conti, Heckman and co-authors provided a significant contribution to the literature, there are two notable limitations. First, the health outcomes are measured at age 30, an age at which health differences by education may not have fully materialized. In fact, disparities in health and mortality seem to peak around middle-age (Cutler and Lleras-Muney, 2008). Secondly, the health measures are all self-reported, which may bias the estimates since education is related to subjective health perceptions (Bago d’Uva et al., 2008).

In this paper, we aim to disentangle the effects of education and cognitive ability on health outcomes. We will use the so-called ‘Brabant data’ – a representative cohort of primary school sixth graders in the Dutch province of Noord-Brabant – that has detailed information on cognitive ability and social background measured back in 1952. Three follow-up surveys in 1957, 1983 and 1993 contain information on education, employment, and self-reported health. We have linked these data to the mortality register 1995–2011, such that the impact on mortality can be analyzed.

The contribution of this paper is threefold. First, we study the relative impact of cognitive ability and education on mortality, as an objective health indicator. The second contribution is that, in contrast to existing studies that measure health outcomes at ages 30–40, we observe mortality during ages 55–75. Finally, we extend the structural equation model by Conti et al. (2010) by allowing for a duration dependent variable (mortality).[2]

The results show that for most ages, cognitive ability and family socioeconomic status explain around half of the raw differences in mortality across educational groups. Stated otherwise, education remains important in determining mortality even after controlling for cognitive ability, family socioeconomic status, and a range of other background variables. The conditional survival differences across educational groups are even remarkable, and amount to a 4-year gain in life expectancy for those entering at least secondary school compared to those that dropped out after primary school.

This paper is structured as follows. Section 2 presents the Brabant data including the available register data from Statistics Netherlands, Section 3 presents the structural equation model that we will use to disentangle the relative contributions of cognitive ability and education on health outcomes. Section 4 presents the results and Section 5 discusses them.

2. Data and descriptive statistics

The data are from a Dutch cohort born between 1937 and 1941. Very detailed information about individual intelligence, social background and school achievement is available for 5823 individuals. The survey was held in the spring and summer of 1952 among pupils of the sixth (last) grade of primary schools in the Dutch province of Noord-Brabant, and hence is referred to as the ‘Brabant data’. One-fourth of the province population was sampled; mainly by including every fourth child from the schools’ list of pupils.[3] Hartog (1989) investigated the data and found no reason to doubt representativeness. A selective dropout of pupils before participating in the data collection does not exist, as primary school was compulsory and enforcement of school attendance was strict (Dronkers, 2002).

Follow-up surveys took place in 1957, 1983 and 1993.[4] In 1957 only a sub-sample – those who scored above-average on six tests – of the original cohort was interviewed about the school careers between 1952 and 1957 to particularly investigate school career choices of the most intelligent half of the cohort. In 1983 and 1993 attempts were made to trace all initial respondents of the Brabant-cohort to investigate labour market behavior, with overall response rates of around 45 percent. The sample is reduced to 2998 individuals who have measurements in 1952 and in either 1983 or 1993, or both.[5]

[1] See Gottfredson (2004) for an overview of the epidemiological literature.
[2] Savelyev (2012) developed a similar structural equation model for mortality as ours, yet using a discrete-time hazard model and not taking into account dynamic selection of ability due to differential survivorship. His study is based on the Terman data, a cohort of individuals with IQ beyond 140. Hence, apart from differences in the model specification, his focus is on an extraordinary sample corresponding to the 99.6th percentile of the intelligence distribution, with very limited variation in cognitive ability. Not surprisingly, he examines the effect of higher education whereas we focus on secondary education.
[3] Some schools had school years beginning in April rather than in September. For these schools, half the pupils of half the schools were included in the sample, which yielded 369 observations on a total of 5823 (Hartog, 1989).
[4] Mathijssen and Sonnemans (1958), Hartog and Pfann (1985), Van Praag (1992), and Hartog et al. (2002). The complete questionnaire is included in Van Praag (1992) ‘Brabantse zesdeklassers, 1952–2010’.
[5] In Section 4.2 it is verified that selective attrition does not affect our results.

The Brabant data are subsequently linked to administrative records from Statistics Netherlands. The basis for this linkage is identifying information on ZIP code, date of birth, and sex, provided in 1993 by Dutch municipalities, which includes information on all individuals living in the Netherlands. The administrative records are available since 1995. Because of the two-year discrepancy only 86 percent of the 2998 individuals could be traced in the municipality register in 1995, leaving us with a working sample of 2579 individuals. Administrative records include the mortality register and the municipality register for the years 1995–2011 inclusive. The mortality register is used to identify drop out due to death in
the follow-up period. Demographics are obtained from the municipality register.

2.1. Dependent variables

Our outcome variable is Mortality, which is identified from the mortality register in the period 1995–2011. Given that most pupils are born around 1940, this implies that we follow mortality from age 55 until 75.[6] In our sample, 409 individuals, or 16 percent, died during the period 1995–2011. Close to 50 percent died from cancer, 25 percent from cardiovascular diseases, and 8 percent from respiratory diseases such as COPD and pneumonia. External causes such as accidents comprise only two percent, as do mental disorders (e.g. dementia), diseases of the digestive system (e.g. liver cirrhosis) and diseases of the nervous system (e.g. Parkinson).

2.2. Independent variables

Our main independent variable of interest is Education, here defined as the highest level of education attended, in three categories: (1) Lower Education, including those who attended at most (extended)[7] primary school, (2) Lower Vocational Education, including those who attended at most lower vocational education such as the lower agricultural school or lower polytechnic schools, and (3) At least General Secondary School, including those who attended lower general secondary school, higher general secondary school, and higher vocational education or university. Education is retrieved mainly from the 1983 and 1993-survey variables on the highest level of education attended. The maximum of the two defines Education, and where missing we update our educational variable with information from the 1957 survey.

Table 1 presents descriptive statistics and shows that 14 percent did not continue school after primary school forming the Lower Education category, 35 percent only attended Lower Vocational Education, and the other 51 percent attended At least General Secondary School. Fig. 1 shows the Kaplan–Meier survival curves for a binary indicator of education with threshold at Lower Education, and separately for the three education categories. It is clear that the largest survival differences are between those with only primary school and those above primary school, and that the difference grows with age to around ten percentage points near age 75.

Our second independent variable is Cognitive Ability. In the Brabant data there are two measurements for cognitive ability, both measured in the final grade of primary school (i.e. around age 12): (i) the Raven Progressive Matrices Test, and (ii) a Vocabulary test (picking synonyms).[8] The timing of the intelligence tests implies that the plausible feedback effects from education to cognitive ability (Deary and Johnson, 2010; Brinch and Galloway, 2012; Meghir et al., 2013) will be seen as an education effect, while the impact of the cognitive ability endowment in the final grade of primary school on educational choice and later-life mortality is seen as a selection effect.[9]

The IQ p.m. (‘progressive matrices’) test focuses on mathematical ability and is a replication of the British Progressive Matrices test, designed by Raven (1958). It is considered to be a ‘pure’ measurement of problem solving abilities, as it does not require any linguistic or general knowledge (Dronkers, 2002). Hence, the Raven test is supposed to measure fluid or analytic intelligence (Carpenter et al., 1990). In this sense, the test can be compared to Spearman’s g test (1927). The term g refers to the determinants of the common variance within intelligence tests, being the core issue of intelligence measurement (Carpenter et al., 1990).

Table 1 shows that the ability test designed by Raven has an average of 102, with standard deviation of 13 while the vocabulary test is 101, on average, with standard deviation 13. The correlation between the Raven test and the vocabulary test is 0.38. This suggests that while there seems to be some overlap between the two measurements, the tests additionally gauge some idiosyncratic part of cognitive ability. Therefore, we will use both measurements to build a comprehensive latent factor of cognitive ability. In a robustness check we solely use the Raven test to see whether the results differ.

2.3. Control variables

Apart from a fairly standard set of demographic control variables such as Age, whether Male, and Birth Rank, we also have information about the social and school environment of the individuals. Most of these variables are reported by the School principal. Family Socioeconomic Status is measured in three categories from lowest to highest depending on father’s occupation.[10] We additionally know whether the child had to work in the parent’s farm or company, defining the binary indicator Child Works, which potentially signals part of the childhood health status. In this (historical) case, however, the variable is mainly dependent on the parents having a firm.

Available information regarding the school includes School Type and the Number of Teachers. Repeat defines the number of classes that children had to repeat. Further, we know the Teacher’s Advice regarding further education of the child, and the Preference of the Parents concerning the education of the pupil, categories of which are defined in Table 1, which also includes descriptive statistics.

[6] Of the Dutch population 1940 cohort, only 6.8 percent died between the ages of 12 and 55 – Human Mortality Database, University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Available at www.mortality.org or www.humanmortality.de (data downloaded on July 30, 2012).
[7] At the time, pupils had to stay in school for at least 8 years, or until they reached the age of 14. Since regular primary school only consisted of 6 grades, some schools offered an additional 2-year extended primary school (“vglo”).
[8] The data also contain the so-called LO-IV test, which consists of six sub-tests: regularities in series of numbers, analogies in figures, analogies in words, and similarities between concepts (equal, not-equal, cause). Since the quality of this test has been questioned (Hartog et al., 2002, p. 5) we will not use it in our analyses. There is also information on grades for specific courses (Dutch language, mathematics (arithmetics), history, physics, geography, health sciences, and traffic), but since these are not clean measures of cognitive ability and are relative to others in one’s classroom, we choose not to use these grades.
[9] It should be emphasized however that there could be unobserved factors correlated to both cognitive ability in the final grade of primary school and later-life mortality, which our measure of cognitive ability would be picking up.
[10] We classify lower administrative, agricultural, industrial, and other lower workers, and the disabled into the Lowest Socioeconomic Status. If the School Principal considered the family antisocial, the family is also classified into the Lowest Socioeconomic Status. Intermediary personnel, self-employed farmers, self-employed craftsmen, and the retired are categorized into the Intermediate Socioeconomic Status (following Cramer, 2012). Teachers, executives and academics are classified into the Highest Socioeconomic Status. In case father’s occupation is missing, we use father’s education for individuals in the 1957 survey. Father’s education is classified into 3 levels, which we directly translate into the three socioeconomic statuses. We use mother’s education in case the father died or was not present in the household.

We have no information about childhood health status, which prevents us from investigating the possibility of reverse causality from health to education in our sample. The sample is comprised of pupils who made it to the final grade of primary school. Hence, pupils with severe health problems impairing going to school in the first place will not be present in our sample. Moreover, in the 1983 wave of the survey male respondents were asked whether they served in the military. The main reason for disqualification of
Table 1
Descriptive statistics of the Brabant data sample.

Variable                                    Average   Standard deviation   Number of observations

Dependent variables
  Mortality                                 0.16      0.35                 2579

Independent variables
  Education
    Lower education                         0.14      0.34                 2537
    Lower vocational education              0.34      0.48                 2537
    At least general secondary school       0.51      0.35                 2537
  Raven p.m. test                           102.04    13.28                2579
  Vocabulary test                           101.42    12.87                2579

Control variables
  Male                                      0.58      0.49                 2579
  Birth rank                                2.50      2.55                 2412
  Family socioeconomic status
    Lowest                                  0.53      0.50                 2409
    Middle                                  0.44      0.50                 2409
    Highest                                 0.03      0.16                 2409
  Child works                               0.28      0.45                 2256
  School religion
    Roman-Catholic                          0.76      0.43                 2518
    Protestant                              0.19      0.40                 2518
    Special                                 0.03      0.17                 2518
    Public                                  0.02      0.13                 2518
  Number of teachers                        6.92      2.47                 2452
  Repeat
    No repetition of grade                  0.64      0.48                 2462
    Repeated once                           0.27      0.45                 2462
    Repeated twice or more                  0.09      0.28                 2462
  Teacher’s advice
    Continue primary school                 0.24      0.43                 2429
    Lower vocational education              0.38      0.48                 2429
    Lower secondary education               0.24      0.43                 2429
    Higher secondary education              0.14      0.20                 2429
  Preference of the parents
    Work in family company                  0.13      0.33                 2200
    Paid work without vocational education  0.20      0.28                 2200
    Paid work with vocational education     0.27      0.44                 2200
    General secondary education             0.41      0.49                 2200

Notes: Author’s calculations on the basis of the Brabant data linked to the municipality register and the mortality register.
compulsory military duty is health problems.[11] Since the fraction of individuals having served in the military is almost identical across educational levels, this provides some indirect evidence that health differences across educational levels were minimal during teenage years. We furthermore refer to Conti et al. (2010) who showed that in their sample childhood health, as measured by childhood height, was not an important determinant of educational choice. The lack of information on childhood health should therefore not be a major source of concern.

[11] Other reasons were exemption owing to one’s brother’s service, grounds of conscience, or personal indispensability (e.g. Van Schellen and Nieuwbeerta, 2007).

3. Methodology

Our empirical approach is an extension of the structural equation framework developed by Conti et al. (2010). We briefly describe the Conti et al. model, after which we will present our two extensions: allowing for an objective duration dependent variable (mortality), and introducing an ordinal educational choice. Finally, we explain how we disentangle the effects of cognitive ability and education on the health outcomes.

3.1. Basic structural equation model

The basic Conti et al. model allows a way of modeling the interrelationships between abilities, education and health outcomes, where individuals potentially make their educational decisions depending on the perceived health gains. Hence, the educational choice is endogenous, and in practice it is assumed that selection into schooling can be fully accounted for by using observed characteristics and unobserved ability. The model consists of three parts: (i) a binary educational choice depending on latent abilities and other covariates, (ii) potential outcomes depending on the choice of education, latent abilities, and other covariates, and (iii) a measurement system for the latent abilities.

The binary indicator for education $D_i$ is defined as 1 if individual $i$ took any education beyond the compulsory schooling age, and 0 if not:

\[ D_i = \begin{cases} 1 & \text{if } D_i^* \geq 0 \\ 0 & \text{otherwise} \end{cases} \qquad (1) \]

where we assume $D_i^*$ is an underlying latent utility which is continuous and linear, and depends on latent abilities $\theta$, and observed characteristics $X^D$:

\[ D_i^* = \gamma X_i^D + \alpha_D \theta_i + \nu_i^D \qquad (2) \]

with $\nu^D$ being an error term independent of $X^D$ and $\theta$. We assume that $\nu^D$ is normally distributed, which implies that we have a probit model for the educational choice. We fix the variance at 1 since the variance is not identified in a probit model.

The second part is the potential outcomes part, in which there are two potential outcomes $Y_{i1}$ and $Y_{i0}$, where the former is the outcome in case the individual chose to pursue education beyond what is compulsory, and the latter is the outcome in case the individual dropped out of school right after the compulsory
Fig. 1. Kaplan–Meier survival function by education level in two categories (top) and three categories (bottom).

schooling age. Both $Y_{i1}$ and $Y_{i0}$ depend on latent ability $\theta$, and on observed characteristics $X^Y$:

\[ Y_{i1} = \beta_1 X_i^Y + \alpha_1 \theta_i + \nu_{i1} \qquad (3) \]

\[ Y_{i0} = \beta_0 X_i^Y + \alpha_0 \theta_i + \nu_{i0} \qquad (4) \]

with $(\nu_0, \nu_1)$ independent of $X^Y$ and $\theta$, independent of $\nu_i^D$ conditional on $X^Y$ and $\theta$, and both follow a normal distribution with variance $\sigma_1^2$ and $\sigma_0^2$, respectively.

The final part of the model is the measurement equation, where one or two measurements, $M_{ik}$ ($k = 1, 2$), implicitly define the latent ability $\theta$:

\[ M_{ik} = \delta_k X_i^M + \alpha_{M_k} \theta_i + \nu_i^{M_k} \qquad (5) \]

with $\nu^{M_k}$ independent of $X^M$ and $\theta$. We assume that $\nu^{M_k}$ is normally distributed with variance $\sigma_{M_k}^2$.

3.2. Allowing for a duration outcome as dependent variable

While the basic model is useful in disentangling the relative contributions of education and abilities on continuous and binary health outcomes, it does not allow for a duration outcome like survival till death.

In our extended model, the first part is the same, defining a binary educational choice as in (1) and (2), placing the cut-off at Lower Education (primary school). Hence, in our model individuals face the choice of quitting after primary education ($D = 0$), or enrolling into secondary education ($D = 1$). The measurement equation for latent ability is defined by (5), where we have two measurements for latent cognitive ability.

It is common practice to define the potential outcomes of a duration variable like mortality in terms of the hazard that the outcome of interest occurs.[12] We define $\lambda^{(1)}(t)$ as the hazard rate for an individual with education level beyond primary school ($D_i = 1$), and $\lambda^{(0)}(t)$ as the hazard rate for an individual with an education level equal to primary school ($D_i = 0$). We assume a Gompertz proportional hazard model for the two potential hazards, which has been shown to be an accurate representation of mortality between the ages of 30 and 80 (e.g. Gavrilov and Gavrilova, 1991; Cramer, 2012). Both potential hazards depend on the latent ability $\theta$,[13] and observed characteristics $X^Y$:

\[ \lambda^{(0)}(t \mid X^Y, \theta) = \exp\left(a_0 t + \beta_0 X_i^Y + \alpha_0 \theta_i\right) \qquad (6) \]

\[ \lambda^{(1)}(t \mid X^Y, \theta) = \exp\left(a_1 t + \beta_1 X_i^Y + \alpha_1 \theta_i\right) \qquad (7) \]

[12] We can use a duration model with potential outcomes because the endogenous education choice is determined before mortality plays a major role: mortality can be largely ignored for young ages. If the education choice would still play a role during higher mortality rates the model for educational choice should take selective survival effects into account. Then a ‘timing-of-events’ model could be a better model, see Abbring and van den Berg (2003).
[13] The latent ability in the hazard is similar to including unobserved heterogeneity in the hazard, and for identification the unobserved heterogeneity needs to have a finite mean. The mean of the unobserved heterogeneity term in our model, $e^{\alpha\theta}$, only depends on $\alpha$ and is finite when $\alpha$ is finite.
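As an illustration of how the three parts of the model fit together, the following minimal sketch simulates one individual from Eqs. (1)–(7) with a single scalar covariate. It is not the estimation code used in this paper: the function name simulate_individual, the dictionary PARAMS and all parameter values are hypothetical, and drawing the Gompertz duration by inverting its survival function is an assumption made for the sketch (it requires a positive Gompertz slope).

```python
import math
import random

# Hypothetical parameter values, chosen only for illustration.
# Index 0 refers to D = 0 (primary school only), index 1 to D = 1.
PARAMS = {
    "gamma": 0.0, "alpha_D": 0.8,              # educational choice, Eq. (2)
    "delta": [0.0, 0.0], "alpha_M": [1.0, 0.7],  # measurements, Eq. (5)
    "sigma_M": [0.5, 0.8],
    "a": [0.09, 0.09], "beta": [-4.5, -5.0],   # Gompertz hazards, Eqs. (6)-(7)
    "alpha": [-0.2, -0.1],
}


def simulate_individual(rng, x, params):
    """Draw (D, [M1, M2], T) for one individual with scalar covariate x."""
    theta = rng.gauss(0.0, 1.0)  # latent ability, standard normal

    # Part (i): probit educational choice, Eqs. (1)-(2), error variance 1.
    d_star = params["gamma"] * x + params["alpha_D"] * theta + rng.gauss(0.0, 1.0)
    d = 1 if d_star >= 0 else 0

    # Part (iii): two noisy measurements of the latent ability, Eq. (5).
    m = [params["delta"][k] * x + params["alpha_M"][k] * theta
         + rng.gauss(0.0, params["sigma_M"][k]) for k in range(2)]

    # Part (ii): Gompertz hazard of the chosen level, Eqs. (6)-(7).
    # Solving S(t) = u for t gives t = log(1 - a*log(u)/exp(c)) / a, a > 0.
    a = params["a"][d]
    c = params["beta"][d] * x + params["alpha"][d] * theta
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
    t = math.log(1.0 - a * math.log(u) / math.exp(c)) / a
    return d, m, t


rng = random.Random(0)
sample = [simulate_individual(rng, 1.0, PARAMS) for _ in range(2000)]
share_educated = sum(d for d, _, _ in sample) / len(sample)
print(f"share with D = 1: {share_educated:.2f}")
```

Because the same draw of the latent ability enters the choice, the measurements and the hazard, the simulated measurements are on average higher among those with D = 1, which is the selection channel the model is designed to separate from the effect of education.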
The effect of latent ability on the hazard is captured by $\alpha_0$ and $\alpha_1$. The corresponding potential survival rates are

\[ S^{(0)}(t \mid X^Y, \theta) = \exp\left(-\int_0^t \lambda^{(0)}(s \mid X^Y, \theta)\,ds\right) \qquad (8) \]

\[ S^{(1)}(t \mid X^Y, \theta) = \exp\left(-\int_0^t \lambda^{(1)}(s \mid X^Y, \theta)\,ds\right) \qquad (9) \]

with $h(\theta)$ is a normal distribution with variance $\sigma_\theta^2 = 1$. The maximum likelihood estimation of the parameters involves the calculation of an integral that does not have an analytical solution. However, Gaussian quadrature can approximate this one dimensional integral very well. Hence, we estimate the parameters using maximum likelihood on the basis of Gaussian quadrature approximation.[15]
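The one-dimensional integral over the latent ability can be illustrated with a fixed 5-point Gauss-Hermite rule for a standard normal variable. This is only a sketch under stated assumptions: the number of nodes, the function names and the parameter values in the usage line are hypothetical and not taken from this paper, and the closed-form Gompertz survival function used here follows from integrating the hazard exp(a s + c) over s from 0 to t for a nonzero slope a.

```python
import math

# 5-point Gauss-Hermite rule in probabilists' form: the weights sum to one
# and the rule integrates against the N(0, 1) density.
_R10 = math.sqrt(10.0)
_X1 = math.sqrt(5.0 - _R10)
_X2 = math.sqrt(5.0 + _R10)
NODES = [0.0, _X1, -_X1, _X2, -_X2]
WEIGHTS = [8.0 / 15.0,
           (7.0 + 2.0 * _R10) / 60.0, (7.0 + 2.0 * _R10) / 60.0,
           (7.0 - 2.0 * _R10) / 60.0, (7.0 - 2.0 * _R10) / 60.0]


def expect_std_normal(f):
    """Approximate E[f(theta)] for theta ~ N(0, 1) by quadrature."""
    return sum(w * f(x) for x, w in zip(NODES, WEIGHTS))


def gompertz_survival(t, a, xb, alpha, theta):
    """S(t | X, theta) for the hazard exp(a*t + xb + alpha*theta), a != 0.

    xb stands for the linear index beta * X (a hypothetical shorthand).
    """
    cum_hazard = math.exp(xb + alpha * theta) * (math.exp(a * t) - 1.0) / a
    return math.exp(-cum_hazard)


def marginal_survival(t, a, xb, alpha):
    """Survival probability with the latent ability integrated out."""
    return expect_std_normal(lambda th: gompertz_survival(t, a, xb, alpha, th))


# Hypothetical values: survival 10 and 20 time units after the start.
print(marginal_survival(10.0, 0.09, -4.5, 0.3),
      marginal_survival(20.0, 0.09, -4.5, 0.3))
```

The same helper can be applied to exp(alpha * theta), whose exact mean under a standard normal latent ability is exp(alpha^2 / 2); with five nodes the approximation is already very close for moderate values of alpha.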
Without additional restrictions on the distribution of the latent factors the model is not identified. However, because we have an intrinsically non-linear duration outcome instead of a linear outcome, the Ledermann bound on the number of measurements compared to the number of latent factors does not apply. Identification of our model is closely related to the identification in a mixed proportional hazard (MPH) model, where we assume that the unobserved heterogeneity has a log-normal distribution. A MPH model is identified when the unobserved heterogeneity term has finite mean and is independent of the other observed factors (Elbers and Ridder, 1982). When we assume a normal distribution for the latent ability, $\theta \sim N(0, \sigma_\theta^2)$, the implied unobserved heterogeneity in the hazard (Eqs. (6) and (7)) has mean $\exp\left(\tfrac{1}{2}\alpha_j^2 \sigma_\theta^2\right)$, for $j = 0, 1$. For identification $\alpha_j$ or $\sigma_\theta^2$ needs to be fixed. We choose to fix $\sigma_\theta^2 = 1$. Thus the latent ability follows a standard normal distribution.[14]

An important feature of duration data is that for some individuals we only know that he or she survived up to a certain time (often the end of the observation window). In this case an individual is (right) censored, $\delta_i = 0$, and we use the survival function instead of the hazard in the likelihood function. Another feature of duration data is that only individuals are observed having survived up to a certain age. In our case, mortality follow-up is only available from age 55 onwards. In this case the individuals are left-truncated, and

3.3. Allowing for an ordered discrete educational choice

Usually education is available in more than two categories with a natural ordering of the alternative education levels. As a robustness check (see Section 4.2), we extend the standard model to account for this type of ordinal independent variable, where the starting point is, again, an index model with a single latent variable given as in (2). Assume there are $K$ education levels and define $D_i$ as the indicator of education that takes value $k$ if the individual has reached education level $k$:

\[ D_i = k \quad \text{if } \kappa_{k-1} < D_i^* \leq \kappa_k \qquad (13) \]

where $\kappa_0 = -\infty$ and $\kappa_K = \infty$. Then, assuming normally distributed $\nu^D$, we have an ordered probit model with $(K - 1)$ additional threshold parameters, $\kappa_k$. Each education level now has a corresponding potential Gompertz hazard $\lambda^{(k)}$, that depends on exogenous characteristics $X^Y$ and on the unobserved latent ability, $\theta$, i.e.,

\[ \lambda^{(k)}(t \mid X^Y, \theta) = \exp\left(a_k t + \beta_k X_i^Y + \alpha_k \theta_i\right) \qquad (14) \]

3.4. Disentangling the effects of ability and education

At the individual level, the main estimate of interest is the survival difference across the two educational levels, $S^{(1)}(t) - S^{(0)}(t)$, where $S^{(1)}(t)$ denotes the survival time up to age $t$ for individuals
we need to condition on survival up to the age of first observation, with at least secondary education (D = 1), and S(0) (t) is the survival
t0 . time up to age t for those with primary school only (D = 0). We
The likelihood contribution of individual i in our duration are interested in the expected value of this identity for a given
model is (sub)population. In the sample, the difference in the Kaplan–Meier
survival curves is the unconditional survival difference between the
Li = (j) (t)i S (j) (t)/S (j) (t0 ),
(j)
j = 0, 1 (10) two levels of educational attainment, E[S(1) (t) − S(0) (t)]. This uncon-
ditional difference can be interpreted as the association between
With left-truncated data the distribution of latent ability among
education and mortality.
the survivors (up to the left-truncation time) changes. When
Here we are interested to what extent this association is driven
only individuals are observed that have survived until age t0 the
by cognitive ability and other control variables. Using the estimated
likelihood contribution is
parameters, we define the conditional survival difference between
the two levels of educational attainment, where conditioning is

i
Di
Li = ˚ XiD + ˛D · (1) (t|X Y , ) S (1) (t|X Y , )/S (1) (t0 |X Y , ) based on cognitive ability and the other control variables, as fol-
lows:

1−Di
× ˚ −XiD − ˛D · (0) (t|X Y , )
i
S (0) (t|X Y , )/S (0) (t0 |X Y , )
E S (1) (t) − S (0) (t)|X = x, = c dF X, (x, c) (15)

2 M − ık XiM − ˛Mk

1 ik
× dH(|T > t0 ) (11) where X are the covariates, and is the value of latent cognitive
Mk Mk
k=1
ability. We integrate over the joint distribution of the covariates
with the distribution of the latent abilities conditional on survival

up to t0
D

˚ Xi + ˛D S (1) (t0 |X Y , ) + ˚ −XiD − ˛D S (0) (t0 |X Y , ) h()
dH(|T > t0 ) =

(12)
D D
˚ Xi + ˛D S (1) (t0 |X Y , ) + ˚ −Xi − ˛D S (0) (t0 |X Y , ) h() d
14 In principle restricting the distribution of $\theta$ to a normal distribution is not necessary. In line with the literature on MPH models a discrete distribution with finite points of support would be an alternative choice (Heckman and Singer, 1984). However, using a normal distribution assumes a continuum of ability values rather than a finite number, and the distribution of intelligence is generally found to be close to normal (Gottfredson, 1997).
15 Gaussian quadrature is a numerical integration method based on Hermite polynomials (Press et al., 1993). It provides an efficient approximation for evaluating indefinite integrals based on normal distributions (Butler and Moffitt, 1982). A similar method has been applied before in survival analysis (Lillard, 1993).
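
The Gaussian quadrature step used in the estimation can be illustrated with a minimal sketch. It assumes a standard normal latent ability and uses NumPy's Gauss-Hermite nodes; the function name `expect_std_normal`, the number of nodes, and the value of `alpha` are illustrative choices, not taken from the paper. As a check, it reproduces the mean of the heterogeneity term, exp(α²/2) when the variance is fixed at 1.

```python
import numpy as np

def expect_std_normal(g, n_nodes=20):
    """Approximate E[g(theta)] for theta ~ N(0, 1) by Gauss-Hermite quadrature."""
    # hermgauss gives nodes and weights for the weight function exp(-x^2)
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # the substitution theta = sqrt(2) * x turns exp(-x^2) into the N(0, 1) kernel
    return np.sum(w * g(np.sqrt(2.0) * x)) / np.sqrt(np.pi)

alpha = -0.33  # illustrative loading of latent ability in the hazard
approx = expect_std_normal(lambda theta: np.exp(alpha * theta))
exact = np.exp(0.5 * alpha ** 2)  # closed-form mean of exp(alpha * theta)
print(approx, exact)
```

The same routine can wrap any integrand in the latent ability, such as a likelihood contribution evaluated at each quadrature node.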
and latent ability, $F_{X,\theta}(x, c)$.16 Note that these conditional survival differences are conditional on surviving to the initial age, which is 55 in our case.

Unfortunately, the integrals cannot be solved analytically, as the dimension of the covariates X is too large. Another issue is that the comparison of the survival functions involves the counterfactual of surviving with another education level. Hence in order to illustrate the conditional survival differences we resort to simulation.17 For each education level we simulate the survival of 10,000 individuals. To each individual we assign observed characteristics based on the empirical distribution in the sample. The simulation procedure consists of four steps:

1. Draw a vector of parameter estimates assuming that the estimator is normally distributed around the point estimates with a variance-covariance matrix equal to the estimated one.
2. Compute the conditional hazard rates based on these parameter values and individual characteristics using (6) and (7), conditional on the value of the latent ability.
3. Determine the unconditional survival function for every individual and for the whole age-range from 55 to 100 on the basis of equations (8) and (9), and by integrating out the latent ability through Gaussian quadrature methods.
4. Calculate the average (over the 10,000 individuals) survival at each age (with steps of a month).

We repeat these steps 100 times to obtain 100 independent observations of the survival function for each education level.

With this information, we can compute the fraction of individuals who are still alive at a certain age for the two educational groups (both the average and the variance). This defines the conditional survival difference between the two educational groups, since we condition on cognitive ability and the other covariates. The simulations also allow us to compute life expectancy (and its standard error) separately for the two educational groups, by multiplying the survival function by the age steps and summing over ages.

In order to illustrate the relative importance of education and cognitive ability, we decompose the unconditional survival differences from the Kaplan–Meier curves in Fig. 1 into the conditional survival difference and a residual, which is a selection effect on the basis of cognitive ability and the other observable factors. Mathematically,

\[ E\left[ S^{(1)}(t) - S^{(0)}(t) \right] = \int E\left[ S^{(1)}(t) - S^{(0)}(t) \mid X = x, \theta = c \right] dF_{X,\theta}(x, c) + \varepsilon(X, \theta) \tag{16} \]

where the LHS represents the unconditional survival differences represented by the Kaplan–Meier survival curve, the first part of the RHS is the conditional survival difference defined in (15), and $\varepsilon(X, \theta)$ represents the selection effect on the basis of observable characteristics X and cognitive ability $\theta$. Note that this selection effect is the combination of actual selection bias and selection based on perceived gains of secondary education.

For the ordinal education measure the procedure is very similar. We have three potential hazards and three possible survival functions, one corresponding to each educational level. Although there are more possibilities now to compare the educational groups, we choose to focus on two binary comparisons of the particular educational level to the educational level directly preceding it. Hence, we estimate two different conditional survival differences: (i) lower vocational education compared to primary education only and (ii) at least general secondary education compared to lower vocational education.

4. Results

Our baseline specification is the survival model with a binary education variable and two measurements for cognitive ability. We estimate the model by maximizing the likelihood in (11), and present the results in Section 4.1. Exogenous factors influencing the outcome, $X^Y$ in (6) and (7), include male, whether the child is working, family socioeconomic status, and birth rank. Factors additionally influencing the measurements of cognitive ability, $X^M$ in (5), include school type and the number of teachers at school. Finally, on top of the exogenous variables affecting the outcome and intelligence, additional factors influencing the educational choice, $X^D$ in (2), include the teacher’s advice, whether a grade was repeated, and the preference of the parents.

4.1. Main results

Table 2 contains the parameter estimates of the model. The first column shows that our latent factor of cognitive ability strongly influences the educational choice, as expected. The probability of entering secondary school can be derived from the impact of the latent factor and is already beyond 0.6 for those with the lowest cognitive abilities, and gradually increases towards 1.0 for those with the highest cognitive abilities (see Fig. 2).

Conditional on other observed characteristics such as parental preference, teacher’s advice, and family socioeconomic status, males were less likely to enter secondary school, as were children who had to work in the family business during primary school. Family socioeconomic status is a strong predictor of education, with children from families with a higher socioeconomic status significantly more likely to enter secondary school. Children who went to Protestant or other schools, as compared to those who went to Catholic schools, were more likely to enter secondary school. Strong predictors of educational choice are the teacher’s advice and the preference of the parents. Children who repeated one or more grades were less likely to enter secondary school.

Columns 2 and 3 show that on both measurements of cognitive ability girls did slightly better, and children from families with a higher socioeconomic status had higher scores. School characteristics such as the school type and the number of teachers also relate to the test scores.

The final two columns of the table present the determinants of mortality across the two educational groups. While the point estimates of the effect of cognitive ability on mortality are negative as expected, the effects do not reach statistical significance at the 10 percent level, although the p-values are close to the 10 percent cut-off. The point estimates suggest that the effect of a one-standard deviation increase in cognitive ability is to reduce the mortality hazard by 18% and 28%, for those with primary school only and those with at least secondary education, respectively. These results are comparable to both the results presented in the review by Batty et al. (2007) for the Scottish Mental survey

16 Since the conditional survival differences may well be very different for individuals in different parts of the education distribution, we additionally define the conditional survival difference for those with D = j, j = 0, 1 as follows:
\[ \int E\left[ S^{(1)}(t) - S^{(0)}(t) \mid X = x, \theta = c, D = j \right] dF_{X,\theta \mid D = j}(x, c) \]
17 Cockx and Picchio (2012) use a similar simulation procedure to obtain predictions and the standard errors for non-linear combinations of the parameters. Elbers et al. (2003) used a related method to account for counterfactual uncertainty.
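
The four simulation steps of Section 3.4 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a single baseline covariate profile instead of 10,000 simulated individuals, draws the parameters independently from their standard errors because the full variance-covariance matrix is not reported, and weights the latent ability by survival to age 55 only. The function names are hypothetical; the parameter values are the Gompertz slope `a`, constant `c`, and ability loading `alpha` of the two hazard columns of Table 2.

```python
import numpy as np

T0, T_MAX, STEP = 55.0, 100.0, 1.0 / 12.0        # ages in years, monthly steps
AGES = np.arange(T0, T_MAX + STEP, STEP)
NODES, WEIGHTS = np.polynomial.hermite.hermgauss(20)
THETA = np.sqrt(2.0) * NODES                     # quadrature points for theta ~ N(0, 1)
PROB = WEIGHTS / np.sqrt(np.pi)                  # matching probabilities (sum to 1)

def survival_curve(a, c, alpha):
    """Survival from T0 to each age in AGES, with latent ability integrated out."""
    # Gompertz cumulative hazard on [0, t] for every quadrature point (steps 2 and 3)
    scale = np.exp(c + alpha * THETA)[:, None]
    cum_hazard = scale * (np.exp(a * AGES)[None, :] - 1.0) / a
    surv = np.exp(-cum_hazard)                   # S(t | theta)
    # condition on survival to T0: E[S(t | theta)] / E[S(T0 | theta)]
    marginal = PROB @ surv
    return marginal / marginal[0]

def life_expectancy(surv):
    """T0 plus the survival curve multiplied by the age step and summed."""
    return T0 + np.sum(surv[1:]) * STEP

def simulate(point, se, n_draws=100, seed=0):
    """Repeat the parameter draw (step 1) and return one survival curve per draw."""
    rng = np.random.default_rng(seed)
    curves = []
    for _ in range(n_draws):
        # assumption: independent normal draws instead of the full covariance matrix
        a, c, alpha = rng.normal(point, se)
        curves.append(survival_curve(a, c, alpha))
    return np.array(curves)

# point estimates (a, c, alpha) and standard errors from the hazard columns of Table 2
primary = simulate((0.11, -11.68, -0.33), (0.02, 1.50, 0.27))
secondary = simulate((0.09, -10.71, -0.20), (0.01, 0.80, 0.13))
difference = secondary.mean(axis=0) - primary.mean(axis=0)   # by age, baseline profile
```

With the covariates of the sample and the estimated covariance matrix plugged in, the mean and spread of `difference` across draws would correspond to the conditional survival difference and its confidence band.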
Table 2
Duration model – binary education variable, two measurements for ability.

Outcome                             Education          Raven Test         Vocabulary Test    Hazard             Hazard
                                    D                  M1                 M2                 λ(0)               λ(1)
Cognitive ability
  α                                 0.36*** (0.09)     9.63*** (1.49)     10.37*** (1.58)    −0.33 (0.27)       −0.20 (0.13)
Constant term
  c                                 2.13*** (0.21)     3.59*** (0.74)     4.69*** (0.69)     −11.68*** (1.50)   −10.71*** (0.80)
  a                                                                                          0.11*** (0.02)     0.09*** (0.01)
Control variables
  Male                              −0.25*** (0.08)    −0.93* (0.53)      −0.87* (0.49)      0.33 (0.25)        0.66*** (0.12)
Child is working – base is “No”
  Yes                               −0.29*** (0.09)    −3.84*** (0.63)    −7.14*** (0.58)    0.34 (0.26)        0.15 (0.14)
  Missing                           −0.32*** (0.14)    −1.07 (0.90)       −2.52*** (0.83)    −0.87 (0.57)       0.12 (0.19)
Family socioeconomic status – base is “Low”
  Middle                            0.42*** (0.10)     2.55*** (0.54)     2.24*** (0.50)     −0.34 (0.30)       0.00 (0.12)
  High                              0.42 (0.46)        4.16*** (1.64)     4.68*** (1.52)     −0.34 (0.30)       0.44 (0.29)
  Missing                           −0.54*** (0.18)    −4.39*** (1.30)    −7.63*** (1.20)    −0.67 (0.59)       0.17 (0.30)
Birthrank – base is “First”
  Second                            −0.15 (0.12)       0.53 (0.79)        −0.02 (0.73)       −0.17 (0.38)       −0.00 (0.16)
  Third or Fourth                   −0.09 (0.11)       −0.22 (0.73)       −2.70*** (0.68)    0.09 (0.35)        −0.19 (0.16)
  Fifth or higher                   −0.09 (0.11)       −3.02*** (0.73)    −4.52*** (0.68)    0.09 (0.35)        −0.26* (0.16)
  Missing                           0.11 (0.31)        −0.63 (1.47)       0.47 (1.36)        1.13* (0.62)       −0.65* (0.34)
School religion – base is “Catholic”
  Protestant                        0.31*** (0.11)     0.62 (0.68)        2.59*** (0.63)
  Other                             0.42** (0.20)      5.19*** (1.13)     7.32*** (1.04)
Number of teachers – base is “5–8 teachers”
  ≤4                                −0.16 (0.10)       −3.81*** (0.73)    −3.16*** (0.67)
  9–12                              0.05 (0.10)        0.37 (0.63)        0.42 (0.59)
  Missing                           0.33 (0.22)        0.81 (1.30)        0.66 (1.21)
Teacher’s advice – base is “Lower vocational school”
  Continued primary school          −0.22** (0.09)
  Lower general secondary school    0.42** (0.17)
  Higher general secondary school   0.39 (0.26)
  Missing                           −0.56** (0.25)
Repeat grade – base is “None”
  Once                              −0.30*** (0.09)
  Twice or more                     −0.74*** (0.12)
  Missing                           0.74* (0.42)
Preference of the parents – base is “Only vocational education”
  Work in own company               −0.78*** (0.19)
  Work without education            −1.29*** (0.19)
  Work with education               −0.88*** (0.20)
  General secondary school          −0.28 (0.18)
  Missing                           −0.85*** (0.18)

Notes: Author’s calculations on the basis of the Brabant data linked to the municipality register and the mortality register.
* p-value < 0.1.
** p-value < 0.05.
*** p-value < 0.01.
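
Because the hazards are exponential in their linear index, a coefficient in the two hazard columns of Table 2 translates into a proportional change in the mortality hazard of exp(coefficient) − 1. The short sketch below applies this to the cognitive ability row; the helper name `pct_change_in_hazard` is illustrative.

```python
import math

def pct_change_in_hazard(coef):
    """Percentage change in the hazard for a one-unit increase in the covariate."""
    return 100.0 * (math.exp(coef) - 1.0)

# 'Cognitive ability' row of Table 2, by hazard column
alpha_hazard = {"hazard (0)": -0.33, "hazard (1)": -0.20}
for column, coef in alpha_hazard.items():
    print(column, round(pct_change_in_hazard(coef)))
```

Since latent ability is normalized to unit variance, a one-unit increase corresponds to one standard deviation, so the two coefficients imply hazard reductions of about 28% and 18%.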
Fig. 2. Relationship between cognitive ability and the binary measure for education.

for individuals born in 1921 whose intelligence was measured at age 11, with mortality follow-up for 65 years (reduction in hazard rate 21%), and the results presented by Batty et al. (2009) for a cohort of one million Swedes whose IQ was measured at age 18 and who were followed for 20 years (reduction in hazard rate 25%). Males have a higher hazard of dying compared to females, although the effect is only statistically significant among the higher educated.

The coefficients in Table 2 allow computing the conditional survival difference across educational groups, as described in Section 3.4. Fig. 3 shows these conditional survival differences for all age groups from 55 to 75 years of age. The survival difference between the two educational groups, conditional on family background and cognitive ability, is positive and increases with age. Note that the confidence intervals are fairly wide, such that the conditional survival differences only reach statistical significance at higher ages. The sizes can be interpreted as percentage point differences in the survival probability at a certain age. Hence, around age 70, entering secondary school is associated with a 2 percentage point increase in the survival probability (90% confidence interval −1 to 7). Around age 74, entering secondary school is associated with a 5 percentage point increase in the survival probability (90% confidence interval 1–10).

If we extrapolate the estimated survival functions outside of our observed age window, the simulations allow computing the estimated differences in life expectancy for the average individual in the sample. This provides an alternative summary measure. The life expectancy of those only finishing primary school is 82.86 (standard error 1.37), compared with 87.15 (1.11) for those having finished at least secondary school,18 a statistically significant difference of 4.29 years (1.58). This implies that entering secondary school is associated with an increase of more than 4 years in life expectancy, which is within the bandwidth of the raw survival difference of 5 years across individuals with primary and secondary education. It has to be acknowledged, however, that this estimate is based upon extrapolation and hence on relatively strong functional form assumptions.

We decompose the unconditional differences in the Kaplan–Meier survival curves from Fig. 1 into a conditional difference and a selection effect based on cognitive ability and other observable control variables.19 Fig. 4 shows that at early ages mortality differentials are mainly due to selection effects, while after age 60 the importance of education increases. For most ages, the selection effect is responsible for around half of the unconditional differences in survival across educational groups. It has to be acknowledged though that these conclusions are based on the point estimates, uncertainty around which is relatively large in this case.

To gauge the importance of cognitive ability in the selection effect, we additionally ran all models without the latent factor for cognitive ability. The results show that the conditional survival differences are larger in a model without cognitive ability.20 This is an indication that cognitive ability plays an important role in the selection effect. It is tempting to decompose the selection effect into a selection due to cognitive ability and a selection on other observable characteristics. The selection on other observable characteristics can be computed as the difference between (i) the unconditional difference from the observed Kaplan–Meier survival rate of the two education levels and (ii) the conditional survival difference from the model without cognitive abilities. The selection on cognitive ability can then be easily computed as the difference between the total selection effect and the part of the selection effect attributed to other observable characteristics. This is illustrated in Fig. 5, which shows that cognitive ability explains the largest part of the selection effect. In fact, selection on other observable factors is even negative between ages 60 and 70. We have to emphasize, however, that this interpretation should be taken with care as cognitive ability could be correlated to other control variables in the model, and as such the selection effect may not be additive.
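
The arithmetic behind this decomposition of the selection effect can be written out as a minimal sketch. The function name and the three input values are made up for illustration and are not estimates from the paper; each input is a survival difference at a given age.

```python
def decompose_selection(unconditional, conditional, conditional_no_ability):
    """Split the raw survival difference at one age into its components.

    unconditional           Kaplan-Meier difference between the education groups
    conditional             difference from the model with latent cognitive ability
    conditional_no_ability  difference from the model without latent ability
    """
    total_selection = unconditional - conditional
    selection_other = unconditional - conditional_no_ability
    selection_cognitive = total_selection - selection_other
    return {
        "conditional": conditional,
        "selection_other": selection_other,
        "selection_cognitive": selection_cognitive,
    }

# hypothetical survival differences at a single age
parts = decompose_selection(unconditional=0.10, conditional=0.05,
                            conditional_no_ability=0.09)
print(parts)
```

By construction the three parts sum to the unconditional difference, and the cognitive part equals the gap between the two conditional differences, which is why a larger conditional difference in the model without ability signals selection on ability.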
[Fig. 3 plot: conditional survival difference with 90% upper and lower bound; x-axis: age (55–75); y-axis: difference in survival]
Fig. 3. Conditional survival differences (see Eq. (15)) by age and binary education category variable.

18 These outcomes are reasonably close to the gender-education specific estimated life expectancies conditional on surviving to age 55 that Statistics Netherlands presents on the basis of the 2012 mortality risks (see http://www.cbs.nl/en-GB/menu/themas/gezondheid-welzijn/cijfers/extra/resterende-gezonde-levensverwachting.htm?Languageswitch=on). Men and women having finished primary education can expect to live up to 78.1 and 82.2, respectively. Men and women having finished higher education can expect to live up to 83.3 and 87.1, respectively.
19 The corresponding graphs using the distribution of X for those with D = 1 and D = 0 are very similar.
20 Results are available upon request.

4.2. Robustness checks
In a model with an ordinal educational choice, corresponding to the three levels in the definition of Education in Section 2, the
[Fig. 4 plot: conditional survival difference and selection effect by age (55–75); lower panel: selection effect]
Fig. 4. Decomposition of unconditional difference in the Kaplan–Meier survival function into conditional differences and a selection effect based on observed characteristics
and cognitive ability, with binary education variable and two measurements for cognitive ability.
coefficient estimates of the exogenous variables are very similar to the ones presented for the binary educational variable.21 Fig. 6 presents the conditional survival differences for the three different educational levels. It is clear that there is a large, but insignificant, conditional survival difference between lower vocational school (level 2) and primary school (level 1). At age 75, those who only attended primary school are around four percentage points more likely to die than those who attended lower vocational school. The conditional survival difference between general secondary school and lower vocational school is practically zero.

If we decompose the unconditional survival differences between the three educational groups into a conditional survival difference and a selection effect, we obtain Fig. 7. This graph shows that the conditional survival difference between primary and vocational education is positive and becomes larger than the selection effect from age 70 onwards, in line with the findings of the dichotomous indicator for education. The conditional survival difference between vocational and higher education is negligible. Taken together, Figs. 6 and 7 clearly indicate that the largest difference is between those having finished primary school and those beyond primary school, such that the dichotomization in the previous subsection seems justified.

While mortality is an objective, and in some sense ‘the ultimate’, health outcome, the influence of education and cognitive ability may differ depending on the health outcome used. In the 1993 wave of our Brabant survey, hence around age 53 for our

21 All results not presented and the details of the models used in this section are available upon request.
[Fig. 5 plot: selection effect (other) and selection effect (cognitive skills) by age (55–75); lower panel: the same two series with 90% confidence intervals]
Fig. 5. Decomposition of observed difference in the Kaplan–Meier survival function into conditional differences and a selection effect due to observed characteristics and
cognitive ability, and other selection effects based on observed characteristics only, with binary education variable and two measurements for cognitive ability (with 90%
confidence intervals, below).
sample, respondents were asked for a subjective assessment of their health in five categories, i.e. ‘poor’, ‘sometimes good, sometimes bad’, ‘fair’, ‘good’, and ‘very good’. We estimated the model described in Section 3, now allowing for an ordinal dependent variable, to check robustness to our main outcome measure, and to compare our results to the literature.

We estimated the conditional difference in the probability to report any of the five categories between the two educational groups. The conditional differences in the probabilities for the categories ‘poor’ and ‘sometimes good, sometimes bad’ are significantly negative, at −0.03 (standard error 0.007) and −0.08 (0.011) respectively. The conditional differences in the probabilities for the categories ‘fair’ and ‘very good’ are very close to zero, while the conditional difference in the probability of reporting to be in ‘good’ health between those having finished only primary school and those entering secondary education is large (and significant) and amounts to a 15 percentage point increase (standard error 0.011).

Fig. 6. Conditional survival differences by age, ordinal education variable, two measurements for cognitive ability.

When comparing our results to the literature, we confirm the findings of Hartog and Oosterbeek (1998) that both education and cognitive ability affect self-reported health. Conti and Heckman (2010) used a binary indicator for ‘poor health’ and found that half of the raw differences in poor health is due to a treatment effect of education and the other half was selection. We find that selection is not as important as in our mortality analyses, while education does play a large role in explaining the raw differences in health levels in a model with five health levels and binary education, see
[Fig. 7 plot: conditional survival difference and selection effect by age (55–73), for primary to vocational education and for vocational to higher education]
Fig. 7. Decomposition of observed difference in the Kaplan–Meier survival function into conditional differences and a selection effect based on observed characteristics and
cognitive ability, with ordinal education variable and two measurements for cognitive ability (with 90% confidence intervals: lower left primary to vocational education;
lower right vocational to higher education).
Fig. 8. This suggests that the relative contributions of education and selection effects may well differ across objective health measures and the subjective health measures that are commonly used in the literature.

[Fig. 8 plot: conditional difference and selection effect by health status (poor, sometimes bad, fair, good, very good)]
Fig. 8. Decomposition of observed difference in the self-reported health into conditional differences and a selection effect based on observed characteristics and cognitive ability, with binary education variable and two measurements for cognitive ability (with 90% confidence bars).

Since the sample size is somewhat small we chose not to present all results separately by gender. Yet, since both educational choices and survival are obviously dependent on gender, we ran all models separately for males and females. Strong disparities in survival across educational groups exist for both males and females. This can be seen in Fig. 9, where the height of the bar indicates the unconditional survival differences across the two educational groups. While unconditional survival differences across educational groups are larger for females, the relative importance of education, derived from the decomposition of the raw survival differences, is higher for males.

One could argue that the Raven progressive matrices test is a purer measurement of cognitive ability and should be used independently from the vocabulary test. We ran all analyses for both the binary and the ordinal educational classification, using only the Raven test as a measure of cognitive ability. The results were very similar. Fig. 10 depicts the conditional survival difference for the model without latent ability, a model with one measurement and a model with two measurements.

Even though the initial sample in 1952 was found to be representative for the Dutch population at that time, more than half of the sample is lost between 1952 and our observation period that
[Fig. 9 plot: conditional survival difference and selection effect by age (55–75), upper and lower panels; males on the left, females on the right]
Fig. 9. Decomposition of observed difference in the Kaplan–Meier survival function into conditional differences and a selection effect based on observed characteristics and
cognitive ability, for males (left) and females (right).
starts in 1995. This could lead to an attrition bias, if attrition is non-random. Unfortunately, we do not have access to the original data files such that we cannot investigate attrition directly. However, Hartog (1989) investigated the non-response for the 1983 survey and found no attrition bias in a wage analysis.22 Since the sample in 1983 has been shown to be representative, we reran all analyses on just the respondents that were observed in 1983 and found no substantial changes in the results. This suggests that selective attrition does not affect our results.

The data contains information about children from different years of birth. Most of those born in earlier years had to repeat a class and their average cognitive skills are lower. This could be a potential source of selection, if staying back was due to low cognitive ability, or, worse, to health reasons. We ran a robustness check in which we excluded individuals born in 1937, 1938 and 1941. The results show that the conditional survival difference hardly deviates from the base model; if anything, the difference even becomes larger. Finally, we included an indicator for the year of measurement of the education level (1983 or 1993), and again did not find any deviation. All these results are summarized in Fig. 11 and the detailed estimation results are available upon request.

Finally, we varied the observed characteristics in the model. First, we included additional variables among the exogenous variables such as family size, number of children, additional school characteristics (e.g. whether restricted to girls, restricted to boys, or mixed), and whether both parents were still alive. These variables were not statistically significant in any of the models, and did not alter the results. Second, we also checked robustness to excluding individuals with item non-response on some of the observed characteristics, in which case too the results remain similar.

5. Discussion

This paper estimates to what extent survival differences across educational groups are due to a ‘selection effect’ based on cognitive ability and other background variables. We extend the structural equation model of Conti et al. (2010) to allow for a duration dependent variable and an ordinal educational choice, and estimate the model on the basis of a Dutch cohort born around 1940 for which we observe mortality between ages 55 and 75. The most important conclusion is that the selection effect based on cognitive ability is responsible for around half of the raw differences in survival. Yet,

22 Following Hartog (1989) we investigated whether the attrition between 1993 and 1995 was related to observed characteristics. None of the explanatory variables, including education, family background, and intelligence, was related to attrition. The only exception was self-reported health; a worse health status increased the probability of attrition between 1993 and 1995.
similarity in findings, irrespective of the health measures and samples used, two tentative conclusions regarding the education-health gradient are emerging. First, at least half of the raw association between education and health is due to confounding ‘third factors’, of which cognitive ability proved very important in our analysis, while Conti et al. (2010) and Savelyev (2012) stress the importance of non-cognitive factors, in particular conscientiousness. Second, even after controlling for cognitive ability, family socioeconomic status, and a range of other background variables, education seems to remain important in determining mortality. This suggests that at least part of the educational differences in health outcomes is due to a genuine, causal effect of education on health.

Fig. 10. Conditional survival differences by age, binary education variable. Base: two measurements for cognitive ability; Model without latent skills and model with 1 measurement for cognitive ability (Raven test).

However, a limitation of our data is the absence of direct measurements of non-cognitive ability. Hence, we cannot rule out that specific non-cognitive factors influence both education and health, such that our ‘conditional survival difference’ across educational groups cannot be interpreted as – and is likely to be an upper bound to – the causal effect of education on mortality. Moreover, we may overestimate the influence of cognitive ability if correlated non-cognitive abilities are omitted from the model.23

As a starting point for sorting out this important issue, we can use the teacher’s advice regarding secondary education of the child. It is presumably a function of both the cognitive and non-cognitive abilities of the pupil. Hence, one may argue that while controlling for cognitive ability, the teacher’s advice is a proxy for non-cognitive abilities. When allowing the teacher’s advice to influence mortality directly, on top of being a determinant of educational choice, the conditional survival differences become smaller, as illustrated in Fig. 11. This evidence corroborates our main conclusion that the selection effect explains at least half of the association between education and mortality. It also suggests that the remaining causal effect of education on mortality is likely to be smaller when taking non-cognitive abilities into account.

Another limitation is the relatively small sample size, which naturally compromised the precision of our results. Care should be taken in interpreting the exact split of survival differences into a selection effect and a part attributable to education, given the large uncertainty around these estimates. While the conditional survival differences across educational groups from age 73 onwards as well as the extrapolated differences in life expectancy across educational groups are statistically significant, we cannot rule out the absence of conditional differences in survival across educational groups at younger ages.

A fruitful avenue for future research would be to investigate the effect of both education and cognitive abilities on health outcomes using a more elaborate set of non-cognitive abilities. In such research the literature could benefit from our structural equation
model that allows for a duration dependent variable like mortality,
Fig. 11. Conditional survival differences by age, binary education variable using
and an ordinal independent variable such as educational attain-
alternative models. Base: two measurements for cognitive ability; 1983: only ment.
respondents observed in 1983; non-cog: non-cognitive skills included; 1939–40:
only respondents born in 1939–40; edudum: dummy for measurement of education
included.
Appendix A. Supplementary data
Supplementary data associated with this article can be

even conditional on cognitive ability and a wide range of individ-
found, in the online version, at http://dx.doi.org/10.1016/j.
ual characteristics, survival differences between individuals having
jhealeco.2015.03.003.
finished only primary school and those who entered at least sec-
ondary education are still substantial, and correspond to a 4 year
difference in life expectancy.
Even though we analyze mortality between ages 55 and 75
rather than self-reported health at age 30, our findings are in line 23
Although the literature suggests that most non-cognitive abilities are uncorre-
with the results presented by Conti et al. (2010). Due to this striking lated with IQ (Borghans et al., 2011; Savelyev, 2012).
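As a rough illustration of the decomposition reasoning in the conclusion, the sketch below splits a raw life-expectancy gap between educational groups into a selection part and a conditional part, assuming the two add up to the raw gap. The function name `decompose_gap` and the 8-year raw gap are hypothetical and are not taken from the article; only the 4-year conditional difference and the "around half" selection share appear in the text.

```python
def decompose_gap(raw_gap_years: float, conditional_gap_years: float) -> dict:
    """Split a raw life-expectancy gap into a selection part and a conditional part.

    Assumes an additive split: raw gap = selection part + conditional part.
    """
    if raw_gap_years <= 0:
        raise ValueError("raw_gap_years must be positive")
    selection_years = raw_gap_years - conditional_gap_years
    return {
        "selection_years": selection_years,
        "selection_share": selection_years / raw_gap_years,
        "conditional_share": conditional_gap_years / raw_gap_years,
    }


# ASSUMPTION: the 8-year raw gap is a made-up input, chosen only so that the
# 4-year conditional difference reported in the text equals half of the raw gap.
result = decompose_gap(raw_gap_years=8.0, conditional_gap_years=4.0)
print(result)
```

Under these assumed inputs the selection share is one half. Because the text argues that the conditional difference is likely an upper bound to the causal effect of education, the conditional share here would also be an upper bound on the causal share.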
References

Abbring, J., van den Berg, G., 2003. The nonparametric identification of treatment effects in duration models. Econometrica 71 (5), 1491–1517.
Albouy, V., Lequien, L., 2008. Does compulsory education lower mortality? Journal of Health Economics 28 (1), 155–168.
Arendt, J.N., 2005. Does education cause better health? A panel data analysis using school reforms for identification. Economics of Education Review 24 (2), 149–160.
Auld, M.C., Sidhu, N., 2005. Schooling, cognitive ability and health. Health Economics 14 (10), 1019–1034.
Bago d’Uva, T., O’Donnell, O., van Doorslaer, E., 2008. Differential health reporting by education level and its impact on the measurement of health inequalities among older Europeans. International Journal of Epidemiology 37 (6), 1375–1383.
Batty, G.D., Deary, I.J., Gottfredson, L.S., 2007. Premorbid (early life) IQ and later mortality risk: systematic review. Annals of Epidemiology 17 (4), 278–288.
Batty, G.D., Wennerstad, K.M., Smith, G.D., Gunnell, D., Deary, I., Tynelius, P., Rasmussen, F., 2009. IQ in early adulthood and mortality by middle age: cohort study of 1 million Swedish men. Epidemiology 20 (1), 100–109.
Behrman, J.R., Rosenzweig, M.R., 2004. Returns to birthweight. Review of Economics and Statistics 86 (2), 586–601.
Borghans, L., Golsteyn, B.H.H., Heckman, J.J., Humphries, J.E., 2011. IQ, Achievement, and Personality. University of Maastricht (Unpublished manuscript).
Braakmann, N., 2011. The causal relationship between education, health and health related behaviour: evidence from a natural experiment in England. Journal of Health Economics 30 (4), 753–763.
Brinch, C.N., Galloway, T.A., 2012. Schooling in adolescence raises IQ scores. Proceedings of the National Academy of Sciences of the United States of America 109 (2), 425–430.
Butler, J.S., Moffitt, R., 1982. A computationally efficient quadrature procedure for the one-factor multinomial probit model. Econometrica 50 (3), 761–764.
Carneiro, P., Crawford, C., Goodman, A., 2007. The impact of early cognitive and non-cognitive skills on later outcomes. In: CEE DP 92.
Carpenter, P.A., Just, M.A., Shell, P., 1990. What one intelligence test measures: a theoretical account of processing in the Raven progressive matrices test. Psychological Review 97 (3), 404–431.
Case, A., Fertig, A., Paxson, C., 2005. The lasting impact of childhood health and circumstance. Journal of Health Economics 24 (2), 365–389.
Centraal Bureau voor de Statistiek, CBS, 2008. Hoogopgeleiden leven lang en gezond. In: Gezondheid en zorg in cijfers. CBS.
Clark, D., Royer, H., 2013. The Effect of Education on Adult Mortality and Health: Evidence from Britain. American Economic Review 103 (6), 2087–2120.
Cockx, B., Picchio, M., 2012. Are short-lived jobs stepping stones to long-lasting jobs? Oxford Bulletin of Economics and Statistics 74 (5), 646–675.
Conti, G., Heckman, J.J., 2010. Understanding the early origins of the education-health gradient: a framework that can also be applied to analyze gene-environment interactions. Perspectives on Psychological Science 5 (5), 585–605.
Conti, G., Heckman, J.J., Urzua, S., 2010. The education-health gradient. American Economic Review Papers and Proceedings 100, 234–238.
Conti, G., Heckman, J.J., Urzua, S., 2011. Early Endowments, Education, and Health. University of Chicago, Department of Economics (Unpublished manuscript).
Cramer, J.S., 2012. Childhood Intelligence and Adult Mortality, and the Role of Socio-Economic Status. Tinbergen Institute Discussion Paper 2012-070/4.
Cutler, D., Lleras-Muney, A., 2008. Education and health: evaluating theories and evidence. In: House, J.S., Schoeni, R.F., Kaplan, G.A., Harold, P. (Eds.), Making Americans Healthier: Social and Economic Policy as Health Policy. Russell Sage Foundation, New York.
Deary, I., 2008. Why do intelligent people live longer? Nature 456, 175–176.
Deary, I., Johnson, W., 2010. Intelligence and education: causal perceptions drive analytic processes and therefore conclusions. International Journal of Epidemiology 39, 1362–1369.
Dronkers, J., 2002. Bestaat er een samenhang tussen echtscheiding en intelligentie? Mens & Maatschappij 77 (1), 25–42.
Elbers, C., Lanjouw, J.O., Lanjouw, P., 2003. Micro-level estimation of poverty and inequality. Econometrica 71 (1), 355–364.
Elbers, C., Ridder, G., 1982. True and spurious duration dependence: the identifiability of proportional hazards models. Review of Economic Studies 49, 403–410.
Fuchs, V.R., 1982. Time preference and health: an exploratory study. In: Fuchs, V. (Ed.), Economic Aspects of Health. The University of Chicago Press, Chicago.
Gavrilov, L.A., Gavrilova, N.S., 1991. The Biology of Life Span: A Quantitative Approach. Harwood Academic Publisher, New York, ISBN 3-7186-4983-7.
Gottfredson, L., 1997. Mainstream science on intelligence: an editorial with 52 signatories, history, and bibliography. Intelligence 24, 13–23.
Gottfredson, L., 2004. Intelligence: is the epidemiologists’ elusive fundamental cause of social class inequalities in health? Journal of Personality and Social Psychology 86 (1), 174–199.
Hartog, J., 1989. Survey non-response in relation to ability and family background: structure and effects on estimated earnings functions. Applied Economics 21, 387–395.
Hartog, J., Pfann, G., 1985. Vervolgonderzoek Noord-Brabantse zesdeklassers 1983, Verantwoording van hernieuwde gegevensverzameling onder Noordbrabantse zesdeklassers van 1952. University of Amsterdam, Amsterdam.
Hartog, J., Oosterbeek, H., 1998. Health, wealth and happiness: why pursue a higher education? Economics of Education Review 17 (3), 245–256.
Hartog, J., Jonker, N., Pfann, G., 2002. Documentatie Brabant data. Netherlands Institute for Scientific Information Services, Amsterdam.
Heckman, J.J., Humphries, J.E., Veramendi, G., Urzua, S., 2014. Education Health and Wages. NBER Working Paper No. 19971.
Heckman, J.J., Singer, B., 1984. A method for minimizing the impact of distributional assumptions in econometric models for duration data. Econometrica 52, 271–320.
Kaestner, R., Callison, K., 2011. Adolescent cognitive and non-cognitive correlates of health. Journal of Human Capital 5 (1), 29–69.
Lillard, L.A., 1993. Simultaneous equations for hazards: marriage duration and fertility timing. Journal of Econometrics 56, 189–217.
Lleras-Muney, A., 2005. The relationship between education and adult mortality in the United States. Review of Economic Studies 72, 189–221.
Mathijssen, M.A.J.M., Sonnemans, G.J.M., 1958. Schoolkeuze en schoolsucces bij VHMO en ULO in Noord-Brabant. Zwijssen, Tilburg.
Mazumder, B., 2008. Does education improve health: a reexamination of the evidence from compulsory schooling laws. Economic Perspectives 33 (2).
Mazumder, B., 2012. The effects of education on health and mortality. Nordic Economic Policy Review 2012, 261–301.
Meghir, C., Palme, M., Simeonova, E., 2013. Education, Cognition and Health: Evidence from a Social Experiment. NBER Working Paper 19002.
Murasko, J.E., 2007. A lifecourse study on education and health: the relationship between childhood psychosocial resources and outcomes in adolescence and young adulthood. Social Science Research 36 (4), 1348–1370.
Oreopoulos, P., 2006. Estimating average and local average treatment effects of education when compulsory school laws really matter. American Economic Review 96 (1), 152–175.
Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T., 1993. Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge UP, Cambridge.
Raven, J.C., 1958. Mill Hill Vocabulary Scale, 2nd ed. H.K. Lewis, London.
Ross, C.E., Wu, C.-L., 1995. The links between education and health. American Sociological Review 60 (5), 719–745.
Savelyev, P.A., 2012. Conscientiousness, Education, and Longevity of High-Ability Individuals. Vanderbilt University, Department of Economics (Unpublished manuscript).
Spearman, C., 1927. The Abilities of Man: Their Nature and Measurement. Macmillan, New York.
Van Kippersluis, H., O’Donnell, O., van Doorslaer, E., 2011. Long run returns to education: does schooling lead to an extended old age? Journal of Human Resources 46 (4), 695–721.
Van Praag, M., 1992. Zomaar een dataset: ‘Noordbrabantse zesde klassers’, Een presentatie van 15 jaar onderzoek. University of Amsterdam, Amsterdam.
Van Schellen, M., Nieuwbeerta, P., 2007. De invloed van de militaire dienstplicht op de ontwikkeling van crimineel gedrag. Mens & Maatschappij 82 (1), 5–27.

The intensive margin of technology adoption – Experimental evidence on improved cooking stoves in rural Senegal☆

Gunther Bensch (a), Jörg Peters (a, b, ∗)

(a) Rheinisch-Westfälisches Institut für Wirtschaftsforschung (RWI), Essen, Germany
(b) AMERU, University of the Witwatersrand, Johannesburg, South Africa
Article history:
Received 30 July 2014
Received in revised form 12 March 2015

JEL classification:
C93
I12
O12
O13
Q53

Keywords:
Household air pollution
Energy access
Technology adoption
Development economics
Biomass fuel

Abstract

Today, almost 3 billion people in developing countries rely on biomass as primary cooking fuel, with profound negative implications for their well-being. Improved biomass cooking stoves are alleged to counteract these adverse effects. This paper evaluates take-up and impacts of low-cost improved stoves through a randomized controlled trial. The randomized stove is primarily designed to curb firewood consumption, but not smoke emissions. Nonetheless, we find considerable effects not only on firewood consumption, but also on smoke exposure and, consequently, smoke-related disease symptoms. The reduced smoke exposure results from behavioural changes in terms of increased outside cooking and a reduction in cooking time. We conclude that in order to assess the effectiveness of a technology-oriented intervention, it is critical to not only account for the incidence of technology adoption – the extensive margin – but also for the way the new technology is used – the intensive margin.

© 2015 Elsevier B.V. All rights reserved.
G. Bensch, J. Peters / Journal of Health Economics 42 (2015) 44–63

☆ We thank Mark Andor, Manuel Frondel, Rachel Griffith, Michael Grimm, Subhrendu Pattanayak, Fiona Ross, Christoph M. Schmidt, Maximiliane Sievert, and in particular Colin Vance for helpful comments. Participants of the Nordic Conference in Development Economics, Gothenburg/Sweden in June 2012, the Centre for the Studies of African Economies conference in Oxford/United Kingdom in March 2013 and research seminars at University of Göttingen/Germany and Witwatersrand University Johannesburg/South Africa provided valuable input. Financial support from the German Federal Ministry for Economic Cooperation and Development (BMZ) through the Independent Evaluation Unit of Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) is gratefully acknowledged. Peters gratefully acknowledges the support of a special grant (Sondertatbestand) from the German Federal Ministry for Economic Affairs and Energy and the Ministry of Innovation, Science, and Research of the State of North Rhine-Westphalia.

∗ Corresponding author at: RWI, Hohenzollernstrasse 1–3, 45128 Essen, Germany. Tel.: +49 0201 8149 247; fax: +49 0201 8149 200. E-mail address: peters@rwi-essen.de (J. Peters).

1. Introduction

In developing countries, almost 3 billion people rely on traditional biomass-based fuels for their daily cooking purposes. In rural sub-Saharan Africa, virtually all households cook with biomass, mostly firewood. The collection of and cooking with firewood is associated with various negative effects on the living conditions of the poor. According to the World Health Organization (WHO), the emitted smoke is the leading environmental cause of death and is responsible for 4.3 million premature deaths every year – more deaths than are caused by malaria or tuberculosis (WHO, 2014; Martin et al., 2011). Medical research throughout the last decades found links between air pollution induced by open fires and various illnesses including pneumonia, chronic obstructive pulmonary disease (COPD), and eye infections, but also stunted growth of children, tuberculosis, and cardiovascular diseases (see Armstrong and Campbell, 1991; Campbell et al., 1989; Dherani et al., 2008; Kan et al., 2011; McCracken et al., 2011; Pandey, 1984a,b; Pandey et al., 1989). Furthermore, biomass usage for cooking is a major source of climate-relevant emissions (Shindell et al., 2012).

Improved biomass cooking stoves (ICSs) are often believed to be a game changer for cooking in developing countries. It is in this context that the United Nations set out the Sustainable Energy for All initiative with the ambitious goal of globally universal adoption of clean cooking stoves and electricity by 2030. There is, however, a wide range of ICSs with different levels of sophistication that have
strong implications for smoke emissions and thus cleanliness. It is hence still a matter of ongoing debate under which conditions ICSs can be considered as clean, also compared to modern fuels like electricity and gas.1

1 See World Bank (2011) for a more detailed discussion of different types of improved cooking stoves and Martin et al. (2011) for a recent overview on the improved stoves and air pollution policy debate.

This paper presents findings from a Randomized Controlled Trial (RCT) among 253 households in twelve villages in Senegal to analyze behavioural responses and impacts following the introduction of an ICS. The ICS, which was assigned free of charge, is a low-cost and maintenance-free portable clay-metal stove. It is produced in a fairly standardized way by local manufacturers (potters and whitesmiths) in their workshops and is marketed at a retail price of around 10 US$. The stove has an expected life span of one to three years before it deteriorates and has to be replaced. It has already been widely used in large governmental dissemination programmes in urban and rural Africa. As such, this is the first study to assess a type of ICS whose design is geared towards fuel savings, ease of use, affordability and, hence, large-scale applicability, but one that lacks specific health-conducive technical features such as a cleaner burning process or a chimney. Without further changes in cooking behaviour, the reduction in particulate matter emissions that the randomized ICS can technically achieve would probably be insufficient to affect the health of users. This is because the non-linear particulate exposure–response relation found in medical research suggests that large reductions in smoke exposure are required in order to ensure positive health effects (see, for example, Ezzati and Kammen, 2001; Pope et al., 2011; Burnett et al., 2014).

The main impact indicators of this study are firewood consumption, time use, respiratory disease symptoms and eye infections. They are supplemented by various indicators along the results chain of the intervention with regard to cooking behaviour. Effects on these indicators were assessed 12 months after randomization following a baseline study in November 2009. The behavioural changes we look at – firewood usage patterns and smoke exposure – can be expected to materialize already in the first few months after ICS adoption. The changes in these indicators we observe after one year of ICS ownership therefore reflect impacts to be expected in the long run – as long as people continue to use the ICS and replace it by a new one once it is not functional anymore. The third wave of interviews in March 2013 is used to track the longer-term usage behaviour and the stove’s durability at the end of what technically is the life span of the ICS.

A couple of factors contribute to a high external validity of this RCT for the African context: the study was implemented in an unobtrusive way in order to ensure that we observe real-world cooking behaviour. It was designed and conducted in cooperation with the ICS dissemination programme of the Government of Senegal, so that an upscaling of the intervention under real-world conditions would be possible. Furthermore, the dominating cooking fuel in our study area is firewood, which is also the case in most other African countries (Bonjour et al., 2013). Firewood scarcity in our study region and, consequently, the incentive to use more efficient stoves is pronounced and comparable to other dry areas in non-equatorial Africa.2

2 External validity and potential challenges to it are discussed further in Section 3.5 and Appendix D.

We find that the ICSs are taken up by virtually all households and intensively used, even after three and a half years. For the most part, people only give up using the stove when it is not functional anymore and not because they lose interest in using it. We furthermore observe substantial effects on firewood consumption, which confirm savings rates determined in lab tests. In addition, we find a decrease in early indicators for respiratory diseases and eye infections. These effects on people’s health status cannot be explained only by the take-up of the new ICS and the firewood savings, but rather by an additional reduction in smoke exposure due to more outside cooking and a reduced cooking time that is enabled by the new stove.

Our findings add to the existing body of evidence on ICS impacts, which so far is mainly represented by two RCTs: the RESPIRE study in Guatemala (see, for example, Smith-Sivertsen et al., 2004, 2009; Díaz et al., 2007; Smith et al., 2011) and a study conducted by J-Pal in India (Hanna et al., 2012).3 Both studies used stationary chimney ICSs that are installed in the user’s kitchen, with the difference that the RESPIRE stoves are of higher quality, thus more expensive (100–150 US$), and require less maintenance than those used in the Hanna et al. (2012) study. A more detailed comparison of technical features of the ICSs used in the different studies is provided in Appendix A. While the RESPIRE study detects a substantial reduction in household air pollution and a reduction in the risk of respiratory disease symptoms and eye problems, Hanna et al. observe reductions in smoke inhalation only in the first year but not over a four year time horizon. This is mainly driven by maintenance being more and more neglected over time, which leads to a weak performance and low usage rates after some years.

3 In addition to these two studies, further evidence with mixed results exists for China (Mueller et al., 2013; Yu, 2011), Mexico (Masera et al., 2007) and urban Senegal (Bensch and Peters, 2013). Burwen and Levine (2012) conducted an RCT in Ghana using a very simple mud stove. As a major difference to the present study as well as the RESPIRE and the J-Pal study, tests in a controlled field lab setting already find that the stove does not perform better than the traditional ones. The poor performance is also reflected in low usage rates after a few months.

Against this background, our paper is the first to add evidence on how people use an adapted and simple ICS in an unsupervised setup that is deemed to represent a more realistic study environment than the highly controlled medical trials conducted for RESPIRE. Our study contributes to the literature by providing compelling evidence that such a simpler and cheaper ICS can actually also trigger substantial impacts – if cooking behaviour also changes. Conceptually, these results confirm the findings of Hanna et al.: Looking at the technical features of an ICS is not enough, since the real-world behaviour of users strongly co-determines the results. Unlike Hanna et al., though, we find that behavioural adaptations to a simple ICS may trigger sizable positive health effects.

These differences in findings of the two studies show the potentials of disseminating ICS that are adapted to the target population and that facilitate cleaner cooking. The stove used in the Hanna et al. study requires regular maintenance, for which people in turn need to be trained (which not all of them were), while the stove randomized for our study is maintenance-free. Furthermore, our portable stove is well adapted to the local cooking habits, whereas the stove distributed in Hanna et al. interferes more with local cooking habits by requiring people to cook inside, which they are not accustomed to. In this sense, the stove in our study increases the number of choice variables for the users, while the one used in Hanna et al. decreases it.

In this broader behavioural context, our study adds to a nascent strand in the health economics literature studying adoption behaviour of households for health relevant technologies and goods such as bednets (Cohen and Dupas, 2010; Tarozzi et al., 2014), point-of-use drinking water disinfectants (Luby et al., 2008; Kremer et al., 2009), deworming drugs (Kremer and Miguel, 2007), condoms (e.g. Kamali et al., 2003), or a range of such technologies (Wendland et al., 2015). More specifically, it demonstrates
that the analysis of technology adoption and related promotion programmes should encompass both a technical and an economic perspective, not only an assessment of the mechanical performance. This is in line with the concept of intensive and extensive margins of behaviour that has recently been brought into the debate on public health interventions (see Dupas, 2011): It is not only the mere technology adoption that counts (extensive margin). Rather, the full effect can only be determined if the way the new technology is used is accounted for as well, the intensive margin.

The remainder of the paper is organized as follows: Section 2 reviews the country and intervention background and outlines the research design including the identification strategy. Section 3 presents the study results for all our impact indicators, and Section 4 concludes.

2. Programme background and methodological approach

2.1. Improved stove dissemination and cooking fuels in Senegal

Despite its seeming superiority to traditional biomass cooking, the ICS technology has not made significant inroads into African households. There may be various reasons for this, which are comprehensively discussed in Rehfuess et al. (2014) and Lewis and Pattanayak (2012). One explanation relevant for the rural setting is that firewood can typically be collected for free so that most of the benefits of ICS usage are not monetary ones. This makes it more difficult for households to finance the investment given liquidity and credit constraints. On the supply side, the stove design may fail to meet user needs in preparing local dishes with available fuels and cooking utensils. Earlier programmes in various African countries relied on subsidies for ICS production or distributed them for free. Most of these programmes did not succeed, however, in triggering sustainable ICS usage. Based on such experience, development practitioners frequently argue that people do not appreciate and use ICS that they receive as a gift and, consequently, reject the option of distributing ICSs for free (Barnes et al., 1994; Martin et al., 2011).

This is also the spirit underlying the ICS dissemination programme Foyer Amélioré au Sénégal (FASEN), which is implemented by the Senegalese Ministry of Energy in cooperation with Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ).4 In contrast to earlier ICS interventions, FASEN focuses on establishing a sustainable and autonomous market for ICSs by testing performance, training producers and distributors, and supporting communication and promotional campaigns. Similar to other countries, FASEN so far concentrated its ICS dissemination on charcoal ICSs in urban areas.

4 GIZ provides technical assistance on behalf of the German Federal Ministry for Development and Economic Cooperation (BMZ) and is one of the largest bilateral development agencies in the world.

The main ICS type disseminated by FASEN since 2006, the Jambaar, is also used in the present RCT. It is a portable single-pot stove with a fired clay combustion centre enclosed by a metal casing. Owing to basic design improvements of the Jambaar compared to traditional stoves, the woodfuel burns more efficiently and the heat is better conserved and focused towards the cooking pot. Both charcoal and firewood models exist. We chose the firewood Jambaar for our experiment as firewood is the dominant fuel in rural Senegal with 89% of rural households using it as their primary cooking fuel (ANSD, 2006). In rural areas ICSs have not been available so far. Stove types used here are either three-stone stoves available at zero cost or traditional metal stoves and open fire grills that can be bought for between 500 and 2500 CFA Francs, which is equivalent to 1–5 US$ (see Appendix B for pictures of the ICS and other stove types used in the study region). The GIZ programme intends to expand its activities to rural areas and expects the price of the Jambaar for the rural market to be around 4000–5000 CFA Francs (8–11 US$), which is well below the prices of the more sophisticated ICS technologies widely disseminated in Latin America or Asia.

Cooking fuels are an issue of major importance in the daily life of Senegalese households. Households have the custom to cook inside, which leads to a higher exposure to smoke emissions than outside cooking. WHO (2009) holds household air pollution induced by solid fuel usage for cooking accountable for 6300 premature deaths every year in Senegal alone. Apart from agricultural land clearance, wood usage for cooking purposes is moreover the most important driving force of ongoing deforestation in the mostly arid and Sahelian country (see WEC/FAO, 1999; Tappan et al., 2004; FAO, 2005a,b). A constant population growth of 2.6% per year puts further pressure on fuelwood resources. As a consequence, households face an increasing scarcity of fuelwoods: firewood collection is becoming increasingly time-consuming, while fuelwood prices are rising. This circumstance applies particularly to the Bassin Arachidier, the study area of this evaluation, situated some 200 km southeast of Dakar.

2.2. Impact indicators

The first impact indicator of our study is the household consumption of firewood. This indicator aggregates each dish cooked in a typical week, with a dish being one component of a meal that is prepared on a separate stove, for example rice and sauce. We thereby account for the fact that several stoves may be used simultaneously for the preparation of a single meal. The rationale for this indicator is that a reduction in firewood consumption not only has immediate implications for wood scarcity and deforestation pressures, but is also a strong intermediate indicator for other ultimately relevant impacts such as health and time use.

Impacts on health and time use are examined directly. We investigate the indicator time spent by household members on firewood collection and cooking and the prevalence of diseases that are potentially related to firewood usage. For this purpose, we look at symptoms that are likely to be affected in the short-term after smoke emissions are reduced; these are captured by the indicators household member with symptoms of respiratory diseases and household member with eye problems. We examine this indicator both on the household level and the household member level. For respiratory diseases, these symptoms are cough, asthma, or difficulty in breathing. They indicate acute respiratory infections and chronic obstructive pulmonary diseases, which are the leading causes of mortality and diseases induced by exposure to air pollution from solid fuels (Ezzati and Kammen, 2002). Exposure to particles could be detected as a causal agent of these and other serious respiratory diseases such as lung cancer or pneumonia (see Duflo et al., 2008b; Pattanayak and Pfaff, 2009).

Respiratory diseases and eye problems are elicited on a self-reporting basis: respondents are asked to give information on those household members who exhibited the symptoms of interest in the six months preceding the interviews. While such self-reported health indicators are sometimes viewed with concern because of potential measurement errors, the literature supports their application by highlighting the correlation with actual illnesses (see Idler and Benyamini, 1997; Miilunpalo et al., 1997; Peabody et al., 2006; Butrick et al., 2010). In particular, if specific symptoms are asked about precisely as was done in this study, respondents can be expected to report accurately. A deterioration in recall accuracy of reported morbidity as found in Das et al. (2012) and Kjellsson
et al. (2014) is a concern in this study but would only reduce the precision of our health estimates and not induce any bias.
To record firewood consumption and cooking time, the person responsible for cooking is asked to specify the number of people cooked for and the types of stoves used for every meal throughout a typical day. For each stove application, we then record the cooking duration and the cooking fuel type. In case of firewood, the cooking person is additionally asked to pile up the amount of firewood used for the respective stove application, which is then weighed with scales. In combination with information on the frequency with which the respective stoves are used throughout a typical week, this data serves to determine the weekly household consumption of firewood. Enumerators crosschecked stove usage as part of the interviews by verifying which stove was currently in use or had been used recently. The indicator time spent by household members on cooking aggregates the self-reported cooking duration for all meals of a typical day, whereas the time spent by household members on firewood collection aggregates the spells in which household members are occupied with gathering firewood in the course of a week.
Technically achievable savings rates for the Jambaar (referred to as ICS in the following) have already been determined in controlled cooking tests (CCTs), where a cooking person prepares the same meal on both a traditional stove and an improved stove in order to compare the woodfuel consumption of both stove types. However, the effective savings in real-life households might deviate from such laboratory field tests for various reasons summarized by Bensch and Peters (2013).5 The deficiencies of CCTs can be overcome by evaluating the woodfuel consumption based on a survey among a larger sample of households in which the diversity and dynamics of real-life cooking practices are captured. This is what is done in the present paper.
2.3. Identification strategy
We employ two approaches to estimate the impact of ICS usage in this experimental setup. The intention-to-treat effect (ITT) is obtained by simply comparing mean values of impact indicators for the treatment and control group, without accounting for non-compliance from households that were assigned to the treatment group but for some reason do not use the ICS. In our case, the ITT serves to estimate the effect of providing the ICS for free to households who do not yet own one. The average treatment effect on the treated (ATT), by contrast, accounts for non-usage in the treatment group and potential take-up in the control group and thereby serves to estimate the impact of effective ICS usage. For this purpose, instrumental variable (IV) estimations are applied with the random assignment into the treatment group as an instrument for the effective usage of the ICS. In our case, ITT and ATT are very similar given the high compliance rate in the treatment group and given that only one household in the control group acquired an ICS from another source. Although RCTs allow for a simple comparison of the impact indicators at the time of the follow-up, the precision of the estimates can be increased by controlling for other household characteristics that have been collected in a baseline survey. We therefore implement both the ITT and ATT approach with and without controlling for baseline household characteristics such as education and income using Ordinary Least Squares (OLS) regression.
In order to shed more light on how reductions in firewood consumption are induced by ICS usage, we also run an OLS regression on the individual dish level, additionally controlling for a set of potential dish- and meal-specific confounders such as the number of people cooked for. This dish-level regression has to be interpreted with some care, since – in spite of the random ICS assignment – the households that received a new stove can still choose whether to use the ICS or a traditional stove for the respective meal. This choice might then be driven by unobservable factors, which would distort the savings estimates if the unobservables are also correlated with firewood consumption.
Finally, we employ probit regressions on the health status of households and of individual household members. In principle, these estimations might as well suffer from some endogeneity induced by intra-household bargaining processes: healthier and more powerful women might bargain themselves out of cooking with the dirtier stove and into cooking with the cleaner ICS (see Pitt et al., 2006). This potentially leads to a spurious correlation between ICS ownership and improvements in the health status. In our context, though, this is very unlikely, since the assignment to the cooking duty does not seem to be a result of short-term negotiations, but it is rather determined by cultural norms with one or two women per household being continuously responsible for cooking. Even if post-randomization selection processes occurred, they would be uncovered by the health indicators we use, because we observe both the people responsible for cooking and those who are not.
2.4. RCT design and implementation
The study design followed the guidelines on the implementation of RCTs provided in Duflo et al. (2008a). The first decision that had to be taken was the level on which to randomize the treatment – the village or the individual household. In the present case, it is sensible to randomize on the household level, since the decision about whether to adopt an ICS is taken in the household and not on a regional level. Furthermore, our impact indicators are measured on household level (or below). One reason to randomize on the village level instead of the household level would be to account for spillover effects. These are expected to be negligible, since the ICSs are only used by the households themselves and the penetration rate per village envisaged in this RCT is too low to affect, for example, local firewood supply.
The next decision regards the sample size, both in terms of households and villages. We determined the sample size based on a power calculation focusing on the indicator firewood savings. We approximated the relevant parameters ex-ante using the data collected for the quasi-experimental study presented in Bensch and Peters (2013). Taking into account these parameters and the probability of being assigned to the treatment group, we obtained a required sample size of 250 households spread across 12 villages (see Appendix C). We selected villages that are far away from GIZ-supported ICS producers in order to avoid treatment contamination that might occur if households randomly assigned to the control group obtain an ICS independently.6 Furthermore, we selected the
5
For example, the tests frequently concentrate on the main meal only and they cannot account for the fact that households might prepare more hot meals because cooking becomes cheaper due to the higher efficiency of the ICS (or less exhausting in terms of firewood collection) – a phenomenon, which is referred to as the rebound effect in the energy economics literature (see Frondel et al., 2008; Herring et al., 2009).
6
Two further channels exist through which the treatment may be contaminated. First, treatment households may share their stoves with control households. This did not occur. Second, the two household groups may exchange information about determinants of respiratory health, for example. Yet, the treatment did not involve any awareness raising and cooking is also a rather private issue, as stated in open interviews, that seems less of a talking point in women’s conversations. As a consequence, only
Fig. 1. Steps in RCT implementation: baseline survey and lottery (November 2009); stove allocation and user instructions (a few days after the lottery); intermediary visits (1, 2 and 7 months after allocation); follow-up survey (November 2010); ICS usage tracking survey (March 2013).
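The weekly firewood indicator described before Section 2.3 combines the weighed firewood of each stove application with how often that application occurs in a typical week. A minimal sketch of this aggregation follows. The record layout (`StoveApplication`), the function name `weekly_firewood_kg` and the example numbers are hypothetical; the text does not specify how the survey data are stored.

```python
from dataclasses import dataclass


@dataclass
class StoveApplication:
    """One stove use for one meal on a typical day (hypothetical layout)."""
    stove: str
    fuel: str
    firewood_kg: float   # weighed pile for this application (0 if not firewood)
    uses_per_week: int   # how often this application occurs in a typical week


def weekly_firewood_kg(applications):
    """Sum weighed firewood per application times its weekly frequency."""
    return sum(a.firewood_kg * a.uses_per_week
               for a in applications if a.fuel == "firewood")


# Invented example household, not data from the study.
household = [
    StoveApplication("three-stone", "firewood", 5.0, 7),
    StoveApplication("ICS", "firewood", 2.5, 14),
    StoveApplication("LPG", "gas", 0.0, 2),
]
print(weekly_firewood_kg(household))  # 5.0*7 + 2.5*14 = 70.0
```

Weighting each application by its weekly frequency, rather than multiplying a daily total by seven, allows for stoves that are not used every day.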
12 villages from the target region of a planned GIZ rural electrification intervention so that we could introduce the study as preparatory field work related to the electrification project and, thereby, reduce attention paid to the randomization.
In November 2009, we conducted the baseline survey among 253 randomly sampled households (see Fig. 1 for the timeline of the RCT). Information was gathered using a structured questionnaire covering the socio-economic dimensions that characterize the relevant living conditions of the households. Since the study also served as a baseline in the context of the envisaged electrification intervention – a solar home system dissemination project – a particular focus of the questionnaire was on energy sources (including electricity) and energy services (including cooking). Consequently, the cooking-related parts of the interviews did not draw particular attention. This is important to avoid auspices biases and Hawthorne effects (see Appendix D). We complementarily gathered qualitative information in focus group discussions and semi-structured interviews with key informants such as women’s groups, stove and charcoal producers, teachers, regional administrators, and village chiefs.
The random assignment was put into practice through a lottery directly following the baseline interviews. We presented the prizes of this lottery, an ICS or a 5 kg bag of rice, as recompense for participation in the baseline study. Participants were therefore not aware of being part of an experiment. The connotation of the ICS receivers as the treatment group and the bag of rice receivers as a control group was not communicated to the participants.7 In order to increase trust in the fairness of the lottery, we conducted it in each of the villages directly after completing the interviews and informed the households immediately about which recompense they would get. Hence, we applied simple stratified randomization with the villages as the stratification criterion. Of the 253 households interviewed for the baseline, 98 received an ICS and 155 a bag of rice. The rice and ICSs were distributed within three days of the baseline interview. The households that were drawn to get an ICS received a brief 15-min introduction on how to use the stove.8 The ICS and rice bag distribution as well as the instruction were done by field workers who were involved in the preparation of the electrification project and who were visiting the village anyhow. No specific village gathering was organized. The ICS was presented as a fuel-saving device, which requires a few precautions. Households were, for example, informed that, in contrast to open fires for which people typically use large branches or even trunks, the firewood has to be chopped first in order to fit the relatively small fuel feed entrance of the ICS. In line with what real-world users are told about this type of ICS, households were briefly informed about the convenience co-benefits of fuel savings, which are a quicker cooking process, less smoke and a cleaner kitchen (if cooking is done indoors). No information about potential repercussions on the health status was provided. The complete instructions on the functioning and proper usage of the ICS and related information provided are presented in Appendix E.
Between the baseline and the follow-up phase, local community workers conducted three preparatory visits in the survey villages for the planned electrification project. It is worth highlighting that the electrification intervention was not implemented in any of the sampled villages before the end of this study. Furthermore, electricity is virtually never used for cooking in rural Africa, in particular not in the case of solar home systems whose capacity is not sufficient for cooking purposes. Once in the field, the community workers additionally checked if ICS households were using the ICS and whether they had encountered technical problems (which were in any case very rare). Again, no further treatment in terms of awareness raising or usage encouragement was undertaken. While a few of the households were not yet making frequent use of their new stove one month after ICS allocation, by the time of the second visit virtually all ICS households cooked regularly on the ICS.9 For the follow-up phase at the end of 2010, the same structured questionnaire was used as in the baseline phase. Attrition was very low: only four households either could not be located or had moved out of the village, three in the control and one in the treatment group. None of the households refused to participate in the follow-up survey.
We excluded two groups of households from the analysis: four households with affiliated Quran schools, where usually between 50 and 150 students live and eat and which are therefore not comparable to family households, as well as households that prior to the study had already received improved stoves other than the ICS used in the RCT from urban relatives. These six treatment and ten control households cannot be expected to have bought another ICS in a non-RCT world and therefore do not represent the population of interest. They were originally included in the randomization only because they were a priori not clearly discernible and since we conducted the randomization on-site and directly after the survey. No further restrictions were made on who to include in the sample
6 (continued)
minor contamination effects are conceivable that, furthermore, would rather lead to an underestimation of effects.
7
The average rice consumption per capita in Senegal is 84 kg per year (GAIN, 2011). Hence, the bag of rice received by the control group corresponds to 0.5% of annual rice consumption for the average household size in our sample and will most likely not affect any of our impact indicators that were measured one year after the distribution.
8
Because many other ICS types require more extensive maintenance and more usage instructions, one might think of these instructions as a treatment in its own right, which might be introduced as a random second treatment arm. For our ICS, this is however not the case. Given the simplicity in the use of the ICS and given that it is virtually maintenance free, additionally randomizing the instruction within the treatment group in our case would not make a difference.
9
It is not likely that the delayed take-up was triggered by the visits or in any way related to them. Instead, the visits revealed that, first, a few housewives travelled outside the village and therefore had not used the ICS so far. Second, some women needed to adapt to the quicker cooking with the ICS, which at the beginning created a feeling of insecurity. Third, some households were reluctant at the beginning as they wanted to preserve their ICS and used it only sparsely. Fourth, a few polygamous households needed some time to decide on who would use the ICS and when.
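The village-by-village lottery described above amounts to simple stratified randomization with the village as the stratum. The sketch below illustrates the idea only. The function name `stratified_lottery`, the fixed treatment share per village and the household identifiers are assumptions made for illustration; in the study the lots were drawn on-site, and the share of ICS prizes per village is not reported in the text.

```python
import random


def stratified_lottery(households_by_village, treat_share, seed=0):
    """Assign prizes separately within each village (the stratum).

    Hypothetical helper: treat_share is an assumed common share of ICS prizes.
    """
    rng = random.Random(seed)
    assignment = {}
    for village, households in sorted(households_by_village.items()):
        shuffled = list(households)
        rng.shuffle(shuffled)                      # random order within the village
        n_treat = round(treat_share * len(shuffled))
        for position, household in enumerate(shuffled):
            assignment[household] = "ICS" if position < n_treat else "rice"
    return assignment


# Invented villages and household IDs.
villages = {
    "A": [f"A{i}" for i in range(20)],
    "B": [f"B{i}" for i in range(22)],
}
result = stratified_lottery(villages, treat_share=0.4)
ics_in_a = sum(1 for hh in villages["A"] if result[hh] == "ICS")
ics_in_b = sum(1 for hh in villages["B"] if result[hh] == "ICS")
print(ics_in_a, ics_in_b)
```

Because the draw is made within each village, village dummies in the later regressions account for the stratification, as the paper notes with reference to Bruhn and McKenzie (2009).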
Table 1
Baseline characteristics of randomly assigned ICS owners and non-owners.
Treatment Control p-value for test on difference in means (2) − (1)
Mean (sd) Mean (sd)
(1) (2) (3)
Socio-economic characteristics
Household size 12.88 (5.55) 12.94 (5.82) 0.94
Family structure (%) 0.96
Extended family 77.8 74.8
Nuclear family 15.6 18.0
Couple or monoparental family 6.6 7.2
Household head is of Wolof ethnicity (%) 52.2 50.4 0.92
Father with more than one wife (1 = yes) 33.7 30.2 0.58
Father’s education level (%) 0.88
None 12.5 9.8
Alphabetization 77.3 77.4
Primary 5.7 7.5
At least secondary 4.5 5.3
Main wife’s education level (%) 0.84
None 41.6 39.6
Alphabetization 51.7 53.9
Primary 6.7 5.8
At least secondary 0.0 0.7
Telecommunication expenditures (CFAF) 4250 (3830) 5090 (8640) 0.43
Ownership of bank account (1 = yes) 0.08 0.06 0.55
Household receives remittances (1 = yes) 0.42 0.45 0.65
Thatched roof (1 = yes) 0.67 0.67 0.97
Wall material of house is stone or brick (1 = yes) 0.49 0.51 0.75
Flooring material is soil (1 = yes) 0.36 0.27 0.16
Land is completely owned by household (1 = yes) 0.94 0.93 0.62
Ownership of sheep (1 = yes) 0.62 0.63 0.87
Number of mobile phones owned 1.86 (1.31) 2.04 (2.15) 0.48
Main wife is member of an association (1 = yes) 0.71 0.73 0.71
Father’s primary activity (%) 0.57

Subsistence farming 79.3 83.4
Services and manufacturing 16.1 14.3
Retirement 4.6 2.3
Cooking-related characteristics
Most utilized stove type (%) 0.33
Open fire (three-stones or Os† ) 72.2 72.7
Traditional metal wood stove 24.4 26.6
LPG stove 3.3 0.7
Stove usage in times per week 21.01 (4.10) 21.43 (4.98) 0.50
Firewood consumption per dish (kg)
Three-stones 4.85 (2.14) 5.14 (2.57) 0.37
Os† 4.84 (2.32) 4.84 (2.84) 1.00
Per capita and dish firewood consumption of three-stones and Os, main dishes lunch and dinner (kg) 0.50 (0.32) 0.47 (0.26) 0.52
Firewood provision (%) 0.95
Only collected 76.1 76.3
Only bought 1.2 0.7
Both collected and bought 22.7 23.0
Number of observations 90 139
Note: sd – standard deviation; p-values are determined by means of t- and chi-square tests.
† The Os is a stove in which an open fire burns between three metal feet.
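The p-values for the continuous rows of Table 1 can be approximated from the published means, standard deviations and group sizes alone. The sketch below does this for two rows. It assumes a pooled-variance two-sample test with a normal approximation to the t distribution; the table note only says that t- and chi-square tests were used, so the exact variant is an assumption, and the function name `pooled_two_sample_p` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_sample_p(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided p-value for a difference in means from summary statistics.

    Assumptions: pooled variance, normal approximation (reasonable here
    because the degrees of freedom exceed 200).
    """
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    z = (mean2 - mean1) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Summary statistics copied from Table 1 (treatment n = 90, control n = 139).
p_household_size = pooled_two_sample_p(12.88, 5.55, 90, 12.94, 5.82, 139)
p_stove_usage = pooled_two_sample_p(21.01, 4.10, 90, 21.43, 4.98, 139)
print(round(p_household_size, 3), round(p_stove_usage, 3))
# Both values land close to the 0.94 and 0.50 reported in column (3).
```

Rows with missing observations or strongly skewed values may not reproduce as closely from rounded summary statistics.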
or the analysis. Altogether, the sample used for the subsequent impact analysis in Sections 3.2–3.4 comprised 229 households. As a robustness check shows, not discarding these two groups of households and, hence, performing the analysis with all 249 households for which baseline and follow-up data is available does not change any of our findings, whether applying ITT or ATT.
In March 2013, approximately three and a half years after the randomization, an ICS usage tracking survey among the households that had received an ICS was conducted by enumerators familiar with the ICS. All but one of the 90 ICS households included in the impact analysis could be retrieved for this interview wave. In addition to asking the households simple usage questions, the enumerators recorded their own assessment on the condition of the ICS. The results of this usage tracking survey are presented in Section 3.5.
3. Results
3.1. Socio-economic conditions and cooking behaviour
The primary purpose of this section is to scrutinize the balancing of the two randomized groups, since we abstained from explicitly balancing them through re-randomization before assigning the ICSs. The second purpose is to illustrate the socio-economic environment in which the RCT was implemented. Table 1 documents the baseline socio-economic and cooking-related characteristics of the 229 households before stove distribution. On average, households consist of 13 members; household size varies in a range between 2 and 42 persons per household. Larger households are more common: 78% are extended families and 16% nuclear families (two parents plus children). Four in five households are subsistence
Table 2
Utilization rates of different stove types at follow-up.
Treatment Control
Open fire 19.5% 70.5%

Traditional metal wood stove 7.7% 24.1%
ICS 69.1% 0.7%†
LPG stove 3.7% 4.7%
Average total number of stove applications per 25.3 22.6
household and week
Note: The shares represent the ratio between the number of times the respective stove type is used and the total number of stove applications per household and week.
† ICS usage among the control group is due to the fact that one household which was not randomly assigned to receive an ICS acquired one individually after the randomization.
Fig. 2. Distribution of non-farm income at baseline.
Fig. 3. Distribution of farm income at baseline.
farmers, the majority of them living in houses with thatched roofs. As can be seen from the p-values in the right-hand column, two-sided tests of equality of the values for the two compared groups do not reveal statistically significant differences. The groups are balanced in the relevant observable characteristics. In addition, Figs. 2 and 3 show the distribution in non-agricultural and agricultural income: the treatment and the control group strongly overlap. Accordingly, a two-sample Kolmogorov–Smirnov test cannot reject the null of identical distributions at the 10 percent level.10
Regarding the baseline stove usage patterns reflected in the table, two stove types dominate rural kitchens in Senegal: open fires (three-stone stoves or Os in which the open fire burns between metal feet) and traditional metal stoves, the Malagasy and Cire. LPG stoves are rarely used in rural Senegal; in our sample only three households mainly use LPG for cooking. 90% of dishes are prepared with firewood. Around 15% of all meals are prepared with more than one stove, primarily to prepare rice on one stove and a sauce on a second one. On average, each household prepares 21 hot dishes per week using one of its stoves. As sometimes more than one stove is used for one meal, the range of weekly stove applications is between 14 and 49.
The follow-up data on stove usage shows no changes in the control group: the most often used stove types are three-stone stoves (53%), traditional metal stoves (25%) or Os (20%). Accordingly, the savings potentials of ICS usage are relatively high with 73% of households mainly using open fire stoves in the absence of an ICS. For the treatment group, the follow-up data shows that the ICSs have achieved broad acceptance among users. There are only two non-compliers: one ICS was completely broken in an accident and one household did not use the new stove. Otherwise, as many as 95% of the distributed ICSs are used at least seven times per week; for 85% of treatment households the ICS became the predominantly used stove. The proportion of individual dishes prepared with the different stove types also mirrors this usage pattern (see Table 2). As such, our set-up mimics the most likely scenario where treatment households have one ICS at their disposal and continue to use less efficient traditional stoves, because one stove is not sufficient to prepare the required amount of food or because the ICS is too small for the pot sizes used in a few large households. The table also shows that treatment households increased the number of dishes prepared. This is probably not due to rebound effects (see footnote 5), since the total number of hot meals cooked does not increase in the treatment group and households reported that the quantity and type of food prepared has not changed since receiving the ICS. Instead, the increase simply reflects the fact that ICS households have an additional stove at their disposal such that the different components of one meal that were formerly prepared on a single stove are now prepared on two stoves.
3.2. Firewood consumption
ITT and ATT estimates for the household consumption of firewood indicator are calculated both with and without the baseline household-level control variables taken from Table 1: in addition to income, telecommunication expenditures are used as a proxy for living standards. Bank account ownership is used as a proxy for the household’s access to credit and ability to pay. Housing conditions as a wealth indicator are captured by whether the flooring material in the household is soil and whether the wall material is stone or brick. As another wealth metric, we include a dummy indicating sheep ownership. The results do not change if other wealth and socio-economic indicators shown in the table are included. As suggested in Bruhn and McKenzie (2009), we additionally include village dummies in order to account for the stratified randomization. According to our findings presented in columns (1) and (2) of Table 3, firewood savings are substantial, with around 27 kg being saved per week in every household after introduction of the ICS. These are ITT results. ATT estimates differ only marginally being
10
We additionally ran a probit regression to check the correlation between ICS allocation and the joint set of cooking-related as well as socio-economic characteristics and village dummies. As part of this estimation, we performed a Likelihood Ratio chi-square test with the null hypothesis that all of the regression coefficients are simultaneously equal to zero. The p-value of 0.98 validates the findings from the univariate comparisons of no correlation. All tests have as well been carried out with the original sample of 253 baseline households, for which statistically significant differences cannot be observed either.
Table 3
Effect of ICS usage on firewood consumption per week and per dish.
Estimator: Ordinary Least Squares, ITT; coefficient (standard error in parentheses)
Dependent variable: Firewood consumption per week in kg, columns (1) and (2); firewood weight per dish in kg, columns (3) and (4)
Variable (1) (2) (3) (4)
Dish variables†
Dish is cooked on open fire Ref. Ref.
Dish is cooked on ICS −1.99*** (0.24) −2.04*** (0.24)
Dish is cooked on traditional metal stove −0.03 (0.53) Ref.
Main dish 1.00*** (0.33)
Short cooking (<30 min) −0.94*** (0.19)
Meal variables†
Number of people the meal is cooked for (in terms of the logarithm 1.96*** (0.57)
of adult equivalents)
Lunch Ref.
Breakfast −1.66*** (0.18)
Dinner −0.32*** (0.10)
Multiple stoves −0.14 (0.33)
Household variables
Household with ICS −26.78*** (6.33) −26.96*** (6.10)
Average number of people cooked for (in terms of the logarithm of 42.79*** (14.36)
adult equivalents)
Father has formal education 2.63 (8.27) 0.01 (0.31)
Mother has formal education 5.95 (5.39) 0.23 (0.20)
Household income (in logarithmic terms) 1.25 (2.48) 0.03 (0.07)
Telecommunication expenditures (in logarithmic terms) 0.56 (0.97) 0.03 (0.03)
Bank account ownership 1.66 (16.85) 0.46 (1.18)
Flooring material is soil −17.63** (7.09) −0.60** (0.25)
Wall material of house is stone or brick 2.55 (9.83) 0.42 (0.36)
Ownership of sheep −7.31 (7.80) −0.32 (0.29)
Association membership of the mother −7.02 (7.93) −0.71** (0.29)
Village dummies Included Included Included Included

Constant −13.60 (42.09) 82.56*** (8.83) −0.77 (1.64) 3.90*** (0.36)
Mean of treatment group 60.80 (3.92) 60.69 (3.24) 2.28 (0.16) 2.25 (0.12)
Mean of control group 87.58 (4.68) 87.65 (5.03) 4.27 (0.17) 4.29 (0.21)
Savings rate (%) 30.6 30.8 46.7 47.5
Number of observations 228 228 627 633

Adjusted R-squared 0.25 0.13 0.43 0.18
F-test 4.83*** 3.71*** 16.18*** 9.44***
Note: Computations on household level (columns 1 and 2) are performed with heteroskedasticity corrected standard errors accounting for heterogeneity in treatment
responses; standard errors for the dish-level estimations (columns 3 and 4) are clustered by household.
† For an explanation of the dish- and meal-level control variables, see Bensch and Peters (2013).
* Significance level of 10%.
** Significance level of 5%.
*** Significance level of 1%.
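The savings rate in Table 3 follows directly from the two rows of group means, and the relation between the ITT and an IV (Wald) estimate can be illustrated with the take-up figures given in the text. The sketch below is a back-of-the-envelope check, not the authors' computation: the function names are hypothetical, the take-up shares are derived from the reported two non-compliers and one control household with an ICS, and the Table 2 utilization shares are used as a stand-in for household-level usage intensity. The paper's own ATT estimates are in Table F1 of Appendix F.

```python
def savings_rate(mean_control, mean_treatment):
    """Percentage reduction relative to the control-group mean."""
    return 100 * (mean_control - mean_treatment) / mean_control


def wald_estimate(itt_effect, takeup_treatment, takeup_control):
    """Wald (IV) ratio: ITT effect divided by the difference in take-up."""
    return itt_effect / (takeup_treatment - takeup_control)


# Table 3, column (1): weekly firewood consumption in kg.
mean_treatment, mean_control = 60.80, 87.58
itt = mean_treatment - mean_control          # matches the "Household with ICS" coefficient
rate = savings_rate(mean_control, mean_treatment)

# Binary take-up (assumption): 88 of 90 treatment households use the ICS,
# 1 of 139 control households acquired one.
att = wald_estimate(itt, 88 / 90, 1 / 139)

# Usage intensity (assumption): ICS share of stove applications from Table 2.
per_unit_intensity = wald_estimate(itt, 0.691, 0.007)
full_switch_rate = 100 * -per_unit_intensity / mean_control

print(round(itt, 2), round(rate, 1), round(att, 1), round(full_switch_rate, 1))
```

With near-complete compliance the Wald ratio is only slightly larger in magnitude than the ITT, while scaling by usage intensity moves the implied savings rate toward the range found in controlled cooking tests.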
slightly higher. As these observations hold in the same way for the other impact indicators, we will only present the more conservative ITT estimates in the following (ATT estimates can be taken from Table F1 in Appendix F). Inserting in the regression the values 1 and 0 for the binary treatment variable and average values for the covariates gives us the absolute ICS consumption values shown at the bottom of the table. This implies that 30% of the households’ total firewood consumption is saved.
This is clearly less than the 40–50% found in CCTs. As noted above, rebound effects as one potential driver for the difference to CCT results do not seem to play a role. Another likely reason is the fact that treatment households do not switch completely to ICS usage and still prepare parts of their meals on traditional stoves. In order to assess the savings potentials if they fully switched to ICS usage, we additionally compare the firewood consumption for dishes prepared on an ICS in the treatment group to dishes prepared on traditional stove types in the control group. Even though the analysis of firewood savings on the dish level may be endogenous, it provides an upper bound estimate of savings potentials where households had access to several ICSs to potentially abandon traditional stoves completely. These estimations furthermore provide insights into how the savings materialize, since they make it possible to examine the influence of dish- and meal-specific factors. Table 3 shows in columns (3) and (4) the results for the OLS regression that controls for household characteristics and characteristics specific to the stove application. The results reveal the differential effects of various dish- and meal-specific variables whose coefficient signs are as expected and reflect consistent firewood consumption figures across dish types. The R² of 0.43 for the estimation including control variables in column 3 indicates that a good part of the variation in the dependent variable can be explained by observable factors. The statistically highly significant ICS coefficient would imply an average ICS savings rate of 47%. It is, thus, in the range of the CCT results.
An unbiased alternative to come up with a firewood savings estimate for the case of adopting ICS for the entire range of stove applications is to perform a slightly adapted version of the IV estimation in the calculation of the ATT for total firewood consumption. We now instrument a new treatment variable, ICS usage intensity, by the random assignment. Usage intensity is coded as a continuous variable obtained by dividing the number of dishes prepared on an ICS by the total number of dishes prepared
Table 4
Effect of ICS usage on time expenditures.
Treatment Control Difference in means (2) − (1) Regression-adjusted difference in means
Mean (se) Mean (se) Mean (se) p-Value (H0 : Diff = 0) Mean (se) p-Value (H0 : Diff = 0)
(1) (2) (3) (4) (5) (6)
Duration of firewood collection per week (min) 719 (75.1) 867 (69.4) 148 (103.3) 0.15 136 (102.0) 0.19
Number of observations† 86 134
Cooking duration per day (min) 251 333 81 (21.8) 0.00*** 75 (20.7) 0.00***
Note: All values derived from ITT estimations with heteroskedasticity corrected standard errors (in parentheses) and including village dummies; se – standard error.
†
For the firewood collection indicator, the nine missing observations (5 control and 4 treatment) are due to households that were not able to specify the firewood collection
time spells.
***
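The differences in Table 4 can be expressed in hours and as shares of the control-group mean, which is how the time savings are quoted in Section 3.3. The short sketch below does this conversion from the published table values; the helper name `relative_reduction` is hypothetical and the calculation is a plain arithmetic check rather than part of the original analysis.

```python
def relative_reduction(difference, control_mean):
    """Difference in means as a percentage of the control-group mean."""
    return 100 * difference / control_mean


# Firewood collection per week (min), Table 4: control mean 867,
# difference 148 (unadjusted) and 136 (regression-adjusted).
collection_hours = [round(d / 60, 1) for d in (148, 136)]
collection_share = [round(relative_reduction(d, 867), 1) for d in (148, 136)]

# Cooking duration per day (min), Table 4: control mean 333,
# difference 81 (unadjusted) and 75 (regression-adjusted).
cooking_share = [round(relative_reduction(d, 333), 1) for d in (81, 75)]

print(collection_hours, collection_share, cooking_share)
```

The collection shares fall in the 16–17% range quoted in the text, and the unadjusted difference corresponds to roughly two and a half hours per week.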
in the respective household. It thus ranges from 0% to 100%. The resulting Wald estimator yields an average rate of 43.8–45.0% (with and without controls). This unbiased IV estimate still suffers from the fact that treatment households increased the number of stove applications over which they spread the food preparation. Nevertheless, we can conclude that if all meals in a household were cooked on an ICS, the savings rate obtained in columns (1) and (2) of Table 3 could well be around 40%.

The results on firewood consumption turn out to be robust to outliers, just like the results for the time and health indicators assessed in the following two sections. The two robustness checks we applied were, first, to estimate the median of the dependent variables by using quantile regression techniques and, second, to exclude outliers defined as values more than two standard deviations away from the mean. The results can be taken from Table F2 in Appendix F.

3.3. Time use

As many as 96% of all households collect at least part of the firewood they use for cooking. A reduction in firewood consumption is likely to lead to households spending less time on firewood collection. In fact, the reduction in the aggregate time spent by household members on firewood collection is approximately two and a half hours per week, which corresponds to 16–17% (Table 4). The reduction, though, is statistically only borderline significant (p-values of between 0.15 and 0.19 for ITT with and without controls), a finding that does not seem to be fully consistent with the reduction in total firewood consumption of around 30% found in the previous section. Still, it is not surprising that time savings are less pronounced than savings in firewood. One reason for the lower savings is that ICS-using households might just collect less wood during one excursion instead of reducing the number of excursions. The lack of statistical significance of the difference might be due to inaccuracies in the time usage variable, which increase the standard error and, thus, reduce power. The inaccuracies are induced, for example, by the fact that 31% of households collect the wood on their own land while farming, which makes it difficult to disentangle time spent on the task of collection from time spent on ordinary field work. Also, some households use a variety of wood supply strategies depending on different factors, most notably the season: some, for example, do not collect the firewood every week but instead hold a stock that is typically replenished before the rainy season.

ICS households might moreover save time because cooking is facilitated and quicker. In qualitative interviews women repeatedly pointed out that the ICS allows them to regulate the temperature more easily, which, in turn, makes it easier to do other things while cooking. The cooking duration of all three meals throughout a typical day decreases significantly by more than 75 min (Table 4), where preparation of an individual meal on an ICS takes around one and a half hours. These savings far exceed the time that households additionally invest in cutting the firewood into smaller pieces, which takes not more than 15 min/day. Due to a lack of local job and business opportunities, a shift of time towards income-generating activities cannot be observed. The cooking women do not seem to sleep more either, since their time awake differs by a mere 5 min between the two compared groups. Qualitative discussions rather suggest that the facilitation of the cooking task helps them to execute household duties in a less hurried way and to take more rest during the day.

3.4. Health

The negative effect of firewood usage on people’s health may be alleviated by ICS usage via two channels. First, the reductions in firewood consumption found in Section 3.2 can be expected to reduce harmful smoke emissions, although it is – as discussed in the introduction – unclear whether simple ICSs like those used in this RCT reduce smoke emission sufficiently to induce positive health effects. Second, exposure to the emitted smoke might be reduced, either via reductions in the cooking duration (as found in Section 3.3) or if cooking behaviour changes because of the new stove. In general, smoke exposure is very high in rural Senegal, with around two-thirds of the household members responsible for cooking staying next to the stove most of the time they are cooking. Furthermore, the vast majority of households cook inside, predominantly in a separate kitchen. While in the control group the proportion of outside cookers stays stable, in the treatment group it doubles from 11% to 23% between baseline and follow-up. The main reason for this can be traced to the fact that the ICS better shields the fuel from wind than three-stone stoves; also, from the households’ perspective, wind and dust are indeed the main drawback. In addition, the ICS requires less supervision, allowing the cook to dedicate more of her attention to other tasks away from the smoke source.

Virtually all persons responsible for cooking are women, on average two per household with no difference between treatment and control. We examine whether chronic symptoms of respiratory diseases and eye infections prevail among the women responsible for cooking and, as placebo outcomes, among the women not responsible for cooking and male household members. We first look at two dummy variables: “at least one household member with symptoms of respiratory diseases” and “at least one household member with eye problems” take the value one if at least one household member of the respective group reports having suffered from these symptoms at some point in the last six months before the interview. The results are displayed in Table 5 and indicate the share of households for which these variables take the value one. The gender-differentiated data provide striking indications of health effects: for women responsible for cooking, 9.0% of treated
Table 5
Effect of ICS usage on health status.

| | Treatment: mean (1) | Control: mean (2) | Difference in means (2)−(1): mean (3) | Difference in means: p-value, H0: Diff = 0 (4) | Regression-adjusted difference in means: mean (5) | Regression-adjusted difference: p-value, H0: Diff = 0 (6) |
|---|---|---|---|---|---|---|
| Household level analysis | | | | | | |
| Respiratory system disease (%)† | | | | | | |
| Any woman responsible for cooking | 9.0 | 17.7 | 8.7 | 0.07* | | |
| Any male | 7.9 | 6.5 | −1.4 | 0.69 | | |
| Any woman not responsible for cooking | 3.5 | 6.2 | 2.7 | 0.38 | | |
| Eye problems (%)† | | | | | | |
| Any woman responsible for cooking | 4.5 | 14.0 | 9.5 | 0.02** | | |
| Any male | 4.5 | 7.3 | 2.8 | 0.40 | | |
| Any woman not responsible for cooking | 8.1 | 7.1 | −1.0 | 0.78 | | |
| Number of observations‡ | 86–90 | 127–139 | | | | |
| Individual level analysis | | | | | | |
| Cook in household shows symptoms of . . . (%) | | | | | | |
| A respiratory system disease | 4.7 | 11.8 | 7.1 | 0.01** | 6.9 | 0.01*** |
| Eye problems | 2.9 | 9.8 | 6.9 | 0.01** | 5.7 | 0.00*** |
| Non-cooking person in household shows symptoms of . . . (%) | | | | | | |
| A respiratory system disease | 1.7 | 2.0 | 0.3 | 0.63 | 0.5 | 0.46 |
| Eye problems | 1.9 | 2.0 | 0.1 | 0.87 | 0.2 | 0.73 |

Note: Standard errors for the household level estimations are heteroskedasticity corrected, those for individual household member level estimations are clustered by household, all estimations include village dummies in order to account for the stratified randomization.
† ITT with inclusion of controls is not shown in this table, since for some control variables (bank account ownership, flooring material, village dummies) failure is perfectly predicted in the estimated probit regressions.
‡ Differences in the number of observations are due to a few missing values and some households without any woman not responsible for cooking.
§ The values in this analysis are marginal means and marginal effects derived from estimations that can be found in regression form in Appendix F, Table F3. They are conventionally calculated at the mean of the other independent variables taking into account the particularities of calculating margins for interaction terms in non-linear models and conditioning on household members who are cooks.
households report at least one of them suffering from respiratory disease symptoms. The corresponding value for the control group of 17.7% is almost twice as large – with this difference being statistically significant.

If we look at the same proportion for male household members, who usually do not spend time around the cooking spot, treatment and control group households do not differ significantly from each other, nor do we find a difference for women not responsible for cooking. The same pattern is observable for eye infections: 14.0% of households report that at least one woman responsible for cooking suffers from eye problems in the control group compared with 4.5% in the treatment group. The difference is significantly different from zero. No such statistically significant difference is observed for men and women not responsible for cooking. With respect to the potential bargaining into or out of cooking selection processes outlined in Section 2.3, one would expect changes in prevalence rates in the group of women who are not responsible for cooking if that bargaining process was strong. However, this is not the case.

The bottom of Table 5 refers to results derived from ITT probit regressions for the same disease symptoms on the level of individual household members.11 We now look at the dummy variables “household member with symptoms of respiratory diseases” and “household member with eye problems”, which take the value one if the respective household member reports having suffered from these symptoms at some point in the last six months before the interview. We find prevalence rates of between 3% and 12% on the individual level.12 The results confirm the findings of the household level estimations. In the group of household members responsible for cooking the prevalence rates for both respiratory disease symptoms and eye infections go down by almost seven percentage points. Significance levels are even more pronounced, with p-values of 0.01 for both estimations, with and without control variables respectively, reflecting the more accurate definition of the indicator and the larger sample size. The estimations also corroborate that the treatment has no effect at all on the group of household members not responsible for cooking.

Altogether, while the reduction in smoke due to fuel savings might be too modest to trigger perceivable health effects by itself, it is likely that the combination with the change in cooking behaviour enabled by the ICS explains the observed improvements in health indicators: the ICS facilitates outdoor cooking, the cooking duration is reduced, and the cooking and combustion process requires less supervision.

11 We abstain from showing ATT estimations here, since the specification requires interacting the treatment status with the dummy variable that indicates the cooking responsibility. We would thus need to instrument the ICS uptake and the interaction term, respectively. Using the random assignment as instrument for both ICS uptake and also in the interaction term (which is a controversial procedure) does not deliver any result in our case, since the estimations do not converge.
12 Comparable data on respiratory system diseases and eye problems for Sub-Saharan African countries is very sparse (van Gemert et al., 2011). Studies with indicator definitions that come closest to ours show comparable levels in these health problems (ANSD and ICF International, 2012; Adeloye et al., 2013).

3.5. Impact sustainability and upscaling the intervention

Hitherto we have found quite strong and robust evidence for high take-up and impacts of ICS usage after one year that are, given the experimental set-up, internally valid. Internal validity, though, is only a necessary condition for high policy relevance. The decisive questions in a next step are, first, whether these usage rates and impacts persist over time, second, whether the intervention yields
benefits that outweigh the costs and if so, third, whether it can be upscaled.

In order to assess the sustainability of the observed impacts we conducted an ICS usage tracking survey three and a half years after the random assignment. This enables us to examine the durability of the randomized ICS under day-to-day rural cooking conditions and the usage behaviour over the full life-span of the ICS. In this second follow-up round, we did not collect information on impacts, because a majority of the stoves would have already exceeded their useable lifetimes. For statistical power reasons, this reduced sample size would have made an examination of impacts difficult. Considering an expected life span of one to three years, the proportion of 49% of treatment households still using the randomized ICS can, nevertheless, be considered surprisingly high. In the enumerators’ appraisal, half of these ICS were still in good condition. The proportion of dishes prepared with an ICS among ICS users declined only slightly from 70% in 2010 to 62%. As can be seen in Fig. 4, those treatment households who do not use the ICS anymore (51%) only slowly ceased to use their ICS. All of them have done so because the stove has deteriorated and 90% of them still used their ICS two years after randomization.13

Fig. 4. Decline in the percentage of ICS users among randomized households.

13 Within the complete investigation period of three and a half years, the ICS was destroyed in two cases, once because of heavy rainfall and once because the kitchen wall collapsed. In four cases, the ICS was stolen.

Against this background of persisting usage behaviour we conduct a simple cost–benefit analysis. The costs of the ICS are represented by the market price of around 10 US$. For a conservative estimate of the benefits, to begin with, we only account for reductions in firewood consumption. We take the average price of 0.02 US$/kg of firewood paid by firewood-purchasing households at the time of the follow-up survey as an upper bound of the shadow price for collected firewood. Valuing the firewood that ICS users save compared to traditional stove users shows that the savings amount to 2.03 US$ per month. Even with a lower shadow price for collected firewood, it is obvious already at this stage that the benefits of ICSs outweigh the costs by far over their life span. If health benefits and the reduction in cooking duration were taken into account, the benefits would be even greater. Similarly, benefits would turn out to be larger when social costs were additionally included, i.e. forest degradation, village air pollution, and carbon emissions. As a consequence, upscaling the intervention seems to be economically sensible.

However, some challenges for external validity of the RCT need to be considered when transferring the results to an upscaled intervention or to other regions. In Appendix D, we discuss the aspects raised by Duflo et al. (2008a): general equilibrium effects, Hawthorne and John Henry effects as well as possible limitations to generalizations beyond our specific intervention and sample. Overall, the external validity of this RCT is quite high. In particular, the fact that our field experiment was implemented in an unobtrusive way enables us to transfer the findings to a non-experimental set-up. In terms of transferability of the high take-up rates, the severe firewood scarcity in our study area may have increased the incentives for households to effectively use the ICS. Take-up in more biomass-abundant regions could hence be lower. Another driver of high usage rates is the fact that we are working with a type of ICS that is adapted to the rural conditions not only in our study area but also beyond.

In sum, if permanent access to ICS is ensured and provided that the ICS is slightly modified in response to potentially different cooking habits elsewhere (e.g. pot sizes or cooking fuel), our findings are transferable to different populations in (Western) Africa.

4. Discussion and conclusion

In this paper we evaluated take-up behaviour and impacts of improved cooking stoves (ICSs) in rural Senegal by means of a randomized controlled trial (RCT). ICSs are widely seen as an option for developing countries to combat the devastating effects of woodfuel usage for cooking purposes on people’s health, workload as well as the environment. The first finding is that ICS take-up was close to 100% among the randomly assigned households and that people only cease to use the ICS if it deteriorates. This sustainably high take-up rate comes as a surprise, since it is often argued among development practitioners that people would not use ICSs for which they have not paid. It also constitutes a major difference to the findings in Hanna et al. (2012). Major reasons for this are probably differences in how convenient and advantageous the ICS technology is from the household perspective and to which degree the ICS has a better performance than the existing stove portfolio. First, the ICS used in our study is maybe closer to the regular cooking habits of the target population. It is easier to use, does not require any particular maintenance and, due to its portability, households can decide themselves where to cook. Second, wood scarcity is probably higher in our study area, thereby increasing the relevance of an ICS. Third, more than a fourth of the households in the study in India already also used cleaner fuels like electricity and gas before the randomization so that the randomized ICS did not necessarily represent an improvement for them.

The firewood savings were found to be statistically significant and substantial. They amount to around 30% per week in the most likely scenario where households have one ICS and continue to use traditional stoves complementarily. If these complementarily used traditional stoves were also replaced by ICSs, the savings could increase further up to around 40%. Such a reduction in firewood consumption is an important impact in an arid country like Senegal, where forests are permanently under pressure and firewood provision is a daily hardship for rural women. Moreover, the CO2 that is sequestered in both dead wood and green wood is set free with obvious implications for climate change processes. Deforestation and forest degradation are in fact a relevant source of global CO2 emissions. IPCC (2013) estimates that net land-use change, mainly deforestation, is responsible for about 10% of the total anthropogenic CO2 emissions. To the extent woodfuel usage contributes to these processes, dissemination of ICS as used in this study can help to reduce such losses of carbon sinks.

We also observe a reduction in firewood collection time, but this is only borderline significant. Furthermore, we find that cooking duration is decreased significantly by over 20%. In addition, the cooking process is facilitated so that the time the cook needs to be
in direct proximity to the cooking spot is reduced. Together with an increase in outdoor cooking, this leads to an evident reduction in exposure to harmful smoke. Consequently, we also find a clear indication of a decrease in respiratory disease symptoms and eye problems, with a drop of around 9 percentage points each for the women responsible for cooking.

Our self-reported health outcomes might of course feed criticism that objective indicators such as individual particulate matter exposure as measured in the RESPIRE study deliver more accurate information. Apart from the high costs of executing such a survey, there is also a trade-off between the increased accuracy and a Hawthorne effect. Study participants can be expected to behave differently if they are asked to wear exposure monitoring tools for 24 h, for example. Hence, self-reported and objective measurements can rather be seen as complements. In addition, one might suspect an auspices or courtesy bias in our data where respondents express their gratitude for having received the ICS or expect additional benefits from a satisfied implementing agency. In their stove study in Ghana, Burwen and Levine (2012) suspect that this effect biases their results, since the positive effects on self-reported health they observe are not plausible given that smoke exposure is not reduced. However, this bias is not likely in the present case, since participating households were not aware of the study’s focus on ICSs. Even if some households noticed the role the ICS played in this study, they were unlikely to relate its usage to health outcomes. The fact that we did not observe any health effect among household members not responsible for cooking strongly underpins this view. Hence, different from the Burwen and Levine (2012) study, placebo outcome indicators corroborated our findings. Finally, the magnitude of observed savings is in the range of what is expected based on laboratory tests and, thus, does not feed the suspicion of biased responses.

Altogether, the substantial and statistically significant impacts on different levels of indicators including positive external effects such as reduced deforestation and household air pollution substantiate the efforts that the international community dedicates to the dissemination of ICSs. The findings on the health level fit into the concept of intensive and extensive margins of behaviour that has a longer tradition in agricultural economics (Feder et al., 1985) and has recently been brought into the debate on public health-relevant behaviour in developing countries (see Dupas, 2011). The present analysis suggests that not only the extensive margin of cooking should be addressed by disseminating cleaner stoves, but also the intensive margin by, for instance, raising awareness of the need to reduce smoke exposure. This behavioural dimension should also be taken into account by the Global Alliance for Clean Cookstoves and the United Nations in outlining future policies to increase access to improved or clean cooking stoves. Even ICSs that still emit considerable amounts of smoke might trigger positive health effects if they also induce exposure-relevant behavioural changes.

The almost universal take-up among randomly assigned ICS owners suggests that if they have an easy opportunity to obtain an ICS that is adapted to local cooking habits people also use it. A simple back-of-the-envelope cost–benefit calculation further made it clear that investing in an ICS would be a profitable investment from the point of view of the individual households. The interplay of cash and credit constraints, the lack of information, and the fact that in many cases the women responsible for cooking do not manage the household budget, however, all raise doubts about whether households would be able and willing to pay the market price for ICSs, even if the stoves were readily available on the market. The experience from long-standing pilot dissemination activities in neighbouring rural areas in Senegal seems to support the presumption that the majority of rural households would probably stick to the cheaper traditional three-stone or metal stoves.14 As the strategy of promoting the creation of sustainable ICS markets has already proven to be difficult in urban areas, where fuels are purchased and ICS benefits are clearly monetary ones, it can be expected to require even more efforts and resources in rural areas.

14 See also Miller and Mobarak (2013) for evidence on low purchase rates of ICS in Bangladesh.

In combination, the high take-up and the positive external effects of ICS usage observed in this study would suggest that more direct options of ICS promotion should be reconsidered. This could mean, for example, directly subsidizing the production of ICSs in rural areas so that end-user prices can compete with traditional stoves. If the findings can be confirmed in other rural areas, it might even be an option to distribute ICSs directly to the households, either for free or at a very low, symbolic price. While this would be in contrast to the strategies pursued by most ICS dissemination programmes, and many practitioners are opposed to a free distribution policy, the empirical literature provides evidence from other field experiments that supports the idea. Paying a positive price does not necessarily lead to higher usage rates of health-relevant goods (Cohen and Dupas, 2010; Tarozzi et al., 2014), charging cost-sharing prices substantially reduces take-up (Kremer and Miguel, 2007) and there is only weak evidence yet that price serves to allocate the health-relevant goods to those with the most need (Okeke et al., 2013).

Any ICS promotion policy has to be designed in close cooperation with local stakeholders, putting particular effort into the choice of technically and culturally appropriate ICS models. Institutions have to be created to sustain the distribution of direct subsidies for the ICSs, thereby avoiding the flash-in-the-pan effect that has been observed in unsuccessful earlier ICS subsidization programmes.

As these recommendations can only be an interim conclusion, further research on the take-up behaviour and on the impacts of ICS usage has to follow up in other regions and potentially other seasons as well. The indication of positive health effects of the simpler ICS used in this RCT calls for taking into account cooking behaviour in these studies. As evidenced by the lower take-up of ICSs in the Hanna et al. (2012) study in India, the results may vary in different environments and if other ICS types are used. In addition, further experimental studies should examine the mechanisms behind take-up behaviour, such as the households’ willingness-to-pay for ICSs, but also the role of credit constraints, information, and woodfuel scarcity. Such research efforts can substantiate – or contradict – the findings in this study and will thereby help to decide under which circumstances and to which degree subsidies might in fact be required to encourage rural people to obtain ICSs.
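
The back-of-the-envelope cost–benefit calculation referred to in the text can be retraced with a few lines of arithmetic. The sketch below uses only the figures quoted in Section 3.5 (a market price of around 10 US$, firewood savings valued at 2.03 US$ per month, and an expected life span of one to three years). The function names are illustrative, and the calculation assumes constant monthly savings, no discounting, and no health, time or social benefits, so it is a rough lower-bound illustration rather than the authors' own computation.

```python
# Figures quoted in the text; everything else in this sketch is an assumption.
ICS_PRICE_USD = 10.0          # approximate market price of the ICS
MONTHLY_SAVINGS_USD = 2.03    # valued firewood savings per month
LIFESPAN_YEARS = (1, 3)       # expected life span range of the ICS


def payback_months(price: float, monthly_savings: float) -> float:
    """Months of firewood savings needed to recover the stove price."""
    return price / monthly_savings


def net_benefit(price: float, monthly_savings: float, months: int) -> float:
    """Undiscounted firewood savings minus the stove price over `months`."""
    return monthly_savings * months - price


if __name__ == "__main__":
    months = payback_months(ICS_PRICE_USD, MONTHLY_SAVINGS_USD)
    print(f"payback period: {months:.1f} months")
    for years in LIFESPAN_YEARS:
        benefit = net_benefit(ICS_PRICE_USD, MONTHLY_SAVINGS_USD, 12 * years)
        print(f"net benefit over {years} year(s): {benefit:.2f} US$")
```

Under these assumptions the stove price is recovered in roughly five months, which is well inside even the lower end of the expected life span.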
Appendix A. Technical features of improved cookstove used in different studies

| Study reference | Stove type/model name | Combustion chamber type | Fuel type | Feed type | Chimney | Portability | Approx. cost (US$) | Further stove references |
|---|---|---|---|---|---|---|---|---|
| This study | Jambaar wood | Ceramic | Wood | Continuous | No | Yes | 10 | GIZ (2011a) |
| Bensch and Peters (2013) | Jambaar charcoal | Ceramic | Charcoal | Batch fed | No | Yes | 9–19 | GIZ (2011b) |
| Burwen and Levine (2012) | Council of Scientific and Industrial Research (CSIR) improved stove | Mud | Wood | Continuous | Yes | No | <10 | – |
| Hanna et al. (2012) | Appropriate Rural Technology Institute (ARTI) improved stove | Mud | Wood | Continuous | Yes | No | 12.5 | – |
| Masera et al. (2007) | Patsari stove | Mud/brick | Wood | Continuous | Yes | No | 35 | Kshirsagar and Kalamkar (2014) |
| Miller and Mobarak (2013) | Bangladesh Council of Scientific and Industrial Research (BCSIR) “efficiency” (E) and “chimney” (C) stove | Clay | Wood | Continuous | E: No; C: Yes | E: Yes; C: No | E: 5.8; C: 10.9 | Mobarak et al. (2012) |
| RESPIRE | Plancha mejorada | Brick | Wood | Continuous | Yes | No | 100–150 | Díaz (2008) |

Notes: All listed stoves are direct combustion stoves with natural draft. Further main (and more advanced) combustion types are gasifier and rocket type direct combustion; forced draft is an alternative to natural draft. In addition, the combustion chamber may be metallic.
Appendix B. Stove types used in the survey area

| Stove type/model name | Combustion chamber type | Fuel type | Feed type | Chimney | Portability | Approx. cost (US$) |
|---|---|---|---|---|---|---|
| Three-stone stoves | None | Biomass | Continuous | No | Yes | – |
| Os | None | Biomass | Continuous | No | Yes | 1–2 |
| Cire khatach | Metal | Crop residues | Batch fed | No | Yes | 3–5 |
| Cire wood | Metal | Wood | Continuous | No | Yes | 3–5 |
| Malagasy stove | Metal | Charcoal (wood) | Continuous | No | Yes | 3–5 |
| Jambaar Wood | Ceramic | Wood | Continuous | No | Yes | 10 |
Appendix C. Power calculation

Since information on our primary impact variable, firewood consumption, was not available in existing data sets for the target region of our study, we took data collected in the quasi-experimental study presented in Bensch and Peters (2013) from urban Senegal to approximate the relevant parameters (prospective power analysis). After the follow-up survey, we verified these parameters by rerunning the analysis with the actual baseline data for those households included in the analysis (retrospective power analysis). The sample size n is given by the following formula:

$$ n = D\left[(Z_\alpha + Z_\beta)^2\right] \frac{\frac{1}{r}\left(sd_1^2 + sd_2^2\right)}{(X_2 - X_1)^2} $$

Table C1 provides the description, the values and the sources of the different parameters. The decisive parameter to be defined by the researcher is the minimum detectable effect size (ES), which reflects the smallest relative reduction in woodfuel consumption that we are able to detect at the given significance level (see Bloom, 1995). While the CCT suggest an effect size of 40%, we chose a minimum detectable effect size of 30% in order to account for the possibility of an overestimated effect size in the CCT. We defined the probability of being assigned to the control group to be 60% and that for the ICS treatment group to be 40%.

Taking these parameters into account, we obtain a required sample size of around 200 households, as is indicated in the last row of the column for the prospective analysis in Table C1. In order to account for the sensitivity of the different parameters in the power calculation and potential attrition or non-compliance, we built in a cushion and increased the number of households to be interviewed to 250.

With respect to health and time savings impacts, the sample size required to measure significant effects tends to be substantially higher. The reason is that the effect on respiratory diseases, for example, can be expected to be less pronounced. The implication of this is that the power of our study is not necessarily sufficient to detect all relevant health and time savings effects.

Appendix D. External validity

External validity prevails if a study’s findings can be transferred from the study population to the policy population. In other words, external validity is concerned with whether findings obtained from a small sample group represent the wider population in real world situations. In the following, we discuss how our RCT design took into account the three dimensions of external validity as defined by Duflo et al. (2008a): general equilibrium effects, Hawthorne and John Henry effects as well as possible limitations to generalizations beyond our specific intervention and beyond our sample.

General equilibrium effects may occur in the present case if widespread ICS usage leads to a sizable reduction in firewood demand and, in turn, to a reduction in the costs of firewood provision, either because prices decrease or because firewood is less scarce and easier to collect. This might induce households to consume more of the now cheaper fuel. Although this would bring welfare benefits such as more hot meals, from a public health and resource saving perspective this might be considered an adverse second-round effect. Since most households in rural Senegal collect firewood and do not buy it, this effect can be expected to be less pronounced than for market-based energy sources.

Another major risk to the external validity of RCT results is if participants change their behaviour because they know that they are participating in an experiment or are somehow under observation. While so-called Hawthorne effects (if treatment group members change their behaviour) or John Henry effects (if control group members change their behaviour) can never be ruled out completely, we reduced the risk considerably through various precautionary measures: first, we embedded the interviews in a baseline survey for an electrification intervention under preparation in the studied areas (the intervention was not implemented in any of the sampled villages before the end of this study). The applied questionnaire covered a comprehensive set of socio-economic and energy-related dimensions such as electricity so that attention was not focused primarily on cooking-related parts of the interviews. Second, the lottery was framed as a reward for all households to recompense
Table C1
Parameters for power calculation.

| Parameter | Description | Value (prospective) | Value (retrospective) | Source |
|---|---|---|---|---|
| D = 1 + (m − 1)ρ, with | Design effect, accounting for the loss of variation in the data if clustered instead of simple random sampling is used | 1.59 | 2.25 | Household data† |
| ρ | Intra-cluster correlation, i.e. the proportion of the overall variance with respect to firewood consumption explained by within-village (cluster) variance in the data | 0.031 | 0.069 | Household data |
| m | Mean number of interviewed households per cluster (village) | 20 | 229/12 = 19.1 | Defined |
| Zα | Critical value (Z-score) for a given level of confidence α reflecting the probability that the null hypothesis is rejected given that it is in fact true | 1.96 (α = 5%) | 1.96 (α = 5%) | Defined (conventional) |
| Zβ | Z-score for a given level of confidence β reflecting the probability that the null hypothesis is rejected given that it is in fact false | 0.84 (β = 80%) | 0.84 (β = 80%) | Defined (conventional) |
| r | Ratio of treatment and control observations (ICS owners to non-owners) | 0.66 | 90/139 = 0.65 | Lottery outcome defined in sampling design |
| sd1 | Standard deviation of firewood consumption of ICS non-owners | 0.266 | 0.259 | Household data |
| sd2 | Standard deviation of firewood consumption of ICS owners | 0.186 | 0.181 | Implicitly defined through minimum detectable effect size (see below) |
| X1 | Per capita firewood consumption of ICS non-owners (in kg) | 0.384 | 0.411 | Household data |
| X2 | Expected per capita firewood consumption of ICS owners (in kg) | 0.269 | 0.288 | Implicitly defined through minimum detectable effect size (see below) |
| ES = \|X2 − X1\|/X1 | Minimum detectable effect size | 30% | 30% | Defined based on experiences with laboratory tests |
| n = n (ICS owners) + n (non-owners) | Result of power calculation: required minimum sample sizes for treatment and control group | 192 = 76 + 116 | 229 = 90 + 139 | |

† Household data refers to the data from the urban quasi-experimental study (“prospective”) and to the baseline data from the present study (“retrospective”) to corroborate the calculations of the prospective analysis.
them for participation in the electrification baseline survey, a procedure similar to that applied by De Mel et al. (2008) in an RCT on business grants among micro-enterprises in Sri Lanka. Third, all survey activities were conducted in an unobtrusive way by local interviewers and community workers.15
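
The prospective column of Table C1 can be checked numerically. The sketch below assumes a standard two-sample formula for comparing means with unequal group sizes, inflated by the design effect D = 1 + ρ(m − 1). The appendix does not state the exact expression used, so the formula, the function name `required_sample`, and its argument names are assumptions made for illustration.

```python
def required_sample(z_alpha, z_beta, sd_control, sd_treat, x_control,
                    effect_size, ratio, icc, cluster_size):
    """Return (n_treat, n_control) under an assumed two-sample formula."""
    # Design effect D = 1 + rho * (m - 1), as defined in Table C1.
    design_effect = 1 + icc * (cluster_size - 1)
    # Absolute difference in means implied by the relative effect size.
    delta = effect_size * x_control
    # Assumed formula: control-group size under simple random sampling,
    # where ratio = n_treat / n_control.
    n_control_srs = ((z_alpha + z_beta) ** 2
                     * (sd_control ** 2 + sd_treat ** 2 / ratio)
                     / delta ** 2)
    n_control = design_effect * n_control_srs
    return ratio * n_control, n_control


# Prospective parameters taken from Table C1.
n_treat, n_control = required_sample(
    z_alpha=1.96, z_beta=0.84, sd_control=0.266, sd_treat=0.186,
    x_control=0.384, effect_size=0.30, ratio=0.66, icc=0.031,
    cluster_size=20)
print(round(n_treat), round(n_control), round(n_treat + n_control))
```

Under these assumptions the result lands close to the 76 ICS owners and 116 non-owners reported in the table, which suggests, but does not prove, that a formula of this type was used.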
According to Duflo et al. (2008a), three problems may hamper a valid generalization beyond the specific programme and sample. First, it may be that the particular care with which the randomized treatment was implemented makes it difficult to upscale the intervention. As outlined in Section 2.4 and in the instructions given to participants (see Appendix E), we keep to what real-world users are told about the randomized ICS. Furthermore, we conducted the study together with the Government of Senegal and GIZ and thereby mimicked a typical ICS dissemination intervention. Second, the question arises as to whether we can transfer the results to a slightly modified intervention. Here, the fact that we distributed the ICS for free deserves some attention, as usage behaviour might change if households need to pay for the ICS. If a change can be suspected when households with sufficient willingness-to-pay self-select into the treatment, then most practitioners would expect an intensification of usage and, thus, also of impacts. Yet, usage intensity is already high so that no substantial increase can be expected. The third point is the particularity of the study population. The most important characteristics here are the fuels used for cooking and their availability. Firewood is the dominant cooking fuel in our sample, as it is in major parts of rural Sub-Saharan Africa. 98% of sample households use firewood as their primary cooking fuel. The national average for rural Senegal is slightly lower at 89% (ANSD, 2006). Across Sub-Saharan Africa this value amounts to 87% (UNDP/WHO, 2009).16 The reason for these slightly lower numbers of firewood usage is that the national rural averages include peri-urban areas, where charcoal is also used. Firewood usage patterns in rural Africa excluding peri-urban areas will be very much the same as in our sample for the vast majority of countries.

Firewood availability in our study area is typical for large parts of interior Western Africa and dry savannah regions in general. All the households in our target area use firewood, which is the case in virtually all rural areas in Africa. Take-up rates and consequently impacts might change, though, in regions in which firewood is more abundantly available (e.g. the southern region of Senegal) or in which cleaner fuels are already available, such as in urban Africa or in parts of rural Asia (see Hanna et al., 2012 for an example).

15 See Zwane et al. (2011) for an examination of how being surveyed might affect response behaviour. The authors generally call for an unobtrusive method of data collection.
16 See Bonjour et al. (2013) for individual country estimates.

Appendix E. Experimental design

E.1. Design of the field experiment

The original design of the experiment was drafted in an inception report for the Independent Evaluation Unit of Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) and finalized on August 20, 2009. It was specified in more detail during an in-country preparation mission between October 12 and 22, 2009 and is outlined in Section 2.4 of this paper (‘RCT design and implementation’).

E.2. Selection and eligibility of participants

We selected twelve villages from the target region of a planned GIZ rural electrification intervention in Foundiougne District that are far away from GIZ-supported producers of improved cooking stoves (see Fig. E1 and Table E1). Within the villages, all households were eligible. They were randomly sampled and none of the sampled households refused to participate in the RCT (see also Fig. E2).

Fig. E1. Location of survey sites.

Table E1 List of survey sites.
Village | Rural community
Pethie | Djilor
Keur Mandao | Djilor
Ndoffane Ndarry | Djilor
Keur Omar | Djilor
Goudeme Sidy | Djilor
Darou Keur Mor Khoredi | Diossong
Thiamene Ndiagnene | Diossong
Simong Bambara | Nioro A. Tall
Ndiayene Kad | Nioro A. Tall
Keur Bacar Santhie | Keur S. Diané
Keur Maniane | Keur S. Diané
Nema Bah | Toubacouta

E.3. Instructions given to participants

On the day of the ICS distribution, households were reminded via phone in order to make sure the person responsible for cooking in the household was present. A local staff member with several years of experience in ICS usage training had a meeting with those women who had received an ICS. In the local language Wolof, he presented the ICS as a fuel-saving device and briefly described its convenience co-benefits: a quicker cooking process, less smoke, and a cleaner kitchen. He verbally informed the women about the functioning of the stove and its proper utilization. He explained to them that the clay inlay of the ICS serves the purpose of storing the heat and that it could easily break if the embers were doused with water; instead they were told to put the fire out with sand on the ground. Moreover, unlike with open fires, for which people typically use entire branches or even trunks, the firewood has to be chopped in order to fit the fuel feed entrance of the ICS. He advised them not to use pot sizes that are too big for the stove and not to move the pot when it is placed on the stove. Households were also given a leaflet summarizing these instructions (Fig. E3). This is all regular information that is also provided by ICS traders in a non-RCT setup. In addition, in order to avoid ICS misuse the women were also asked not to share the ICS with other households or lend it to other women. From a methodological point of view, this was also intended to avoid treatment contamination.

Fig. E2. Participant flow.

Fig. E3. Leaflet provided to households that received an ICS.

Appendix F. Additional estimation results
Table F1 ATT results for household level indicators on firewood consumption, time expenditures, and health.
Columns (1) and (2) report the difference in means; columns (3) and (4) report the regression-adjusted difference in means.
Indicator | (1) Mean (se) | (2) p-Value (H0: Diff = 0) | (3) Mean (se) | (4) p-Value (H0: Diff = 0)
Firewood consumption per week (kg) | 27.74 (6.53) | 0.00*** | 27.64 (5.94) | 0.00***
Duration of firewood collection per week (min) | 153 (102.9) | 0.14 | 140 (96.2) | 0.15
Cooking duration per day (min) | 84 (22.1) | 0.00*** | 77 (20.3) | 0.00***
Respiratory system disease (%) | | | |
Any woman responsible for cooking | 9.1 | 0.05* | 9.2 | 0.05**
Any male | −1.0 | 0.77 | −1.3 | 0.71
Any woman not responsible for cooking | 2.7 | 0.39 | 3.1 | 0.32
Eye problems (%) | | | |
Any woman responsible for cooking | 9.9 | 0.02** | 10.0 | 0.01**
Any male | 2.5 | 0.45 | 2.5 | 0.44
Any woman not responsible for cooking | −1.5 | 0.70 | −0.8 | 0.82
Note: All computations are performed with heteroskedasticity corrected standard errors accounting for heterogeneity in treatment responses and include village dummies; se – standard error.
* Significance level of 10%.
** Significance level of 5%.
*** Significance level of 1%.
Table F2 Outlier analysis for household and dish level indicators.
Columns (1) and (2): outlier analysis using median regressions; columns (3) and (4): outlier analysis using outlier exclusion. Columns (1) and (3) report the difference in means; columns (2) and (4) report the regression-adjusted difference in means. p-Values refer to H0: Diff = 0.
Indicator | (1) Mean (se) | (1) p-Value | (2) Mean (se) | (2) p-Value | (3) Mean (se) | (3) p-Value | (4) Mean (se) | (4) p-Value
Firewood consumption per week (kg) | 26.50 (4.63) | 0.00*** | 26.51 (3.51) | 0.00*** | 19.15 (4.41) | 0.00*** | 18.53 (4.45) | 0.00***
Firewood weight per dish (kg) | 1.54 (0.19) | 0.00*** | 1.77 (0.13) | 0.00*** | 1.50 (0.12) | 0.00*** | 1.61 (0.13) | 0.00***
Duration of firewood collection per week (min) | 60 (58) | 0.30 | 76 (66) | 0.25 | 123 (66) | 0.06* | 117 (62) | 0.06*
Cooking duration per day (min) | 68 (21.6) | 0.00*** | 77 (16.0) | 0.00*** | 40 (17.1) | 0.02** | 40 (17.2) | 0.02**
Note: Median regressions are quantile regressions that determine the median of the dependent variable conditional on the values of the independent variables. For outlier exclusion, outliers are defined as values more than two standard deviations away from the mean. All values are computed using robust standard errors; se – standard error.
* Significance level of 10%.
** Significance level of 5%.
*** Significance level of 1%.
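
The outlier-exclusion rule in the note to Table F2 (drop values more than two standard deviations away from the mean) can be illustrated with a short sketch. The function name `exclude_outliers`, the use of the sample standard deviation, and the firewood figures are all hypothetical; the note does not say how the standard deviation was computed or whether the rule was applied per variable or per group.

```python
from statistics import mean, stdev


def exclude_outliers(values, k=2.0):
    """Keep values within k standard deviations of the mean."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (an assumption)
    return [v for v in values if abs(v - centre) <= k * spread]


# Hypothetical weekly firewood consumption (kg) with one extreme household.
weekly_firewood_kg = [60, 72, 65, 80, 70, 68, 75, 66, 310]
kept = exclude_outliers(weekly_firewood_kg)
print(kept)
```

In this made-up sample only the extreme value is dropped. Because the extreme value itself inflates both the mean and the standard deviation, the rule is conservative in small samples, which is one reason to report the median regressions alongside it.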
Table F3 Probit regression on health status of household members.
Estimator: Probit, ITT. Coefficient (standard error in parentheses).
Dependent variable: columns (1) and (2), household member with respiratory system disease; columns (3) and (4), household member with eye problem.
Variable | (1) | (2) | (3) | (4)
ICS dummy | −0.11 (0.16) | −0.08 (0.16) | −0.06 (0.17) | −0.02 (0.15)
Household member is responsible for cooking | 0.85*** (0.18) | 0.87*** (0.16) | 0.68*** (0.16) | 0.77*** (0.15)
Household member is responsible for cooking × ICS dummy | −0.41 (0.28) | −0.43 (0.28) | −0.62** (0.29) | −0.59** (0.28)
Further household member variables | | | |
Household member’s sex | 0.03 (0.17) | | 0.30* (0.16) |
Household member’s age | 0.01** (0.00) | | 0.02*** (0.00) |
Household variables | | | |
Average number of people cooked for (in terms of the logarithm of adult equivalents) | −0.34** (0.16) | | −0.17 (0.17) |
Father has formal education | −0.13 (0.17) | | −0.26 (0.25) |
Mother has formal education | 0.19 (0.12) | | 0.20 (0.14) |
Household income (in logarithmic terms) | 0.01 (0.04) | | 0.01 (0.06) |
Telecommunication expenditures (in logarithmic terms) | 0.04* (0.02) | | 0.00 (0.02) |
Bank account ownership | −0.50 (0.44) | | −0.52 (0.38) |
Wall material of house is stone or brick | −0.09 (0.14) | | −0.30** (0.15) |
Ownership of sheep | 0.04 (0.13) | | −0.08 (0.14) |
Association membership of the mother | 0.08 (0.12) | | −0.09 (0.14) |
Village dummies | Included | Included | Included | Included
Constant | −2.02*** (0.58) | −2.07*** (0.20) | −2.41*** (0.77) | −2.14*** (0.22)
p-Value of interaction term | 0.143 | 0.119 | 0.031 | 0.037
Pseudo R-squared | 0.131 | 0.103 | 0.176 | 0.090
Note: The household control variable flooring material is not included, since it predicts failure perfectly in the estimated probit regressions; standard errors are clustered by household.
* Significance level of 10%.
** Significance level of 5%.
*** Significance level of 1%.
References

Adeloye, D., Chan, K.Y., Rudan, I., Campbell, H., 2013. An estimate of asthma prevalence in Africa: a systematic analysis. Croatian Medical Journal 54 (6), 519–531.
ANSD (Agence Nationale de la Statistique et de la Démographie), 2006. Résultats du troisième recensement général de la population et de l’habitat (2002): Rapport National de présentation, http://www.ansd.sn/publications/rapports enquetes etudes/enquetes/RGPH3 RAP NAT.pdf (last accessed 03.03.10).
ANSD (Agence Nationale de la Statistique et de la Démographie), ICF International, 2012. Enquête Démographique et de Santé à Indicateurs Multiples au Sénégal (EDS-MICS) 2010–2011. ANSD and ICF International, Calverton, MD, USA.
Armstrong, J.R., Campbell, H., 1991. Indoor air pollution exposure and lower respiratory infections in young Gambian children. International Journal of Epidemiology 20 (2), 424–429.
Barnes, D.F., Openshaw, K., Smith, K., Van der Plas, R., 1994. What Makes People Cook with Improved Biomass Stoves? A Comparative International Review of Stove Programs, World Bank Technical Paper No. 242. World Bank.
Bensch, G., Peters, J., 2013. Alleviating deforestation pressures? Impacts of improved stove dissemination on charcoal consumption in urban Senegal. Land Economics 89 (4), 676–698.
Bloom, H., 1995. Minimum detectable effect size – a simple way to report the statistical power of experimental designs. Evaluation Review 19 (5), 547–556.
Bonjour, S., Adair-Rohani, H., Wolf, J., Bruce, N.G., Mehta, S., Prüss-Ustün, A., Lahiff, M., Rehfuess, E.A., Mishra, V., Smith, K.R., 2013. Solid fuel use for household cooking: country and regional estimates for 1980–2010. Environmental Health Perspectives 121 (7), 784–790.
Bruhn, M., McKenzie, D., 2009. In pursuit of balance: randomization in practice in development field experiments. American Economic Journal: Applied Economics 1 (4), 200–232.
Burnett, R.T., Pope III, C.A., Ezzati, M., Olives, C., Lim, S.S., Mehta, S., Shin, H.H., Singh, G., Hubbell, B., Brauer, M., Anderson, H.R., Smith, K.R., Balmes, J.R., Bruce, N.G., Kan, H., Laden, F., Prüss-Ustün, A., Turner, M.C., Gapstur, S.M., Diver, W.R., Cohen, A., 2014. An integrated risk function for estimating the global burden of disease attributable to ambient fine particulate matter exposure. Environmental Health Perspectives 122 (4), 397–403.
Burwen, J., Levine, D.E., 2012. A rapid assessment randomized-controlled trial of improved cookstoves in rural Ghana. Energy for Sustainable Development 16 (3), 328–338.
Butrick, E., Peabody, J., Solon, O., DeSalvo, K., Quimbo, S., 2010. A comparison of objective biomarkers with a subjective health status measure among children in the Philippines. Asia-Pacific Journal of Public Health 24 (4), 565–576.
Campbell, H., Armstrong, J.R., Byass, P., 1989. Indoor air pollution in developing countries and acute respiratory infection in children. Lancet 1, 1012.
Cohen, J., Dupas, P., 2010. Free distribution or cost-sharing? Evidence from a randomized Malaria prevention experiment. Quarterly Journal of Economics 125 (1), 1–45.
Das, J., Hammer, J., Sánchez-Paramo, C., 2012. The impact of recall periods on reported morbidity and health seeking behaviour. Journal of Development Economics 98 (1), 76–88.
De Mel, S., McKenzie, D., Woodruff, C., 2008. Returns to capital in micro-enterprises: evidence from a field-experiment. Quarterly Journal of Economics 123 (4), 1329–1372.
Dherani, M., Pope, D., Mascarenhas, M., Smith, K.R., Weber, M., 2008. Indoor air pollution from unprocessed solid fuel use and pneumonia risk in children aged under 5 years: a systematic review and meta-analysis. Bulletin of the World Health Organization 86 (5), 390–398.
Díaz, E., Smith-Sivertsen, T., Pope, D., Lie, R., Diaz, A., McCracken, J., Arana, B., Smith, K., Bruce, N., 2007. Eye discomfort, headache and back pain among Mayan Guatemalan women taking part in a randomized stove intervention trial. Journal of Epidemiology & Community Health 61 (1), 74–79.
Díaz, E., 2008. Impact of Reducing Indoor Air Pollution on Women’s Health. RESPIRE Guatemala-Randomised Exposure Study of Pollution Indoors and Respiratory Effects (Ph.D. dissertation). Department of Public Health and Primary Health Care, The University of Bergen.
Duflo, E., Glennerster, R., Kremer, M., 2008a. Using randomization in development economics research: a toolkit. In: Schultz, P., Strauss, J. (Eds.), Handbook of Development Economics. North Holland, Amsterdam, pp. 3895–3962.
Duflo, E., Greenstone, M., Hanna, R., 2008b. Indoor air pollution, health and economic well-being. Sapiens Journal 1 (1), 1–9.
Dupas, P., 2011. Do teenagers respond to HIV risk information? Evidence from a field experiment in Kenya. American Economic Journal: Applied Economics 3 (1), 1–34.
Ezzati, M., Kammen, D.M., 2001. Indoor air pollution from biomass combustion and acute respiratory infections in Kenya: an exposure–response study. The Lancet 358, 619–624.
Ezzati, M., Kammen, D.M., 2002. Household energy, indoor air pollution, and health in developing countries: knowledge base for effective interventions. Annual Review of Environment and Resources 27, 233–270.
Feder, G., Just, R.E., Zilberman, D., 1985. Adoption of agricultural innovations in developing countries: a survey. Economic Development and Cultural Change 33 (2), 255–298.
FAO (Food and Agriculture Organization of the United Nations), 2005a. State of the World’s Forests. Food and Agriculture Organization of the United Nations, Rome.
FAO (Food and Agriculture Organization of the United Nations), 2005b. Global Forest Resources Assessment. Food and Agriculture Organization of the United Nations, Rome.
Frondel, M., Peters, J., Vance, C., 2008. Identifying the rebound: evidence from a German household panel. Energy Journal 29 (4), 154–163.
GAIN (Global Agricultural Information Network), 2011. Senegal, Grain and Feed Annual. West Africa Rice Annual, http://gain.fas.usda.gov/Recent%20GAIN%20Publications/Grain%20and%20Feed%20Annual Dakar Senegal 5-6-2011.pdf (last accessed 08.02.12).
GIZ (Gesellschaft für Internationale Zusammenarbeit), 2011a. Firewood Jambar Stove, Senegal, https://energypedia.info/wiki/File:GIZ HERA 2011 Jambar Bois Senegal.pdf (last accessed 25.11.14).
GIZ (Gesellschaft für Internationale Zusammenarbeit), 2011b. Charcoal Jambar Stove, Benin, Kenya, Senegal. https://energypedia.info/images/b/b1/GIZ HERA 2011 Jambar Charbon Senegal.pdf (last accessed 25.11.14).
Hanna, R., Duflo, E., Greenstone, M., 2012. Up in Smoke: The Influence of Household Behavior on the Long-run Impact of Improved Cooking Stoves, CEEPR WP 2012-008. MIT Center for Energy and Environmental Policy Research.
Herring, H., Sorrell, S., Elliott, D., 2009. Energy Efficiency and Sustainable Consumption – The Rebound Effect. Palgrave Macmillan, New York.
Idler, E., Benyamini, Y., 1997. Self-assessed health and mortality: a review of twenty-seven community studies. Journal of Health and Social Behavior 38 (1), 21–37.
IPCC (Intergovernmental Panel on Climate Change), 2013. Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge/New York.
Kamali, A., Quigley, M., Nakiyingi, J., Kinsman, J., Kengeya Kayondo, J., Gopal, R., Ojwiya, A., Hughes, P., Carpenter, L.M., Whitworth, J., 2003. Syndromic management of sexually-transmitted infections and behaviour change interventions on transmission of HIV-1 in rural Uganda: a community randomised trial. The Lancet 361 (9358), 645–652.
Kan, X., Chiang, C.Y., Enarson, D.A., Chen, W., Yang, J., Chen, G., 2011. Indoor solid fuel use and tuberculosis in China: a matched case–control study. BMC Public Health 11 (1), 1–7.
Kjellsson, G., Clarke, P., Gerdtham, U.-G., 2014. Forgetting to remember or remembering to forget: A study of the recall period length in health care survey questions. Journal of Health Economics 35, 34–46.
Kremer, M., Miguel, E., 2007. The illusion of sustainability. Quarterly Journal of Economics 122 (3), 1007–1065.
Kremer, M., Miguel, E., Mullainathan, S., Null, C., Zwane, A.P., 2009. Making Water Safe: Price, Persuasion, Peers, Promoters, or Product Design? Mimeo.
Kshirsagar, M.P., Kalamkar, V.R., 2014. A comprehensive review on biomass cookstoves and a systematic approach for modern cookstove design. Renewable & Sustainable Energy Reviews 30, 580–603.
Lewis, J.J., Pattanayak, S.K., 2012. Who adopts improved fuels and cookstoves? A systematic review. Environmental Health Perspectives 120 (5), 637–645.
Luby, S., Mendoza, C., Keswick, B., Chiller, T.M., Hoekstra, R., 2008. Difficulties in bringing point-of-use water treatment to scale in rural Guatemala. The American Journal of Tropical Medicine and Hygiene 78 (3), 382–387.
Martin II, W.J., Glass, R.I., Balbus, J.M., Collins, F.S., 2011. A major environmental cause of death. Science 334 (6053), 180–181.
Masera, O., Edwards, R., Armendariz, C., Berrueta, V., Johnson, M., Rojas Bracho, L., Riojas-Rodríguez, H., Smith, K.R., 2007. Impact of Patsari improved cookstoves on indoor air quality in Michoacán, Mexico. Energy for Sustainable Development 11 (2), 45–56.
McCracken, J., Smith, K., Stone, P., Díaz, A., Arana, B., Schwartz, J., 2011. Intervention to lower household wood smoke exposure in Guatemala reduces ST-segment depression on electrocardiograms. Environmental Health Perspectives 119 (11), 1562–1568.
Miilunpalo, S., Vuori, I., Oja, P., Pasanen, M., Urponen, H., 1997. Self-rated health status as a health measure: the predictive value of self-reported health status on the use of physician services and on mortality in the working-age population. Journal of Clinical Epidemiology 50 (5), 517–528.
Miller, G., Mobarak, M., 2013. Gender Differences in Preferences, Intra-household Externalities, and Low Demand for Improved Cookstoves, NBER Working Paper 18964. National Bureau of Economic Research.
Mobarak, A.M., Dwivedi, P., Bailis, R., Hildemann, L., Miller, G., 2012. Low demand for nontraditional cookstove technologies. Proceedings of the National Academy of Sciences of the United States of America 109 (27), 10815–10820, http://dx.doi.org/10.1073/pnas.1115571109.
Mueller, V., Pfaff, A., Peabody, J., Liu, Y., Smith, K.R., 2013. Improving stove evaluation using survey data: who received which intervention matters. Ecological Economics 93, 301–312.
Okeke, E.N., Adepiti, C.A., Ajenifuja, K.O., 2013. What is the price of prevention? New evidence from a field experiment. Journal of Health Economics 32 (1), 207–218.
Pandey, M.R., 1984a. Prevalence of chronic bronchitis in a rural community of the hill region of Nepal. Thorax 39, 331–336.
Pandey, M.R., 1984b. Domestic smoke pollution and chronic bronchitis in a rural community of hill region of Nepal. Thorax 39, 337–339.
Pandey, M.R., Smith, K.R., Boleij, J.S.M., Wafula, E.M., 1989. Indoor air pollution in developing countries and acute respiratory infection in children. The Lancet 1, 427–429.
Pattanayak, S., Pfaff, A., 2009. Behavior, environment, and health in developing countries: evaluation and valuation. Annual Review of Resource Economics 1, 183–217.
Peabody, J.W., Nordyke, R.J., Tozija, F., Luck, J., Munoz, J.A., Sunderland, A., DeSalvo, K., Ponce, N., McCulloch, C., 2006. Quality of care and its impact on population health: a cross-sectional study from Macedonia. Social Science & Medicine 62 (9), 2216–2224.
Pitt, M.M., Rosenzweig, M.R., Hassan, M.N., 2006. Sharing the Burden of Disease: Gender, the Household Division of Labor and the Health Effects of Indoor Air Pollution. Mimeo.
Pope, C.A., Burnett, R.T., Turner, M.C., Cohen, A., Krewski, D., Jerrett, M., Gapstur, S.M., Thun, M.J., 2011. Lung cancer and cardiovascular disease mortality associated with ambient air pollution and cigarette smoke: shape of the exposure–response relationships. Environmental Health Perspectives 119, 1616–1621.
Rehfuess, E.A., Puzzolo, E., Stanistreet, D., Pope, D., Bruce, N., 2014. Enablers and barriers to large-scale uptake of improved solid fuel stoves: a systematic review. Environmental Health Perspectives 122 (2), 120–130.
Shindell, D., Kuylenstierna, J.C.I., Vignati, E., van Dingenen, R., Amann, M., Klimont, Z., Anenberg, S.C., Muller, N., Janssens-Maenhout, G., Raes, F., Schwartz, J., Faluvegi, G., Pozzoli, L., Kupiainen, K., Höglund-Isaksson, L., Emberson, L., Streets, D., Ramanathan, V., Hicks, K., Kim Oanh, N.T., Milly, G., Williams, M., Demkine, V., Fowler, D., 2012. Simultaneously mitigating near-term climate change and improving human health and food security. Science 335, 183–189.
Smith, K.R., McCracken, J.P., Weber, M.W., Hubbard, A., Jenny, A., Thompson, L.M., Balmes, J., Díaz, A., Arana, B., Bruce, N., 2011. Effect of reduction in household air pollution on childhood pneumonia in Guatemala (RESPIRE): a randomised controlled trial. The Lancet 378 (9804), 1717–1726.
Smith-Sivertsen, T., Díaz, E., Pope, D., Lie, R.T., Díaz, A., McCracken, J.P., Bakke, P., Arana, B., Smith, K.R., Bruce, N., 2009. Effect of reducing indoor air pollution on women’s respiratory symptoms and lung function: the RESPIRE randomized trial, Guatemala. American Journal of Epidemiology 170 (2), 211–220.
Smith-Sivertsen, T., Díaz, E., Bruce, N., Díaz, A., Khalakdina, A., Schei, M.A., McCracken, J.P., Arana, B., Klein, R., Thompson, L.M., Smith, K.R., 2004. Reducing indoor air pollution with a randomized intervention design – a presentation of the stove intervention study in the Guatemalan highlands. Norsk Epidemiologi 14 (2), 137–143.
Tappan, G., Sall, M., Wood, E.C., Cushing, M., 2004. Ecoregions and land cover trends in Senegal. Journal of Arid Environments 59 (3), 427–462.
Tarozzi, A., Mahajan, A., Blackburn, B., Kopf, D., Krisham, L., Yoong, J., 2014. Micro-loans, insecticide-treated bednets, and malaria: evidence from a randomized controlled trial in Orissa, India. American Economic Review 104 (7), 1909–1941.
UNDP/WHO (United Nations Development Programme and World Health Organization), 2009. The Energy Access Situation in Developing Countries – A Review Focused on the Least Developed Countries and Sub-Saharan Africa. United Nations Development Programme, New York.
van Gemert, F., van der Molen, T., Jones, R., Chavannes, N., 2011. The impact of asthma and COPD in sub-Saharan Africa. Primary Care Respiratory Journal 20 (3), 240–248.
WEC/FAO (World Energy Council and Food and Agriculture Organization of the United Nations), 1999. The Challenge of Rural Energy Poverty in Developing Countries. World Energy Council, London.
Wendland, K.J., Pattanayak, S.K., Sills, E.O., 2015. National-level differences in the adoption of environmental health technologies: a cross-border comparison from Benin and Togo. Health Policy Planning 30 (2), 145–154.
WHO (World Health Organization), 2014. Burden of Disease from Household Air Pollution for 2012, http://www.who.int/phe/health topics/outdoorair/databases/FINAL HAP AAP BoD 24March2014.pdf (last accessed 26.04.14).
WHO (World Health Organization), 2009. Country Profile of Environmental Burden of Disease – Senegal. World Health Organization, Geneva.
World Bank, 2011. Household Cookstoves, Environment, Health, and Climate Change. A New Look on an Old Problem. World Bank, Washington.
Yu, F., 2011. Indoor air pollution and children’s health: net benefits from stove and behavioral interventions in rural China. Environmental and Resource Economics 50 (4), 495–514.
Zwane, A.P., Zinman, J., Van Dusen, E., Pariente, W., Null, C., Miguel, E., Kremer, M., Karlan, D.S., Hornbeck, R., Giné, Y., Duflo, E., Devoto, F., Crepon, B., Banerjee, A., 2011. Being surveyed can change later behavior and related parameter estimates. Proceedings of the National Academy of Sciences of the United States of America 108 (5), 1821–1826.

H. Wen et al. / Journal of Health Economics 42 (2015) 64–80

The effect of medical marijuana laws on adolescent and adult use of marijuana, alcohol, and other substances☆,☆☆

Hefei Wen a,∗, Jason M. Hockenberry a,b, Janet R. Cummings a
a Emory University, Department of Health Policy and Management, 1518 Clifton Road, Atlanta, GA 30322, United States
b National Bureau of Economic Research (NBER), 1050 Massachusetts Avenue, Cambridge, MA 02138, United States

Article history:
Received 22 May 2014
Received in revised form 23 February 2015

JEL classification:
I18
K32

Keywords:
Medical marijuana law
Marijuana use
Alcohol use
Natural experiment

We estimate the effect of medical marijuana laws (MMLs) in ten states between 2004 and 2012 on adolescent and adult use of marijuana, alcohol, and other psychoactive substances. We find increases in the probability of current marijuana use, regular marijuana use and marijuana abuse/dependence among those aged 21 or above. We also find an increase in marijuana use initiation among those aged 12–20. For those aged 21 or above, MMLs further increase the frequency of binge drinking. MMLs have no discernible impact on drinking behavior for those aged 12–20, or the use of other psychoactive substances in either age group.

© 2015 Elsevier B.V. All rights reserved.

☆ The authors gratefully acknowledge the helpful comments on earlier drafts of this study from Sara J. Markowitz and David H. Howard. All errors are our own.
☆☆ The authors declare that they have no relevant or material financial interests that relate to the research described in this study. The study was approved by the Emory University Institutional Review Board (IRB) through an expedited review procedure.
∗ Corresponding author. Tel.: +1 4047911709.
E-mail addresses: hwen2@emory.edu (H. Wen), jason.hockenberry@emory.edu (J.M. Hockenberry), jrcummi@emory.edu (J.R. Cummings).
1 The 11 states with pro-medical marijuana legislation are Alabama, Florida, Iowa, Kentucky, Mississippi, Missouri, North Carolina, South Carolina, Tennessee, Utah, and Wisconsin.

As of February 2015, 23 states and the District of Columbia have implemented medical marijuana laws (MMLs), which permit marijuana use for medical purposes. Three states (i.e., Maryland, Minnesota, and New York) adopted MMLs during 2014, and an additional 11 states1 passed pro-medical marijuana legislation. Medical marijuana bills have also been considered in many of the remaining states and are likely to land on the legislative agenda in more states in the near future. Understanding the behavioral and public health implications of this evolving regulatory environment is critical for the ongoing implementation of MMLs and future iterations of marijuana policy reform. Despite the growing consensus about the relief medical marijuana can bring for a range of serious illnesses, concerns have been voiced that MMLs may give rise to increased marijuana use in the general population and increased use of other substances. Legislative and public attention has focused on these issues, but the empirical evidence is limited.

We contribute to the literature on the effects of marijuana liberalization policies by examining the effect of the implementation of MMLs in ten states between 2004 and 2012 on a variety of substance use outcomes including marijuana use, alcohol use, pain medication misuse, and hard drug use in both adolescent and adult populations. To tease out the potential causal effect of MML implementation, we exploited the geographic identifiers in a restricted-access version of the National Survey on Drug Use and Health (NSDUH) micro-level data and estimated two-way fixed effects models with state-specific linear time trends and a rich set of individual- and state-level covariates.

We find that implementation of an MML leads to a relative 14 percent increase in the probability of past-month marijuana use and a 15 percent increase in the probability of almost daily/daily marijuana use among adults aged 21 or above. For this age group, MML implementation also results in a 10 percent increase in the probability of marijuana abuse/dependence. Among adolescents and young adults aged 12–20, we find a 5 percent increase in
the probability of past-year marijuana use initiation attributable to MML implementation.

In addition to the increases in marijuana use, implementation of an MML also increases the frequency of binge drinking among those aged 21 or above, partially through increasing simultaneous use of the two substances. In contrast, MML implementation does not affect underage drinking among those aged 12–20. In both age groups, non-medical use of prescription pain medication, heroin use, and cocaine use are unaffected.

Overall, our findings indicate that state implementation of an MML increases marijuana use, but has limited impacts on other types of substance use (i.e., underage drinking, pain medication misuse, and hard drug use), except for binge drinking among adults of legal drinking age.

The article proceeds as follows. Section 1 provides background information on medical marijuana and MMLs, outlines the theoretical framework, and summarizes the existing literature. Section 2 describes the data sources, variable measurement, and identification strategy. Section 3 presents the estimated policy effects and the robustness checks. Concluding remarks are given in the last section of the article.

and access to marijuana for a select group of patients. In practice, however, the laws may have a spillover effect on marijuana use in the non-patient population.

The spillover effect may arise from four dimensions of the existing MMLs that create a de facto legalized environment for marijuana use in the general population (Pacula et al., 2013). First, although all MMLs specify a list of conditions that are eligible for medical marijuana,4 most MMLs include in the list a generic term “chronic pain”, rather than specific diseases causing the pain (e.g., neuropathy, fibromyalgia, rheumatoid arthritis, etc.) (Pacula et al., 2013). The interpretation of “chronic pain” can go far beyond the original legislative intent, analogous to the practice of off-label prescribing of other medications. Because pain can often be nondescript and difficult to verify clinically, a recreational user may pretend to be a pain patient in order to obtain a prescription for medical marijuana.

Second, some MMLs do not require establishment of a registry/renewal system to assess and monitor patient eligibility for medical marijuana. This, coupled with the loosely-defined eligibility criteria, further blurs the boundary between the patient and the non-patient population (Cohen, 2010).
Third, MMLs provide medical marijuana patients with access
1. Background to the drug by allowing licensed retail dispensaries and/or home
cultivation. These supply channels exist in a legal grey area and
1.1. Medical marijuana law and potential risks and medical value may proliferate as a result of the reduced threat of prosecution
of marijuana under the MMLs (Pacula et al., 2010). In particular, Anderson
et al. (2013) provided empirical evidence that MMLs have led to a
In the last two decades, growing evidence has lent support to substantial increase in the supply of high-grade marijuana. As mar-
the efficacy and safety of marijuana as medical therapy to alle- ijuana supply rises, it may become prohibitively expensive for law
viate symptoms and treat diseases (see, for instance, Ben Amar, enforcement to ensure that the entire supply of marijuana intended
2006; Campbell and Gowran, 2007; Krishnan et al., 2009; Pertwee, for medical purpose ends up in the hands of legitimate patients,
2012; Gloss and Vickrey, 2012). This growing body of clinical evi- akin to how prescription opioids eventually find their way into the
dence on marijuana’s medicinal value has propelled many states street drug market. This spillover to the non-patient population is
toward a more tolerant legal approach to medical marijuana. In likely to occur in places where marijuana possession is decriminal-
1996, California signed the Compassionate Use Act into law (Propo- ized, prosecution of a marijuana offense is local law enforcement’s
sition 215) and became the first state in the U.S. to permit the “lowest priority”, and federal interference in marijuana regulation
medical use of marijuana. And since then a total of 23 states and the is limited (Sekhon, 2009).
District of Columbia have passed MMLs. These laws are intended to In addition to those specific components of the law, an MML
protect patients from state prosecution for their medical marijuana as a whole symbolizes liberalization of marijuana policy, which
use (Hoffmann and Weber, 2010).2 in turn, may give rise to the underestimation of the risks associ-
Typically under an MML, a patient with an eligible condition ated with marijuana use and the normalization of marijuana use
should first obtain recommendation from a qualified doctor for for recreational purposes (Hathaway et al., 2011).
the use of marijuana in medical treatment. With the doctor’s rec-
ommendation for medical marijuana use, the patient can then be 1.2. Literature on the effect of MML on marijuana use in the
issued a medical marijuana patient identification card by the state. general population
The patient ID cardholder and his/her caregivers are allowed to
possess a certain amount of marijuana through cultivation at home Empirical evidence is inconclusive with respect to the effect of
and/or purchase from a nonprofit retail dispensary licensed by an MML on marijuana use in the general population. A review of
the state (in some states called “compassionate center”).3 As such, this line of literature is beyond the scope of our paper. We direct
MMLs in principle should only provide restricted legal protection readers to Chu (2014) for a comprehensive review. Briefly, how-
ever, we note that the mixed findings from the previous studies can
be explained by the heterogeneity between different age groups
2
In contrast to the state MMLs, federal law continues to prohibit marijuana use examined and the variation in specific state laws covered by the
for any purpose since the enactment of the Controlled Substances Act (CSA) of studies.
1970. A 2005 Supreme Court decision (Gonzales v. Raich) reaffirmed that federal Studies on youths generally find no significant effect of an MML
law enforcement has the authority to prosecute patients for medical marijuana use
on youth marijuana use (e.g., Harper et al., 2012; Lynne-Landsman
in accordance with state laws (Gostin, 2005). It is only recently that the Obama
administration and the Department of Justice clarified the position that federal
et al., 2013; Anderson et al., 2011, 2012). The most comprehensive
law enforcement resources should not be dedicated to prosecuting persons whose evidence comes from Anderson et al. (2011, 2012), which brings
actions comply with their states’ permission of medical marijuana (Hoffmann and
Weber, 2010). This change in the prosecutorial stance would strengthen the legiti-
macy of existing MMLs and pave the way for the passage of new MMLs.
3
Several more recent MMLs have taken innovative twists that are intended to doctors for certifying patients’ medical need, as a doctor can be charged with a
tighten the regulation on access to medical marijuana. For instance, New York’s felony for prescribing marijuana to an ineligible patient.
4
2014 MML is the first in the U.S. to allow doctors in qualified hospitals to prescribe California is the only exception that allows medical marijuana for any condition
medical marijuana instead of recommending it. By allowing for medical marijuana “for which marijuana provides relief” and leaves the interpretation almost entirely
prescription, the law in effect imposes more responsibility on the participating to the discretion of doctors.
together several commonly used data sets and covers an 18-year period from 1993 to 2011. The study findings suggest that implementing an MML does not lead to a significant increase in marijuana use among youths. Compared to the literature on youth marijuana use, the existing literature on the adult population is relatively thin and limited in scope and rigor (e.g., Harper et al., 2012; Anderson et al., 2011).

In addition to the potential heterogeneity in the response to an MML across age groups, MMLs may not be treated as a homogenous set of laws between states and across time. The variation in specific state laws implemented during different periods may help reconcile the mixed findings from the previous studies. To explore this potential heterogeneity, a recent study by Pacula et al. (2013) uses the same data sets as Anderson et al. (2011, 2012) but replaces a single dichotomous MML indicator with a set of indicators that represent key provisions of MMLs. Although none of the estimates using a dichotomous MML indicator are significant, the MMLs that include a provision requiring patient registry/renewal are found to lower the marijuana use rates and marijuana-related treatment admissions. This protective effect of the patient registry/renewal requirement, however, is offset by another provision of MMLs that allows licensed retailers to dispense marijuana to medical marijuana patients. The third MML provision this study examines, the home cultivation provision, has inconsistent and sometimes counterintuitive effects on marijuana use. These study findings are informative as to the importance of distinguishing between MML provisions and recognizing the variation in state MMLs. A caveat, however, is that although Pacula et al. (2013) take a more nuanced approach to the classification of MMLs, they lump youths and adults together in their full-sample analysis. As a result, the aforementioned age heterogeneity may be obscured.

1.3. Spillover from marijuana use to the use of alcohol and other substances

On top of the spillover of marijuana use from medical marijuana patients to the non-patient population, the potential interdependence of substance use may lead to a further spillover from marijuana use to the use of other psychoactive substances.5 Assuming marijuana has a downward sloping demand curve, the effect of an MML on marijuana use should be unequivocally positive. The effect on other substance use, however, can be positive or negative, depending on the relative magnitude of the income and substitution effects (Chaloupka and Laixuthai, 1997; Pacula, 1998). Specifically, contemporaneous substitution of marijuana for another substance in response to the implementation of an MML is most likely to occur for substances that have pharmacological effects most similar to that of marijuana. A complementary relationship, on the other hand, is most likely to occur between marijuana and another substance if their combined use produces a synergistic interaction (Moore, 2010). In addition to the contemporaneous relationship between marijuana use and other substance use, there may also be a progression from the demand for marijuana to the craving and thus future demand for a more powerful substance with more intense and longer-lasting effects (Kandel, 1975, 2002).

5
However, if the increased marijuana use arising from an MML is not for recreational purpose (i.e., “intoxication”) but for medical purpose only, the use of other substances is unlikely to be affected.

1.3.1. Relationship between marijuana use and alcohol use

Marijuana and alcohol target many common neural pathways in human brains (Maldonado et al., 2006). On the one hand, marijuana use produces rewarding and sedative effects that are comparable to the effect of alcohol use (Boys et al., 2001; Heishman et al., 1997), especially low-dose alcohol consumption6 (King et al., 2011). In this case, when an MML lowers the cost of marijuana use, an individual may substitute marijuana for alcohol to achieve a similar experience such as a general sense of well-being, with perhaps fewer immediate negative physical symptoms (e.g., hangovers).

6
High-dose alcohol consumption, in contrast, tends to lower sedation and heighten stimulation (King et al., 2011).

On the other hand, the overall intoxication experience may be enhanced by the simultaneous use of marijuana and alcohol together. Evidence suggests that ethanol, especially when consumed in high doses, can facilitate the absorption of delta 9-tetrahydrocannabinol (THC) (Boys et al., 2001). In a randomized controlled trial (RCT) conducted by Lukas and Orozco (2001), participants reported significantly more episodes and longer durations of euphoria when consuming marijuana together with high doses of alcohol. The enhanced euphoria following simultaneous consumption of alcohol and marijuana may subsequently lead to a greater urge to drink even more. Such a scenario points toward a competing hypothesis that marijuana and alcohol, especially high-dose alcohol consumption, are complements rather than substitutes. In this case, an MML may result in the increased use of both substances.

The takeaway of these pharmacologic findings is that whether marijuana and alcohol are substitutes or complements may depend on individual motives for substance use. For instance, those who only expect a mild feeling of happiness and relaxation from substance use may consume one of the substances in place of the other. In contrast, those seeking intense euphoria would consume the two substances together, perhaps in higher doses.

1.3.2. Relationship between marijuana use and other substance use

Marijuana is also widely portrayed as a “gateway” drug, essentially inducing the use of drugs with more serious health, legal and social consequences (Kandel, 1975, 2002). One hypothesized pathway is through pharmacological mechanisms: once users tolerate the psychoactive effects of marijuana use, they may crave and seek out more powerful drugs with more intense and longer-lasting effects. This pharmacological mechanism would thus predict an increase in subsequent use of hard drugs such as heroin and cocaine attributable to the implementation of an MML.

An alternative to this pharmacological mechanism is that the observed sequence from marijuana use to hard drug use may simply reflect common predisposing factors rooted in genes or in the environment coupled with an exposure opportunity mechanism through which marijuana users may be introduced to a shared market or subculture of hard drugs (Morral et al., 2002; Wagner and Anthony, 2002a). If predisposing factors and exposure opportunities are the primary mechanisms that lead users to transition from marijuana use to hard drug use, an MML should not result in an increase in hard drug use because the predisposing factors and exposure opportunities7 for hard drug use remain unaffected.

7
The existing MMLs help marijuana users gain access to the drug through medical marijuana dispensaries and home cultivation, which are unlikely to expose the marijuana users to the market or subculture of hard drugs.

In contrast to the concern about MML’s “gateway” effect, there has been evidence that increased access to medical marijuana resulting from an MML may benefit certain individuals by reducing their opioid use. For instance, marijuana may provide analgesia for
patients with chronic pain (Lynch and Campbell, 2011). Thus, those who have already received opioid pain medication may experience improved pain relief and lower their opioid dose after they commence medical marijuana treatment. In addition, those who would have otherwise initiated opioid analgesics may choose medical marijuana instead (Abrams et al., 2011). Furthermore, marijuana may also benefit those with opioid misuse (i.e., non-medical use) by easing withdrawal symptoms and facilitating recovery (Scavone et al., 2013). Therefore, one would expect states with MMLs to see a reduction in prevalence of opioid use, or other downstream benefits such as reduced overdose mortality (Bohnert et al., 2011; Bachhuber et al., 2014).

1.4. Literature on the relationship between marijuana use and the use of alcohol and other substances

Through increased marijuana use, a further consequence of an MML could also be the spillover to alcohol use and the use of other psychoactive substances. Identification of the spillover effect in an observational study hinges on the isolation of the exogenous variation in substance use arising from policy/price shocks from the endogenous variation due to “common factors” or “exposure opportunities.”

Previous studies have exploited changes in state excise taxes on beer (Pacula, 1998), the minimum legal drinking age (MLDA) (DiNardo and Lemieux, 2001; Yörük and Yörük, 2011, 2013; Crost and Guerrero, 2012), composite market prices of alcohol (Saffer and Chaloupka, 1999) and market prices of cocaine (Saffer and Chaloupka, 1999; DeSimone and Farrelly, 2003) to tease out the exogenous changes in the use of alcohol or cocaine as well as the downstream use of marijuana. Although they generally find a direct policy/price effect on the use of the target substance itself (e.g., alcohol and cocaine) that follows a downward sloping demand curve, the downstream effect on marijuana use is mixed. Chaloupka and Laixuthai (1997), DiNardo and Lemieux (2001), Crost and Guerrero (2012), and Crost and Rees (2013) find evidence for a substitution between marijuana and alcohol. However, Pacula (1998), Saffer and Chaloupka (1999), and Yörük and Yörük (2011) find evidence supporting the complementarity hypothesis between marijuana and alcohol. Moreover, evidence from Saffer and Chaloupka (1999) and DeSimone and Farrelly (2003) suggests a complementarity between marijuana and cocaine.

Not only is there a lack of consistent evidence, it is also difficult to extrapolate the effect of an MML on the use of other substances from the estimated reduced-form effect of policy/price related to the other substances on the use of marijuana. This difficulty arises out of the nature of the underlying Marshallian demand function, which does not require symmetric relationships between substances (i.e., from substance A to B vs. from substance B to A), nor does it require symmetric responses to policy/price changes (i.e., permissive policy/lower price vs. restrictive policy/higher price). Thus it is possible for marijuana to be a substitute for alcohol when alcohol regulations become more restrictive but for alcohol to be a complement to marijuana when marijuana policies become more permissive.8

8
This asymmetric relationship between marijuana use and alcohol use may come into play in the context of the minimum legal drinking age (MLDA) vs. an MML: a teenager under the MLDA cannot legally acquire either alcohol or marijuana and may resort to illegal supply channels, whereas an experienced marijuana user living in an MML state can get both marijuana and alcohol with little effort. In essence, when identifying the relationship between marijuana and alcohol, using different policies may capture the decisions made by different groups from different choice sets. Thus, the results from one policy may not be applicable to another policy setting.

Within the context of MMLs, Anderson et al. (2013) provide evidence that states with MMLs see a reduction in alcohol-related traffic fatalities, alcohol consumption and beer sales. However, the authors do not have data on changes in marijuana use, thus their findings do not necessarily imply that marijuana is a substitute for (or a complement to) alcohol. In fact, when taking into account the key provisions of MMLs, the replication study by Pacula et al. (2013) suggests that the findings from traffic fatalities and alcohol consumption are more consistent with a complementarity hypothesis. Nonetheless, the authors are only able to assess two outcomes related to alcohol consumption, which limits the scope of their study.9

9
The first outcome in Pacula et al. (2013), any current alcohol use, may not carry as much weight as binge or heavy drinking in terms of health consequences and policy implications, especially for adults of legal drinking age. The other outcome, specialty alcohol abuse treatment admissions, may not show a clear picture of the alcohol abuse/dependence prevalence, since more than 90 percent of Americans who suffer from alcohol abuse/dependence do not receive any treatment for their conditions. Furthermore, a large proportion of those receiving the treatment only receive it in a self-help group (e.g., Alcoholics Anonymous) or in a primary care setting as opposed to a specialty alcohol abuse treatment setting (SAMHSA, 2013).

Another piece of evidence in the context of MMLs comes from Bachhuber et al. (2014), which assesses the mortality rate related to opioid overdose. The authors find a 25 percent reduction in the annual rate of opioid overdose mortality between 1999 and 2010 in states with MMLs compared to those without such laws. However, the unaccounted-for state heterogeneity in the underlying prevalence of opioid use or trajectory of overdose deaths may also contribute to the reduced mortality rate. Therefore, the reduction in opioid overdose mortality rate may not necessarily imply a substitution between marijuana and opioids.

In sum, the majority of the literature on the relationship between marijuana use and the use of alcohol and other substances relies on policy/price shocks other than MMLs for identification. Evidence from this line of literature is inconsistent and may not extrapolate to the effect of an MML. Existing literature in the context of MML, however, is relatively thin and limited in scope and rigor.

1.5. Significance of our study

To inform the current debate on MMLs and marijuana liberalization policies in general, we examine the effect of state implementation of MMLs between 2004 and 2012 on marijuana use, alcohol use, pain medication misuse, and hard drug use in both adolescent and adult populations. Our study advances the existing literature by: (i) providing one of the first estimates of the effect of MML implementation on adult marijuana use based on micro-level nationally-representative data, as well as the updated estimates for adolescent marijuana use based on the most recent data; (ii) estimating the effect of MML implementation on a variety of substance use outcomes with differential elasticities and expected harms; (iii) estimating the contemporaneous relationship between marijuana and alcohol and other substances within the context of MMLs; (iv) estimating explicitly the heterogeneous policy effects of key MML provisions between different age groups.

2. Methods

2.1. Data sources

We pooled nine years of cross-sectional data from a restricted-access version of the National Survey on Drug Use and Health (NSDUH) 2004–2012 (CBHSQ, 2013). NSDUH is a nationally and
state-representative10 survey sponsored by the Substance Abuse and Mental Health Services Administration (SAMHSA), and the primary source of information on substance use behavior by the U.S. civilian, noninstitutionalized11 population aged 12 or above. The majority of the NSDUH interview is conducted by self-administered audio computer-assisted self-interviewing (ACASI), a highly private and confidential mode that encourages honest reporting of substance use and other sensitive behaviors (Johnson et al., 2010). The response rates range from 73 percent to 76 percent between 2004 and 2012.

10
The NSDUH sampling frame is state-based, with an independent, multistage area probability sample within each state and the District of Columbia. The eight states with the largest population (i.e., California, Florida, Illinois, Michigan, New York, Ohio, Pennsylvania, and Texas) have an annual sample size of about 3600 each. For the remaining 42 states and the District of Columbia, each has a sample size of about 900 annually.

11
Institutionalized individuals (e.g., in jails/prisons or hospitals), homeless or transient persons not in shelters, and military personnel on active duty were excluded from the NSDUH sample.

2.2. Variable measurement

2.2.1. Marijuana use outcomes

We created five outcomes related to marijuana use: (i) a dichotomous indicator assessing whether a respondent used marijuana during the past month prior to the interview; (ii) another dichotomous indicator assessing whether a respondent used marijuana “almost daily or daily”, defined as more than 20 days of marijuana use during the past month; (iii) the number of marijuana use days among past-month marijuana users, which is a conditional frequency ranging from 1 to 30;12 (iv) a dichotomous indicator for using marijuana for the first time during the past year;13 and (v) a dichotomous indicator for being classified as abusing or being dependent on marijuana during the past year according to DSM-IV diagnostic criteria. The DSM-IV defines past-year substance abuse/dependence as a maladaptive pattern of substance use leading to clinically significant impairment and distress during the past year. The impairment and distress related to substance use can be manifested by symptoms such as tolerance, withdrawal, use of a substance in a larger amount or over a longer period of time than intended, continued substance use in dangerous situations, interference with major obligations, etc. (APA, 2000) (Appendix 1).

12
The majority of past-month marijuana users either use marijuana on a few occasions or use it regularly, with a very small proportion of marijuana users between these two extremes. Therefore, we assessed both the average change in the frequency of marijuana use days and the change in the right tail of the frequency distribution (i.e., almost daily/daily marijuana use).

13
Marijuana use initiation is examined in an “at-risk” sample, which excludes those who first tried marijuana more than a year prior to the interview and are thus no longer at risk of initiating marijuana use during the preceding year.

2.2.2. Alcohol use outcomes

Empirical evidence suggests that marijuana can be a substitute for and a complement of alcohol, depending on individual motives of substance use and doses of consumption. Lower-dose alcohol consumption for mild happiness and relaxation is hypothesized to be replaced by marijuana use (King et al., 2011), whereas higher-dose alcohol consumption for intense euphoria is hypothesized to be accompanied by marijuana use (Lukas and Orozco, 2001). In this regard, we studied any alcohol use as well as binge drinking.14 Binge drinking, in the NSDUH, is defined as having five or more drinks on the same occasion on at least one day during the past month.15 We created the following measures for alcohol use: (i) the total number of drinks consumed during the past month,16 (ii) the unconditional frequency of binge drinking days, and (iii) the probability of being classified as having alcohol abuse/dependence during the past year according to the DSM-IV criteria. We also created two dichotomous indicators to assess: (iv) whether a respondent engaged both in marijuana use and in binge drinking during the past month, and (v) whether a respondent used marijuana while drinking alcohol (i.e., on the same occasion) during the past month.17 These two measures of simultaneous use of marijuana and alcohol can provide further insight into the contemporaneous complementarity between the two substances.

14
Carpenter and Dobkin (2009), for instance, find evidence for the differential elasticity of alcohol demand along the distribution of drinking intensity and frequency.

15
A commonly used alternative defines “binge drinking” as five or more drinks for men and four or more drinks for women consumed on one occasion (Wechsler et al., 1995). Our estimates are robust to this gender-specific definition (not shown).

16
One drink refers to a can or a bottle of beer, a glass of wine or a wine cooler, a shot of liquor, or a mixed drink with liquor in it.

17
The question about simultaneous use of marijuana and alcohol is not included in the NSDUH 2004 and 2005 surveys, while the MMLs in Vermont and Montana both came into effect in 2004. Thus we cannot estimate the effect of these two states’ implementation of the MMLs on this outcome.

2.2.3. Other substance use outcomes

In light of the previous evidence suggesting a substitution between marijuana and opioids (Bachhuber et al., 2014) and a complementarity between marijuana and cocaine (Saffer and Chaloupka, 1999; DeSimone and Farrelly, 2003), we focused our analysis on non-medically used prescription pain medication,18 heroin, and cocaine. NSDUH defines “non-medical use” as the intentional use of a medication without a prescription, in a way other than as prescribed, or simply for the experience or feeling that it causes. NSDUH does not include questions about legitimate pain medication used according to the prescription. We created three dichotomous indicators for the probability of: (i) past-year non-medical use of prescription pain medication, (ii) past-year heroin use, and (iii) past-year cocaine use.

18
NSDUH attempts to capture all types of pain medication by including in its questionnaire a list of commonly prescribed and misused pain medications in their generic names (e.g., Codeine, Oxycodone, Hydrocodone, Morphine, Hydromorphone, Fentanyl, Tramadol, etc.), brand names (e.g., OxyContin, Vicodin, MS Contin, Dilaudid, Duragesic, Ultram, etc.) and street names, along with an open-ended question about other pain medications.

2.2.4. MML-implementation indicator

The recent launch of the Data Portal system by the CBHSQ provides us with access to state identifiers and interview dates in a restricted-access version of the NSDUH micro-level data, thus enabling us to create a dichotomous indicator for the implementation of an MML in a given state during a given period. As summarized in Table 1, between 2004 and 2012, MMLs came into effect in ten states at various time points. We assigned the MML-implementation indicator a value of 1 for each full month subsequent to the effective date of the laws, and a value of 0 for the remaining periods and for the control states.19 Control states include eight states that had an MML in place prior to 2004 (i.e., “always MML states”) and those that did not have any MML by the end of 2012 (i.e., “no MML states”).20

19
Note that most previous studies based on annual surveys were only able to estimate year-on-year policy effects. In our study, we linked the NSDUH interview dates with the MML effective dates and matched the month-to-month implementation window of the MMLs with the behavior window of the NSDUH respondents. This approach minimizes the potential measurement error from misclassification of pre-MML and post-MML behaviors.

20
We also estimated two alternative model specifications: the first specification classifies the “always MML states” as the control states, whereas the second specification classifies the “no MML states” as the control states. As shown in Appendix 2,
Table 1
Implementation and key provisions of state medical marijuana laws (MMLs).

                                                Key statutory provisions (date included)
State                 Approved  Effective    Non-specific  Patient   Retail          Home
                      date      date         pain          registry  dispensary(a)   cultivation

2004–2012 (10 states)
Vermont               2004/05   2004/07      2007/07       2004/07   n/a             2004/07
Montana               2004/11   2004/11      2004/11       n/a       n/a             2004/11
Rhode Island          2005/06   2006/01      2006/01       2006/01   2009/07         2006/01
New Mexico            2007/03   2007/07      n/a           2007/07   2007/07         2007/07
Michigan              2008/11   2008/12      2008/12       n/a       n/a             2008/12
New Jersey            2010/01   2010/10(b)   2010/10       2010/10   2010/10         n/a
District of Columbia  2010/05   2010/07      n/a           2010/07   2010/07         n/a
Arizona               2010/11   2011/04      2011/04       2011/04   2011/04         2011/04
Delaware              2011/05   2011/07      2011/07       2011/07   2011/07         n/a
Connecticut           2012/05   2012/05(c)   n/a           2012/05   n/a             n/a

1996–2003 (8 states)
California            1996/11   1996/11      1996/11       n/a       1996/11         1996/11
Washington            1998/11   1998/11      1998/11       n/a       n/a             n/a
Oregon                1998/11   1998/12      1998/12       2007/01   n/a             1998/12
Alaska                1998/11   1999/03      1999/03       1999/03   n/a             1999/03
Maine                 1999/11   1999/12      n/a           2009/12   2009/12         1999/12
Hawaii                2000/06   2000/12      2000/12       2000/12   n/a             2000/12
Colorado              2000/11   2001/06      2001/06       2001/06   2001/06         2001/06
Nevada                2000/11   2001/10      2001/10       2001/10   n/a             2001/10

Note: Dates are given as year/month. Maryland passed two laws in 2003 and in 2011 favorable to medical marijuana, albeit not legalizing it.
(a) Despite the allowance for retail medical marijuana dispensaries under the laws, only four states actually opened their first dispensaries between 2004 and 2012: Colorado (2005/07), New Mexico (2009/06), Maine (2011/04), and New Jersey (2012/12).
(b) The effective date of the New Jersey MML is 2010/07 as specified in the statute, but the state governor, Chris Christie, delayed its implementation.
(c) Most sections of the Connecticut MML came into effect upon its passage (2012/05), while a few sections took effect in 2012/10.
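
The effective dates in Table 1 can be turned into the month-level MML-implementation indicator described in Section 2.2.4. The following is a minimal sketch and not the authors' code. It assumes that each respondent record carries a state name and an interview date, and it reads "each full month subsequent to the effective date" as any calendar month strictly after the effective month listed in Table 1. It does not model the matching of the implementation window to the respondents' behavior window. The names EFFECTIVE_DATES and mml_indicator are hypothetical.

```python
from datetime import date

# Effective (year, month) of the ten MMLs implemented in 2004-2012, from Table 1.
EFFECTIVE_DATES = {
    "Vermont": (2004, 7),
    "Montana": (2004, 11),
    "Rhode Island": (2006, 1),
    "New Mexico": (2007, 7),
    "Michigan": (2008, 12),
    "New Jersey": (2010, 10),
    "District of Columbia": (2010, 7),
    "Arizona": (2011, 4),
    "Delaware": (2011, 7),
    "Connecticut": (2012, 5),
}


def mml_indicator(state: str, interview: date) -> int:
    """Return 1 if the interview month is strictly after the state's MML
    effective month, else 0 (an assumed reading of Section 2.2.4)."""
    effective = EFFECTIVE_DATES.get(state)
    if effective is None:
        # Control states ("always MML" and "no MML") are coded 0 throughout.
        return 0
    # Tuples compare year first, then month.
    return int((interview.year, interview.month) > effective)


if __name__ == "__main__":
    print(mml_indicator("Michigan", date(2008, 12, 15)))  # effective month itself
    print(mml_indicator("Michigan", date(2009, 1, 15)))   # first full month after
    print(mml_indicator("California", date(2010, 6, 1)))  # "always MML" control
```

Under these assumptions, a Michigan interview in December 2008 is coded 0 and one in January 2009 is coded 1, while California is coded 0 in every month because it serves as a control state.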
In addition to examining the effect of implementation of an MML as a whole, Pacula et al. (2013) recognize the importance of scrutinizing the potential heterogeneous effects between individual components of an MML. As highlighted in their study, four key components that may be included in an MML and lead to heterogeneity in the policy effect are: (i) the “non-specific pain” provision, which lists a generic “chronic pain” in the eligible conditions for medical marijuana, rather than specifying diseases causing the pain; (ii) the “patient registry” provision, which requires a patient registry/renewal system; (iii) the “retail dispensary” provision, which allows licensed marijuana retailers to dispense marijuana legally to medical marijuana patients; and (iv) the “home cultivation” provision, which allows qualified patients and caregivers to grow a certain amount of marijuana plants indoors for the patients’ own medical use. Accordingly, we created four indicators, each representing the inclusion of a key MML provision. Note that for an MML state, the inclusion date of an MML provision may differ from the effective date of the MML, as the state may include the provision in the original statute, add it in a subsequent amendment, or not include it in the law until the end of the study period.

2.2.5. Covariates
We controlled for individual-level and state-level factors that are correlated with both the individual choice to use substances and with state decisions about MMLs. Individual-level covariates for adolescents and adults include a rich set of sociodemographic characteristics. State-level covariates include three time-varying measures reflecting the fluctuation in state economic conditions: (i) unemployment rate, (ii) average personal income, and (iii) median household income of the state, as well as two additional measures reflecting relevant changes in state policy environment. One major policy change during the study period concerns state implementation of beer taxes.21 The other policy change is marijuana decriminalization/depenalization: Massachusetts, California, and several cities and counties in other states relaxed penalties for recreational marijuana use or designated it “the lowest law enforcement priority.” We therefore created a dichotomous indicator for the implementation of a decriminalization/depenalization policy in a given state during a given month.22 Table 2 provides a descriptive summary of the individual-level and state-level covariates discussed above.

2.3. Identification strategy
To identify the effect of MML implementation on individual marijuana use, alcohol use, pain medication misuse, and hard drug use, we estimated the following two-way fixed effects models:

$$Y_{ist} = \beta_0 + \beta_1 MML_{st} + \beta_2 X1_{ist} + \beta_3 X2_{st} + \rho_s + \tau_t + \theta_s t + \varepsilon_{ist} \qquad (1)$$

where $i$ denotes an individual, $s$ denotes the state, and $t$ denotes the year. $Y_{ist}$ represents the substance use outcomes. $MML_{st}$ is the policy indicator for the implementation of an MML in a state $s$ during a year $t$. $X1_{ist}$ is the full vector of individual-level covariates. $X2_{st}$ is the full vector of state-level covariates. The two-way fixed effects are captured in our models by $\rho_s$ and $\tau_t$ to account for the time-invariant state heterogeneity as well as the national secular trend

the estimated policy effects on the main outcomes are very similar across the models.
21 We did not control for the market price of heroin or cocaine. The most commonly used source is the U.S. Drug Enforcement Administration’s System to Retrieve Information from Drug Evidence (STRIDE) data set. Empirical studies often find that STRIDE prices are not predictive or only weakly predictive of drug use (Horowitz, 2001). As French and Popovici (2011) pointed out, “part of difficulty here is that conventional prices for illicit drug are not readily available and alternative measures are not yet found.” Nonetheless, fluctuations in heroin prices and cocaine prices are unlikely to be correlated with the MML implementation, thus omitting these variables is unlikely to bias our results.
22 For lack of policy variations during the study period, the effect of a decriminalization/depenalization policy itself cannot be precisely estimated.
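As a rough illustration of the two-way fixed effects structure in Eq. (1), the following Python sketch recovers a policy coefficient on a tiny, made-up balanced state-by-period panel. It is a deliberately simplified linear version and not the estimation used in this study: it omits the individual- and state-level covariates, the state-specific linear trends, the Probit link, and the state-clustered standard errors. The function name `twfe_slope`, the toy states, and all numbers are hypothetical.

```python
from collections import defaultdict


def _group_means(rows, key_index, value_index):
    """Mean of rows[value_index] within each level of rows[key_index]."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row[key_index]] += row[value_index]
        counts[row[key_index]] += 1
    return {key: sums[key] / counts[key] for key in sums}


def twfe_slope(rows):
    """Slope on the policy dummy after removing state and period effects.

    rows: list of (state, period, mml, y) tuples from a BALANCED panel.
    """
    def within(value_index):
        by_state = _group_means(rows, 0, value_index)
        by_period = _group_means(rows, 1, value_index)
        grand = sum(r[value_index] for r in rows) / len(rows)
        # Two-way demeaning: subtract state and period means, add back grand mean.
        return [r[value_index] - by_state[r[0]] - by_period[r[1]] + grand
                for r in rows]

    x, y = within(2), within(3)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical toy data: staggered adoption with a built-in effect of 2.0.
state_effect = {"A": 5.0, "B": 7.0, "C": 6.0}
period_effect = [0.0, 0.5, 1.0, 1.5]
adoption_period = {"A": 2, "B": 3, "C": None}  # state C never adopts

toy_panel = []
for state, alpha in state_effect.items():
    for period, gamma in enumerate(period_effect):
        start = adoption_period[state]
        mml = 1.0 if start is not None and period >= start else 0.0
        toy_panel.append((state, period, mml, alpha + gamma + 2.0 * mml))

estimate = twfe_slope(toy_panel)  # close to 2.0, the effect built into the toy data
```

In a balanced panel, subtracting state means and period means (and adding back the grand mean) gives the same slope as including a full set of state and period dummies, which is why no dummy variables appear in the sketch.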
Table 2
Descriptive summary of individual- and state-level covariates, sampling-weight adjusted.
Columns, left to right: Age 12–20, MML states; Age 12–20, no and always MML states; Age 21+, MML states; Age 21+, no and always MML states.
Each entry is reported as Mean (S.D.).
Panel A: individual-level covariates

# Age 16.0 (2.53) 16.0 (2.56) 48.0 (16.8) 47.5 (16.8)
% Male 51.4 (50.0) 51.2 (50.0) 48.1 (50.0) 48.0 (50.0)
Race/Ethnicity: Non-Hispanic White (ref.)
% Hispanic/Latino 18.8 (39.1) 19.1 (39.3) 13.1 (33.8) 13.3 (33.9)
% Non-Hispanic African Black 12.8 (33.4) 15.0 (35.7) 10.0 (29.8) 11.5 (31.9)
% Non-Hispanic Asian 3.46 (18.3) 4.37 (20.5) 3.63 (18.7) 4.56 (20.8)
% Other Origins 4.15 (19.9) 2.71 (16.2) 2.43 (15.4) 1.87 (13.6)
Self-Reported Health: Excellent (ref.)
% Very Good 42.6 (49.5) 41.5 (49.3) 36.5 (48.1) 35.9 (48.0)
% Good 21.0 (40.7) 21.4 (41.0) 27.1 (44.4) 27.7 (44.8)
% Fair/Poor 4.26 (20.2) 4.15 (19.9) 12.8 (33.5) 13.9 (34.6)
Cigarette Smoking: Non-Smoker (ref.)
% Non-Daily Smoker 11.3 (31.7) 11.2 (31.5) 9.01 (28.6) 8.75 (28.3)
% Daily Smoker 6.75 (25.1) 6.36 (24.4) 15.8 (36.4) 16.1 (36.7)
Health Insurance: Uninsured (ref.)
% Private Health Insurance 65.9 (47.4) 60.9 (48.8) 71.7 (45.0) 68.3 (46.5)
% Medicaid 22.9 (42.0) 24.5 (43.0) 8.59 (28.1) 8.14 (27.3)
% Other Health Insurance 3.15 (17.5) 4.02 (19.6) 7.82 (26.8) 8.84 (28.4)
Family Income: >200% FPL (ref.)
% Living 100–200% FPL 19.8 (39.9) 22.9 (42.0) 17.0 (37.6) 19.3 (39.5)
% Living <100% FPL 19.4 (39.6) 22.1 (41.5) 10.3 (30.4) 12.0 (32.5)
Urbanicity: Non-CBSA (ref.)
% Living in a Micropolitan 8.52 (27.9) 10.0 (30.0) 8.29 (27.6) 10.1 (30.0)
% Living in a Metropolitan 87.6 (33.0) 83.8 (36.8) 87.6 (33.0) 83.3 (37.3)
Marital Status: Married (ref.)
% Never Married 22.5 (41.8) 21.3 (41.0)
% Separated/Divorced 6.91 (25.4) 6.42 (24.5)
% Widowed 13.7 (34.3) 14.4 (35.1)
Education Attainment: College Graduate (ref.)
% Some College 24.9 (43.2) 25.5 (43.6)
% High School Graduate 29.3 (45.5) 29.9 (45.8)
% Less than High School 13.5 (34.2) 15.3 (36.0)
College Enrollment: Not Enrolled (ref.)
% Part-Time Enrolled 3.81 (19.1) 3.31 (17.9)
% Full-Time Enrolled 3.86 (19.3) 4.33 (20.4)
Employment: Full-Time Employed (ref.)
% Part-Time Employed 12.6 (33.2) 12.3 (32.8)
% Unemployed 4.49 (20.7) 4.15 (20.0)
% Not in Labor Force 29.0 (45.4) 29.2 (45.5)
Panel B: state-level covariates

% Unemployment Rate 7.36 (2.60) 6.84 (2.36) 7.36 (2.60) 6.86 (2.36)
$ Average Personal Income (10 K) 4.05 (0.90) 3.86 (0.54) 4.07 (0.90) 3.87 (0.54)
$ Median Household Income (10 K) 5.64 (0.89) 5.25 (0.67) 5.65 (0.90) 5.24 (0.68)
Beer Tax Rates
$ Specific Excise Tax (per gallon) 0.18 (0.07) 0.28 (0.25) 0.18 (0.07) 0.28 (0.25)
% Ad Valorem Tax (on-premises) 0.35 (1.84) 0.76 (3.05) 0.36 (1.88) 0.79 (3.10)
% Ad Valorem Tax (off-premises) 0.14 (1.06) 0.73 (2.97) 0.15 (1.15) 0.75 (3.01)
Number of observations ≈46,700 ≈222,800 ≈56,400 ≈267,500
and common shocks related to substance use. We also included state-specific linear time trends (the state-by-time term in Eq. (1)) to account for the unobserved state-level factors that evolve over time at a constant rate (e.g., social norms and public sentiments related to substance use).
Standard errors were clustered at the state level to correct for serial correlation. The clustered standard errors allow for arbitrary within-state correlation in error terms but assume independence across the states (Bertrand et al., 2004).23

23 It is worth noting that NSDUH employs a multistage (stratified cluster) design for the sample selection. The sampling design elements include survey weights, variance estimation cluster replicates and variance estimation stratum. The descriptive statistics were adjusted for these survey design elements to make the analytic sample representative of the U.S. population. However, in regression analysis, using the STATA “svy” procedure to adjust for the weighting, clustering, and stratification of the NSDUH sampling design would suppress the state-clustering adjustment. When considering the choice between the two, Solon et al. (2013) noted that theoretically “neither strictly dominates the other (in identifying the population average effect)” (Solon et al., 2013, p. 21). Furthermore, in our study, the results from the unweighted, state-clustering adjusted models and the weighted, sampling-design adjusted models were similar (Appendix 3). Therefore, we report the unweighted, state-clustering adjusted estimates.

We stratified the sample into two age groups, adolescents and young adults aged 12–20 (N ≈ 269,500) and adults aged 21 or above (N ≈ 323,900). We chose age 21 as the cut-off point in light of the previous evidence of an age 21 discontinuity in both alcohol use and marijuana use (Crost and Guerrero, 2012; Yörük and Yörük, 2011, 2013). We tested four cut-off points in our analyses: age 18, age 21, age 25 and age 30. Only the age 21 stratification, which
also coincides with the legal drinking age, produces significant and meaningful differences in the estimated policy effect between age groups.
We estimated Probit regressions for the dichotomous dependent variables in our study. The other three discrete dependent variables we study (i.e., the conditional frequency of marijuana use days, the number of alcohol drinks, and the unconditional frequency of binge drinking days) possess positive skewness and/or “excess zeroes” compared to a standard normal distribution, which requires a more flexible estimation approach than ordinary least squares (OLS) estimation. A generalized linear model (GLM) with a gamma distribution and log link24 was estimated for the total amount of drinks during the past month among those aged 21 or above. For the total amount of drinks among those aged 12–20, on the other hand, we estimated a two-part model (TPM) using Probit in the first part and GLM (gamma distribution and log link) in the second part. Because there is an explicit decision process regarding legality of alcohol consumption among those under 21, we use the TPM to model the decision to engage in underage drinking and the quantity consumed conditional upon deciding to engage in underage drinking as separate processes. We followed the same logic when estimating the frequency variables. Considering the underlying decision processes and the proportions of zero values, we estimated a zero-truncated negative binomial regression25 for the conditional frequency of marijuana use days and a zero-inflated negative binomial regression26 for the unconditional frequency of binge drinking days in both age groups.
For ease of interpretation, we converted the coefficient of MMLst in each of the estimations to the average marginal effect calculated at MMLst = 0 and the observed values of other covariates.

3. Results

3.1. Estimated effect of MML implementation on marijuana use

Fig. 1 shows an upward trend in past-month marijuana use rates among adults aged 21 or above in parallel with the implementation of MMLs. A relative increase in adult marijuana use in MML states emerges immediately after the laws take effect, and persists at least three years afterwards. Among adolescents and young adults aged 12–20, however, the corresponding trend in past-month marijuana use rates is not consistent. Bear in mind that the relative trends shown in Fig. 1 are equivalent to unadjusted DD estimates that only partial out the two-way fixed effects (i.e., time-invariant state heterogeneity and national secular trend in past-month marijuana use), but do not adjust for the individual- and state-level covariates or state-specific linear trends. Nonetheless, this observational trend-comparison suggests a potential association between MML implementation and increased current marijuana use among adults aged 21 or above, but not among adolescents and younger adults.
Table 3 presents the marginal effects of MML implementation on the four marijuana use outcomes, adjusted for the two-way fixed effects, the full vector of individual- and state-level covariates, and the state-specific linear trends. Among adults aged 21 or above, the implementation of an MML increases the probability of using marijuana during the past month by 1.32 percentage points (Panel B, Column 1, Row 1). This percentage point change can be translated into a 14 percent relative increase from a baseline predicted marijuana use probability of 9.33 percent.
The NSDUH data do not allow us to distinguish between medical marijuana patients and the non-patient population. Nonetheless, according to the registry data (Anderson et al., 2013), the number of registered medical marijuana patients accounts for an average of 0.8 percent of the population across the five MML states for which the registry information is available. Therefore, the 1.3 percentage point increase in the probability of marijuana use we find among adults aged 21 or above is not likely to come exclusively from an increase in use among registered patients. Though we cannot test this directly, it suggests that there may also be a considerable spillover effect of MML implementation on recreational marijuana use or self-medication by the non-patient population.
Among adults aged 21 or above, we also find a 0.58 percentage point or a 15 percent increase in the probability of almost daily/daily marijuana use (Panel B, Column 2, Row 1) attributable to MML implementation. Among adolescents and young adults aged 12–20, in contrast, no change in the probability or frequency of past-month marijuana use can be attributed to MML implementation (Panel A, Columns 1–3).
With regard to marijuana use initiation during the preceding year, MML implementation leads to a 0.32 percentage point or a 5 percent increase in the probability of first-time marijuana use among adolescents and young adults aged 12–20 (Panel A, Column 4, Row 1). Yet, the lack of a policy effect on the probability and frequency of past-month marijuana use among this age group suggests that many of these young people may be engaging in experimental use with relatively low health, behavioral, and social consequences. In other words, these findings are consistent with a scenario in which adolescents and young adults aged 12–20 who experiment with marijuana use in response to an MML are not transitioning to regular use, at least in the short term.
In contrast to the findings among adolescents and younger adults, we find no change in marijuana use initiation among those aged 21 or above (Panel B, Column 4) as a result of MML implementation, despite the aforementioned significant increases in any past-month marijuana use and almost daily/daily use (Panel B, Columns 1 and 2). These findings suggest that the adults who respond to an MML by increasing current and regular use come largely from those who first tried marijuana long before its medical use was permitted. After the introduction of an MML that helped reduce costs of marijuana use (i.e., market prices as well as non-market health, legal and social consequences), those with prior marijuana use experience would likely reinitiate or increase their marijuana use.

24 The selection of distribution family under the GLM was made based on the modified Park test results.
25 The likelihood ratio test for overdispersion rejects a Poisson distribution in favor of a negative binomial distribution.
26 The likelihood ratio tests for overdispersion reject a Poisson distribution in favor of a negative binomial distribution. Furthermore, the Vuong tests for zero-inflation confirm our choice of a zero-inflated model instead of an ordinary negative binomial model. The zero-inflated Poisson/negative binomial model assumes that the sample consists of two distinct groups of people: one group whose counts are generated by the standard Poisson/negative binomial model, and the other group, the so-called “absolute zero” group, who have zero probability of a count greater than zero; observed zeroes can come from either group (Greene, 2011; Wang, 2003). The absolute zero group, in our case, may be those who abstain from alcohol for religious, cultural, familial or other reasons. Thus, this group of people, as distinct from the majority of people who drink alcohol at least occasionally, have “absolute zero” risk of binge drinking.
An alternative to a zero-inflated regression is a hurdle model (i.e., a TPM for counts) with first-part Probit and second-part zero-truncated negative binomial. A practical challenge, however, is that cluster-adjusted standard errors are difficult to compute when combining the first- and second-part estimates from a hurdle model (Belotti et al., 2014). Nonetheless, the point estimates for the combined effects we obtained from the hurdle models (not shown) were very similar to the zero-inflated negative binomial estimates from our main analyses. In another set of sensitivity analyses, we also treated the count variables as continuous and estimated the combined marginal effects and their cluster-adjusted standard errors using the STATA command “TPM” (Belotti et al., 2014). The TPM estimates (not shown) were slightly larger and more significant than the zero-inflated negative binomial estimates.
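The conversion from a Probit coefficient to a baseline predicted mean, a marginal effect, and a relative change can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it treats the marginal effect of the binary policy indicator as the average discrete change in predicted probability when the indicator moves from 0 to 1 (the text does not say whether a derivative or a discrete change was used), and the function names and the linear-index values in the usage lines are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def baseline_and_effect(index_without_policy, beta_policy):
    """Baseline predicted mean and average discrete-change effect.

    index_without_policy: each person's Probit linear index with the
        policy indicator set to 0 and other covariates at observed values.
    beta_policy: Probit coefficient on the policy indicator.
    """
    base = [normal_cdf(v) for v in index_without_policy]
    treated = [normal_cdf(v + beta_policy) for v in index_without_policy]
    n = len(base)
    baseline_mean = sum(base) / n
    average_effect = sum(t - b for t, b in zip(treated, base)) / n
    return baseline_mean, average_effect


def relative_change_percent(effect, baseline):
    """Express an absolute effect as a percentage of the baseline."""
    return 100.0 * effect / baseline


# Hypothetical linear indices and coefficient, for illustration only.
baseline, effect = baseline_and_effect([-1.4, -1.2, -1.0], 0.08)

# Reported adult estimates: 1.32 percentage points on a 9.33 percent baseline.
adult_relative = relative_change_percent(1.32, 9.33)  # about 14.1 percent
```

The last line reproduces the arithmetic behind the 14 percent relative increase quoted in the text; the first usage line only shows the calling convention.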
Fig. 1. Pre- and post-trend in past-month marijuana use rates in medical marijuana law (MML) states relative to the control states. Note: The differences in past-month marijuana use rates are equivalent to unadjusted difference-in-differences (DD) estimates that partial out the two-way fixed effects but do not adjust for individual- and state-level covariates or state-specific linear trends. Time 0 is centered at the period when each MML state started to implement its law, so time 1 represents the first full month subsequent to the effective date of an MML. We calculate the differences between each MML state and the control states during each month, and average them across all 10 MML states and over a 3-month period (to smooth the fluctuations in the monthly rate). Whiskers indicate 95% confidence intervals.
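The averaging described in the note to Fig. 1 can be sketched in Python as below. This is only an illustration of the re-centering and 3-month smoothing: it computes raw gaps between an MML state and the control-state average and does not partial out the fixed effects or produce confidence intervals. The function name, the data layout, and the rule that assigns event months to 3-month bins by floor division are assumptions, and the rates are made up.

```python
from collections import defaultdict


def event_time_gaps(mml_rates, control_rates, effective_month, window=3):
    """Average MML-minus-control gap by event-time bin.

    mml_rates: {state: {calendar_month: rate}} for MML states.
    control_rates: {calendar_month: average rate in control states}.
    effective_month: {state: calendar_month in which the law took effect}.
    Returns {bin: mean gap}, where bin = event_month // window, so bin 0
    covers event months 0..window-1 and bin -1 the window just before.
    """
    gaps = defaultdict(list)
    for state, series in mml_rates.items():
        for month, rate in series.items():
            if month not in control_rates:
                continue  # no comparison available for this month
            event_month = month - effective_month[state]
            gaps[event_month // window].append(rate - control_rates[month])
    return {b: sum(v) / len(v) for b, v in sorted(gaps.items())}


# Made-up example: one MML state whose law takes effect in month 3.
example = event_time_gaps(
    mml_rates={"X": {0: 10.0, 1: 10.0, 2: 10.0, 3: 12.0, 4: 12.0, 5: 12.0}},
    control_rates={m: 9.0 for m in range(6)},
    effective_month={"X": 3},
)
```

Pooling all state-months that fall in the same bin weights every state-month equally; averaging within state first and then across states would be an equally plausible reading of the note.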
3.2. Estimated effect of MML implementation on alcohol use

To the extent that alcohol is a complement or substitute to marijuana, the effect of MML implementation on marijuana use may spread to alcohol use (Table 4). Our estimates indicate that, among adults aged 21 or above, MML implementation is not associated with the total number of drinks (Panel B, Column 1), but is positively associated with the frequency of binge drinking. Our estimates indicate an effect size of 0.16 more binge drinking days or a relative increase of 10 percent (Panel B, Column 2, Row 1). The spillover increase in binge drinking implies a complementary relationship between marijuana use and high-dose alcohol consumption among adults aged 21 or above. Not only is this contemporaneous complementarity reflected in the independent measures of marijuana use and binge drinking, it is further confirmed by the measure of simultaneous use of the two substances. Among adults aged 21 or above, we find a 1.44 percentage point or a 22 percent increase in the probability of both marijuana use and binge drinking during the past month (Panel B, Column 3, Row 1) and a 0.82 percentage point or an 18 percent increase in the probability of marijuana use while drinking (i.e., on the same occasion) as a result of MML implementation (Panel B, Column 4, Row 1).
Among adolescents and young adults aged 12–20, we find no significant change in any measure of alcohol use (Panel A), which suggests that the increased marijuana use initiation we reported previously is unlikely to spread to underage drinking.

3.3. Immediate and delayed effect of MML implementation on other downstream outcomes

In addition to marijuana use and binge drinking, MML implementation may have a spillover effect on marijuana abuse/dependence, alcohol abuse/dependence, non-medical use of prescription pain medication, and the use of hard drugs such as heroin and cocaine. The progression from marijuana use and binge drinking to these downstream outcomes may be a gradual transition (Wagner and Anthony, 2002b). As such, we estimated not only the contemporary policy effect but also the one-year and two-year lagged policy effect (Table 5).
The effect arguably most salient to the public health implications of MMLs is the effect on marijuana abuse/dependence among adults aged 21 or above. We found a delayed policy effect on increasing the probability of marijuana abuse/dependence by a relative 10 percent (Panel B, Column 1, Rows 2 and 3). The
Table 3
Estimated marginal effect of implementation and provisions of medical marijuana laws (MMLs) on marijuana use.
Marijuana use outcomes (1) (2) (3) (4)

% past-month % marijuana Cond. # marijuana % marijuana use
marijuana use daily/almost daily use use days initiation
Panel A: age 12–20

MML implementation −0.43 (0.48) −0.25 (0.17) −0.28 (0.45) 0.32** (0.16)
MML provisions
∼ Non-specific pain −0.05 (0.41) −0.46 (0.28) −0.74 (0.44) 0.43 (0.26)
∼ Patient registry −0.74 (0.63) −0.14 (0.27) 0.28 (0.36) 0.04 (0.23)
∼ Retail dispensary 0.89** (0.34) −0.20 (0.36) −0.46 (0.59) 0.45 (0.33)
∼ Home cultivation 0.12 (0.61) 0.43 (0.27) 0.93 (0.56) 0.18 (0.24)
Baseline predicted mean [10.68] [3.52] [12.29] [6.47]

Panel B: age 21+

MML Implementation 1.32** (0.58) 0.58** (0.26) 0.17 (0.64) 0.15 (0.23)
MML provisions
∼ Non-specific pain 1.56** (0.73) 0.86** (0.42) 0.28 (0.88) 0.31 (0.78)
∼ Patient registry −0.45 (0.73) −0.35 (0.52) 0.55 (0.76) −0.05 (0.44)
∼ Retail dispensary −0.12 (0.79) −0.09 (0.64) −0.67 (0.85) 0.07 (0.53)
∼ Home cultivation 0.55 (0.76) −0.10 (0.41) −0.48 (0.76) 0.02 (0.77)

Note:
Standard errors in parentheses are clustered at the state level.
Baseline predicted mean is calculated as the average of predicted probabilities/counts when setting MMLst to 0 and leaving the other covariates as the observed values.
*Significant at the 10 percent level.
**Significant at the 5 percent level.
***Significant at the 1 percent level.
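The significance markers can be related to the reported estimates and standard errors with a simple Wald-type check, sketched below in Python. It assumes a two-sided test against the standard normal distribution; the reference distribution behind the published stars is not stated here, and the table entries are rounded, so borderline cases may not match the printed markers. The function names are hypothetical.

```python
import math


def two_sided_p(estimate, std_error):
    """Two-sided p-value for estimate/std_error under a standard normal."""
    z = abs(estimate / std_error)
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def stars(estimate, std_error):
    """Map a p-value to the 10/5/1 percent markers used in the tables."""
    p = two_sided_p(estimate, std_error)
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# Table 3, Panel B, Column 1: MML implementation, 1.32 (0.58).
adult_past_month = stars(1.32, 0.58)
# Table 3, Panel A, Column 1: MML implementation, -0.43 (0.48).
youth_past_month = stars(-0.43, 0.48)
```

With these inputs the adult estimate falls below the 5 percent threshold and the youth estimate does not reach the 10 percent threshold, in line with the markers shown for those two cells.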
Table 4
Estimated marginal effect of implementation and provisions of medical marijuana laws (MMLs) on alcohol use.
Alcohol use outcomes (1) (2) (3) (4)

# past-month total # binge drinking % marijuana use % marijuana use
alcohol drinks days and binge drinking while drinking

Panel A: age 12–20

MML implementation −0.03 (1.74) 0.04 (0.18) −0.63 (0.39) −0.38 (0.49)
MML provisions
∼ Non-specific pain −0.54 (2.86) 0.03 (0.03) −0.59 (0.40) 0.07 (0.67)
∼ Patient registry −0.53 (2.51) −0.05 (0.04) −0.39 (0.80) −0.42 (0.63)
∼ Retail dispensary 0.65 (2.04) 0.06 (0.06) 0.54 (0.47) 0.87 (0.74)
∼ Home cultivation 0.52 (3.26) −0.01 (0.04) −0.03 (0.56) −0.65 (0.64)

Panel B: age 21+

MML implementation 0.95 (1.18) 0.16** (0.08) 1.44*** (0.35) 0.82* (0.45)
MML provisions
∼ Non-specific pain 0.65 (1.56) 0.19* (0.10) 1.03** (0.49) 1.23** (0.59)
∼ Patient registry 0.27 (1.22) −0.17 (0.11) −0.10 (0.57) −0.33 (0.55)
∼ Retail dispensary 0.47 (1.18) 0.07 (0.06) 0.21 (0.58) −0.03 (0.66)
∼ Home cultivation 0.18 (1.52) 0.13 (0.09) 0.20 (0.62) −0.57 (0.74)

Note:
Baseline predicted mean is calculated as the average of predicted probabilities/counts when setting MMLst to 0 and leaving the other covariates as the observed values.
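*Significant at the 10 percent level.
**Significant at the 5 percent level.
***Significant at the 1 percent level.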
increase in marijuana abuse/dependence of such magnitude is of concern. It suggests that those who used marijuana in response to MML implementation are at high risk of progressing to abuse/dependence.
For both age groups, we found neither an immediate nor a delayed effect of MML implementation on other downstream outcomes including alcohol abuse/dependence, non-medical use of prescription pain medication, heroin use and cocaine use.

3.4. Policy heterogeneity between key MML provisions

Our main estimates, in essence, capture the average policy effect across all ten MMLs implemented between 2004 and 2012. However, the policy effect of each of these laws may not necessarily have the same magnitude or even the same direction. As noted by Pacula et al. (2013), four key MML provisions, namely the ambiguity in “non-specific pain”, the requirement for patient registry/renewal
Table 5
Estimated immediate and delayed marginal effect of implementation of medical marijuana laws (MMLs) on marijuana abuse/dependence, alcohol abuse/dependence,
prescription pain medication misuse, cocaine use, and heroin use.
Downstream outcomes (1) (2) (3) (4) (5)

% marijuana % alcohol % prescription % cocaine use % heroin use
abuse/dependence abuse/dependence painkiller misuse

Panel A: age 12–20

MML contemporary −0.07 (0.34) −0.12 (0.52) −0.05 (0.22) 0.03 (0.14) 0.008 (0.06)
MML lags
∼ 1-Year lag −0.10 (0.24) −0.22 (0.54) 0.03 (0.35) 0.01 (0.23) −0.01 (0.07)
∼ 2-Year lag 0.03 (0.27) −0.28 (0.34) −0.08 (0.42) −0.01 (0.18) −0.05 (0.11)
Baseline predicted mean [4.59] [7.77] [8.26] [2.41] [0.26]

Number of observations ≈269,500 ≈269,500 ≈269,500 ≈269,500 ≈269,500
Panel B: age 21+

MML contemporary 0.19 (0.13) 0.65 (0.44) −0.02 (0.39) 0.06 (0.18) 0.01 (0.11)
MML lags
∼ 1-Year lag 0.25** (0.11) 0.37 (0.35) −0.05 (0.21) −0.11 (0.20) 0.007 (0.08)
∼ 2-Year lag 0.23* (0.12) 0.25 (0.55) −0.09 (0.14) −0.09 (0.21) 0.005 (0.09)
Baseline predicted mean [2.30] [10.87] [6.65] [3.28] [0.32]

Number of observations ≈323,900 ≈323,900 ≈323,900 ≈323,900 ≈323,900
Note:
Baseline predicted means in square brackets are calculated as the average of predicted probabilities/counts when setting MMLst to 0 and leaving the other covariates as the
observed values.
*Significant at the 10 percent level.
**Significant at the 5 percent level.
system, the allowance for retail dispensaries, and the permission for home cultivation, may have different implications for people’s marijuana use behavior. Specifically, the “patient registry” provision may in effect reduce marijuana use in the general population. This protective effect of the “patient registry” provision, however, can be offset by the effect of the “retail dispensary” provision, which increases marijuana use significantly. In contrast to Pacula et al. (2013), our study finds no consistent protective or offsetting effect in either provision (Tables 3 and 4, Panels A and B, Rows 2–4). A plausible explanation is the discrepancy between the time when a “patient registry” provision or a “retail dispensary” provision was included in a state’s MML and the time when the state’s registry/renewal system or its legal dispensaries actually began to operate (Anderson and Rees, 2014). Due to the controversy and complexity surrounding its implementation, the time lag between the effective date of a “retail dispensary” provision and the actual opening of the first medical marijuana store may be particularly long.27 Although we find no consistent effect of “patient registry” or “retail dispensary”, we observe a consistent and significant effect of the “non-specific pain” provision on increasing marijuana use, binge drinking and simultaneous use of marijuana and alcohol among adults aged 21 or above. The observed effect of the “non-specific pain” provision suggests that including a generic term “chronic pain” in the eligible conditions for medical marijuana may extend the patient base to adults with less severe conditions or possibly those who pretend to be pain patients. Nonetheless, considering the limited policy variations across the four MML provisions during our study period, the estimated individual effects of these provisions should be interpreted with caution.28

27 Anderson and Rees (2014) pointed out that, for instance, Colorado included a “retail dispensary” provision in its original MML effective in 2001, but medical marijuana dispensaries did not become commonplace until 2009. Moreover, Maine and Rhode Island added “retail dispensary” provisions to their MMLs in 2009, but the first legal dispensary in Maine did not open until 2011 and the first Rhode Island dispensary did not open until 2013.
28 From a statistical standpoint, a substantial policy effect from one or two states could potentially account for the overall findings. We tested for the heterogeneous policy effect between states by replacing the single indicator for MML implementation with ten separate indicators for MML implementation in each of the MML states. We find, in most cases, across-the-board significant policy effects in the same direction, albeit with varied effect sizes (Appendix 4). We cannot come to a conclusion, therefore, as to whether the heterogeneous policy effect comes from states’ unique experiences with implementing the MMLs or their inclusion/exclusion of certain provisions.

3.5. Policy endogeneity of MML adoption

There is a geographic concentration of MMLs: states that have adopted MMLs are all in the West and Northeast. This geographic similarity raises concern that there may be some past disturbances in marijuana use in these regions leading to their adoption of MMLs and not accounted for by the state fixed effects and the state-specific linear trends. In other words, MML adoption may be endogenous to marijuana use. To check for this potential policy endogeneity, specifications with a series of lagged and leading indicators for adopting an MML were estimated for the probability of past-month marijuana use (Table 6). We find that only the contemporary and 6-month lagged policy indicators had significant effects, and the indicators for approved but not implemented MMLs and the 12-month policy lag had moderate albeit imprecisely estimated effects. All the leads had small and statistically insignificant effects (Panel B, Column 2). These estimates suggest that it is in fact the policy shock from adopting an MML that drives the changes in marijuana use, rather than some past disturbances in marijuana use that drive the adoption of an MML.

3.6. State-aggregate effect of MML implementation

To further check the robustness of our individual-level estimates with regard to serial correlation, we aggregated the data to the state level and estimated the effect of MML implementation on state-level prevalence rates of our main individual-level findings.29

29 In Columns 1 and 3 of Table 7, we clustered the standard errors at the state level; while in Columns 2 and 4, we removed the time-series information from the standard errors by averaging the pre-MML data and the post-MML data (Donald and
Table 6
Robustness check for policy endogeneity by including policy leads and lags.
(1) (2)
% past-month marijuana use % past-month marijuana use

Panel A: age 12–20

MML contemporary −0.43 (0.48) −0.81 (0.63)
MML leads and lags
∼ 24-Month lead (before approval) 0.27 (0.37)
∼ 18-Month lead −0.25 (0.40)
∼ 12-Month lead 0.14 (0.93)
∼ 6-Month lead 0.57 (0.82)
∼ Approved NOT implemented 0.62 (0.69)
∼ 6-Month lag (after implementation) −0.04 (0.34)
∼ 12-Month lag −0.44 (0.68)
∼ 18-Month lag −0.30 (0.64)
∼ 24-Month lag 0.47 (0.54)
Baseline predicted mean [10.68] [10.68]

Number of observations ≈269,500 ≈269,500
Panel B: age 21+

MML contemporary 1.32** (0.58) 1.02** (0.46)
MML leads and lags
∼ 24-Month lead (before approval) 0.20 (0.71)
∼ 18-Month lead −0.36 (0.64)
∼ 12-Month lead 0.18 (0.41)
∼ 6-Month lead 0.24 (0.55)
∼ Approved NOT implemented 0.52 (0.39)
∼ 6-Month lag (after implementation) 0.73* (0.38)
∼ 12-Month lag 0.41 (0.34)
∼ 18-Month lag 0.04 (0.47)
∼ 24-Month lag 0.11 (0.64)
Baseline predicted mean [9.33] [9.33]

Number of observations ≈323,900 ≈323,900
Note:
Baseline predicted mean in square brackets is calculated as the average of predicted probabilities/counts when setting MMLst to 0 and leaving the other covariates as the
observed values.
*Significant at the 10 percent level.
**Significant at the 5 percent level.
The previously highlighted policy effects on youth marijuana use initiation, as well as on adult past-month marijuana use, marijuana almost daily/daily use, marijuana abuse/dependence, past-month binge drinking, and simultaneous use of marijuana and alcohol remain significant with similar effect size in these state-level estimates (Table 7).

4. Discussion

Three main pieces of evidence from our study inform the policy discussions of MMLs. First, we find a significant effect of MML implementation on increasing marijuana use. Estimates suggest that the populations responsive to MMLs are adolescents and young adults aged 12–20 who experimented with marijuana for the first time and adults aged 21 or above who tried marijuana prior to the introduction of the law. This latter group also has an increased risk of progression to almost daily/daily marijuana use and marijuana abuse/dependence.30 We caution that even if we assume the increases in any marijuana use and regular use come from those who use the drug for legitimate medical purposes, there may still be a possibility that marijuana abuse/dependence would increase as a result of MML implementation. The effect of MML implementation on marijuana abuse/dependence constitutes a potential public health concern similar to that of the prescription drug abuse epidemic in the U.S. (CDC, 2012).

Second, among those aged 21 or above, we find a spillover effect of MML implementation on the increasing frequency of binge drinking, possibly through increased use of the two substances simultaneously. The complementarity between marijuana use and binge drinking among adults of legal drinking age could magnify the expected harms of an MML. As Pacula and Sevigny (2014) commented, “even if consumption (of marijuana) were assumed to rise by 100 percent, the savings of liberalization policies would dwarf the known health costs associated with using marijuana. However, all potential savings . . . could be entirely erased, and tremendous losses incurred, if alcohol and marijuana turn out to be economic complements.” The 10 percent increase in the frequency of binge drinking and the 18–22 percent increase in the probability of
Lang, 2007). We followed a two-step procedure described in Bertrand et al. (2004, p. 267) to accommodate staggered adoption of the MMLs across states. As a result, the data were collapsed into pre- and post-MML two periods across 7 MML states. The standard errors were adjusted to take into account the smaller number of MML states (Donald and Lang, 2007).
30 A diagnosis of substance abuse/dependence, by definition, indicates that an individual is experiencing a cluster of psychological, physical, cognitive, and behavioral symptoms associated with substance use. The DSM-IV considers marijuana abuse and marijuana dependence to be valid psychiatric disorders, and marijuana abuse/dependence as experienced in clinical population and general population appears very similar to other substance abuse/dependence disorders (Budney et al., 2007).
Table 7
Robustness check for serial correlation by examining state-aggregated data.
State-aggregated rates Age 12–20 Age 21+
(1) (2) (3) (4)

State-cluster 2-Period panelsa State-cluster 2-Period panelsa
Marijuana use outcomes

% past-month marijuana use −0.63 (0.65) −0.33 (0.50) 1.34** (0.52) 1.17** (0.56)
[11.81] [10.86] [9.40] [8.61]
% marijuana almost daily/daily use −0.22 (0.26) −0.10 (0.24) 0.56** (0.22) 0.51** (0.29)
[3.57] [3.30] [3.81] [3.45]
% marijuana use initiation 0.28* (0.17) 0.29** (0.12) 0.14 (0.09) 0.11 (0.07)
[6.85] [6.28] [0.94] [0.86]
Alcohol use outcomes

# past-month drinks per capita 0.05 (0.79) −0.06 (0.74) 0.69 (0.65) 0.72 (0.82)
[8.28] [7.62] [19.02] [18.38]
# binge drinking days per capita 0.01 (0.03) 0.02 (0.05) 0.18** (0.08) 0.12*** (0.04)
[0.72] [0.63] [1.54] [1.37]
% past-month marijuana use −0.54 (0.37) −0.65* (0.39) 1.22*** (0.38) 1.29*** (0.45)
and binge drinking [7.45] [6.65] [6.51] [5.92]
% marijuana use while −0.24 (0.43) −0.15 (0.25) 0.63** (0.31) 0.61* (0.35)
drinking [4.65] [4.12] [4.45] [3.52]
Other downstream outcomes

% marijuana abuse/dependence −0.26 (0.45) −0.16 (0.39) 0.35** (0.18) 0.41** (0.20)
(1-Year lag) [4.89] [4.61] [2.27] [2.15]
% marijuana abuse/dependence −0.15 (0.44) −0.02 (0.36) 0.34* (0.20) 0.26 (0.17)
(2-Year lag) [4.89] [4.60] [2.29] [2.16]
% alcohol abuse/dependence −0.10 (0.47) −0.04 (0.44) 0.67 (0.51) 0.49 (0.41)
(1-Year lag) [8.24] [8.11] [11.02] [10.73]
% alcohol abuse/dependence −0.26 (0.46) −0.28 (0.48) 0.21 (0.53) 0.28 (0.57)
(2-Year lag) [8.22] [8.10] [11.03] [10.74]
% prescription painkiller misuse 0.04 (0.50) −0.04 (0.17) −0.08 (0.36) −0.10 (0.26)
(1-Year lag) [8.75] [8.51] [6.76] [6.47]
% crack/cocaine use −0.01 (0.20) 0.01 (0.15) 0.02 (0.16) 0.01 (0.21)
(1-Year lag) [2.84] [2.58] [3.22] [2.96]
% heroin use −0.01 (0.09) −0.02 (0.08) −0.008 (0.11) 0.005 (0.10)
(1-Year lag) [2.84] [2.58] [3.22] [2.96]
Note:
observed values.
a
We average the pre-MML data and the post-MML data (Donald and Lang, 2007) following a two-step procedure described in Bertrand et al. (2004, p. 267). The second-step
equation is estimated based on pre- and post-MML two-period panels of 10 “MML states”. The standard errors are adjusted to take into account the small number of “MML
states” (Donald and Lang, 2007).
*
**
***
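The collapsing logic behind note a can be sketched as follows. This is a simplified illustration and not the authors' code: it assumes a balanced state-year panel, uses only state and year fixed effects in the first step (no other covariates), and relies on made-up data. The function name `two_step_effect` and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def two_step_effect(panel, adoption):
    """panel: {(state, year): outcome} for a balanced state-year panel.
    adoption: {state: first post-law year}, for law-adopting states only."""
    states = sorted({s for s, _ in panel})
    years = sorted({t for _, t in panel})
    grand = mean(panel.values())
    state_mean = {s: mean(panel[s, t] for t in years) for s in states}
    year_mean = {t: mean(panel[s, t] for s in states) for t in years}

    # Step 1: residuals from state and year fixed effects, with no policy term.
    # In a balanced panel this equals two-way demeaning.
    resid = {(s, t): panel[s, t] - state_mean[s] - year_mean[t] + grand
             for s in states for t in years}

    # Step 2: for adopting states only, average residuals before and after
    # the law, giving a two-period panel with one pre and one post value.
    diffs = []
    for s, first_year in adoption.items():
        pre = [resid[s, t] for t in years if t < first_year]
        post = [resid[s, t] for t in years if t >= first_year]
        diffs.append(mean(post) - mean(pre))

    # With two periods, regressing on a post dummy with state effects is
    # equivalent to averaging the within-state differences.
    effect = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))
    return effect, se


# Hypothetical data: six states, 2004-2011, two adopters, true shift of 1.0.
levels = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0, "F": 6.0}
adoption = {"A": 2007, "B": 2009}
panel = {}
for s, level in levels.items():
    for t in range(2004, 2012):
        treated = 1 if s in adoption and t >= adoption[s] else 0
        panel[s, t] = level + 0.1 * (t - 2004) + 1.0 * treated

effect, se = two_step_effect(panel, adoption)
print(f"collapsed estimate: {effect:.3f}, standard error: {se:.3f}")
```

Because the first step here omits the policy term, the fixed effects absorb part of any policy shift, so the sketch is meant to show how staggered adoption dates are collapsed into pre and post periods rather than to serve as a ready-made estimator. Inference with few adopting states would additionally use a t distribution with few degrees of freedom, in the spirit of Donald and Lang (2007).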
simultaneous marijuana and alcohol use31 that we estimated may result in considerable economic and social costs from downstream health care expenditures and productivity loss (Naimi et al., 2003). It is worth noting that this implied complementarity between marijuana use induced by an MML and binge drinking does not necessarily contradict a conclusion made by Anderson et al. (2011, 2012) that the implementation of an MML results in reduced traffic fatalities, and that the reduction is more pronounced in those involving alcohol. A possible interpretation that may reconcile our findings with theirs is that MML implementation may lead to a shift of alcohol consumption from public places such as restaurants and bars to one’s own home. Thus, we may see a reduction in the traffic fatalities, even if the implementation of an MML, in effect, increases binge drinking and simultaneous use of both alcohol and marijuana. The reduced traffic fatalities may result from the fact that those potential high-risk drivers are now more likely to stay at home and less likely to engage in driving.

Third, neither underage drinking among those aged 12–20 nor other substance use (i.e., non-medical use of prescription pain medication, heroin use and cocaine use) in either age group is affected by MML implementation. In this regard, the often-voiced concerns about the potential gateway effect of marijuana are not supported by our findings. We caution that our study is not intended to refute the gateway hypothesis. Rather it suggests that the gateway effect is not likely to occur in the context of an MML: for those who respond to MML implementation and use marijuana, their marijuana use is not likely to act as a gateway to more dangerous substance use through the pharmacological properties of marijuana.32 On the other hand, our findings do not lend support to an area of potential benefits of the law either, which is to benefit those who misuse opioid pain medication by helping them ease opiate withdrawal
31 The interaction between marijuana and alcohol may magnify the risks posed by the two substances individually (Liguori et al., 2002; Medina et al., 2007).
32 Nonetheless marijuana may still be a gateway drug for other marijuana users through other pathways. For instance, those who use marijuana regardless of the laws or those who use marijuana in response to decriminalization may progress to hard drug use because marijuana introduces them to a shared market or subculture of hard drugs.
symptoms and achieve success in early recovery. However, NSDUH only includes questions about “non-medical use” of pain medication, so we cannot examine the effect of MML implementation on patients who use pain medication according to the prescription. The previously documented beneficial effect of an MML on reducing opioid overdose mortality may primarily come from this group of legitimate pain patients.33 An MML may benefit these patients by allowing them to start with medical marijuana treatment in lieu of opioid pain medication or to switch partially or entirely from opioids to marijuana. Whether and to what extent the legitimate pain patients may benefit from MML implementation merits further investigation, but is beyond the scope of our study.

Taken together, our study findings provide evidence for a significant effect of MML implementation on increasing marijuana use, and a spillover effect among adults of legal drinking age from increased marijuana use to increased binge drinking. The findings do not, however, provide evidence to support other types of substance use spillovers such as underage drinking, pain medication misuse, and hard drug use.

Appendix 1. DSM-IV criteria for substance abuse and substance dependence
Substance dependence
A maladaptive pattern of substance use leading to clinically significant impairment or distress, as manifested by 3 or more of the following occurring at any time in the same 12-month period:
1. Tolerance or markedly increased amounts of the substance to achieve intoxication or desired effect or markedly diminished effect with continued use of the same amount of substance.
2. Characteristic withdrawal symptoms or the use of certain substances to relieve or avoid withdrawal symptoms.
3. Use of a substance in larger amounts or over a longer period than was intended.
4. Persistent desire or unsuccessful efforts to cut down or control substance use.
5. Involvement in chronic behavior to obtain or use the substance, or recover from its effects.
6. Important social, occupational or recreational activities given up or reduced due to substance use.
7. Continued substance use despite knowledge of a persistent or recurrent physical or psychological problem that is likely to have been caused or exacerbated by the substance.

Substance abuse
A maladaptive pattern of substance use leading to clinically significant impairment or distress, as manifested by 1 or more of the following occurring at any time in the same 12-month period:
1. Recurrent substance use resulting in a failure to fulfill major role obligations at work, school, or home (e.g., repeated absences or poor work performance related to substance use; substance-related absences, suspensions, or expulsions from school; neglect of children or household).
2. Recurrent substance use in physically hazardous situations (e.g., driving an automobile or operating a machine when impaired by substance use).
3. Recurrent substance-related legal problems (e.g. arrests for obtaining or using the substance, substance-related disorderly conduct).
4. Continued substance use despite persistent or recurrent social and interpersonal problems caused or exacerbated by the substance (e.g., arguments with spouse about consequences of intoxication, physical fights).
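The counting rule in Appendix 1 (3 or more of 7 dependence criteria, or 1 or more of 4 abuse criteria, within the same 12-month period) can be expressed as a small function. This is an illustrative sketch only: the function name `classify` and the combined `abuse_or_dependence` flag are assumptions made here, and the code does not reproduce how NSDUH actually codes its survey items.

```python
DEPENDENCE_CRITERIA = 7   # number of dependence criteria listed in Appendix 1
DEPENDENCE_THRESHOLD = 3  # "3 or more of the following"
ABUSE_CRITERIA = 4        # number of abuse criteria listed in Appendix 1
ABUSE_THRESHOLD = 1       # "1 or more of the following"


def classify(dependence_flags, abuse_flags):
    """Each argument is a sequence of booleans, one per criterion, all
    referring to the same 12-month period."""
    if len(dependence_flags) != DEPENDENCE_CRITERIA:
        raise ValueError("expected 7 dependence criteria")
    if len(abuse_flags) != ABUSE_CRITERIA:
        raise ValueError("expected 4 abuse criteria")
    dependence = sum(bool(f) for f in dependence_flags) >= DEPENDENCE_THRESHOLD
    abuse = sum(bool(f) for f in abuse_flags) >= ABUSE_THRESHOLD
    return {
        "dependence": dependence,
        "abuse": abuse,
        # Hypothetical combined indicator mirroring an "abuse/dependence" outcome.
        "abuse_or_dependence": dependence or abuse,
    }


# Hypothetical respondent: two dependence criteria and one abuse criterion met.
result = classify(
    dependence_flags=[True, False, True, False, False, False, False],
    abuse_flags=[False, True, False, False],
)
print(result)
```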
33 More than 60 percent of the opioid pain medication users receive and take the drug according to the prescription (Bachhuber et al., 2014).
Appendix 2. Robustness check for alternative classification of control states
Main outcomes Age 12–20 Age 21+
(1) (2) (3) (4) (5) (6)

MML states vs. ALL MML states vs. MML states vs. MML states vs. ALL MML states vs. MML states vs.
other states always MML states no MML states other states always MML states no MML states
% past-month −0.43 (0.48) −0.52 (0.48) −0.34 (0.43) 1.32** (0.58) 1.22** (0.60) 1.37** (0.59)
marijuana use [10.68] [12.75] [10.39] [9.33] [10.57] [8.90]
% marijuana almost −0.25 (0.17) −0.28 (0.18) −0.21 (0.15) 0.58** (0.26) 0.50** (0.24) 0.62** (0.26)
daily/daily use [3.52] [4.29] [3.40] [3.78] [4.43] [3.56]
Cond. # marijuana use −0.28 (0.45) −0.28 (0.48) −0.23 (0.47) 0.17 (0.64) 0.08 (0.61) 0.29 (0.65)
days [12.29] [12.37] [12.25] [14.15] [14.36] [14.02]
% marijuana use 0.32** (0.16) 0.34** (0.13) 0.31** (0.15) 0.15 (0.23) 0.18 (0.28) 0.15 (0.22)
initiation [6.47] [7.14] [6.32] [0.92] [1.14] [0.95]
# past-month total −0.03 (1.57) −0.06 (1.31) −0.01 (1.46) 0.95 (1.18) 0.90 (1.11) 0.99 (1.20)
drinks [7.76] [7.70] [7.83] [18.69] [18.62] [18.75]
# binge drinking days 0.04 (0.18) 0.01 (0.14) 0.05 (0.18) 0.16** (0.08) 0.14** (0.06) 0.16* (0.09)
[0.66] [0.67] [0.66] [1.52] [1.48] [1.53]
% past-month marijuana −0.63 (0.39) −0.73 (0.46) −0.57 (0.36) 1.44*** (0.35) 1.41*** (0.38) 1.50*** (0.37)
use and binge drinking [6.41] [7.49] [6.30] [6.44] [7.09] [6.21]
% marijuana use while −0.38 (0.49) −0.42 (0.61) −0.34 (0.47) 0.82* (0.45) 0.71 (0.53) 0.84* (0.46)
drinking [4.10] [5.03] [4.01] [4.45] [5.17] [4.24]
Number of observations ≈269,500 ≈86,700 ≈229,400 ≈323,900 ≈104,400 ≈275,800

Note:
observed values.
*
**
***
Appendix 3. Robustness check for sampling design-based adjustment
Main outcomes Age 12–20 Age 21+
(1) (2) (3) (4)

Unweighted state-cluster Weighted sampling-design Unweighted state-cluster Weighted sampling-design
% past-month marijuana use −0.43 (0.48) −0.47 (0.65) 1.32** (0.58) 1.19*** (0.40)
[10.68] [11.28] [9.33] [8.50]
% marijuana almost daily/daily −0.25 (0.17) −0.28 (0.25) 0.58** (0.26) 0.53*** (0.13)
use [3.52] [3.76] [3.78] [3.53]
Cond. # marijuana use days −0.28 (0.45) −0.38 (0.67) 0.17 (0.64) 0.19 (0.60)
[12.29] [12.44] [14.15] [14.49]
% marijuana use initiation 0.32** (0.16) 0.33** (0.11) 0.15 (0.23) 0.14 (0.17)
[6.47] [6.58] [0.92] [0.85]
# past-month total drinks −0.03 (1.57) −0.04 (2.32) 0.95 (1.18) 0.76 (0.87)
[7.76] [8.59] [18.69] [16.32]
# binge drinking days 0.04 (0.18) 0.04 (0.17) 0.16** (0.07) 0.15*** (0.04)
[0.66] [0.74] [1.52] [1.49]
% past-month marijuana use −0.63 (0.39) −0.68* (0.37) 1.44*** (0.35) 1.28*** (0.31)
and binge drinking [6.41] [6.86] [6.44] [5.74]
% marijuana use while drinking −0.38 (0.49) −0.43 (0.37) 0.82* (0.45) 0.69* (0.38)
[4.10] [4.44] [4.45] [3.73]

Note:
observed values.
*
**
***
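The coefficients in the table above are expressed in the units of each outcome, so dividing by the bracketed baseline predicted mean gives a relative effect. The sketch below uses the unweighted age 21+ estimates from column (3); reading these ratios as the basis for the roughly 10 percent and 18–22 percent increases quoted in the Discussion is an assumption, and the helper name `relative_effect` is illustrative.

```python
def relative_effect(coef, baseline):
    """Effect as a percentage of the baseline predicted mean."""
    return 100.0 * coef / baseline


# (coefficient, baseline predicted mean), age 21+, unweighted state-cluster column.
estimates = {
    "# binge drinking days": (0.16, 1.52),
    "% marijuana use while drinking": (0.82, 4.45),
    "% past-month marijuana use and binge drinking": (1.44, 6.44),
}

relative = {name: relative_effect(c, b) for name, (c, b) in estimates.items()}
for name, pct in relative.items():
    print(f"{name}: {pct:.1f}% of baseline")
```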
Appendix 4. State heterogeneity in policy effect
Age 12–20 Age 21+
(1) (2) (3) (4) (5) (6)

% marijuana % past-month % marijuana % marijuana # binge % marijuana use
initiation marijuana use (almost)/daily abuse/dependence drinking days and binge drinking
MML implementation 0.32** (0.16) 1.32** (0.58) 0.58** (0.26) 0.25** (0.11) 0.16** (0.08) 1.44*** (0.35)
State MMLs
∼ Vermont 0.57*** (0.18) 1.76*** (0.21) 2.27*** (0.12) 1.19*** (0.12) 0.11** (0.03) 2.03*** (0.15)
∼ Montana 0.05 (0.20) 4.92*** (0.27) 1.26*** (0.15) 2.51*** (0.15) 0.47*** (0.05) 4.16*** (0.20)
∼ Rhode Island 0.61*** (0.16) −0.64 (0.37) −0.42* (0.23) 0.29** (0.13) 0.12*** (0.03) 0.25* (0.14)
∼ New Mexico −0.04 (0.23) −0.23 (0.17) −0.15 (0.16) 0.16 (0.11) 0.46*** (0.04) 0.08 (0.18)
∼ Michigan 0.43** (0.22) 2.37*** (0.34) 1.44*** (0.22) 0.17 (0.10) 0.22*** (0.06) 1.57*** (0.28)
∼ New Jersey 0.007 (0.19) 1.52** (0.23) 0.94** (0.15) −0.15 (0.11) −0.10 (0.07) 2.17*** (0.18)
∼ District of Columbia 0.33* (0.19) 0.59** (0.27) 0.87*** (0.21) 0.30** (0.13) 0.15** (0.06) 0.70*** (0.21)
∼ Arizona −0.02 (0.20) −0.23 (0.20) 0.47*** (0.10) −0.25 (0.16) −0.12 (0.07) 0.10 (0.12)
∼ Delaware 0.79*** (0.25) 0.39* (0.23) 0.06 (0.13) 0.21* (0.12) 0.27*** (0.04) 0.58** (0.24)
∼ Connecticut 0.21 (0.18) 1.27** (0.21) 0.85** (0.12) 0.59*** (0.11) 0.05 (0.03) 1.09*** (0.17)
Baseline predicted mean [6.47] [9.33] [3.78] [4.59] [1.52] [6.44]

Number of observations ≈323,900 ≈323,900 ≈323,900 ≈323,900 ≈323,900 ≈323,900
Note:
Baseline predicted means in square brackets are calculated as the average of predicted probabilities/counts when setting MMLst (VT) ∼ MMLst (CT) to 0 and leaving the other
covariates as the observed values.
*
**
***
References

Abrams, D.I., Couey, P., Shade, S.B., Kelly, M.E., Benowitz, N.L., 2011. Cannabinoid–opioid interaction in chronic pain. Clinical Pharmacology & Therapeutics 90 (6), 844–851.
American Psychiatric Association (APA) (Ed.), 2000. Diagnostic and Statistical Manual of Mental Disorders: DSM-IV-TR. American Psychiatric Publishing, Arlington, VA.
Anderson, D.M., Hansen, B., Rees, D.I., 2011. Medical marijuana laws, traffic fatalities, and alcohol consumption. Institute for the Study of Labor (IZA) Discussion Paper Series No. 6112. http://ftp.iza.org/dp6112.pdf (accessed 10.10.13).
Anderson, D.M., Hansen, B., Rees, D.I., 2012. Medical marijuana laws and teen marijuana use. Institute for the Study of Labor (IZA) Discussion Paper Series No. 6592. http://ftp.iza.org/dp6112.pdf (accessed 10.10.13).
Anderson, D.M., Hansen, B., Rees, D.I., 2013. Medical marijuana laws, traffic fatalities, and alcohol consumption. Journal of Law and Economics 56, 333–369.
Anderson, D.M., Rees, D.I., 2014. The role of dispensaries: the devil is in the details. Journal of Policy Analysis and Management 33 (1), 235–240.
Bachhuber, M.A., Saloner, B., Cunningham, C.O., Barry, C.L., 2014. Medical cannabis laws and opioid analgesic overdose mortality in the United States, 1999–2010. JAMA Internal Medicine 174 (10), 1668–1673.
Belotti, F., Deb, P., Manning, W.G., Norton, E.C., 2014. TPM: estimating two-part models. The Stata Journal, http://econ.hunter.cuny.edu/people/economics-faculty/pdeb/ihea-minicourse/tpm-estimating-two-part-models-working-paper/at download/file (accessed 21.03.14; forthcoming).
Ben Amar, M., 2006. Cannabinoids in medicine: a review of their therapeutic potential. Journal of Ethnopharmacology 105 (1), 1–25.
Bertrand, M., Duflo, E., Mullainathan, S., 2004. How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics 119 (1), 249–275.
Bohnert, A.S.B., Valenstein, M., Bair, M.J., Ganoczy, D., McCarthy, J.F., Ilgen, M.A., Blow, F.C., 2011. Association between opioid prescribing patterns and opioid overdose-related deaths. The Journal of the American Medical Association 305 (13), 1315–1321.
Boys, A., Marsden, J., Strang, J., 2001. Understanding reasons for drug use amongst young people: a functional perspective. Health Education Research 16 (4), 457–469.
Budney, A.J., Roffman, R., Stephens, R.S., Walker, D., 2007. Marijuana dependence and its treatment. Addiction Science and Clinical Practice 4 (1), 4.
Campbell, V.A., Gowran, A., 2007. Alzheimer’s disease: taking the edge off with cannabinoids? British Journal of Pharmacology 152 (5), 655–662.
Carpenter, C., Dobkin, C., 2009. The effect of alcohol consumption on mortality: regression discontinuity evidence from the minimum drinking age. American Economic Journal: Applied Economics 1 (1), 164.
Center for Behavioral Health Statistics and Quality (CBHSQ), 2013. National Survey on Drug Use and Health, 2004–2011 [data files and code books]. U.S. Dept. of Health and Human Services (HHS), Substance Abuse and Mental Health Services Administration (SAMHSA), Center for Behavioral Health Statistics and Quality (CBHSQ), Rockville, MD, https://www.datafiles.samhsa.gov, http://www.icpsr.umich.edu/icpsrweb/SAMHDA (accessed from 26.09.13 to 21.03.14).
Centers for Disease Control and Prevention (CDC), 2012. CDC grand rounds: prescription drug overdoses – a US epidemic. Morbidity and Mortality Weekly Report 61 (1), 10.
Chu, Y.L., 2014. The effects of medical marijuana laws on illegal marijuana use. Journal of Health Economics 38, 43–61.
Chaloupka, F.J., Laixuthai, A., 1997. Do youths substitute alcohol and marijuana? Some econometric evidence. Eastern Economic Journal 23 (3), 253–276.
Cohen, P.J., 2010. Medical marijuana 2010: it’s time to fix the regulatory vacuum. The Journal of Law, Medicine and Ethics 38 (3), 654–666.
Crost, B., Guerrero, S., 2012. The effect of alcohol availability on marijuana use: evidence from the minimum legal drinking age. Journal of Health Economics 31 (1), 112–121.
Crost, B., Rees, D.I., 2013. The minimum legal drinking age and marijuana use: new estimates from the NLSY97. Journal of Health Economics 32 (2), 474–476.
DeSimone, J., Farrelly, M.C., 2003. Price and enforcement effects on cocaine and marijuana demand. Economic Inquiry 41 (1), 98–115.
DiNardo, J., Lemieux, T., 2001. Alcohol, marijuana, and American youth: the unintended consequences of government regulation. Journal of Health Economics 20 (6), 991–1010.
Donald, S.G., Lang, K., 2007. Inference with difference-in-differences and other panel data. The Review of Economics and Statistics 89 (2), 221–233.
Gloss, D., Vickrey, B., 2012. Cannabinoids for epilepsy. Cochrane Database Systematic Reviews 6.
French, M.T., Popovici, I., 2011. That instrument is lousy! In search of agreement when using instrumental variables estimation in substance use research. Health Economics 20 (2), 127–146.
Gostin, L.O., 2005. Medical marijuana, American federalism, and the Supreme Court. The Journal of the American Medical Association 294 (7), 842–844.
Greene, W.H., 2011. Econometric Analysis, 5th edition. Prentice Hall, Upper Saddle River, NJ.
Harper, S., Strumpf, E.C., Kaufman, J.S., 2012. Do medical marijuana laws increase marijuana use? Replication study and extension. Annals of Epidemiology 22 (3), 207–212.
Hathaway, A.D., Comeau, N.C., Erickson, P.G., 2011. Cannabis normalization and stigma: contemporary practices of moral regulation. Criminology and Criminal Justice 11 (5), 451–469.
Heishman, S.J., Arasteh, K., Stitzer, M.L., 1997. Comparative effects of alcohol and marijuana on mood, memory, and performance. Pharmacology Biochemistry and Behavior 58 (1), 93–101.
Hoffmann, D.E., Weber, E., 2010. Medical marijuana and the law. New England Journal of Medicine 362 (16), 1453–1457.
Horowitz, J.L., 2001. Should the DEA’s STRIDE data be used for economic analyses of markets for illegal drugs? Journal of American Statistical Association 96, 1254–1271.
Johnson, T.P., Fendrich, M., Mackesy-Amiti, M.E., 2010. Computer literacy and the accuracy of substance use reporting in an ACASI survey. Social Science Computer Review 28 (4), 515–523.
Kandel, D.B., 1975. Stages in adolescent involvement in drug use. Science 190 (4217), 912–914.
Kandel, D.B. (Ed.), 2002. Stages and Pathways of Drug Involvement: Examining the Gateway Hypothesis. Cambridge University Press, Cambridge, UK.
King, A.C., de Wit, H., McNamara, P.J., Cao, D., 2011. Rewarding, stimulant, and sedative alcohol responses and relationship to future binge drinking. Archives of General Psychiatry 68 (4), 389–399.
Krishnan, S., Cairns, R., Howard, R., 2009. Cannabinoids for the treatment of dementia. Cochrane Database Systematic Reviews 2.
Liguori, A., Gatto, C.P., Jarrett, D.B., 2002. Separate and combined effects of marijuana and alcohol on mood, equilibrium and simulated driving. Psychopharmacology 163 (3/4), 399–405.
Lukas, S.E., Orozco, S., 2001. Ethanol increases plasma delta-9-tetrahydrocannabinol (THC) levels and subjective effects after marihuana smoking in human volunteers. Drug and Alcohol Dependence 64 (2), 143–149.
Lynch, M.E., Campbell, F., 2011. Cannabinoids for treatment of chronic non-cancer pain: a systematic review of randomized trials. British Journal of Clinical Pharmacology 72 (5), 735–744.
Lynne-Landsman, S.D., Livingston, M.D., Wagenaar, A.C., 2013. Effects of state medical marijuana laws on adolescent marijuana use. American Journal of Public Health 103 (8), 1500–1506.
Maldonado, R., Valverde, O., Berrendero, F., 2006. Involvement of the endocannabinoid system in drug addiction. Trends in Neurosciences 29 (4), 225–232.
Medina, K.L., Schweinsburg, A.D., Cohen-Zion, M., Nagel, B.J., Tapert, S.F., 2007. Effects of alcohol and combined marijuana and alcohol use during adolescence on hippocampal volume and asymmetry. Neurotoxicology and Teratology 29 (1), 141–152.
Moore, S.C., 2010. Substitution and complementarity in the face of alcohol-specific policy interventions. Alcohol and Alcoholism 45 (5), 403–408.
Morral, A.R., McCaffrey, D.F., Paddock, S.M., 2002. Reassessing the marijuana gateway effect. Addiction 97 (12), 1493–1504.
Naimi, T.S., Brewer, R.D., Mokdad, A., Denny, C., Serdula, M.K., Marks, J.S., 2003. Binge drinking among US adults. The Journal of the American Medical Association 289 (1), 70–75.
Pacula, R.L., 1998. Does increasing the beer tax reduce marijuana consumption? Journal of Health Economics 17 (5), 557–585.
Pacula, R.L., Kilmer, B., Grossman, M., Chaloupka, F.J., 2010. Risks and prices: the role of user sanctions in marijuana markets. The BE Journal of Economic Analysis & Policy 10 (1), 1–36.
Pacula, R.L., Powell, D., Heaton, P., Sevigny, E.L., 2013. Assessing the effects of medical marijuana laws on marijuana and alcohol use: the devil is in the details. National Bureau of Economic Research (NBER) Working Paper No. w19302.
Pacula, R.L., Sevigny, E.L., 2014. Marijuana liberalization policies: why we can’t learn much from policy still in motion. Journal of Policy Analysis and Management 33 (1), 212–221.
Pertwee, R.G., 2012. Targeting the endocannabinoid system with cannabinoid receptor agonists: pharmacological strategies and therapeutic possibilities. Philosophical Transactions of the Royal Society B: Biological Sciences 367 (1607), 3353–3363.
Saffer, H., Chaloupka, F., 1999. The demand for illicit drugs. Economic Inquiry 37 (3), 401–411.
Scavone, J.L., Sterling, R.C., Van Bockstaele, E.J., 2013. Cannabinoid and opioid interactions: implications for opiate dependence and withdrawal. Neuroscience 248, 637–654.
Sekhon, V., 2009. Highly uncertain times: an analysis of the executive branch’s decision to not investigate or prosecute individuals in compliance with state medical marijuana laws. Hastings Constitutional Law Quarterly 37 (3), 553–564.
Solon, G., Haider, S.J., Wooldridge, J., 2013. What are we weighting for? National Bureau of Economic Research (NBER) Working Paper No. w18859.
Substance Abuse and Mental Health Services Administration (SAMHSA), 2013. Results from the 2012 National Survey on Drug Use and Health: Summary of National Findings. U.S. Dept. of Health and Human Services (HHS), Substance Abuse and Mental Health Services Administration (SAMHSA), Rockville, MD http://www.samhsa.gov/data/nsduh/2k11results/nsduhresults2011.htm (accessed 10.10.13).
Wagner, F.A., Anthony, J.C., 2002a. Into the world of illegal drug use: exposure opportunity and other mechanisms linking the use of alcohol, tobacco, marijuana, and cocaine. American Journal of Epidemiology 155 (10), 918–925.
Wagner, F.A., Anthony, J.C., 2002b. From first drug use to drug dependence: developmental periods of risk for dependence upon marijuana, cocaine, and alcohol. Neuropsychopharmacology 26, 479–488.
Wang, P., 2003. A bivariate zero-inflated negative binomial regression model for count data with excess zeros. Economics Letters 78 (3), 373–378.
Wechsler, H., Dowdall, G.W., Davenport, A., Rimm, E.B., 1995. A gender-specific measure of binge drinking among college students. American Journal of Public Health 85 (7), 982–985.
Yörük, B.K., Yörük, C.E., 2011. The impact of minimum legal drinking age laws on alcohol consumption, smoking, and marijuana use: evidence from a regression discontinuity design using exact date of birth. Journal of Health Economics 30 (4), 740–752.
Yörük, B.K., Yörük, C.E., 2013. The impact of minimum legal drinking age laws on alcohol consumption, smoking, and marijuana use revisited. Journal of Health Economics 32 (2), 477–479.

The interaction of direct and indirect risk selection

Normann Lorenz ∗
Universität Trier, Universitätsring 15, 54286 Trier, Germany
Article history:
Received 15 April 2014
Received in revised form 6 September 2014
Accepted 9 December 2014
Available online 17 December 2014

JEL classification:
I13
I18
L13

Keywords:
Risk selection
Risk adjustment
Health insurance
Discrete choice
Imperfect competition

This paper analyzes the interaction of direct and indirect risk selection in health insurance markets. It is shown that direct risk selection – using measures unrelated to the benefit package like selective advertising or ‘losing’ applications of high risk individuals – nevertheless has an influence on the distortions of the benefit package caused by indirect risk selection. Direct risk selection (DRS) may either increase or decrease these distortions, depending on the type of equilibrium (pooling or separating), the type of DRS (positive or negative) and the type of cost for DRS (individual-specific or not). Regulators who succeed in reducing DRS by, e.g., banning excessive advertising or implementing fines for ‘losing’ applications, may therefore (unintendedly) mitigate or exacerbate the distortions of the benefit package caused by indirect risk selection. It is shown that the interaction of direct and indirect risk selection also alters the formula for optimal risk adjustment.
1. Introduction

Risk selection is considered to be one of the main problems in regulated health insurance markets. If there is community rating, so that insurers are not allowed to charge premiums according to risk, they will make profits with some individuals and losses with others. Insurers who act on these incentives to attract profitable and repel unprofitable individuals are said to engage in risk selection.1

Two forms of risk selection can be distinguished: direct risk selection (DRS) and indirect risk selection (IRS).2 With DRS, insurers know that a particular individual or group of individuals is characterized by non-average risk. DRS is therefore targeted at an individual the insurer has identified to either be a high or a low risk type (like, e.g., a hypochondriac) or at a group of individuals the insurer knows to have non-average expected cost (like, e.g., a certain age group or individuals living in a high cost area). Usually, the measures taken for DRS are not related to the benefit package (i.e. the medical services) offered; examples are selective advertising or ‘losing’ applications of high risk individuals.3 It has been shown that potential profits associated with successful DRS can be substantial.4

With IRS, insurers do not know which particular individual is of high or low risk (or are prevented from using this information), and so only act on their knowledge that there are different risk types in the population. The measures taken to engage in IRS usually consist of distorting the benefit package, so that it is attractive for low risks, but not for high risks. Several studies have shown that the incentives for IRS can be severe, and that insurers do indeed act on these incentives.5

A regulator can counteract the incentives for both DRS and IRS by implementing a risk adjustment scheme, setting transfers to (and from) insurers depending on signals which are informative about individuals’ expected cost. In almost all risk adjustment schemes, the formula used to calculate these transfers is based on a regression of actual health care expenditures on a set of explanatory variables like age, gender and morbidity. Most of the literature on

∗ Tel.: +49 651 2012624. E-mail address: lorenzn@uni-trier.de
1 See van de Ven and Ellis (2000).
2 See Breyer et al. (2011).
3 van de Ven and van Vliet (1992) provide an extensive list of measures insurers may use for risk selection; for differential treatment of low and high risks’ applications see Bauhoff (2012).
4 See Shen and Ellis (2002).
5 See Frank et al. (2000), Cao and McGuire (2003) and Ellis and McGuire (2007).
82 N. Lorenz / Journal of Health Economics 42 (2015) 81–89
Table 1
Effect of DRS with individual-specific cost on the distortion of the benefit package.
Type of equilibrium      Positive DRS          Negative DRS
Pooling equilibrium      Distortion decreases  Distortion increases
Separating equilibrium   Distortion decreases  Distortion decreases

risk adjustment has been concerned with improving this underlying regression by, e.g., including additional variables or altering the grouping algorithm for diagnoses in morbidity based risk adjustment, so that a larger part of the variance of actual expenditures is explained. The larger the explained part of the variance, the closer transfers are to actual cost, and the lower the incentives for risk selection should be.
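The conventional, regression-based approach described in the preceding paragraph can be illustrated with a toy calculation. The sketch assumes a single categorical signal (a hypothetical age group), in which case regressing expenditures on a full set of group dummies reduces to computing group means. The data are made up, and defining the transfer as predicted cost minus average cost is a simplifying assumption for this example.

```python
from collections import defaultdict


def cell_based_risk_adjustment(records):
    """records: list of (cell, cost) pairs, one per insured individual.
    Returns predicted cost per cell, transfer per cell, and R-squared."""
    by_cell = defaultdict(list)
    for cell, cost in records:
        by_cell[cell].append(cost)

    overall = sum(cost for _, cost in records) / len(records)
    # OLS on a full set of cell dummies yields the cell means as predictions.
    predicted = {cell: sum(v) / len(v) for cell, v in by_cell.items()}
    # Assumed transfer rule: predicted cost relative to the population average.
    transfers = {cell: p - overall for cell, p in predicted.items()}

    total_ss = sum((cost - overall) ** 2 for _, cost in records)
    resid_ss = sum((cost - predicted[cell]) ** 2 for cell, cost in records)
    r_squared = 1.0 - resid_ss / total_ss
    return predicted, transfers, r_squared


# Hypothetical expenditures for four individuals in two age groups.
records = [("young", 100.0), ("young", 300.0), ("old", 900.0), ("old", 1100.0)]
predicted, transfers, r_squared = cell_based_risk_adjustment(records)
print(predicted)
print(transfers)
print(f"explained share of variance: {r_squared:.3f}")
```

A finer grouping would raise the explained share of the variance and move transfers closer to actual cost, which is the statistical logic this strand of the literature pursues.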
Initiated by the very influential study of Glazer and McGuire (2000), there has developed a small literature that departs from this statistical approach and instead explicitly models insurers’ incentives for risk selection. One study in this literature has shown that conventional, i.e. regression-based, risk adjustment may decrease welfare if there is imperfect competition, another, that it may even increase the extent of risk selection.$^{6}$ These undesirable effects of conventional risk adjustment exemplify the need for what Glazer and McGuire (2000) have termed optimal risk adjustment. They have shown that a regulator can increase the effectiveness of a risk adjustment scheme by distorting the payments as calculated from a regression: there has to be overpayment for signals which are correlated with high risks and underpayment for signals which are correlated with low risks. If the over- and underpayments are chosen optimally, incentives for IRS can be eliminated completely.$^{7}$ Optimal risk adjustment has also been derived for a setting where individuals differ in their elasticity to switch insurers or where insurers are allowed to vary their premium in some dimension, as is the case in the insurance exchanges in the US.$^{8}$

A concern, already raised by Glazer and McGuire (2000) themselves, is that such over- and underpayments create incentives for DRS regarding the signal, but so far it has not been analyzed whether this has an influence on optimal risk adjustment. In fact, we are not aware of any theoretical study that explicitly models the interaction of direct and indirect risk selection, even in the absence of risk adjustment.$^{9}$ In this study we therefore develop such a model and show that in general (the degree of) DRS has an influence on the distortions of the benefit package caused by IRS and that this alters the formula for optimal risk adjustment. DRS may either increase or decrease the distortions caused by IRS, depending on whether insurers try to attract the low risks (positive DRS) or to repel the high risks (negative DRS), whether a pooling or a separating equilibrium emerges, and whether the cost for DRS is individual-specific or not.

If insurers’ expenditures for DRS are at least to some degree individual-specific (and not just a fixed cost), they affect risk-type-specific cost: Positive DRS increases the cost per low risk, negative DRS the cost per high risk. In the first case, the cost difference between the risk types is reduced, in the second, it is increased. In the pooling equilibrium where both risk types pay the same premium, positive DRS therefore reduces the incentives for IRS, while negative DRS increases them.

In the separating equilibrium, insurers’ expenditures for positive DRS translate into a higher premium for the contract offered for the low risks; this makes this contract less attractive for the high risks, so the distortion of the benefit package necessary to repel the high risks is reduced. Negative DRS on the other hand creates a ‘substitution effect’: If insurers are (somewhat) successful in repelling the high risks by DRS, they can reduce the degree of IRS. For an overview of these results, see Table 1.

A regulator who succeeds in reducing negative DRS (by, e.g., charging a fine for ‘losing’ applications of high risk individuals) will therefore simultaneously reduce IRS in the pooling equilibrium, but unintendedly increase the distortion of the benefit package in the separating equilibrium. The distortions caused by IRS will also be increased if he succeeds in reducing positive DRS (by, e.g., banning excessive advertising), in this case for both the pooling and the separating equilibrium.

In three of the four cases, optimal risk adjustment then becomes even more important. We therefore derive the impact of DRS on the formula for optimal risk adjustment developed by Glazer and McGuire (2000). We show that the overpayment for a signal that indicates a high risk has to be increased exactly by insurers’ expenditures on positive DRS; likewise, the underpayment has to be reduced by the expenditures on negative DRS. With this modification, their formula can eliminate the incentives for IRS even in the presence of DRS.

In the literature on optimal risk adjustment, some of the results regarding the distortions caused by IRS have been derived under perfect competition, but DRS seems incompatible with such a setting where individuals are perfectly informed about all benefit packages and premiums and always choose the insurer that offers the best benefit package-premium combination. We therefore derive our results within a discrete choice model, which can easily capture different levels of competition. To keep the model simple, we assume that the benefit package is one-dimensional, but the model can be extended to a multi-dimensional benefit package. Also, to simplify the notation when deriving the results, we first consider the case of two risk types, but then show that the results also hold for an arbitrary number of risk types.

The remainder of this paper is organized as follows: In Section 2, we introduce the basic discrete choice model and show how DRS can be incorporated in such a model. We analyze the pooling equilibrium in Section 3 and the separating equilibrium in Section 4. Section 5 concludes.

$^{6}$ See Lorenz (2013) and Brown et al. (2011), respectively. In the empirical part of their study, Brown et al. (2011) find such an increase in the extent of risk selection for the Medicare Advantage program in the U.S.; however, there has been some disagreement on this finding, see Newhouse et al. (2012).
$^{7}$ See also Glazer and McGuire (2002) and Jack (2006).
$^{8}$ For the first setting, see Bijlsma et al. (2011), and for the second, McGuire et al. (2013) and Shi (2013).
$^{9}$ Eggleston (2000) derives the optimal mix of supply and demand side cost sharing for a setting with a single (semi-altruistic) HMO that can influence the level of medical services (according to the outcome of a patient–provider bargaining process) and can dump a share of the high risks at some cost; however, there is no competition as there is only one provider.

2. The model

2.1. Basic model without DRS

Individual preferences regarding the benefit-premium bundle are given by

$$u = p^r v(m) - R, \qquad (1)$$

where $R$ denotes the premium and $m$ the level of medical services (measured in monetary terms). $p^r$ is the probability of becoming ill, and there are two risk types $r = H, L$, with $p^H > p^L$; the share of the low risks is $\lambda$. The utility of receiving medical treatment, $v(m)$, is increasing at a decreasing rate, i.e. $v'(m) > 0$ and $v''(m) < 0$. The efficient level of medical services is implicitly defined by $v'(m^{FB}) = 1$.

There are $n$ insurers $j$, each offering a benefit-premium bundle $\{m^j, R^j\}$. The individual’s decision of which insurer to choose may, however, not only depend on these benefit-premium bundles, but
also on some other factors, like perceived friendliness of personnel, location, or which insurer was recommended by family or friends. In a discrete choice model, these other factors are captured by augmenting the individual’s utility as given in (1) by an individual- and insurer-specific utility component $\varepsilon_i^j$; the utility of an individual $i$ (being of risk type $r$) when choosing an insurer $j$ therefore is

$$u_i(m^j, R^j) = p^r v(m^j) - R^j + \varepsilon_i^j. \qquad (2)$$

If $\varepsilon_i^j$ is assumed to be i.i.d. extreme value, the logit model with its analytically tractable choice probabilities arises. Denoting risk type $r$’s utility of the benefit-premium bundle offered by insurer $j$ by

$$V_r^j = p^r v(m^j) - R^j,$$

and specifying the variance of $\varepsilon_i^j$ as $\mathrm{Var}(\varepsilon_i^j) = \pi^2\sigma^2/6$, the probability of individual $i$ (being of risk type $r$) choosing a particular insurer $k$ is$^{10}$

$$\mathrm{Prob}(i \text{ chooses } k) = \mathrm{Prob}\left(V_r^k + \varepsilon_i^k > V_r^l + \varepsilon_i^l \;\; \forall\, l \neq k\right) = \frac{e^{V_r^k/\sigma}}{\sum_{j=1}^{n} e^{V_r^j/\sigma}}. \qquad (3)$$

Denote this probability by $P_r^k$; it is also insurer $k$’s market share among the individuals of risk type $r$. $P_r^k$ is increasing in $V_r^k$: a higher share of individuals of risk type $r$ will choose insurer $k$, if this insurer offers a higher level of medical services or charges a lower premium.

The variance of the additional utility component, $\mathrm{Var}(\varepsilon_i^j) = \pi^2\sigma^2/6$, is a measure of the level of competition in this health insurance market. If $\sigma$ is small, all the $\varepsilon_i^j$ are very similar and therefore only play a minor role in which insurer is chosen: Offering an only somewhat higher utility level than all the other insurers will, in this case, already attract a large share of all individuals; this implies a high level of competition. If, on the other hand, $\sigma$ is large, the other factors besides the benefit level and the premium – captured by large positive and large negative $\varepsilon_i^j$ – are rather important, so that insurers, when increasing their premium (or reducing their benefit level), only lose a small share of their insured; a large level of $\sigma$ therefore corresponds to a low level of competition.

As shown by Lorenz (2013) who has analyzed this basic model without DRS, the level of competition determines which type of equilibrium emerges: If the level of competition is low (i.e., $\sigma$ is large), there will be a pooling equilibrium: all insurers offer the same benefit-premium bundle, so each individual $i$ chooses the insurer $j$ for which his $\varepsilon_i^j$ is maximal. If the level of competition is high (i.e., $\sigma$ is small), a separating equilibrium similar (but not identical) to the Rothschild–Stiglitz equilibrium under perfect competition arises.$^{11}$ Some of the $n$ insurers offer a benefit-premium bundle designated for the low risks, the remaining insurers a contract designated for the high risks (with a larger benefit package and a higher premium). All the low risks choose the insurer with the highest $\varepsilon_i^j$ among the first set of insurers, but only most of the high risks choose the insurer with the highest $\varepsilon_i^j$ among the second set. Because a small share of the high risks chooses one of the insurers offering the contract designated for the low risks, the separation of risk types is not perfect.$^{12}$

2.2. The model with positive and negative DRS

We consider positive DRS to be an activity each insurer is engaged in which generates some cost and increases the probability of being chosen by the individual (or group of individuals) the activity is targeted at. We model this increase in the probability of being chosen to stem from an increase in the utility the individual receives, which may either be real (as, e.g., with a discount for a fitness club membership) or just perceived (as with advertising). We denote the cost by $a^j$ and the increase in utility by $g(a^j)$, where $g(a^j)$ is increasing and concave.$^{13}$ With positive DRS, the (perceived) utility of individual $i$ choosing an insurer $j$ therefore is

$$u_i(m^j, R^j) = p^r v(m^j) - R^j + g(a^j) + \varepsilon_i^j, \qquad (4)$$

so that insurer $k$’s market share among risk type $r$ is given by$^{14}$

$$P_r^k = \frac{e^{(V_r^k + g(a^k))/\sigma}}{\sum_{j} e^{(V_r^j + g(a^j))/\sigma}}. \qquad (5)$$

Two cases regarding the cost $a^j$ can be distinguished: non-individual-specific and individual-specific cost. With non-individual-specific cost, total cost for DRS of an insurer $j$ is independent of the number of individuals choosing this insurer. The prime example for this case is selective advertising, where cost does not increase if an additional individual chooses insurer $j$. With individual-specific cost, total cost for DRS of an insurer $j$ increases in the number of individuals choosing this insurer. One example here are additional benefits which the regulator (or society) considers not to be part of a ‘normal’ basic benefit package insurers are supposed to provide, like discounts for fitness club memberships or special counseling services. In this case, total cost of DRS increases if an additional individual chooses insurer $j$. It seems reasonable to assume that most risk selection activities entail both non-individual-specific and individual-specific (i.e. fixed and variable) cost.

Like positive DRS, we model negative DRS as an activity that generates some cost, but decreases the probability of being chosen by a particular individual (or group of individuals). We denote the cost of negative DRS by $b^j$ and the utility decrease by $f(b^j)$, where $f(b^j)$ is increasing and concave. With negative DRS, the (perceived) utility of individual $i$ and insurer $k$’s market share are as given in (4) and (5), but with $+g(a)$ replaced by $-f(b)$.

$^{10}$ See Train (2009, p. 40).
$^{11}$ See Zweifel et al. (2009), chapter 7, for the Rothschild–Stiglitz equilibrium in this setting.
$^{12}$ We explain why this occurs in Section 4.1.
$^{13}$ Since we are interested in a setting where insurers are engaged in DRS, we assume $\lim_{a \to 0^+} g'(a) = \infty$ to guarantee an interior solution.
$^{14}$ To simplify the notation, we do not introduce different symbols for $P_r^k$ for the case of no, positive or negative DRS; we will, however, always make clear to which case we refer.
$^{15}$ In Section 5 we argue that the main effects should be similar if negative DRS does not occur during the application process, but is targeted at individuals who already hold a contract with the insurer.

Unlike with positive DRS, it is difficult to imagine some activity where the cost an insurer incurs for negative DRS is independent of the number of individuals choosing this insurer. ‘Negative advertising’ might be an example, where an insurer informs about some undesirable feature of its offer, like scrupulous utilization reviews, but this and similar examples may seem rather far-fetched. We think it is more realistic to consider negative DRS to be an activity insurers are engaged in during the application process, so that cost depends on the number of individuals applying at the insurer.$^{15}$ Activities which fall into this category are that insurers require additional (unnecessary) paper work or involve the high risk individuals in lengthy phone calls in which they try to convince (or
urge) these individuals to choose a different insurer.$^{16}$ In the working paper version of this study we explicitly model the application process and show that the main results are the same regardless of whether the number of individuals applying at an insurer is equal to or larger than the number of individuals eventually choosing the insurer.$^{17}$ We will therefore refer to both cases as individual-specific cost.

3. The pooling equilibrium

In this section we analyze the pooling equilibrium which occurs if the level of competition is low enough. We begin by briefly deriving the equilibrium without DRS in Section 3.1. We consider the equilibrium with positive DRS in Section 3.2 and analyze the impact of positive DRS on risk adjustment in Section 3.3. We then show that the results also hold for an arbitrary number of risk types in Section 3.4. The effects of negative DRS, which are just the opposite of positive DRS, are summarized in Section 3.5.

3.1. The pooling equilibrium without DRS

Normalizing the mass of individuals to one and assuming profit maximization, the objective of insurer $k$ is

$$\max_{m^k, R^k} \Pi^k = \lambda P_L^k \pi_L^k + (1-\lambda) P_H^k \pi_H^k, \qquad (6)$$

where $\pi_r^k = R^k - p^r m^k$ denotes insurer $k$’s profit per individual of risk type $r$. The solution to this objective yields the following two conditions (see Appendix A.1): First,

$$\lambda \pi_L^k + (1-\lambda)\pi_H^k = \frac{n}{n-1}\sigma. \qquad (7)$$

Average profit per insured decreases in $n$ and increases in $\sigma$: a higher level of competition (a larger number of insurers $n$ or a smaller level of $\sigma$) decreases profit. Secondly, the condition determining the distortion of the benefit level is given by

$$\left[1 - \frac{\lambda(1-\lambda)(p^H - p^L)^2}{\frac{n}{n-1}\sigma\bar{p}}\, m^k\right] v'(m^k) = 1, \qquad (8)$$

where $\bar{p} = \lambda p^L + (1-\lambda)p^H$. Because the fraction is positive, it is immediately apparent that $v'(m^k) > 1$, so that $m^k$ is distorted below the efficient level $m^{FB}$. As is to be expected, the distortion increases in the difference $p^H - p^L$ and in the level of competition (captured by $\frac{n}{n-1}\sigma$).

3.2. The pooling equilibrium with positive DRS

3.2.1. Positive DRS of risk type

With positive DRS of the risk type itself and individual-specific cost, insurer $k$’s objective reads as

$$\Pi^k = \lambda P_L^k(\pi_L^k - a^k) + (1-\lambda)P_H^k \pi_H^k, \qquad (9)$$

where now $P_L^k$ contains $g(a^k)$ and $g(a^j)$ as given in (5).$^{18}$ The solution to this objective yields a positive equilibrium level of insurers’ expenditures on DRS, implicitly defined by $g'(a^k)(\pi_L^k - a^k) = \frac{n}{n-1}\sigma$.$^{19}$ The larger $g'(a)$, i.e., the more effective insurers’ expenditures on risk selection are, the larger, ceteris paribus, the equilibrium level of $a^k$ will be. These expenditures affect the distortion of the benefit level $m$, which is now given by

$$\left[1 - \frac{\lambda(1-\lambda)(p^H - p^L)^2}{\frac{n}{n-1}\sigma\bar{p}}\, m^k + \frac{\lambda(1-\lambda)(p^H - p^L)}{\frac{n}{n-1}\sigma\bar{p}}\, a^k\right] v'(m^k) = 1. \qquad (10)$$

Because the last term in the brackets $[\cdot]$ is positive, compared with (8), the condition if there is no DRS, $v'(m^k)$ has to decrease: $m^k$ increases, so the distortion is reduced. More generally, the larger the equilibrium level of $a^k$, the larger $m^k$.

Result 1. In the pooling equilibrium, the distortion of the benefit level decreases in the level of positive DRS if cost for DRS is individual-specific.

The incentive to distort the benefit level (with or without DRS) arises because profit per high risk is lower than profit per low risk, where the degree of the distortion depends on the difference between these two profits, which, in the case without DRS, is given by$^{20}$ $\pi_L^k - \pi_H^k = (p^H - p^L)m^k$.

With positive DRS, profit per low risk decreases because insurers waste part of this profit on their expenditures for DRS; this reduces the difference between net profits (including $a^k$), and thereby the incentive to distort the benefit level.

This is different for the case of non-individual-specific cost.$^{21}$ Although in this case, expenditures for risk selection decrease total profit, they do not specifically decrease profit per low risk, so the difference between the risk-type-specific profits remains the same: Therefore, positive DRS has no influence on the distortion of the benefit level if cost is non-individual-specific. In the following, we will therefore only consider individual-specific cost.

3.2.2. Positive DRS of a signal that is correlated with risk type

We now analyze the (probably more realistic) case that DRS is not targeted at the risk type itself, but at a signal that is (less than perfectly) correlated with risk type. We assume that there are two signal types $s = Y, O$, young and old. The shares of the four types of individuals, $\theta_{rs}$, are given in Table 2, where $\delta > 0$ captures the case of a positive correlation of high risk and old age.

Table 2
Shares $\theta_{rs}$ of the four types; positive correlation of H and O for $\delta > 0$.

      | $p^L$                                       | $p^H$                                            | Share
Y     | $\theta_{LY} = \lambda\mu + \delta$         | $\theta_{HY} = (1-\lambda)\mu - \delta$          | $\mu$
O     | $\theta_{LO} = \lambda(1-\mu) - \delta$     | $\theta_{HO} = (1-\lambda)(1-\mu) + \delta$      | $1-\mu$
Share | $\lambda$                                   | $1-\lambda$                                      |

This formulation of a positive correlation has the advantage that increasing $\delta$ increases the level of correlation without altering the shares of the two risk types, $\lambda$ and $(1-\lambda)$, and the shares of the two signal-types, $\mu$ and $(1-\mu)$.

$^{16}$ After a German sickness fund operating mainly in high cost areas went bankrupt in 2011, members of this fund who then applied at other funds received phone calls in which some of the insurers told them that they could not continue their drug therapy or disease management program if they did not choose a different insurer; see, e.g., Spiegel (2011).
$^{17}$ See Lorenz (2014). There, we also illustrate all results with an example.
$^{18}$ We assume $\pi_H^k < 0$, so that positive DRS is targeted only at the low risk. For a very low level of competition (very high $\sigma$), even the high risks would entail a profit and positive DRS would be targeted at both risk types (albeit at different levels).
$^{19}$ See Appendix A.2, where also condition (10) is derived.
$^{20}$ See the second term in the brackets in condition (8), which contains $(p^H - p^L)m^k$.
$^{21}$ In this case, insurer $k$’s objective is $\Pi^k = (\lambda P_L^k \pi_L^k - a^k) + (1-\lambda)P_H^k \pi_H^k$.
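
The logit market shares in (3) and the distortion conditions (8) and (10) can be explored numerically. The following is only a minimal sketch under stated assumptions: the utility function $v(m) = 2\sqrt{m}$ (so that $m^{FB} = 1$), all parameter values, and the function names `logit_shares` and `pooling_benefit_level` are hypothetical and not taken from the paper, and the DRS expenditure `a` is treated as given rather than derived from its equilibrium condition.

```python
import math


def logit_shares(values, sigma):
    """Choice probabilities as in (3): exp(V_k / sigma) / sum_j exp(V_j / sigma)."""
    top = max(values)  # subtracting the maximum avoids overflow for small sigma
    weights = [math.exp((v - top) / sigma) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def pooling_benefit_level(lam, p_low, p_high, n, sigma, a=0.0):
    """Solve [1 - k*m + shift] * v'(m) = 1 for m: (8) for a = 0, (10) for a > 0.

    Assumption (not from the paper): v(m) = 2*sqrt(m), so v'(m) = 1/sqrt(m)
    and the efficient level is m_FB = 1.
    """
    p_bar = lam * p_low + (1.0 - lam) * p_high
    denom = n / (n - 1.0) * sigma * p_bar
    k = lam * (1.0 - lam) * (p_high - p_low) ** 2 / denom
    shift = lam * (1.0 - lam) * (p_high - p_low) * a / denom

    def excess(m):
        return (1.0 + shift - k * m) / math.sqrt(m) - 1.0

    # excess is positive near zero, equals -1 where the bracket vanishes,
    # and is decreasing in between, so bisection finds the unique root.
    lo, hi = 1e-12, (1.0 + shift) / k
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameter values, chosen only for illustration.
shares_low_sigma = logit_shares([1.0, 0.9, 0.9], sigma=0.05)
shares_high_sigma = logit_shares([1.0, 0.9, 0.9], sigma=0.5)
m_without_drs = pooling_benefit_level(0.5, 0.2, 0.4, n=5, sigma=0.1)
m_with_drs = pooling_benefit_level(0.5, 0.2, 0.4, n=5, sigma=0.1, a=0.05)
print(shares_low_sigma[0], shares_high_sigma[0])
print(m_without_drs, m_with_drs)
```

With these illustrative numbers, the insurer offering the best bundle gains a much larger share when $\sigma$ is small, and the benefit level with individual-specific DRS spending lies between the level without DRS and the efficient level, in line with Result 1.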
With positive DRS of the young and individual-specific cost, insurer $k$’s objective reads as$^{22}$

$$\Pi^k = \sum_r \sum_s \theta_{rs} P_{rs}^k \pi_r^k - \sum_r \theta_{rY} P_{rY}^k a^k, \qquad (11)$$

where only $P_{rY}^k$ contains $g(a^k)$ and $g(a^j)$. The distortion of the benefit level is now determined by$^{23}$

$$\left[1 - \frac{\lambda(1-\lambda)(p^H - p^L)^2}{\frac{n}{n-1}\sigma\bar{p}}\, m^k + \frac{(p^H - p^L)}{\frac{n}{n-1}\sigma\bar{p}}\, \delta a^k\right] v'(m^k) = 1. \qquad (12)$$

Because the last term in the brackets is positive, compared with no DRS, $v'(m^k)$ has to decrease, so the distortion is reduced. However, for a given level of $a^k$, the reduction of the distortion is of course not as large as with DRS against the risk type itself, since $\delta < \lambda(1-\lambda)$. If there is less than perfect correlation, expenditures on DRS of the young not only increase the cost of the low risks, but also of the high risks and therefore translate into a smaller reduction of the cost difference between the risk types. As is to be expected, for a given level of $a^k$ the reduction of the distortion increases in the level of correlation (i.e. in $\delta$).

Result 2. In the pooling equilibrium, the distortion of the benefit level decreases in the level of positive DRS of a signal that is correlated with low risk if cost for DRS is individual-specific. A higher level of correlation (a higher $\delta$) increases the effect of a given level of DRS on the distortion of the benefit level.

3.3. Implications of DRS on optimal risk adjustment in the pooling equilibrium

We now discuss the implications of the interaction of direct and indirect risk selection for optimal risk adjustment. As shown by Glazer and McGuire (2000), if a regulator does not observe individuals’ risk type, but only a signal that is correlated with risk type (like age), it is still feasible to eliminate the distortion of the benefit package by overpaying for a signal that indicates a high risk, and underpaying for a signal that indicates a low risk. In Section 3.3.1, we first show how to implement optimal risk adjustment in our setting with two risk types and two age groups if there is no DRS; we then determine how the optimal payments have to be modified if there is DRS in Section 3.3.2.

3.3.1. Optimal risk adjustment without DRS

With risk adjustment, each insurer receives a payment of $RA_O$ for each insured that is old; these payments are financed by a risk adjustment fee $RAF$ which each insurer has to pay for each insured (including the old). The balanced budget constraint requires this fee to be $RAF = (1-\mu)RA_O$. The insurer’s objective with risk adjustment is then given by

$$\Pi^k = \sum_r \theta_{rY} P_{rY}^k (\pi_r^k - RAF) + \sum_r \theta_{rO} P_{rO}^k (\pi_r^k - RAF + RA_O). \qquad (13)$$

Simplifying the optimality conditions now yields$^{24}$

$$\left[1 - \frac{\lambda(1-\lambda)(p^H - p^L)^2}{\frac{n}{n-1}\sigma\bar{p}}\, m^k + \frac{(p^H - p^L)}{\frac{n}{n-1}\sigma\bar{p}}\, \delta RA_O\right] v'(m^k) = 1, \qquad (14)$$

so that the distortion of the benefit level is eliminated for

$$RA_O = \frac{\lambda(1-\lambda)}{\delta}(p^H - p^L)m. \qquad (15)$$

If there is perfect correlation, the share of the low risks equals the share of the young, so $\lambda = \mu$; in addition, the mass of individuals in the lower left and the upper right corner in Table 2 has to be zero, which requires $\delta = \lambda(1-\mu) = (1-\lambda)\mu$. Replacing in (15) shows that with perfect correlation the efficient benefit level is therefore implemented for $RA_O = (p^H - p^L)m$, which is just the cost difference between the two risk types. With less than perfect correlation, $\delta < \lambda(1-\lambda)$, so there is overpayment, and the lower the level of correlation (i.e., the lower $\delta$), the larger this overpayment has to be.

3.3.2. Optimal risk adjustment with positive DRS

With optimal risk adjustment there is overpayment for the old, so this is the group positive DRS will be targeted at.$^{25}$ Insurer $k$’s objective in this case is given by (13) with $RA_O$ replaced by $RA_O - a^k$ and with $P_{rO}^k$ containing $g(a^k)$ and $g(a^j)$ as given in (5). The positive equilibrium level of $a^k$ then enters condition (14), where again $RA_O$ has to be replaced by $(RA_O - a^k)$.$^{26}$ The efficient benefit level is therefore implemented with

$$RA_O = \frac{\lambda(1-\lambda)}{\delta}(p^H - p^L)m + a^k. \qquad (16)$$

Because part of the overpayment for the old is spent on positive DRS, the optimal overpayment for the old has to be raised by exactly these expenditures, so that the net difference in payments (including $a^k$) equals the amount necessary to eliminate IRS as given in (15).

Result 3. With individual-specific cost, if there is positive DRS regarding a signal that is used for risk adjustment and that is correlated with high risk, the optimal overpayment of that signal to eliminate IRS has to be increased by the expenditures for DRS.

This result shows that the formula for optimal risk adjustment derived by Glazer and McGuire (2000) is not invalidated by DRS: there is still overpayment for a signal that is correlated with high risk, and underpayment for a signal correlated with low risk. Also, DRS does not invalidate their claim that optimal risk adjustment can implement the efficient benefit level. However, the formula to implement the efficient benefit level has to be modified and include insurers’ expenditures on DRS if cost is individual-specific. Whether these expenditures are negligible or significant is an empirical matter, but, e.g., the findings of Starc (2014), who reports that insurers spend a large part of potential profits on marketing and insurance brokers, indicate that these expenditures may be substantial.

3.4. More than two risk types

To keep the notation simple when deriving the results, so far we have considered the case of only two risk types. The results, however, also hold for an arbitrary number of risk types. Let the number of risk types be $\bar{r}$ and denote the probability of risk type $r$ by $p^r$ and its share by $\lambda_r$.

In the conditions determining the distortion of the benefit level, conditions (8), (12) and (14), the term $\lambda(1-\lambda)(p^H - p^L)^2$ then has to be replaced by $\sum_{r=1}^{\bar{r}} \lambda_r (p^r - \bar{p})^2 > 0$; see Appendix B.1. This is just the variance of the illness probabilities, and the larger this variance, the larger the distortion of $m$.

$^{22}$ Like in Section 3.2.1, we assume that positive DRS is profitable for only one of the two signal types; see footnote 18.
$^{23}$ See Appendix A.3.
$^{24}$ See Appendix A.4.
$^{25}$ As before we assume that DRS is only targeted at one of the signal types.
$^{26}$ See Appendix A.5.
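
The payment formulas (15) and (16) and the type shares of Table 2 lend themselves to a small numerical check. The sketch below is illustrative only: the function names (`type_shares`, `optimal_overpayment`, `bracket_in_14`) and all parameter values are hypothetical, and the DRS expenditure `a` is again taken as given instead of being solved for in equilibrium.

```python
def type_shares(lam, mu, delta):
    """Shares of the four risk/signal types as laid out in Table 2."""
    shares = {
        ("L", "Y"): lam * mu + delta,
        ("H", "Y"): (1.0 - lam) * mu - delta,
        ("L", "O"): lam * (1.0 - mu) - delta,
        ("H", "O"): (1.0 - lam) * (1.0 - mu) + delta,
    }
    if min(shares.values()) < 0.0:
        raise ValueError("delta is too large for these values of lam and mu")
    return shares


def optimal_overpayment(lam, delta, p_low, p_high, m, a=0.0):
    """Payment for the old from (15); with DRS spending a > 0 this is (16)."""
    if delta <= 0.0:
        raise ValueError("the formula requires a positive correlation (delta > 0)")
    return lam * (1.0 - lam) / delta * (p_high - p_low) * m + a


def bracket_in_14(lam, delta, p_low, p_high, n, sigma, m, net_payment):
    """Bracketed term of condition (14); a value of 1 means no distortion."""
    p_bar = lam * p_low + (1.0 - lam) * p_high
    denom = n / (n - 1.0) * sigma * p_bar
    spread = p_high - p_low
    return (1.0
            - lam * (1.0 - lam) * spread ** 2 * m / denom
            + spread * delta * net_payment / denom)


# Hypothetical parameter values, chosen only for illustration.
lam, mu, delta = 0.5, 0.5, 0.1
shares = type_shares(lam, mu, delta)
rao = optimal_overpayment(lam, delta, 0.2, 0.4, m=1.0)               # (15)
rao_with_drs = optimal_overpayment(lam, delta, 0.2, 0.4, m=1.0, a=0.05)  # (16)
fee = (1.0 - mu) * rao  # balanced budget: RAF = (1 - mu) * RA_O
check = bracket_in_14(lam, delta, 0.2, 0.4, n=5, sigma=0.1, m=1.0,
                      net_payment=rao_with_drs - 0.05)
print(shares, rao, rao_with_drs, fee, check)
```

In this example the payment for the old exceeds the cost difference $(p^H - p^L)m$ because correlation is imperfect, and adding the DRS spending to the payment leaves the bracket in (14) at one, which is the sense in which Result 3 restores the efficient benefit level.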
For this general case of an arbitrary number of risk types, we define the shares of individuals (which were given in Table 2 for the case of two risk types) by $\theta_{rY} = \lambda_r\mu + \delta_r$ and $\theta_{rO} = \lambda_r(1-\mu) - \delta_r$, see Table B.1 in Appendix B.2. It is then straightforward to show that there is a positive correlation of risk type and old age for $\sum_r \delta_r p^r < 0$; $\delta_r$ has to be (mainly) positive for small illness probabilities and negative for large illness probabilities. The term $(p^H - p^L)\delta$ in the last fraction of (12) then has to be replaced by $-\sum_r \delta_r p^r$, so positive DRS again reduces the distortion caused by IRS (see Appendix B.3).

To eliminate the incentives for IRS with optimal risk adjustment, as before there has to be overpayment (see Appendix B.4) and this overpayment has to be increased by insurers’ expenditures for positive DRS (see Appendix B.5).

3.5. The pooling equilibrium with negative DRS

The discussion of negative DRS can be very brief because for all the settings we considered, negative DRS creates just the opposite effects of positive DRS.$^{27}$ Negative DRS increases the loss associated with the high risks, which increases the cost difference between the two risk types and thereby increases the incentives for IRS.

Negative DRS regarding a signal that is used for risk adjustment and for which there is underpayment increases the loss for this signal, which allows keeping the underpayment less pronounced than without DRS. We can therefore summarize the results for negative DRS as follows:

Result 4. In the pooling equilibrium and with individual-specific cost, the distortion of the benefit level increases in the level of negative DRS against the high risk type or against a signal that is correlated with low risk. If there is negative DRS regarding a signal that is used for risk adjustment and that is correlated with low risk, the optimal underpayment of that signal to eliminate IRS has to be reduced by the expenditures for DRS.

4. DRS in the separating equilibrium

4.1. The separating equilibrium without DRS

In Lorenz (2013), it is shown that the separating equilibrium arises for a high level of competition (i.e. a low level of $\sigma$) and that it is similar, but not identical, to the Rothschild–Stiglitz equilibrium under perfect competition.$^{28}$ Both equilibria can be found in Fig. 1.

Fig. 1. Separating equilibrium: contracts $B$ and $A_3$ are offered.

Under perfect competition, the separating equilibrium consists of contract $B$, chosen by the high risks, and contract $A_1$, chosen by the low risks, where $A_1$ is located at the intersection of the low risks’ iso-profit line $\pi_{p^L}$ and the high risks’ indifference curve $I^{V_H^B}$ associated with contract $B$ (so that the incentive compatibility constraint is satisfied); in addition, both iso-profit lines pass through the origin and are in this case zero-profit lines.$^{29}$ The number of insurers offering the contract for the low risks (insurers of type A) and for the high risks (insurers of type B), $n_A$ and $n_B$ respectively, are indeterminate under perfect competition.$^{30}$

This is different under imperfect competition, where $n_A$ and $n_B$ are implicitly defined by the profit equality condition $\Pi^A = \Pi^B$: total profit for an insurer of type A has to be the same as for an insurer of type B. If, e.g., $\Pi^B > \Pi^A$, it would be profitable for an insurer of type A to become of type B; increasing $n_B$ decreases the market share of each insurer of type B, which decreases $\Pi^B$. In equilibrium, profit per high risk equals $\frac{n_B}{n_B-1}\sigma$, so the iso-profit line associated with contract $B$ starts at that level, see Fig. 1. However, while $B$ is indeed one of the equilibrium contracts, $A_1$ is not. To simplify the exposition of which contract is offered for the low risks, in the following we will simply speak of ‘insurer A’ and ‘insurer B’, i.e., present the explanation as if there was only one insurer of type A and one insurer of type B.

Because $B$ and $A_1$ are both located on the indifference curve $I^{V_H^B}$, these two benefit-premium bundles provide the same utility to the high risk, i.e., $V_H^B = V_H^{A_1}$; therefore, all the high risks with $\varepsilon_i^A > \varepsilon_i^B$ will choose insurer A. From the perspective of insurer A who tries to avoid being chosen by the high risks by offering a contract that satisfies the incentive compatibility constraint, what is important is not only the indifference curve $I^{V_H^B}$, but also the additional utility components $\varepsilon_i^j$, in particular, the difference $\varepsilon_i^A - \varepsilon_i^B$. For some of the high risks, this difference will be negative, for others somewhat, or even considerably, larger than zero. There is therefore no single incentive compatibility constraint (ICC), but an individual-specific ICC for each of the high risks depending on this difference.

One way to capture these individual-specific ICCs is by a distribution function denoting the share of the high risks for which the ICC is violated; formulated in this way, this distribution function coincides with insurer A’s market share among the high risks as a function of $V_H^A$ given $V_H^B$, i.e., $P_H^A = P_H^A(V_H^A \mid V_H^B)$. Graphically, this distribution function can be depicted by a shaded area around the indifference curve $I^{V_H^B}$, representing the density of this distribution (with the darkness of the shaded area as a measure of this density, which is bell-shaped). Above the shaded area, the ICC is satisfied for all the high risks, so $P_H^A = 0$. Within the shaded area, $P_H^A$ increases in $V_H^A$ according to the density $\partial P_H^A/\partial V_H^A = P_H^A(1 - P_H^A)/\sigma$. Below the shaded area, $P_H^A = 1$, because there the ICC is violated for all the high risks.

$^{27}$ This is straightforward to show by replacing $a^k$ by $b^k$ and $+g(a)$ by $-f(b)$ in all the optimization problems and the derivations we derived in the appendix; see also Lorenz (2014).
$^{28}$ The separating equilibrium for a low level of $\sigma$ exists under the same condition as the Rothschild–Stiglitz equilibrium: the share of low risks must not be too large.
$^{29}$ See Zweifel et al. (2009), chapter 7.
$^{30}$ However, Olivella and Vera-Hernandez (2010) have shown that the smallest number of insurers supporting this equilibrium is $n = 3$, with $n_B = 1$ and $n_A = 2$.
$^{31}$ We consider a movement along the indifference curve of the low risks because with more than one insurer of type A, this would leave this insurer’s market share among the low risks unaffected.

Insurer A, when offering $A_2$, would therefore not be chosen by any of the high risks. However, he can still increase his profit by moving along the indifference curve of the low risks to the right, which would have two effects$^{31}$: on the one hand, it increases profit per low risk because the indifference curve has a larger slope than $\pi_{p^L}$; on the other hand, it also increases the number of the high risks
Fig. 2. Separating equilibrium with (a) positive or (b) negative DRS.

choosing this contract. Because at the boundary of the shaded area this second effect is of second order (since the density is about zero if PHA ≈ 0), the two effects balance somewhere inside the shaded area, represented by contract A3.32 Because a small share of the high risks choose contract A3, this contract is somewhat above the iso-profit line pL (which would apply if only the low risks choose insurer A).

Due to the 'stochastic nature' of the incentive compatibility constraint, the separating equilibrium under imperfect competition is therefore not perfectly separating. Instead, a small share of the high risks chooses the contract designated for the low risks, but none of the low risks choose the contract designated for the high risks.

4.2. The separating equilibrium with DRS

4.2.1. The separating equilibrium with positive DRS

For positive DRS, we have to distinguish whether the utility increase g(a) will only (or at least primarily) be experienced by the low risks (as, e.g., might be the case with a discount for a fitness club membership) or equally by both risk types (as, e.g., might be the case with advertising). We begin with the case that the utility increase can only be experienced by the low risks. Positive DRS will then only be performed by insurers of type A.33

The objective of insurers of type A with positive DRS of the low risks equals the objective as given in (9). The solution to this objective determines a positive equilibrium level of expenditures on DRS, which increases the premium charged by these insurers.34 In Fig. 2(a), this increase in RA can be shown by an upward shift of the corresponding iso-profit line. As is apparent, insurers of type A can then increase their benefit level before attracting the same share of the high risks as without DRS. Because a higher benefit level (accompanied by the corresponding increase in the premium, pL mA) increases the utility of the low risks, insurers will offer this higher benefit level (if there is some competition), so that the new equilibrium will be a contract like A4. Since positive DRS reduces the attractiveness of the contract offered by insurers of type A for the high risks in the premium-dimension, they can increase the attractiveness of their contract in the benefit level-dimension.

Result 5. In the separating equilibrium, the distortion of the benefit level is reduced if there is positive DRS of the low risk and cost is individual-specific.

We now turn to the case that both risk types can experience the utility increase g(a). In this case, DRS will be performed by all insurers: insurers of type A engage in DRS of the low risks and insurers of type B in DRS of the high risks (since high risks are profitable to insurers of type B). Because the equilibrium level of expenditures on DRS in this case is the same for both types of insurers,35 all insurers raise their premium by the same amount. In Fig. 2(a), both iso-profit lines are then shifted upwards by the same distance, so there is no effect on the benefit levels mA and mB.

However, it seems reasonable to assume that positive DRS is at least somewhat more effective when targeted at the low risks than when targeted at the high risks. In equilibrium, insurers of type A will spend more on DRS and raise their premium by a higher amount than insurers of type B. The larger upward shift of their iso-profit line then allows insurers of type A to increase their benefit level (just as in the case where the iso-profit line of insurers of type B is not shifted at all). Therefore, as long as positive DRS reduces the attractiveness of the contract offered by insurers of type A relative to the contract offered by insurers of type B in the premium-dimension, insurers of type A can increase the attractiveness of their contract in the benefit level-dimension.

4.2.2. The separating equilibrium with negative DRS

Negative DRS will only be performed by insurers of type A, because only these insurers try to avoid being chosen by the high risks. As in Section 4.1, we explain the effect of negative DRS as if there was only one insurer of type A and one insurer of type B.

With negative DRS, a high risk chooses insurer A if $V_H^A - f(b^A) + \varepsilon_i^A > V_H^B + \varepsilon_i^B$. Because negative DRS reduces the utility as perceived by the high risks by $f(b^A)$, $V_H^A$ can be increased by exactly that amount without altering the number of the high risks choosing insurer A. This allows insurer A to increase his benefit level, see Fig. 2(b). There, $A_5^L$ denotes a contract offered by insurer A as perceived by the low risks, while $A_5^H$ denotes the same contract as perceived by the high risks. Compared to $A_5^L$, $A_5^H$ is shifted upwards by $f(b^A)$, the utility decrease of negative DRS (measured in monetary terms). The larger $b^A$, the larger $f(b^A)$, and therefore the larger VA mA can be without increasing the share of high risks choosing insurer A. Because this effect occurs regardless of whether cost for DRS is individual-specific or not, there exists one case where

32 See Appendix C.1; for a more detailed derivation of this equilibrium, see Lorenz (2013).
33 Because contract B is far above the indifference curve IL associated with contract A3, for the low risks there is a huge difference in utility between A3 and B. Any moderate increase in the (perceived) utility of contract B due to positive DRS of insurers of type B will reduce this difference only to a small degree and not induce any of the low risks to choose this contract. Because insurers of type B cannot increase their market share among the low risks (which remains at zero), they will refrain from positive DRS.
34 See Appendix C.2.
35 This can be seen by replacing $\pi_H^k$ by $(\pi_H^k - a^k)$ in (6) for insurers of type B and comparing the respective optimality condition with the one for insurers of type A.
DRS influences the distortion caused by IRS even if cost is non-individual-specific.

Result 6. In the separating equilibrium, negative DRS against the high risks reduces the distortion of the benefit level of the low risks regardless of whether cost for negative DRS is individual-specific or not.

The main mechanism in this last case is therefore different from all the other cases we have considered. Here, IRS is substituted by DRS, while in all the other cases, IRS affects risk type specific profits and thereby the gains from IRS.

4.3. Optimal risk adjustment in the separating equilibrium

With optimal risk adjustment, profit per high risk equals profit per low risk, so insurers' incentives to distort the benefit package are eliminated and the separating equilibrium turns into a pooling equilibrium where all insurers offer the same benefit-premium bundle. The impact of DRS on optimal risk adjustment in the pooling equilibrium has, however, already been derived in Section 3.3.2.

5. Conclusion

In this paper, the interaction of direct and indirect risk selection (DRS and IRS) has been analyzed. It has been shown that DRS, using measures unrelated to the benefit package, nevertheless has an influence on the distortions of the benefit package caused by IRS. If cost for DRS is (at least to some degree) individual-specific, DRS selectively reduces the profit per individual of the risk type it is targeted at. Positive DRS therefore reduces and negative DRS increases the difference in profits between the low and the high risks. Because in the pooling equilibrium the degree of the distortion depends on the difference between these profits, positive DRS reduces the distortion of the benefit level, while negative DRS increases it. In the separating equilibrium, DRS can act as a substitute for IRS and thereby reduce the distortion of the benefit level. In addition, it has been shown that the effects of DRS on type- and signal-specific profits also have an influence on the formula for optimal risk adjustment: The over- and underpayments have to be inflated by insurers' expenditures on positive DRS and reduced by their expenditures on negative DRS.

We have derived these results for a setting where negative DRS occurs during the application process, but the main mechanisms should also hold if it is targeted at the high risks who already hold a contract with the insurer. If an insurer attracts a larger share of the high risks, this will increase the cost of negative DRS, even if the activity of risk selection and the cost associated with it occur only later (when the insurer tries to induce the high risks to switch to another insurer). With negative DRS, high risks are more expensive than without DRS; anticipating these additional costs, insurers will therefore not make their contract as attractive for this risk group as they would without DRS. Therefore, negative DRS will increase the distortion in such a setting as well.

Acknowledgements

I thank Friedrich Breyer and Esther Schuch for helpful comments and suggestions.

Appendix A.

A.1. The pooling equilibrium without DRS

Using the property that the derivative of $P_r^k$ with respect to $V_r^k$ can be expressed in terms of $P_r^k$ itself as $\partial P_r^k / \partial V_r^k = P_r^k (1 - P_r^k)/\sigma$, the FOCs to objective (6) are given by

$$\frac{\partial \pi^k}{\partial R^k} = \lambda \left[ -\frac{P_L^k (1 - P_L^k)}{\sigma}\, \pi_L^k + P_L^k \right] + (1 - \lambda) \left[ -\frac{P_H^k (1 - P_H^k)}{\sigma}\, \pi_H^k + P_H^k \right] = 0 \tag{17}$$

$$\frac{\partial \pi^k}{\partial m^k} = \lambda\, p_L \left[ \frac{P_L^k (1 - P_L^k)}{\sigma}\, v' \pi_L^k - P_L^k \right] + (1 - \lambda)\, p_H \left[ \frac{P_H^k (1 - P_H^k)}{\sigma}\, v' \pi_H^k - P_H^k \right] = 0. \tag{18}$$

Using the fact that in equilibrium $P_r^j = 1/n \;\forall j$, condition (17) yields (7). Solving (17) for $R^k = \frac{n}{n-1}\sigma + \bar{p}\, m^k$ and inserting in (18) yields (8).

A.2. Positive DRS of risk type in the pooling equilibrium

The FOC of (9) with respect to $a^k$ is

$$\frac{\partial \pi^k}{\partial a^k} = \lambda \left[ \frac{P_L^k (1 - P_L^k)}{\sigma}\, g'(a^k)(\pi_L^k - a^k) - P_L^k \right] = 0. \tag{19}$$

With $P_r^j = 1/n \;\forall j$, this can be simplified to $g'(a^k)(\pi_L^k - a^k) = \frac{n}{n-1}\sigma$, which implicitly defines a positive level of $a^k$, because $\lim_{a \to 0^+} g'(a) = \infty$. Replacing $\pi_L^k$ by $(\pi_L^k - a^k)$ in (17) and (18), solving (17) for $R^k$ and substituting in (18) then yields (10).

A.3. Positive DRS of a signal that is correlated with risk type

The FOC of (11) with respect to ak defines a positive equilibrium level of ak. Solving the FOC with respect to Rk for Rk = (n/(n − 1)) + pmk + ak and inserting in the FOC with respect to mk yields (12).

A.4. Optimal risk adjustment without DRS

Solving the FOC of (13) with respect to Rk for Rk = (n/(n − 1)) + pmk + RAF − (1 − )RAO and substituting in the FOC with respect to mk (using the definitions of rs as given in Table 2) yields (14).

A.5. Optimal risk adjustment with DRS

The derivation of condition (16) is identical to the derivation of condition (14) described in Appendix A.4; the only difference is that RAO has to be replaced by (RAO − ak).

Appendix B and C. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jhealeco.2014.12.003
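
The property invoked in Appendix A.1, that the own-derivative of a choice probability can be written in terms of the probability itself, holds for logit choice probabilities. The following is a minimal numerical sketch under that assumption, not the paper's derivation: the scale parameter `sigma`, the function names, and the example utilities are illustrative choices that do not appear in the text.

```python
import math


def logit_shares(values, sigma):
    """Logit choice probabilities for deterministic utilities `values`.

    `sigma` is an assumed scale parameter of the taste shocks.
    """
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - top) / sigma) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def own_derivative(values, k, sigma, step=1e-6):
    """Central finite-difference estimate of dP_k / dV_k."""
    up = list(values)
    down = list(values)
    up[k] += step
    down[k] -= step
    return (logit_shares(up, sigma)[k] - logit_shares(down, sigma)[k]) / (2 * step)


def symmetric_markup(n, sigma):
    """Profit per insured solving -(P (1 - P) / sigma) * markup + P = 0 at P = 1/n."""
    share = 1.0 / n
    return sigma / (1.0 - share)


# Hypothetical utilities of three contracts, chosen only for illustration.
values, sigma = [1.0, 0.4, -0.2], 0.5
shares = logit_shares(values, sigma)
numeric = own_derivative(values, 0, sigma)
closed_form = shares[0] * (1.0 - shares[0]) / sigma
print(numeric, closed_form)          # finite difference vs. P (1 - P) / sigma
print(symmetric_markup(4, sigma))    # sigma * n / (n - 1) for n = 4
```

With every insurer at share 1/n, the premium condition in this sketch reduces to a markup proportional to n/(n − 1), which is consistent with the n/(n − 1) term in the expression for Rk in Appendix A.1.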
References

Bauhoff, S., 2012. Do health plans risk-select? An audit study on Germany's Social Health Insurance. Journal of Public Economics 96 (9–10), 750–759.
Bijlsma, M., Boone, J., Zwart, G., 2011. Competition leverage: how the demand side affects optimal risk adjustment. TILEC Discussion Paper, 2011-039.
Breyer, F., Bundorf, M.K., Pauly, M.V., 2011. Health care spending risk, health insurance, and payment to health plans. In: Pauly, M.V., McGuire, T.G., Barros, P.P. (Eds.), Handbook of Health Economics, vol. 2. Elsevier, Amsterdam, pp. 691–762.
Brown, J., Duggan, M., Kuziemko, I., Woolston, W., 2011. How does risk selection respond to risk adjustment? Evidence from the Medicare Advantage Program. NBER Working Paper 16977.
Cao, Z., McGuire, T.G., 2003. Service-level selection by HMOs in Medicare. Journal of Health Economics 22 (6), 915–931.
Eggleston, K., 2000. Risk selection and optimal health insurance-provider payment systems. Journal of Risk and Insurance 67 (2), 173–196.
Ellis, R.P., McGuire, T.G., 2007. Predictability and predictiveness of health care spending. Journal of Health Economics 26 (1), 25–48.
Frank, R.G., Glazer, J., McGuire, T.G., 2000. Measuring adverse selection in managed health care. Journal of Health Economics 19 (6), 829–854.
Glazer, J., McGuire, T.G., 2000. Optimal risk adjustment in markets with adverse selection: an application to managed care. American Economic Review 90 (4), 1055–1071.
Glazer, J., McGuire, T.G., 2002. Setting health plan premiums to ensure efficient quality in health care: minimum variance optimal risk adjustment. Journal of Public Economics 84 (2), 153–173.
Jack, W., 2006. Optimal risk adjustment with adverse selection and spatial competition. Journal of Health Economics 25 (5), 908–926.
Lorenz, N., 2013. Adverse selection and risk adjustment under imperfect competition. Trier University Working Paper 5/13. http://ideas.repec.org/p/trr/wpaper/201305.html
Lorenz, N., 2014. The interaction of direct and indirect risk selection. Trier University Working Paper 12/14. http://ideas.repec.org/p/trr/wpaper/201412.html
McGuire, T.G., Glazer, J., Newhouse, J.P., Normand, S.-L., Shi, J., 2013. Integrating risk adjustment and enrollee premiums in health plan payment. Journal of Health Economics 32 (6), 1263–1277.
Newhouse, J.P., Price, M., Huang, J., McWilliams, J.M., Hsu, J., 2012. Steps to reduce favorable risk selection in Medicare Advantage largely succeeded, boding well for Health Insurance Exchanges. Health Affairs 31 (12), 2618–2628.
Olivella, P., Vera-Hernandez, M., 2010. How complex are the contracts offered by health plans? SERIEs 1 (3), 305–323.
Rothschild, M., Stiglitz, J., 1976. Equilibrium in competitive insurance markets: an essay on the economics of imperfect information. Quarterly Journal of Economics 90 (4), 629–649.
Shen, Y., Ellis, R.P., 2002. How profitable is risk selection? A comparison of four risk adjustment models. Health Economics 11 (2), 165–174.
Shi, J., 2013. Efficiency in plan choice with risk adjustment and premium discrimination in Health Insurance Exchanges. Unpublished.
Spiegel, 2011. Fragwürdige Beratung: Krankenversicherer wimmelt Senioren ab. Spiegel, May 9, 2011. http://www.spiegel.de/wirtschaft/soziales/fragwuerdige-beratung-krankenversicherer-wimmelt-senioren-ab-a-761384.html
Starc, A., 2014. Insurer pricing and consumer welfare: evidence from Medigap. RAND Journal of Economics 45 (1), 198–220.
Train, K.E., 2009. Discrete Choice Methods with Simulation, 2nd ed. Cambridge University Press, New York.
van de Ven, W.P., Ellis, R., 2000. Risk adjustment in competitive health plan markets. In: Culyer, A.J., Newhouse, J.P. (Eds.), Handbook of Health Economics. Elsevier, Amsterdam, pp. 755–845.
van de Ven, W.P., van Vliet, R.C., 1992. How can we prevent cream skimming in a competitive health insurance market? The great challenge for the 90's. In: Zweifel, P., Frech, H. (Eds.), Health Economics Worldwide. Kluwer Academic Publishers, Dordrecht, Boston, London, pp. 23–46.
Zweifel, P., Breyer, F., Kifmann, M., 2009. Health Economics, 2nd ed. Springer, Berlin, Heidelberg.

Environmental regulations on air pollution in China and their impact on infant mortality

Shinsuke Tanaka ∗
Tufts University, United States

Article history:
Received 27 January 2014
Received in revised form 24 February 2015
Accepted 26 February 2015

JEL classification: Q56, I18, Q53, I12, O13

Keywords:
Infant mortality
Air pollution
Environmental regulation
China

This study explores the impact of environmental regulations in China on infant mortality. In 1998, the Chinese government imposed stringent air pollution regulations, in one of the first large-scale regulatory attempts in a developing country. We find that the infant mortality rate fell by 20 percent in the treatment cities designated as “Two Control Zones.” The greatest reduction in mortality occurred during the neonatal period, highlighting an important pathophysiologic mechanism, and was largest among infants born to mothers with low levels of education. The finding is robust to various alternative hypotheses and specifications. Further, a falsification test using deaths from causes unrelated to air pollution supports these findings.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction

There is little disagreement that air pollution poses a major environmental risk to human health. Improved air quality worldwide is correlated with the amelioration of numerous health problems, including respiratory infections, cardiovascular diseases, and lung cancer.1 Developing countries rank highest in air pollution worldwide, and children under age five in these countries are considered to be the most vulnerable population. Elevated air pollution is generally beyond the scope of individual control and falls to the public sector. However, environmental regulations on air pollution are extremely contentious. This was evident at the 2009 United Nations Climate Change Conference, also known as COP15, held in Copenhagen, where many developing countries refused to commit themselves to a legal framework for reducing pollution emissions. Their opposition is largely due to the prevailing concern that the economic costs associated with pollution abatement may outweigh the health benefits. Accordingly, air pollution regulations are still rare in developing countries, and whether, and to what extent, environmental regulations on air pollution lead to health benefits remains an important question yet to be answered.

In this paper, we examine the effect of environmental regulations pertaining to air pollution on infant mortality in China. As China's economy continued to grow at unprecedented rates for the last several decades, ambient air quality deteriorated to one of the worst levels in the world due to its heavy reliance on coal-fired energy generation. In 1998, the Chinese government imposed stringent regulations on pollutant emissions from power plants, in one of the first attempts on such a large scale in developing countries. This so-called Two Control Zone (TCZ) policy designated nearly 175 prefectures exceeding the nationally mandated pollution standards as the TCZ. In these areas, the power industry, which contributed more than 90 percent of air pollution, was forced to reduce emissions and install new pollution control technologies, while also shutting down a massive number of small, inefficient power plants. For our purpose, the TCZ policy provides a quasi-experimental

∗ Correspondence to: The Fletcher School, Tufts University, 160 Packard Avenue, Medford, MA 02155, United States. Tel.: +1 6176274619.
E-mail address: Shinsuke.Tanaka@tufts.edu
1 A sizable literature documents health risks associated with air pollution exposure. See, for example, Schwartz et al. (1996), Levy et al. (2000), Samet et al. (2000), Chay and Greenstone (2003a), Currie and Neidell (2005), Currie et al. (2009), and Arceo et al. (2012).
S. Tanaka / Journal of Health Economics 42 (2015) 90–103 91
environment, wherein the intensity of exposure to the regulations can be defined by the TCZ regulatory status, and we are able to compare changes in infant mortality rate (IMR) before and after the policy reform, between the cities assigned and not assigned as the TCZ.

To implement the analysis, we draw IMR data from the Chinese Disease Surveillance Points (DSP) system that collected birth and death registrations for 145 nationally representative sites from 1991 through 2000. IMR, defined as the number of infant deaths under age one per 1000 live births in a given year, is available at the DSP site-by-year level, linked with detailed information on birth characteristics and parental attributes. We match this dataset to the TCZ regulatory status assigned to individual cities, based on the governmental report, and thereby estimate the treatment effect of the regulations.

We find that the air pollution regulations led to significant reductions in infant mortality. The difference-in-differences estimates suggest that the regulations have led to 3.29 fewer infant deaths per 1000 live births than would have occurred in the absence of the regulations. This corresponds to a 20 percent reduction in IMR. Sixty-three percent of the reduction in infant mortality occurred during the neonatal period, highlighting an important pathophysiologic mechanism, and the greatest reduction of mortality occurred among children born to mothers with low educational levels.

A major methodological challenge, however, is that the TCZ designation rule may not be orthogonal to unobserved characteristics that contribute to reductions in air pollution and infant mortality. The present study conducts a number of robustness checks and a falsification test to address this issue. First, we confirm that the TCZ status has little association with changes in observable covariates, assuring that there is no systematic difference in concurrent trends in observable characteristics between the TCZ and non-TCZ cities. Although this is not a direct test of the exclusion restriction, since it requires that TCZ status not be correlated with trends in unobservable factors, this result leads us to believe that the treatment effect is less likely to be confounded by differential trends in unobservable factors as well (Altonji et al., 2005). Second, the regression is also directly adjusted for differential pre-existing trends in mortality, yet the estimates are essentially unaffected. Lastly, the policy had no impact on infant deaths due to accidental causes. The absence of a causal mechanism linking air pollution to these causes of death serves as falsification evidence, suggesting that differences in access to or quality of medical services and technologies cannot be the sources of bias. Overall, there is no evidence that the estimates are driven by inappropriate identification assumptions, leading us to believe that the treatment effect based on the TCZ status is indeed not spurious but causal.

This study makes three major contributions to the existing literature. First, by exploiting regulation-induced changes in air quality, it addresses a policy-relevant question: to what extent do environmental regulations in developing countries lead to reductions in infant mortality? Several prior studies have focused on variation in air quality induced by recession (Chay and Greenstone, 2003a), weekly fluctuations (Currie and Neidell, 2005), wildfires (Jayachandran, 2009), or wind directions (Luechinger, 2014). Chay and Greenstone (2003b) provide compelling evidence for the linkage between the Clean Air Act of 1970 and infant mortality in the U.S. It remains to be determined, however, whether, and how effectively, environmental regulations can improve human health in developing countries.2 A recent study by Greenstone and Hanna (2011) examines regulations on air pollution and water pollution in India since 1987. They find these regulations efficacious in reducing air pollution, but such reductions led to modest and statistically insignificant reductions in infant mortality. Our study provides an interesting contrast, finding that infant mortality significantly responded to the environmental regulation. Further, the regulation we focus on targeted coal for energy generation, which is the major contributor to air pollution in many other developing countries as well, whereas Greenstone and Hanna (2011) focus on vehicular pollution.3,4 The findings in this study accordingly present relevant estimates for the effect of environmental regulations in developing countries implementing similar policies on coal in the power industry.

Second, the present study contributes to our understanding of the relationship between air pollution and infant mortality at greater concentration levels. Previous evidence is predominantly derived from the United States or other developed countries, where pollution is relatively low.5 Since we know little about the shape of the dose–response relationship, it is consequently difficult to predict the marginal impact of pollution reduction in the presence of a non-linear relationship.6 Air pollution in China is among the highest in the world. Its total suspended particulates (TSP) level in 1995 was four times higher than the WHO standards, and four times higher than the level in the United States in 1970, when the Clean Air Act was amended, as examined in Chay and Greenstone (2003b). Thus, estimates in China provide compelling evidence applicable to the distinctive context of developing countries where air pollution levels are relatively high.

Third, there is extensive literature showing differential patterns according to socioeconomic status, yet it is still an open question as to whether air pollution also exhibits differential impact on infant mortality (Currie and Hyson, 1999; Case et al., 2002; Jayachandran, 2009). While infants in poor countries are considered to be the most susceptible to the effect of pollution, not only because of high pollution levels but also because families lack the resources or knowledge necessary to avoid exposure, the impacts may be small if air pollution does not have a first-order effect on them.7 Thus, the present study helps identify vulnerable populations when designing policies.

The current research design has several advantages over previous studies. First, this study focuses on infants, not only because

2 Extrapolating evidence in developed countries to developing countries is difficult for a number of reasons. First and foremost, we do not know whether environmental regulations in developing countries had any impact in reducing pollution due to weaker implementation mechanisms and enforcement systems. Second, even if pollution is successfully reduced to some extent, infant mortality may not fall if a concave relationship between mortality and pollution level leads to a low marginal effect of pollution at high concentration levels. Third, magnitudes of impacts in reducing air pollution should be greater if people have limited access to medical services, initially have lower health status, and/or have limited knowledge in avoiding pollution.
3 Since the most heavily affected industry is the power industry, which was a driving force behind China's rapid economic growth, the findings in this study accordingly highlight the important tradeoffs among economic growth, environmental quality, and human health. See Tanaka et al. (2014) for the impact of the environmental regulation on industrial performance.
4 A relatively smaller-scale air quality regulatory regime, targeting a different industry, in another developing country, can be found in the Indian transportation sector, which was mandated to use compressed natural gas vehicles in Delhi during working hours. Kumar and Foster (2007) show its effect on respiratory health.
5 Examining the pollution effect at low levels, especially levels lower than what is often considered to be the standard, is also an interesting question in itself. Currie and Neidell (2005) find that CO has significant impact on infant mortality in California over the 1990s at relatively low levels.
6 For example, Arceo et al. (2012) show a non-linear relationship between CO and health in Mexico. Evidence in developed countries may understate the impact of pollution reduction in developing countries if the marginal impact of pollution is higher at greater concentration levels.
7 For example, people with poor health tend to stay indoors with little exposure to pollution. Children from rich households, who tend to have better health, may be exposed more to pollution if they are more likely to have outside activities.
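
The IMR definition and the difference-in-differences comparison described in the Introduction can be illustrated with a short sketch. This is not the paper's estimation code: the function names and all example group means are hypothetical, and only the 3.29 estimate and the 20 percent figure are taken from the text.

```python
def infant_mortality_rate(infant_deaths, live_births):
    """Deaths under age one per 1000 live births."""
    return 1000.0 * infant_deaths / live_births


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the comparison group."""
    return (treated_post - treated_pre) - (control_post - control_pre)


def implied_counterfactual(reduction, share):
    """Rate for which `reduction` amounts to the fraction `share` of that rate."""
    return reduction / share


# Hypothetical group means (deaths per 1000 live births), for illustration only.
effect = diff_in_diff(treated_pre=20.0, treated_post=15.0,
                      control_pre=18.0, control_post=16.0)

# Figures quoted in the text: 3.29 fewer deaths per 1000, a 20 percent reduction.
baseline = implied_counterfactual(3.29, 0.20)
print(effect, baseline)
```

Taken together, the two quoted figures imply a counterfactual IMR of roughly 16 deaths per 1000 live births in the treated cities; because both inputs are rounded, the implied level is only approximate.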
they are particularly vulnerable to air pollution due to their weak respiratory system, but also because focusing on infant mortality mitigates complicating factors associated with adult mortality. For example, adult deaths correlate more closely to chronic disease conditions than to acute changes in air quality. In addition, adults may migrate into less polluted areas. Addressing infant mortality circumvents these issues, if not completely, because it is relatively less difficult to identify causes of death during the first year of life, and because migration rates are low for pregnant women and infants. Lastly, China is not only one of the most polluted countries but also one of the first developing countries to regulate air pollution on such a large scale. It is evident that China serves as a rare research environment in which to assess the impact of environmental regulations at greater concentration levels.

The rest of the paper is organized as follows. Section 2 provides the historical background on air pollution and national air pollution regulations in China. Section 3 describes the data and the descriptive statistics. Section 4 presents the econometric framework and its validity, and Section 5 presents empirical results. Section 6 concludes.

2. Background on air pollution and regulations in China

2.1. Brief history

China is infamous for its air pollution, due to emissions from a power sector that relies heavily on coal to generate electric power. As the world's largest coal producer, China possesses abundant and relatively cheap coal, which constitutes the country's primary energy resource endowment, accounting for 75.5 percent of total energy production in 1995 (National Bureau of Statistics of China, 2006). However, coal generally emits more pollutants than other fossil fuels. As China underwent rapid economic growth, total SO2 emissions increased from 18.4 million tons in 1990 to 23.7 million tons in 1995, and the ambient air pollution rose to levels detrimental to human health (State Environmental Protection Agency [SEPA], 1996).

Fig. 1 illustrates the world distribution of TSP (Panel A) and SO2 (Panel B) concentration levels in 1995. The TSP level in Beijing, the capital city of China, was 377 μg/m3, almost four times higher than the WHO guideline of 90 μg/m3, and its SO2 concentration level was 90 μg/m3, almost double the WHO guideline of 50 μg/m3 (WHO, 2002). SO2 is also an important precursor of acid rain. From the 1980s to the mid-1990s, the area of China experiencing acid rain expanded by more than 1 million km2 (Yang and Schreifels, 2003).

During that decade, elevated air pollution gave rise to increasing public concern about adverse impacts on human health.8 In response, the Chinese government formulated a series of environmental regulatory policies, the first version of which was enacted in 1987, known as the Air Pollution Prevention and Control Law (APPCL). This original law, however, failed to reduce air pollution, mainly because it excluded the power sector, the major contributor to SO2 emissions (Qian and Zhang, 1998). Even worse, SO2 emissions continued to surge, and areas affected by acid rain expanded.

APPCL was consequently amended in 1995. The major part of the amendment was to include a section to regulate pollutant emissions and coal combustion, particularly regarding the usage of high sulfur-content coal, at power plants (Hao et al., 2007). Although the 1995 APPCL still had a weak enforcement mechanism and limited efficacy, a prominent feature of the amendment was to propose a future regional strategy, which would identify priority regions to improve air quality and prevent the spread of acid rain.9,10

This was officially approved and implemented as the Two Control Zone (TCZ) policy in January 1998 (State Council, 1998). This legislation designated prefectures exceeding nationally mandated thresholds as either acid rain control zone or SO2 pollution control zone.11 Based on the records in preceding years,12 prefectures were designated as SO2 pollution control zone if:

• Average annual ambient SO2 concentrations exceeded the Class II standard,13
• Daily average concentrations exceeded the Class III standard, or
• High SO2 emissions were recorded.14

Alternatively, prefectures were designated as acid rain control zone if

• Average annual pH values for precipitation were less than or equal to 4.5,
• Sulfate deposition was greater than the critical load, or
• High SO2 emissions were recorded.

In total, 175 prefectures out of 333 prefectures across 27 provinces were designated as TCZs. They accounted for 40.6 percent of China's population, 62.4 percent of GDP, and 58.9 percent of the total SO2 emissions in 1995 (Hao et al., 2001). The SO2 pollution control zone was concentrated in the north due to high SO2 emissions for heating,15 whereas the acid rain control zones were primarily in the south, where heat, humidity, and solar radiation combine to create high atmospheric acidity. Hence, acid rain in the

8 It is generally known that the smaller a particulate, the more detrimental it is to health. For example, PM10 or PM2.5, the particles with a diameter of 10 or 2.5 micrometers or less, respectively, or toxic gas, such as SO2, are considered to be the most hazardous because, when inhaled, these particles or gases can penetrate deep into the lungs and interfere with internal gas exchange. Further, Laden et al. (2000) find that fine particles emitted from combustion sources (i.e., motor vehicles or coal combustion) have a stronger association with mortality than those from non-combustion sources. Alternatively, SO2 becomes sulfuric acid when it interacts with water, which is the main component of acid rain that may have a direct or indirect impact on health. Yet, epidemiological evidence of the impact of
and cardiovascular reasons. See also Aunan and Pan (2004) and Matus et al. (2012) for health effects of air pollution in China.
9 Article 27 of the 1995 APPCL stipulates: “The environmental protection department under the State Council together with relevant departments under the State Council may, in light of the meteorological, topographical, soil and other natural conditions, delimit the areas where acid rain has occurred or will probably occur and areas that are seriously polluted by sulfur dioxide as acid rain control areas and sulfur dioxide pollution control areas, subject to approval by the State Council.”
10 It is a standard practice of policy experimentation in China to implement strategies in a particular region or for a set period of time, attempting to demonstrate their effectiveness before expanding their implementation to the entire nation.
11 In this sense, the legislation can be considered to be parallel to the attainment and nonattainment county designation by the Clean Air Acts in the United States.
12 The original document does not specify exactly which years of records they refer to.
13 According to the Chinese National Ambient Air Quality Standards (CNAAQS)
SO2 on mortality in developed countries is somewhat mixed. While Mendelsohn and for SO2 , Class I standard designates an annual average concentration level not
Orcutt (1979) show close associations between the two, SO2 is also considered a less exceeding 20␮g/m3 , Class II ranges 20 ␮g/m3 < SO2 < 60 ␮g/m3 , and Class III ranges
important determinant of mortality (Schwartz and Marcus, 1990; Nielsen and Ho, 60 ␮g/m3 < SO2 < 100 ␮g/m3 . Cities should meet Class II, which is considered to be
2007). Hedley et al. (2002) is one of the few intervention studies that investigates less harmful.
14
changes in SO2 caused by an overnight restriction on all power plants and road The original document does not specify the levels of SO2 emissions that are
vehicles in Hong Kong using fuel oil with a sulfur content of more than 0.5 percent. considered to be “high.”
15
They found that the intervention resulted in an immediate reduction in ambient See Almond et al. (2009) for the impact of heating policy, which created a
SO2 concentrations and a reduction in death rates particularly due to respiratory discrepancy in air quality north and south of the Huai River.
Fig. 1. Air pollution across countries Notes: These figures present the world distribution of the TSP in Panel A and SO2 in Panel B in 1995.
Source: World Bank (1998).
south cannot necessarily be attributed to SO2 emissions traveling • All new and renovated power plants are required to use coal with
down from the north, but is rather due to local emissions. This is less than 1 percent sulfur content.
even more evident because acid deposition is the greatest in the • Existing power plants using coal with sulfur content above 1
summer, when wind direction is generally south to north. percent are required to install flue gas desulfurization (FGD)
The TCZ status enforced more stringent regulations mandating equipment.
the use of less high-sulfur coal and the development of clean coal
technology. For example; 2.2. Effectiveness of TCZ policy
• No new coal mines producing coal with a sulfur content higher Various studies have documented the effectiveness of TCZ reg-
than 3-percent can be established, and existing mines that pro- ulatory actions in reducing pollutant emissions and improving air
duce such coal must gradually be shut down or reduce output. quality. For example at the national level, SO2 emissions fell from
• Construction of any new coal-burning thermal power plants in 23.67 million tons in 1995 to 19.95 million tons in 2000, and the
large and medium-sized prefectures is prohibited. percentage of prefectures exceeding the Class II standard fell from
54 percent in 1995 to 20.7 percent in 2000. By the end of 1999, collieries producing more than 50 million tons of high-sulfur coal had been closed (Hao et al., 2001). By the end of 2000, the total power capacity with FGD equipment exceeded 10,000 MW, and small thermal power plants, with output capacity below 50 MW, were actively shut down because they were relatively less efficient, had high coal consumption rates, and emitted massive amounts of pollutants. This reduced raw coal consumption by 10 million tons and SO2 emissions by 0.4 million tons (Yang et al., 2002).

Importantly for our identification strategy, reductions in air pollution were more remarkable in TCZ cities. Among TCZs, SO2 emissions fell by about 3 million tons, and about 71 percent of all factories producing over 100 tons of emissions per year reduced their SO2 emissions to the standard between 1998 and 2000 (He et al., 2002). Between 1998 and 2005, the number of prefectures in the SO2 pollution control zones meeting the Class II standard rose by 12.3 percent, those meeting the Class III standard increased by 4.2 percent, and those not meeting the Class III standard fell by 16.5 percent. Within the acid rain control zone, the number of prefectures meeting the Class II standard rose by 3.3 percent, those meeting the Class III standard increased by 7.9 percent, and those not meeting Class III decreased by 11.2 percent (United Nations Environment Programme, 2009). Online Appendix III provides available evidence on the effect of TCZ regulation on air quality.

3. Data sources and descriptive statistics

3.1. Data sources

Infant mortality. The micro-level data on infant mortality come from the Chinese Disease Surveillance Points (DSP) system. The DSP covers 145 sites, primarily at the county-level,16 established on the representative sample of the national population. It reports the censuses of death and birth registrations for the sample population of 10 million residents (approximately 1 percent of the national population) in various geographic areas across 31 provinces, autonomous regions, and municipalities in China.

Overall, the original data record approximately 500,000 deaths (for all ages) and 1,000,000 births from 1991 through 2000, from which the dataset we obtained was aggregated to the DSP site by year level.17 The birth record reports whether or not infants died within the calendar year, and if they did, the cause of death, using the International Classification of Disease, 9th Revision (ICD-9) codes. For our purpose, we use the birth registration to measure the infant mortality rates, because it additionally contains variables relevant for the analysis, i.e., the infant’s characteristics such as gender, the date of birth, birthweight, length of gestation period, and birth order as well as maternal demographics such as age, education, and race. On the other hand, the death record is less useful in our analysis because it only reports the age in years and cause of death without information on birth date or family attributes. Further details on how these 145 sites were selected, and how the data were processed at each site can be found in Online Appendix I.

TCZ regulatory status. The TCZ regulatory status is reported in the document “Official Reply to the State Council Concerning Acid Rain Control Areas and Sulfur Dioxide Pollution Control Areas,” published by the State Council in 1998. The assignments are primarily made at the prefecture level, while those in municipalities are at the county level. The document lists the names of all prefectures that are designated as within the Acid Rain Control Zone or the SO2 Control Zone. Hence, we can merge this information to the DSP sites, and those sites that are in the TCZ prefectures comprise the treatment group in the analysis.18 In total, 61 of 145 DSP sites are in the TCZ prefectures and thus comprise the treatment group, and 84 sites are in the non-TCZ prefectures, forming the control group. More detailed rules on matching DSP sites and TCZ regulatory status are described in Online Appendix I.

Table 1
Descriptive statistics of baseline sample.

                                Mean      Std. Dev.   Min.     Max.      Obs.
Panel A: Birth characteristics
IMR from death record           16.61     16.79       0        202.53    556
IMR from birth record           9.62      13.43       0        133.18    550
% of Boy                        0.55      0.04        0.33     0.71      550
% of January births             0.11      0.07        0        1         548
Birth order                     1.40      0.28        1        3.39      551
Panel B: Household characteristics
Mother’s age                    25.37     1.35        22.18    30.45     552
Maternal education ≥H.S.        0.22      0.30        0        1         547
% of Han                        0.89      0.25        0        1         553
Panel C: District characteristics
Total population                106,551   53,062      5,267    251,707   556
Number of infants               1,706     1,213       44       6,747     556
% of city and above             0.46      0.50        0        1         553

Notes: Each column presents the variable mean, standard deviation, the minimum value, the maximum value, and the number of observations. The sample is at the DSP site by year level for the period prior to 1995. All mean values are weighted by population.

16 There are four main administrative divisions in China: the province, the prefecture, the county, and the township. The “county” level refers to districts, county-level cities, or counties. Seventeen of the 145 sites are at the prefecture level, while the rest are at one of the county-level divisions.
17 Because the latest DSP data available at the city level end in 2000 and information in the subsequent years is unfortunately not accessible, the analysis necessarily ends then. Some evidence indicates that pollution has increased during the twenty-first century, after the coverage of our dataset, to bolster rapid domestic economic growth, which almost negated the effect of TCZ policy. Thus, the effect of TCZ policy was indeed concentrated in the first several years, which our dataset covers, but is likely to have dissipated afterwards. Subsequent policies such as subsidizing sulfur scrubbers or other incentive programs to control SO2 emissions have been implemented since 2006. The effects of these policies are beyond the scope of our paper.
18 Because the regulatory actions were the same between the SO2 and acid rain control zones, we consider both areas as comprising one treatment group.
19 We use pre-1995 as the baseline observations in case the APPCL amendment had any contribution to pollution reductions. We address this issue by separately controlling for this period up to when the TCZ policy was implemented (i.e., years between 1995 and 1997).
20 By 2003, there was a concern that the DSP system might not comprise a representative sample of China, being more concentrated in developed and populous areas than rural ones. Indeed, according to the estimates by the World Bank, the average IMR over the same period was 38.2 per 1000 live births. Adjustments have been made to the DSP system by extending the number of sites to 160 after our study period. For the purpose of our study, the issue is less important, as we estimate the within-city impacts.
21 Note that the definition of IMR will differ according to whether one is using the death record or the birth record. Using the death record, the number of infants who did not make it to age one per 1000 live births is measured, whereas using

3.2. Descriptive statistics

Table 1 presents the descriptive statistics of the baseline sample in the pre-reform period (prior to 1995).19 The weighted average means of observations at the DSP site by year level are calculated using population as the weight. The table shows that the mean IMR is 16.61 per 1000 live births, using the death record.20 As expected, the birth record yields a much lower level of IMR, 9.62 per 1000 live births, because it reports the occurrence of infant deaths only within a calendar year.21 It is worth emphasizing that since the
main analysis focuses on the relative changes in IMR, a question of

whether the levels of IMR in the birth record are underestimated
or not does not invalidate the use of the birth record, unless the
data were intentionally manipulated to respond to the TCZ regu-
lation, which is unlikely due to the quality control enforced by the
institution that has little to do with SEPA. If there is any concern
with using the birth data, it is that the impact may be understated
if reductions in infant deaths are truncated at zero. As the table
shows, several DSP sites report no occurrence of infant deaths in
both death and birth records. This is not surprising, given that they
have a small number of births. To address this issue, we repeat the
main analysis using the death record as a robustness check.
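The gap between the two IMR measures discussed in this section can be illustrated with a small sketch. It assumes a hypothetical individual-level cohort (the `Infant` record, its fields, and the four example infants are invented for illustration and are not part of the DSP data): the death-record measure counts every death before age one, while the birth-record measure counts only deaths that fall within the calendar year of birth.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Infant:
    """Hypothetical record; the DSP data used in the paper are aggregated."""
    birth_day: int                 # day of the calendar year of birth, 1..365
    death_age_days: Optional[int]  # age at death in days; None if the infant survived


def imr_death_record(infants: List[Infant]) -> float:
    """Deaths before age one per 1000 live births."""
    deaths = sum(
        1 for i in infants
        if i.death_age_days is not None and i.death_age_days < 365
    )
    return 1000.0 * deaths / len(infants)


def imr_birth_record(infants: List[Infant]) -> float:
    """Deaths within the calendar year of birth per 1000 live births."""
    deaths = sum(
        1 for i in infants
        if i.death_age_days is not None
        and i.birth_day + i.death_age_days <= 365
    )
    return 1000.0 * deaths / len(infants)


cohort = [
    Infant(birth_day=1, death_age_days=300),     # dies on day 301 of the birth year
    Infant(birth_day=200, death_age_days=300),   # dies in the following calendar year
    Infant(birth_day=360, death_age_days=2),     # dies on day 362 of the birth year
    Infant(birth_day=100, death_age_days=None),  # survives
]

print(imr_death_record(cohort))  # 750.0
print(imr_birth_record(cohort))  # 500.0
```

The second infant is counted only by the death-record measure, which is why the birth-record IMR is lower in levels even when both measures move together over time.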
4. Empirical framework

4.1. Basic specification

The main objective of this study is to assess the effect of air pollution regulations on infant mortality. In an ideal research setting, the TCZ status is randomly assigned across cities, creating variation uncorrelated with baseline characteristics. In the absence of a randomized controlled trial, we first use a simple difference-in-differences (DID) approach, based on the TCZ regulatory status:

$$Y_{jt} = \alpha + \beta_1 (T_j \times Post_t) + \delta_1 X_{jt} + \gamma_t + \mu_j + \varepsilon_{jt} \qquad (1)$$

where $Y_{jt}$ is IMR in city j in year t, $T_j$ is an indicator variable that takes on the value one if city j was assigned as a TCZ in 1998,22 and $Post_t$ is an indicator variable that takes on the value one for years in or after 1998.23 The city fixed effects, $\mu_j$, control for the permanent heterogeneity across cities, whereas the year fixed effects, $\gamma_t$, control for year-specific shocks that are common to both TCZ and non-TCZ cities. $X_{jt}$ controls for an additional set of covariates that capture birth, parental, and city characteristics at the city by year level. All standard errors are clustered at the city level, allowing for an arbitrary correlation within cities over time.

The parameter of interest is $\beta_1$, the reduced-form impact of air pollution regulations on infant mortality, capturing the difference in the changes in IMR before and after the regulations, between the cities that were and were not designated as a TCZ in 1998. If the air pollution regulations contributed to significant reductions in IMR in the TCZ cities relative to non-TCZ cities, $\beta_1$ is expected to be negative.

4.2. Validity of the identification assumptions

The key identification assumption for Eq. (1) to provide a causal inference is that the non-TCZ cities provide valid counterfactual changes in infant mortality for the TCZ cities, had they not been treated, conditional on covariates. Two potential hypotheses may violate this assumption: (1) there is a systematic difference in pre-existing trends in mortality reductions, and/or (2) the TCZ status is not orthogonal to factors explaining the reductions in infant mortality in the post-treatment period.

To examine the pre-existing trend, we plot the evolution of IMR over time between the TCZ and the non-TCZ cities in Fig. 2. The solid vertical line indicates the timing of TCZ policy implementation in January 1998.24 The figure provides graphical support that IMR trends were similar in the pre-intervention period; infant mortality fell between 1991 and 1993, stayed somewhat constant until 1996, and fell again in 1997 in both sets of the cities. A trend break appears around 1998, when IMR continued to drop only among the TCZ cities, while IMR stayed somewhat constant within the non-TCZ cities. A similar drop in IMR is illustrated in an event-study type analysis in Fig. 3. Notably, although the IMR was continuously higher in the TCZ cities than in the non-TCZ cities before 1998, the reverse is true after 1998, which is commensurate with the timing of the TCZ policy. A similar pre-trend between the two sets of the cities suggests that the post-trend would have been similar in the absence of the regulations.

However, it may not be fully convincing that TCZ cities and non-TCZ cities would be similarly trended in the absence of the treatment. In light of this, we extend Eq. (1) by additionally

Fig. 2. Trends in infant mortality rate. Notes: This figure plots the trend of infant mortality rate due to internal causes between the TCZ and non-TCZ cities. The annual mean is calculated using the population as the weight. The solid vertical line indicates the timing of TCZ policy implementation in January 1998. Because each observation represents the annual average value, the solid vertical line is located between 1997 and 1998 to clarify the timing of its implementation.

21 (continued) the birth record, the number of infants who died within the same calendar year in which they were born is measured. For example, if an infant is born on January 1st, there is almost one year during which to record her death during the first year after birth, whereas there is only a day in which to record the death of an infant born on December 31st.
22 Note that the non-TCZ cities are not equivalent to “non-affected” cities. It is more appropriate to say that the regulations were more stringent in the TCZ cities, relative to those in the non-TCZ cities. In such a context, it may be ideal for $T_j$ to measure a continuous intensity of exposure to the regulations, measured by either per-capita amount of high-sulfur coal used or by per-capita amount of SO2 emissions in the baseline years. A similar strategy is used in Qian (2008), where she uses the amount of tea planted in each county in China to measure the impacts of relative female income on sex ratio, where women have comparative advantage in picking tea, and the price of tea was dramatically increased by the post-Mao reforms. Also, Bleakley (2007) uses the pre-treatment hookworm infection rate to measure the benefits of hookworm eradication on school enrollment. Unfortunately, data limitation makes this infeasible, and we decided to use a discrete choice of being in the TCZ cities or not. This is still valid because the large amount of high-sulfur coal and SO2 emissions was produced in the TCZ cities. The TCZ policy may also have reduced pollution beyond TCZ cities, especially when non-TCZ cities are located near TCZ cities, either directly through the policy effect on even non-TCZ cities or indirectly through reducing pollution that travels to non-TCZ cities. (Note that lifetime and travel distance of pollutants may be highly dependent on conditions, such as type of pollutants, temperature, weather, wind, etc.) However, reductions in pollution in non-designated areas, if any, would only understate our finding. Restricting the control group to distant places would generate a concern about comparability across the places. Instead, it is part of our important finding that the treatment effects exist even among geographically similar sites.
23 We later discuss the theoretical and empirical rationales of using 1998 as a cut-off year for the post-reform period, rather than 1995 when the APPCL was amended. We find that the results are robust to using 1995 instead.
24 Because each observation represents the annual average value, the solid vertical line is located between 1997 and 1998 (since the TCZ policy was implemented in January, early in the year). This is to clarify the timing of the TCZ policy implementation.
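As an illustration of the difference-in-differences logic behind Eq. (1), the following is a minimal sketch, not the estimation code used in the paper. It assumes a balanced, unweighted toy panel with no covariates, so the coefficient on the treatment-by-post interaction can be obtained by removing city and year means from both the outcome and the interaction term. The function name `twfe_did`, the unit labels, and all numbers are hypothetical; the paper's estimates additionally use covariates, population weights, and clustered standard errors, which are omitted here.

```python
from typing import Dict, Set, Tuple

Panel = Dict[Tuple[str, int], float]  # (unit, year) -> outcome


def twfe_did(panel: Panel, treated: Set[str], first_post_year: int) -> float:
    """Coefficient on T_j x Post_t with unit and year fixed effects.

    Assumes a balanced panel and no covariates (illustrative only).
    """
    units = sorted({u for u, _ in panel})
    years = sorted({t for _, t in panel})
    # Treatment-by-post indicator.
    d = {(u, t): float(u in treated and t >= first_post_year)
         for u in units for t in years}

    def demean(x):
        # Two-way within transformation: subtract unit and year means, add grand mean.
        unit_mean = {u: sum(x[(u, t)] for t in years) / len(years) for u in units}
        year_mean = {t: sum(x[(u, t)] for u in units) / len(units) for t in years}
        grand = sum(x[(u, t)] for u in units for t in years) / (len(units) * len(years))
        return {(u, t): x[(u, t)] - unit_mean[u] - year_mean[t] + grand
                for u in units for t in years}

    y_dm, d_dm = demean(panel), demean(d)
    numerator = sum(d_dm[k] * y_dm[k] for k in d_dm)
    denominator = sum(v * v for v in d_dm.values())
    return numerator / denominator


# Hypothetical data: outcome = unit effect + year effect - 3 for treated units after 1998.
unit_effect = {"A": 20.0, "B": 18.0, "C": 15.0, "D": 12.0}
year_effect = {1996: 0.0, 1997: -1.0, 1998: -1.5, 1999: -2.0}
treated_units = {"A", "B"}
toy_panel = {
    (u, t): unit_effect[u] + year_effect[t]
    + (-3.0 if u in treated_units and t >= 1998 else 0.0)
    for u in unit_effect for t in year_effect
}

estimate = twfe_did(toy_panel, treated_units, 1998)
print(round(estimate, 6))
```

Because the toy outcome is built from additive unit and year effects plus a constant treatment effect, the sketch recovers that effect; with real data the same coefficient would normally be obtained from a regression package that also reports clustered standard errors.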
controlling for the location-specific linear trend using the following specification:

$$Y_{jt} = \alpha + \beta_2 (T_j \times Post_t) + \delta_2 X_{jt} + \gamma_t + \mu_j + \theta_j \times t + \varepsilon_{jt} \qquad (2)$$

where $\theta_j \times t$ absorbs long-term linear trend in IMR that may vary across DSP sites. The specification in Eq. (2) explicitly controls for any effects through differential trends across cities. Such rigorous regression-adjusted evidence below also shows that the treatment effect is robust to DSP site-specific time trends.

Another concern is that the regulations’ effect may be confounded by other concurrent changes in policies or factors affecting infant deaths. If the central government assigned the TCZ status solely based on the nationally mandated standards, the designation should be less likely to respond to demands from local governments. However, because the TCZ status is based on air pollution level, and air pollution level is conditioned by various local factors, it may still be possible that the TCZ status simply proxies these other factors. For example, air pollution may be high in urban places where the number of births is decreasing due to high levels of women’s participation in the labor force or relatively more stringent enforcement of the one child policy. In this case, the negative association between the TCZ status and IMR would be confounded by the quantity-quality tradeoff. To address this concern, we investigate whether the TCZ status has any association with changes in observable characteristics. Although this is not a formal test of exclusion restrictions, as the assumption states that the treatment status should not covary with unobservable characteristics, the absence of significant correlation with observable characteristics suggests that there should not be significant correlations with unobservable variables either (Altonji et al., 2005).

Column (1) of Table 2 addresses potential differences in trends of characteristics between the TCZ and non-TCZ cities, before and after the policy change. Each entry reflects the coefficient of the interaction term, $\beta_1$, in Eq. (1) from separate regressions, when each characteristic is used as the dependent variable. Any significant estimate indicates a systematic difference in trend patterns and may confound the regulation effects. Strikingly, the table reveals little systematic difference in trends. Further, most of the point estimates are small and virtually indistinguishable from zero. Importantly, the TCZ status balances the trends in important predictors of infant deaths, such as mother’s age or educational level as well as the percentage of male and share of January births. A low degree of correlation with local economic trends suggests that underlying economic shocks are less likely to be a source of bias.25 In column (2), the sample is restricted to a shorter period, 1996–2000, which serves as a robustness check, because a number of policies are fixed under the 9th Five-Year Plan. In this case, none of the variables presents significant differences in trends, while many of the estimates become even smaller in magnitude.

Overall, these findings provide compelling evidence that the TCZ regulatory status is orthogonal to trends in observable determinants of infant deaths, strongly suggesting that the current research design is unlikely to be biased by changes in unobservable variables. A falsification test and robustness checks below will provide further support.

5. Empirical results

Following the identification strategies outlined above, we first present the main impact of environmental regulations on IMR in

Table 2
Balancing test by the TCZ status.

                                    Trend difference
                                    (1)          (2)
                                    All Years    1996–2000
Panel A: Birth characteristics
% of Boy                            0.004        −0.001
                                    (0.009)      (0.006)
% of January births                 0.001        0.005
                                    (0.006)      (0.005)
Birth order                         0.053*       −0.007
                                    (0.029)      (0.024)
Panel B: Household characteristics
Mother’s age                        0.138        0.025
                                    (0.113)      (0.100)
Mother’s education ≥H.S.            0.034        0.032
                                    (0.045)      (0.041)
% of Han                            0.038*       0.035
                                    (0.021)      (0.029)
Panel C: District characteristics
Total population                    1433         3171
                                    (4132)       (3719)
Number of infants                   139.5        124.2
                                    (138.2)      (109.5)
Panel D: Economic characteristics
GDP per capita (yuan)                            918.12
                                                 (740.88)
Num. of hospitals (in units)                     −29.34
                                                 (32.70)
Water supply (100 million tons)                  0.109
                                                 (0.114)
Electricity consumption (in kwh)                 14.94
                                                 (26.87)

Notes: Each entry reports the coefficient of the interaction term of TCZ status and a post dummy (=1 if year is 1998 or later) from a separate regression when using the respective variable as the dependent variable based on Eq. (1). All regressions are weighted by population, and the robust standard errors, clustered at the DSP site level, are reported in the parentheses. The sample includes all observations between 1991 and 2000 in column (1), and observations between 1996 and 2000 in column (2). Economic characteristics are available only for urban areas and from 1996.
* Significant at p < 0.1 level.

Fig. 3. Event-study analysis of the effect on IMR. Notes: The figure plots the coefficients and their associated 90% confidence interval based on a single regression modifying Eq. (1) by interacting the TCZ and respective year dummies (instead of a post dummy):

$$Y_{jt} = \alpha + \sum_{i=1992}^{2000} \beta_i \left(T_j \times \mathbf{1}(\text{year} = i)\right) + \delta_1 X_{jt} + \gamma_t + \mu_j + \varepsilon_{jt},$$

where $\mathbf{1}(\cdot)$ is an indicator variable taking on the value of one if year is i.

25 We address channels through additional economic activities in Online Appendix II.
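The event-study specification in the notes to Fig. 3 replaces the single post dummy with year-specific interactions. As a hedged sketch of what those coefficients measure, assume a balanced, unweighted panel with no covariates: in that special case each year's coefficient equals the treated-minus-control gap in mean outcomes for that year, minus the same gap in the omitted base year. The function `event_study`, the unit labels, the years, and the numbers below are hypothetical and only illustrate the logic; they do not reproduce the paper's estimates.

```python
from typing import Dict, Set, Tuple

Panel = Dict[Tuple[str, int], float]  # (unit, year) -> outcome


def event_study(panel: Panel, treated: Set[str], base_year: int) -> Dict[int, float]:
    """Year-by-year treated-control gaps relative to the base-year gap.

    Assumes a balanced panel, no covariates, and no weights (illustrative only).
    """
    years = sorted({t for _, t in panel})

    def gap(year: int) -> float:
        treated_vals = [y for (u, t), y in panel.items() if t == year and u in treated]
        control_vals = [y for (u, t), y in panel.items() if t == year and u not in treated]
        return sum(treated_vals) / len(treated_vals) - sum(control_vals) / len(control_vals)

    base_gap = gap(base_year)
    return {t: gap(t) - base_gap for t in years if t != base_year}


# Hypothetical data: parallel trends before 1998, then a drop of 3 for treated units.
unit_level = {"A": 20.0, "B": 18.0, "C": 15.0, "D": 12.0}
common_trend = {1995: 0.0, 1996: -0.5, 1997: -1.0, 1998: -1.5, 1999: -2.0, 2000: -2.5}
treated_units = {"A", "B"}
toy_panel = {
    (u, t): unit_level[u] + common_trend[t]
    + (-3.0 if u in treated_units and t >= 1998 else 0.0)
    for u in unit_level for t in common_trend
}

coefs = event_study(toy_panel, treated_units, base_year=1995)
for year, value in sorted(coefs.items()):
    print(year, round(value, 6))
```

Coefficients close to zero in the pre-policy years and negative afterwards are the pattern that supports the parallel-trend argument in the text.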
the subsequent section. We then highlight the biological mechanism underlying these results to illustrate that the effects are concentrated during the neonatal period and induced by changes in internal causes of deaths, deaths potentially associated with air pollution, and not due to external causes of deaths, ones obviously not associated with air pollution. Then, we present heterogeneity in effects depending on infants’ gender or mothers’ education. Finally, we present a number of robustness checks that support our main findings.

5.1. Main results on infant mortality

We present the DID estimates of the regulations’ effect on infant mortality in Table 3. The dependent variable is IMR per 1000 live births for all causes of deaths. Column (1) provides the result from Eq. (1) without any controls except city and year fixed effects, indicating that the TCZ status is associated with 2.87 fewer infant deaths per 1000 live births. Column (2) controls for a number of birth and parental characteristics, directly addressing a concern that changes in infant mortality may be explained by changes in observable characteristics of samples that vary over time and that are correlated with the TCZ status. Namely, it controls for the percentage of male births, share of births by month, birth order, mother’s age, mother’s education, and percentage of Han. The share of births in every month works similarly to birth month dummies, if the data were at the individual level, addressing the fact that infant deaths for those who were born earlier in the year are more likely to be in the data. With these controls, the estimated coefficients become larger and are statistically significant at the 1 percent level. Column (3) additionally controls for DSP site characteristics, in an effort to control for changes in total number of births, total population, and precipitation rates. The estimated effects are robust to the inclusion of these covariates and remain significant at the 1 percent level.

A major concern with any non-experimental study is that the treatment group and control group are not comparable. Our study is no exception. In Column (4), we present results based on Eq. (2), which directly controls for linear time trends specific to the DSP sites. While Eq. (2) represents our preferred specification throughout the paper, it substantially reduces the degrees of freedom, causing the standard error to double. However, the magnitude of the point estimate is essentially unaffected, indicating that the main finding cannot be explained by differential trends.

The estimate in column (4) represents approximately 20 percent reductions in infant mortality over the three years in the post-reform period.26 As a comparison of this figure to the U.S. context, Chay and Greenstone (2003b) find that the nonattainment status in 1972 is associated with a 3–6 percent reduction in IMR in nonattainment counties from the previous year. This puts our estimates in about a similar magnitude in terms of average annual reduction in IMR. It is worth mentioning that these comparisons pertain to the effect of environmental regulation on infant mortality, but do not imply differences in marginal effect of air pollution. Due to lack of reliable pollution data in our study area, it is not feasible to calculate elasticity of infant mortality in response to air pollution.27

5.2. The biological mechanism

In an effort to identify the biological mechanism through which air pollution affects infant mortality, we adopt two different strategies. The first examines whether the regulations are associated with birth outcomes, namely birthweight and length of gestation period. Maternal exposure to pollution during pregnancy is considered to retard fetal development, as absorbed pollutants limit nutrition and oxygen flows to fetuses. This sometimes results in low birthweight or shorter gestation period, although exact channels or outcomes are not well-known (Dejmek et al., 1999; Perera et al., 1999).28

Column (1) of Table 4 shows that those who were born in the TCZ cities weighed more by 2.66 g, yet the estimate is not distinguishable from zero. This result should be interpreted with caution. As shown in the table, the baseline birthweight was already high in China, and parents can often intentionally adjust birthweight. Rather, we should be more concerned about children born at low birthweight, which is often considered to have a lasting impact on later health and socioeconomic status.29 In Column (2), hence, we evaluate the distributional effect on births at low birthweight; the dependent variable is the number of births below 2500 g per 1000 live births. The result shows that the environmental regulations are associated with a lower incidence of low birthweight births by 3.42 per 1000 live births.30 Columns (3) and (4) further examine the corresponding impact on the length of gestation period and the

Table 3
Main results on IMR.

                          (1)         (2)          (3)          (4)
TCZ × Post                −2.870**    −3.592***    −3.205***    −3.287
                          (1.103)     (1.067)      (1.087)      (2.128)
Observations              1340        1281         1281         1281
R2                        0.415       0.438        0.442        0.552
Year fixed effects        Y           Y            Y            Y
District fixed effects    Y           Y            Y            Y
HH controls               N           Y            Y            Y
District controls         N           N            Y            Y
District trend            N           N            N            Y

Notes: Dependent variable is number of infant deaths per 1000 live births, estimated from the birth record. Robust standard errors are clustered at the DSP site level. Column (1) includes only year and DSP sites fixed effects without additional covariates, while column (2) adds birth and parental characteristics (share of male, birth shares in respective month, birth order, mother’s age, mother with high school degree or more, Han), column (3) additionally includes DSP sites characteristics (number of births, total population, rainfall), and column (4) additionally includes DSP-site specific time trends. The point estimate based on the specification in column (1) using the same sample as in columns (2)–(4) is −3.306 (1.06).
** Significant at p < 0.05 level.
*** Significant at p < 0.01 level.

26 Note that we use the baseline estimate of IMR based on the death record (16.61 per 1000 live births), as it captures all infant deaths within a year.
27 Based on back-of-envelope calculation of elasticity using changes in air pollution, our elasticity of changes in mortality rate relative to changes in pollution is about 0.9. In economics studies, evidence in developing countries is still scarce, yet a recent study by Arceo et al. (2012) finds elasticity of 0.415 in Mexico, and other studies in U.S. provide elasticities of 0.284 (Chay and Greenstone, 2003a) or 1.827 (Knittel et al., 2011). Thus, our estimates in China are still in the range of the literature, though toward a higher end. Evidence on the dose–response relationship between pollution and human health in the fields of medicine and epidemiology is in favor of greater elasticity at greater pollution levels.
28 For adults, air pollution is often linked to respiratory and cardiovascular disease, aggravation of asthma, heart disease, lung malfunction or cancer, stroke, and possibly carcinogenesis. For infants, high exposure to pollution in the neonatal period is likely to result in deaths by acute respiratory infections. Some recent studies show strong impacts coming from maternal exposure.
29 See Behrman and Rosenzweig (2004), Case et al. (2005), and Almond et al. (2005) for costs and returns of low birthweight.
30 In theory, it is also possible to find lower birthweight when infant mortality decreases, as marginal babies who would have died before are now born alive. This argument is particularly true in a case, for example, where improved medical technology contributes to greater infant survival at the margin (or special healthcare services for newborns at risk). In our case, infant mortality decreased due to improved pre-natal environment, which is more likely to result in greater birthweight. Our finding of positive effect on birthweight is indeed consistent with a strong association between maternal smoking (which works in a similar way as pollution exposure) and reduced birthweight.
Table 4
Identifying the biological mechanism.

                     (1)          (2)              (3)               (4)              (5)                (6)                  (7)
                     Birthweight  Low birthweight  Gestation period  Short gestation  Deaths w/in 1 day  Deaths w/in 28 days  Deaths w/in 6 mo.
TCZ × Post           2.66         −3.42**          −0.237            −2.59            −0.85*             −2.08**              −2.40**
                     (16.32)      (1.70)           (0.217)           (3.17)           (0.44)             (0.85)               (0.93)
N                    1249         1235             657               650              1295               1295                 1295
R2                   0.78         0.66             0.44              0.23             0.43               0.48                 0.50
Pre-1995 mean        3313.2       15.76            39.09             26.47            2.88               6.59                 8.00
                     [116.9]      [15.47]          [1.80]            [118.5]          [5.03]             [9.35]               [11.23]
Year effect          Y            Y                Y                 Y                Y                  Y                    Y
District effect      Y            Y                Y                 Y                Y                  Y                    Y
HH attributes        Y            Y                Y                 Y                Y                  Y                    Y
District attributes  Y            Y                Y                 Y                Y                  Y                    Y

Notes: Dependent variables are: birthweight in grams in column (1), the number of infants born weighing less than 2500 g per 1000 live births in column (2), gestation period in weeks in column (3), the number of infants born in less than 32 weeks per 1000 live births in column (4), IMR within one day in column (5), IMR within 28 days in column (6), and IMR within 6 months in column (7). Standard errors are in parentheses. The mean values of the respective variables (weighted by total population) and their standard deviations in square brackets are based on the observations before 1995.
* Significant at p < 0.10 level.
** Significant at p < 0.05 level.
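The TCZ × Post row in Table 4 is a difference-in-differences (DID) coefficient. As a minimal sketch of the comparison behind it, the Python snippet below computes the simplest 2×2 version from group means: the change among treated sites minus the change among control sites. The function name `did_2x2` and every outcome value are hypothetical and chosen only for illustration; the estimates in the paper come from regressions with year and site fixed effects and further covariates, which this sketch leaves out.

```python
from statistics import mean


def did_2x2(rows):
    """Return the 2x2 difference-in-differences of group means.

    rows: iterable of (treated, post, outcome) tuples, where `treated`
    and `post` are booleans and `outcome` is a number such as an IMR.
    """
    cells = {}
    for treated, post, outcome in rows:
        cells.setdefault((treated, post), []).append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    change_treated = m[(True, True)] - m[(True, False)]
    change_control = m[(False, True)] - m[(False, False)]
    return change_treated - change_control


# Hypothetical site-year outcomes (NOT data from the paper).
rows = [
    (True, False, 20.0), (True, False, 18.0),   # treated, pre-period
    (True, True, 14.0), (True, True, 12.0),     # treated, post-period
    (False, False, 16.0), (False, False, 14.0), # control, pre-period
    (False, True, 13.0), (False, True, 11.0),   # control, post-period
]
estimate = did_2x2(rows)
print(estimate)
```

Without covariates, this difference of mean changes equals the coefficient on the interaction term in a regression of the outcome on a treatment dummy, a post-period dummy, and their product.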
share of gestation period below 32 weeks, the lowest one percentile level. Neither estimate is significant, indicating that the impact on low birthweight is not driven by changes in length of gestation period.

The results above suggest that fetal exposure to pollution affecting fetal development appears to be key. However, birthweight and length of gestation period may not fully capture fetal development. Hence, we now explore the effect on deaths occurring at different time periods. Infant deaths occurring during the neonatal period (within 28 days after birth) are generally considered to be associated with poor fetal development (Chay and Greenstone, 2003a,b).31 Column (5) presents the impacts on infant deaths that occurred within one day of birth. The point estimate is small yet marginally significant at the 10 percent level, implying that 26 percent of overall reduction in IMR occurred within one day. The magnitude is in line with Chay and Greenstone (2003b), who estimate that roughly 22 percent of overall infant deaths occurred within one day. Yet this contrasts with the finding in Chay and Greenstone (2003a), who attribute roughly 60 percent of overall impact to infant deaths within one day. Column (6) reveals that the regulations are disproportionately more associated with the probability of death during the neonatal period, indicating that 63 percent of the effect of the regulations on infant mortality is due to reductions in this period.32 This corresponds to the findings in Chay and Greenstone (2003a) and Chay and Greenstone (2003b), who find that 73–82 percent and 80 percent of infant mortality reductions, respectively, occurred in the same period. Overall, these findings highlight weak fetal development via maternal exposure as an important biological mechanism.

A related concern is that the regulations are confounded by other concurrent changes in factors contributing to the reductions in infant mortality. For example, in response to high exposure to pollution in the TCZ cities, the local governments may have increased healthcare spending, leading to an improvement in quality and/or quantity of healthcare services. Then, without directly controlling for a local health policy, a simple DID estimate would erroneously pick up effects through a healthcare policy, causing an upward bias. In order to rule out such a possibility, we examine the regulations’ effect on infant mortality by cause of death. If maternal exposure is a primary channel, we expect to see larger changes in mortality associated with prenatal disorders as opposed to postnatal causes. There is added significance to this analysis; the estimates are directly comparable to the falsification test that examines the effect on external causes of deaths unrelated to air pollution.

Given that the exact etiology and pathology of diseases caused by fetal exposure to air pollution are not yet known, the most important comparison is between internal and external causes of deaths. We compute IMR due to internal deaths to include all health-related, non-accidental causes that are potentially associated with air pollution, other than infant deaths due to external causes of deaths, those that clearly do not pertain to air pollution: injury and poisoning (Chay and Greenstone, 2003a,b).

As predicted, Table 5 shows statistically significant effects on mortality from health-related causes.33 When effects are estimated separately for four major health-related causes,34 the estimates correspond to a reduction of 49.7 percent in mortality from nervous system disorders and a reduction of 32.7 percent in circulatory system disorders. These findings are consistent with a vast literature that also finds strong associations between these birth defects and maternal smoking during pregnancy (which essentially works through a similar mechanism as air pollution exposure) (see, for example, Fried, 1995; Brennan et al., 1999).

On the other hand, infectious, parasitic and respiratory diseases did not have significant impacts. Low rates of respiratory diseases indicate that these are not typical causes of deaths for infants, particularly during the neonatal period, who spend most of their time indoors, while these diseases have been found to be a major cause of deaths for children or at youngest post-neonatal period (see, for example, Woodruff et al., 1997; Borja-Aburto et al., 1997; Bobak and Leon, 1999; Woodruff et al., 2006).

Most importantly, we find no statistically significant effect on mortality from external causes such as injury and poisoning. If

31 According to the WHO, infant deaths during the neonatal period are also associated with preterm birth, intrapartum-related complications, and infections, and thus we need caution in attributing it solely to fetal development, as there is also a possibility that post-natal exposure plays a role.

32 Note that because the analysis is based on the birth record, we need to keep in mind that deaths in a shorter period are more likely to be recorded and be cautious about the estimates. We appreciate this discussion by an anonymous referee.

33 Over the study period between 1991 and 2000, 51 percent of infant deaths are due to diseases of the circulatory system, 19.4 percent come from nervous systems and sense organs, and 19 percent are due to external causes.

34 Note that this exercise of separately estimating effects for different “internal” causes does not itself test whether the fundamental cause is due to air pollution or others, as it is mentioned that the exact pathway is not known. Indeed, Chay and Greenstone (2003a,b) do not disentangle effects among internal causes for this reason. We do this simply to highlight variation across disease types given our data capacity to do so.
Table 5
Effects on IMR by cause of death.

                                     (1)                           (2)                     (3)
                                     Pre-1995 mean mortality rate  Estimated coefficients  % change in mortality rate
All non-accidental causes            7.61                          −2.82***                −37.07
                                     [11.28]                       (0.98)
All non-accidental causes disaggregated
  Nervous system disorders           1.63                          −0.81***                −49.69
                                     [3.85]                        (0.31)
  Circulatory system disorders       4.56                          −1.49*                  −32.68
                                     [7.18]                        (0.76)
  Infectious and parasitic diseases  1.15                          −0.38
                                     [2.49]                        (0.29)
  Respiratory system disorders       0.13                          −0.03
                                     [0.77]                        (0.13)
Accidental causes
  Injury and poisoning               2.01                          −0.39
                                     [7.21]                        (0.48)

Notes: All specifications include year fixed effects, DSP site fixed effects, household attributes, and district attributes. The number of observations is 1281. Standard deviations are reported in square brackets, and robust standard errors are reported in parentheses.
* Significant at p < 0.10 level.
*** Significant at p < 0.01 level.
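Column (3) of Table 5 appears to express each estimated coefficient as a percentage of the corresponding pre-1995 mean mortality rate. The short Python sketch below redoes that arithmetic from the rounded figures printed in columns (1) and (2); the helper name `percent_change` is illustrative and not taken from the paper.

```python
def percent_change(coefficient, baseline_mean):
    """Express an estimated coefficient as a percentage of the baseline mean."""
    return 100.0 * coefficient / baseline_mean


# (pre-1995 mean, estimated coefficient) as printed in Table 5.
table5 = {
    "All non-accidental causes": (7.61, -2.82),
    "Nervous system disorders": (1.63, -0.81),
    "Circulatory system disorders": (4.56, -1.49),
}

pct = {
    cause: round(percent_change(coef, base), 2)
    for cause, (base, coef) in table5.items()
}
for cause, value in pct.items():
    print(f"{cause}: {value}%")
```

Because the inputs are already rounded to two decimals, the results agree with the published column only up to rounding: the first row comes out at about −37.06 against the printed −37.07.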
households had had improved access to healthcare services, there would likely have been reductions in infant deaths by external causes as well.

While this does not directly rule out the possibility that the regulations spuriously pick up the effect through unobserved changes that are correlated with internal causes of deaths but not with external ones, the finding suggests that the main results are not driven by any other channels, leading us to believe that the relationship is causal. We provide a number of further robustness checks in Section 5.4.

5.3. Heterogeneity in the regulations’ effect

This part tests the hypothesis that the regulations on air pollution may have a heterogeneous impact on infant mortality across various subsamples. We first search for heterogeneity in the treatment effects between boys and girls due to biologically based gender differences given that, in the literature, male fetuses are considered to be more physiologically sensitive than female fetuses to environmental changes. On the other hand, heterogeneous effects may also reflect gender discrimination, particularly when infant deaths occur after birth. For example, if boys were initially more likely to be protected from exposure to pollution or were more likely to receive medical treatments for health problems caused by pollution, then the effect of pollution reductions would be more pronounced for girls. The first and second rows in Table 6 report the estimated effects for boys and girls, respectively. The coefficients suggest that the effect on infant mortality is larger among girls than boys, yet the difference is not statistically significant. This indicates that the pollution reduction effect on infant mortality was similar for boys and girls. In contrast, the impact on low birthweight is substantially higher for girls than for boys. The interpretation of the results requires caution. On one hand, the evidence is consistent with the gender bias hypothesis in that girls are more sensitive to changes in environment on the condition that parents had a means to identify the gender of the fetus, which changed their behaviors. On the other hand, the finding is also consistent with the literature that female fetuses have a higher threshold at which pollution leads to mortality than male fetuses do, and thus only girls were affected when pollution decreased somewhat from initially high levels to lower levels, which were still above the threshold for male fetuses. We do not find any significant effects on the length of gestation period.

Table 6
Heterogeneity in effect.

                           Dependent variable
                           IMR        Low birthweight  Short pregnancy
Boys                       −3.12***   −2.06            −0.677
                           (1.20)     (1.52)           (3.47)
Girls                      −3.41***   −4.98**          −3.11
                           (1.18)     (2.29)           (3.60)
Mother’s education < H.S.  −2.21      −3.85*           −3.43
                           (2.15)     (2.16)           (3.65)
Mother’s education ≥ H.S.  4.45       −4.94**          −1.52
                           (2.98)     (2.29)           (1.65)

Notes: Each cell reports the coefficient of interest and standard errors in parentheses from separate regressions for the respective subsample. The dependent variables are IMR, the number of infants born at less than 2500 g per 1000 live births, and the number of infants born in less than 32 weeks per 1000 live births, respectively. Low maternal education is defined as educational attainment less than high school, and high maternal education is equal to or above high school completion. Note that the analysis based on mother’s education inevitably drops deaths associated with missing mother’s education.
* Significant at p < 0.10 level.
** Significant at p < 0.05 level.
*** Significant at p < 0.01 level.

Next we explore whether the effect differs according to the level of mother’s education. Such heterogeneity is likely to arise, because behavior is known to be an important determinant in health production function when it comes to pollution (Zivin and Neidell, 2009; Moretti and Neidell, 2011; Deschenes et al., 2012).35 The effect may be amplified among households with low levels of maternal education for many reasons. For example, infants and fetuses of poorly educated mothers tend to have a lower initial health endowment, making them more liable to be adversely impacted by air pollution initially. Alternatively, mothers with greater educational attainment are more likely to know how to protect their children from being exposed to pollution outside and/or

35 In empirical studies, Jayachandran (2009) finds greater wildfire air pollution effect on low-income households in Indonesia. Arceo et al. (2012) find that the estimated marginal effect of carbon monoxide is larger in Mexico compared to the U.S. estimates, while the effect of PM10 is similar between the two countries.
have more access to health services to treat their children. Families with high socioeconomic status also have a greater degree of mobility to better areas, while the poor may continue to be exposed to greater pollution. On the other hand, the effect may be smaller among poor households, if long-term exposure to pollution in poor areas allowed them to be more adept at keeping infants indoors.

The third row reports the estimated effects for the sample of households where maternal education was less than a high school degree, and the fourth row for the sample of households where mothers attained at least a high school education. We find that the regulations’ effect is substantially higher among households with low maternal education.36 The finding suggests that the regulations’ effect on infant mortality should be stronger for the low-socioeconomic families that are more vulnerable to the effects of air pollution.

5.4. Additional robustness checks

The findings above leave little room for confounding factors. First, little association between TCZ status and trends in observable characteristics limits the possibility that the main results erroneously reflect time trends that vary systematically between the TCZ and the non-TCZ cities. Second, the consistency of the regulations’ effect in both magnitude and statistical significance when controlling for the set of key determinants of infant mortality suggests that the estimated effects are robust to comparisons with similar characteristics. Third, the absence of treatment effect in the falsification exercise on external infant mortality provides strong evidence that health care system reform or medical technology advancement cannot be a source of bias.

In this subsection, we extensively explore additional robustness checks to rule out other possible scenarios. First, we examine whether the finding is robust when using an alternative dataset. As discussed above, using the birth record, which reports the occurrence of deaths only within the calendar year, may result in understating the effect, if the number of infant deaths is truncated at zero. The death record allows us to compute IMR for all deaths occurring before the age of one. As expected, the first row of Table 7 shows that the size of the estimate becomes larger, though not substantially different from the main result, indicating that the effect is not sensitive to using the death record, while the main analysis may understate the overall impact, if any.

Table 7
Robustness checks.

Specifications                   Coeff.
Use death record                 −3.45*
                                 (1.84)
Use 1996–2000                    −3.57**
                                 (1.44)
Only districts and cities        −3.62**
                                 (1.64)
Common support                   −3.24***
                                 (1.12)
Eliminate outliers               −2.73***
                                 (0.95)
Include province × year effects  −3.50***
                                 (1.27)
Control for 1995–1997            −4.08***
                                 (1.56)
Weight by number of births       −2.92**
                                 (1.26)
Use 1995 cut-off                 −2.92*
                                 (1.61)
Cluster at prefecture level      −3.205***
                                 (1.076)
Only northern China              −3.138*
                                 (1.657)
Only southern China              −3.258**
                                 (1.497)
Use 1991–1997 sample             −1.967
                                 (1.901)

Notes: The table provides robustness checks of the main results to various other hypotheses and specifications. Standard errors are in parentheses. See the text for their explanations.
* Significant at p < 0.10 level.
** Significant at p < 0.05 level.
*** Significant at p < 0.01 level.

Second, we confirm that the treatment effect is not driven by other national or local policy changes. In the second row, we restrict the sample to the years between 1996 and 2000, which corresponds to the period of the 9th Five-Year Plan. The fact that the time period is shorter and falls within one policy regime helps reduce a set of potential confounders in the pre- and post-natal health environment other than pollution. Also, Table 2 shows that all observed characteristics had balanced trends during this short period. The estimated effect is similar, suggesting that various other national or local policy changes should not confound the effect.

Third, despite evidence that the treatment effect is robust to heterogeneous city-specific trends in IMR, as shown in Table 3, there may still be a concern that the TCZ status may be correlated with administrative divisions. For example, urban districts and county-level cities may be more likely to be treated, whereas poor counties may be less likely to be assigned as TCZ. To address this issue, we restrict the sample to only districts and county-level cities, mostly urban areas, in the third row. The estimate is larger and remains significant at the 5 percent level, indicating that the treatment effect is not driven by simple comparisons between urban and rural areas.

A major concern with any non-experimental study is the possibility that omitted heterogeneity may give rise to a spurious relationship. Controlling for the attributes, as in columns (2) and (3) in Table 3, may not solve this issue when we cannot compare the distribution of attributes across TCZ and non-TCZ cities. We address this issue in two ways. In the fourth row, we limit to the observations under common support using the propensity score, which potentially restricts the sample to DSP sites that have similar observed characteristics. In doing so, we first compute propensity scores of being TCZ based on household and DSP attributes used in the main analysis. Then, we re-estimate the treatment effects using observations only under the common support. Alternatively, in the fifth row, we examine whether the main findings are robust to eliminating outliers. Specifically, we eliminate DSP sites whose IMR is above the 99th percentile. The magnitudes of both estimates are similar.

Another concern is that there may be unobserved policy changes that affected infant mortality. In addressing this, we control for province times year fixed effects in the sixth row. In this specification, the effect of the regulations on infant mortality is identified using variation in regulatory stringency before and after 1998 within the province, thus purging any potential effects resulting from any other policy changes at the provincial level. Further, in the seventh row, we control for the years between 1995 and 1997, the intermediate period after the APPCL and before the TCZ policy was in effect. Note that this additional variable controls for the immediate impacts of APPCL, if any, but does not directly rule out delayed impacts of APPCL after 1998. Both of the estimates are again unchanged.

36 Note that, although neither of these estimates in themselves nor the differences between them are statistically significant, when we restrict the IMR to internal causes, the coefficient for low maternal education level is −4.40 and significant at the 5 percent level, whereas that for high maternal education is −0.34 and not statistically significant.
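The common-support restriction described for the fourth row of Table 7 can be illustrated with a small Python sketch. It assumes propensity scores have already been estimated and applies one widely used rule, the min-max overlap, which keeps only observations whose score lies inside the range shared by treated and control units. The text does not say which rule the paper applied, so the rule, the function name `common_support`, and the scores below are all assumptions made for illustration.

```python
def common_support(scores, treated):
    """Return indices of observations inside the min-max overlap region.

    scores:  estimated propensity scores, one per observation.
    treated: booleans of the same length (True = treated unit).
    """
    t_scores = [s for s, d in zip(scores, treated) if d]
    c_scores = [s for s, d in zip(scores, treated) if not d]
    lower = max(min(t_scores), min(c_scores))  # highest of the two minima
    upper = min(max(t_scores), max(c_scores))  # lowest of the two maxima
    return [i for i, s in enumerate(scores) if lower <= s <= upper]


# Hypothetical propensity scores (NOT estimates from the paper).
scores = [0.95, 0.80, 0.60, 0.40, 0.55, 0.30, 0.10, 0.05]
treated = [True, True, True, True, False, False, False, False]

keep = common_support(scores, treated)
print(keep)
```

The treatment effect would then be re-estimated on the retained observations only, which mirrors the two-step procedure described in the text.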
Next, we re-estimate the main analysis using different specifications. Namely, we weight the regressions using the number of population aged 0 in the eighth row;37 we use 1995 as a cut-off year instead of 1998 to incorporate all years after APPCL was enacted in the ninth row;38 we cluster the standard error at the prefecture level in the tenth row;39 we use only northern China in the eleventh row; we use only southern China in the twelfth row.40 All of these estimated effects remain similar, showing robustness to alternative specifications.

Lastly, we repeat the main analysis using samples only in the pre-reform period in the last row. We use 1995 as a placebo cut-off year, and thus years between 1995 and 1997 are defined as “post”-reform observations. The point estimate is lower and statistically indistinguishable from zero, reassuring that the trends between the TCZ and non-TCZ cities are similar in the pre-reform period.

Taking all of the above together, there is no evidence to indicate that the main results are driven by inappropriate identification assumptions, leading us to believe that the relationship is causal. The collection of these robustness checks substantially limits the scope of omitted variables, leading us to believe that the main findings substantiate the causal impact of environmental protection on infant mortality.

6. Conclusions

China suffers from notoriously bad air pollution, the health effects of which have been of increasing public concern. In 1998, the TCZ policy, which went into effect in 1998, was one of the largest-scale air pollution regulatory schemes ever implemented in a developing country, imposing stringent regulations on pollutant emissions from power plants in cities exceeding the nationally mandated standards.

The major objective of this paper is to test the hypothesis that these regulations led to reductions in infant mortality within the TCZ cities subjected to particularly stringent regulations. Using the difference-in-differences approach, comparing changes in infant mortality between the cities assigned and not assigned as the TCZ, before and after the policy reform, we find substantial impacts: infant mortality decreased by 20 percent; a large fraction of the reduction occurred in the neonatal period, which can be attributed to fetal exposure; and infants of mothers with low levels of education benefited the most.

The set of falsification tests and robustness checks limits the role of omitted variables in biasing these estimates, leading us to believe that the linkage between the air pollution regulations and infant mortality reduction is causal. First, the estimates are robust to DSP site-specific trends in IMR. Second, we confirm that the treatment effect is absent for infant deaths caused by external causes, ruling out a potential mechanism through improved local healthcare system. Lastly, the estimates are robust to various alternative hypotheses, i.e., using the death record, controlling for other policies at national or local levels, and limiting to only urban areas.

The findings in this study have important implications for policy. First, the question of whether, and to what extent, air pollution regulations in developing countries can lead to reducing infant mortality remains unanswered. This study highlights a significant reduction in infant mortality correlating to air pollution reductions, in contrast to Greenstone and Hanna (2011) who find that air pollution reductions in India had only modest and insignificant impact on infant mortality. Our results substantiate compelling evidence to support air pollution regulations in countries that suffer from high levels of air pollution. The size of the benefits itself may not be enough to justify environmental protection without considering their costs. In the United States, the Clean Air Act Amendments are found to have caused distortions on productivity (Gollop and Roberts, 1983; Barbera and McConnell, 1990; Greenstone et al., 2012); firms’ location decisions (Henderson, 1996; Becker and Henderson, 2000; List et al., 2003); employment (Greenstone, 2002; Deschenes, 2010; Walker, 2011, 2013); and foreign direct investment inflows and outflows (Eskeland and Harrison, 2003; Keller and Levinson, 2002; Hanna, 2010). On the other hand, Tanaka et al. (2014) provide compelling evidence that polluting firms in the TCZ cities substantially improved economic performance through increased market dynamics via the entry of more efficient firms and the exit of less efficient ones. In addition, costs of reducing pollution are likely to be smaller under convex marginal cost functions. Therefore, our finding is likely to pass cost-benefit analysis and suggest further implementation of air pollution regulations in similarly polluted countries.

Second, while the precise mechanisms through which air pollution reductions lead to health benefits are not known, our findings highlight substantial reductions in infant mortality during the neonatal period, shedding light on maternal exposure to pollution as a potential pathophysiologic mechanism. This necessitates additional policy interventions to protect pregnant women against environmental risks.

Third, our findings identify that children in households with low maternal education are particularly susceptible to fluctuations in air quality. Although all individuals are potentially exposed to ambient pollution, the evidence indicates that socioeconomic status cushions the effect of air pollution, either through behavioral factors in avoiding pollution or socioeconomic factors such

37 This intends to adjust the regression weighted by the number of births, where we do not have information on actual numbers of births.

38 The main analysis uses 1998 as a cut-off year for the post-reform period, rather than 1995 when the APPCL was amended, based on both theoretical and empirical rationales. On theoretical grounds, it plausibly takes two to three years before the regulations are carried out to the full extent. Chay and Greenstone (2005) rely on a similar argument when they use 1975 nonattainment status as an instrumental variable in estimating the effect of the 1970 Clean Air Act on housing prices between 1970 and 1980. In their context, the nonattainment status changes every year. By using the mid-decade regulation, they also take into account a two- to three-year lag before the policy was fully executed. This is also relevant in our context because the regulations required power plants to alter the energy sources and install costly technology (such as FGD). Informal conversations with officials at local power plants provide anecdotal evidence to support this assertion. Time lags are also likely in China because it is common for the government to set policy targets or guidelines, often very ambitious ones, without specifying the critical details until later, thereby largely leaving implementation up to the local governments or individual firms. Further, the 1995 amendment had a weak implementation mechanism, and more drastic actions (such as shutting down numerous inefficient power plants and enforcing stringent air pollution regulations) were enforced only after the TCZ policy in 1998. The empirical rationale for the cut-off date is based on the finding that the interaction term between the TCZ status and the post-1998 period better balances important determinants of infant mortality, compared with using 1995 as a cut-off year, suggesting that the former is less likely to be confounded. Further, using a 1998 cut-off year enables us to restrict the sample to observations between 1996 and 2000, where the interaction term is not correlated with any observable variables. Therefore, using 1998 better averts omitted variable bias.

39 The main results are clustered at DSP level for two reasons. (1) It is conventional to cluster the standard errors at the treatment-site level, and in our case, the treatment status varies at the DSP level, not the prefecture level. There are cases in which one prefecture accommodates multiple DSP sites that have different treatment status, i.e., one is a district or county-level city, while the other is a county. (2) Indeed, most DSP sites are in different prefectures. Out of 145 DSP sites, there are only 12 cases in which we observe multiple sites in a single prefecture (two of which are three DSP sites in one prefecture, and the rest of the cases include two DSP sites in one prefecture). This robustness check clustering at the prefecture level should thus be seen as accounting for serial correlations within prefecture.

40 Northern and southern China are defined as prefectures accommodating SO2 control zones (north) or acid rain control zone (south) (or more so if accommodating both). We eliminated Tibet Autonomous Region, Qinghai Province, Xinjiang Autonomous Region, as they are not typically perceived as either northern or southern China.
as increased access to medical care. As such, our findings provide justifications for interventions targeting low-income households, including information provision about the effects of air pollution.

This study has clear policy implications for developing countries in general: namely, while climate change does not currently appear to be a sufficiently strong motivation for these countries to embark on more aggressive air pollution regulations, our findings leave little doubt that protecting the environment is vital to improving domestic public health.

Acknowledgements

I am indebted to Daniele Paserman, Dilip Mookherjee, Tavneet Suri, and Wesley Yin for invaluable advice and feedback. I am also grateful to Lucas Davis, Esther Duflo, Michael Greenstone, Hsueh-Ling Huynh, Kelsey Jack, Ginger Zhe Jin, Hiroaki Kaido, Kevin Lang, Adriana Lleras-Muney, Michael Manove, Kenneth A. Rahn, Leena Rudanko, Marc Rysman, Johannes Schmieder, Jeremy Smith, three anonymous reviewers, and seminar participants at Boston University, Hiroshima University, Loyola Marymount University, National University of Singapore, Tufts University, University of Washington, 2010 NEUDC, and the 2011 Royal Economic Society for their comments and suggestions. I also thank the Chinese Center for Disease Control and Prevention for sharing the data. Financial support from the Institute for Economic Development at Boston University, as well as Hewlett/IIE Dissertation Fellowship in Population, Reproductive Health and Economics, is gratefully acknowledged. Jieshuang He provided excellent research assistance. All remaining errors are my own.

References

Currie, J., Hyson, R., 1999. Is the impact of health shocks cushioned by socioeconomic status? The case of low birthweight. American Economic Review 89 (2), 245–250.
Currie, J., Neidell, M., 2005. Air pollution and infant health: what can we learn from California’s recent experience? Quarterly Journal of Economics 120 (3), 1003–1030.
Currie, J., Neidell, M.J., Schmieder, J., 2009. Air pollution and infant health: lessons from New Jersey. Journal of Health Economics 28 (3), 688–703.
Dejmek, J., Selevan, S.G., Benes, L., Solansky, I., Sram, R.J., 1999. Fetal growth and maternal exposure to particulate matter during pregnancy. Environmental Health Perspectives 107 (6), 475–480.
Deschenes, O., 2010. Climate policy and labor markets. NBER Working Paper No. 16111.
Deschenes, O., Greenstone, M., Shapiro, J.S., 2012. Defensive investments and the demand for air quality: evidence from the NOX budget program and ozone reductions. NBER Working Paper No. 18267.
Eskeland, G.S., Harrison, A.E., 2003. Moving to greener pastures? Multinationals and the pollution haven hypothesis. Journal of Development Economics 70 (1), 1–23.
Fried, P.A., 1995. Prenatal exposure to marihuana and tobacco during infancy, early and middle childhood: effects and an attempt at synthesis. Archives of Toxicology Supplement 17, 233–260.
Gollop, F.M., Roberts, M.J., 1983. Environmental regulations and productivity growth: the case of fossil-fueled electric power generation. Journal of Political Economy 91 (4), 654–674.
Greenstone, M., 2002. The impacts of environmental regulations in industrial activity: evidence from the 1970 and 1977 Clean Air Act Amendments and the census of manufactures. Journal of Political Economy 110 (6), 1175–1219.
Greenstone, M., List, J.A., Syverson, C., 2012. The effects of environmental regulation on the competitiveness of U.S. manufacturing. NBER Working Paper 18392.
Greenstone, M., Hanna, R., 2011. Environmental regulations, air and water pollution, and infant mortality in India. NBER Working Paper 17210.
Hanna, R., 2010. US environmental regulation and FDI: evidence from a panel of US-based multinational firms. American Economic Journal: Applied Economics 2 (3), 158–189.
Hao, J., He, K., Duan, L., Li, J., Wang, L., 2007. Air pollution and its control in China. Frontiers of Environmental Science and Engineering in China 1 (2), 129–142.
Hao, J., Wang, S., Liu, B., He, K., 2001. Plotting of acid rain and sulfur dioxide pollution control zones and integrated control planning in China. Water, Air, and Soil Pollution 130, 259–264.
He, K., Huo, H., Zhang, Q., 2002. Urban air pollution in China: current status, characteristics, and progress. Annual Review of Energy and the Environment 27, 397–431.
Almond, D., Chay, K.Y., Lee, D.S., 2005. The costs of low birth weight. Quarterly
Journal of Economics 120 (3), 1031–1083. Hedley, A.J., Wong, C.M., Thach, T.Q., Ma, S., Lam, T.H., Anderson, H.R., 2002. Car-
Almond, D., Chen, Y., Greenstone, M., Li, H., 2009. Winter heating or clean air? Unin- diorespiratory and all-cause mortality after restrictions on sulphur content fuel
tended impacts of China’s Huai River policy. American Economic Review Papers in Hong Kong: an intervention study. Lancet 360 (9346), 1646–1652.
and Proceedings 99 (2), 184–190. Henderson, V., 1996. Effects of air quality regulation. American Economy Review 86
Altonji, J.G., Elder, T.E., Taber, C.R., 2005. Selection on observed and unobserved (4), 789–813.
variables: assessing the effectiveness of catholic schools. Journal of Political Jayachandran, S., 2009. Air quality and early-life mortality: evidence from
Economy 113 (1), 151–184. Indonesia’s wildfires. Journal of Human Resources 44 (4), 916–954.
Arceo, E., Hanna, R., Oliva, P., 2012. Does the effect of pollution on infant mortality Keller, W., Levinson, A., 2002. Pollution abatement costs and foreign direct invest-
differ between developing and developed countries? Evidence from Mexico City. ment inflows to U.S. states. Review of Economics and Statistics 84 (4),
NBER Working Paper No. 18349. 691–703.
Aunan, K., Pan, X.C., 2004. Exposure-response functions for health effects of ambi- Knittel, C., Miller, D.L., Sanders, N.J., 2011. “Caution, Drivers! Children Present: Traf-
ent air pollution applicable for China: a meta-analysis. Science of the Total fic, Pollution, and Infant Health,” NBER Working Paper No. 17222.
Environment 329, 3–16. Kumar, N., Foster, A., 2007. Respiratory Health Effects of Air Pollution in Delhi and
Barbera, A.J., McConnell, V.D., 1990. The impact of environmental regulations on its Neighboring Areas. Mimeo, India.
industry productivity: direct and indirect Effects. Journal of Environmental Eco- Laden, F., Neas, L.M., Dokery, D.W., Schwartz, J., 2000. Association of fine particulate
nomics and Management 18 (1), 50–65. matter from different sources with daily mortality in six U.S. cities. Environ-
Becker, R., Henderson, V., 2000. Effects of air quality regulations on polluting indus- mental Perspectives 108 (10), 941–947.
tries. Journal of Political Economy 108 (2), 379–421. Levy, J.I., Hammitt, J.K., Spengler, J.D., 2000. Estimating the mortality impacts of
Behrman, J.R., Rosenzweig, M.R., 2004. Returns to birthweight. Review of Economics particulate matter: what can be learned from between-study variability? Envi-
and Statistics 86 (2), 586–601. ronmental Health Perspectives 108 (2), 109–117.
Bleakley, H., 2007. Disease and development: evidence from hookworm eradication List, J.A., Millimet, D.L., Fredriksson, P.G., McHone, W.W., 2003. Effects of envi-
in the American South. Quarterly Journal of Economics 122 (1), 73–117. ronmental regulations on manufacturing plant births: evidence from a
Bobak, M., Leon, D.A., 1999. The effect of air pollution on infant mortality appears propensity score matching estimator. Review of Economics and Statistics 85 (4),
specific for respiratory causes in the postneonatal period. Epidemiology 10 (6), 944–952.
666–670. Luechinger, S., 2014. Air pollution and infant mortality: a natural experiment from
Borja-Aburto, V.H., Loomis, D.P., Bangdiwala, S.I., Shy, C.M., Rascon-Pacheco, R.A., power plant desulfurization. Journal of Health Economics 37, 219–231.
1997. Ozone, suspended particulates, and daily mortality in Mexico City. Amer- Matus, K., Nam, K.M., Selin, N.E., Lamsal, L.N., Reilly, J.M., Paltsev, S., 2012. Health
ican Journal of Epidemiology 145 (3), 258–268. damages from air pollution in China. Global Environmental Change 22 (1),
Brennan, P.A., Grekin, E.R., Mednick, S.A., 1999. Maternal smoking during preg- 55–66.
nancy and adult male criminal outcomes. Archives of General Psychiatry 56 Mendelsohn, R., Orcutt, G., 1979. An empirical analysis of air pollution
(3), 215–219. dose–response curves. Journal of Environmental Economics and Management
Case, A., Fertig, A., Paxson, C., 2005. The lasting impact of childhood health and 6 (2), 85–106.
circumstance. Journal of Health Economics 24 (2), 365–389. Moretti, E., Neidell, M., 2011. Pollution, health, and avoidance behavior: evidence
Case, A., Lubotsky, D., Paxson, C., 2002. Economic status and health in childhood: from the ports of Los Angeles. Journal of Human Resources 46 (1), 154–175.
the origins of the gradient. American Economic Review 92 (5), 1308–1334. National Bureau of Statistics of China, 2006. http://www.stats.gov.cn/
Chay, K.Y., Greenstone, M., 2003a. The impacts of air pollution on infant mortality: Nielsen, C.P., Ho, H.S., 2007. Air pollution and health damages in China: an intro-
evidence from geographic variation in pollution shocks induced by a recession. duction and reviews. In: Ho, Mun, S., Chris, P., Nielsen (Eds.), Clearing the Air:
Quarterly Journal of Economics 118 (3), 1121–1167. The Health and Economic Damages of Air Pollution in China. The MIT Press,
Chay, K.Y., Greenstone, M., 2003b. Air quality, infant mortality, and the Clean Air Act Cambridge.
of 1970. NBER Working Papers No. 10053. Perera, F.P., Jedrychowski, W., Rauh, V., Whyatt, R.M., 1999. Molecular epidemi-
Chay, K.Y., Greenstone, M., 2005. Does air quality matter? Evidence from the housing ological research on the effects of environmental pollutants on the fetus.
market. Journal of Political Economy 113 (2), 376–424. Environmental Health Perspectives Supplements 107 (3), 451–460.
Qian, J., Zhang, K., 1998. China’s desulfurization potential. Energy Policy 26 (4), Walker, R.W., 2011. Environmental regulation and labor reallocation: evidence from
345–351. the clean air act. American Economics Review Papers & Proceedings 101 (3),
Qian, N., 2008. Missing women and the price of tea in China: the effect of sex- 442–447.
specific earnings on sex imbalance. Quarterly Journal of Economics 123 (3), Walker, R.W., 2013. The transitional costs of sectoral reallocation: evidence from
1251–1285. the Clean Air Act and the workforce. Quarterly Journal of Economics 128 (4),
Samet, J.M., Zeger, S.L., Dominici, F., Curriero, F., Coursac, I., Douglas, W., Dockery, J.S., 1787–1835.
Zanobetti, A., 2000. The National Morbidity, Mortality, and Air Pollution Study. Woodruff, T.J., Grillo, J., Schoendorf, K.C., 1997. The relationship between selected
Part II. Morbidity, Mortality, and Air Pollution in the United States. Health Effects causes of postneonatal infant mortality and particulate air pollution in the
Institute, Boston. United States. Environmental Health Perspectives 105 (6), 608–612.
Schwartz, J., Dockery, D.W., Neas, L.M., 1996. Is daily mortality associated specifically Woodruff, T.J., Parker, J.D., Schoendorf, K.C., 2006. Fine particulate matter (PM2.5)
with fine particles? Journal of the Air and Waste management Association 46 air pollution and selected causes of postneonatal infant mortality in California.
(10), 927–939. Environmental Health Perspectives 114 (5), 786–790.
Schwartz, J., Marcus, A., 1990. Mortality and air pollution in London: a time series World Bank, 1998. World Development Indicators. The World Bank, Washington,
analysis. American Journal of Epidemiology 131, 185–194. D.C.
State Council, 1998. Official Reply to the State Council Concerning Acid Rain Control World Health Organization (WHO), 2002. Air Quality Guidelines for Europe, 2nd ed,
Areas and Sulfur Dioxide Pollution Control Areas. Copenhagen.
State Environmental Protection Agency (SEPA), 1996. The Report on Environmental Yang, J., Cao, D., Ge, C., Gao, S., 2002. Air pollution control strategy for China’s power
Quality in 1991–1995. SEPA. sector. In: Chinese Academy for Environmental Planning, Beijing.
Tanaka, S., Yin, W., Jefferson, G., 2014. Environmental Regulation and Industrial Yang, J., Schreifels, J., 2003. Implementing SO2 emissions in China. In: Presented in
Performance: Evidence from China. Mimeo. OECD Global Forum on Sustainable Development. Emissions Trading, Paris.
United Nations Environment Programme, 2009. Two Control Zone Plan and Pro- Zivin, J.G., Neidell, M., 2009. Day of haze: environmental information disclosure
gram to Control Sulfur Pollution, Available at: http://www.ekh.unep.org/files/ and intertemporal avoidance behavior. Journal of Environmental Economics and
GP-2.pdf Management 58 (2), 119–128.

How do health insurer market concentration and bargaining power with hospitals affect health insurance premiums?
Erin E. Trish a,∗, Bradley J. Herring b,1
a Leonard D. Schaeffer Center for Health Policy and Economics, University of Southern California, and Department of Health Policy and Management, University of California, Los Angeles, Verna and Peter Dauterive Hall 301-3, 635 Downey Way, Los Angeles, CA 90089-3333, United States
b Johns Hopkins Bloomberg School of Public Health, Department of Health Policy and Management, 624 North Broadway, Room 408, Baltimore, MD 21205, United States
Article history:
Received 17 April 2014
Received in revised form 5 November 2014
Available online 8 April 2015

JEL classification: I11, L11, L41, D4

Keywords: Insurance; Competition; Hospitals; Premiums; Bargaining power

Abstract

The US health insurance industry is highly concentrated, and health insurance premiums are high and rising rapidly. Policymakers have focused on the possible link between the two, leading to ACA provisions to increase insurer competition. However, while market power may enable insurers to include higher profit margins in their premiums, it may also result in stronger bargaining leverage with hospitals to negotiate lower payment rates to partially offset these higher premiums. We empirically examine the relationship between employer-sponsored fully-insured health insurance premiums and the level of concentration in local insurer and hospital markets using the nationally-representative 2006–2011 KFF/HRET Employer Health Benefits Survey. We exploit a unique feature of employer-sponsored insurance, in which self-insured employers purchase only administrative services from managed care organizations, to disentangle these different effects on insurer concentration by constructing one concentration measure representing fully-insured plans’ transactions with employers and the other concentration measure representing insurers’ bargaining with hospitals. As expected, we find that premiums are indeed higher for plans sold in markets with higher levels of concentration relevant to insurer transactions with employers, lower for plans in markets with higher levels of insurer concentration relevant to insurer bargaining with hospitals, and higher for plans in markets with higher levels of hospital market concentration.
© 2015 Elsevier B.V. All rights reserved.
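
As a rough illustration of the two concentration measures summarized above, the following minimal sketch computes a Herfindahl-Hirschman Index (the sum of squared percentage market shares, on the conventional 0 to 10,000 scale) twice for a single market: once from fully-insured enrollment only, and once from fully-insured and self-insured enrollment combined. The insurer names, the enrollment figures, and the helper name `hhi` are hypothetical and chosen only for illustration; they are not taken from the HealthLeaders-InterStudy data, and this is not the authors' code.

```python
def hhi(enrollment_by_insurer):
    """Herfindahl-Hirschman Index: sum of squared percentage shares (0 to 10,000)."""
    total = sum(enrollment_by_insurer.values())
    if total <= 0:
        raise ValueError("market has no enrollment")
    return sum((100.0 * n / total) ** 2 for n in enrollment_by_insurer.values())


# Hypothetical enrollment counts for one geographic market (illustrative only).
market = {
    "Insurer A": {"fully_insured": 60_000, "self_insured": 140_000},
    "Insurer B": {"fully_insured": 30_000, "self_insured": 20_000},
    "Insurer C": {"fully_insured": 10_000, "self_insured": 40_000},
}

# Measure 1: shares of fully-insured enrollment only
# (concentration relevant to insurers' transactions with employers).
fully_insured = {name: e["fully_insured"] for name, e in market.items()}

# Measure 2: shares of the combined book of business
# (concentration relevant to insurers' bargaining with hospitals).
combined = {name: e["fully_insured"] + e["self_insured"] for name, e in market.items()}

hhi_fully_insured = hhi(fully_insured)
hhi_combined = hhi(combined)

print(f"Fully-insured HHI: {hhi_fully_insured:.0f}")
print(f"Combined HHI:      {hhi_combined:.0f}")
```

In this made-up market the two indices differ because Insurer A holds a larger share of the combined book of business than of the fully-insured segment alone, which is the kind of variation that lets the two measures be distinguished empirically.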
E.E. Trish, B.J. Herring / Journal of Health Economics 42 (2015) 104–114

∗ Corresponding author. Tel.: +1 213 821 6178.
E-mail addresses: etrish@healthpolicy.usc.edu (E.E. Trish), herring@jhu.edu (B.J. Herring).
1 Tel.: +1 410 614 5967.

1. Introduction

The US healthcare industry has become increasingly consolidated. While the wave of hospital mergers in the 1990s gave rise to numerous studies of the implications of hospital consolidation, newfound attention in recent years has focused on consolidation in the US health insurance industry. Robinson (2004) documents the increasing concentration of these markets over the first half of the 2000s, as well as the predominance of insurance markets dominated by a small number of large, nationwide insurers. Similarly, a report from the American Medical Association (2013) highlights the preponderance of health insurance markets across the country that are highly concentrated, as defined by the standards set forth by the Department of Justice (DOJ) and Federal Trade Commission (FTC) in their Horizontal Merger Guidelines (2010).

This increased level of interest in insurer concentration is warranted for several reasons. Understanding the effects of these high levels of market concentration and their implications for premiums is valuable generally, but particularly so for an industry facing such high and rapidly rising premiums. Further, there are a number of policy provisions included in the 2010 Patient Protection and Affordable Care Act (ACA) that have important implications for the level of competition in the US health insurance industry. The creation of health insurance exchanges and the inclusion of variant forms of health insurers (such as CO-OP plans and nonprofit plans directed by the Office of Personnel Management) as competitors alongside more traditional insurers are examples of ACA provisions targeted toward increasing competition in the private health insurance industry.

Moreover, the effects of insurer market power may depend on the amount of provider market power, and vice versa. The extent to which insurers can use their bargaining leverage to negotiate lower provider prices likely depends on the level of competition in the local provider market, as these prices may already be at or near the point at which economic profits are zero in relatively competitive provider markets. Furthermore, the extent to which hospitals can use their bargaining leverage likely depends on local insurance market conditions. A better understanding of the extent to which higher prices resulting from concentrated provider markets are passed through to consumers in the form of higher premiums (rather than simply representing a transfer of rents from insurers to providers) is particularly relevant for antitrust enforcement in terms of evaluating the extent to which hospital market consolidation ultimately harms consumers.2

2 We thank Chris Garmon for highlighting this point.

However, the ultimate effect of the level of health insurance concentration on health insurance premiums is not straightforward, because there are potentially offsetting effects of the level of insurer competition on premiums. On one hand, higher levels of insurer concentration should lead to increased insurer market power in the markets where insurance is sold (to employers and individuals), likely resulting in relatively higher premiums due to higher plan profit margins, all else equal. On the other hand, insurers also engage in bilateral bargaining over transaction prices with providers, one of the key drivers of insurer costs. Thus, higher levels of insurer market concentration may yield stronger insurer bargaining leverage with local providers, thereby enabling them to negotiate lower provider prices, which may partly be passed on to insurance purchasers in the form of lower premiums. This purchasing power effect is particularly important, given the recent movement toward increased consolidation among provider markets driven by the ACA and other trends (Cutler and Scott Morton, 2013).

1.1. Our empirical contribution

In this paper, we empirically analyze the relationships between insurer concentration, hospital concentration, and employer-sponsored health insurance premiums. Our primary empirical contribution is that we identify a way to disentangle insurer concentration’s differing effects on higher insurer profits and lower provider prices. We do so by exploiting a unique feature of the market for employer-sponsored insurance whereby smaller employers tend to purchase fully-insured coverage whereas larger employers tend to self-insure and purchase only administrative services from managed care plans (such as provider network assembly and claims processing). An insurer’s market share in the fully-insured market is mainly relevant to the plan’s profits, while an insurer’s market share in the fully-insured and self-insured markets combined is mainly relevant to provider prices.

More specifically, we construct two distinct measures of health insurance market concentration to disentangle these two effects. Both concentration measures use the HealthLeaders-InterStudy census of private insurers to construct Herfindahl-Hirschman Indices (HHI) of market concentration, and we consider HHIs alternatively using Core-Based Statistical Areas (CBSA), with the Metropolitan Divisions therein, and counties as the geographic market boundaries. One HHI market concentration measure focuses on the profit portion of the premium’s administrative overhead tied to the transactions between fully-insured plans and employers by only using HealthLeaders-InterStudy’s fully-insured plans in its HHI’s market share calculation. We hypothesize that, all else equal, concentration in the fully-insured market will be associated with relatively higher health insurance premiums.

The second HHI market concentration measure focuses on the hospital price’s portion of the premium tied to the negotiations between insurers and hospitals. While self-insured enrollment represents a distinct product that is sold to employers, the insurer’s patient volume across the entire combined “book of business” (i.e., the fully-insured market and the self-insured market) represents its market share relevant to the price negotiations with hospitals. We therefore use these HealthLeaders-InterStudy data to measure each plan’s fully-insured and self-insured combined market share in this HHI calculation representing insurer bargaining with providers. We hypothesize that concentration in the fully-insured and self-insured markets combined will be associated with relatively lower health insurance premiums. (We also hypothesize that higher hospital market concentration, derived from the American Hospital Association’s (AHA) Annual Survey, will be associated with relatively higher health insurance premiums.)

Using plan-level premium data from the restricted-use Kaiser Family Foundation/Health Research and Educational Trust (KFF/HRET) Employer Health Benefits Survey for years 2006 through 2011, we find that premiums are indeed higher among markets with higher levels of insurer concentration representing fully-insured coverage sold to employers (and higher among more concentrated hospital markets), and we find that premiums are indeed lower among markets with higher levels of insurer concentration representing insurer bargaining with hospitals (derived from combined fully-insured and self-insured market shares).

Regarding the organization of the remainder of the paper, we first summarize the relevant literature on the effects of insurer and hospital concentration and then describe the conceptual framework. We then explain our empirical model, data, and market definitions. Our results, discussion, limitations, and conclusions follow.

2. Relevant literature

The majority of studies related to competition in the US healthcare industry over the past few decades have focused on competition and consolidation among hospitals. Gaynor and Vogt (2000), Vogt and Town (2006), and Gaynor and Town (2011, 2012) provide excellent reviews of this literature. While many of these studies yield unique findings, the results generally suggest that increasing consolidation in the hospital industry is associated with higher hospital prices.

The literature on the association between insurance premiums and the level of competition in the US health insurance industry, particularly within the employer-sponsored market, is sparser, largely due to data limitations. Early studies by Wholey et al. (1995) and Dranove et al. (2003) find that markets with more HMO competitors are associated with lower premiums. Dafny (2010) finds evidence of price discrimination as a consequence of insurer market power. Using a proprietary dataset containing information about the insurance benefits offered by large employers between 1998 and 2005, she utilizes variation in the profitability of these large employers to illustrate that insurers in concentrated insurance markets impose higher premium increases on more profitable employers (assumed to be less price sensitive). Dafny et al. (2012) observe a positive effect of insurer consolidation on health insurance premiums by exploiting the 1999 merger of nationwide insurers Aetna and Prudential as a source of differential changes in local insurance market concentration across the country. Using this instrument and the same dataset of large employers as above, they find a significant effect of increases in local insurance market concentration on increases in health insurance premiums. Additionally, they explore the possible effects of insurance consolidation on bargaining power with providers, finding that
increased insurance concentration is associated with a substitution of nurses for physicians. Similarly, Dafny et al. (Forthcoming) exploit the uneven impact of United Healthcare’s non-participation in state exchanges to conclude that more concentrated insurance exchanges were associated with higher premiums in 2014.

Numerous recent studies document the significant inter- and intra-market variation in negotiated provider prices, including its association with market-level factors, such as insurer or hospital market concentration (for example, White et al., 2013; Berenson et al., 2012; Ginsburg, 2010; MedPAC, 2009; Massachusetts Attorney General, 2010; US GAO, 2005). Several recent papers examine the relationship between both insurer and hospital concentration and negotiated hospital prices. McKellar et al. (2013) and Moriya et al. (2010) both find that higher levels of insurance concentration are associated with lower hospital prices, but that higher levels of hospital concentration are not significantly associated with higher hospital prices. However, Melnick et al. (2011) find that higher hospital concentration is indeed associated with higher hospital prices, and also find that hospital prices are lower in the most concentrated health plan markets compared to more competitive health plan markets. Ho and Lee (2013) find heterogeneous effects of insurer competition on negotiated hospital prices; while increased insurer competition actually reduces hospital prices on average, they observe a positive and significant effect on the prices negotiated by the most attractive hospitals.3

3 Several other papers also focus on the effect of the type of hospital with respect to negotiation between hospitals and insurers (which we do not consider in our empirical analysis). Ho (2009) develops a sophisticated model of the insurer-hospital bargaining game, estimating the expected division of profits between insurers and hospitals. She finds that specific hospital features have important effects on the outcome of this bargaining game: “star” and capacity-constrained hospitals have stronger bargaining leverage with insurers and higher profits. This result is also documented by Berenson et al. (2012) who, using data from qualitative interviews with hospital and insurance executives from the Community Tracking Study, find that “must-have hospital systems ... can exert considerable market power to obtain steep payment rates from insurers.” Lewis and Pflum (2014) also find that multi-market participation by a hospital system may increase bargaining leverage.

Similar effects have been documented among physician markets. Schneider et al. (2008) find that higher physician concentration is associated with higher prices but find no effect of insurer concentration on prices, while Dunn and Shapiro (2012) find that physician prices are higher in concentrated physician markets and lower in concentrated insurance markets. Additionally, Dunn and Shapiro (2013) find that negotiated physician prices increased as a result of health reform in Massachusetts, including some evidence suggesting that these price increases are at least partly attributable to increased competition among insurers. However, the outcome of interest in each of these studies is the prices negotiated between insurers and providers, leaving open the question of whether and, if so, the extent to which such prices are ultimately passed through to consumers in the form of higher premiums.

In perhaps the most closely related paper to our study, Town et al. (2006) analyze the effects of hospital industry consolidation in the 1990s on HMO premiums. They derive a theoretical model demonstrating the effects of horizontal mergers in upstream markets on consumer prices in downstream markets, and apply this model to the hospital (i.e., the upstream input to the product of health insurance) and health insurance (i.e., the downstream output) industries. Their theory predicts that consolidation in the upstream industry will have differential effects on the price and quantity of the downstream product depending on the level of competition in the downstream product industry, and their empirical findings support this theory. Specifically, they find that the hospital mergers that occurred in the 1990s resulted in higher HMO premiums and reductions in insurance coverage, and that these effects were strongest among competitive insurance markets.

3. Conceptual framework for premiums

Premiums set by insurers for a given employer represent a combination of expected medical spending covered by the insurer and a loading factor. The loading factor reflects the insurer’s administrative costs (such as marketing and paying claims) and any possible mark-up in the profit margin resulting from the insurer exercising market power in selling the insurance policy. Expected medical spending is a function of prices and quantities of medical care to be consumed. Prices generally represent the outcome of negotiations between insurers and providers, and the expected quantity of healthcare consumed generally reflects the generosity of the plan and the health status and other features of the group covered.4

4 Quantity consumed is also a function of the price; however, here we are focusing on prices in terms of total transaction price negotiated with the hospital by the insurer. Given the presence of insurance coverage, the portion of this price faced by the consumer seeking medical care is likely to be considerably smaller than this negotiated transaction price, so the price effects would likely reflect the change in consumer cost-sharing, rather than the change in overall price. In a similar paper which disentangles the price and quantity effects on physician services consumed, Dunn and Shapiro (2012) find very small price effects on quantity consumed in this state of insurance coverage. Additionally, McKellar et al. (2013) find that, despite an inverse correlation between market-level private prices and utilization, overall the price effect dominates, resulting in a positive relationship between prices and spending.

As noted above, an increase in the level of concentration in the insurance market likely has offsetting effects on premiums as market concentration may differentially affect the loading and expected spending components of the premium. Regarding the loading component of the premium, the most straightforward effect of increasing insurance market concentration is the likely positive effect on loading as the insurer gains more market power and attains higher profit margins on policies sold to employers.5 However, higher levels of insurer market concentration may potentially yield efficiencies in certain administrative costs such as lower advertising costs and an increased ability to spread certain fixed costs over a larger population. Would an insurer with increased market power ever pass any portion of these savings in administrative costs along to consumers in the form of relatively lower premiums? Consider the extreme case of one monopolist insurer setting the price of the premium such that its marginal revenue (from selling an additional policy) equals its marginal cost. Unless the aggregate demand for insurance is completely inelastic, any decrease in the marginal cost (from administrative efficiencies via larger market share) implies a partial decrease in the premium (to thus reduce marginal revenue in equilibrium). That said, the overall effect of increased market concentration seems likely to be higher premiums, with the partial effect of increased profits on higher premiums exceeding the partial effect of reduced administrative costs on lower premiums, unless the aggregate demand for insurance is highly elastic.

5 Competitive pressures on insurers could also lead to improvements in quality for the insurance plan, holding spending constant, although Scanlon et al. (2008) find no evidence to support competition’s effect on quality.

Regarding the expected spending components of the premium, increased insurance market concentration may also result in lower healthcare spending, as the insurer gains stronger bargaining leverage with hospitals and is able to negotiate lower payment rates. However, the extent to which these lower provider prices attained by an insurer are passed along as savings to consumers in the form of lower premiums is also unclear. Similar to the
above consideration of reduced administrative costs, a reduction in negotiated provider prices is essentially a downward shift in the insurer’s marginal cost curve. If demand for insurance is completely price-inelastic, all of the savings from lower provider prices paid by the insurer would be retained by the insurer as higher profits. Otherwise (unless the price elasticity is extremely high), a portion of the savings from lower provider prices would likely be passed on to consumers as lower premiums (tied with the desire to sell more policies)6 while a portion of the savings from lower provider prices would be retained by the insurer as a higher profit margin.
Conversely, as hospital markets become more concentrated, hospitals may gain stronger leverage in the bargaining game with insurers, resulting instead in higher premiums via increased spending due to higher negotiated payment rates to hospitals. Moreover, as hospital markets become more competitive, insurer concentration may have a negligible impact on hospital prices if those prices cannot be negotiated downwards any further by insurers due to hospital solvency constraints.
As a result, the relative magnitudes of these potentially offsetting effects of increasing insurance concentration on health insurance premiums are not clear. Our study therefore aims to empirically isolate some of these potentially countervailing effects of health insurance concentration, and their interaction with local hospital market concentration, on health insurance premiums in the employer-sponsored insurance market.

4. Empirical model and data

4.1. Empirical overview

We run plan-level OLS regressions to test the relationship between insurer and hospital market concentration and employer-sponsored fully-insured premiums from the KFF/HRET Employer Health Benefits Survey from 2006 through 2011. These models use the logged single-employee’s total annual premium (i.e., the employer and employee shares combined) as the dependent variable of interest and include plan, firm, industry, and market-level controls for premiums. Our model uses continuous HHI measures for insurer and hospital market concentration and, as noted above, incorporates two separate measures of insurance market concentration to disentangle the effects of insurer market concentration in the market for selling fully-insured coverage to employers from those effects of insurer market concentration in the market for bargaining over service prices with hospitals.
An important limitation of our analysis is that we ultimately rely on cross-sectional geographic variation in these market concentration measures for both insurers and hospitals, and thus the endogeneity of these market concentration measures is a potential concern. A good instrument for cross-sectional variation in market concentration is simply not apparent to us. Many studies therefore use variation over time in these market concentration measures, but we think that firm decisions to merge with one another are also likely endogenous to market characteristics themselves. Regardless, there is very little within-market variation over time in either the insurer HHIs or the hospital HHIs during this 2006 through 2011 time period for our premium data. (Despite this lack of variation over time, we tested models including market-level fixed-effects models anyway, but these models yielded insignificant results.) Our prior is that exogenously-high insurer profits would increase insurer competition and exogenously-high hospital prices would increase hospital competition, leading to a bias against observing our hypothesized positive effects of HHIs on premiums.

4.2. Data

We obtain data on employer-sponsored health insurance premiums from a restricted-use version of the annual KFF/HRET Employer Health Benefits Survey for 2006–2011. (The public-use version of this dataset does not have geographic identifiers.) The KFF/HRET survey provides nationally representative data regarding employers’ health benefits offerings for roughly 2000 firms per year. The data include plan-level information on the largest plan of each type of plan (i.e., HMO, PPO, etc.) offered by the employer in the year. We obtain plan-level premiums, type of plan, and generosity factors such as deductible and out-of-pocket maximum information from these data and focus our regression analyses on single coverage. We also include firm-level control variables from these data including firm industry, size, unionization, and workforce characteristics. We restrict our analysis to employers purchasing fully-insured coverage by excluding self-insured employers. We also exclude rural employers from our analyses, as we ultimately link these data to market concentration measures constructed for urban markets. Finally, we exclude observations with premiums in the highest and lowest one percentile of the distribution of the data.
We also include time-variant market-level control variables in these premium regressions. These include mean per capita income at the CBSA-level, which we obtain from the Bureau of Economic Analysis, and the age, sex, and race-adjusted mean annual Medicare hospital reimbursement per enrollee at the HRR-level, obtained from the Dartmouth Atlas of Health Care, which we include to control for local variation in practice patterns that would be expected to affect utilization and therefore premiums. Additionally, we control for state-level premium tax rates and an index of high-cost state-mandated benefits,7 both of which may increase premiums for fully-insured coverage. We use a one year lag for all market-level variables, except the premium tax rates and mandated benefit index, which are contemporaneous (though highly invariant over the time period studied).

4.3. Market definitions

We construct HHI measures of insurance market concentration from the HealthLeaders-InterStudy census of private insurers and subsequently merge these market concentration measures to the KFF/HRET data.8 The HealthLeaders-InterStudy data include enrollment at the managed care organization (MCO)-product-county-level for each year.9 We construct two distinct measures

6
In the presence of adverse selection, insurers may also pass on savings in the form of lower premiums in an effort to attract a healthier risk pool. For example, Starc (2014) shows that medical spending is positively associated with premiums in the Medigap market and that adverse selection in this market somewhat restrains insurer premium markups despite insurer market power.
7
The index is constructed by summing the number of high cost benefit mandates in effect for the given year in the state in which the policy is sold. “High cost” mandates are defined as those for which associated healthcare spending is estimated to be more than 1% of overall premium by the Council for Affordable Health Insurance (2006–2011).
8
The HHI is the sum of the squared market shares of each competitor in the market, and is a commonly used measure of market competitiveness in horizontal merger analyses conducted by the DOJ and FTC. The measure ranges from 0 to 10,000 with 10,000 representing a perfect monopoly. We scale this by 100 points (such that the HHI ranges from 0 to 100) in all of our regression analyses for easier presentation of results.
9
The InterStudy data have been criticized for work on health insurance markets due to concerns with accuracy and consistency (see, for example, Dafny et al., 2011). One important point to note is that earlier criticisms of the
of insurance market concentration based on the market shares of the relevant transaction.
For the market transaction in which the insurer sells fully-insured coverage to employers, we define the product market as all fully-insured managed care insurance products and aggregate the MCO’s enrollment in these products within a defined geographic market (described below). We refer to this measure as the “Insurer:Employer HHI” representing the level of competition of the market in which the employer is purchasing a fully-insured managed care product to provide coverage for its employees.
For the market transaction in which the insurer uses its bargaining leverage to negotiate with hospitals, we define the insurance product market by aggregating each insurer’s enrollment in a geographic market for its entire commercial book of business (i.e., combined enrollment in fully-insured and self-insured managed care products) because that full set of commercially insured patients represents that insurer’s purchasing power. We refer to this measure as the “Insurer:Hospital HHI” measure. We exclude observations in markets in which the HealthLeaders-InterStudy data provide implausibly high, low, or variant total enrollment.
To construct the measure of market concentration for hospital services, we use data from the AHA annual survey. We include all non-federal short-term general acute care hospitals in the US. We define the product market as the number of private-pay inpatient days aggregated to the hospital system within the geographic market.10 This concentration measure, referred to as “Hospital HHI”, represents the relative bargaining strength of the local hospital market with which insurers must negotiate hospital prices.
We exclude plan enrollment and hospital admissions among three specific integrated delivery systems – namely, Kaiser Permanente, Geisinger Health System, and Intermountain Healthcare – from the calculations of the Insurer:Hospital and Hospital HHI measures, respectively, in the geographic markets where their hospitals exclusively treat patients from the integrated insurer and there is thus no relevant hospital price negotiation. However, we do not remove the enrollment among these integrated delivery systems from the Insurer:Employer HHI calculations, as these plans compete with other plans for employer coverage.11
We consider two ways to define geographic markets: Core-Based Statistical Areas (CBSA) and counties. The CBSA is a geographic area defined by the Office of Management and Budget to represent an area with commuting ties to an urban center. The 11 largest CBSAs (e.g., greater New York City, greater Chicago) are separated into smaller Metropolitan Divisions (e.g., four Divisions within New York City, three Divisions within Chicago), and so we use the smaller Metropolitan Division codes, when available, to define the geographic markets within these larger CBSAs.12 While we believe that HHIs using the CBSA as the geographic market definition should reasonably characterize the market transactions for relatively-smaller employers purchasing coverage among competing private health insurers and reasonably characterize the market transactions between private health insurers and hospital systems, we also construct HHI measures using counties as the geographic market. Accordingly, we run a regression model using the CBSA for these three HHIs and then a separate regression model using the county for these three HHIs and report results for both measures.13
The joint distribution of the Insurer:Employer and Insurer:Hospital CBSA-based HHI measures is shown in Fig. 1A. While the two measures are strongly correlated across markets (i.e., the correlation coefficient is 0.83), there is actually a considerable level of differences between the Insurer:Employer and Insurer:Hospital HHI measures, so that we are able to disentangle these opposing effects of higher profits and lower hospital prices on premiums.
The joint distribution of relative bargaining leverage (i.e., Insurer:Hospital HHI and Hospital HHI for CBSAs) is shown in Fig. 1B. The correlation coefficient is 0.22, indicating that there is a mix of markets where the insurers have more bargaining power than hospitals, insurers have less bargaining power than hospitals, and insurers and hospitals have comparable bargaining power. The DOJ/FTC Horizontal Merger Guidelines provide particular HHI cutoffs as one way to categorize the level of competition in a market; by these standards, markets with an HHI between 1500 and 2500 are considered moderately concentrated, and markets with an HHI greater than 2500 are considered highly concentrated (US DOJ/FTC, 2010).
For the Insurer:Employer HHI measure using the CBSA for the market definition, 2.6% of plans are in un-concentrated markets, 39.3% are in moderately concentrated markets, and 58.1% are in highly concentrated markets. For the Insurer:Employer HHI measure using the smaller county for the market definition, 1.8% of plans are in un-concentrated markets, 35.1% are in moderately concentrated markets, and 63.1% are in highly concentrated markets. For the Hospital HHI measure using the CBSA for the market definition, 27.2% of plans are in un-concentrated markets, 29.8% are in moderately concentrated markets, and 43.1% are in highly concentrated markets. For the Hospital HHI measure using the county for the market definition, 10.8% of plans are in un-concentrated markets, 20.1% are in moderately concentrated markets, and 69.1% are in highly concentrated markets. Only 1.0% of plans are in

InterStudy data related to the fact that they only measured enrollment in HMOs do not apply to our study, as information on PPOs and other products was added beginning in 2005 associated with combining with HealthLeaders. Nonetheless, there are still concerns regarding the validity and volatility of enrollment. We have addressed these concerns in several ways. In particular, we have removed some enrollment to address the double counting issue of “rental network” enrollment, particularly in 2007–2008, following our own analysis and discussions with database managers at HealthLeaders-InterStudy. Additionally, we have taken several steps to address volatility in the data. First, we have taken the average MCO-product enrollment of the two observations per year (January and July) and used this as MCO-product enrollment for the year. Next, we aggregate total managed care enrollment (fully- and self-insured) in the data at the market (CBSA/Division or county) level, and compare this enrollment to estimates of the under-65 population for the market, which we obtain from the Small Area Health Insurance Estimates. We exclude from our analyses any markets where the aggregate private enrollment in the HealthLeaders-InterStudy data is less than 30% or greater than 100% of the total under-65 population in the market. We believe these are conservative cutoffs, as the under-65 population includes not only those that are privately insured, but also those that are uninsured and those with Medicaid or another source of public coverage (such as VA, Medicare, etc.). Additionally, we drop any market-year observations for markets in which the HHI is more than 25% greater or less than the mean HHI of that same market across the six-year time period included in our study (i.e., an implausibly large insurance market one-year outlier). Overall, these restrictions result in an exclusion of about 20% of the total plan-level observations in the KFF/HRET data. These excluded markets tend to have higher levels of insurance market concentration (likely due to mis-measurement), but are otherwise similar to the markets retained in our study.
10
We also run models with alternative definitions of hospital product market, such as beds, total volume, total admissions, and Medicare discharges; the hospital market concentration measures based on these different definitions are all very highly correlated and our results are robust to these alternative definitions. Moreover, we believe that using the system-level measures (rather than individual hospitals) more accurately represents the bargaining nature with insurers.
11
The results are qualitatively unchanged if we either do not exclude this enrollment and/or if we simply drop observations in markets with integrated delivery systems present in the market.
12
The results are also qualitatively unchanged if we use CBSAs to define geographic markets without using the smaller Metropolitan Divisions within these 11 largest CBSAs.
13
We also run models using the Dartmouth Atlas’ Hospital Referral Region (HRR) as the geographic market for both Insurer:Hospital HHIs and Hospital HHIs, as the HRR has frequently been used as a geographic market for healthcare. HRRs are generally larger geographic areas than CBSAs, so CBSA-level markets are typically more highly concentrated than HRR-level markets.
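The market concentration construction described in Section 4.3 and its footnotes can be illustrated with a minimal sketch. It assumes enrollment (or inpatient-day) counts per competitor are already aggregated to the market level. The function names, the treatment of HHIs exactly equal to 1500 or 2500, and the example numbers are illustrative assumptions, not the authors' code or data.

```python
from statistics import mean


def hhi(counts):
    """HHI on the 0-10,000 scale: sum of squared percentage market shares."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("market has no enrollment")
    return sum((100.0 * c / total) ** 2 for c in counts)


def concentration_band(hhi_value):
    """DOJ/FTC (2010) bands as cited in the text.

    Assumption: values exactly at 1500 or 2500 fall in the moderate band.
    """
    if hhi_value > 2500:
        return "highly concentrated"
    if hhi_value >= 1500:
        return "moderately concentrated"
    return "un-concentrated"


def scaled(hhi_value):
    """Rescale to the 0-100 range used in the regressions."""
    return hhi_value / 100.0


def plausible_enrollment(private_enrollment, under65_population):
    """Keep markets whose private enrollment is 30%-100% of the under-65 population."""
    share = private_enrollment / under65_population
    return 0.30 <= share <= 1.00


def stable_years(hhi_by_year):
    """Drop market-years whose HHI is more than 25% above or below the market's mean HHI."""
    m = mean(hhi_by_year.values())
    return {y: h for y, h in hhi_by_year.items() if abs(h - m) <= 0.25 * m}


# Hypothetical market with four insurers.
example = hhi([40, 30, 20, 10])
print(example, concentration_band(example), scaled(example))
```

Squaring percentage shares rather than fractional shares gives the 0 to 10,000 scale directly, so the only rescaling needed for the regressions is the division by 100.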
CBSA markets where both insurer and hospital markets are un-concentrated. In contrast, 31.6% of plans are in CBSA markets where both insurer and hospital markets are highly concentrated.

[Fig. 1: two scatterplots. Panel (A) plots Ins:Emp HHI (vertical axis) against Ins:Hosp HHI (horizontal axis); Panel (B) plots Hospital HHI (vertical axis) against Ins:Hosp HHI (horizontal axis). Both axes run from 0 to 100.]
Fig. 1. Comparison of the two insurance market concentration measures and the joint distribution with hospital market concentration using CBSA market definitions.
Notes: The scatterplots depict the joint distribution of the Insurer:Employer and Insurer:Hospital CBSA-based HHI measures of insurance market concentration in Panel (A) and the joint distribution of Insurer:Hospital and Hospital CBSA-based HHI measures for market concentration in Panel (B). Each dot represents a plan from the 2006–2011 KFF/HRET Employer Health Benefits Survey. HHI is Herfindahl-Hirschman Index.

4.4. Empirical model

We estimate parameters from the following OLS plan-level premium regression:

$$\ln P_{pt} = \alpha + \beta\,\text{Ins:Emp HHI}_{mt-1} + \phi\,\text{Ins:Hosp HHI}_{mt-1} + \gamma\,\text{Hosp HHI}_{mt-1} + \theta X_p + \delta F_f + \lambda M_{mt-1} + \tau_t + \varepsilon_{pt} \qquad (1)$$

where the indices are plan p, firm f, market m, and year t. The Ins:Emp HHI term in this equation is the one-year lagged HHI of the market in which insurers sell fully-insured policies to employers, the Ins:Hosp HHI term is the one-year lagged HHI of the market in which insurers bargain with hospitals, and the Hosp HHI term is the one-year lagged HHI of the hospital market. The $X_p$ and $F_f$ covariates are plan-level and firm-level control variables, while the $M_{mt-1}$ are market-level controls, including the one-year lagged and logged CBSA-level per capita income, the lagged and logged mean HRR-level Medicare hospital reimbursement values, the contemporaneous state premium tax rate, and the contemporaneous state mandated benefit index. (We lag the income and Medicare utilization values because they are essentially t − 1 forecasts by an insurer while the tax rates and benefit mandates are known in advance.) $\tau_t$ is a year-indicator variable and $\varepsilon_{pt}$ is the random error. We use cluster-corrected robust standard errors at the market-year level.
We first estimate the model for the entire sample. We then conduct sensitivity analyses by systematically excluding one or two of the three market concentration measures to allow us to examine our ability to disentangle the effects of insurer concentration through these measures. We then also estimate separate models for different subsamples, by first stratifying the sample by insurance market concentration and then stratifying the sample by hospital market concentration. These stratified models allow us to examine whether the effects of insurer and hospital market concentration on premiums appear to vary across markets. We use an HHI of 2500 for stratifying these samples, as this is the FTC/DOJ cutoff for a high level of concentration and it splits the sample roughly in half for the CBSA-defined markets.

5. Results and discussion

The first two columns of Table 1 include the weighted means and standard deviations (where applicable) of the variables used in our analyses. The weighted mean annual single-employee premium in our sample is $4567; this, as well as the insurer and hospital market concentration measures discussed below, are similar to overall population-weighted national measures over this time period and thus do not appear to be idiosyncratic to the KFF/HRET sampling (which is designed to be nationally representative) or our exclusion criteria.
The next three columns of Table 1 present the full results from the OLS regression model for the annual premium shown in Eq. (1). Before discussing the results for insurer and hospital market concentration, we note that the results for the control variables generally appear as expected, indicating that the overall data and model are well specified. For instance, plans with higher deductibles have lower premiums, unionized firms have higher premiums, and smaller firms have higher premiums.
In this model using the CBSA as the geographic market, the coefficient for a 100 point increase in the Insurer:Employer HHI is 0.0021 and the coefficient for a 100 point increase in the Hospital HHI is 0.0019. These findings, which are statistically significant at the 5% and 1% levels, respectively, support our hypothesis that higher levels of both Insurer:Employer and Hospital concentration are associated with higher employer-sponsored health insurance premiums. To put a relative magnitude on these coefficients, we consider their effect size in the commonly-used example of a standard “five-to-four” merger – a market in which two of five equally sized firms merge, resulting in an 800 point increase in HHI (i.e., an HHI increase from 2000 to 2800). These coefficient estimates imply that a simulated five-to-four merger in the Insurer:Employer market is associated with a 1.7% ($78) increase in premiums, and the same increase in the Hospital HHI is associated with a 1.5% ($67) increase in premiums.
The coefficient for a 100 point increase in the Insurer:Hospital HHI is −0.0024, also statistically significant at the 1% level. A simulated five-to-four merger in this insurer bargaining leverage market is associated with a 1.9% ($90) decrease in predicted premiums. This finding of a positive coefficient on the Insurer:Employer HHI term and a negative coefficient on the Insurer:Hospital HHI term provides support that there are indeed offsetting effects of increases in insurer concentration in terms of market power in selling insurance to employers (increasing premiums) versus negotiating leverage with hospitals (decreasing premiums).
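The “five-to-four” merger arithmetic used in the discussion of Eq. (1) can be reproduced with a short sketch. It uses the rounded coefficients reported in the text and the usual log-linear approximation (percent change is roughly 100 times the coefficient times the HHI change in hundreds of points). The function names are illustrative assumptions. The dollar amounts quoted in the text are not recomputed here, since they may rest on unrounded estimates.

```python
def equal_share_hhi(n_firms):
    """HHI (0-10,000 scale) of a market with n equally sized firms."""
    return n_firms * (100.0 / n_firms) ** 2


def merged_hhi(n_firms):
    """HHI after two of n equally sized firms merge into one."""
    share = 100.0 / n_firms
    return (2 * share) ** 2 + (n_firms - 2) * share ** 2


def premium_change_pct(coef_per_100_points, delta_hhi):
    """Approximate percent change in premium for a given HHI change.

    The regression is log-linear and HHIs are scaled by 100, so the
    change in ln(premium) is coef * (delta_hhi / 100).
    """
    return 100.0 * coef_per_100_points * (delta_hhi / 100.0)


delta = merged_hhi(5) - equal_share_hhi(5)  # 2800 - 2000

# Rounded CBSA-based coefficients as reported in the text.
coefficients = {
    "Insurer:Employer HHI": 0.0021,
    "Hospital HHI": 0.0019,
    "Insurer:Hospital HHI": -0.0024,
}
for label, coef in coefficients.items():
    print(f"{label}: {premium_change_pct(coef, delta):+.1f}%")
```

For effects of this size the approximation and the exact transformation, exp(change in log premium) minus one, agree to the first decimal place.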
Table 1
Summary statistics and premium regression results for insurance and hospital market concentration.
Dependent variable Mean Std Dev Coeff Std Err p-Value
ln(Premium) 8.43 0.30 3.9720 0.4987 0.000

Market concentration variables
Insurer:Employer HHI (CBSA) 31.07 13.44 0.0021 0.0010 0.029
Insurer:Hospital HHI (CBSA) 27.39 12.65 −0.0024 0.0009 0.006
Hospital HHI (CBSA) 30.78 23.79 0.0019 0.0003 0.000
Plan-level controls
HMO 30.9% Ref
PPO 49.1% 0.1062 0.0154 0.000
POS 20.0% 0.0949 0.0197 0.000
Annual deductible ($000s) 0.44 0.72 −0.0581 0.0099 0.000
OOP Max < $1500 16.5% 0.0080 0.0156 0.607
Firm-level controls
Unionized workers 21.2% 0.0810 0.0156 0.000
Percent low income (<$23,000) 13.0% 20.1% −0.0564 0.0315 0.073
Percent part time 12.0% 16.5% 0.0344 0.0340 0.312
Firm size: 500+ 26.7% Ref
Firm size: 100–499 25.1% 0.0211 0.0152 0.166
Firm size: 25–99 23.0% 0.0232 0.0173 0.179
Firm size: 3–24 25.2% 0.0407 0.0200 0.042
Construction 7.0% Ref
Manufacturing 9.1% 0.0398 0.0306 0.193
Mining 0.4% 0.0105 0.0770 0.891
Transportation 7.4% 0.1016 0.0391 0.009
Wholesale 4.7% 0.0409 0.0335 0.223
Retail 7.2% 0.0512 0.0363 0.159
Finance 8.9% 0.1222 0.0322 0.000
Service 40.8% 0.1277 0.0291 0.000
Government 7.1% 0.2106 0.0334 0.000
Healthcare 7.3% 0.1425 0.0364 0.000
Market-level controls
ln(Per capita income) 10.62 0.20 0.2835 0.0417 0.000
ln(Medicare hospital payments) 8.38 0.20 0.1300 0.0387 0.001
State premium tax rate 0.99% 1.03% 0.8981 0.7783 0.249
State Mandated Benefit Index 9.49 4.20 −0.0004 0.0019 0.814
Regional controls
South 29.8% Ref
Northeast 20.9% 0.0855 0.0195 0.000
Midwest 16.1% 0.0793 0.0216 0.000
West 33.2% −0.0181 0.0192 0.345
Year controls
Year 2006 15.9% Ref
Year 2007 17.1% −0.0034 0.0227 0.880
Year 2008 16.5% 0.0618 0.0239 0.010
Year 2009 18.9% 0.1039 0.0223 0.000
Year 2010 16.9% 0.1439 0.0241 0.000
Year 2011 14.6% 0.1924 0.0260 0.000
Notes: The left-hand side of the table shows enrollment-weighted means and standard deviations from the 2006–2011 KFF/HRET Employer Health Benefits Survey with
geographic markets defined as CBSAs. The right-hand side of the table shows enrollment-weighted OLS regression results from a plan-level regression of log annual premium
on the insurance and hospital market concentration. N = 5270; F(32, 1288) = 20.49 (p = 0.000); R2 = 0.2176. Standard errors are robust cluster-corrected at the market-year
level. The insurance and hospital market concentration measures and other market controls are lagged by one year and HHIs are scaled by 100. Percentages may not sum to
100% due to rounding. The coefficient, standard error, and p-value included in the first row are for the intercept.
The results in Table 2 illustrate how the inclusion of the two distinct and potentially offsetting measures of insurance concentration (in the employer and hospital markets) appears to be necessary to disentangle the different effects of market power on the seller and buyer side. Table 2A’s second column (labeled Model 1) repeats our main results including all three HHI measures using the CBSA market definition, and the next six columns show the results from additional separate regressions to show the possible permutations of including/excluding these three concentration measures. Our ability to capture these related but offsetting effects of insurer concentration is supported by the fact that the magnitude and statistical significance of the coefficients on these terms are diminished when only one of them is included in the regression specification. This is illustrated in the other columns of Table 2A; when only one of these insurance market concentration measures is included without the other (i.e., Models 2 and 3 with the hospital HHI excluded, Models 6 and 7 with the hospital HHI included), the portion of the offsetting effect on premiums that is picked up by the measure limits its magnitude and significance in the direction that we expect. This provides support for the fact that these two insurance market concentration terms are indeed contributing unique information regarding the structure of the market in which the plan is sold and that in which the insurer bargains over hospital prices, and their relevant association with premiums. Finally, Model 4 indicates that our findings for insurance market concentration are not sensitive to the exclusion of hospital market concentration, and Model 5 indicates that our finding for the association between higher premiums and hospital market concentration is not sensitive to the exclusion of insurer market concentration.
Table 2B presents the results from this same set of regressions but instead using counties as the geographic market. While the mean concentration measures for the insurer and hospital markets are higher for county-defined markets compared to CBSA-defined markets (especially for hospital markets), overall the same pattern of regression results holds for the models using county as the market. In the model including all three concentration measures with
Table 2
Premium regression results for insurance and hospital market concentration measures excluding each measure, using (A) CBSA and (B) county market definitions.
ln(Premium) Mean Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7
(A)
Ins:Emp HHI 3107 0.0021 0.0005 0.0028 0.0001
SD/SE 1344 0.0010 0.0005 0.0010 0.0005
p-Value 0.029 0.308 0.004 0.768
Ins:Hosp HHI 2739 −0.0024 −0.0004 −0.0028 −0.0006

SD/SE 1265 0.0009 0.0004 0.0009 0.0004
p-Value 0.006 0.328 0.002 0.139
Hospital HHI 3078 0.0019 0.0019 0.0019 0.0020

SD/SE 2379 0.0003 0.0003 0.0003 0.0003
p-Value 0.000 0.000 0.000 0.000
N 5270 5270 5270 5270 5270 5270 5270

R2 0.2595 0.2623 0.2624 0.2617 0.2599 0.2600 0.2599
(B)
Ins:Emp HHI 3225 0.0025 0.0005 0.0029 0.0004
SD/SE 1351 0.0009 0.0005 0.0009 0.0005
p-Value 0.008 0.351 0.002 0.416
Ins:Hosp HHI 2844 −0.0025 −0.0006 −0.0030 −0.0005

SD/SE 1269 0.0008 0.0004 0.0008 0.0004
p-Value 0.003 0.181 0.000 0.283
Hospital HHI 4293 0.0009 0.0011 0.0011 0.0011

SD/SE 2757 0.0003 0.0003 0.0003 0.0003
p-Value 0.001 0.000 0.000 0.000
N 5210 5210 5210 5210 5210 5210 5210

R2 0.2608 0.2624 0.2624 0.2616 0.2614 0.2614 0.2613
Notes: These seven models show selected enrollment-weighted OLS regression results from a plan-level regression of log annual premium from the KFF/HRET survey on the
insurance and hospital market concentration measures using the CBSA as the geographic market in Panel (A) and the county as the geographic market in Panel (B). Standard
errors (SE) are robust cluster-corrected at the market-year level. The three market concentration measures in the regression models are lagged by one year and scaled by
100 (i.e., the coefficients represent the effect of an HHI that is 100 points higher).
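The attenuation seen when only one insurer HHI is included can be illustrated with the textbook omitted-variable formula. The sketch below is a back-of-the-envelope check, not the authors' procedure. It takes the CBSA coefficients from the specification with both insurer HHIs and no hospital HHI (0.0028 and −0.0028), the raw correlation of 0.83 reported for Fig. 1A, and the standard deviations from Table 1. It ignores the other controls and the enrollment weights, so only rough agreement with the single-measure estimates in Table 2A (0.0005 and −0.0004) should be expected. The function name is hypothetical.

```python
def single_measure_coef(own_coef, other_coef, corr, sd_own, sd_other):
    """Coefficient expected on one regressor when a correlated one is omitted.

    Omitted-variable formula: own + other * Cov(own, other) / Var(own),
    where Cov / Var equals corr * sd_other / sd_own.
    """
    return own_coef + other_coef * corr * sd_other / sd_own


CORR = 0.83       # Ins:Emp vs Ins:Hosp HHI correlation (text, Fig. 1A)
SD_EMP = 13.44    # Table 1, Insurer:Employer HHI (scaled by 100)
SD_HOSP = 12.65   # Table 1, Insurer:Hospital HHI (scaled by 100)

emp_only = single_measure_coef(0.0028, -0.0028, CORR, SD_EMP, SD_HOSP)
hosp_only = single_measure_coef(-0.0028, 0.0028, CORR, SD_HOSP, SD_EMP)

print(f"Ins:Emp HHI alone:  {emp_only:.4f}")
print(f"Ins:Hosp HHI alone: {hosp_only:.4f}")
```

Because the two measures are strongly positively correlated and carry opposite signs, each one absorbs part of the other's effect when entered alone, which pulls both coefficients toward zero.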
markets defined at the county-level, the coefficient for a 100 point increase in the Insurer:Employer HHI is 0.0025 and the coefficient for a 100 point increase in the Hospital HHI is 0.0009, both significant at the 1% level. Additionally, the coefficient for a 100 point increase in the Insurer:Hospital HHI is −0.0025, also significant at the 1% level.

5.1. Stratified analyses

Based on our discussion in Section 3, as well as the results from Town et al. (2006), we hypothesize that the effects of insurance and hospital market concentration may be more (or less) pronounced depending on the relative concentration of the other market. Specifically, the Town et al. theoretical and empirical results suggest that the positive effects of hospital concentration on insurance premiums will be more pronounced among competitive insurance markets. To test this hypothesis, we re-ran the regressions on two subsamples of the plans, stratified by the level of competition in the Insurer:Employer market; i.e., an HHI either below or above 2500. (We also stratify our models based on hospital concentration further below.) We use this stratified-sample approach, rather than an interaction term, so that we can observe the baseline effects in more competitive and more concentrated markets independently.
The findings from this analysis using CBSAs as the geographic market are presented in Table 3.14 While we find that the association between hospital concentration and premiums is significant among plans sold in both relatively more competitive downstream insurance markets and relatively more concentrated downstream insurance markets, we do not find evidence that

Table 3
Premium regression results for insurance and hospital market concentration measures, stratified by insurance market concentration.
ln(Premium)     (1) Full sample   (2) Ins:Emp HHI ≤2500   (3) Ins:Emp HHI >2500
Ins:Emp HHI      0.0021            0.0061                  0.0026
  SE             0.0010            0.0039                  0.0010
  p-Value        0.029             0.115                   0.026
Ins:Hosp HHI    −0.0024           −0.0004                 −0.0023
  SE             0.0009            0.0029                  0.0010
  p-Value        0.006             0.879                   0.026
Hospital HHI     0.0019            0.0020                  0.0018
  SE             0.0003            0.0006                  0.0004
  p-Value        0.000             0.001                   0.000
N                5270              2207                    3063
R2               0.2595            0.2427                  0.2659
Notes: The first column repeats the regression results in Table 2A’s Model 1, while the next two columns show enrollment-weighted OLS regression results for premiums from the KFF/HRET survey stratified by level of insurance market concentration above or below an HHI of 2500. All markets are geographically defined using CBSAs. The three market concentration measures are lagged by one year and scaled by 100. Standard errors are robust cluster-corrected at the market-year level.

14
Repeating these stratified analyses using the county-based HHI measures with the same HHI cutoff above and below 2500 yields slightly different results. However, this appears to be due, at least in part, to the difference in the magnitudes of HHIs for markets using the county-defined markets as compared to the CBSA-defined markets (i.e., the mean HHIs for the smaller county-defined markets are higher). We instead observe a similar pattern of results using county-defined markets to those presented in Tables 3 and 4 using CBSA-defined markets if we increase the HHI threshold cutoff (e.g., from 2500 to 3500 and thus roughly evenly-split subsample sizes) for stratifying more competitive versus more concentrated markets. Additional results from these analyses are available from the authors upon request.
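One way to read the stratified Hospital HHI estimates in Table 3 side by side is a simple z-statistic for the difference between two coefficients. This is an illustrative sketch, not a test reported by the authors. It assumes the two subsample estimates are independent, which is plausible only insofar as the strata contain disjoint markets, and it uses the rounded coefficients and standard errors from the table. The function name is hypothetical.

```python
from math import erfc, sqrt


def difference_z(coef_a, se_a, coef_b, se_b):
    """z-statistic and two-sided normal p-value for coef_a - coef_b.

    Assumes the two estimates are independent, so their variances add.
    """
    z = (coef_a - coef_b) / sqrt(se_a ** 2 + se_b ** 2)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return z, p_two_sided


# Hospital HHI coefficient: Ins:Emp HHI <=2500 stratum vs >2500 stratum (Table 3).
z, p = difference_z(0.0020, 0.0006, 0.0018, 0.0004)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Under these assumptions the gap between 0.0020 and 0.0018 is small relative to its standard error, which fits the statement that the hospital concentration effect is not detectably stronger in more competitive insurance markets.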
the relationship is stronger among more competitive downstream insurance markets. The coefficient on premiums for a 100 point increase in Hospital HHI is 0.0020 in more competitive Insurer:Employer markets and 0.0018 in more concentrated Insurer:Employer markets, both significant at the 1% level.
However, we do observe that the negative association between the Insurer:Hospital concentration measure and premiums is stronger among more concentrated insurance markets. Specifically, the coefficient on premiums for a 100 point increase in the Insurer:Hospital HHI measure is −0.0023 (p < 0.05) for plans sold in more concentrated insurance markets (Column 3), and statistically insignificant (−0.0004) for plans sold in more competitive insurance markets (Column 2). In addition, this stratification by level of concentration in the insurance market reveals that, while the relationship between Insurer:Employer competition and premiums among more competitive insurance markets is statistically indistinguishable from zero, the extent of Insurer:Employer competition among more concentrated insurance markets is important.

Table 4
Premium regression results for insurance and hospital market concentration measures, stratified by hospital market concentration.
ln(Premium)     (1) Full sample   (2) Hospital HHI ≤2500   (3) Hospital HHI >2500
Ins:Emp HHI      0.0021            0.0013                   0.0030
  SE             0.0010            0.0015                   0.0013
  p-Value        0.029             0.378                    0.023
Ins:Hosp HHI    −0.0024           −0.0012                  −0.0034
  SE             0.0009            0.0012                   0.0012
  p-Value        0.006             0.343                    0.005
Hospital HHI     0.0019            0.0012                   0.0015
  SE             0.0003            0.0014                   0.0004
  p-Value        0.000             0.372                    0.000
N                5270              3001                     2269
R2               0.2595            0.2487                   0.2632
Notes: The first column repeats the regression results in Table 2A’s Model 1, while the next two columns show enrollment-weighted OLS regression results for premiums from the KFF/HRET survey stratified by level of hospital market concentration above or below an HHI of 2500. All markets are geographically defined using CBSAs. The three market concentration measures are lagged by one year and scaled by 100. Standard errors are robust cluster-corrected at the market-year level.

These findings suggest that the effects of the levels of concentration in healthcare markets on premiums vary with the overall market characteristics. They provide evidence that increasing consolidation among hospital markets not only results in higher negotiated prices, but that these higher prices are ultimately passed-through to consumers in the form of higher insurance premiums. Further, these results (from Tables 3 and 4)
premiums, regardless of downstream insurer market structure. suggest that the negative relationship between premiums and
The finding that the positive and negative associations between insurer bargaining power with hospitals may be particularly pro-
Insurer:Employer and Insurer:Hospital market concentration and nounced among more highly concentrated insurer and hospital
premiums are strongest among the more concentrated insurance markets. Generally, they support the suggestion that the relative
markets may suggest that the association between higher levels balance of insurer and hospital concentration also has important
of insurance concentration on the ability to charge higher premi- implications for insurance premiums, reflecting the underlying
ums to employers and to negotiate lower prices with providers market structure that insurers must bargain with hospitals to set
may be particularly important among relatively more concentrated transaction prices, and thus the level of concentration in both
insurance markets. insurer and hospital markets and their relative bargaining leverage
Next we examined whether a similar pattern of results would jointly impact these negotiated prices. Importantly, they provide
emerge from stratifying the observations based on the level of empirical evidence that these higher provider prices are often
hospital market concentration. Here we hypothesize that the asso- ultimately passed-through to consumers in the form of higher pre-
ciation of the level of competition in the Insurer:Hospital market miums, and that this pass-through also depends on relative market
and premiums would be strongest among the more concentrated conditions. In general, we observe higher premiums among plans
hospital markets, as the effects of increased insurer bargaining sold in markets with higher levels of hospital and Insurer:Employer
leverage may be less pronounced if hospital prices are already rela- concentration and lower premiums among plans sold in markets
tively low due to hospital market competition alone. Analogous to with higher levels of Insurer:Hospital concentration, especially
the above analysis, we stratified the observations into two groups among more highly concentrated markets.
according to level of concentration in the hospital market; i.e., Hos-
pital HHI below or above 2500 using the CBSA-defined geographic 5.2. Limitations
markets.
The findings from this analysis stratified by hospital market con- As noted earlier, the cross-sectional design of this study limits
centration are presented in Table 4, where Column 1 again simply our ability to infer the causal relationship between the concen-
repeats the results from Table 2A’s Column 1, and Columns 2 and tration of insurance and hospital markets and health insurance
3 show the results among plans sold in markets with more or less premiums. It is difficult to construct instruments that would rep-
competitive hospital markets. We find that the statistical signifi- resent an exogenous source of variation in market concentration
cance of the positive and negative associations between premiums that would be unrelated to premiums, particularly instruments
and Insurer:Employer HHI and Insurer:Hospital HHI, respectively, that would be uniquely predictive of the two different insurer and
hold only among the more concentrated hospital markets. We the hospital concentration measures. One approach (for examining
also find that the statistical significance of the positive association provider prices) is to instrument for insurer concentration using the
between Hospital HHI and premiums holds only among the more underlying distribution of firms, but it is doubtful that this would
concentrated hospital markets. These findings provide support for be unrelated, independently, to employer-sponsored premiums.
our hypothesis that, while hospital prices may already be rela- Another common approach is to exploit mergers as a source of
tively lower among more competitive hospital markets, in more variation in concentration over time, but there was little consol-
concentrated hospital markets, more concentrated insurers may idation activity for insurers and hospitals during this time period
leverage their stronger bargaining power to negotiate lower prices for which we have these rich KFF/HRET data. Moreover, it is not
among concentrated hospital markets and use their market power clear that the merger itself would necessarily be exogenous. Yet
in selling insurance to employers to increase their profit margins. another commonly-used approach is to construct measures of mar-
Taken together, these results (from Tables 1 and 2) suggest that ket concentration based on predicted, rather than actual patient
the levels of competition in the health insurance and hospital mar- flows, but we are limited by data and analytical resources to com-
kets are significantly associated with employer-sponsored health plete this exercise at the national level for hospitals and unsure how
one would apply this consumer-flow approach to insurer market shares. We have tried to alleviate these endogeneity concerns by lagging the market concentration and market control variables by one year; this at least implies a temporal relationship that is consistent with the hypothesis that the level of market concentration affects health insurance premiums. Additionally, if higher premiums in fact encourage market entry by other insurers, this would result in a more competitive health insurance market, which would bias our results downward. Nonetheless, we interpret our results as associations between market concentration and premiums and not necessarily a causal relationship.

Another limitation is that our model also measures the association between aggregate market-level measures of insurer and hospital concentration and the premium of a specific insurance plan purchased by an employer in that market. The KFF/HRET data do not allow for the identification of which insurer sold the policy to the employer, so we therefore are also unable to link the specific market share of that insurer to the observation. Similarly, we do not know anything about which hospitals are included in a given plan’s network, nor about the insurer-hospital contracts. Thus, due to data limitations, we are unable to model the bargaining between insurers and hospitals and to consider constructs such as “Option Demand/Willingness to Pay” (Capps et al., 2003) measures for inclusion of certain hospital systems and the effects that this may have on premiums. Nonetheless, while the market measures may not reflect the specific insurer and associated network from which the plan is purchased, they do represent the overall market conditions within which the employer is choosing a policy. Thus, we believe that they provide important information regarding the relationship between market conditions and policies sold in those markets. Additionally, while we control for some plan generosity features such as plan type, deductible, and out of pocket maximum, premium variation may also reflect differences in quality and plan generosity that are not accounted for by these control variables (including the plan’s network) which could conceivably be correlated with the extent of market concentration.

Additionally, our market concentration measures are reliant on how we have chosen to define the markets. While we believe that CBSAs represent a reasonable geographic market for employers purchasing insurance and a reasonable geographic market for hospital care, the extent to which our measures accurately represent the true level of competition in these markets depends on the degree to which they accurately reflect the markets in which insurance is purchased and hospital network inclusion negotiations occur, respectively. However, given that CBSAs are constructed to represent person-flows for commuting to employment, we believe that they represent reasonable choices for the markets we are trying to measure. We are also reassured by the observation that our results are similar when using counties instead of CBSAs.

Further, given that these KFF/HRET data are of plans actually purchased by employers, our analysis does not explicitly model the employers’ option to not offer coverage or to self-insure as an alternative choice.¹⁵ However, given that all the plans in our model are indeed purchased, they reflect choices that employers have made dependent on the local market conditions that do in fact exist, and thus we feel that they represent an appropriate association between market conditions and premiums.

¹⁵ We have also examined firm-level decisions to self-insure using these KFF/HRET data, with a primary focus on examining the influence of state community rating rules for low-risk versus high-risk industries among firms with 25–100 workers (Trish and Herring, 2014). In those analyses, there were no statistically significant associations between insurer market concentration nor hospital market concentration (included as control variables) and a firm’s decision to self insure.

Overall, we believe that our ability to construct two distinct insurance market concentration measures using the variation in fully-insured and self-insured enrollment represents an improvement in depicting these markets and their unique association with health insurance premiums. Nonetheless, further work on these open questions is warranted.

6. Conclusion

The US health insurance industry is highly concentrated and health insurance premiums are high and rising rapidly. Our data demonstrate that less than 3% of the markets in which employers purchase fully-insured coverage are considered un-concentrated by the guidelines set forth by the DOJ/FTC. Similarly, more than half of these markets are considered highly concentrated. Provisions included in the ACA are focused on increasing competition in these markets, with the expectation that increased competition within the health insurance industry would help to lower premiums. Though focused on the individual rather than the employer-sponsored market, early evidence suggests that not all state insurance markets are, in fact, becoming more competitive (Cox et al., 2014) but that more competitive exchanges have lower premiums (Dafny et al., Forthcoming).

However, health insurers operate in a complex bilateral oligopoly, whereby they must negotiate service prices with hospitals and other providers, and higher levels of market power may in fact result in stronger bargaining leverage with these providers to drive down prices, which could then be partially passed through in the form of lower premiums. Thus, the ultimate impact of the level of competition in the health insurance industry on premiums is unclear – but it seems likely that the underlying goal of reductions in insurer administrative overhead associated with increased insurer competition can generally not be achieved without the unintended consequence of higher provider prices associated with decreased bargaining power with providers. The analyses presented herein suggest that the effects of increasing competition in health insurance markets on health insurance premiums are likely to depend on the level of competition in local hospital markets, as well as the relative competitiveness of the fully-insured and self-insured markets and insurers’ overall bargaining leverage with hospitals and other local providers.

We find that employer-sponsored insurance premiums among a nationally representative sample of firms purchasing fully-insured products are higher in markets where insurance and/or hospital markets are highly concentrated, as compared to those in which they are more competitive. Further, we find that higher levels of concentration among the market in which insurance is sold to employers are associated with higher premiums, whereas higher levels of concentration among the market in which insurers bargain with hospitals are associated with lower premiums. Importantly, we find that higher levels of concentration in hospital markets are also associated with higher premiums – providing evidence that the well-documented higher prices resulting from consolidation among hospitals do in fact affect consumers in the form of higher premiums, and that local market conditions affect the extent of this pass-through.

However, our findings, along with recent literature suggesting that hospital prices are lower among more concentrated insurance markets, suggest that higher levels of insurer bargaining leverage with hospitals may lead to lower health insurance premiums via lower negotiated hospital prices, as long as there is sufficient competition in the market for selling insurance to small employers that these lower prices get passed through to employers in the form of lower premiums. Recent policy changes,
such as the introduction of minimum medical loss ratios, may help to ensure that such savings are passed through, even in the absence of higher levels of competition in this market. Additionally, our results suggest that this important negative relationship between insurer bargaining power and premiums is particularly pronounced among more highly concentrated markets. Taken together, these findings suggest that ACA provisions to increase competition in health insurance markets may be unsuccessful by not also considering the level of concentration in local hospital markets, particularly if they dilute insurers’ overall bargaining leverage with local hospital systems. This may be particularly problematic due to recent provider consolidation trends and the strong incentives for such provider consolidation included in the ACA, such as increased horizontal and vertical integration resulting from the formation of Accountable Care Organizations. Therefore, efforts targeted toward reducing health insurance premiums may be better directed toward insurer–provider negotiations and rate regulation, or efforts to simultaneously reduce the level of concentration among insurers and providers.

Acknowledgements

This research was supported by Grant No. 69070 from the Robert Wood Johnson Foundation’s Changes in Healthcare Financing and Organization (HCFO) initiative. We would like to thank Anthony Damico, Matthew Rae, and Gary Claxton of the Kaiser Family Foundation for their assistance with the KFF/HRET Employer Health Benefits Survey data and Adele Shartzer for her assistance with the AHA data. We are grateful for helpful comments by two anonymous reviewers, Chris Garmon, Jeff Stensland, Abe Dunn, Vivian Wu, Rich Lindrooth, Lisa Dubay, and seminar participants at CBO, NCHS, Urban Institute, Emory University, Mathematica Policy Research, Analysis Group, University of Massachusetts-Amherst, RAND Corporation, University of Wisconsin, Colorado School of Public Health, Virginia Commonwealth University, an RWJF HCFO grantee briefing, AcademyHealth’s Annual Research Meeting, and the American Society of Health Economists Conference.

References

American Medical Association, 2013. Competition in Health Insurance. A Comprehensive Study of US Markets: 2013 Update.
Berenson, R., Ginsburg, P., Christianson, J., Yee, T., 2012. The growing power of some providers to win steep payment increases from insurers suggests policy remedies may be needed. Health Affairs (Millwood) 31 (5), 973–981.
Capps, C., Dranove, D., Satterthwaite, M., 2003. Competition and market power in option demand markets. RAND Journal of Economics 34 (4), 737–763.
Council for Affordable Health Insurance, 2006–2011. Health Insurance Mandates in the States 2006–11. The Council for Affordable Health Insurance.
Cox, C., Ma, R., Claxton, G., Levitt, L., 2014. Sizing Up Exchange Market Competition. Kaiser Family Foundation Issue Brief.
Cutler, D.M., Scott Morton, F., 2013. Hospitals, market share, and consolidation. Journal of the American Medical Association 310 (18), 1964–1970.
Dafny, L., 2010. Are health insurance markets competitive? American Economic Review 100 (4), 1399–1431.
Dafny, L., Dranove, D., Limbrock, F., Morton, F.S., 2011. Data impediments to empirical work on health insurance markets. The B.E. Journal of Economic Analysis & Policy 11 (2), Article 8.
Dafny, L., Duggan, M., Ramanarayanan, S., 2012. Paying a premium on your premium? Consolidation in the US Health Insurance Industry. American Economic Review 102 (2), 1161–1185.
Dafny, L., Gruber, J., Ody, C., 2015. More insurers lower premiums: evidence from initial pricing in the health insurance marketplaces. American Journal of Health Economics 1 (1), 53–81.
Dartmouth Atlas of Health Care, 2012. Data by Region, Available from: http://www.dartmouthatlas.org/data/region/
Dranove, D., Gron, A., Mazzeo, M., 2003. Differentiation and competition in HMO markets. Journal of Industrial Economics 51 (4), 433–454.
Dunn, A., Shapiro, A., 2012. Physician Market Power and Medical-Care Expenditures. BEA Working Papers 0084, Bureau of Economic Analysis.
Dunn, A., Shapiro, A., 2013. The Impact of Health Care Reform on Physician Payments: Evidence from Massachusetts. Working Paper 2013-36, Federal Reserve Bank of San Francisco.
Gaynor, M., Town, R., 2011. Competition in health care markets. In: Pauly, M., Mcguire, T., Barros, P. (Eds.), Handbook of Health Economics, vol. 2. Elsevier, Amsterdam, pp. 499–637.
Gaynor, M., Town, R., 2012, June. The Impact of Hospital Consolidation – Update. Robert Wood Johnson Foundation Synthesis Report.
Gaynor, M., Vogt, W., 2000. Antitrust and competition in health care markets. In: Culyer, A., Newhouse, J. (Eds.), Handbook of Health Economics, vol. 1B. Elsevier, Amsterdam, pp. 1405–1487.
Ginsburg, P., 2010. Wide Variation in Hospital and Physician Payment Rates Evidence of Provider Market Power. HSC Research Brief No. 16, Center for Studying Health System Change.
Government Accountability Office (GAO), 2005. Report to the Honorable Paul Ryan: House of Representatives: Federal Employees Health Benefits Program – Competition and Other Factors Linked to Wide Variation in Health Care Prices, Washington, DC.
Ho, K., 2009. Insurer–provider networks in the medical care market. American Economic Review 99 (1), 393–430.
Ho, K., Lee, R.S., 2013. Insurer Competition and Negotiated Hospital Prices. NBER Working Paper #19401, National Bureau of Economic Research.
Lewis, M.S., Pflum, K.E., 2015. Diagnosing hospital system bargaining power in managed care networks. American Economic Journal: Economic Policy 7 (1), 243–274.
Massachusetts Attorney General, 2010. Examination of Health Care Cost Trends and Cost Drivers. Report for Annual Public Hearing, Office of Attorney General Martha Coakley, Boston, MA.
McKellar, M.R., Naimer, S., Landrum, M.B., Gibson, T.B., Chandra, A., Chernew, M., 2013. Insurer market structure and variation in commercial health care spending. Health Services Research (Epub ahead of print: 5 December 2013).
Medicare Payment Advisory Commission (MedPAC), 2009. Report to the Congress: Medicare Payment Policy, Washington, DC.
Melnick, G., Shen, Y., Wu, V., 2011. The increased concentration of health plan markets can benefit consumers through lower hospital prices. Health Affairs (Millwood) 30 (9), 1728–1732.
Moriya, A., Vogt, W., Gaynor, M., 2010. Hospital prices and market structure in the hospital and insurance industries. Health Economics, Policy and Law 5, 459–479.
Robinson, J., 2004. Consolidation and the transformation of competition in health insurance. Health Affairs (Millwood) 23 (6), 11–24.
Scanlon, D.P., Swaminathan, S., Lee, W., Chernew, M., 2008. Does competition improve health care quality? Health Services Research 43 (6), 1931–1951.
Schneider, J., Li, P., Klepser, D., Peterson, N., Brown, T., Scheffler, R., 2008. The effect of physician and health plan market concentration on prices in commercial health insurance markets. International Journal of Health Care Finance and Economics 8, 13–26.
Starc, A., 2014. Insurer pricing and consumer welfare: evidence from Medigap. RAND Journal of Economics 45 (1), 198–220.
Town, R., Wholey, D., Feldman, R., Burns, L., 2006. The welfare consequences of hospital mergers. NBER Working Paper #12244, National Bureau of Economic Research.
Trish, E., Herring, B., 2014. Does Small Group Market Community Rating Affect Firm Self-Insurance? Working Paper.
US Department of Justice and the Federal Trade Commission, 2010. Horizontal Merger Guidelines, Available from: http://www.justice.gov/atr/public/guidelines/hmg-2010.pdf
Vogt, W.B., Town, R., 2006. How has Hospital Consolidation Affected the Price and Quality of Hospital Care? Research Synthesis Report No. 9, Robert Wood Johnson Foundation.
White, C., Bond, A.M., Reschovsky, J.D., 2013. High and Varying Prices for Privately Insured Patients Underscore Hospital Market Power. HSC Research Brief No. 27, Center for Studying Health System Change.
Wholey, D., Feldman, R., Christianson, J., 1995. The effect of market structure on HMO premiums. Journal of Health Economics 14, 81–105.
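
Illustrative note. The concentration categories and coefficient magnitudes discussed in Sections 5 and 6 can be made concrete with a small sketch. It is not the authors' code. It assumes the standard definition of the HHI as the sum of squared percentage market shares, the thresholds of the 2010 DOJ/FTC Horizontal Merger Guidelines cited above (below 1500 unconcentrated, above 2500 highly concentrated), and a log-linear reading of the Table 4 coefficients, which are reported per 100 HHI points. The function names and the example market shares are hypothetical.

```python
from math import exp

# Thresholds assumed from the 2010 DOJ/FTC Horizontal Merger Guidelines
# (HHI measured on the 0-10,000 scale).
UNCONCENTRATED_BELOW = 1500
HIGHLY_CONCENTRATED_ABOVE = 2500  # also the stratification cutoff in Tables 3 and 4


def hhi(shares_pct):
    """Herfindahl-Hirschman Index: sum of squared market shares (in percent)."""
    return sum(s ** 2 for s in shares_pct)


def classify(hhi_value):
    """Map an HHI value to the guideline concentration category."""
    if hhi_value < UNCONCENTRATED_BELOW:
        return "unconcentrated"
    if hhi_value <= HIGHLY_CONCENTRATED_ABOVE:
        return "moderately concentrated"
    return "highly concentrated"


def premium_change_pct(coef_per_100, hhi_change):
    """Percent change in premium implied by a ln(Premium) coefficient
    that is estimated per 100 HHI points."""
    return (exp(coef_per_100 * hhi_change / 100.0) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical market with three firms holding 50%, 30% and 20%.
    market = [50, 30, 20]
    value = hhi(market)
    print(value, classify(value))
    # Full-sample Hospital HHI coefficient from Table 4, Column 1.
    print(round(premium_change_pct(0.0019, 100), 3))
```

Under this reading, the full-sample Hospital HHI coefficient of 0.0019 corresponds to premiums that are roughly 0.19% higher for each 100-point increase in hospital HHI; as the Limitations section stresses, this is an association and not necessarily a causal effect.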

Does women’s education affect breast cancer risk and survival?
Evidence from a population based social experiment in education

Mårten Palme a,∗, Emilia Simeonova b
a Department of Economics, Stockholm University, SE-106 91 Stockholm, Sweden
b Johns Hopkins University and NBER, 100 International Drive, Baltimore, MD 21202, United States
∗ Corresponding author. Tel.: +46 8 16 33 07; fax: +46 8 15 94 82.
E-mail addresses: Marten.Palme@ne.su.se (M. Palme), Emilia.Simeonova@gmail.com (E. Simeonova).

Article history:
Received 12 July 2013
Received in revised form 30 October 2014
Accepted 5 November 2014
Available online 22 November 2014

JEL classification:
I12
I18

Keywords:
Education gradient in health
Schooling reform
Breast cancer

Abstract

Breast cancer is a notable exception to the well documented positive education gradient in health. A number of studies have found that highly educated women are more likely to be diagnosed with the disease. Breast cancer is therefore often labeled as a “welfare disease”. However, it has not been established whether the strong positive correlation holds up when education is exogenously determined. We estimate the causal effect of education on the probability of being diagnosed with breast cancer by exploiting an education reform that extended compulsory schooling and was implemented as a social experiment. We find that the incidence of breast cancer increased for those exposed to the reform.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Worldwide, breast cancer is the most common cancer and the leading cause of cancer death among women. In the year 2008 alone, 2.6 women were diagnosed with breast cancer every minute across the globe. More than 52 women died of breast cancer every hour in the same year (Ferlay et al., 2010). These aggregate numbers mask large differences in trends in breast cancer incidence and mortality across the developed and developing economies. Historically, western societies have experienced a heavier burden of the disease; however, in the last couple of decades the incidence of and mortality from breast cancer have been on the rise in developing countries (Althius et al., 2005). While it is plausible that this rise is due to increased screening and better medical and vital records keeping, some have argued that more affluent societies and westernization also contribute to these recent trends (ibid).

Breast cancer in women is one of the rare health conditions that exhibit a positive incidence gradient with socio-economic status (SES), and in particular with attained education (see e.g. Hemminki and Li, 2003, 2004; Lund and Jacobsen, 1991; Hussain et al., 2008). This is in stark contrast with the well-documented negative association between education and all-cause mortality and with the positive effects of education on health-promoting behaviors (see Cutler and Lleras-Muney, 2006, 2011 for reviews of the literature). Frequently diagnosed cancers of the female reproductive organs, such as cervical cancer, show the opposite, negative association with education in correlational studies (Baquet et al., 1990). Part of the observed positive correlation between education and breast cancer could be due to more frequent screening and more adequate response to risk factors among the better educated (Lange, 2011). Still, environmental and social factors could also affect breast cancer risk and survival. A recent report by the Interagency Breast Cancer and Environmental Research Coordinating Committee (IBCERCC) in the US forcefully argues that research on the causes of increased breast cancer risk and consequently on increased prevention is of first order importance in designing public health strategies to contain the disease.¹

¹ “Prioritizing Prevention” Summary of Recommendations of the Interagency Breast Cancer and Environmental Research Coordinating Committee (http://www.niehs.nih.gov/about/assets/docs/ibcercc full.pdf).

A key question on the etiological background to the link between education and the incidence of breast cancer is whether the relation is made up by life style factors, such as delayed childbearing,
116 M. Palme, E. Simeonova / Journal of Health Economics 42 (2015) 115–124
that may be acquired along with prolonged education, or if it can be attributed to factors and individual characteristics correlated with both educational attainments and the probability to get breast cancer. The most common research strategy used in epidemiological studies is to add confounders that are known to be associated with educational attainments and potentially etiologically related to breast cancer, such as delayed childbearing in a regression framework, and to investigate if the correlation remains (see e.g. Braaten et al., 2004; Danø et al., 2004; Heck and Pamuk, 1997). There are at least two problems with this strategy. First, there is an identification problem. Most confounders, such as fertility behavior, are likely to be endogenous to educational attainment. This means that it is still not clear if including them in the regression makes up for a causal relation with education, or if they just proxy individual characteristics correlated with educational attainments. Second, adding independent variables in a regression would in most cases aggravate the downward bias from measurement errors (see e.g. Greene, 2003).

An alternative strategy to analyze this research question is to use exogenous variations in educational attainments created by natural experiments. A number of influential studies have used this research strategy to study the relationship between education and measures of general health. Lleras-Muney (2005), Oreopoulos (2006) and Clark and Royer (2012) use variation induced by changes in compulsory schooling legislations in the US and the UK as a source of exogenous variation in education. Spasojevic (2010), Meghir et al. (2012) as well as Lager and Torssander (2012) investigate the health consequences of the introduction of comprehensive school reform in Sweden. An interesting related question is whether the health effects of education vary by gender² and diagnosis.

In this paper we investigate whether there is a causal effect of education on the incidence and mortality from breast cancer in the population of women born in Sweden between 1940 and 1957 who survived until at least 1985. We make use of a compulsory schooling reform that increased the number of compulsory years of education from 7 or 8 depending on municipality to 9 years nationwide. We also compile a unique nationally representative dataset from various Swedish national data registries, including the Swedish Cancer Registry.

The Swedish setting is particularly well suited to study how education affects the incidence of a “welfare disease” such as breast cancer in women for several reasons. First, Sweden is ethnically and racially homogenous, especially in the cohorts under study. This reduces potential omitted confounders that could correlate both with the hereditary genetic make-up and the SES of some ethnic or racial subgroups. Second, health care is free at the point of access and the Swedish government provides free universal health insurance. Disparities arising from differential access to care due to financial constraints are unlikely to play a role in the Swedish setting. Breast cancer screening covers the entire female population in the critical ages and is free of charge. The screening program was adopted nation-wide in 1986 after the first results from the Swedish mammography trials became available (Tabar et al., 1985). The take up rate of this screening program after the first invitation to screen is about 80 percent (see e.g. Hussain et al., 2008). Third, the Swedish Cancer Registry is the oldest cancer registry and

The closest study we are aware of is by Glied and Lleras-Muney (2008) who use the Surveillance, Epidemiology and End Results Program (SEER) data to estimate the effects of technological progress on cancer deaths by education, relying on US compulsory schooling laws for exogenous variation in educational attainment. They find that conditional on technological progress, extra education reduces overall cancer mortality in men, but not in women. Excluding cancers of the reproductive system, inclusive of breast cancer, makes the estimated effects for men and women consistent. The authors do not specifically test for the effects of education on survival from reproductive system cancers in women, relying on the findings in the medical literature we discuss above.

This study finds that attaining higher levels of education increases the risk of being diagnosed with breast cancer in women, confirming the results obtained from purely correlational studies. However, we also find that this heightened probability of diagnosis is later followed by an elevated probability of death from breast cancer among better educated women. Further, we investigate the potential role of fertility decisions, which has been pointed out as the mechanism linking education and the incidence of breast cancer. We find no convincing evidence in favor of this hypothesis. The curious association between education and the most common cancer diagnosis in women appears to be affected by qualities, behaviors, and risk factors acquired in the process of obtaining more education, rather than pre-existing characteristics that predispose some women to both get more education and be diagnosed with the disease.

2. The comprehensive school reform

2.1. The Swedish school system before and after the reform

Sweden implemented a compulsory schooling reform as a social experiment between 1949 and 1962. Prior to the implementation of the reform, pupils attended a common basic compulsory school (folkskolan) until grade six. After the sixth grade pupils were selected to continue either for one or, in mainly urban areas, two years in the basic compulsory school, or to attend the three year junior secondary school (realskolan). The selection of pupils into the two different school tracks was based on their past academic performance, measured by grades. The pre-reform compulsory school was in most cases administered at the municipality level. The junior secondary school was a prerequisite for the subsequent upper secondary school, which was itself required for higher education.

In 1948 a parliamentary committee proposed a school reform that implemented a new nine-year compulsory comprehensive school.³ The reform had three main elements:

1. An extension of the number of years of compulsory schooling to 9 years in the entire country.
2. Abolition of early selection and tracking based on academic performance. Although pupils in the comprehensive schools were able to choose between three tracks after the sixth grade – one track including vocational training, a general track, and an academic level preparing for later upper secondary school – they were kept in common schools and classes until the ninth grade.
one of the best in terms of data quality and accuracy in the world 3. Introduction of a national curriculum. The new curriculum
today. replaced the pre-existing curriculum which varied between
municipalities.
2 3
Clark and Royer (2012) as well as Meghir et al. (2012) investigate for differential We offer a brief description of the main parts of the Swedish comprehensive
effects of education by gender and find inconclusive evidence. Gathman et al. (2012) school reform. The school reform and its development are described in Meghir
analyze a number of compulsory schooling reforms in Europe and find diverging and Palme (2003, 2005), and Holmlund (2007). For more detailed reference on the
effects of education on mortality by gender. reform, see Marklund (1981).
2.2. The social experiment

The social experiment with the new comprehensive nine-year compulsory school started during an assessment period between 1949 and 1962, when the final curriculum was decided.⁴ The proposed new comprehensive school system, as described above, was introduced in municipalities or parts of city communities, which in 1952 numbered 1055. The cohorts included in our empirical analysis, born between 1940 and 1957, cover the entire period of implementation of the comprehensive school. In 1962 it was decided that the new comprehensive school would become the standard education in Sweden. The last class that graduated from the old schooling system did so in 1970.
The selection of municipalities into the new comprehensive school was not based on random assignment. Still, the decision to select the areas was based on an attempt to choose locations that were representative of the entire country, both in terms of demographics and geographically. At first the National Board of Education contacted the municipalities, or sometimes they themselves applied to participate. From this pool of applicants a “representative” sample of municipalities was chosen. Municipalities could elect to implement the comprehensive school starting with first or fifth grade cohorts. Once the grade of implementation was fixed, all individuals from the cohort immediately affected and all subsequent cohorts went to comprehensive school. The older cohorts continued in the pre-reform school.
Meghir and Palme (2005) and Holmlund (2007) study the effect of the comprehensive school reform on educational attainment.⁵ The Meghir and Palme (2005) estimates for their entire sample are 0.252 additional years for males and 0.339 years for females; for low SES persons the estimates are 0.3 extra years for males and 0.512 for females.⁶ Holmlund (2007) reports estimates in the range of 0.21–0.61 additional years of schooling for men and 0.13–0.44 for women.

3. Data

This is a population-level study. We match data from the Swedish Cancer Register to Swedish population register data, the 1990 Swedish Education register, and the Cause of Death register. The population register contains information on the parish of birth for all individuals born in Sweden in the 1940–1957 cohorts. We use this register to assign municipality of birth for all women in the cohorts affected by the schooling reform. The municipality of birth is then used to assign the year in which the reform was implemented in that locality, and the reform treatment status to different cohorts of women who were born in the municipality. Note this means that all estimated effects are of the “intention to treat” type, but we avoid potential bias coming from selective migration. Holmlund (2007) offers a detailed exposition of the exact matching algorithm used.⁷
The Cause of Death register contains information on the date of death and the principal cause of death. The Census data provide information on the date of birth and the number of children born to the women in the 1940–1957 cohorts. We use this information to assign age at first childbearing and the total completed fertility per woman. The multi-generational register is used to link women in the sample to their parents. We then use the Education Register for the parents to determine the level of education of each woman’s father. Fathers who had more than the basic required (7 years) education are considered highly educated.
All women who died of breast cancer as a primary cause of death were found in the Cancer Register as having been previously diagnosed with the disease. We record all diagnoses and deaths until 2006. The Swedish Cancer Register is the oldest Cancer Register in the world and contains detailed information on all cancer diagnoses in Sweden. It is compiled from compulsory cancer diagnosis registrations by physicians, cytologists and pathologists and covers close to 100% of all cancer diagnoses in Sweden (Swedish National Board of Health and Welfare, 2006). Studies of the accuracy of the Cancer Register have shown that cases of breast cancer are the most reliably reported cancer diagnosis in the Register, with under-reporting rates of less than 1.1% of all cases diagnosed within the reporting year (Barlow et al., 2009). Importantly, the exact date of every diagnosis is recorded, and the data can be linked to the population registers through a unique person ID.
In the empirical analysis we use the population of all women born in Sweden between 1940 and 1957 and surviving until at least 1985. We exclude 414,214 women with missing parental education background and use the remaining sample of 562,814 women. Of those, 19,736 women were diagnosed with breast cancer after 1984. Of those who were diagnosed, 2370 women died, and breast cancer was noted as the cause of death on their death certificate. Another 401 of the women diagnosed with breast cancer after 1984 died from a different main diagnosis.⁸
Table 1 summarizes the main explanatory and control variables used in the analysis. The mortality data start in 1985 and include the exact date of death and the main cause of death as recorded in the death certificate. We restrict the time of first diagnosis to be after 1984 in order to avoid selection of women who were diagnosed previously and survived until the period after 1984. The women in our sample were aged between 28 and 45 in 1985 and (those surviving) between 49 and 66 in 2006. As a percent of total female mortality, breast cancer mortality peaks between ages 40 and 60 at around 15% of total deaths in the age group. This implies that we are capturing the interval in women’s lives during which they are most likely to be affected by breast cancer (as opposed to another lethal disease). Aggregate mortality in Sweden is very low at ages below 45 at 6 per 1000 (from data), and breast cancer mortality is even lower at 1 per 1000 (from data). A back-of-the-envelope calculation suggests that we are potentially missing at most 100 deaths from breast cancer that may have occurred in our study population before 1985.⁹ This is a very small part of the total number of breast cancer deaths in the sample – less than 5%.
Several differences in the raw means between women of high and low SES family backgrounds are worth discussing. Unsurprisingly, on average women of higher SES obtained more years of education. They are less likely to have had any children and the average age at first childbearing in this group is about two years higher. High SES women are also more likely to have been

⁴ The official evaluation was mainly of administrative nature. Details on this evaluation are also described in Marklund (1981).
⁵ Holmlund (2007) does not have individual treatment status and imputes it from municipality of residence in 1960.
⁶ Note that Meghir and Palme (2005) use the exact reform assignment from the school registries for a random subset of the cohorts born in 1948 and 1953. Their estimates are free of measurement error in the reform assignment variable.
⁷ We are grateful to Helena Holmlund for sharing her algorithm with us.
⁸ The distribution of the main causes of death among those women is: 61 women died from ovarian cancer; 45 from lung cancer; 22 from AMIs; 15 from pancreatic cancer; 12 died from colon cancer; 9 from melanoma; 15 from unknown causes. The remaining 222 deaths are distributed across more than one hundred different causes.
⁹ Assuming mortality at missing ages is equivalent to mortality at those ages among observed cohorts; assuming also that cohort sizes at different ages are similar over time, which gives an upper bound estimate since demographic trends led to steady cohort size increase between 1940 and 1957.
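The counts reported in the data section can be cross-checked with simple arithmetic. The sketch below is purely illustrative (the variable names are ours, not the authors'); it uses only figures stated in the text and in footnote 8 to confirm that the listed causes of death add up to the 401 deaths from other main diagnoses, and that an upper bound of 100 missed deaths is less than 5% of the 2370 recorded breast cancer deaths.

```python
# Figures are copied from the text and footnote 8; names are illustrative.
other_causes = {
    "ovarian cancer": 61,
    "lung cancer": 45,
    "AMI": 22,
    "pancreatic cancer": 15,
    "colon cancer": 12,
    "melanoma": 9,
    "unknown": 15,
}
remaining_other_deaths = 222   # spread over more than one hundred causes

# Deaths among diagnosed women with a main cause other than breast cancer.
total_other = sum(other_causes.values()) + remaining_other_deaths

diagnosed_after_1984 = 19_736
breast_cancer_deaths = 2_370
missed_deaths_upper_bound = 100  # back-of-the-envelope bound from the text

share_missed = missed_deaths_upper_bound / breast_cancer_deaths
share_died_of_bc = breast_cancer_deaths / diagnosed_after_1984

print(f"other-cause deaths: {total_other}")
print(f"missed deaths as share of recorded deaths: {share_missed:.1%}")
print(f"share of diagnosed women who died of breast cancer: {share_died_of_bc:.1%}")
```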
Table 1
Main explanatory and outcome variables of interest. Standard deviations are reported in square brackets under the mean. P-values of tests of differences in means between high and low SES background women are also reported.

Father’s education              Low FE             High FE            P-value diff
Variable                        Obs      Mean      Obs      Mean
Years of education              360,240  11.155    180,612  12.823    0
                                         [2.805]            [3.063]
No children (nulliparous)       372,894  0.11      187,702  0.134     0
                                         [0.315]            [0.34]
Age at first childbearing       331,164  26.5      162,587  28.1      0
                                         [5.4]              [5.5]
Age at diagnosis                12,723   50.196    6654     49.513    0.088
                                         [6.581]            [6.511]
Death year – year of diagnosis  1538     4.83      724      5.3       0
                                         [4.55]             [4.63]
Diagnosed with breast cancer    372,894  0.035     187,702  0.036     0.012
                                         [0.183]            [0.186]
Died from breast cancer         372,894  0.004     187,702  0.004     0.16
                                         [0.065]            [0.063]
diagnosed with breast cancer, to have received the diagnosis at an earlier age and, conditional on dying from breast cancer, to have lived longer between their initial diagnosis and the time of death. There is no significant difference in the probability of death from breast cancer by SES background. These facts suggest that either (1) higher SES women are more likely to have been diagnosed earlier or that (2) higher SES women received better treatment, or both. The differences in age at diagnosis appear in favor of the first hypothesis, but we cannot draw any firm conclusions based on this evidence.
Here it is important to consider the role of breast cancer screening in early diagnosis and treatment. The large clinical trials that produced evidence on the beneficial effects of mammography were done in Sweden in the 1970s and 1980s (see Tabar et al., 1985). Thus, policy makers in Sweden were quite aware of the importance of breast cancer screening at the time our study period begins. After the first results of the randomized trials came out, the National Board of Health and Welfare issued guidelines in 1986 recommending that the county councils invite women ages 40–54 years to screening every 18 months and women ages 55–74 years every second year. Thus, national service screening with mammography was initiated in 1986. Local health administrations are in charge of running the screening programs. All women of eligible ages receive a letter giving a specific date and time for a mammography examination. Failure to attend the scheduled examination or to re-schedule the appointment results in a further invitation, up to six consecutive invitations. A regional case study from Uppsala reports that of the 46,041 eligible women only 5.6% never attended after six attempted appointments. Non-attenders tend to be older (over the age of 60), foreign born and single. Note that all foreign-born women residing in Sweden are excluded from our sample by construction. Interestingly, the relationship between education and the probability of non-attendance is u-shaped, with women finishing high school, some college, and college more likely to attend than those with professional education or high school drop-outs (Lagerlund et al., 2002).
Breast cancer is a common killer in our sample. In this relatively young population, 15% of all deaths are due to breast cancer. Cardio-vascular diseases account for an extra 13.5% of total mortality. Deaths from other cancers are responsible for another 33%. In total, cardio-vascular and cancer-related mortality account for close to two-thirds of all female deaths in the sample cohorts.

Table 2
Correlations between years of educational attainment and diagnosis/death from breast cancer in women.

Panel A: Diagnosis                 (1)          (2)          (3)
Years of schooling                 0.96*                     0.89*
  coef * 1000                      (0.09)                    (0.08)
High SES                                        2.99*        1.92*
  coef * 1000                                   (0.54)       (0.56)
Observations                       541,135      560,596      541,135
Mean incidence per 1000            34.1         34.1         34.1
R-squared                          0.006        0.007        0.010
Empirical model                    Linear Prob  Linear Prob  Linear Prob

Panel B: Death from breast cancer  (1)          (2)          (3)
Years of schooling                 −0.01                     −0.02
  coef * 1000                      (0.01)                    (0.01)
High SES                                        −0.09        0.12
                                                (0.18)       (0.09)
Mean deaths per 1000               4.1          4.1          4.1
Observations                       541,135      560,596      541,135
Empirical model                    Linear Prob  Linear Prob  Linear Prob

Note: Robust standard errors in parentheses; *Significant at 1%; SE clustered on the municipality of birth level.

4. The relation between educational attainment and breast cancer incidence and mortality

We start the analysis by documenting correlations between the years of attained education and socio-economic background and the probability of diagnosis and death from breast cancer in Sweden. Table 2 presents the estimates. We use all available observations to maximize power. Coefficients and standard errors are multiplied by 1000 for better presentation. Women with an extra year of education are 3 percent (evaluated at the mean incidence of breast cancer diagnosis in the population) more likely to have been diagnosed with breast cancer than their less educated peers. The correlation with high socio-economic family background is larger – even if we control for years of attained education, women who were born in better-off families are 7 percent more likely to receive a breast cancer diagnosis.
The effect of education on deaths from breast cancer is not as clear. None of the education coefficients attain statistical significance at the 10% level even though the coefficient on years of education implies a negative correlation both with and without
controls for parental SES background. Coupled with the evidence on the higher incidence of diagnoses among the more educated women, this suggests that conditional on being diagnosed with breast cancer, more educated women are more likely to survive. This is consistent both with evidence that educated people are more adept at using new medical technologies (Glied and Lleras-Muney, 2008; Lichtenberg and Lleras-Muney, 2005) and with earlier diagnosis and earlier treatment in higher SES background women. The table of means shows supportive evidence for the latter explanation. Even though the mortality point estimate suggests that education has a positive effect on survival, the precision of the estimates is not high enough to draw any strong conclusions.
The corresponding Cox proportional hazard estimates are: in the full sample one year of schooling reduces the probability of death from breast cancer by a statistically insignificant 1.6% relative to the mean (SE 0.0197), which is a larger estimate than the LP estimate evaluated at the mean (0.03%); the Cox estimate of high SES in model 2 is a statistically insignificant decrease of 2.4% relative to the mean, not too far from the LP estimate of 2.2% relative to the mean.

5. Empirical specification

We use two main types of outcomes in the empirical analysis. When we consider breast cancer mortality, we use the binary mortality outcome and the time to death as the outcome variables. When we study the incidence of breast cancer, we use a binary outcome variable equal to one if the woman was ever diagnosed with breast cancer after 1984 and zero otherwise. We use the same identification strategy for the effect of the reform for both types of outcomes. If the reform had been randomly assigned across Sweden’s 1000 or so municipalities we could have simply compared the outcomes in the treated and non-treated municipalities conditional on year of birth. However, as has been discussed in previous studies (see e.g. Meghir et al., 2012), this was not the case. Therefore, we control for both birth cohort and municipality of birth. We start with the following latent variable specification:

$$y^{*}_{i,m,t} = \alpha + \beta_1 R_{i,m,t} + \gamma_1 T_i + \gamma_2 M_i + \varepsilon_{i,m,t}, \tag{1}$$

where i, m and t are sub-indices for individual, municipality and birth cohort, respectively; y* is a latent variable for health status; R is the reform assignment indicator; T is a vector of dummy variables for year of birth; M is a corresponding vector of dummy variables for municipality of birth; finally, ε is an individual random disturbance.
The key identifying assumption is that the distribution f(·) of ε does not depend on the assignment to reform treatment, conditional on cohort and municipality. In practice we impose the stronger assumption that the distribution of ε is independent of all right hand side variables. It is important to note that the reform assignment in this analysis depends on the municipality of birth, rather than the municipality of schooling. On the one hand, this means that the estimates are of the “intention-to-treat” type. On the other hand we avoid selection issues coming from differential (and potentially endogenous) mobility.¹⁰
For the binary outcome breast cancer diagnosis, we use linear probability models. The reason for using a linear probability model, rather than e.g. logit and probit, which restrict the probabilities to the [0, 1] interval and relax the linearity assumption, is computational convenience, since all models include about 1000 municipality indicator variables in addition to the 17 birth cohort dummies. For relatively small treatment effects, when both approaches have been used in a similar context, the results are almost identical.¹¹
We also use linear probability models as one of two methods of estimating the probability of death from breast cancer. The linear probability model is handy because it can efficiently estimate a large number of dummy coefficients in specifications where we also include municipality-specific time trends. We complement the linear probability estimates with estimates from Cox semi-parametric proportional hazard models. For the time to death from breast cancer outcomes we use Cox proportional hazard models of this type:

$$I_{1,i,m,t}(r \mid R_{i,m,t}, T_i, M_i) = I_0(r)\exp\{\alpha + \beta_1 R_{i,m,t} + \gamma_1 T_i + \gamma_2 M_i\}, \tag{2}$$

where r is exposure time and I_0(r) is the baseline hazard. This model is semi-parametric in the sense that no functional form assumption is imposed on the baseline hazard. Importantly, when we consider the hazard of death from breast cancer, we consider only deaths from breast cancer as the terminal event. Thus, all women who died from causes other than breast cancer are considered still living at the end of the observation window. Prior research has found that the compulsory schooling reform did not significantly affect life expectancy for (high and low SES) Swedish women (Meghir et al., 2012). Moreover, the age at first diagnosis in this sample is fairly young. These two facts suggest that a competing risks phenomenon is an unlikely explanation for our estimates. Nevertheless, because Honoré and Lleras-Muney (2006) show that decreasing cardio-vascular disease mortality in the US contributed to a steady (non-declining) cancer mortality rate between the 1970s and 2000s, we construct Peterson bounds on our estimates taking into account the association between cardio-vascular and cancer mortality risks. The assumption here would be that obtaining extra education, while reducing the likelihood of cardio-vascular mortality, indirectly increases the likelihood of breast cancer mortality. To control for unobserved differential trends that might affect municipalities differently depending on the timing of the education reform, we include linear trends by year of reform implementation. All municipalities that implemented the reform in the same year are assigned the same linear trend. The empirical results section reports the results from these preferred specifications.
Table 3 demonstrates the effects of being exposed to the schooling reform on the number of years of attained education for Swedish women of affected cohorts. We first show the effects on the entire sample and then split the sample according to the education level of the father. We expect that the education reform affected children from low SES families more, as they were more likely to drop out of school earlier. The results confirm that women from low SES backgrounds increased their education by more than those from high SES backgrounds. The reform resulted in an average increase of 1.8 months of schooling for girls coming from relatively disadvantaged backgrounds. The corresponding estimate for the high SES group is about one third of the size and does not attain statistical significance at the 10% level. We present estimates from models including linear time trends grouped by year of implementation and from specifications including municipality-specific linear time trends. The estimated coefficients are very similar, which reassures us that unobserved municipality-specific changes coincidental with reform implementation are unlikely to bias our results.

¹⁰ Meghir and Palme (2005) show, however, that 90.1 percent have the same reform assignment based on predictions from their municipality of birth as their municipality of schooling; 5.3 percent moved from reform to non-reform municipalities; 4.6 percent moved in the other direction.
¹¹ See for example Meghir et al. (2011).
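To make the logic of specification (1) concrete, the following minimal sketch estimates a reform effect with municipality and cohort fixed effects. It is not the authors' code and makes strong simplifying assumptions: the panel is hypothetical and noise-free, the units are unweighted municipality-by-cohort cells rather than individual women, the reform dates are invented, and the linear trends and clustered standard errors used in the paper are omitted. On a balanced panel, least squares with both sets of dummies is equivalent to regressing the two-way demeaned outcome on the two-way demeaned reform indicator, which is what the code does.

```python
# Illustrative only: hypothetical, noise-free municipality-by-cohort panel.
# On a balanced panel, OLS with municipality and cohort dummies equals a
# regression of the two-way demeaned outcome on the two-way demeaned reform dummy.

def two_way_demean(x):
    """Remove row (municipality) and column (cohort) means; add back the grand mean."""
    n_rows, n_cols = len(x), len(x[0])
    row_means = [sum(row) / n_cols for row in x]
    col_means = [sum(x[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [[x[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n_cols)] for i in range(n_rows)]


def reform_effect(y, r):
    """Slope from regressing demeaned y on demeaned r (no intercept needed)."""
    y_dm, r_dm = two_way_demean(y), two_way_demean(r)
    num = sum(a * b for y_row, r_row in zip(y_dm, r_dm) for a, b in zip(y_row, r_row))
    den = sum(b * b for r_row in r_dm for b in r_row)
    return num / den


cohorts = list(range(1940, 1958))          # birth cohorts used in the paper
first_treated = [1945, 1948, 1951, 1954]   # hypothetical first reform cohort per municipality
true_beta = 0.0015                         # hypothetical effect on the diagnosis probability

# Reform dummy R: 1 for cohorts born in or after the municipality's first treated cohort.
r = [[1.0 if c >= start else 0.0 for c in cohorts] for start in first_treated]
# Outcome: municipality effect + cohort effect + reform effect (no noise).
y = [[0.030 + 0.001 * m + 0.0002 * (c - 1940) + true_beta * r[m][j]
      for j, c in enumerate(cohorts)] for m in range(len(first_treated))]

beta_hat = reform_effect(y, r)
print(f"estimated reform effect: {beta_hat:.4f}")
```

Because the simulated outcome contains no noise, the estimator recovers the assumed effect up to floating-point error; identification comes entirely from municipalities adopting the reform for different cohorts.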
Table 3
The effect of education reform on women’s educational attainment in years of education.

Father’s education                All                 Low FE              High FE
                                  (1)       (2)       (3)       (4)       (5)       (6)
Reform                            0.119*    0.106*    0.149*    0.14*     0.055     0.043
                                  (0.029)   (0.023)   (0.022)   (0.021)   (0.035)   (0.033)
Mean years of education           11.7      11.7      11.1      11.1      12.8      12.8
Linear trends by year of
  reform implementation           Yes                 Yes                 Yes
Municipality trends                         Yes                 Yes                 Yes
Observations                      540,852   540,852   360,240   360,240   180,612   180,612
R-squared                         0.041     0.045     0.041     0.045     0.024     0.031

Note: Robust standard errors in parentheses; SE clustered on the municipality of birth level. + Significant at 10%; **significant at 5%; *significant at 1%.
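The schooling estimates in Table 3 are expressed in years. As a quick check on the magnitudes quoted in the text, the sketch below (illustrative names, coefficients copied from columns (1), (3) and (5) of Table 3) converts them to months and compares the high and low father's education groups.

```python
# Reform coefficients in years of education, copied from Table 3.
reform_years = {"all": 0.119, "low_fe": 0.149, "high_fe": 0.055}

# Convert each estimate from years to months of schooling.
reform_months = {group: years * 12 for group, years in reform_years.items()}

# Size of the high father's education estimate relative to the low one.
high_to_low_ratio = reform_years["high_fe"] / reform_years["low_fe"]

for group, months in reform_months.items():
    print(f"{group}: {months:.1f} months")
print(f"high FE estimate relative to low FE: {high_to_low_ratio:.2f}")
```

The low father's education estimate corresponds to roughly 1.8 months, and the high father's education estimate is a little over one third of it, in line with the description in the text.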
Table 4
Educational reform and the risk of diagnosis and death from breast cancer.

Father’s education                All                 Low                 High
                                  (1)       (2)       (3)       (4)       (5)       (6)
Diagnosis
Reform coef * 1000                1.5**     1.71**    1.6       1.92+     1         1.1
                                  (0.78)    (0.775)   (1.1)     (1.025)   (1.5)     (1.47)
Mean dep var * 1000               34.1      34.1      33.7      33.7      35        35
Municipality trend                          Yes                 Yes                 Yes
Observations                      560,596   560,596   372,894   372,894   187,702   187,702
R-squared                         0.006     0.0075    0.006     0.009     0.010     0.015

Death from breast cancer
Reform coef * 1000                0.63**    0.66**    0.68+     0.74+     0.65      0.6
                                  (0.3)     (0.33)    (0.4)     (0.44)    (0.9)     (0.87)
Mean dep var * 1000               4.1       4.1       4.2       4.2       3.9       3.9
Municipality trend                          Yes                 Yes                 Yes
Observations                      560,596   560,596   372,894   372,894   187,702   187,702
R-squared                         0.002     0.004     0.003     0.0061    0.006     0.010

Note: Municipality fixed effects included in all specifications; birth cohort dummies included in all specifications; robust standard errors in parentheses; standard errors clustered on the municipality of birth level; + significant at 10%; **significant at 5%; *significant at 1%.
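Because the coefficients in Table 4 are scaled by 1000, it can help to express them relative to the mean of the dependent variable. The sketch below does this for three of the reported estimates, using illustrative names and only numbers copied from Tables 2, 3 and 4. It also shows the rescaling by the schooling effect from Table 3 that footnote 12 describes; the authors caution that treating the reform as an instrument for years of education is most likely flawed, so that last number is shown only to illustrate the arithmetic, not as an estimate to rely on.

```python
# Coefficients and means are copied from Tables 2, 3 and 4; names are illustrative.

def relative_effect(coef_per_1000, mean_per_1000):
    """Effect expressed as a share of the mean of the dependent variable."""
    return coef_per_1000 / mean_per_1000


diag_all = relative_effect(1.5, 34.1)    # diagnosis, full sample
death_all = relative_effect(0.63, 4.1)   # death, full sample
death_low = relative_effect(0.68, 4.2)   # death, low father's education

# Rescaling described in footnote 12: reform effect on diagnosis divided by the
# reform effect on years of education (Table 3, full sample).
first_stage_years = 0.119
implied_per_year = 1.5 / first_stage_years   # per 1000 women, per extra year
correlation_per_year = 0.96                  # Table 2, Panel A

for label, value in [("diagnosis, all", diag_all),
                     ("death, all", death_all),
                     ("death, low FE", death_low)]:
    print(f"{label}: {100 * value:.1f}% of the mean")
print(f"implied effect per year of schooling: {implied_per_year:.1f} per 1000")
```

The two mortality figures round to about 15% and 16% of the mean, and the rescaled per-year figure is far larger than the 0.96 per 1000 correlation in Table 2, which is the comparison footnote 12 draws.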
6. Results

6.1. Main findings

We next turn to the effects of the compulsory education reform on breast cancer incidence and death. Since we know from previous research (see e.g. Meghir and Palme, 2005) that the reform had very different effects on later-life economic wellbeing depending on parental SES, we run separate regressions by women’s family SES background. It is important to note that since the reform was not limited to simply increasing the number of compulsory years of education, but had additional elements, the results that follow are not directly comparable with the education correlations presented in Table 2. The results on how reform treatment affected the probability of diagnosed breast cancer are shown in the top panel of Table 4 and on mortality from the disease in the bottom panel. We present estimates with year of implementation specific linear trends for easy comparison with the semiparametric Cox estimates, as well as results from linear probability models including municipality-specific linear trends. We multiply all coefficients and standard errors by 1000 for ease of presentation.
There is a significantly positive effect of reform assignment on the probability of being diagnosed with breast cancer in the full sample. Although the estimate in the low SES subsample is somewhat less precise, it is obvious that the effect is attributable to the group originating from low SES families, who experienced the largest effect of the education reform. The point estimate of the magnitude of the effect suggests an elevated risk of 1.5 diagnoses per 1000 women (0.15 percentage points), which is somewhat more than the correlation estimate corresponding to one extra year of education, although the precision is not sufficient to draw any definite conclusion.¹²
The linear probability estimation results in Table 4 also show that the reform causes a significantly increased risk of mortality with breast cancer as a primary cause of death.¹³ That is, the expected improvement in the response to cancer treatment from more education was not sufficient to offset the increased risk of being diagnosed with breast cancer.

¹² If we consider only the reform’s effect on schooling attainment, we would multiply the reform estimates from Table 4 by 1/(estimated change in years of education from Table 3), resulting in much larger estimates of the effect of an additional year of education than what is obtained in the correlations reported in Table 2. We emphasize, however, that using the reform as an IV for years of attained education is most likely flawed, as the reform contained additional elements that challenge the exclusion restrictions.
¹³ In the analyses we exclude all women who have received a diagnosis of breast cancer pre-1985 to avoid selection bias. This is because our mortality data start in 1985 and survival following a breast cancer diagnosis could be related to the reform.

In addition to the linear probability models we obtained Cox PH mortality estimates stratified at the municipality level. The Cox semi-parametric model imposes fewer restrictions on the estimates; however, it suffers from severe incidental parameters problems with a large number of dummy variables, such as would be included in a specification including municipality-specific linear trends. That is why we ran the Cox estimations with linear trends by year of implementation only, so these estimates are comparable to the linear probability coefficients reported in columns (1), (3), and (5). The hazard is stratified by municipality of birth, allowing for potentially different underlying breast cancer mortality hazards by municipality. The Cox estimates are as follows: full
sample coefficient 0.18+ (SE 0.1), which is very similar to the LP sample. The reference cohort is the one born 2 years before the first
estimate evaluated at the mean (15% increase); low father’s edu- treated cohort. The regressions control for municipality and year of
cation sample estimate 0.2 (SE 0.15) – again very similar to the LP birth fixed effects, as well as municipality group by year of imple-
coefficient estimated at the mean – a 16% increase. mentation linear trends. As the figures demonstrate the conditional
probability of diagnosis and death is not significantly different from
6.2. Competing risks zero in cohorts born pre-implementation. There is however a sharp
increase in the probability that starts with the cohort right before
A potentially important concern in analyzing mortality by dif- the first fully treated cohort and levels off at a new and increased
ferent causes has been raised by Honoré and Lleras-Muney (2006). level with the second fully treated cohort (1 year after year zero of
Technological progress in medicine or any other factor that affects
the treatment or detection of certain diseases would affect the
probability of death from related diagnoses but also the probability
of death from other conditions, which pose “competing risks”. In
essence, failure to die from one condition at a given age increases
the probability of death from another condition. Honoré and
Lleras-Muney (2006) show in particular that cardio-vascular (CVD)
and cancer deaths in the US are related in this manner. Improvements
in the treatment of CVD led to decreased mortality from CVD but also
to increased mortality from cancer compared to the counterfactual.
This is important in our setting because education may have affected
the early detection and proper treatment of CVD, leading to a
reduction in the probability of death from CVD. Through the competing
risks channel, this reduction may have increased the probability of
death from breast cancer. To examine this hypothesis, we first
estimate the effects of being exposed to the reform on CVD mortality
and compute bounds for our estimates.
The probability of death from CVD is reduced by reform treatment by
0.53% (Cox estimate 0.99472, CI 0.8353–1.1845) in the full sample. In
the subsample of low SES background women, the reform treatment leads
to a 2.7% decrease in CVD mortality (Cox estimate 0.97304, CI
0.80374–1.178). Assuming that everyone who did not die from CVD died
from breast cancer, we compute a lower bound on our breast cancer
mortality estimates. The education reform increases the probability
of death from breast cancer or CVD by 9.8% (hazard ratio 1.098, CI
from 0.997 to 1.209). Thus even if the entire reduction in CVD
mortality is translated into breast cancer mortality, we still find a
positive effect of the reform on the (combined) mortality rate, even
though it is about half the size of the effect we obtain when we
assume the risks are unrelated (18%).14

6.3. Parallel trends assumption

Our difference-in-differences analysis relies on the assumption of
parallel trends in the incidence of diagnosed breast cancers before
and after the cohort affected by the reform in each municipality. We
implement two different tests of this assumption. First, Fig. 1 plots
the conditional marginal effects of exposure in the 6 cohorts
pre-implementation to 6 cohorts post-implementation
the implementation in the figures below).
Second, we performed a number of placebo tests in which we pretend
that the reform was implemented earlier or later than the actual
implementation year. The placebo treatment groups are defined by
falsely assigning treatment to women born 6, 4 and 2 years before the
first fully treated cohort, as well as 2, 4 and 6 years after the
first cohort. In the first arrangement women who were not treated
receive false treatment status. In the latter arrangement we pretend
that women who were (actually) treated and were born 2, 4, and 6
years from the first treated cohort were not treated. Thus, in this
set-up treated women receive false untreated status. We present all
these tests together in Fig. 2.
Every estimate is obtained from a separate regression including
cohort and municipality fixed effects, as well as year of
implementation linear trends. The regressions assigning treatment to
untreated cohorts include only women from untreated cohorts.
Similarly, the regressions assigning non-treatment to treated cohorts
include only treated women. As the figures demonstrate, the largest
in absolute value and only statistically significant effects are
estimated when we assign the correct treatment values. Further, there
are no particular discernible patterns, suggesting that there is
nothing that systematically biases our estimates.

6.4. Changes in fertility behavior as a possible mechanism behind the results

The causal estimates confirm the positive correlations between
education and the probability of breast cancer diagnosis. Medical
studies have pointed to several channels that might contribute to
these findings (see Nechuta et al., 2010 for a recent review). For
two of these – the inverse relation between educational attainments
and completed fertility as well as the positive relation between
education and age at first birth – we have information in our data
set allowing us to analyze how these two outcomes were affected by
the schooling reform.
As a background to this analysis, Table 5 shows associations between
attained education and women’s fertility behavior in Sweden using the
same population we analyzed in the mortality regressions. Column (1)
reports the correlation between years of schooling and the
probability of having no children; column (2) displays the relation
between years of schooling and number of children; finally, column
(3) shows the correlation between years of schooling and age at
first child.
As can be seen in Table 5, there is a statistically significant
relation between years of schooling and each of the three outcomes
under study. The point estimates suggest that one additional year of
schooling is associated with a 0.003 increase in the probability of
having no children; 0.013, or about 0.8 percent, fewer children; and,
finally, almost half a year older age at first birth.
In Table 6 we turn to analyzing the effect of schooling reform on the
same set of outcomes as those analyzed in Table 5. None of the point
estimates attain statistical significance. Comparing the estimates
for the effect of the reform with the correlations shown in Table 5,
it is evident that the precision in the reform effect estimates for
the probability of having no children as well as the total number of
children is too low to enable us to reject

14 A separate issue emerges if we consider testing for the effect of
the reform on deaths from breast cancer as one of a series of
multiple mortality tests we could perform, including the reform
effect on death from CVD and death from other causes. We performed an
adjustment procedure to calculate the q-value, which is the P-value
of the test adjusted for the false discovery rate. This methodology
was developed by Storey and co-authors and software was created by
Dabney and Storey (Storey, 2002; Storey and Tibshirani, 2003). We
picked a 0 of 1 and an FDR threshold of 0.05. The q-value on the
reform coefficient in the first linear probability regression of
breast cancer mortality is 0.096 (the P-value is 0.032); the q-value
on the reform coefficient in a linear probability regression with
binary outcome “death from any other condition” is 0.141 (P-value
0.95); the q-value on the reform coefficient in a linear probability
regression with binary outcome “death from CVD” is 0.79 (P-value
0.79). While a P-value threshold of 0.1 implies that 1 in every 10
tests will be a false positive, a q-value threshold of 0.1 implies
that one in every 10 positive tests will be a false positive. Even
after a conservative adjustment for multiple hypotheses testing, we
obtain a reform effect that is still significant at the 10% level.
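To make the adjustment in footnote 14 concrete, the following is a minimal sketch of a false discovery rate adjustment applied to the three P-values the footnote reports. It assumes that the garbled phrase "a 0 of 1" refers to a null proportion of 1, in which case the q-value reduces to the familiar step-up adjustment; the function name `fdr_qvalues` and the parameter `pi0` are illustrative and are not taken from the authors' software.

```python
def fdr_qvalues(pvalues, pi0=1.0):
    """Step-up FDR adjustment; pi0 is the assumed share of true nulls."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = pi0 * pvalues[i] * m / rank
        running_min = min(running_min, candidate)
        qvalues[i] = running_min
    return qvalues


# P-values as reported in footnote 14.
outcomes = ["breast cancer", "any other condition", "CVD"]
pvalues = [0.032, 0.95, 0.79]
qvalues = fdr_qvalues(pvalues)
for name, p, q in zip(outcomes, pvalues, qvalues):
    print(f"{name}: P = {p:.3f}, q = {q:.3f}")
```

Under these assumptions the sketch gives 0.096 for breast cancer mortality, which matches the footnote. The values it gives for the other two outcomes differ from those reported, so the authors' exact procedure or inputs may differ from this simplified version.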
Fig. 1. Probability of breast cancer diagnosis and death from breast cancer among cohorts of women born close to the first cohort affected by the reform. Note: Conditional
marginal effects plotted in solid line, 95% confidence intervals in dashed lines. The omitted category is women born 2 years before the first cohort that was affected by the
reform. Cohort and municipality fixed effects included in the regressions, as well as municipality group by year of implementation-specific linear trends.
[Figure panels: Education; Death from breast cancer; Diagnosis. Horizontal axis runs from -6 to 6.]
Fig. 2. Placebo tests assigning treatment status to untreated or untreated status to treated cohorts.
the hypothesis that the effects are the same as for one additional
year of schooling. However, for the age at first child outcome, the
point estimate is very different and the precision sufficient to
allow us to reject that the effect is as large as the almost 0.5
years as suggested by the result in Table 5. The upper confidence
limit for a 95 percent confidence interval for the reform effect is
as small as 0.025 for age at first birth, and so we can conclude that
it is unlikely that the mechanism behind our result of the reform
effect on cancer diagnosis incidence is through delayed childbearing
among those who had children. For the other two outcomes, the
precision is too low for any definite conclusions.
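The comparison in this paragraph can be checked with simple arithmetic. The sketch below takes the reform coefficient and standard error for age at first childbearing from Table 6 (column "All") and the association from Table 5 (column 3), and assumes a normal approximation with a critical value of 1.96; the variable and function names are illustrative only.

```python
def upper_confidence_limit(estimate, std_error, z=1.96):
    """Upper end of a two-sided 95% interval under a normal approximation."""
    return estimate + z * std_error


# Table 6, age at first childbearing, all women.
reform_effect = -0.0445
reform_se = 0.0350
# Table 5, column (3): association with one more year of schooling.
association = 0.4608

upper = upper_confidence_limit(reform_effect, reform_se)
rejects_association = upper < association
print(f"upper 95% limit: {upper:.4f}")
print(f"association lies above the interval: {rejects_association}")
```

With these inputs the upper limit is about 0.024, in line with the 0.025 quoted in the text, and far below the 0.46 years associated with an additional year of schooling.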
Table 5
Education and women’s fertility behavior.

Father’s education       All            All                                     All
Outcome                  No children    Total fertility (number of children)    Age at first childbearing
                         (1)            (2)                                     (3)
Years of schooling       0.0034*        −0.0134*                                0.4608*
                         (0.0002)       (0.0010)                                (0.0038)
Mean outcome variable    0.11           1.7                                     27
Empirical model          Linear Prob    OLS                                     OLS
Observations             541,135        541,135                                 478,946
R-squared                0.0057         0.0057                                  0.0868

Note: + Significant at 10%; **significant at 5%; *significant at 1%; robust standard errors in parentheses clustered at the municipality of birth; linear trends by year of reform
implementation included in all specifications; birth cohort dummies included in all specifications.
Table 6
Education reform and women’s fertility behavior.

Father’s education       All         Low education    High education
                         (1)         (2)              (3)

Probability of not bearing a child
Reform                   0.0018      0.0011           0.0033
                         (0.0018)    (0.0021)         (0.0028)
Mean outcome variable    0.11        0.11             0.134
Empirical model          OLS         OLS              OLS
Observations             560,596     372,894          187,702
R-squared                0.0050      0.0057           0.0082

Total fertility (number of children)
Reform                   0.0028      0.0034           −0.0018
                         (0.0061)    (0.0073)         (0.0107)
Observations             560,596     372,894          187,702
R-squared                0.0045      0.0055           0.0077

Age at first childbearing
Reform                   −0.0445     −0.0464          −0.0426
                         (0.0350)    (0.0364)         (0.0572)
Mean outcome variable    27          26.5             28.1
Empirical model          OLS         OLS              OLS
Observations             493,751     331,164          162,587
R-squared                0.0269      0.0225           0.0319

Note: + Significant at 10%; **significant at 5%; *significant at 1%; robust standard
errors in parentheses clustered at the municipality of birth; linear trends by year
of reform implementation included in all specifications; birth cohort dummies
included in all specifications.

7. Concluding remarks

Numerous studies have shown that higher educational attainment is
conducive to better health in the affected cohorts and their
offspring (see review by Cutler and Lleras-Muney, 2011). Breast
cancer is an exception to this rule in the sense that the incidence
of diagnosed cases increases with education and it has therefore been
labeled a “welfare disease”. We show that this relation holds also as
a response to an exogenous policy change that induced an increase in
compulsory schooling. This result suggests that the relation between
women’s educational attainments and breast cancer is likely due to
some characteristic or risk factor that is acquired as additional
education is obtained, rather than some innate quality that is
correlated with educational attainments. Many social and health
behaviors fit these categories, such as the use of hormonal therapies
and oral contraceptives, which have been linked to increased
probability of breast cancer.
The reform may have made women more willing to participate in
screening programs. Meghir et al. (2013) show that cognitive skills
were improved as a result of the reform, which, in turn, may make
people more adequately aware of different risk factors (see e.g.
Cutler and Lleras-Muney, 2006). Since participation in screening
programs is not included in our data, we are not able to estimate
this effect separately. However, given that about 80 percent of
Swedish females participate in the nationwide screening program it is
not likely that improved participation in the program makes up the
entire effect. Further, superior screening among higher SES women by
itself would not explain the elevated risks of death that we
document. The findings in this study imply that the quality and
availability of breast cancer screening and preventive health care
must keep pace with improving educational opportunities for women
world-wide.
We find no convincing evidence that fertility behaviors, often cited
as a potential mechanism behind the higher incidence of breast cancer
in educated women, were significantly affected by the reform and we
could therefore not conclude that it is driving the result of
elevated risk of breast cancer caused by the education reform. For
delayed childbearing the precision in our estimates were sufficient
for excluding it as a major mechanism behind our results on breast
cancer incidence and mortality.
Epidemiological studies of the incidence of breast cancer have
discussed the possibility that breast feeding, breast feeding
duration and the duration of oral contraceptive use may affect the
probability of breast cancer. Since we have not been able to find a
large enough dataset including these outcomes, we have not been able
to explore their potentials as a possible mechanism behind our
results and have to leave this for further research.

References

Althuis, M., et al., 2005. Global trends in breast cancer incidence and mortality
1973–1997. International Journal of Epidemiology 34 (2), 405–412.
Baquet, C., Horm, J., Gibbs, T., Greenwald, P., 1990. Socio-economic factors and cancer
incidence among Blacks and Whites. Journal of the National Cancer Institute 83
(8), 551–557.
Barlow, L., et al., 2009. The completeness of the Swedish Cancer Register – a sample
survey for year 1998. Acta Oncologica 48 (1), 27–33.
Braaten, T., Weiderpass, E., Kumle, M., Adami, H.-O., Lund, E., 2004. Education and
risk of breast cancer in the Norwegian-Swedish women’s lifestyle and health
cohort study. International Journal of Cancer 110 (4), 579–583.
Center for Epidemiology, 2006. Cancer Incidence in Sweden 2004. Center for
Epidemiology, National Board of Health and Welfare, Stockholm.
Clark, D., Royer, H., 2012. The effect of education on adult mortality and health:
evidence from Britain. American Economic Review 103 (6), 2087–2120.
Cutler, D., Lleras-Muney, A., 2006. Education and health: evaluating theories and
evidence. In: House, J.S., Schoeni, R.F., Kaplan, G.A., Pollack, H. (Eds.), The Health
Effects of Social and Economic Policy. Russell Sage Foundation, New York.
Cutler, D., Lleras-Muney, A., 2011. Education and Health (Unpublished manuscript).
Danø, H., Hansen, D.K., Jensen, P., Pedersen, J.H., Jacobsen, R., Ewertz, M., Lynge,
E., 2004. Fertility pattern does not explain social gradient in breast cancer in
Denmark. International Journal of Cancer 111 (3), 451–456.
Ferlay, J., Shin, H.R., Bray, F., Forman, D., Mathers, C., Parkin, D.M., 2010. Estimates
of worldwide burden of cancer in 2008: GLOBOCAN 2008. International Journal
of Cancer 127 (12), 2893–2917.
Gathman, C., Jürges, H., Reinhold, S., 2012. Compulsory Schooling Reforms, Education
and Mortality in Twentieth Century Europe. University of Wuppertal (mimeo).
Glied, S., Lleras-Muney, A., 2008. Health Inequality, Education and Medical
Innovation. Demography 45 (3), 741–761.
Greene, W., 2003. Econometric Analysis. Prentice Hall, New Jersey.
Heck, K., Pamuk, E., 1997. Explaining the relation between education and
postmenopausal breast cancer. American Journal of Epidemiology 145 (4), 366–372.
Hemminki, K., Li, X.J., 2003. Level of education and the risk of cancer in Sweden.
Cancer Epidemiology, Biomarkers & Prevention 12, 796–802.
Hemminki, K., Li, X.J., 2004. University and medical education and the risk of cancer
in Sweden. European Journal of Cancer Prevention 13, 199–205.
Holmlund, H., 2007. A Researcher’s Guide to the Swedish Compulsory School Reform.
Working paper 9/2007, Swedish Institute for Social Research, Stockholm
University.
Honoré, Bo E., Lleras-Muney, A., 2006. Bounds in competing risks models and the
war on cancer. Econometrica 74 (6), 1675–1698.
Hussain, S.K., Altieri, A., Sundquist, J., Hemminki, K., 2008. Influence of education
level on breast cancer risk and survival in Sweden between 1990 and 2004.
International Journal of Cancer 122 (1), 165–169.
Lager, A., Torssander, J., 2012. Causal effect of education on mortality in a
quasi-experiment on 1.2 million Swedes. Proceedings of the National Academy of
Sciences 109 (22), 8461–8466.
Lagerlund, M., et al., 2002. Sociodemographic predictors of non-attendance at
invitational mammography screening – a population-based register study. Cancer
Causes and Control 13 (1), 73–82.
Lange, F., 2011. Education and allocative efficiency. Evidence from cancer screening.
Journal of Health Economics 30, 43–54.
Lichtenberg, F., Lleras-Muney, A., 2005. The effect of education on medical
technology adoption: are the more educated more likely to use new drugs? Special issue
of the Annales d’Economie et Statistique in memory of Zvi Griliches, No 79/80.
Lleras-Muney, A., 2005. The relationship between education and adult mortality in
the U.S. Review of Economic Studies 72 (1), 189–221.
Lund, E., Jacobsen, B.K., 1991. Education and breast-cancer mortality – experience
from a Large Norwegian Cohort Study. Cancer Causes and Control 2, 235–238.
Marklund, S., 1981. Skolsverige 1950–1975: Försöksverksamheten. Liber
Utbildningsförlaget, Stockholm.
Meghir, C., Palme, M., 2003. Ability Parental Background and Educational Policy:
Empirical Evidence from a Social Experiment. Institute for Fiscal Studies, IFS
Working Papers: W03/05.
Meghir, C., Palme, M., 2005. Educational reform, ability, and family background.
American Economic Review 95 (1), 414–424.
Meghir, C., Palme, M., Schnabel, M., 2011. The Effect of Education Policy on Crime:
An Intergenerational Perspective. NBER Working Paper 18145.
Meghir, C., Palme, M., Simeonova, E., 2012. Education, Health and Mortality:
Evidence from a Social Experiment. NBER Working Paper 17932.
Meghir, C., Palme, M., Simeonova, E., 2013. Education, Cognition and Health:
Evidence from a Social Experiment. NBER Working Paper 19002.
Nechuta, S., Paneth, N., Velie, E., 2010. Pregnancy characteristics and maternal breast
cancer risk: a review of the epidemiologic literature. Cancer Causes and Control
21, 967–989.
Oreopoulos, P., 2006. Estimating average and local average treatment effects of
education when compulsory school laws really matter. American Economic Review
96 (1), 152–175.
Spasojevic, J., 2010. Effects of education on adult health in Sweden: results from a
natural experiment. Contributions to Economic Analysis 290, 179–199.
Storey, J., 2002. A direct approach to false discovery rates. Journal of the Royal
Statistical Society: Series B 64, 479–498.
Storey, J., Tibshirani, R., 2003. Statistical significance for genome-wide experiments.
Proceedings of the National Academy of Sciences 100, 9440–9445.
Tabar, L., et al., 1985. Reduction in mortality from breast cancer after mass screening
with mammography. The Lancet 325 (8433), 829–832.

Peer effects, fast food consumption and adolescent weight gain☆

Bernard Fortin a,∗, Myra Yazbeck b
a CIRPÉE, IZA, CIRANO and Department of Economics, Université Laval, Canada
b School of Economics, University of Queensland, Australia

Article history:
Received 1 February 2015
Accepted 12 March 2015

JEL classification: C31, I10, I12

Keywords: Obesity; Overweight; Peer effects; Social interactions; Fast food

This paper aims at opening the black box of peer effects in
adolescent weight gain. Using Add Health data on secondary schools in
the U.S., we investigate whether these effects partly flow through
the eating habits channel. Adolescents are assumed to interact
through a friendship social network. We propose a two-equation model.
The first equation provides a social interaction model of fast food
consumption. To estimate this equation we use a quasi maximum
likelihood approach that allows us to control for common environment
at the network level and to solve the simultaneity (reflection)
problem. Our second equation is a panel dynamic weight production
function relating an individual’s Body Mass Index z-score (zBMI) to
his fast food consumption and his lagged zBMI, and allowing for
irregular intervals in the data. Results show that there are positive
but small peer effects in fast food consumption among adolescents
belonging to the same friendship school network. Based on our
preferred specification, the estimated social multiplier is 1.15. Our
results also suggest that, in the long run, an extra day of weekly
fast food restaurant visits increases zBMI by 4.45% when ignoring
peer effects and by 5.11% when they are taken into account.
© 2015 Elsevier B.V. All rights reserved.

1. Introduction

For the past few years, obesity has been one of the major concerns of
health policy makers in the U.S. It has also been one of the
principal sources of increased health care costs. In fact, the
increasing trend in children’s and adolescents’ obesity (Ogden et
al., 2012) has raised the annual obesity-related medical costs to
$147 billion per year (Finkelstein et al., 2009). Obesity is also
associated with increased risk of reduced life expectancy as well as
with serious health problems such as type 2 diabetes (Maggio and
Pi-Sunyer, 2003), heart disease (Calabr et al., 2009) and certain
cancers (Calle, 2007), making obesity a real public health challenge.
Recently, a growing body of the health economics literature has tried
to look into the obesity problem from a new perspective using a
social interaction framework. An important part of the evidence
suggests the presence of peer effects in weight gain. On one hand,
Christakis and Fowler (2007), Trogdon et al. (2008), Renna et al.
(2008) and Yakusheva et al. (2014) point to the social multiplier as
an important element in the obesity epidemic. As long as it is
strictly larger than one, a social multiplier amplifies, at the
aggregate level, the impact of any shock (such as the reduction in
relative price of junk food) that may affect obesity at the
individual level. This is so because the aggregate effect
incorporates, in addition to the sum of the individual direct
effects, positive indirect peer effects stemming from social
interactions. On the other hand, Cohen-Cole and Fletcher (2008b)
found that there is no evidence of peer effects in weight gain. Also,
results from a placebo test performed by the same authors (Cohen-Cole
and Fletcher, 2008a) indicate that there are peer effects in acne (!)
in the Add Health data when one applies the Christakis and Fowler
(2007) method discussed later on.

☆ We wish to thank Christopher Auld, Charles Bellemare, Luc Bissonnette, Vincent
Boucher, Paul Frijters, Guy Lacroix, Paul Makdissi, Daniel L. Millimet, Kevin Moran,
Bruce Shearer for useful comments and Yann Bramoullé, Badi Baltagi, Rokhaya Dieye,
Habiba Djebbari, Tue Gorgens, Bob Gregory, Louis Hotte, Linda Khalaf, Lung fei
Lee, Xin Meng and Rabee Tourky for useful discussions. The suggestions of two
anonymous referees have substantially improved the paper. We are grateful to
Rokhaya Dieye for outstanding research assistance. The usual disclaimer applies.
Financial support from the Canada Research Chair in the Economics of Social Policies
and Human Resources and le Centre interuniversitaire sur le risque, les politiques
économiques et l’emploi is gratefully acknowledged. This research uses data from
Add Health, a program project directed by Kathleen Mullan Harris and designed by
J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of
North Carolina at Chapel Hill, and funded by grant P01-HD31921 from the Eunice
Kennedy Shriver National Institute of Child Health and Human Development, with
cooperative funding from 23 other federal agencies and foundations.
∗ Corresponding author. Tel.: +1 418 656 5678.
E-mail addresses: Bernard.Fortin@ecn.ulaval.ca (B. Fortin),
m.yazbeck@uq.edu.au (M. Yazbeck).
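As a rough illustration of how the social multiplier quoted in the abstract amplifies an individual-level effect, the sketch below uses the standard linear-in-means relation between the endogenous peer effect and the multiplier, 1 / (1 - beta). That formula is not stated in this passage, so its use here is an assumption, and the names `social_multiplier` and `implied_beta` are hypothetical.

```python
def social_multiplier(beta):
    """Multiplier implied by an endogenous peer effect beta (assumed formula)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1) for a finite multiplier")
    return 1.0 / (1.0 - beta)


def implied_beta(multiplier):
    """Endogenous peer effect consistent with a given multiplier."""
    return 1.0 - 1.0 / multiplier


multiplier = 1.15      # reported social multiplier
direct_effect = 4.45   # long-run zBMI effect (%) ignoring peer effects

beta = implied_beta(multiplier)
total_effect = direct_effect * multiplier
print(f"implied endogenous peer effect: {beta:.3f}")
print(f"effect including peer effects: {total_effect:.2f}%")
```

Scaling 4.45% by 1.15 gives roughly 5.12%, close to the 5.11% reported; the small gap is presumably due to rounding of the published multiplier.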
126 B. Fortin, M. Yazbeck / Journal of Health Economics 42 (2015) 125–138
While the presence (or not) of peer effects in weight has been widely
researched,1 the literature on the mechanisms by which peer effects
flow is still scarce. Indeed, most of the relevant literature
attempts to estimate the relationship between variables such as an
individual’s Body Mass Index (BMI) and his average peers’ BMI,
without exploring the channels at source of this potential linkage.2
The aim of this paper is to go beyond the black box approach of peer
effects in weight gain and try to identify one potentially important
mechanism through which peer effects in adolescent overweight may
flow: eating habits (as proxied by fast food consumption).
Three reasons justify our interest in eating habits in analyzing the
impact of peer effects on teenage weight. First of all, there is an
important literature that points to eating habits as an important
component in weight gain (e.g., Niemeier et al., 2006; Rosenheck,
2008).3 Second, one suspects that peer effects in eating habits are
likely to be important in adolescence. Indeed, at this age,
youngsters have increased independence in general and more freedom as
far as their food choices are concerned. Usually vulnerable, they
often compare themselves to their friends and may alter their choices
to conform to the behaviour of their peers. Therefore, unless we
scientifically prove that obesity is a virus, it is counter intuitive
to think that one can gain weight by simply interacting with an obese
person.4 This is why we are inclined to think that the presence of
real peer effects in weight gain can be estimated using behavioural
channels such as eating habits. Third, our interest in peer effects
in youths’ eating habits is policy driven. There has been much
discussion on implementing tax policies to address the problem of
obesity (e.g., Caraher and Cowburn, 2007; Powell et al., 2013). As
long as peer effects in fast food consumption are a source of
externality that may stimulate overweight among adolescents, it may
be justified to introduce a consumption tax on fast food. The optimal
level of this tax will depend, among other things, on the social
multiplier of eating habits, and on the causal effect of fast food
consumption on adolescent weight.
In order to analyze the impact of peer effects in eating habits on
weight gain, we propose a two-equation model. The first
linear-in-means equation relates an individual’s fast food
consumption to his individual characteristics, his reference group’s
mean fast food consumption (endogenous peer effect), and his
reference group’s mean characteristics (contextual peer effects). The
endogenous peer effect reflects the possibility that eating behaviour
of his friends influences a teenager’s own behaviour. For instance,
one reason why an adolescent may want to go to a fast food restaurant
is to be with his friends during lunch. Contextual effects, such as
the average level of education of his friends’ mothers, may also
affect a teenager’s eating habits. Thus, mothers with higher
education may encourage not only their children but also their
children’s friends to develop accurate eating behaviour.
The second equation is a panel dynamic production function that
relates an individual’s BMI adjusted for age (BMI z-score or zBMI) to
his fast food consumption, his lagged zBMI and other control
variables. The system of equations thus allows us to evaluate the
impact of an eating habits’ exogenous shock on an adolescent’s
weight, when peer effects on fast food consumption are taken into
account.
To estimate our two-equation model, we use three waves of the
National Longitudinal Study of Adolescent Health (Add Health), that
is, Wave II (1996), Wave III (2001) and Wave IV (2008).5 We define
peers as the nominated group of individuals reported as friends
within the same school. The consumption behaviour is depicted through
the reported frequency (in days) of fast food restaurant visits in
the past week.
Estimating our system of equations raises serious econometric
problems. It is well known that the identification of peer effects
(first equation) is a challenging task. These identification issues
were first pointed out by Manski (1993) and discussed among others by
Bramoullé et al. (2009) and Blume et al. (2015). On one hand,
(endogenous + contextual) peer effects must be identified from
correlated (or confounding) factors. For instance, students in the
same friendship group may have similar eating habits because they
share similar characteristics (i.e., homophily) or face a common
environment (e.g., same school). On the other hand, simultaneity
between an adolescent’s and his peers’ behaviour (referred to as the
reflection problem by Manski) may make it difficult to identify
separately the endogenous peer effect and the contextual effects.
We use a new approach based on Bramoullé et al. (2009) and Lee et al.
(2010), and extended by Blume et al. (2015) to address these
identification problems and to estimate the peer effects equation.
First, we assume that in their fast food consumption decisions,
adolescents interact through a friendship network. Each school is
assumed to form a network. School fixed effects are introduced to
capture correlated factors associated with network invariant
unobserved variables (e.g., similar preferences due to self-selection
in schools, same school nutrition policies, distance from fast food
restaurants). The structure of friendship links within a network is
allowed to be stochastic but, conditional on the school fixed effects
and observable individual and contextual variables, is strictly
exogenous. The possibility that friends select each other using
unobservable traits that may be correlated with their fast food
consumption decisions is an important issue and is discussed later
on.
To solve the reflection problem, we exploit results by Bramoullé et
al. (2009) who show that if there are at least two agents who are
separated by a link of distance 3 within a network (i.e., there are
two adolescents in a school who are not friends but are linked by two
friends), both endogenous and contextual peer effects are identified.
Finally, we exploit the similarity between the linear-in-means model
and the spatial autoregressive (SAR) model with or without
autoregressive spatial errors.6 The model is estimated using a quasi
maximum likelihood (QML) approach as in Lee et al. (2010). The QML is
appropriate when the estimator is derived from a normal likelihood
but the error terms in the model are not truly normally distributed.
We also estimate the model using generalized spatial two-stage least
squares proposed in Kelejian and Prucha (1998) and refined in Lee
(2003), which is less efficient than QML.
The estimation of the production function (second equation) also
raises serious econometric issues. First, fast food consumption is
likely to be an endogenous variable correlated with the individual
error term. Moreover, the short and the long term impacts of fast
food consumption on zBMI may be different, suggesting the

1 For a complete review see Fletcher et al. (2011) who conducted a systematic
review of literature that shows that school friends are similar as far as body weight
and weight related behaviours.
2 One recent exception is Yakusheva et al. (2011) and Yakusheva et al. (2014)
who look at peer effects in overweight and in weight management behaviours such
as eating and physical exercise, using randomly assigned pairs of roommates in
freshman year.
3 Indirect evidence of the relationship between eating habits and weight gain
comes from the literature on the (negative) effect of fast food prices on adolescents’
BMI (see Auld and Powell, 2009; Powell, 2009; Powell and Bao, 2009). See also Cutler
et al. (2003) which relates the declining relative price of fast food and the increase
in fast food restaurant availability over time to increasing obesity in the U.S.
4 Of course, having obese peers may influence an individual’s tolerance for being
obese and therefore his weight management behaviours.
5 Note that for the first equation we use wave II and for the second equation we
use the three waves.
6 Our approach is more general than the SAR model as the latter usually ignores
contextual effects and spatial fixed effects.
introduction of lagged zBMI as an explanatory variable. Finally, Add Health data waves are collected at irregular intervals. As a consequence, estimators obtained from standard dynamic panel data models are inconsistent. To deal with these problems, we use a nonlinear instrumental approach developed by Millimet and McDonough (2013).

Results suggest that there is a positive but small endogenous peer effect in fast food consumption among adolescents in general. Based on our QML specification, the estimated social multiplier is 1.15. Moreover, the production function estimates indicate that there is a positive significant impact of fast food consumption on zBMI. Combining these results, we find that, in the long run, an extra day of weekly fast food restaurant visits increases zBMI by 4.45% when ignoring peer effects and by 5.11% when they are taken into account.

The remainder of this paper is organized as follows. Section 2 provides a survey of the literature on peer effects in obesity as well as its decomposition into the impact of peer effects on fast food consumption and the impact of fast food consumption on obesity. Section 3 presents the specification of our fast food equation with peer effects. Section 4 is devoted to our weight production function. In Section 5, we give an overview of the Add Health Survey and we provide descriptive statistics of the data we use. In Section 6, we discuss estimation results. Section 7 concludes.

2. Previous literature

In recent years, a number of studies found strong “social network effects” in weight outcomes. In a widely debated article, Christakis and Fowler (2007) found that an individual’s probability of becoming obese increased by 57% if he or she had a friend who became obese in a given interval.7 However, their analysis has been criticized for suffering from a number of limitations (see Cohen-Cole and Fletcher, 2008b; Lyons, 2011; Shalizi and Thomas, 2011).8 In particular, it ignores potential spurious correlations between two friends’ BMI resulting from the fact that they are exposed to the same environment. Both Shalizi and Thomas (2011) and Lyons (2011) show that relying on link asymmetries does not rule out shared environment as it claims. Also, the simultaneity problem between these two outcomes is not directly addressed by allowing the peer’s obesity to be endogenous.

In the same spirit, Trogdon et al. (2008) investigate the presence of peer effects in obesity using Add Health data. They include school fixed effects to account for the fact that students in the same school share the same surroundings. The authors also estimate their BMI peer model with an instrumental variable approach using information on friends’ parents’ obesity and health and friends’ birth weight as instruments for peers’ BMI. They find that a one point increase in peers’ average BMI increases own BMI by 0.52 point. Based on a similar approach and using the Add Health dataset, Renna et al. (2008) also find positive peer effects. These effects are significant for females only (=0.25 point).9 These analyses raise a number of concerns though. In particular, they assume no contextual variables reflecting peers’ mean characteristics. This rules out the reflection problem by introducing untested exclusion restrictions. In our approach, we introduce school fixed effects and, for each individual variable, the corresponding contextual variable at the reference group level.

Using the same dataset as Trogdon et al. (2008) and Renna et al. (2008), Cohen-Cole and Fletcher (2008b) exploit panel information (wave II in 1996 and wave III in 2001) for adolescents for whom at least one same-sex friend is also observed over time. Compared with Christakis and Fowler’s approach, their analysis introduces time invariant and time dependent environmental variables (at the school level). Friendship selection is controlled for by individual fixed effects. The authors find that peer effects are no longer significant with this specification.

All the studies discussed up to this point focus on peer effects in weight outcomes without analyzing quantitatively the mechanisms by which they may occur. The general issue addressed in this paper is whether the peer effects in weight gain among adolescents partly flow through the eating habits channel. This raises in turn two basic issues: (a) are there peer effects in fast food consumption?, and (b) is there a link between weight gain (or obesity) and fast food consumption? In this paper, we address both issues.

The literature on peer effects in eating habits (first issue) is recent and quite limited. In a recent paper, in which the formation of the network is randomized, Yakusheva et al. (2011) estimate peer effects in explaining weight gain among freshman girls using a similar set up but in school dormitories. In their paper, they test whether some of the student’s weight management behaviours (i.e., eating habits, physical exercise, use of weight loss supplements) can be predicted by her randomly assigned roommate’s behaviours. Their results provide evidence of the presence of negative peer effects in weight gain. Their results also suggest positive peer effects in eating habits, exercise and use of weight loss supplements. In a subsequent paper, Yakusheva et al. (2014) investigate the presence of peer effects in weight gain exploiting random assignment of roommates during the first year of college. The authors find evidence that suggests that peer effects in weight gain are predominantly significant among females.10

Our paper finds its basis in this literature as well as the literature on peer effects and obesity discussed above. However, while works by Yakusheva et al. (2011) and Yakusheva et al. (2014) rely upon experimental data, we use observational non-experimental data. Peers are considered to have social interactions within a school network. This allows for the construction of a social interaction matrix that reflects how social interaction between adolescents in schools occurs in a more realistic setting (as in Trogdon et al., 2008; Renna et al., 2008). An additional originality of our paper lies in the fact that it relies upon a linear-in-means approach when relating an adolescent’s behaviour to that of his peers. Also, the analogy between the forms of the linear-in-means model and the spatial autoregressive (SAR) model allows us to exploit the particularities of this latter model, in particular the natural instruments that are derived from its structural form.

Regarding the second issue, i.e., the relationship between weight (or obesity) and fast food consumption, it is an empirical question that is still debated.11 There is no clear evidence in support of a causal link between fast food consumption and obesity. Nevertheless, most of the literature in epidemiology finds evidence of a positive correlation between fast food consumption and obesity (see, Anderson et al., 2011; Rosenheck, 2008).

7 They used a 32-year panel dataset on adults from Framingham, Massachusetts and a logit specification.
8 For a response to these criticisms and others, see Fowler and Christakis (2008), VanderWeele (2011) and Christakis and Fowler (2013).
9 Also Ali et al. (2011) and Ali et al. (2011) provide evidence that there are peer correlations in weight related behaviours and peer influence in weight misperception respectively.
10 In contrast, De la Haye et al. (2010) provide evidence that close adolescent male friends tend to be similar in their consumption of high-calorie food.
11 The literature on the impact of physical activity on obesity is also inconclusive. For instance, Berentzen et al. (2008) provide evidence that decreased physical activity in adults does not lead to obesity.

The economic literature tends to be conservative with respect to this question. It focuses on the impact of “exposure” to fast food on obesity. Dunn et al. (2012), using an instrumental variable approach, investigate the relationship between fast food availability and obesity. They find that an increase in the number of fast food restaurants has a positive effect on the BMI among non-whites. Alviola et al. (2014), using a similar approach, provide evidence that the number of fast food restaurants has a significant impact on school obesity rates. Similarly, Currie et al. (2010) find evidence that proximity to fast food restaurants has a significant effect on obesity for 9th graders. Also, Anderson and Matsa (2011), exploiting the placement of Interstate Highways in rural areas to obtain exogenous variations in the effective price of restaurants, did not find any causal link between restaurant consumption and obesity. More generally, Cutler et al. (2003) and Bleich et al. (2008) argue that the increased calorie intake (i.e., eating habits) plays a major role in explaining current obesity rates. Importantly, weight prior to adulthood sets the stage for weight in adulthood. While most of the economics literature analyses the relationship between adolescents’ fast food consumption and their weight using an indirect approach (i.e., the effect of fast food exposure), we adopt a direct approach that models weight as a function of fast food consumption, lagged weight and control variables.

In the next two sections, we present our two-equation model of weight production function with peer effects in fast food consumption. We first propose a linear-in-means social interaction equation of fast food consumption (first equation) and discuss the econometric methods we use to estimate it. We then present our econometric weight production function which relates the adolescent’s zBMI level to his fast food consumption (second equation).

3. Social interactions equation of fast food consumption

We assume a set of $N$ adolescents $i$ that are partitioned in a set of $L$ networks. A network is defined as a structure (e.g., school) in which adolescents are potentially tied by a friendship link. Each adolescent $i$ in his network has a set of nominated friends $N_i$ of size $n_i$ that constitute his reference group (or peers). We assume that $i$ is excluded from his reference group. Since peers are defined as nominated friends, the number of peers will not be the same for every network member. Let $G_l$ ($l = 1, \ldots, L$) be the social interaction matrix for a network $l$. Its element $g_{lij}$ takes a value of $1/n_i$ when $i$ is friend with $j$, and zero otherwise. Therefore, assuming no isolated individuals, the $G_l$ matrix is row normalized. We define $y_{li}$ as the fast food consumed by adolescent $i$ in network $l$, $x_{li}$ represents the adolescent $i$’s observable characteristics, $y_l$ the vector of fast food consumption in network $l$, and $x_l$ is the corresponding vector for individual characteristics. To simplify our presentation, we look at only one characteristic (e.g., adolescent’s mother education).12 The network invariant unobservable variables are captured through fixed network effects (the $\alpha_l$’s). They take into account unobserved factors such as preferences of school, school nutrition policies, or presence of fast food restaurants around the school. The $\varepsilon_{li}$’s are the idiosyncratic error terms. They capture $i$’s unobservable characteristics that are not invariant within the network. Formally, one can write the linear-in-means equation for adolescent $i$ as follows:

$$y_{li} = \alpha_l + \beta \frac{\sum_{j \in N_i} y_{lj}}{n_i} + \gamma x_{li} + \delta \frac{\sum_{j \in N_i} x_{lj}}{n_i} + \varepsilon_{li}, \tag{1}$$

where $\frac{\sum_{j \in N_i} y_{lj}}{n_i}$ and $\frac{\sum_{j \in N_i} x_{lj}}{n_i}$ are respectively his peers’ mean fast food consumed and characteristics. In the context of our paper, $\beta$ is the endogenous peer effect. It reflects how the adolescent’s consumption of fast food is affected by his peers’ mean fast food consumption. It is standard to assume that $|\beta| < 1$. The contextual peer effect is represented by the parameter $\delta$.13 It captures the impact of his peers’ mean characteristic on his fast food consumption. It is important to note that the $G_l$ matrix and the $x_l$ vector are allowed to be stochastic but are assumed strictly exogenous conditional on $\alpha_l$, that is, $E(\varepsilon_{li} \mid x_l, G_l, \alpha_l) = 0$. This assumption is flexible enough to allow for correlation between the network’s unobserved common characteristics (e.g., school’s cafeteria quality) and observed characteristics (e.g., mother’s education). Nevertheless, once we condition on these common characteristics, mother’s education is assumed to be independent of the idiosyncratic error terms. Let $I_l$ be the identity matrix for a network $l$ and $\iota_l$ the corresponding vector of ones. Eq. (1) for network $l$ can be rewritten in matrix notation as follows:

$$y_l = \alpha_l \iota_l + \beta G_l y_l + \gamma x_l + \delta G_l x_l + \varepsilon_l, \quad \text{for } l = 1, \ldots, L. \tag{2}$$

Note that Eq. (2) is similar to a SAR model (e.g., Cliff and Ord, 1981) generalized to allow for contextual and fixed effects (hereinafter referred to as the GSAR model). Since $|\beta| < 1$, $(I_l - \beta G_l)$ is invertible. Therefore, in matrix notation, the reduced form of Eq. (2) can be written as:

$$y_l = \frac{\alpha_l}{1-\beta}\,\iota_l + (I_l - \beta G_l)^{-1}(\gamma I_l + \delta G_l)\,x_l + (I_l - \beta G_l)^{-1}\varepsilon_l, \tag{3}$$

where we use the result that $(I_l - \beta G_l)^{-1} = \sum_{k=0}^{\infty} \beta^k G_l^k$, so that the vector of intercepts is $\frac{\alpha_l}{1-\beta}\,\iota_l$, assuming no isolated adolescents.14

Eq. (3) allows us to evaluate the impact of a marginal shock in $\alpha_l$ (i.e., a common exogenous change in fast food consumption within the network) on an adolescent $i$’s fast food consumption, when the endogenous peer effect is taken into account. One has $\partial E(y_{li} \mid \cdot)/\partial \alpha_l = 1/(1-\beta)$. This expression is defined as the social multiplier in our model. When $\beta > 0$ (strategic complementarity in fast food consumption), the social multiplier is larger than 1. In this case, the impact of the shock is amplified by social interactions as more fast food consumption by his peers induces an adolescent to adopt a similar behaviour.

We then perform a panel-like within transformation. More precisely, we average Eq. (3) over all students in network $l$ and subtract it from $i$’s equation. This transformation allows us to address problems that arise from the fact that adolescents are sharing the same environment or preferences. Let $K_l = I_l - H_l$ be the matrix that obtains the deviation from network $l$ mean with $H_l = \frac{1}{n_l}(\iota_l \iota_l')$. The network within transformation will eliminate the network fixed effect $\alpha_l$. Pre-multiplying (3) by $K_l$ yields the reduced form of the model for network $l$, in deviation:

$$K_l y_l = K_l (I_l - \beta G_l)^{-1}(\gamma I_l + \delta G_l)\,x_l + K_l (I_l - \beta G_l)^{-1}\varepsilon_l. \tag{4}$$

3.1. Identification

Our peer effects structural Eq. (2) raises two basic identification problems.

12 Later on, in Section 3.1.1, we will generalize the equation to account for many characteristics.
13 It is standard to assume the presence of a contextual effect for each individual characteristic influencing the outcome. Otherwise, the model may impose ad hoc exclusion restrictions which generate invalid instruments and inconsistent estimators.
14 When an adolescent is isolated, his intercept is $\alpha_l$.
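As an illustration of the objects defined in Section 3, the following Python sketch builds a row-normalized interaction matrix for a small made-up network, evaluates the reduced form (3) with the error term set to zero, and checks the social multiplier and the within transformation numerically. The four-student network, the parameter values and all variable names are assumptions chosen for illustration; they are not taken from the Add Health data or from the estimates in this paper.

```python
import numpy as np

# Hypothetical network: friends[i] lists the friends nominated by student i
# (directed links, nobody isolated). Purely illustrative.
friends = {0: [1, 2], 1: [0], 2: [1, 3], 3: [2]}
n = len(friends)

# Row-normalized interaction matrix: g_ij = 1/n_i if i nominates j, else 0.
G = np.zeros((n, n))
for i, nominated in friends.items():
    G[i, nominated] = 1.0 / len(nominated)

# Illustrative parameter values (assumptions, not estimates).
alpha, beta, gamma, delta = 1.0, 0.13, 0.5, 0.2
x = np.array([1.0, 0.0, 2.0, 1.0])   # one characteristic per student
iota = np.ones(n)
I = np.eye(n)

# (I - beta*G)^{-1}, also approximated by the truncated series sum_k beta^k G^k.
A_inv = np.linalg.inv(I - beta * G)
series = sum(beta**k * np.linalg.matrix_power(G, k) for k in range(50))

def reduced_form(a):
    """Reduced form (3) for a network fixed effect a, with zero errors."""
    return a / (1 - beta) * iota + A_inv @ (gamma * I + delta * G) @ x

y = reduced_form(alpha)

# Social multiplier: a unit shift in alpha moves every outcome by 1/(1 - beta).
multiplier_effect = reduced_form(alpha + 1.0) - y

# Within transformation K = I - (1/n) iota iota' removes the fixed effect.
K = I - np.outer(iota, iota) / n
```

With $\beta = 0.13$ the multiplier equals $1/(1-0.13) \approx 1.149$, so every entry of `multiplier_effect` is that same number, and `K @ y` does not depend on the value chosen for `alpha`.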
3.1.1. Simultaneity

Simultaneity between individual and peer behaviour (the reflection problem) may prevent separating contextual effects from endogenous effects. This problem has been analyzed by Bramoullé et al. (2009) when individuals interact through social networks. They show that the conditions of identification depend on both the values of parameters and the structure of the network. More explicitly, let us first assume throughout that $\beta\gamma + \delta \neq 0$. Then define $G$ the block-diagonal matrix with the $G_l$’s on its diagonal. Assume first the absence of fixed network effects (i.e., $\alpha_l = \alpha$ for all $l$). In this case, Bramoullé et al. (2009) show that the structural parameters of Eq. (2) are identified if the matrices $I$, $G$, $G^2$ are linearly independent. This condition is satisfied when there are at least two adolescents who are separated by a link of distance 2 within a network. This means that they are not friends but have a common friend.15 The intuition is that this provides exclusion restrictions in the model. More precisely, the friends’ friends mean characteristic can serve as instrument for the mean friends’ fast food consumption. Of course, when fixed network effects are allowed, the identification conditions are more restrictive. Bramoullé et al. (2009) show that, in this case, the structural parameters are identified if the matrices $I$, $G$, $G^2$ and $G^3$ are linearly independent. This condition is satisfied when at least two adolescents are separated by a link of distance 3 within a network, i.e., we can find two adolescents who are not friends but are linked by two friends. In this case, $g^{3}_{lij} > 0$ while $g^{2}_{lij} = g_{lij} = 0$. Hence, no linear relation of the form $G^3 = \lambda_0 I + \lambda_1 G + \lambda_2 G^2$ can exist. This condition holds in most friendship networks and, in particular, in the data we use.16

3.1.2. Correlated effects

The presence of confounding unobservable variables affecting fast food consumption and correlated with the explanatory variables raises difficult identification problems. First, since adolescents are not randomly assigned into schools, endogenous self-selection through networks may be the source of potentially serious biases in estimating (endogenous + contextual) peer effects. Indeed, if the variables that drive this process of selection are not fully observable, correlations between unobserved network-specific factors and the regressors are potentially important sources of bias. In our approach, we assume that network fixed effects capture these factors. This is consistent with two-step models of link formation. Each adolescent joins a school in a first step, and forms friendship links with others in his school in a second step. In the first step, adolescents self-select into different schools with selection bias due to specific school characteristics. In a second step, link formation takes place within schools randomly or based on observable individual characteristics only. Recall also that network fixed effects take into account common unobservable variables at the school level that may influence fast food consumption (e.g., availability of fast food restaurants).

Of course, one limitation of using network fixed effects is that it ignores the possibility that link formation within a network depends on omitted variables. The matrix $G$ may be endogenous even when controlling for the network fixed effects and observable characteristics. Thus friends may select each other using unobservable traits that may be correlated with fast food consumption (e.g., impulsivity, a specific taste for sugar- and fat-rich food). Recently, some researchers (e.g., Hsieh and Lee, 2011; Goldsmith-Pinkham and Imbens, 2013; Liu et al., 2013; Badev, 2013) have made attempts to develop econometric models allowing for the joint estimation of network formation and network interactions. However, empirical results using Add Health data and focusing on outcomes such as smoking, sleeping behaviour, and academic performance, do not seem to detect much difference in peer effects when networks are assumed exogenous and when they are allowed to be endogenous.17

One specification of our peer effects equation also allows the error terms to be (first-order) autocorrelated within networks. Therefore its structure becomes analogous to that of a generalized spatial autoregressive model with network autoregressive disturbances (denoted as the GSARAR model). This model implies that in addition to the endogenous and contextual effects, some unobserved characteristics of the friends are also interdependent. In this case, the error terms in (2) can be written as:

$$\varepsilon_l = \rho G_l \varepsilon_l + \xi_l, \tag{5}$$

where the innovations, $\xi_l$, are assumed to be i.i.d. $(0, \sigma^2 I_l)$ and $|\rho| < 1$. Given these assumptions, we can write:

$$\varepsilon_l = (I_l - \rho G_l)^{-1} \xi_l. \tag{6}$$

Allowing for many characteristics and performing a Cochrane-Orcutt-like transformation on the structural equation (4) in deviation, the latter is given by the following structural form:

$$K_l M_l y_l = \beta K_l M_l G_l y_l + K_l M_l X_l \gamma + K_l M_l G_l X_l \delta + \nu_l, \tag{7}$$

where $X_l$ is the matrix of adolescents’ characteristics18 in the $l$th network, $M_l = (I_l - \rho G_l)$ and $\nu_l = K_l M_l \varepsilon_l$.

Following Lee et al. (2010), we propose two approaches to estimate the peer effects equation (7): a quasi maximum likelihood approach (QML) and a generalized spatial two stage least squares (GS-2SLS) approach. The QML estimators are estimated assuming that the disturbances are normally distributed. However, we do allow the log-likelihood function to be partially misspecified, as standard errors are computed to be robust to non-normal disturbances (using a sandwich formula). Assuming that the error terms are i.i.d. and under a number of regularity assumptions (see Lee et al., 2010, p. 152), QML estimators are consistent but not asymptotically efficient. On the other hand, GS-2SLS estimators also assume that the error terms are i.i.d. but impose fewer regularity conditions than QML estimators. QML estimators are asymptotically more efficient than GS-2SLS estimators.19

15 More generally, Eq. (2) is identified when individuals do not interact in groups or interact in groups with at least three different sizes (see Bramoullé et al., 2009).
16 Identification fails, however, for a number of non trivial networks. This is notably the case for complete bipartite networks. In these graphs, the population of students is divided in two groups such that all students in one group are friends with all students in the other group, and there are no friendship links within groups. These include star networks, where one student, at the centre, is friend with all other students, who are all friends only with him.
17 One reason may be that this data base includes a very large number of observable characteristics, some of them being used in the regressions. Another explanation is that, although statistically significant, the explanatory power of the individual characteristics on the probability that two individuals are friends is extremely small (Boucher, 2014).
18 Following the linear-in-means model, we allow the peers’ mean characteristic corresponding to each individual’s characteristic to have a potential effect on his fast food consumption. Therefore, we do not impose ad hoc (identifying) exclusion restrictions to the structural peer effects equation.
19 The derivations of the QML and GS-2SLS estimators are presented in the Appendix.

4. Weight production function

In this section, we propose a dynamic (AR(1)) weight production function that relates an individual’s zBMI in period $t$ (assumed a year) to his lagged zBMI, his fast food consumption and his own characteristics in period $t$. Let $y^b_{it}$ be an individual $i$’s zBMI level in period $t$, and $y^f_{it}$ be the individual’s fast food consumption in period
$t$. Then, for a given vector of characteristics $\tilde{x}_{it}$, the data generating process (DGP) of the weight production function can be formally expressed as follows (for notational simplicity we suppress $l$):

$$y^b_{it} = \pi_1 y^b_{i,t-1} + \pi_2 y^f_{it} + \pi_3 \tilde{x}_{it} + \eta_i + \zeta_{it}, \tag{8}$$

where $\pi_1$ is the autoregressive parameter ($|\pi_1| < 1$), $\eta_i$ is the individual $i$’s time-invariant error component (fixed effect) and $\zeta_{it}$, his idiosyncratic error that may change across $t$. One difficult problem with (8) is that, in the Add Health data set, the waves are irregularly spaced. This means that the successive periods of observed data (that is, for 1996, 2001 and 2008) do not conform to successive (yearly) periods as defined by our underlying DGP. In that case, standard methods to estimate a dynamic panel model with endogenous variables (e.g., Anderson and Hsiao, 1981; Arellano and Bond, 1991) yield inconsistent estimators. To address this point, we follow the Millimet and McDonough (2013) (hereinafter MM) approach. By repeated substitution in Eq. (8), defined over the observed periods $m = 1, 2, 3$, one obtains:

$$y^b_{im} = \pi_1^{g_m} y^b_{i,m-1} + \pi_2 y^f_{im} + \pi_3 \tilde{x}_{im} + \phi_m \eta_i + \tilde{\zeta}_{im}, \tag{9}$$

where $g_m$ is the gap size or the number of years between observed periods $m$ and $m-1$ (which, in our case, are equal to $g_1 = 1$, $g_2 = 5$, $g_3 = 7$)20; $\phi_m = (1 - \pi_1^{g_m})/(1 - \pi_1)$, and

$$\tilde{\zeta}_{im} = \sum_{j=1}^{g_m - 1} \left(\pi_2 y^f_{i,t(m)-j} + \pi_3 \tilde{x}_{i,t(m)-j}\right)\pi_1^{j} + \sum_{j=0}^{g_m - 1} \pi_1^{j}\, \zeta_{i,t(m)-j}, \tag{10}$$

where $t(m)$ is the actual period reflected by the observed period $m$: $t(1) = 1$; $t(2) = 6$; $t(3) = 13$.

Eq. (9) shows that when data are irregularly spaced, (1) the coefficient on the lagged dependent variable is not constant but equal to $\pi_1^{g_m}$; (2) the error term $\tilde{\zeta}_{im}$ contains the covariates and the idiosyncratic errors from the missing periods between $m$ and $m-1$, and the current error; (3) the unobserved fixed effect has a period-specific factor loading, $\phi_m$. The first point raises the following difficulty: the equation is now nonlinear in $\pi_1$, which suggests the use of a nonlinear-in-parameters approach. More importantly, unequally spaced data relegate missing covariates into the error term (point 2). This is a serious source of concern as long as some contemporary covariates are serially correlated and therefore become mechanically endogenous. Finally, one cannot eliminate the fixed effect $\eta_i$ using standard first-differencing or mean-differencing transformations since the factor loading parameter varies from one observed period to another (point 3).

To estimate such an equation, MM suggest the use of a nonlinear instrumental approach extending Everaert’s (2013) technique for estimating dynamic panel data models. It consists first in instrumenting the lagged zBMI ($= y^b_{i,m-1}$) with the OLS residual of $y^b_{i,m-1}$ regressed on its backward mean $\bar{y}^b_{i,m-1}$, where $\bar{y}^b_{i,m-1} = \frac{1}{m}\sum_{s=0}^{m-1} y^b_{i,s}$. The intuition here is that the residual (which reflects the part of the lagged zBMI not explained by its backward mean) is likely to be highly correlated with the lagged zBMI. However, it should be uncorrelated with the fixed effect reflecting the time-invariant unobserved part of the individual’s zBMI. Also it should be uncorrelated with the contemporary idiosyncratic error term as long as the latter is i.i.d. $(0, \sigma_\zeta^2)$. Therefore, the residual is a good candidate as an instrument for the lagged zBMI. More explicitly, MM show that a nonlinear IV version of Everaert’s (2013) technique (referred to as E-NLS-IV) to account for irregular spacing yields consistent estimators when $T \to \infty$ and the covariates are strictly exogenous and serially uncorrelated. Also, while the estimators are inconsistent when $T$ is fixed and $N \to \infty$, Monte Carlo simulations by MM suggest that this approach has superior small sample properties compared to other dynamic panel data estimators.

Second, some covariates (in particular, the individual’s fast food consumption, $y^f$) are likely to be correlated with the unobserved effect and/or to be serially correlated. In the first case, Everaert (2013) suggests using Hausman and Taylor (1981) type instruments for these covariates, that is, deviations from individual sample means (e.g., $\ddot{y}^f$). Also, in the presence of serially correlated covariates, one solution suggested by MM is to impute data for the missing periods. For instance, we can use the current value of covariates to approximate missing covariates between periods $m$ and $m-1$.21 Therefore, in Eq. (10), we can write:

$$\sum_{j=1}^{g_m - 1} \left(\pi_2 y^f_{i,t(m)-j} + \pi_3 \tilde{x}_{i,t(m)-j}\right)\pi_1^{j} \approx \left(\pi_2 y^f_{i,m} + \pi_3 \tilde{x}_{i,m}\right)\frac{1 - \pi_1^{g_m}}{1 - \pi_1}. \tag{11}$$

In this paper, we estimate the weight production function given by Eqs. (9)–(11) using a nonlinear instrumental approach and based on current values of covariates to approximate missing data for the missing periods. Following MM, we denote this estimator: E-NLS-IV-C. We also present a GMM version of this estimator using a two-step approach to obtain an optimal weighting matrix (clustered at the individual level).

As discussed earlier, our interest in this production function goes beyond a mere association between fast food consumption and weight. We are particularly interested in analyzing the magnitude of a change in zBMI resulting from a common exogenous shock on fast food consumption within the network, when peer effects are taken into account. Our two-equation model allows us to compute this result. Partially differentiating (8) with respect to $y^f_{it}$ and using the social multiplier [$= 1/(1-\beta)$] yields the magnitude of a short run change in zBMI (i.e., for $y^b_{i,t-1}$ given) resulting from a common marginal shock on fast food consumption: $\partial E(y^b_{it} \mid \cdot)/\partial \alpha_l = \frac{\pi_2}{1-\beta}$. This expression entails two components: the impact of the fast food consumption on zBMI ($= \pi_2$) and the multiplier effect ($= \frac{1}{1-\beta}$). In the long run, at the new stationary state, the impact of the shock on zBMI is given by $\frac{\pi_2}{(1-\beta)(1-\pi_1)}$.

20 One has $g_1 = 1$ since Wave I from Add Health data (which corresponds to $m = 0$) was collected in 1995.
21 No approximation is needed for variables such as age, for which we have perfect information at each period.
22 Adolescents were asked to nominate up to five female friends and five male friends.

5. Data and descriptive statistics

The Add Health survey is a longitudinal study that is nationally representative of American adolescents in grades 7 through 12. It is one of the most comprehensive health surveys that contains fairly exhaustive social, economic, psychological and physical well-being variables along with contextual data on the family, neighbourhood, community, school, friendships, peer groups, romantic relationships, etc. In wave I (September 1994 to April 1995), all students (around 90 000) attending the randomly selected high schools were asked to answer a short questionnaire. An in-home sample (core sample) of approximately 20 000 students was then randomly drawn from each school. These adolescents were asked to participate in a more extensive questionnaire where detailed questions were asked. Information on (but not limited to) health, nutrition, expectations, parents’ health, parent-adolescent relationship and friends nomination was gathered.22 This cohort was then followed
in-home in the subsequent waves in 1996 (wave II), 2001 (wave III) and 2008–2009 (wave IV). The extensive questionnaire was also used to construct the saturation sample that focuses on 16 selected schools (about 3000 students). Every student attending these selected schools answered the detailed questionnaire. There are two large schools and 14 other small schools. All schools are racially mixed and are located in major metropolitan areas except one large school that has a high concentration of white adolescents and is located in a rural area. Consequently, fast food consumption may be subject to downward bias if one accepts the argument that the fast food consumption among white adolescents is usually lower than that of black adolescents.

In this paper we use the saturation sample of the wave II in-home survey to investigate the presence of peer effects in fast food consumption.23 One of the innovative aspects of this wave is the introduction of the nutrition section. It reports among other things food consumption variables (e.g., fast food, soft drinks, desserts, etc.). This allows us to depict the food consumption patterns of each adolescent and relate them to those of his peer group. In addition, the availability of friend nomination allows us to retrace school friends and thus construct friendship networks. To estimate the weight production function, we considered information from wave I, wave II, wave III and wave IV.

We exploit friends nominations to construct the network of friends. Thus, we consider all nominated friends as network members regardless of the reciprocity of the nomination. If an adolescent nominates a friend then a link is assigned between these two adolescents (directed network with non symmetric links).

5.1. Descriptive statistics

In our social interactions equation, the dependent variable of interest is fast food consumption, as approximated by the reported frequency (in days) of fast food restaurant visits in the past 7 days. Table 1 reports respectively the mean and the standard deviation of the endogenous variable, the covariates used and other relevant characteristics. We note that on average, adolescents’ fast food consumption is about 2.33 times/week. This is consistent with the frequency reported by the Economic Research Service of the United States Department of Agriculture. Around 62% of the adolescents consumed fast food twice or more in the past week and 44% of the adolescents who had consumed fast food did so 3 times in the past week.

Table 1
Descriptive statistics.

Variable                                               Mean      SD
Fast food consumption (a)                              2.33     1.74
Female                                                 0.50     0.50
Age                                                   16.36     1.44
White                                                  0.57     0.49
Black                                                  0.15     0.34
Asian                                                  0.01     0.09
Native                                                 0.13     0.33
Other                                                  0.14     0.35
Mother present                                         0.85     0.35
Mother education
  No high school degree                                0.15     0.35
  High school/GED/vocational instead of high school    0.36     0.48
  Some college/vocational after high school            0.21     0.39
  College                                              0.18     0.38
  Advanced degree                                      0.06     0.24
  Don’t know                                           0.04     0.20
Father education
  No high school degree                                0.16     0.36
  High school/GED/vocational instead of high school    0.33     0.47
  Some college/vocational after high school            0.17     0.37
  College                                              0.18     0.38
  Advanced degree                                      0.08     0.26
  Don’t know                                           0.06     0.24
  Missing                                              0.02     0.16
Grade 7–8                                              0.11     0.32
Grade 9–10                                             0.27     0.44
Grade 11–12                                            0.62     0.48
Allowance per week                                     8.28    11.65
Observations                                           2355

(a) Frequency (in days) of fast food restaurant visits in the past week.

The covariates of the fast food peer effect equation include the adolescent’s personal characteristics, family characteristics as well as the corresponding contextual social effects. The personal characteristics are gender, age, ethnicity (white or other) and grade. We observe that 50% of the sample are females, that the mean age is 16.3 years and that 57% are white. Family characteristics are dummies for mother and father education. We observe that around 45% of mothers and fathers have at least some college education. To control further for parents’ income we use child allowance as a proxy. An adolescent’s allowance is on average 8.28 dollars per week, and around 50% of the adolescents in our sample have a weekly allowance. At this point, it is important to highlight that since we use cross section data, we do not have to control for fast food prices as they are taken into account by network fixed effects. As for the weight production function, the dependent variable that we use is zBMI in waves III and IV.24 The zBMI variables for each wave and the fast food variables for waves II, III and IV are detailed in Table 4.25

5.2. The construction of the graph matrix

We construct a graph sub-matrix for each school separately (matrix $G_l$) and then we include all these sub-matrices in the block-diagonal matrix $G$.26 As we have no prior information about how social interaction takes place, we assume, as in most studies, that an adolescent is equally influenced by his nominated friends. In each school we eliminate adolescents for whom we have missing values. As mentioned earlier, Bramoullé et al. (2009) show that the structural parameters are identified if the matrices $I$, $G$, $G^2$ and $G^3$ are linearly independent. This condition is verified with our data. We also compute the Belsley, Kuh, and Welsch condition index to check for the presence of collinearity between these matrices. If this index is below 30, then collinearity is said not to be a problem and linear independence of the four matrices is verified. In our data, the reflection problem is clearly solved since $I$, $G$, $G^2$ and $G^3$ are linearly independent and the condition index value is 2.21.
25
It is important to note that information on fast food consumption was not col-
lected in wave I.
23 26
It includes all meals that are consumed at a fast food restaurant such as McDon- Following the previous literature and given the lack of information on this mat-
ald’s, Burger King, Pizza Hut, Taco Bell and other fast food outlets. ter, we assume that there can be social interactions within each school but no
24
To compute the backward mean we used all four waves. interactions across schools.
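The construction of the graph matrix and the identification check described in Section 5.2 can be illustrated with a minimal sketch. It assumes NumPy; the function names (`row_normalized_graph`, `identification_check`) and the toy nomination lists are hypothetical and are not taken from the paper. The condition index is computed here as the ratio of the largest to the smallest singular value of the column-scaled matrix whose columns are the vectorized $I$, $G$, $G^2$ and $G^3$, which is one common way of implementing the Belsley, Kuh, and Welsch diagnostic.

```python
import numpy as np

def row_normalized_graph(n, nominations):
    """Directed interaction matrix: G[i, j] = 1 / (number of friends i nominates)."""
    G = np.zeros((n, n))
    for i, j in nominations:
        G[i, j] = 1.0  # i nominates j; reciprocity is not required
    out_degree = G.sum(axis=1, keepdims=True)
    # Equal weights over nominated friends; rows without nominations stay zero.
    return np.divide(G, out_degree, out=np.zeros_like(G), where=out_degree > 0)

def identification_check(G):
    """Rank and condition index of the vectorized matrices I, G, G^2, G^3."""
    n = G.shape[0]
    powers = [np.eye(n), G, G @ G, G @ G @ G]
    V = np.column_stack([p.ravel() for p in powers])  # one column per matrix
    rank = np.linalg.matrix_rank(V)
    V_unit = V / np.linalg.norm(V, axis=0)            # unit-length columns
    s = np.linalg.svd(V_unit, compute_uv=False)
    condition_index = s[0] / s[-1] if s[-1] > 0 else np.inf
    return rank, condition_index

# Hypothetical school with four adolescents and non-reciprocal nominations.
nominations = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]
G_school = row_normalized_graph(4, nominations)
rank, cond = identification_check(G_school)   # rank 4: the four matrices are independent

# Counterexample: everyone nominates everyone else, so G^2 is a combination of I and G.
complete = [(i, j) for i in range(3) for j in range(3) if i != j]
rank_complete, _ = identification_check(row_normalized_graph(3, complete))
```

In the complete network the rank drops to 2, which is the situation in which the reflection problem is not solved; the directed, incomplete network keeps the four matrices linearly independent.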
6. Results

6.1. Baseline: OLS peer effects estimates

We first estimate a naive OLS of the peer effects equation where we regress the fast food consumption of an adolescent on the average fast food consumption of his peers, his individual characteristics as well as the average characteristics of his peers. We then apply a panel-like within transformation to account for correlated effects (OLSw). It is clear that the estimates of naive OLS and OLSw are inconsistent. The former ignores both correlated effects and simultaneity problems while the latter ignores simultaneity problems. However, they are reported to provide a baseline for this study.

Estimation results reported in Table 2 show that there is a positive significant peer influence in fast food consumption. According to the naive OLS estimates, an adolescent would increase his weekly frequency (in days) of fast food restaurant visits by 0.21 in response to an extra day of fast food restaurant visits by his friends. On average, this corresponds to an increase of 9% (=0.21/2.33). The OLSw estimate is slightly lower (=0.15, or 6.6%). This reduction in the estimated effect may partly be explained by the fact that adolescents in the same reference group tend to choose a similar level of fast food consumption partly because they face a common environment or because adolescents with similar characteristics tend to attend the same school (homophily). As for the individual characteristics, age, father education and weekly allowance positively affect fast food consumption. Turning our attention to the contextual peer effects, we notice that fast food consumption decreases with mean peers’ mother’s education and increases with mean peers’ father’s education. The former result indicates that friends’ mother education negatively affects an adolescent’s fast food consumption.

6.2. GS-2SLS and QML peer effects estimates

Next, we estimate our peer effects equation with school fixed effects using GS-2SLS (with i.i.d. error terms and without imposing autoregressive disturbances: $\rho = 0$). We then estimate this equation using a QML approach with i.i.d. error terms. Also, we estimate another plausible version of this model by allowing network autoregressive disturbances (GSARAR model).

Estimation results are displayed in Table 3. The GS-2SLS approach (see last two columns) assumes that the instrument for $G^* y^*$ is given by $G^* \hat{y}^*$ (see Eq. 14).27 One can check whether this instrument is weak by regressing $G^* y^*$ on $G^* \hat{y}^*$, $X^*$ and $G^* X^*$ and performing a Stock-Yogo test (see Table 3). It consists in comparing the Cragg-Donald F statistic associated with the estimated coefficient of $G^* \hat{y}^*$ (= 17.80) with its critical value when one assumes a 10% tolerance28 for the size distortion of the 5% Wald test (=16.38). Based on this test, we reject that the instrument is weak. The endogenous effect resulting from GS-2SLS estimation is positive (=0.11 or 4.73%) but not significant. When using the GSAR QML approach, estimation results show a positive endogenous effect of 0.129 (or 5.3%). This estimate is very close to the one obtained by GS-2SLS and is slightly smaller than the OLS ones obtained in the previous sub-section. It is statistically significant at the 5% level if we perform a one-tail test (one-tail p-value = 0.039) but only at the 10% level if we consider a two-tail test (two-tail p-value = 0.0785).

While this result suggests that one has to be quite cautious when accepting the estimate, it can be argued that one should perform a one-tail test since one expects the endogenous peer effect to be either positive or zero. In that case, the social multiplier associated with an exogenous increase in an adolescent’s fast food consumption is 1.15 ($= \frac{1}{1-0.129}$) and is significantly different from 1 at the 10% level, based on a one-tail test (its standard error is 0.096 using the delta method, with a one-tail p-value of 0.059). This reflects a relatively low endogenous peer effect.

How can we compare these results to those obtained previously in the related literature? Although there are few studies that investigated the presence of peer effects in fast food consumption using the linear-in-means equation, a richer body of literature has investigated a tangent issue: obesity. As compared with endogenous effects obtained in the literature on obesity, our peer effect is intermediate between studies that obtain no peer effects (Cohen-Cole and Fletcher, 2008b) and the literature that provides evidence that peer effects are strong, for instance, generating a social multiplier larger than 1.5 (e.g., Christakis and Fowler, 2007; Trogdon et al., 2008).29

To check the sensitivity of these results to the presence of SAR disturbances, we also estimate our model using a GSARAR QML specification. The estimated spatial autocorrelation coefficient is negative but not significant at the 5% level. Moreover the endogenous peer effect is large (=0.3655) but no longer significant even at the 10% level (one-tail test). Also a likelihood test does not reject the GSAR QML specification. Therefore we consider the latter as our preferred one. This suggests a much lower endogenous peer effect (=0.13), which can be interpreted as a lower bound to this parameter, at least when assuming that selection on unobservables is not an important source of biases, after controlling for network fixed effects and observable characteristics (see our discussion above).

To sum up, we can say that results in general are consistent with the hypothesis that fast food consumption is linked to issues of interactions with friends. However, our social multiplier estimate does not appear to be very strong (as the endogenous effects are less than 0.3), at least when we consider a specification which seems reasonable. This result, despite its small magnitude, addresses the puzzle around the behavioural channels through which peer effects in weight gain flow. Indeed, while Yakusheva et al. (2014), in their attempt to uncover the channels through which these effects flow, have tested for two behavioural channels, exercise and eating disorders (e.g., anorexia), they could not test for the presence of peer effects in eating habits due to data limitations.

As for estimated individual effects and focusing on the GSAR QML specification, they follow the baseline model fairly closely. Fast food consumption is positively associated with age and father’s education as well as with weekly allowance. Mother’s education seems to have a negative but non-significant impact on fast food consumption. It is important to note that while the general perception is that fast food is an inferior good, the empirical evidence suggests a positive income elasticity (Aguiar and Hurst, 2005). The positive relation between fast food consumption and allowance is thus in line with the positive relation between income and fast food consumption.

One advantage of our spatial approach is that it allows us to identify both endogenous and contextual peer effects. Turning our attention to the latter, we note in particular that an adolescent’s fast food consumption decreases with peers’ mother’s education

27 The star superscript indicates that the original variable has been transformed to eliminate the problem of singular variance matrix generated by the use of the within transformation to eliminate fixed network effects. See Appendix.
28 This level of tolerance is the smallest one that can be computed given that there is only one excluded instrument.
29 More specifically, Cohen-Cole and Fletcher (2008b) find a statistically insignificant social multiplier of 1.03, Christakis and Fowler (2007) find a statistically significant social multiplier of 2.63 and Trogdon et al. (2008) find a statistically significant social multiplier of 2.08.
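The social multiplier and its delta-method standard error reported in Section 6.2 can be checked with a few lines of arithmetic. The sketch below uses only the Python standard library and takes the GSAR QML estimate and standard error from Table 3 (0.1293 and 0.0729). The function names are hypothetical, and the one-tail p-value is computed under a standard normal approximation, which is an assumption: the paper does not state the reference distribution, so the result is only expected to be close to the reported 0.059.

```python
import math

def social_multiplier(beta, se_beta):
    """Multiplier 1/(1 - beta) and its delta-method standard error."""
    m = 1.0 / (1.0 - beta)
    # d m / d beta = 1 / (1 - beta)^2, so se(m) = se(beta) / (1 - beta)^2
    se_m = se_beta / (1.0 - beta) ** 2
    return m, se_m

def one_tail_p_value(z):
    """Upper-tail probability of a standard normal variate (normal approximation)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

multiplier, se_multiplier = social_multiplier(0.1293, 0.0729)
z_stat = (multiplier - 1.0) / se_multiplier      # test of H0: multiplier = 1
p_one_tail = one_tail_p_value(z_stat)
```

The multiplier rounds to 1.15 and its standard error to 0.096, in line with the figures quoted in the text; the approximate one-tail p-value lands near 0.06.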
Table 2
Peer effects in fast food consumption.
OLS OLSw
Coef. S.E. Coef. S.E.
Endogenous peer effects 0.2078 *** 0.0331 0.1548 *** 0.0344

Individual characteristics
Female −0.0721 0.0787 −0.0847 0.0789
Age 0.1559 *** 0.0434 0.1315 *** 0.0461
White −0.1076 0.0940 −0.0602 0.1127
Mother present −0.0152 0.0997 −0.0358 0.0989
Mother no high school (omitted)

Mother high school −0.0848 0.1195 −0.0455 0.1202
Mother some college −0.0377 0.1335 −0.0210 0.1340
Mother college 0.0214 0.1421 −0.0137 0.1425
Mother advanced −0.0259 0.1875 −0.0353 0.1877
Mother don’t know −0.1714 0.2067 −0.2124 0.2059
Father no high school (omitted)

Father high school 0.2743 ** 0.2067 0.2682 ** 0.1167
Father some college 0.2117 0.2067 0.1971 0.1338
Father college 0.3115 ** 0.1375 0.2592 * 0.1381
Father advanced 0.1732 0.1752 0.1294 0.1760
Father don’t know 0.2778 0.1756 0.2393 0.1750
Father missing 0.0908 0.2338 0.0477 0.2331
Grade 7–8 (Omitted)

Grade 9–10 0.0883 0.1931 −0.0776 0.2183
Grade 11–12 0.3164 0.2265 0.1269 0.2526
Allowance per week 0.0093 *** 0.0031 0.0074 ** 0.0031
Contextual peer effects

Female −0.0898 0.1245 −0.1071 0.1285
Age −0.0321 0.0215 0.0316 0.0718
White 0.0111 0.1244 −0.0055 0.1694
Mother present 0.0773 0.1668 0.1008 0.1707

Mother high school −0.3878 ** 0.1868 −0.2977 0.1913
Mother some college −0.3947 * 0.2127 −0.3825 * 0.2168
Mother college −0.2531 0.2180 −0.2935 0.2213
Mother advanced −0.7011 ** 0.3089 −0.5954 * 0.3112
Mother don’t know −0.4337 0.3598 −0.4150 0.3610

Father high school 0.2060 0.1943 0.2999 0.1914
Father some college 0.3639 * 0.2128 0.3890 * 0.2139
Father college 0.2850 0.2238 0.3068 0.2263
Father advanced 0.2760 0.2891 0.2171 0.2953
Father don’t know 0.4737 0.2995 0.5358 * 0.3001
Father missing 0.6931 0.4619 0.7692 * 0.4640
Grade 7–8 (Omitted)

Grade 9–10 −0.0769 0.2383 0.0104 0.2773
Grade 11–12 −0.0094 0.2630 −0.0396 0.3388
Allowance per week 0.0056 ** 0.0053 0.0043 0.0054

Constant −0.5199 0.6618
N = 2339
* Significant at 10% level.
** Significant at 5% level.
*** Significant at 1% level.
but increases with mean peers’ father’s education. While the former causal effect seems natural, as mothers with higher education may (directly and indirectly) encourage both their children and their friends to have better eating habits, the latter effect is rather puzzling. One partial explanation is that fathers with higher education are more likely to be absent from home. Therefore they have less positive influence on their children’s and friends’ eating habits.

6.3. Weight production function estimates

Estimation results presented in the earlier sections are consistent with the presence of peer effects in fast food consumption. Nevertheless, we still need to provide evidence of the presence of a relationship between fast food consumption and weight gain. In this section we report estimates of the weight production function presented earlier.

Results from the estimation of the production function are reported in Table 5. Specification (1) shows baseline OLS estimates of Eq. (8) (where t is replaced by m), specification (2) shows the NLS estimates of Eq. (9), specification (3) shows E-NL-IV-C estimation results for Eq. (9) and finally specification (4) shows a GMM version of the previous estimator using a two-step procedure with an optimal weighting matrix. All specifications are estimated using waves III and IV, with information from waves I and II used to construct the instruments.
Table 3
Peer effects in fast food consumption GSAR, GSARAR and GS-2SLS.
Quasi maximum likelihood GS-2SLS
GSAR S.E. GSARAR S.E. GSAR S.E.

Endogenous peer effects 0.1293 *† 0.0729 0.3511 3.3869 0.1102 0.3945
Spatial autocorrelation coefficient (GSARAR) −0.2342 3.6954
Individual characteristics
Female −0.0787 0.0589 −0.0861 0.0832 −0.0839 0.0780
Age 0.1401 *** 0.0479 0.1477 0.1386 0.1346 ** 0.0531
White −0.0622 0.0795 −0.0582 0.2083 −0.0619 0.1169
Mother present −0.0319 0.0760 −0.0278 0.1097 −0.0375 0.0973

Mother high school −0.0333 0.0510 −0.0549 0.3647 −0.0437 0.1212
Mother some college −0.0106 0.1156 −0.0258 0.2193 −0.0161 0.1401
Mother college 0.0040 0.1043 −0.0025 0.1384 −0.0143 0.1449
Mother advanced −0.0159 0.1737 −0.0555 0.5745 −0.0366 0.1873
Mother don’t know −0.2193 *** 0.0813 −0.2327 ** 0.1101 −0.2138 0.2098

Father high school 0.2774 ** 0.1165 0.2680 0.4137 0.2689 ** 0.1179
Father some college 0.2027 ** 0.0889 0.1923 0.4408 0.1957 0.1370
Father college 0.2775 *** 0.0837 0.2800* 0.1540 0.2577 * 0.1364
Father advanced 0.1336 0.1821 0.1418 0.1918 0.1275 0.1785
Father don’t know 0.2512 *** 0.0917 0.2519 0.2648 0.2420 0.1742
Father missing 0.0548 0.1302 0.0721 0.1264 0.0515 0.2308
Grade 7–8 (omitted)

Grade 9–10 −0.1499 0.2569 −0.1444 0.2677 −0.0789 0.1957
Grade 11–12 0.0225 0.2706 0.0329 0.3268 0.1249 0.2300
Allowance per week 0.0076 *** 0.0016 0.0076 *** 0.0016 0.0075 ** 0.0032
Contextual peer effects

Female −0.1563 0.1049 −0.1721 0.1178 −0.1108 0.1323
Age −0.0379 ** 0.0152 0.0158 0.5929 0.0359 0.0813
White 0.0079 0.1064 0.0246 0.4373 −0.0159 0.1865
Mother present 0.0651 0.2354 0.0955 0.6757 0.1078 0.1794

Mother high school −0.3181 ** 0.1327 −0.4149 0.6549 −0.3001 0.1858
Mother some college −0.4254 *** 0.1534 −0.4448 0.2741 −0.3882 * 0.2213
Mother college −0.3443 0.2255 −0.3462 0.3121 −0.3080 0.2496
Mother advanced −0.6565 ** 0.3032 −0.6871 0.9187 −0.5775 * 0.3407
Mother don’t know −0.4664 * 0.2508 −0.5488 * 0.2901 −0.4038 0.3604

Father high school 0.3197 *** 0.1211 0.3074 0.4624 0.3290 0.3128
Father some college 0.3890 *** 0.1161 0.3476 0.5849 0.4051 * 0.2440
Father college 0.3214 0.2128 0.3354 0.2238 0.3298 0.3035
Father advanced 0.1765 0.2179 0.2330 0.5672 0.2341 0.3299
Father don’t know 0.5544 *** 0.1172 0.5478 0.7167 0.5684 0.3985
Father missing 0.7770 ** 0.3058 0.7783 0.5355 0.7769 * 0.4461
Grade 7–8 (omitted)

Grade 9–10 0.1998 0.2873 0.1900 0.4396 0.0058 0.2520
Grade 11–12 0.3272 0.2311 0.384 0.3932 −0.0341 0.3195
Allowance per week 0.0025 0.0033 0.0047 0.0237 0.0048 0.0072
Log Likelihood −4488.847 −4487.55

N = 2339
Stock-Yogo test 17.80
Critical value (r = 0.10) at the 5% significance level 16.38
† One-tail test: significant at 5% level.
* Two-tail test: significant at 10% level.
** Two-tail test: significant at 5% level.
*** Two-tail test: significant at 1% level.
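The GS-2SLS column of Table 3 comes from the two-step instrumental variable procedure laid out in Appendix B. The following is a minimal sketch of that procedure, assuming NumPy and ignoring the network fixed effects (that is, it works directly on untransformed variables rather than on the starred ones). The function name `gs2sls`, the simulated network and the parameter values are hypothetical illustrations and not the authors' implementation. The data are generated without noise, so that both steps should recover the parameters up to numerical precision; with real data the two steps would generally differ.

```python
import numpy as np

def gs2sls(y, X, G):
    """Two-step GS-2SLS for y = beta*G y + X gamma + G X delta + error."""
    n, k = X.shape
    GX = G @ X
    X_tilde = np.column_stack([G @ y, X, GX])   # regressors [Gy, X, GX]
    S = np.column_stack([X, GX, G @ GX])        # step-1 instruments [X, GX, G^2 X]

    # Step 1: 2SLS, theta1 = (X~' P X~)^{-1} X~' P y with P the projection on S.
    fitted = S @ np.linalg.lstsq(S, X_tilde, rcond=None)[0]   # P X~
    theta1 = np.linalg.solve(fitted.T @ X_tilde, fitted.T @ y)
    beta1, gamma1, delta1 = theta1[0], theta1[1:1 + k], theta1[1 + k:]

    # Step 2: instrument Gy with G y_hat from the reduced form (Eq. 14).
    y_hat = np.linalg.solve(np.eye(n) - beta1 * G, X @ gamma1 + GX @ delta1)
    Z = np.column_stack([G @ y_hat, X, GX])
    theta2 = np.linalg.solve(Z.T @ X_tilde, Z.T @ y)
    return theta1, theta2   # each ordered as (beta, gamma, delta)

# Hypothetical network: each of 60 adolescents nominates three friends.
rng = np.random.default_rng(0)
n, k = 60, 2
G = np.zeros((n, n))
for i in range(n):
    friends = rng.choice(np.delete(np.arange(n), i), size=3, replace=False)
    G[i, friends] = 1.0 / 3.0
X = rng.normal(size=(n, k))
beta, gamma, delta = 0.3, np.array([1.0, -0.5]), np.array([0.4, 0.2])
y = np.linalg.solve(np.eye(n) - beta * G, X @ gamma + G @ X @ delta)  # noise-free outcome
theta1, theta2 = gs2sls(y, X, G)
```

Note that X deliberately contains no constant column: with a row-normalized G, a constant and its network average coincide, which would make the instrument matrix singular.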
In line with our expectations, the general results indicate that lagged zBMI and current fast food consumption have a positive and significant effect on current zBMI (an effect which lies between 0 and 1 in the case of lagged zBMI). These results seem to be robust across specifications, with some differences that can be explained by differences in the assumptions made on the DGP. More specifically, results in specification (1), our baseline specification, are comparable to previous findings by Niemeier et al. (2006), who used the same data set. The impact of lagged zBMI is 0.7591 (compared to 0.7600 for Niemeier et al. (2006)). As for the impact of fast food consumption, it is 0.014 (compared to 0.020 for Niemeier et al. (2006)).30

30 It is important to note that Niemeier et al. (2006) used a different wave and a different approach.
Table 4
Descriptive statistics-production function.
Variable Mean S.D.
zBMI wave I 0.4017 1.0178
zBMI wave II 0.4485 1.0210
zBMI wave III 0.7279 1.0903
zBMI wave IV 1.0173 0.9712
Fast food wave II 2.3869 1.7810
Fast food wave III 2.6206 2.0685
Fast food wave IV 2.2602 2.0790
Age wave II 16.5741 1.5674
Age wave III 22.0200 1.5613
Age wave IV 29.1677 1.5417
Female 0.5156 0.4999
White 0.6250 0.4842
Obs. 1848

Comparing specification (1) with specification (2), we notice that the estimate of the fast food consumption marginal effect is much higher in the former than in the latter case (0.0145 vs. 0.0039). The basic explanation is that when estimating the parameter $\pi_2$ in specification (1), we are not accounting for missing data on $y_{it}^{f}$ in time intervals between $m$ and $m - 1$. In fact, we are estimating $\pi_2 \left( \frac{1-\pi_1^{g_m}}{1-\pi_1} \right)$, with $\left( \frac{1-\pi_1^{g_m}}{1-\pi_1} \right) > 1$, instead of estimating $\pi_2$. Also the estimate associated with the lagged endogenous variable is smaller in specification (1) than in specification (2) (i.e., 0.7591 vs. 0.9545). The explanation is that in the OLS specification we are implicitly estimating $\pi_1^{g_m}$ with $|\pi_1| < 1$, as we are ignoring the irregular time intervals.

While specification (2) accounts for the unequally spaced intervals, it does not account for the correlation between lagged zBMI and the time-invariant unobserved effect and the idiosyncratic error term of zBMI. Further, it does not take into account the fact that fast food consumption may be correlated with the time-invariant unobserved effect. To account for these possibilities, we follow MM and instrument lagged zBMI using the OLS residuals of the regression of lagged zBMI on its backward mean. As for fast food consumption, it is instrumented using Hausman and Taylor (1981) type instruments, that is, by taking the deviations of fast food consumption with respect to the individual sample mean as an instrument.

The estimation of specification (3) shows results that are consistent with previous results, with some differences in the magnitude of the parameters. The estimated parameter associated with lagged zBMI is smaller than the one obtained in the second specification (0.8461 vs. 0.9545). As for the estimated parameter for fast food consumption, it is marginally higher when using E-NL-IV-C (0.0040 vs. 0.0039). All results remain statistically significant. To complement results from the E-NL-IV-C we estimate an optimal GMM version (with a weighting matrix clustered at the individual level) of the same model. Estimation results are again within the expected lines and consistent with previous estimates, with some variation in the estimates of the lagged zBMI (0.8447 vs. 0.8461) and fast food consumption parameters (0.0031 vs. 0.0040).

We retain specification (4) as our preferred one as it provides optimal and robust estimators. Results from this specification show that lagged zBMI has a positive significant effect on current zBMI level (=0.8447). This suggests that an exogenous shock on weight has a stronger effect in the long term than in the short term. Based on specification (4), an extra day of fast food restaurant visits per week increases zBMI by 0.02 ($= \frac{0.0031}{1-0.8447}$) zBMI points or 4.45% in the long term. The presence of a causal link between fast food consumption and zBMI does not come as a surprise since previous findings have been pointing in this direction (e.g., Levitsky et al., 2004; Niemeier et al., 2006; Rosenheck, 2008). Combining the impact of fast food on weight gain with the social multiplier, our results suggest that an extra day of fast food restaurant visits per week leads to a zBMI increase of 0.0351 zBMI points, or 5.11% (=4.45% × 1.15) on average, as compared with 4.45% with no peer effects. These results highlight a role for peer effects in fast food consumption as one transmission mechanism through which weight gain is amplified.

As for the other covariates, age has a positive significant effect on zBMI, an additional year increasing zBMI by 0.0072 zBMI points. Also being female and being white have a negative effect on zBMI (respectively 0.0176 and 0.0153). To test for the relevance of the instruments we use the Kleibergen-Paap rank test (which is used
Table 5
Weight production function.
Spec 1 Spec 2 Spec 3 Spec 4

OLS NLS E-NL-IV-C GMM (Adj. W)
Fast food consumption 0.0145*** 0.0039*** 0.0040** 0.0031**

(0.0050) (0.0009) (0.0020) (0.0014)
zBMI (lagged) 0.7591*** 0.9545*** 0.8461*** 0.8447***
(0.0115) (0.0024) (0.0186) (0.0146)
Age 0.0164*** 0.0031*** 0.0072*** 0.0072***
(0.0013) (0.0001) (0.0008) (0.0006)
Female = 1 −0.0189 −0.0022 −0.0188* −0.0176**
(0.0187) (0.0034) (0.0108) (0.0077)
White = 1 −0.0317* −0.0030 −0.0158 −0.0153**
(0.0190) (0.0034) (0.0107) (0.0077)
Obs. 3696 3696 3696 3696
Kleibergen Paap rk Wald F statistic 210.67 210.67

Stock-Yogo test critical value (r = 0.10) at the 5% significance level 7.03 7.03
Excluded instruments
Residual No No Yes Yes
Fast food de-meaned No No Yes Yes
All S.E. are robust and clustered at individual level.

GMM weighted matrix clustered at the individual level.
1848 observations per wave.
* Significant at 10% level.
** Significant at 5% level.
*** Significant at 1% level.
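The long-run effect implied by the specification (4) estimates in Table 5 can be reproduced with simple arithmetic. The sketch below is a check under stated assumptions, not the authors' code: the variable names are hypothetical, and the percentage is computed relative to the wave II mean zBMI of 0.4485 from Table 4, which is an assumption that happens to reproduce the reported 4.45% but is not stated explicitly in the text. The social multiplier uses the GSAR QML estimate of 0.1293 from Table 3.

```python
# Coefficients from Table 5, specification (4), and Table 3 (GSAR QML).
fast_food_coef = 0.0031      # short-run effect of one extra day of fast food per week
lagged_zbmi_coef = 0.8447    # persistence of zBMI
peer_effect = 0.1293         # endogenous peer effect

# Long-run effect of a permanent one-day increase: coefficient / (1 - persistence).
long_run_effect = fast_food_coef / (1.0 - lagged_zbmi_coef)

# Assumed base for the percentage: mean zBMI in wave II (Table 4).
mean_zbmi_wave2 = 0.4485
long_run_pct = 100.0 * long_run_effect / mean_zbmi_wave2

# Amplification through the social multiplier 1 / (1 - peer effect).
multiplier = 1.0 / (1.0 - peer_effect)
long_run_pct_with_peers = long_run_pct * multiplier
```

Under these assumptions the long-run effect is about 0.02 zBMI points, or roughly 4.45%, rising to roughly 5.11% once the multiplier is applied.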
here to take into account the fact that the estimated standard errors are panel clustered). The test rejects the null hypothesis that the instruments are weak.31

7. Conclusion

This paper investigates whether peer effects in adolescent weight partly flow through the eating habits channel. We first attempt to study the presence of significant endogenous peer effects in fast food consumption. New methods based on spatial econometric analysis are used to identify and estimate our model, under the assumption that individuals interact through a friendship social network. Our results indicate that an increase in his friends’ mean fast food consumption induces an adolescent to increase his own fast food consumption. This peer effect amplifies, through a social multiplier, the impact of any exogenous shock on fast food consumption. However, our estimated social multiplier based on our preferred (conservative) specification is small, as it is equal to 1.15.

We also estimate a dynamic weight production function which relates the individual’s Body Mass Index to his fast food consumption. Our results reveal a positive significant impact of a change in fast food consumption on the change in zBMI. Specifically, in the long run, a one-unit increase in the weekly frequency (in days) of fast food consumption produces an increase in zBMI of 4.45%. This effect reaches 5.11% when the social multiplier is taken into account. This suggests the presence of a positive but low endogenous peer effect. In short, our results are intermediate between studies on overweight or obesity that report no peer effects (e.g., Cohen-Cole and Fletcher, 2008b) and others that provide evidence of strong peer effects (e.g., Trogdon et al., 2008; Christakis and Fowler, 2007).

Coupled with the reduction in the relative price of fast food and the increasing availability of fast food restaurants over time, the social multiplier could somewhat increase the prevalence of obesity in the years to come. Conversely, this multiplier may contribute to the decline of the spread of obesity and the decrease in health care costs, as long as it is exploited by policy makers through tax and subsidy reforms encouraging adequate eating habits among adolescents, or used to implement network-based interventions to promote healthy eating behaviours (Fletcher et al., 2011).

There are many possible extensions to this paper. From a policy perspective, it would be interesting to investigate the presence of peer effects in physical activity of adolescents. A recent study by Charness and Gneezy (2009) finds that there is room for intervention in people’s decisions to perform physical exercise through financial incentives. It would thus be valuable to investigate whether there is a social multiplier that can be exploited to amplify these effects. Furthermore, in the same way, it would be interesting to study the presence of peer effects in weight perceptions. So far, most of the peer effects work has focused mainly on BMI outcomes. At the methodological level, a possible extension would be to assume a Poisson or a Negative Binomial distribution to account for the count nature of the consumption data at hand. As far as we know, no work has been carried out in this area. Finally, it would be most useful to develop a general approach that would allow same-sex and opposite-sex peer effects to be different for both males and females.

31 As the model is exactly identified, it is not possible to conduct an over-identification test.

Appendix A. Quasi maximum likelihood (QML) of the peer effects model

Let us rewrite Eq. (7) for convenience:

$$K_l M_l y_l = \beta K_l M_l G_l y_l + K_l M_l X_l \gamma + K_l M_l G_l X_l \delta + \nu_l .$$

The elimination of fixed network effects using a within transformation leads to a singular variance matrix such that $E(\nu_l \nu_l' \mid X_l, G_l) = K_l K_l' \sigma^2 = K_l \sigma^2$. To resolve this problem of linear dependency between observations, we follow a suggestion by Lee et al. (2010), applied by Lin (2010). Let $[Q_l \; C_l]$ be the orthonormal matrix of $K_l$, where $Q_l$ corresponds to the eigenvalues of 1 and $C_l$ to the eigenvalues of 0. The matrix $Q_l$ has the following properties: $Q_l' Q_l = I_{n_l^*}$, $Q_l Q_l' = K_l$ and $Q_l' \iota = 0$, where $n_l^* = n_l - 1$ with $n_l$ being the number of adolescents in the $l$th network. Pre-multiplying (7) by $Q_l'$, the structural model can now be written as follows:

$$M_l^* y_l^* = \beta M_l^* G_l^* y_l^* + M_l^* X_l^* \gamma + M_l^* G_l^* X_l^* \delta + \nu_l^*, \tag{12}$$

where $M_l^* = Q_l' M_l Q_l$, $y_l^* = Q_l' y_l$, $G_l^* = Q_l' G_l Q_l$, $X_l^* = Q_l' X_l$, and $\nu_l^* = Q_l' \nu_l$. With this transformation, our problem of dependency between the observations is solved, since we have $E(\nu_l^* \nu_l^{*\prime} \mid X_l, G_l) = \sigma^2 I_{n_l^*}$.

Assuming that $\nu_l^*$ is an $n_l^*$-dimensional i.i.d. normally distributed disturbance vector, the log-likelihood function of (12) is given by:

$$\ln L = -\frac{n^*}{2} \ln(2\pi\sigma^2) + \sum_{l=1}^{L} \ln |I_{n_l^*} - \beta G_l^*| + \sum_{l=1}^{L} \ln |I_{n_l^*} - \rho G_l^*| - \frac{1}{2\sigma^2} \sum_{l=1}^{L} \nu_l^{*\prime} \nu_l^*, \tag{13}$$

where $n^* = \sum_{l=1}^{L} n_l^* = N - L$, and, from (12), $\nu_l^* = M_l^* (y_l^* - \beta G_l^* y_l^* - X_l^* \gamma - G_l^* X_l^* \delta)$. Maximizing (13) with respect to $(\beta, \gamma', \delta', \rho, \sigma)$ yields the maximum likelihood estimators of the model. Interestingly, the QML method is implemented after the elimination of the network fixed effects. Therefore, the estimators are not subject to the incidental parameter problem that may arise since the number of fixed effects increases with the size of the networks sample. To compute robust standard errors, we use a sandwich form $A^{-1} B A^{-1}$, where $A$ is minus the expectation of the Hessian matrix and $B$ is the expectation of the outer product of the gradient matrix. An advantage of this approach is that it allows us to obtain robust standard errors that are not driven by the normality assumption that ML imposes on the error term.

Appendix B. Generalized spatial two stage least squares (GS-2SLS) of the peer effects model

To estimate the model (12), we also adopt a generalized spatial two-stage least squares procedure presented in Lee et al. (2010). This approach provides a simple and tractable numerical method to obtain asymptotically efficient IV estimators within the class of IV estimators. In the case of our paper this method will consist of a two-step estimation.32 To simplify the notation, let $X^*$ be a block-diagonal matrix with $X_l^*$ on its diagonal, $G^*$ be a block-diagonal matrix with $G_l^*$ on its diagonal, and $y^*$ the concatenated vector of the $y_l^*$’s over all networks.

32 Note that for this particular case we impose $\rho = 0$ and thus $M_l = I_l$.
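Both appendices work with the transformed (starred) variables of Eq. (12). As a numerical check of the properties of $Q_l$ stated in Appendix A, the following minimal sketch builds the within-transformation matrix for a single network and extracts $Q_l$ from its eigenvectors. It assumes NumPy; the function name `within_basis` and the network size are hypothetical, and the vector of ones is written `ones` in the code.

```python
import numpy as np

def within_basis(n):
    """Within-transformation matrix K and eigenvector blocks Q (eigenvalue 1), C (eigenvalue 0)."""
    K = np.eye(n) - np.ones((n, n)) / n      # deviation-from-network-mean operator
    eigvals, eigvecs = np.linalg.eigh(K)     # eigenvalues in ascending order: 0, then 1 (n-1 times)
    C = eigvecs[:, :1]                       # eigenvector for the zero eigenvalue
    Q = eigvecs[:, 1:]                       # n x (n-1) block for the unit eigenvalues
    return K, Q, C

n_l = 5                                      # hypothetical network with five adolescents
K, Q, C = within_basis(n_l)
ones = np.ones(n_l)

orthonormal = np.allclose(Q.T @ Q, np.eye(n_l - 1))        # Q'Q = I_{n*}
reproduces_K = np.allclose(Q @ Q.T, K)                     # QQ' = K
kills_constant = np.allclose(Q.T @ ones, 0.0)              # Q'iota = 0
# Variance of the transformed error: Q' (K sigma^2) Q = sigma^2 I_{n*}
nonsingular_variance = np.allclose(Q.T @ K @ Q, np.eye(n_l - 1))
```

The last check illustrates why the transformation removes the singularity: the $n_l \times n_l$ variance matrix $K_l \sigma^2$ has rank $n_l - 1$, while the transformed $(n_l - 1)$-dimensional error has a scalar variance matrix.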
Now, let us denote by $\tilde{X}^*$ the matrix of explanatory variables such that $\tilde{X}^* = [G^* y^* \;\; X^* \;\; G^* X^*]$. Let $P$ be the weighting matrix such that $P = S(S'S)^{-1}S'$, and $S$ a matrix of instruments such that $S = [X^* \;\; G^* X^* \;\; G^{*2} X^*]$. In the first step, we estimate the following 2SLS estimator:

$$\hat{\theta}_1 = (\tilde{X}^{*\prime} P \tilde{X}^*)^{-1} \tilde{X}^{*\prime} P y^*,$$

where $\hat{\theta}_1$ is the first-step 2SLS vector of estimated parameters $(\hat{\gamma}_1', \hat{\delta}_1', \hat{\beta}_1)$ of the structural model. This estimator is consistent but not asymptotically efficient within the class of IV estimators.

Now, in the second step, we estimate a 2SLS using a new matrix of instruments $\hat{Z}$ given by:

$$\hat{Z} = [G^* \hat{y}^* \;\; X^* \;\; G^* X^*],$$

where $G^* \hat{y}^*$ is computed from the first-step 2SLS reduced form (pre-multiplied by $G^*$):

$$G^* \hat{y}^* = G^* (I - \hat{\beta}_1 G^*)^{-1} (X^* \hat{\gamma}_1 + G^* X^* \hat{\delta}_1). \tag{14}$$

We then estimate:

$$\hat{\theta}_2 = (\hat{Z}' \tilde{X}^*)^{-1} \hat{Z}' y^* .$$

This estimator can be shown to be a consistent and asymptotically best IV estimator. Its asymptotic variance matrix is given by $N[Z' \tilde{X}^* R^{-1} \tilde{X}^{*\prime} Z]^{-1}$. The matrix $R$ is consistently estimated by $\hat{R} = s^2 \frac{\hat{Z}'\hat{Z}}{N}$, where $s^2 = N^{-1} \sum_{i=1}^{N} \hat{u}_i^2$ and $\hat{u}_i$ are the residuals from the second step. It is important to note that, as in Kelejian and Prucha (1998), we assume that errors are homoscedastic. The estimation theory developed by Kelejian and Prucha (1998) under the assumption of homoscedastic errors does not apply if we assume heteroscedastic errors (Kelejian and Prucha, 2010).

References

Aguiar, M., Hurst, E., 2005. Consumption versus expenditure. Journal of Political Economy 113 (5), 919–948.
Ali, M.M., Amialchuk, A., Heiland, F.W., 2011. Weight-related behavior among adolescents: the role of peer effects. PLoS ONE 6 (6), e21179.
Ali, M.M., Amialchuk, A., Renna, F., 2011. Social network and weight misperception among adolescents. Southern Economic Journal 77 (4), 827–842.
Charness, G., Gneezy, U., 2009. Incentives to exercise. Econometrica 77 (3), 909–931.
Christakis, N.A., Fowler, J.H., 2013. Social contagion theory: examining dynamic social networks and human behavior. Statistics in Medicine 32 (4), 556–577.
Christakis, N., Fowler, J., 2007. The spread of obesity in a large social network over 32 years. New England Journal of Medicine 357 (4), 370–379.
Cliff, A., Ord, J., 1981. Spatial Processes: Models & Applications. Pion Ltd.
Cohen-Cole, E., Fletcher, J., 2008a. Detecting implausible social network effects in acne, height, and headaches: longitudinal analysis. British Medical Journal 337, a2533.
Cohen-Cole, E., Fletcher, J., 2008b. Is obesity contagious? Social networks vs. environmental factors in the obesity epidemic. Journal of Health Economics 27 (5), 1143–1406.
Currie, J., DellaVigna, S., Moretti, E., Pathania, V., 2010. The effect of fast food restaurants on obesity and weight gain. American Economic Journal: Economic Policy 2, 34–65.
Cutler, D., Glaeser, E., Shapiro, J., 2003. Why have Americans become more obese? Journal of Economic Perspectives 17, 93–118.
De la Haye, K., Robins, G., Mohr, P., Wilson, C., 2010. Obesity-related behaviors in adolescent friendship networks. Social Networks 32 (3), 161–167.
Dunn, R.A., Sharkey, J.R., Horel, S., 2012. The effect of fast-food availability on fast-food consumption and obesity among rural residents: an analysis by race/ethnicity. Economics & Human Biology 10 (1), 1–13.
Everaert, G., 2013. Orthogonal to backward mean transformation for dynamic panel data models. Econometrics Journal 16, 179–221.
Finkelstein, E.A., Trogdon, J.G., Cohen, J.W., Dietz, W., 2009. Annual medical spending attributable to obesity: payer-and service-specific estimates. Health Affairs 28 (5), w822–w831.
Fletcher, A., Bonell, C., Sorhaindo, A., 2011. You are what your friends eat: systematic review of social network analyses of young people’s eating behaviours and bodyweight. Journal of Epidemiology and Community Health, jech-2010.
Fowler, J., Christakis, N., 2008. Estimating peer effects on health in social networks. Journal of Health Economics 27 (5), 1400–1405.
Goldsmith-Pinkham, P., Imbens, G.W., 2013. Social networks and the identification of peer effects. Journal of Business and Economic Statistics 31 (3), 253–264.
Hausman, J.A., Taylor, W.E., 1981. Panel data and unobservable individual effects. Econometrica 49 (6), 1377–1398.
Hsieh, C.-S., Lee, L.-F., 2011. A social interactions model with endogenous friendship formation and selectivity, Working paper. Mimeo.
Kelejian, H., Prucha, I., 1998. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics 17 (1), 99–121.
Kelejian, H., Prucha, I., 2010. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Journal of Econometrics 157, 53–67.
Lee, L., 2003. Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econometric Reviews 22 (4), 307–335.
Lee, L.-F., Liu, X., Lin, X., 2010. Specification and estimation of social interaction models with networks structure. Econometrics Journal 13 (2), 143–176.
Levitsky, D., Halbmaier, C., Mrdjenovic, G., 2004. The freshman weight gain: a model for the study of the epidemic of obesity. International Journal of Obesity 28 (11), 1435–1442.
Alviola, I.V., Nayga Jr., P.A., Thomsen, R.M.M.R., Danforth, D., Smartt, J., 2014. The Lin, X., 2010. Identifying peer effects in student academic achievement by spatial
effect of fast-food restaurants on childhood obesity: a school level analysis. autoregressive models with group unobservables. Journal of Labor Economics
Economics & Human Biology 12, 110–119. 28 (4), 825–860.
Anderson, B., Lyon-Callo, S., Fussman, C., Imes, G., Rafferty, A.P., 2011. Peer reviewed: Liu, X., Patacchini, E., Rainone, E., 2013. The allocation of time in sleep: a social
fast-food consumption and obesity among Michigan adults. Preventing chronic network model with sampled data, Working Papers w162. Center For Policy
disease 8 (4). Research, The Maxwell School.
Anderson, M., Matsa, D., 2011. Are restaurants really supersizing America? American Lyons, R., 2011. The spread of evidence-poor medicine via flawed social-network
Economic Journal: Applied Economics 3 (1), 152–188. analysis. Statistics, Politics, and Policy 2 (1), 2.
Anderson, T., Hsiao, C., 1981. Estimation of dynamic models with error components. Maggio, C., Pi-Sunyer, F., 2003. Obesity and type 2 diabetes. Endocrinology and
Journal of the American Statistical Association 76, 598–606. metabolism clinics of North America 32 (4), 805–822.
Arellano, M., Bond, S., 1991. Some tests of specification for panel data: Monte Carlo Manski, C.F., 1993. Identification of endogenous social effects: the reflection prob-
evidence and an application to employment equations. Review of Economic lem. Review of Economic Studies 60 (3), 531–542.
Studies 58 (2), 277–297. Millimet, D., McDonough, I., 2013. Dynamic panel data models with irregular spac-
Auld, M.C., Powell, L.M., 2009. Economics of food energy density and adolescent ing: with applications to early childhood development, Working paper.
body weight. Economica 76 (304), 719–740. Niemeier, H., Raynor, H., Lloyd-Richardson, E., Rogers, M., Wing, R., 2006. Fast food
Badev, A., 2013. Discrete games in endogenous networks: Theory and policy, Work- consumption and breakfast skipping: predictors of weight gain from adoles-
ing Papers 2-1-2013. University of Pennsylvania Scholarly Commons. cence to adulthood in a nationally representative sample. Journal of Adolescent
Berentzen, T., Petersen, L., Schnohr, P., Sørensen, T., 2008. Physical activity in Health 39 (6), 842–849.
leisure-time is not associated with 10-year changes in waist circumference. Ogden, C.L., Carroll, M.D., Kit, B.K., Flegal, K.M., 2012. Prevalence of obesity and trends
Scandinavian Journal of Medicine & Science in Sports 18 (6), 719–727. in body mass index among us children and adolescents, 1999–2010. Journal of
Bleich, S., Cutler, D., Murray, C., Adams, A., 2008. Why is the developed world obese? the American Medical Association 307 (5), 483–490.
Annual Review of Public health 29, 273–295. Powell, L., Bao, Y., 2009. Food prices access to food outlets and child weight out-
Blume, L., Brock, W., Durlauf, S., Jayaraman, R., 2015. Linear social interactions mod- comes: a longitudinal analysis. Economics and Human Biology 7, 64–72.
els. Journal of Political Economy 123 (2), 444–496. Powell, L., Chriqui, J., Khan, T., Wada, R., Chaloupka, F., 2013. Assessing the poten-
Boucher, V., 2014. Conformism and Self-Selection in Social Networks. Mimeo. tial effectiveness of food and beverage taxes and subsidies for improving public
Bramoullé, Y., Djebbari, H., Fortin, B., 2009. Identification of peer effects through health: a systematic review of prices, demand and body weight outcomes. Obe-
social networks. Journal of Econometrics 150 (1), 41–55. sity Reviews 14 (2), 110–128.
Calabr, P., Golia, E., Maddaloni, V., Malvezzi, M., Casillo, B., Marotta, C., Calabrò, Powell, L.M., 2009. Fast food costs and adolescent body mass index: evidence from
R., Golino, P., 2009. Adipose tissue-mediated inflammation: the missing link panel data. Journal of Health Economics 28 (5), 963–970.
between obesity and cardiovascular disease? Internal and Emergency Medicine Renna, F., Grafova, I.B., Thakur, N., 2008. The effect of friends on adolescent body
4 (1), 25–34. weight. Economics and Human Biology 6 (3), 377–387.
Calle, E., 2007. Obesity and cancer. British Medical Journal 335 (7630), 1107–1108. Rosenheck, R., 2008. Fast food consumption and increased caloric intake: a system-
Caraher, M., Cowburn, G., 2007. Taxing food: implications for public health nutrition. atic review of a trajectory towards weight gain and obesity risk. Obesity Reviews
Public Health Nutrition 8 (08), 1242–1249. 9 (6), 535–547.
Shalizi, C., Thomas, A., 2011. Homophily and contagion are generically confounded Yakusheva, O., Kapinos, K.A., Eisenberg, D., 2014. Estimating heterogeneous
in observational social network studies. Sociological Methods & Research 40 (2), and hierarchical peer effects on body weight using roommate assign-
211. ments as a natural experiment. Journal of Human Resources 49 (1),
Trogdon, J.G., Nonnemaker, J., Pais, J., 2008. Peer effects in adolescent overweight. 234–261.
Journal of Health Economics 27 (5), 1388–1399. Yakusheva, O., Kapinos, K., Weiss, M., 2011. Peer effects and the freshman 15:
VanderWeele, T., 2011. Sensitivity analysis for contagion effects in social networks. evidence from a natural experiment. Economics & Human Biology 9 (2),
Sociological Methods & Research 40 (2), 240. 119–132.

Unintended effects of reimbursement schedules in mental health care

Rudy Douven a,b,∗,∗∗, Minke Remmerswaal a, Ilaria Mosca c

a CPB Netherlands Bureau for Economic Policy Analysis, The Hague, The Netherlands
b Erasmus University, Rotterdam, The Netherlands
c Ecorys, Rotterdam, The Netherlands

Article history:
Received 10 December 2014
Received in revised form 20 March 2015

We evaluate the introduction of a reimbursement schedule for self-employed mental health care providers in the Netherlands in 2008. The reimbursement schedule follows a discontinuous discrete step function: once the provider has passed a treatment duration threshold the fee is flat until a next threshold is reached. We use administrative mental health care data of the total Dutch population from 2008 to 2010. We find an “efficiency” effect: on the flat part of the fee schedule providers reduce treatment duration by 2 to 7% compared to a control group. However, we also find unintended effects: providers treat patients longer to reach a next threshold and obtain a higher fee. The data shows gaps and bunches in the distribution function of treatment durations, just before and after a threshold. About 11 to 13% of treatments are shifted over a next threshold, resulting in a cost increase of approximately 7 to 9%.

© 2015 Elsevier B.V. All rights reserved.

Keywords: Mental health care; Provider payment; Regression discontinuity design; Policy evaluation; Regulated competition; The Netherlands

JEL classification: I11; I12; I18
1. Introduction

Before 2008, all mental health care in the Netherlands was organized and funded in a national insurance scheme (Exceptional Medical Expenses Act (AWBZ)). The AWBZ was paid for by income-differentiated premiums raised through taxes and it provided long term and mental health care for all citizens. Mental health care providers were mainly funded with budgets. This changed in 2008, when the Dutch government placed a part of mental health care, the curative and acute mental health care, under the regime of regulated competition¹. The goal of this policy change was to improve the efficiency in the sector by letting private insurers buy care on behalf of their enrollees. Providers no longer receive budgets, but a case mix based reimbursement that we will review in Section 2². Mason and Goddard (2009) review the international literature on reimbursing mental health care providers and argue that case mix based funding offers incentives for a range of objectives, including improvements in efficiency, quality of care and patient choice. They criticize the Dutch reimbursement schedule and state: “. . .it [. . .] therefore does not appear to encourage early discharge. . .” and “. . .could incentivize providers. . .to deliver medically unnecessary treatments. . .”. Dutch policymakers also recognized that the reimbursement schedule in mental health care might create unintended incentives (VWS, 2010; NZa, 2010). This research aims to quantify these possible effects.

∗ Corresponding author at: CPB Netherlands Bureau for Economic Policy Analysis, PO Box 80510, 2508 GM The Hague, The Netherlands. Tel.: +31 703383377. E-mail addresses: R.Douven@cpb.nl (R. Douven), M.Remmerswaal@cpb.nl (M. Remmerswaal), Ilaria.Mosca@ecorys.com (I. Mosca).
∗∗ When this paper was written Douven was a Harkness Fellow 2013/2014 at Harvard Medical School, supported by the Commonwealth Fund, a private independent foundation based in New York City. The views presented here are those of the author and not necessarily those of The Commonwealth Fund, its directors, officers or staff.
¹ Managed competition in the Dutch curative care sector was introduced in 2006 (Van de Ven and Schut, 2008).
² The case mix refers to the mix of different types of patients that are treated by the provider.

The design of a payment system is a complicated matter, especially in mental health care. Uncertainty and variations in treatments are likely to be great in the mental health care market making the response of patients and providers to financial incentives larger than in other areas of health care (Frank and McGuire, 2000). A large body of the literature in health economics establishes that health care providers respond to financial incentives
140 R. Douven et al. / Journal of Health Economics 42 (2015) 139–150
(for excellent overviews see Chandra et al., 2012; McGuire, 2000). Most empirical evidence concerns the US and shows that fee-for-service payment provides incentives for overtreatment. Some of the first papers on this topic are Epstein et al. (1986), Hickson et al. (1987) and Stearns et al. (1992). Recently, in the Netherlands, similar behavioral responses have also been reported since the introduction of regulated competition in the Dutch hospital market (Douven et al., 2015) and the market for general practitioners (van Dijk et al., 2013). Less research has been done on case mix based funding in the mental health care market (Mason and Goddard, 2009). In the US, Jennison and Ellis (1987) found an 18% increase in the rate of visits per mental health provider per month when they shifted from a salaried basis to a fee-for-service basis. Rosenthal (2000) has examined the effects of risk sharing with mental health care providers. She found that providers that received a salary reduced their number of visits by 20 to 25% compared to providers who were still paid for each visit. Bellows and Halpin (2008) studied the impact of Medicaid reimbursement on mental health quality indicators and found evidence of upcoding of quality indicators to increase reimbursement.

This is the first study to evaluate the introduction of a new reimbursement schedule in mental health care in the Netherlands. The reimbursement function follows a discontinuous discrete step function: once the provider has passed a treatment duration threshold the fee does not increase until a next threshold is reached. We look at two effects: efficiency and unintended effects. Our study shows that the unintended effects (providers treat patients longer to reach a next threshold and obtain a higher fee) outweigh the efficiency effect (on the flat part of the fee schedule providers treat patients shorter and prolong treatment only if marginal benefits to patients outweigh marginal costs). We separate out these two effects by using a regression discontinuity design (see e.g. Lee and Lemieux, 2010)³. Providers’ behavior around discontinuous fee thresholds is most likely explained by the change in fee, and not by other contemporary factors such as medical quality, treatment outcome, location or other unobserved factors. We use a quasi-experimental design in which 10% of all mental health care providers are paid according to the new reimbursement schedule, while 90% of providers were not subject to the reform. This latter group serves as a control group. We find an efficiency effect: we estimate a reduction in treatment duration by 2 to 7% and lower costs by 3 to 6% compared to a control group. However, we also find unintended effects: in total, about 11 to 13% of treatments are shifted over a next threshold, resulting in a cost increase of approximately 7 to 9%.

³ In the analysis we use cross-sectional comparisons between the two groups instead of a difference-in-difference model because there is no data available prior to the reform in 2008.

The outline of our paper is as follows. Section 2 provides a concise overview of the Dutch mental health care system. Section 3 describes the economic theory relating to the new reimbursement schedule. Section 4 describes the data and Section 5 presents the estimation methods. Section 6 presents the results and Section 7 concludes.

2. The Dutch mental health care system

Although the mental health status of the Dutch population has been roughly stable since 1975, the number of people that use professional mental health services has increased by about 10% per year from 535,000 patients in 2001 to about 1 million patients in 2009 (GGZ Nederland, 2010).

Dutch mental health care distinguishes between primary and secondary care. Patients with mild mental disorders usually go to primary care, which is provided by a general practitioner, psychologist, psychotherapist or psychiatrist⁴. Patients with a more serious condition need specialized care and are referred to secondary care. Secondary care is split into curative care and long-term care. Long-term care patients usually remain in an institution such as a residence or other kind of mental health facility for longer than a year. Our study focuses on patients who receive curative care. They can receive care in an inpatient or outpatient setting and their treatment does not last longer than a year.

⁴ As of 2008, groups of practice nurses, social workers and psychologists (named POH-GGZ) entered the market to support general practitioners.

The reform to regulated competition in 2008 required many changes for providers, health insurers and regulators. The government decided upon a transition period between 2008 and 2010, in which health insurers became responsible for the services of mental health care providers. However, during the transition period insurers did not incur financial risk on providing mental health care⁵. Since 2008, providers are reimbursed on their case mix, called a DBC (Diagnosis Treatment Combination). A DBC refers to the complete treatment episode of a patient. It starts with the initial consultation and continues until the provider ends the treatment. Consider for example a patient with mild depression who for ten months receives each month an individual therapy of 60 min by a psychotherapist (and no other form of medication or treatment). This patient’s treatment can be coded with the following DBC: “Depression, 250 to 800 min, no medication” (DBC Onderhoud, 2013). If a treatment episode lasts longer than one year, the DBC is closed automatically. After that year a new DBC is opened. With the closed DBC a provider can receive reimbursement from his patient’s health care insurer. The fee covers all labor and capital costs related to the treatment episode. The reimbursement fee for a DBC was fixed during our period of study and set prospectively by the Dutch Healthcare Authority (NZa). Patients’ out-of-pocket payments were limited⁶.

⁵ Health insurers had therefore no financial incentives to control costs. The policy was that first a proper risk adjustment system should be implemented before health insurers could bear more financial risks. In 2013, DBC-fees became subject to negotiation between insurers and providers. To stimulate efficiency, the government started programs to develop quality indicators in mental health care. In 2013, a critical report (Rekenkamer, 2013) concluded that the stability and quality of most indicators is poor and needs to be improved.
⁶ There was a mandatory annual deductible of 150 euro (in 2008) to 165 euro (in 2010) for all curative services (including mental health services) except general practitioner care and obstetrics.

Most mental health providers worked in large regional institutions in the period under consideration. These institutions can be a regional facility for ambulatory care, but also a specialized psychiatric hospital. Often, many different types of mental health care specialists work together. Their payment was before (and after) 2008 still based on annual budgets. These budgets were based on expected case mix and several regional budget parameters (such as inflation, wages, capital costs etc.). Mental health care specialists who work at a budgeted institution received a fixed salary⁷. Negotiations with health care insurers only took place with the dominant health insurer in the geographical region between 2008 and 2010. These mostly large mental health care institutions account for about 90% of the sector (NZa, 2012). Henceforth, we will use these ‘budgeted’ or B providers in our study as a control group because their individual salaries during 2008–2010 were not related to the new reimbursement schedule.

⁷ The government made agreements with labor unions about these salaries.

About 10% of the mental health care providers choose to work independently, e.g. in private practices. Only this group of self-employed providers, and new providers that entered the market after January 1st of 2008, received their income according to the new reimbursement schedule. Contrary to B providers, the
self-employed had to negotiate with all health care insurers in the period concerned. The focus of self-employed providers was initially mostly on new, innovative segments of the mental care market, such as addiction clinics, youth mental health, and a combination of services on work and mental health recovery (NZa, 2011). This group of self-employed providers will be our treatment group and, henceforth, we will call these providers ‘non-budgeted’ or NB providers.

To obtain mental health services patients need a referral from a general practitioner. After a referral patients are in principle free to choose any mental health care provider; in practice however they will often follow the advice of their general practitioner. Although B and NB providers tend to specialize in certain mental health conditions (see Section 4), we assume that for a given mental health condition there are on average no differences between NB and B providers in the types of patients they treat. This assumption is likely to hold true as we analyze many mental health conditions that are offered by both types of providers.

3. The economic theory of bunching of treatment durations

In this section we will explain in more detail the new reimbursement schedule and how we separate out efficiency and unintended effects. Treatment duration forms the basis of the size of the fee in the new schedule, and is calculated as a weighted sum of several components, i.e. several activities⁸. Individual contact by the provider with the patient receives the highest weight. It can be a consult, intake or therapy session. Lower weighted components are the time that a patient spends on other organized activities, such as group therapy sessions, and the number of days that a patient stays overnight in an institution. If, for example, a patient receives eight therapy sessions of 1 h (duration is 480 min), 10 h of group sessions (weighted as a total duration of 150 min) and three days in a residence (weighted as 180 min), this accounts for a total treatment duration of 810 min. Fig. 1 shows the reimbursement schedule for DBC category ‘depression’. The X-axis shows the different classes of treatment duration. All DBC categories, in all specialties of mental health care, have the same treatment duration thresholds: at 250, 800, 1800, 3000, 6000 min⁹. The Y-axis shows the corresponding fees. They are unweighted averages for the years 2008–2010. Fig. 1 shows that the reimbursement schedule is a discrete step function, in which fees are flat and only increase after a threshold is reached. The fees at each duration threshold slightly differ across specialties (e.g. depression, anxiety disorders etc.). Only the specialty ‘other childhood disorders’ has higher fees for treatments with more than 3000 min (NZa, 2007–2009)¹⁰.

Fig. 1. Reimbursement schedule for the specialty ‘depression’.

⁸ The main activities that shape a DBC are: diagnostics and treatment, daytime activities, residence in an institution, and (medical) activities. Each one of these activities is further split into other activities. To each of these activities belongs a duration in minutes and a corresponding tariff.
⁹ Thresholds occur also at 12,000, 18,000 and 24,000 min but we capped the duration time at 7000 min because such long treatment durations were rare.
¹⁰ No major changes occurred in the reimbursement schedule throughout the studied period.

The idea of the step function is a combination between a prospective fee per episode of care (the flat part of the reimbursement schedule) and fee-for-service (fees increase after a threshold has been reached). Prospective fees create incentives for providers to limit treatment duration which may lead to an efficient provision of care (McGuire, 2000). However, if the prospective fee is not adjusted for the severity of the patient then a provider has a potential incentive to select less severe patients, i.e. patients that need short treatment duration. The idea of the step function is to prevent such selection incentives by setting higher fees for more severe patients. However, setting higher fees exacerbates fee-for-service incentives like overprovision of care around thresholds. For example, the reimbursement fee for a treatment duration of 2900 min is 3703 euro, while a prolongation of the treatment with 100 min yields 6374 euro. This is a small difference in terms of treatment duration but a large difference in financial reward. The reimbursement schedule in Fig. 1 may result in overprovision or “bunching” of treatments at thresholds.

In line with Ellis and McGuire (1986, 1990), referred to as E&M from here on, we formulate a utility function of provider $j$ for providing patient $i$ with health severity $\theta_i$ a treatment duration $x_i$. The function is composed of two parts: benefits $B_i$ to the patient and profits $\pi_i$ for the provider.

$$U_{ij} = B_i(x_i, \theta_i) + \alpha_j\,\pi_i(x_i) \tag{1}$$

As in E&M, the agency parameter $\alpha_j$ describes the extent to which a provider weights the benefits to the patient relative to its own profits. For example, an entrepreneurial provider may attribute a higher $\alpha_j$ to profits. For the benefits to the patient $B_i(x_i, \theta_i)$ we make the standard assumptions $\partial B_i(x_i, \theta_i)/\partial x_i > 0$ at $x_i = 0$ and $\partial^2 B_i(x_i, \theta_i)/\partial x_i^2 < 0$, indicating that at the start of the treatment there is a positive benefit to the patient and the marginal benefit to the patient declines as treatment duration increases. We model the profit function $\pi_i(x_i)$ for a NB provider in (1) as follows:

$$\pi_i(x_i) = P(x_i) - c\,x_i \quad \text{with} \quad P(x_i) = P_l \ \text{ for } \ k_l \le x_i < k_{l+1} \tag{2}$$

where $k_l$ represents the treatment duration thresholds, with $l = 1, \ldots, 5$, and $k_1 = 250$, $k_2 = 800$, $k_3 = 1800$, $k_4 = 3000$, $k_5 = 6000$ min. $P(x_i)$ is the flat fee rate for a treatment duration $x_i$. For example, in Fig. 1, $P(350) = 1038$ and $P(1000) = 2050$ euro¹¹. Provider costs are represented by a simple linear cost function $c\,x_i$ and indicate production costs as well as indirect costs such as foregone leisure time. Note that the profit function $\pi_i(x_i)$ is discontinuous at a threshold $x_i = k_l$. In line with E&M we assume that a provider maximizing its utility solves the problem:

$$\max_{x_i} \; B_i(x_i, \theta_i) + \alpha_j\left(P(x_i) - c\,x_i\right) \tag{3}$$

¹¹ These are the reimbursement fees for depression.

Thus, given a patient’s severity $\theta_i$, and provider agency type $\alpha_j$, a provider will choose a treatment duration $x_i$ that solves the maximization problem in (3). Solving (3) returns that marginal benefits equal marginal costs:

$$\frac{\partial B_i(x_i, \theta_i)}{\partial x_i} = \alpha_j c, \quad \text{with discontinuities at } x_i = k_l. \tag{4}$$
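The maximization problem in (3) can be explored numerically. The sketch below is only an illustration under stated assumptions: the thresholds and the ‘depression’ fees are the ones quoted in the text, but the quadratic benefit function, the constant `CURVATURE`, the default values of the agency parameter, the unit cost and the constant d, and all function names are hypothetical choices that are not taken from the paper.

```python
# Sketch of the provider's problem in (3) for the 'depression' fee schedule.
# Thresholds and fees are the ones quoted in the text; the quadratic benefit
# function and all parameter values are ASSUMPTIONS made for illustration.

THRESHOLDS = [250, 800, 1800, 3000, 6000]   # k_1..k_5 in minutes
FEES = [1038, 2050, 3703, 6374]             # P_l in euro on [k_l, k_{l+1})
CURVATURE = 0.01                            # hypothetical 'a' in the benefit function


def fee(x):
    """Flat fee P(x) for a duration of x minutes, 250 <= x < 6000."""
    for lower, upper, p in zip(THRESHOLDS[:-1], THRESHOLDS[1:], FEES):
        if lower <= x < upper:
            return p
    raise ValueError("duration outside the range covered by this sketch")


def benefit(x, theta):
    """Assumed benefit a*(theta*x - x^2/2): increasing at x = 0, concave in x."""
    return CURVATURE * (theta * x - x * x / 2.0)


def nb_utility(x, theta, alpha, c):
    """Objective in (3): B(x, theta) + alpha * (P(x) - c * x)."""
    return benefit(x, theta) + alpha * (fee(x) - c * x)


def nb_duration(theta, alpha=1.0, c=1.0):
    """Grid search over whole minutes for the NB provider's optimal duration."""
    grid = range(THRESHOLDS[0], THRESHOLDS[-1])
    return max(grid, key=lambda x: nb_utility(x, theta, alpha, c))


def b_duration(theta, d=1.0):
    """Salaried (B) provider: marginal benefit a*(theta - x) is set equal to d."""
    return theta - d / CURVATURE


if __name__ == "__main__":
    for theta in (1100, 1500):
        print(theta, nb_duration(theta), b_duration(theta))
```

Under these assumed parameters, a patient with severity 1500 has an interior optimum of 1400 min on the flat part of the schedule, but `nb_duration(1500)` returns 1800 because the fee jump at the threshold outweighs the cost of 400 extra minutes. For severity 1100 the interior optimum of 1000 min is too far from the threshold and `nb_duration(1100)` stays at 1000. The weighted duration of 810 min from the worked example in the text falls in the 800 to 1800 min class, so `fee(810)` returns 2050.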
In Fig. 2, we illustrate that solving this optimization problem results in bunching at treatment duration thresholds $k_l$¹². We plot the various marginal benefit functions $\partial B_i(x_i, \theta_i)/\partial x_i$ and the marginal loss line $\alpha_j c$, which is discontinuous at thresholds $k_1$ and $k_2$. We observe a spike at both treatment duration thresholds because reaching such a threshold implies that the provider receives a higher reimbursement fee (or bonus). The size of both spikes depends on the fee difference before and after the threshold¹³. When the marginal benefit function of the patient with severity $\theta_1$ crosses the marginal loss line $\alpha_j c$ in Fig. 2 the provider will not end its treatment but prolong treatment until $k_1$ because its utility is maximized at the threshold $k_1$. A similar reasoning applies to the marginal benefit function for severity $\theta_2$: the provider prolongs treatment until $k_2$.

Fig. 2. Bunching at treatment duration thresholds.

¹² Bajari et al. (2011) perform a similar analysis with figures.
¹³ The size of the spike has to be determined empirically. Around $k_l$, locally holds $\partial P_i(x_i)/\partial x_i = -\infty$, implying an infinite spike. However, in practice the decision to prolong treatment is more discrete in nature. For substantially shorter treatment durations than at thresholds $k_l$, the provider has to trade off the costs associated with treating the patient longer versus the size of the fee difference equal to $P_{l+1} - P_l$.

The result is bunching. The distribution of treatment durations will exhibit gaps before treatment duration thresholds. These gaps are expected to be larger for treatment durations closer to $k_1$ and $k_2$¹⁴.

¹⁴ Suppose treatment duration is at a local optimum. The farther away this treatment duration is from a threshold duration $k$ the more costly it will be for a provider to move to the threshold $k$.

The reimbursement schedule also provides incentives for efficiency. For example, if $\alpha_j = 1$ then $\partial B_i(x_i, \theta_i)/\partial x_i = c$, and all dots on the marginal loss line $\alpha_j c$ in Fig. 2 correspond with socially optimal treatment durations, where marginal benefit to the patient equals marginal costs (McGuire, 2000). In the case of $\alpha_j = 1$, bunching implies overtreatment. If $\alpha_j > 1$ all dots on the marginal loss line $\alpha_j c$ correspond with undertreatment. Bunching implies that some treatment durations are prolonged and become closer to the cost efficient duration (although some overshooting may also happen). Similarly, if $\alpha_j < 1$, there is overtreatment and bunching implies even more overtreatment.

Important for our estimation procedure is the notion that there is only a financial incentive to prolong, and not to shorten, treatment durations. For example, a provider that hits a treatment duration threshold will not end the treatment but will prolong treatment as long as marginal benefits to the patient outweigh marginal costs¹⁵. Now, consider our comparison group, the B providers who receive a fixed salary. Compared to NB providers, we expect no bunching at treatment duration thresholds $k_l$ because B providers face no particular financial consequences around these thresholds. We make a general assumption about the behavior of B providers, namely that $\partial B_i(x_i, \theta_i)/\partial x_i = d$, where $d$ is a constant.

¹⁵ An exception could be a provider with (too) many patients in his practice. Such a provider may have a financial incentive to end a treatment after hitting a threshold because treating a new patient may be more rewarding (in terms of profits and total patient benefits). Vice versa a provider with a shortage of patients may have an incentive to prolong treatment duration securing his financial income. In our analysis we assume that these are second order effects.

The incentive structure may differ between B and NB providers. B providers are paid a fixed amount for a fixed period of time. Salaried providers have no incentives to deliver unnecessary services, nor an incentive for “underprovision” except to the degree that providers may “shirk” under salaried arrangements. That means that they may attempt to provide fewer services (shorten treatment duration) than under other contractual arrangements (Christianson and Conrad, 2011). Under a prospective fee shirking is less of a problem because production of NB providers is directly related to their income, although keeping treatment durations short allows them to treat more patients in a given time frame, resulting in more income. Moreover, B and NB providers may differ in how they are confronted with costs. For example, salaried providers may put less weight on costs than NB providers because their institution partly covers these costs. In the extreme case, they face no costs ($d = 0$) and salaried providers care only about patient benefits and not about costs. We take an agnostic approach here and let the data decide whether treatment duration differs between B and NB providers. There can be three different outcomes in Fig. 2. If the marginal loss line $d$ is located below the largest vertical spikes of the NB providers, then B providers would treat all patients longer than NB providers. If the marginal loss line $d$ is located above the marginal loss line $\alpha_j c$, B providers treat patients shorter. In Fig. 3, we plotted the third possibility: the marginal loss line $d$ of the budgeted providers is situated below the marginal loss line $\alpha_j c$ but not below the spikes. Treatment durations of B providers can be shorter (for example for patients with severity $\theta_2$) and longer (for example for patients with severity $\theta_1$) than for NB providers. In this case we can distinguish two effects. First, some patients may be treated shorter. We call this an “efficiency” effect. This effect is measured by the vertical distance between the marginal loss line $d$ and the marginal loss line $\alpha_j c$. Second, some patients may be treated longer. Unintended effects of bunching around thresholds (vertical lines in Figs. 2 and 3) may result in “overprovision” of care.

4. Descriptive statistics

¹⁶ Treatment durations below 250 min belong to primary mental health care. About 8% of the treatments had a treatment duration longer than 4000 min. Note that 4000 min is well below 6000 min, so estimation errors that occur because providers prolong treatment duration to 6000 min are likely to be small.

We obtained our dataset from an administrative database maintained by the NZa; it contains all registered DBCs from providers in the secondary curative mental health care in the Netherlands for the period 2008 to 2010. All treatments had a minimum duration of 250 min and there were only few DBCs with a very long treatment duration, therefore we restricted our sample to DBCs with a maximum treatment duration of 4,000 min¹⁶. Table 1 summarizes the
Table 1
Description of data*.

                                              Number of DBCs
Specialty                                     2008      2009      2010      Total
Depression                                  80,444    78,944    76,975    236,363
Anxiety disorders                           57,829    60,262    60,706    178,797
Other mental disorders and problems         49,282    50,901    49,602    149,785
Adjustment disorders                        46,693    49,865    49,080    145,638
Hyperkinetic disorders                      35,271    41,463    43,442    120,176
Personality disorders                       39,077    39,122    39,127    117,326
Other diagnoses                             26,797    28,362    30,611     85,770
Schizophrenia                               24,832    27,053    28,234     80,119
Pervasive disorders                         27,425    25,820    24,968     78,213
Delirium dementia and other disorders       17,796    17,680    17,617     53,093
Other substance use disorders               14,544    15,004    15,353     44,901
Alcohol use disorders                       14,170    14,071    13,796     42,037
Other childhood disorders                    9,427    13,398    18,576     41,401
Bipolar disorders                           13,228    12,423    12,349     38,000
Total                                      456,815   474,368   480,436  1,411,619

* The numbers in the table correspond to DBCs with treatment durations smaller than 4000 min.
Fig. 3. Marginal profit line of budgeted providers lower than $\alpha_j c$.

data. It contains approximately 1.4 million observations in fifteen specialties.

Table 2 distinguishes between B and NB providers. B providers produce the most DBCs in all categories, and some mental disorders are almost exclusively treated by B providers; this holds, for example, for the categories 'delirium, dementia and other disorders', 'alcohol use disorders' and, to a lesser extent, 'schizophrenia'. For NB providers we observe in many cases the profession of the therapist: we have 1302 psychologists, 431 psychiatrists and 74 providers working in institutions. For B providers we do not observe the profession because they are all grouped together in a large regional institution. The data contain for each DBC information on the type of therapy (for example adult, forensic, crisis or child care) and whether this is individual therapy, or (also) group therapy or an overnight stay. Other variables are the reason for closing a DBC (for example closed on a regular basis, or duration exceeding a year, or patient dissatisfied with treatment), and whether providers have prescribed drugs during a treatment. Another important variable is the global assessment of functioning (GAF) score. The GAF-score is a quality measure for the severity of a patient's mental illness. GAF-scores range between 0 (very severe symptoms) and 100 (no symptoms). Providers report these GAF-scores at the beginning of a treatment.

Table 2 shows that patients are unevenly distributed across providers. To obtain enough power for our tests we narrowed down our patient sample and considered only patients within the following specialties: depression, anxiety disorders, adjustment disorders, and personality disorders17. To obtain similar patient characteristics for comparing our treatment and control group we only selected patients in the category "adults" that received individual therapy sessions. Also, DBCs were closed on a regular basis and patients received no prescribed medication. Furthermore, we corrected the subsamples for the severity of the diseases. Based on the GAF scores four subsamples per specialty were created. The first subsample considers all patients that received as initial assessment a GAF score between 41 and 70. The other three subsamples are selected from this subsample, each containing only patients with one of the following GAF scores: 41–50, 51–60, or 61–7018. For these subsamples patients treated by B and NB providers have exactly the same characteristics and, thus, can be compared19. Table 3 summarizes and shows the number of observations for each subsample20. Note that we also included the total sample in our estimations. Patient characteristics of the total sample are very likely to differ between B and NB providers, but it provides an estimate of the total effect of prolonging treatment durations due to the existence of various thresholds.

5. Estimation method

Fig. 4 shows the distribution of treatment durations in the total sample (this corresponds to 'Total Sample' in Table 3) for both types of providers. The three vertical black lines correspond to three treatment duration thresholds at 800, 1800 and 3000 min. The distribution function clearly differs between the B and NB providers. The treatment distribution for the budgeted providers is smooth for all treatment durations. However, in stark contrast with the B

17 We chose these four categories because they are the most prevalent treated mental illnesses with a clear diagnosis (see Table 2).
18 The patient has some mild symptoms (e.g., depressed mood and mild insomnia) [GAF-scale 61–70], moderate symptoms (e.g., flat affect and circumlocutory speech, occasional panic attacks) [GAF-scale 51–60] or serious symptoms (e.g., suicidal ideation, severe obsessional rituals, frequent shoplifting) [GAF-scale 41–50].
19 The age and sex distributions are very similar across subsamples.
20 The number of observations shrinks the more narrowly the subsample is defined. Also important is that many records were not filled in completely, and therefore had to be excluded from our subsample.
Table 2
Type of provider and number of DBCs (years 2008–2010)*.

Specialty                                Budgeted providers (%)   Non-budgeted providers (%)   Total
Depression                               181,487 (77%)            54,876 (23%)                 236,363
Anxiety disorders                        142,747 (80%)            36,050 (20%)                 178,797
Adjustment disorders                     115,416 (79%)            30,222 (21%)                 145,638
Personality disorders                    107,545 (89%)            12,631 (11%)                 120,176
Hyperkinetic disorders                   91,126 (78%)             26,200 (22%)                 117,326
Other diagnoses                          73,216 (85%)             12,554 (15%)                 85,770
Schizophrenia                            75,096 (94%)             5,023 (6%)                   80,119
Pervasive disorders                      76,633 (98%)             1,580 (2%)                   78,213
Delirium dementia and other disorders    52,891 (100%)            202 (0%)                     53,093
Other substance use disorders            43,958 (98%)             943 (2%)                     44,901
Alcohol use disorders                    40,717 (97%)             1,320 (3%)                   42,037
Other childhood disorders                27,969 (68%)             13,432 (32%)                 41,401
Bipolar disorders                        34,557 (91%)             3,443 (9%)                   38,000
Other mental disorders and problems      112,309 (75%)            37,476 (25%)                 149,785
Total                                    1,175,667 (83%)          235,952 (17%)                1,411,619

* The numbers in the table correspond to DBCs with treatment durations smaller than 4000 min.
Table 3
Number of observations in various subsamples (years 2008–2010)**.

Sample                              Budgeted providers (%)   Non-budgeted providers (%)   Total
1. Total sample                     1,175,667 (83%)          235,952 (17%)                1,411,619
2. Sample depression                181,487 (77%)            54,876 (23%)                 236,363
   2a. GAF: 41–70*                  57,740 (65%)             30,508 (35%)                 88,248
   2b. GAF: 41–50*                  12,132 (71%)             4,963 (29%)                  17,095
   2c. GAF: 51–60*                  32,730 (65%)             17,395 (35%)                 50,125
   2d. GAF: 61–70*                  12,878 (61%)             8,150 (39%)                  21,028
3. Sample anxiety disorders         142,747 (80%)            36,050 (20%)                 178,797
   3a. GAF: 41–70*                  55,505 (72%)             21,581 (28%)                 77,086
   3b. GAF: 41–50*                  10,360 (78%)             2,934 (22%)                  13,294
   3c. GAF: 51–60*                  31,051 (72%)             12,086 (28%)                 43,137
   3d. GAF: 61–70*                  14,094 (68%)             6,561 (32%)                  20,655
4. Sample adjustment disorder       115,416 (79%)            30,222 (21%)                 145,638
   4a. GAF: 41–70*                  55,545 (72%)             21,607 (28%)                 77,152
   4b. GAF: 41–50*                  5,985 (76%)              1,934 (24%)                  7,919
   4c. GAF: 51–60*                  30,571 (72%)             12,067 (28%)                 42,638
   4d. GAF: 61–70*                  18,989 (71%)             7,606 (29%)                  26,595
5. Sample personality disorder      107,545 (89%)            12,631 (11%)                 120,176
   5a. GAF: 41–70*                  39,571 (69%)             17,977 (31%)                 57,548
   5b. GAF: 41–50*                  8,467 (78%)              2,457 (22%)                  10,924
   5c. GAF: 51–60*                  22,102 (69%)             10,056 (31%)                 32,158
   5d. GAF: 61–70*                  9,002 (62%)              5,464 (38%)                  14,466

* In these samples we only consider individual adult therapies without medical prescriptions that were closed on a regular basis.
** The numbers in the table correspond to DBCs with treatment durations smaller than 4000 min.
providers, for NB providers we observe large gaps and spikes at thresholds. Similar figures are obtained if we plot subsamples of our dataset.

To estimate whether B providers treat on average longer or shorter than NB providers we use ideas from regression discontinuity design (RDD)21. However, while RDD-studies use local linear smoothing around single thresholds to determine non-linear responses, we have reasonably large bunches and gaps of several thresholds that may be connected22. Therefore, we use a global estimation approach which allows us to estimate in one step the distribution functions for both types of providers.

We fit the non-linear regression equation (5) for each mental disorder category i, and provider type j (in what follows we omit i, j):

$$Y_t = f(\beta) + \mu_t \quad \text{with} \quad \mu_t = B_t - G_t + \varepsilon_t \tag{5}$$

where $Y_t$, t = 3, 3.5, 4, ..., 39, 39.5 is the distribution function of treatment durations defined in treatment duration classes of 50 min23. Like Lee and Lemieux (2010) we assume that all factors evolve "smoothly". If there are no discontinuities ($G_t = 0$, $B_t = 0$) in the reimbursement schedule, $f(\beta)$ would be a reasonable guess for explaining $Y_t$. This assumption is confirmed by estimates of $f(\beta)$ for the distribution function of B providers.

In standard RDD applications, sudden shifts in the outcome variable result from an exogenous change. In this study we have the same. Bunches and gaps in treatment durations of the NB providers are caused by exogenous changes in the fee structure, and

21 RDD studies related to health care include Card et al. (2008, 2009) who study the discontinuity of health care utilization around age 65 when US citizens become eligible for Medicare. Sojourner et al. (2013) use RDD to study the effects of unionization of nursing homes. Shi (2013) finds evidence of income manipulation when studying labor supply responses to income cutoffs of a subsidized health insurance program in Massachusetts. Einav et al. (2013) study the response of drug expenditure to non-linear contracts in Medicare part D. These studies are all related to consumer responses. Our study is about provider responses and more related to Bajari et al. (2011) who study hospitals' responses to discontinuities in linear reimbursement schedules. Their identification strategy is much more complicated than in our paper because reimbursement schedules are only discontinuous in the first derivative, and thresholds are not fixed but may differ across hospitals.
22 For example, combining several separate local linear estimation procedures into one distribution function may not necessarily result in a smooth function.
23 Thus, $Y_3$ represents all treatment durations in the 300–350 min time interval and $Y_{39.5}$ in the 3950–4000 min time interval. The size of the surface of all distributions is normalized to 1. Note that we performed our analysis also for classes of 100 min time intervals. This yielded similar, but slightly less stable, results.
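The construction of the distribution function in 50 min classes can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `duration_distribution` and its parameters are hypothetical, and the bounds (classes from 300 min up to, but excluding, 4000 min) are taken from the class definition in footnote 23, so shorter or longer treatments are simply dropped here.

```python
def duration_distribution(durations_min, lo=300, hi=4000, width=50):
    """Share of treatments per duration class, keyed by t = class start / 100.

    With the defaults this yields the 74 classes t = 3, 3.5, ..., 39.5,
    and the shares sum to 1 (the normalisation described in footnote 23).
    """
    n_classes = (hi - lo) // width
    counts = [0] * n_classes
    for d in durations_min:
        if lo <= d < hi:  # assumption: durations outside [lo, hi) are ignored
            counts[int((d - lo) // width)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no durations fall inside [lo, hi)")
    return {(lo + i * width) / 100: c / total for i, c in enumerate(counts)}


# Toy input (made-up durations in minutes, only to show the indexing).
Y = duration_distribution([310, 349, 350, 3999, 250, 4000])
```

In this toy call, 310 and 349 fall in class t = 3, 350 in class t = 3.5 and 3999 in class t = 39.5, while 250 and 4000 lie outside the range.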
Fig. 4. Distribution of treatment duration for B and NB providers (all categories).
not by medical outcome or other unobserved factors of individual patients. This implies that a conscious prolongation of treatment duration by NB providers introduces systematic errors in $Y_t$. In (5), $\mu_t$ represents both systematic and random errors. We distinguish systematic positive errors or "bunches" after a threshold ($B_t \geq 0$) and systematic negative errors or "gaps" before a threshold ($G_t \geq 0$). Lastly, $\varepsilon_t$ represents the random error term in (5).

To estimate the smooth function $f(\beta)$ we constructed a class of smoothing functions that are able to describe shapes similar to that of the B providers in Fig. 4. A property of this function is that it must increase at t = 300, have a top somewhere around t = 600 min, and decline monotonically thereafter24. Furthermore, the function must be flexible enough to capture various shapes. Exponential function (6) satisfies these criteria:

$$f(\beta) = \beta_1 + \beta_2 t + \frac{\beta_3}{t} + \frac{\beta_4}{t} + \beta_5 e^{-\beta_6 t} \tag{6}$$

We have to estimate the six parameters $\beta_j$, j = 1, ..., 6 in (6). First, we substitute (6) in (5). Then we estimate (5). The size of the gap before each threshold $[k] \in \{[8], [18], [30]\}$ should equal the size of the bunch after this threshold (see Fig. 5). This restriction reflects our theory in Section 2: bunching after a threshold occurs through a shift of treatment durations from before to after a threshold.

Fig. 5. Estimation of distribution function and bunches and gaps.

To estimate $\beta$, we solve a weighted non-linear least squares minimization problem with four restrictions:

$$\min_{\beta} \sum_{t=3}^{39.5} w_t \left(Y_t - f(\beta)\right)^2 \quad \text{with restrictions:} \tag{7}$$

$$\sum_{t=5}^{10.5} \left(Y_t - f(\beta)\right) = 0, \quad \sum_{t=11}^{20.5} \left(Y_t - f(\beta)\right) = 0, \quad \sum_{t=21}^{32.5} \left(Y_t - f(\beta)\right) = 0, \quad \sum_{t=3}^{39.5} \left(Y_t - f(\beta)\right) = 0$$

The first three restrictions correspond to the shift of treatment durations: $B_{[k]} - G_{[k]} = 0$, for k = 8, 18, 30 25. We observed in the data that bunching occurs up to 300 min after a threshold. Therefore we fixed possible bunching to the first 300 min after a threshold in our restrictions.

To obtain smooth convergence of our non-linear estimations, we added a fourth restriction: the total sum of the errors is zero26. Weights $w_t$ were also introduced27. Our global estimation strategy with restrictions is quite powerful compared to three separate local

24 Using standard smoothing functions in econometric software programs does not work here because these functions 'try to explain' the bunches and gaps as well.
25 $G_{[8]} = -\sum_{t=4.5}^{7.5} \mu_t$, $B_{[8]} = \sum_{t=8}^{10.5} \mu_t$, $G_{[18]} = -\sum_{t=11}^{17.5} \mu_t$, $B_{[18]} = \sum_{t=18}^{20.5} \mu_t$, $G_{[30]} = -\sum_{t=21}^{29.5} \mu_t$, $B_{[30]} = \sum_{t=30}^{32.5} \mu_t$.
26 This implies that all systematic shifts are explained by the three previous restrictions, and that no treatments with a duration between 300 and 500 min are shifted to over the 800 min threshold, and none between 3300 and 4000 min are shifted to over the 6000 min threshold.
27 In most cases we used $w_t = 1$; however, sometimes we experimented with somewhat higher weights to obtain smooth convergence. We performed our optimizations with the numerical non-linear global optimization function "NMinimize" of the software program Mathematica. To obtain convergence we sometimes had to
RDD-estimations at each individual threshold. The global approach allows us to connect the "bunches" and "gaps" estimates at individual thresholds, and convergence of our estimation procedure will only occur if our assumption of equal gaps and bunches is supported by the data.

Minimization procedure (7) generates $\hat{\beta}_1, \ldots, \hat{\beta}_6$. This allows us to compute $\hat{\mu}_t = Y_t - f(\hat{\beta})$. Next, we can compute our estimates for the gaps and bunches.

In order to present the significance of our estimates for bunches and gaps we need an estimate for our error term $\varepsilon_t$ in (5). Because our computation does not allow us to compute $\hat{B}_t$, $\hat{G}_t$ in (5) separately for each t, we cannot properly estimate the random error term $\varepsilon_t$. Therefore we assume $\hat{\varepsilon}_t = \hat{\mu}^B_t$, where $\hat{\mu}^B_t$ are the estimated errors of the budgeted providers after estimating (5). Thus, we assume the standard error of the non-budgeted providers $s^{NB}$ in (5) equals the standard error of the budgeted providers $s^B$:28

$$s^{NB} = s^{B} = \sqrt{\frac{1}{74 - 6} \sum_t \left(\hat{\mu}^B_t\right)^2} \tag{8}$$

We use a 68 degrees of freedom correction (see e.g. Verbeek, 2004): 74 classes minus the 6 parameters $\beta$ to estimate in (6). After obtaining these statistics we can derive additional statistics such as an estimate of the average treatment duration, prolongation time as a result of shifting treatments and associated costs.

6. Estimation results

In this section we present our estimation results for all the samples described in Table 3. We first show our results graphically in Fig. 6 for the three samples 1, 2 and 2a in Table 3: "total sample", "depression" and the subsample "depression with similar patient characteristics (GAF-scores 41–70)". Fig. 6 contains three panels for each sample. The first panels, Fig. 6a, d and g, show $Y_t$ and the corresponding estimate $f(\hat{\beta})$ of the B provider, from which we will derive an estimate for our standard error. The estimates indicate that our exponential identification in (6) can fit $f(\beta)$ to $Y_t$ very well. The middle panels, Fig. 6b, e and h, indicate the unintended effects. Bunches and gaps are present in all three samples. The sizes of bunches and gaps are remarkably stable across subsamples. Bunches and gaps are largest (and significant) at the first two thresholds of 800 and 1800 min and positive at the threshold of 3000 min in all cases29. Differences in treatment duration between both providers, after controlling for the bunches and gaps, can be seen in the right three panels, Fig. 6c, f and i. For the total sample and depression sample (panels c and f) we observe large effects; on average NB providers treat patients much shorter than B providers. However, this effect almost disappears in the case of patients with similar characteristics (panel i). Controlling for patient characteristics is therefore crucial to identify possible differences in treatment duration between B and NB providers.

The estimation results of the three subsamples are summarized in Table 4. The first column presents the unintended effects: the percentage of treatments that are shifted over each of the three thresholds. In total about 11–13% of treatments are shifted to over a next threshold. The second column in Table 4 presents average treatment duration. The difference between $f(\hat{\beta})$ and $Y_t$ for B providers is small, confirming the good fit and resulting in small standard errors $s^B$ 30. For NB providers the average treatment duration corresponding to $f(\hat{\beta})$ is 19–24 min lower than $Y_t$, indicating that the increase in average treatment duration as a result of bunching is relatively small31. Important is the large difference in average treatment duration between B and NB providers in the "Total sample", 22.2%, and "Total sample depression", 24.2%, indicating that B providers on average treat sicker patients. After controlling for patient characteristics ("Subsample depression, GAF scores 41–70") the difference in treatment duration shrinks to 2.2%. In the third column of Table 4 we present average treatment costs. The unintended effects increase average costs per treatment by 137 to 157 euro or a cost increase of 7.1 to 7.9%. The efficiency effect for the "Total Sample Depression, GAF scores 41–70" yields that on average treatments are 3.3% (or 39 euro) more expensive for B than NB providers. This effect is however more than offset by the unintended effects; summing both effects yields that NB providers treat on average patients 165–39 = 126 euro more expensive than B providers32.

In addition to the three subsamples, we have also looked into other mental illnesses (see Table 3 for the subsamples and the number of observations in each subsample). We performed the same estimations for these sixteen subsamples. The results are reported in Table 5. Columns (1)–(3) present the volume effects. Column (1) represents the size of the unintended effects: the percentage of treatments that are shifted to over a next threshold. Column (2) shows the average treatment duration for the actual distribution $Y_t$ and the estimated distribution $f(\hat{\beta})$, and column (3) shows the differences in treatment duration or the "efficiency" effect: the percent change in treatment duration between NB and B providers. Columns (4)–(6) show the same effects but now for fees. Column (4) shows the average fee of a treatment for $Y_t$ and $f(\hat{\beta})$. Column (5) presents the unintended cost effects: the percent difference between the two variables. Finally, column (6) represents the cost difference related to the "efficiency" effect between NB and B providers.

The results in Table 5 confirm our previous findings. First of all, we observe that the unintended effects (column (1)) are present in all subsamples. The effects are fairly stable across all our subsamples and vary roughly between ±11 and 13%, with some outliers33. This corresponds with a cost increase that varies between ±7 and 9% (column (5)). The efficiency effect in column (3) shows that B providers treat patients approximately ±2–7% longer than NB providers with corresponding cost increases of approximately ±3–6% (column (6))34. Thus, for almost all cases we find that the

alter the minimization method in Mathematica (gradient-based and direct search methods), weights and starting values.
28 We make the reasonable assumption that the random errors and corresponding standard deviations $s^B$ and $s^{NB}$ are of the same order of magnitude. If there are small systematic errors in $\hat{\mu}^B_t$ we will overstate $s^{NB}$. Note that we calculate $s^B$ from a $Y_t$ distribution that has the same number of observations as the corresponding $Y_t$ distribution of the NB providers.
29 The effects are smaller around the 3000 min threshold; there are fewer observations and it may be the case that the marginal benefit to patients is closer to zero (i.e. "flat of the curve").
30 For smaller subsamples the graph $Y_t$ is less smooth, increasing the size of the standard error $s^B$.
31 The average prolongation of treatment duration for treatments that are shifted over to a next threshold is about 200 min.
32 We have tested the significance of the efficiency effect with the non-parametric Kolmogorov–Smirnov test. It rejected the hypothesis of similar $f(\hat{\beta})$ distribution functions for B and NB providers in the first two samples in Table 4. However, it does not reject the hypothesis in the third sample. We therefore test various different subsamples in Table 5.
33 The estimation results for the unintended effects are all significant at the 0.01 level.
34 Only for the subsample adjustment disorders GAF: 61–70 do we find a 0.4% higher average treatment duration for NB providers. The efficiency effects are not significant at a 0.05 level (see footnote 29). However, we still conclude that efficiency effects are present in our data because we repeated our estimations many times (see column (3)) and our data covers the complete sample.
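Once residuals are available, the gap and bunch sizes and the standard error in (8) reduce to simple sums. The sketch below is illustrative only: the function names and the `WINDOWS` table are hypothetical, the class windows follow the summation limits printed in restriction (7) (gap classes up to the threshold, bunch classes covering the 300 min after it), and the square root is the usual form of a regression standard error, assumed here.

```python
import math


def class_sum(mu, start, stop):
    """Sum residuals mu[t] for t = start, start + 0.5, ..., stop (inclusive)."""
    n = int(round((stop - start) / 0.5)) + 1
    return sum(mu[start + 0.5 * i] for i in range(n))


# Assumed windows per threshold k (in units of 100 min):
# (first gap class, last gap class, first bunch class, last bunch class).
WINDOWS = {
    8: (5.0, 7.5, 8.0, 10.5),
    18: (11.0, 17.5, 18.0, 20.5),
    30: (21.0, 29.5, 30.0, 32.5),
}


def gaps_and_bunches(mu):
    """Gap = minus the summed residuals before k; bunch = summed residuals after k."""
    return {
        k: {"gap": -class_sum(mu, g0, g1), "bunch": class_sum(mu, b0, b1)}
        for k, (g0, g1, b0, b1) in WINDOWS.items()
    }


def standard_error(mu_b, n_params=6):
    """Residual standard error with a (number of classes - n_params) correction."""
    dof = len(mu_b) - n_params  # 74 - 6 = 68 for the classes t = 3, ..., 39.5
    return math.sqrt(sum(e * e for e in mu_b.values()) / dof)


# Made-up residuals: 2% of treatments missing just before 800 min and
# reappearing just after it; all other classes fit exactly.
mu = {3.0 + 0.5 * i: 0.0 for i in range(74)}
mu[7.5], mu[8.0] = -0.02, 0.02
effects = gaps_and_bunches(mu)
s = standard_error(mu)
```

With these toy residuals the gap and bunch at the 800 min threshold are both 0.02, which is exactly the equality that the first restriction in (7) imposes.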
Fig. 6. (a–c) Total sample. (d–f) Total sample depression. (g–i) Subsample depression, GAF scores 41–70.
Table 4
Estimation results for "total sample", "total sample depression" and "total sample depression (GAF scores 41–70)".a

                     Unintended effect:                 Average treatment         Average treatment
                     bunches and gaps (%)               duration (min)            costs (euro)
                     B̂[8]    B̂[18]   B̂[30]   Total      Yt      f(β̂)    dif       Yt      f(β̂)    Unintended effect

Total sample (all specialties)
  B providers                                           1407    1407              2279    2266
  NB providers       7.6**   3.1**   0.7**   11.5**     1170    1151    19        2053    1901    152 (8.0%)
  Efficiency effect                                             22.2%                     19.8%

Total sample depression
  B providers                                           1345    1345              2226    2214
  NB providers       8.4**   3.6**   0.5*    12.5**     1107    1083    24        2008    1847    161 (8.7%)

Total sample depression (GAF scores 41–70)
  B providers                                           1224    1224              2043    2036
  NB providers       8.3**   4.0**   0.6*    12.9**     1204    1185    19        2149    1986    165 (8.2%)

a The *, ** in the table indicate significance levels of 0.05 and 0.01 respectively. Average treatment costs for B providers are calculated on the premise that they are paid according to the reimbursement schedule for NB providers.
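As a rough check on how the percentages in Table 4 relate to the printed averages, the first two samples can be recomputed as below. The choice of denominator (the fitted NB value) is inferred from the printed numbers rather than stated in the text, so it should be read as an assumption; `pct_diff` and `table4_effects` are hypothetical helper names.

```python
def pct_diff(a, b):
    """Percent difference of a relative to b."""
    return 100.0 * (a - b) / b


# Averages printed in Table 4 for NB providers:
# (duration Y_t, duration f, cost Y_t, cost f), in minutes and euro.
NB = {
    "Total sample": (1170, 1151, 2053, 1901),
    "Total sample depression": (1107, 1083, 2008, 1847),
}
# Fitted average duration f for B providers, also from Table 4.
B_FITTED_DURATION = {"Total sample": 1407, "Total sample depression": 1345}


def table4_effects(sample):
    dur_y, dur_f, cost_y, cost_f = NB[sample]
    return {
        # Extra minutes per treatment attributed to bunching.
        "prolongation_min": dur_y - dur_f,
        # Unintended cost effect: actual versus fitted NB costs.
        "unintended_cost_pct": round(pct_diff(cost_y, cost_f), 1),
        # Difference in fitted duration between B and NB providers.
        "efficiency_duration_pct": round(pct_diff(B_FITTED_DURATION[sample], dur_f), 1),
    }


total = table4_effects("Total sample")
depression = table4_effects("Total sample depression")
```

Under this reading the recomputed values match the 19 and 24 min differences, the 8.0% and 8.7% cost effects, and the 22.2% and 24.2% duration differences reported for these two samples.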
Table 5
Estimation results for subsamples 2a–d, 3a–d, 4a–d, 5a–d (see Table 3).

Column              (1)        (2)                (3)            (4)                (5)             (6)
                    Bunches,   Avg. treatment     "Efficiency"   Avg. treatment     Unintended      "Efficiency"
                    gaps (%)   duration (min)     effect (%)     costs (euro)       effect (euro)   effect (%)
                                                  (min)
Type of provider    NB         NB       NB        (NB-B)/B       NB       NB        NB              (NB-B)/B
Distribution        Yt         Yt       f(β̂)      Yt, f(β̂)       Yt       f(β̂)      Yt, f(β̂)        Yt, f(β̂)

Depression
  2a. GAF: 41–70    12.9       1204     1185      −3.3           2149     1986      8.2             −2.9
  2b. GAF: 41–50    14.1       1330     1300      −2.3           2353     2152      9.4             −2.4
  2c. GAF: 51–60    13.1       1215     1189      −4.0           2162     1987      8.8             −3.5
  2d. GAF: 61–70    11.8       1105     1083      −3.9           1996     1844      8.2             −3.2
Anxiety disorders
  3a. GAF: 41–70    12.1       1186     1161      −7.9           2094     1926      8.7             −7.5
  3b. GAF: 41–50    11.3       1325     1303      −7.1           2327     2146      8.4             −6.9
  3c. GAF: 51–60    12.8       1201     1175      −8.4           2118     1942      9.1             −8.0
  3d. GAF: 61–70    11.4       1096     1076      −7.0           1944     1800      8.0             −6.6
Adjustment disorders
  4a. GAF: 41–70    10.6       1054     1039      −2.2           1761     1645      7.1             −2.1
  4b. GAF: 41–50    10.6       1215     1174      −2.8           2010     1839      9.3             −2.1
  4c. GAF: 51–60    10.8       1061     1042      −4.5           1771     1646      7.6             −4.3
  4d. GAF: 61–70    9.5        1000     984       0.4            1682     1572      7.0             0.3
Personality disorders
  5a. GAF: 41–70    11.6       1391     1372      −5.5           2435     2251      8.1             −5.1
  5b. GAF: 41–50    11.9       1495     1475      −5.2           2598     2402      8.2             −5.2
  5c. GAF: 51–60    12.3       1422     1402      −5.3           2489     2290      8.7             −5.3
  5d. GAF: 61–70    10.4       1286     1277      −5.2           2265     2117      7.0             −4.4
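The spread of the unintended effects across the sixteen subsamples can be read directly off columns (1) and (5) of Table 5. The short sketch below only summarizes the printed values; the list names and the `summarize` helper are hypothetical.

```python
from statistics import mean

# Column (1) of Table 5: share of treatments shifted over a threshold (%),
# in table order (2a-2d, 3a-3d, 4a-4d, 5a-5d).
BUNCH_SHARE_PCT = [12.9, 14.1, 13.1, 11.8, 12.1, 11.3, 12.8, 11.4,
                   10.6, 10.6, 10.8, 9.5, 11.6, 11.9, 12.3, 10.4]
# Column (5) of Table 5: unintended cost effect (%), same order.
COST_EFFECT_PCT = [8.2, 9.4, 8.8, 8.2, 8.7, 8.4, 9.1, 8.0,
                   7.1, 9.3, 7.6, 7.0, 8.1, 8.2, 8.7, 7.0]


def summarize(values):
    """Minimum, maximum and mean (rounded to one decimal) of a column."""
    return {"min": min(values), "max": max(values), "mean": round(mean(values), 1)}


bunch_summary = summarize(BUNCH_SHARE_PCT)
cost_summary = summarize(COST_EFFECT_PCT)
```

The minima and maxima make visible which subsamples fall outside the rough ranges quoted in the text.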
marginal loss line $d$ is situated somewhat below the line $\alpha_j c$ (see Fig. 3 in Section 3). The unintended financial effects in column (5) are in all cases larger than the "efficiency" effects (column (6)).

To conclude, the unintended effects appear very clearly in the data and are very stable across all subsamples. The "efficiency" effects are smaller and less certain because these effects are estimated by comparing B and NB providers. A limitation of our measure for the "efficiency" effects could be that there is still unobserved variation in the treatment and control group that we do not capture adequately. For example, we may have overestimated the efficiency effects if NB providers select more low severity patients (even for groups with similar GAF scores)35. Another possibility is that our "efficiency" effect captures not genuine efficiency but quality differences in outcome between B and NB providers. In future research we may be able to address some of these points if more information becomes available.

7. Discussion

We have evaluated the implementation of a new reimbursement schedule in Dutch mental health care. The reimbursement schedule follows a discontinuous discrete step function: once the provider has passed a treatment duration threshold the fee is flat until a next threshold is reached. We find an "efficiency" effect: on the flat part of the fee schedule providers prolong treatment only if marginal benefits to patients outweigh marginal costs. We estimate a reduction in treatment duration by 2 to 7% and lower costs by 3 to 6% compared to a control group. However, we also find unintended effects: providers treat patients longer to reach a next threshold and obtain a higher fee. The data show gaps and bunches in the distribution function of treatment durations, just before and after a threshold. In total, about 11 to 13% of treatments are shifted to over a next threshold, resulting in a cost increase of approximately 7 to 9%.

35 If this were the case then we would expect that selection effects are greater for less severe patients (groups with high GAF-scores). However, we do not observe that efficiency effects are stronger for those groups (see Table 5).

An important message of our study is that the unintended effects clearly demonstrate that mental health care providers react to financial incentives. Since financial rewards are high, one would expect that NB providers would anticipate the thresholds in the reimbursement schedule ex ante. Indeed, an article in a Dutch newspaper suggests that some providers have institutionalized the number of therapy sessions. A psychologist stated: "Our institution has calculated that each patient should receive eight or sixteen sessions. This would be financially very attractive but not be seen as fraud" (Effting, 2015).

Monitoring providers' behavior is therefore an important element for the system to function properly. In the Dutch system of regulated competition health insurers have the role of disciplining providers. However, until 2014 health insurers lacked information about the exact treatment duration of health care providers. They received only global information on the treatment duration of individual providers, i.e. they received only information on the two treatment duration thresholds between which the provider performed the treatment, and not the exact treatment time. Thus, insurers had no possibility to perform the same analysis as we carried out in this paper. This is now gradually changing; since 2014 health insurers obtain exact information about treatment durations and are also becoming more financially responsible for mental health care cost containment.

We measure an "efficiency" effect. However, we cannot be certain that we measure genuine efficiency since we cannot rule out the possibility that patients may also have received too little care. Our efficiency arguments do hold if we assume $\alpha_j = 1$ in our utility function (1), which is a fairly standard assumption (McGuire, 2000). In that case NB providers produce cost-efficiently on the flat part of the reimbursement schedule and bunching corresponds to overtreatment. Efficiency differences between B and NB providers could also be related to differences in practice styles or quality of treatments (see e.g. Chandra et al., 2012). To address these issues more properly, quality information about treatments would be necessary.

In 2014, the Dutch government decided to pay B providers also according to the new reimbursement schedule. Our findings suggest that this policy may lead to higher costs since the higher costs
associated with the unintended effects outweigh the lower costs
of the efficiency effect. However, an important difference is that tariffs and providers should ultimately get paid taking into account
B providers correspond to large mental health institutions where doctors are still paid on a salary
basis, so the unintended effects have to be induced by the management. Furthermore, there are still
many external dynamic demand and supply factors that are difficult to assess. For example, B
providers may put a lower weight on profits (lower agency parameter αj in (1) for B providers) than
NB providers because the latter category of providers is of a more entrepreneurial type. In sum, the
unintended effects may therefore be lower for B providers than NB providers. Also, insurers may be
better equipped to monitor providers’ treatment duration and, in the longer run, quality. Another
important difference is that the Dutch government changed the flat reimbursement fees to maximum
fees. Thus, health insurers can bargain with providers for lower reimbursement fees if providers’
performance turns out to be inadequate.

Also the conclusion that the introduction of the new reimbursement schedule for NB providers in 2008
led to higher costs is premature. Before 2008, NB providers received a fixed fee for each visit. A
fee for each visit is similar to the reimbursement schedule in our study but now there are
thresholds after each visit of 60 min. A fee for each visit is closer to a fee-for-service type of
payment and may also result in overtreatment. Unfortunately, we have no data available for the
period before 2008, making a comparison between the two regimes impossible.

An important policy question is what an optimal reimbursement schedule for mental health care
providers should look like. The reimbursement schedule that we study in this paper is interesting
because it combines a prospective fee per episode of care with elements of fee-for-service, to
prevent selection incentives. The drawback is that the unintended effects are quite large, which may
make the schedule less attractive than salary or even fee-for-service. One possible way to proceed
would be to improve the current reimbursement schedule. A first option would be to diminish the
unintended effects by changing the position of the thresholds. Ideally, thresholds should be placed
where the mass of the distribution function f(β) is small. If the mass before a threshold is small,
unintended effects will diminish because there are only a few treatments to shift over to a next
threshold. Unfortunately, the threshold of 800 min is placed just after the top of the distribution
function (see Fig. 4), thus exacerbating the unintended effects. Moving the 800 min threshold to
500 min, just before the top of the distribution function, would diminish the unintended effects.
Thus by taking into account provider behavior the reimbursement could be made much more attractive.
A second option to diminish the unintended effects would be to decrease (or even increase) the
number of thresholds. Here there is a trade-off between efficiency, equity and selection. For
example, removing all thresholds would yield a single prospective fee for the total treatment. This
would remove all unintended effects thereby increasing efficiency. However, if patients’
characteristics across providers differ substantially, it could also result in a larger income
variation across providers thereby diminishing equity considerations across providers. As a result
providers might increase their incentives for selecting more favorable patients (McGuire, 2000).
Adding more thresholds might also be an improvement since more thresholds will diminish the
financial incentives at each individual threshold. More research is necessary to study these
trade-offs.

Another possible way is to change the nature of the payment system and to consider a mixed payment
system of a prospective fee and a linear reimbursement schedule, as advocated by Ellis and McGuire
(1986). The prospective fee reimburses the “non-contractible” activities that may vary across
specialties while the linear reimbursement fee should be set less than marginal (and average) costs.
This type of fee schedule is likely to diminish the unintended effects as well. Lastly, quality
should be integrated into

the patient’s wellbeing as well. This is a long shot and much more research is needed to integrate
quality aspects into the payment system.

In this study we rely on providers that register their own DBCs. We assume that providers register
their treatment duration correctly and honestly in their administration. However, literature
indicates that fraudulent behavior may also occur in payment systems based on DRGs in the US, or
DBCs in the Netherlands. This fraudulent behavior is often referred to as ‘upcoding’ (Steinbusch
et al., 2007). The Dutch reimbursement system may be vulnerable to this ‘upcoding’ because Dutch
providers code DBCs themselves. They could tamper with the data. Especially in mental health care,
the risk of fraud may be even greater than for less discretionary treatments, such as hip or knee
replacements. Third parties, such as health insurers, also might find it particularly difficult to
verify and dispute mental health diagnoses.

Acknowledgement

We would like to thank the Dutch Healthcare Authority (NZa) for providing the data. The data are not
publicly available. We are grateful to the NZa and DBC-Onderhoud for explaining the data. We would
like to thank seminar participants at the BU/Harvard/MIT Health Economics seminar in Boston on
April 4, 2014, the NZa-seminar in Utrecht on May 8, 2014, Academy of Health in San Diego, June 8–10,
2014, IHEA in Dublin, July 13–16, 2014, the CPB-seminar in The Hague, October 14, 2014, the
ESE-seminar at Erasmus University, January 15, 2015, and ICMPE in Venice, March 28, 2015, for
comments. Furthermore, we are grateful to two anonymous referees, Pieter van Baal, Pieter Bakx, Leon
Bettendorf, Aaron Maras, Tom McGuire, Jan van Ours, Bastian Ravesteijn, Ingrid Seinen, Harry van Til
and Gert Jan Verhoeven for providing comments on earlier versions of this paper.

References

Bajari, P., Hong, H., Park, M., Town, R., 2011. Regression discontinuity designs with an endogenous forcing variable and an application to contracting in health care. In: NBER Working Paper No. 17643.
Bellows, N.M., Halpin, H.A., 2008. Impact of Medicaid reimbursement on mental health quality indicators. Health Serv. Res. 43, 582–597.
Card, D., Dobkin, C., Maestas, N., 2008. The impact of nearly universal insurance coverage on health care utilization: evidence from Medicare. Am. Econ. Rev. 98 (5), 597–636.
Card, D., Dobkin, C., Maestas, N., 2009. Does Medicare save lives? Q. J. Econ. 123 (1), 597–636.
Chandra, A., Cutler, D., Song, Z., 2012. Who ordered that? The economics of treatment choices in medical care. In: Pauly, M.V., McGuire, T.G., Barros, P.P. (Eds.), Handbook of Health Economics, vol. II. Elsevier, Amsterdam, pp. 397–432.
Christianson, J.B., Conrad, D., 2011. Provider payment and incentives. In: Glied, S.A., Smith, P.C. (Eds.), The Oxford Handbook of Health Economics. Oxford University Press, Oxford, pp. 624–628.
Douven, R., Mocking, R., Mosca, I., 2015. The effect of physician remuneration on regional variation in hospital treatments. Int. J. Health Econ. Manage., 10.1007/s10754-015-9164-2.
van Dijk, C.E., van den Berg, B., Verheij, R.A., Spreeuwenberg, P., Groenewegen, P.P., de Bakker, D.H., 2013. Moral hazard and supplier-induced demand: empirical evidence in general practice. Health Econ. 22 (3), 340–352.
DBC Onderhoud, 2013. Spelregels, DBC-Registratie GGZ, Versie RG13a. DBC Onderhoud, Utrecht.
Einav, L., Finkelstein, A., Schrimpf, P., 2013. The response of drug expenditure to non-linear contract design: evidence from Medicare Part D. In: MIT Working Paper.
Effting, M., 2015. Tjak, tjak, volgende patient: de ontsporing van de ggz. De Volkskrant. (Dutch Newspaper).
Ellis, R.P., McGuire, T.G., 1986. Provider behavior under prospective reimbursement. Cost sharing and supply. J. Health Econ. 5 (1986), 129–151.
Ellis, R.P., McGuire, T.G., 1990. Optimal payment systems for health services. J. Health Econ. 9 (4), 375–396.
Epstein, A.M., et al., 1986. The use of ambulatory testing in prepaid and fee-for-service group practices: relation to perceived profitability. N. Engl. J. Med. 314, 1089–1093.
Frank, R.G., McGuire, T.G., 2000. Economics and mental health. In: Culyer, A.J., Newhouse, J.P. (Eds.), Handbook of Health Economics, vol. 1B. Elsevier, Amsterdam, pp. 893–954.
Nederland, G.G.Z., 2010. Zorg op waarde geschat, update. In: Sectorrapport ggz 2010, Amersfoort. (in Dutch).
Hickson, G.B., et al., 1987. Physician reimbursement by salary or fee-for-service: effect on a physician’s practice behavior in a randomized prospective study. Pediatrics 80, 744–750.
Jennison, K., Ellis, R.P., 1987. Comparison of psychiatric service utilization in a single group practice. In: McGuire, Scheffler (Eds.), The Economics of Mental Health Services: Advances in Health Economics and Health Services Research, vol. 8. JAI Press, Greenwich, USA, pp. 175–194.
Lee, D.S., Lemieux, T., 2010. Regression discontinuity designs in economics. J. Econ. Lit. 48, 281–355.
Mason, A., Goddard, M., 2009. Payment by Results in Mental Health: A Review of the International Literature and an Economic Assessment of the Approach in the English NHS, Research Paper 50. Centre for Health Economics, The University of York.
McGuire, T.G., 2000. Physician Agency. In: Culyer, A.J., Newhouse, J.P. (Eds.), Handbook of Health Economics, vol. 1A. Elsevier, Amsterdam, pp. 461–536.
NZa, 2007. Tariefbeschikking DBC GGZ 2008. Nederlandse Zorgautoriteit, Utrecht (in Dutch).
NZa, 2008. Tariefbeschikking DBC GGZ 2009. Nederlandse Zorgautoriteit, Utrecht (in Dutch).
NZa, 2009. Tariefbeschikking DBC GGZ 2010. Nederlandse Zorgautoriteit, Utrecht (in Dutch).
NZa, 2010. Invoering Prestatiebekostiging Curatieve GGZ: Advies op Hoofdlijnen. Dutch Healthcare Authority, Utrecht (in Dutch).
NZa, 2010. De curatieve GGZ in 2009: Ontwikkelingen in aanbod en volume. In: Monitor. Nederlandse Zorgautoriteit, Utrecht (in Dutch).
NZa, 2011. Curatieve GGZ 2010 Een sector in ontwikkeling. In: Monitor. Nederlandse Zorgautoriteit, Utrecht (in Dutch).
NZa, 2012. Marktscan Geestelijke Gezondheidszorg. Weergave van de markt 2008–2011. Nederlandse Zorgautoriteit, Utrecht (in Dutch).
Rekenkamer, 2013. Indicatoren voor kwaliteit in de zorg. In: Algemene Rekenkamer, March 28. Algemene Rekenkamer, The Netherlands, The Hague.
Rosenthal, M.B., 2000. Risk sharing and the supply of mental health services. J. Health Econ. 19 (6), 1047–1065.
Shi, J., 2013. Labor supply response to income cutoffs of health insurance in the Massachusetts reform. In: Working Paper. Boston University.
Sojourner, A.J., Grabowski, D.C., Town, R.J., Chen, M.C., Frandsen, B.R., 2013. Impacts of unionization on quality and productivity: regression discontinuity evidence from nursing homes. In: Working Paper, https://economics.byu.edu/frandsen/Documents/Nursing Home Unions.pdf.
Stearns, S., Wolfe, B., Kindig, D., 1992. Physician responses to fee-for-service and capitation payment. Inquiry 29, 416–425.
Steinbusch, P.J.M., Oostenbrink, J.B., Zuurbier, J.J., Schaepkens, F.J.M., 2007. The risk of upcoding in casemix systems: a comparative study. Health Policy 81 (2–3), 289–299.
Van de Ven, W.P.M.M., Schut, F.T., 2008. Universal mandatory health insurance in The Netherlands: a model for the United States? Health Affairs 27 (3), 771–781.
Verbeek, M., 2004. A Guide To Modern Econometrics, second ed. Wiley, New York, NY.
VWS, 2010. Interdepartementaal beleidsonderzoek curatieve GGZ. In: Attachment by the Report: Heroverweging curatieve zorg. Ministry of Health, Welfare and Sport, The Hague (in Dutch).

Health and agricultural productivity: Evidence from Zambia

Günther Fink a,∗, Felix Masiye b
a Harvard School of Public Health, USA
b Department of Economics, University of Zambia, Zambia

Article history:
Received 19 August 2014
Received in revised form 12 April 2015
Accepted 21 April 2015

JEL classification: I15; J24; J43
Keywords: Investment; Health; Productivity; Agriculture; Malaria

We evaluate the productivity effects of investment in preventive health technology through a
randomized controlled trial in rural Zambia. In the experiment, access to subsidized bed nets was
randomly assigned at the community level; 516 farmers were followed over a one-year farming period.
We find large positive effects of preventative health investment on productivity: among farmers
provided with access to free nets, harvest value increased by US$ 76, corresponding to about 14.7%
of the average output value. While only limited information was collected on farming inputs, shifts
in the extensive and the intensive margins of labor supply appear to be the most likely mechanism
underlying the productivity improvements observed.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction

Despite the rapid speed of urbanization over the past decades, rural small-scale farming remains the
primary source of food and income for a majority of the population in developing countries (World
Bank, 2007). In most settings, the degree of agricultural mechanization is limited, so that
agricultural production remains primarily dependent on the availability and productivity of human
labor. While labor is abundant in principle in most developing countries (Pitt and Rosenzweig,
1986), labor inputs can be compromised by episodes of ill health and can result in output losses if
absent labor cannot be replaced immediately.

In this paper we investigate the economic impact of short-term morbidity on agricultural output in
the context of small-scale farming in Zambia. The study setting is representative of many rural
areas in the developing world both in terms of the general lack of advanced farming technology and
in terms of the dominant role of farming as source of nutrition and income. With farming land
available free of charge in most communities, a large majority of the working-age population engages
in agriculture, while formal sector jobs are generally scarce. Despite major government efforts to
reduce the burden of the disease in recent years (NMCC, 2010; Zambia Ministry of Health, 2006),
malaria continues to be the primary cause of short-term morbidity in the country, with children and
adults experiencing up to five episodes of malaria per year (NMCC, 2010; WHO, 2009). Since the
planting season tends to overlap with the malaria season, health related absences from field work
are frequent, and are commonly cited by local farmers as primary cause of lost field work and
income.1

To evaluate the degree to which health affects agricultural productivity, we conducted a
cluster-randomized field experiment with 516 farmers in Katete District, Zambia, from December 2009
to August 2010. As part of the experiment, farmers were randomly selected for bed net programs, which
allowed them to obtain long-lasting insecticide treated nets (LLITNs) through agricultural loan
program schemes at differentially subsidized prices. The basic intuition underlying the experiment
is relatively straightforward: as long as household labor and consumption decisions are
non-separable from household production decisions2 (Benjamin, 1992),

∗ Corresponding author at: Harvard School of Public Health, Boston, MA 02115, USA. Tel.: +1 6174327389.
E-mail address: gfink@hsph.harvard.edu (G. Fink).
1 On average, farmers surveyed at baseline claimed that their harvest would increase by 30% if field
work was not interrupted by episodes of ill health.
2 If consumption and production decisions were perfectly separable, family labor could be perfectly
substituted for by hired labor.
152 G. Fink, F. Masiye / Journal of Health Economics 42 (2015) 151–164
decreased exposure to malaria should increase the time and energy farmers can spend on their fields,
and thus also increase the final harvest amounts.

In a first paper based on this experiment, we analyzed the impact of the additional LLITNs
distributed on self-reported morbidity (Fink and Masiye, 2012). In this paper, we analyze the impact
of the net programs on agricultural productivity, the main outcome variable of the trial. In the
first part of our analysis, we analyze the impact of the interventions on net ownership and usage.
Consistent with recent work by Tarozzi et al. (2014), we find a substantial fraction of farmers to be
willing to purchase LLITNs at full or partially subsidized prices when financing options are
provided. On average, farmers in the loan group acquired 0.9 nets, resulting in a 24 percentage
point increase in the average fraction of sleeping spaces covered at the household level.

In the second part of the paper, we estimate the impact of the bed net programs on agricultural
production. In order to facilitate a rapid distribution of bed nets, treatments were randomly
assigned at the cluster level prior to the collection of baseline data in the experiment. The
non-stratified cluster-level randomization resulted in a rather unbalanced sample, with treated
farmers on average both larger and more productive than farmers in the control group. To address
these imbalances, we focus on analyzing changes in production outcomes between the 2009
(pre-intervention) and the 2010 (post-intervention) farming seasons. The point estimates from our
preferred specification suggest that the returns to bed nets in the study sample were large: on
average, we find that access to free bed nets (three nets for a typical household) increased
agricultural output by US$ 76, which corresponds to 14.7% of the average annual harvest value. To
address omitted variable bias concerns, we include a large set of covariates in our empirical
models, and run an extensive series of robustness and heterogeneity checks. Overall, treatment
effects appear largest among more educated farmers as well as farms with more diversified
portfolios, and larger for cotton (as the more labor intensive crop) than for maize.

In the last part of our analysis, we explore potential mechanisms underlying the productivity
impacts observed. Unfortunately only limited and self-reported data on malaria incidence (and no
data on parasitemia or asymptomatic malaria) was collected as part of this project. However, the
general patterns observed in the data suggest that the programs likely induced substantial
reductions in the days of field work lost due to ill health. Given that full recovery from acute
malaria is often slow, reduced exposure to malaria can increase the marginal product of labor (Nur,
1993), particularly in cases where malaria induces anemia (Ehrhardt et al., 2006). While there is
theoretically also the possibility that the reduced exposure to ill health may have been associated
with a reduction in direct medical expenditure, most malaria treatment in the area appears to be
provided for free, so that no evidence of lower health expenditure was found.

Even though this paper is to our knowledge the first one using experimental data to evaluate the
productivity effects of malaria, several studies have analyzed agricultural output in the context of
nutrition and other diseases. Following the initial work by Strauss (1986) as well as Pitt and
Rosenzweig (1986), Behrman et al. (1997) document a rather robust association between nutritional
improvements and production in agricultural settings. Loureiro (2009) and Ulimwengu (2009) find
positive associations between health and productivity using stochastic frontier regression
techniques. Audibert and Etard (2003) examine the effect of schistosomiasis among rice-growers, and
find that exposure to schistosomiasis reduces production by 26%. Fox et al. (2004) analyze the
productivity declines associated with HIV positivity, and find that HIV-positive workers earn on
average 16–17% less over a two year period. Similarly, Baranov et al. (2012) show that maize
production increases by up to 31% with HIV treatment, and attribute this increase to increased
overall labor supply and improved physical and mental health. Similar effects were, however, not
found for iron supplementation and deworming among tea pluckers in Bangladesh (Gilgen et al., 2001).
Most similar to the results presented in this paper are two cross-sectional studies using harvest
data to compute the agricultural output effect of malaria: Girardin et al. (2004) analyze vegetable
farming in Côte d’Ivoire, and find that farmers who reported being sick more often had 47% lower
yields. Morel et al. (2008) use total farming output to quantify the agricultural loss generated by
work days lost due to malaria in Vietnam, and find an average cost of US$ 11 per case of malaria,
suggesting returns to malaria prevention similar to the ones identified in this paper. Conceptually,
an overwhelming majority of this literature suggests strong links between health and agricultural
production in low-income settings; this suggests that household production, labor and consumption
decisions are generally not separable (Benjamin, 1992), a finding which is also supported by recent
evidence from Zambia (Fink et al., 2014).

While this study primarily focuses on household-level outcomes, the results presented here naturally
also link to the broader literature on the relation between health and income. Most of the
micro-level literature in this area has focused on the long term benefits of improved childhood
health in terms of education and labor market outcomes (Bleakley, 2007; Bleakley and Lange, 2009;
Clarke et al., 2008; Kremer and Miguel, 2004). This paper highlights a more immediate and direct
effect of health on income similar to the results shown in Thomas et al. (2010) for iron
supplementation; this effect will clearly not apply in all low resource settings, but may be of
particular importance among rural and frequently impoverished populations.

The rest of the paper is structured as follows: we provide a detailed description of the study site
and local agriculture practices in Section 2. In Section 3, we present the study design and provide
details on study implementation. In Section 4, we analyze the effects of the bed net programs on net
ownership and net usage. In Section 5, we estimate the impact of the net programs on productivity.
Section 6 shows some evidence on the mechanisms underlying the main productivity results. We
conclude with a short summary and discussion in Section 7.

2. Study background

Fig. 1 shows the geographic location of the study site within Zambia. Katete district is one of
eight districts within Zambia’s Eastern Province. Eastern Province is one of the least developed
regions of Zambia, with a majority of the population living below the one-dollar-per-day poverty
line, and an estimated under-5 mortality rate of 151 per 1000 live births (Macro International,
2007). Katete district is similar in its topography to the Western part of Malawi, which is located
about 100 km east of the district. The current district population is estimated at 250,000,
approximately half of which live in the urban centers of Sinda and Katete (Zambia Central Statistic
Office, 2011a).

Malaria is endemic in most parts of Zambia, and the primary cause of short term morbidity in the
country (Zambia Ministry of Health, 2012). The regional climate displays pronounced seasonal
fluctuations, with virtually no rainfall from May to November, followed by a period of major
rainfall from December to April. The strong seasonal patterns are directly reflected in the seasonal
fluctuations of malaria. Malaria in the area is considered endemic and seasonal, with a majority of
the transmission occurring between December and May, when continued rainfalls support the breeding
of the Anopheles mosquito larvae. According to the latest round of the Malaria Indicator Survey,
Eastern region is among the areas
with the highest parasite prevalence in the country, with parasites detected in 22 percent of
children under the age of five in early 2010 (NMCC, 2011). Fig. 2 shows the clinical burden of
Plasmodium falciparum malaria in terms of the number of clinical cases per year and per 5 km² as
computed by the Malaria Atlas Project in 2007 (Hay and Snow, 2006). Katete district ranks among the
most highly exposed areas of the country, with an estimated annual burden of approximately 500 cases
per 5 km².

Fig. 1. Zambia (white) and Katete District (shaded red). (For interpretation of the references to color in this text, the reader is referred to the web version of the article.)

Fig. 2. Regional distribution of malaria incidence. Source: Malaria Atlas Project.

Since 2006, Zambia has made major efforts to reduce the burden of malaria (Ashraf et al., 2010;
Zambia Ministry of Health, 2006). As part of the internationally supported Rollback Malaria
Initiative, four principal strategies have been employed by the country through the National Malaria
Control Centre: indoor-residual spraying (for densely populated and primarily urban areas), mass
distribution of long-lasting insecticide treated nets (LLITNs), intermittent preventive treatment of
malaria in pregnancy (IPTp) and case management through diagnostics and artemisinin-based
combination therapies (NMCC, 2007, 2009, 2010; Zambia Ministry of Health, 2006). Between 2006 and
2008, 96,000 LLITNs were distributed in Katete district (NMCC, 2011). At the time of the 2008
Malaria Indicator Survey, the average number of LLITNs in Eastern Province was 0.96 nets (NMCC,
2009), a level very similar to the one observed in the study area at the beginning of the study in
December 2009.

The rural part of Katete targeted in this study is sparsely populated, with clusters of family-run
farms grouped into small villages. With an average size of approximately 4 ha (10 acres), the
typical farm is small, and most planting and harvesting is done without machinery. Farm land is
generally owned by communities, who allocate the land to families via local headmen and chiefs. The
amount of farm land families can get access to is – at least theoretically – not limited; any
individual can claim additional land from the chief as long as they can show they have the manpower
and skills to use the land (Nolte, 2012).

3. Study design, randomization and descriptive statistics

3.1. Intervention background

As stated above, small-scale farming constitutes the primary source of nutrition and income in the
region. Cotton is the main cash crop in the area, and is sold to multinational cotton buyers
(“ginners”), who process the cotton and sell it on international markets.
Fig. 3. Cluster-level group assignment.
To enhance productivity and strengthen commercial links with farmers, cotton ginners have set up a
variety of agricultural loan schemes, which allow farmers to receive cotton seeds and “chemicals”
(fertilizer and pesticides) as well as agricultural machinery throughout the planting and growing
season on a loan basis. Upon receipt of the cotton, ginners deduct the outstanding loan amount from
the final sales price, and pay out the remaining balance in cash. Agricultural loans are generally
provided free of interest, and are given under the assumption that farmers will sell their harvest
to the cotton company offering them the agricultural loan. At the time of the study, our partner
organization (Dunavant Cotton) was providing loans to approximately 80,000 cotton farmers across
Zambia. According to the latest national estimates, 67% of the Zambia labor force is employed in
agriculture (Zambia Central Statistic Office, 2011b), which corresponds to approximately 1.3 million
farming households. The share of farmers working with Dunavant is relatively small (approximately 6%
of all farming households), both because Dunavant is only one of about 10 cotton ginners in the
country and because only about 25% of small-scale farmers in the country grow cotton (Goeb, 2011).

3.2. The intervention

In order to assess farmers’ willingness to pay for nets within existing agricultural loan schemes,
two different net programs were implemented as part of the experiment: a free net program and a bed
net loan program. Under the free net program, selected farmers were allowed to obtain one free bed
net for each uncovered sleeping space in the household. Clusters in the loan arm were assigned to
one of two loan types: a “full price loan”, which required farmers to repay the full price3 of the
net (Zambian Kwacha (ZK) 25,000, US$ 5) at the end of the harvesting season, and a “subsidized price
loan”, which required farmers to repay ZK 12,500 (US$ 2.5) at the end of the harvesting season. Both
prices are substantially higher than what households reported to have paid for bed nets in the past;
90% of bed nets found in households at baseline were received through free governmental or NGO
programs (most likely received as part of the large national programs); 7% had paid ZK 3000
(US$ 0.6) – the price public health facilities charged prior to the national mass distribution, and
the remaining 3% reported to have paid a price between ZK 5000 and ZK 10,000 acquiring nets through
door-to-door sales.

All nets were distributed at the beginning of the weeding season (late December) in order to provide
protection for farmers during the peak malaria season (December–May).

3.3. Sampling frame, enrollment and program assignment

The sampling frame for the study was provided by Dunavant Zambia. Dunavant has nine regional
offices, which distribute farming inputs and acquire cotton through local sheds and distributors.
Each distributor handles between 10 and 50 farmers in his community. At the time the study was
launched, Dunavant was working with 96 distributors in the study area. In order to reduce the risk
of local spillovers, we restricted the sample to distributors operating in spatially separated
areas, with a minimum distance of 3 km between any two locations. This left us with a final sample
of 49 distributors and their respective villages. On average, each of the 49 distributors was
working with about 20 farmers, a listing of which was provided to the study by Dunavant. We randomly
selected 11 farmers from each distributor, and visited them for a baseline interview in December
2009. Only the 11 farmers selected for the study were allowed to receive the free or subsidized
nets; with an average village size of about 50 households, this means that the program covered about
20% of the average village population. Out of 539 farmers invited to participate in the study, 516
farmers (95.7%) were enrolled in the study, and completed the baseline interview in December 2009.

3 The “full price” we charged reflects the current wholesale price, which is about 30% below regional
retail prices of about ZK 35,000 (US$ 7).
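The sample sizes and shares quoted in Sections 3.1 to 3.3 can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: it uses the figures stated in the text, the variable names are hypothetical, and the rounded inputs (village size, number of farming households, retail price) are the approximations given above rather than exact counts.

```python
# Figures quoted in the text; variable names are illustrative, not from the study.
distributors = 49
farmers_per_distributor = 11
enrolled = 516
village_households = 50            # approximate average village size
dunavant_farmers = 80_000          # approximate number of Dunavant loan farmers
farming_households = 1_300_000     # approximate national number of farming households
wholesale_zk, retail_zk = 25_000, 35_000

invited = distributors * farmers_per_distributor
enrollment_rate = enrolled / invited
village_coverage = farmers_per_distributor / village_households
dunavant_share = dunavant_farmers / farming_households
wholesale_discount = 1 - wholesale_zk / retail_zk

print(f"farmers invited:        {invited}")
print(f"enrollment rate:        {enrollment_rate:.1%}")
print(f"village coverage:       {village_coverage:.0%}")
print(f"Dunavant share:         {dunavant_share:.1%}")
print(f"wholesale below retail: {wholesale_discount:.0%}")
```

The computed shares (22%, about 6%, and about 29%) are consistent with the rounded statements of "about 20%", "approximately 6%" and "about 30%" in the text.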
Table 1
Descriptive statistics.

                                   Control (N = 153)     Loans (N = 185) Free nets (N = 155)                         Equal means test (p-value)a
Variable                              Mean  St. dev.      Mean  St. dev.      Mean  St. dev.  Control vs.  Control vs.     Loan vs.    All means
                                                                                                    Loans    Free nets    Free nets        equal
Farmer age                           39.24     12.59     41.32     13.24     39.70     12.90         0.27         0.80         0.31         0.50
Farmer is married                     0.83      0.38      0.83      0.37      0.88      0.32         0.96         0.21         0.15         0.30
Farmer years of education             4.34      3.41      4.14      3.66      4.21      3.88         0.63         0.81         0.90         0.89
Members under age 5                   0.87      0.88      0.95      1.00      0.86      0.86         0.55         0.93         0.39         0.66
Members age 5–14                      1.40      1.36      1.88      1.50      1.92      1.55         0.00         0.00         0.82         0.00
Members age 15–59                     2.44      1.22      2.76      1.53      2.84      1.58         0.09         0.03         0.71         0.07
Members age 60+                       0.18      0.49      0.22      0.51      0.17      0.44         0.54         0.97         0.51         0.78
Chicken, geese and ducks              4.58      6.20      6.89      8.45      6.26      7.96         0.01         0.07         0.49         0.01
Goats, pigs and sheep                 2.57      3.40      3.41      4.18      3.00      3.68         0.12         0.35         0.41         0.29
Cows                                  1.47      3.51      2.34      4.09      1.92      3.24         0.07         0.32         0.37         0.19
Bicycles                              0.77      0.56      0.93      0.81      0.87      0.60         0.02         0.22         0.44         0.08
Mobiles                               0.23      0.45      0.36      0.80      0.35      0.75         0.14         0.17         0.95         0.23
TVs                                   0.07      0.25      0.14      0.37      0.06      0.27         0.04         0.98         0.05         0.08
Mosquito nets                         1.09      1.33      1.53      1.25      0.48      0.69         0.08         0.01         0.00         0.00
Cars, tractors and trucks             0.00      0.00      0.06      0.43      0.03      0.20         0.05         0.05         0.33         0.03
Maize planting area (ha)              1.78      1.17      2.16      3.16      1.86      1.13         0.19         0.71         0.27         0.38
Cotton planting area (ha)             1.36      1.09      1.38      1.25      1.17      0.73         0.94         0.21         0.32         0.33
Other crops (ha)                      0.58      0.77      0.73      0.84      0.83      0.93         0.32         0.15         0.55         0.39
Cotton harvest 2009 (bales)           9.46      8.59     12.16     11.12     11.60     11.33         0.09         0.19         0.75         0.19
Maize harvest 2009 (bags)            20.13     15.62     29.83     35.96     28.16     25.98         0.01         0.00         0.68         0.00
Total harvest value 2009 (US$)       453.2     304.0     649.0     589.1     613.8     476.0         0.00         0.00         0.63         0.00

Notes: Based on 493 observations with complete information. All variables reflect baseline conditions as collected in December 2009.
a p-Values based on cluster-bootstrapped standard errors. Each cluster corresponds to one distributor and 11 randomly selected farmers working with the distributor.
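The equal-means tests in Table 1 rely on cluster-bootstrapped standard errors, but the exact procedure is not spelled out here. The Python sketch below shows one common variant under stated assumptions: whole clusters (distributors) are resampled with replacement within each arm, the difference in pooled means is recomputed on each draw, and a normal approximation yields a two-sided p-value. The data are synthetic, and all function and variable names are hypothetical rather than taken from the study.

```python
import math
import random
import statistics


def pooled_mean(clusters):
    """Mean over all observations in a list of clusters (lists of floats)."""
    values = [v for cluster in clusters for v in cluster]
    return sum(values) / len(values)


def cluster_bootstrap_diff(arm_a, arm_b, n_boot=1000, seed=0):
    """Difference in means between two arms with a cluster-bootstrap SE.

    Returns (difference, standard error, two-sided p-value), where the
    p-value uses a normal approximation (an assumption of this sketch).
    """
    rng = random.Random(seed)
    diff = pooled_mean(arm_a) - pooled_mean(arm_b)
    draws = []
    for _ in range(n_boot):
        # Resample whole clusters with replacement, separately within each arm.
        sample_a = rng.choices(arm_a, k=len(arm_a))
        sample_b = rng.choices(arm_b, k=len(arm_b))
        draws.append(pooled_mean(sample_a) - pooled_mean(sample_b))
    se = statistics.stdev(draws)
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(diff) / se / math.sqrt(2))
    return diff, se, p_value


def make_arm(rng, n_clusters, mean, farmers_per_cluster=11):
    """Synthetic arm: a cluster-level shock plus farmer-level noise."""
    arm = []
    for _ in range(n_clusters):
        shock = rng.gauss(0, 100)
        arm.append([mean + shock + rng.gauss(0, 300)
                    for _ in range(farmers_per_cluster)])
    return arm


data_rng = random.Random(42)
control = make_arm(data_rng, 15, 450.0)     # synthetic, not study data
free_nets = make_arm(data_rng, 15, 600.0)   # synthetic, not study data
diff, se, p_value = cluster_bootstrap_diff(free_nets, control)
print(f"difference = {diff:.1f}, cluster-bootstrap SE = {se:.1f}, p = {p_value:.3f}")
```

Resampling clusters rather than individual farmers keeps the within-distributor correlation intact in every bootstrap draw, which is the reason for clustering when treatment is assigned at the distributor level.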
In order to accelerate the distribution of bed nets,4 bed net loan programs were randomized prior to the collection of baseline data at the distributor level. Randomization was done using a simple random number draw generated by Stata. Out of the 49 eligible distributors, 15 were assigned to the control group (30%), and 15 distributors (30%) were selected for the free net program. Since we were particularly interested in the loan group and wanted to assess differences in uptake with and without subsidy, a slightly larger number of distributors (20% of distributors in each loan program) were randomized into the net loan programs. The spatial distribution of treatment assignment is illustrated in Fig. 3. All farms in the net program arms were informed about the programs at the end of the baseline interview, and given 48 h to decide on the number of nets they wanted to receive. Ordered nets were delivered within 10 days of the baseline survey, between December 20 and December 31, 2009.

A first follow-up or midline survey was conducted in April 2010, during which information on recent illness episodes was collected. The endline survey was conducted in July and August 2010 with a primary focus on harvest outcomes and harvest sales.

Out of the 516 farmers initially enrolled in the study, 510 (98.8%) were followed up successfully throughout the subsequent farming and harvesting season. One farmer passed away, three farmers moved, and two farmers refused to participate in the follow-up surveys. An additional two surveys were excluded from analysis due to missing planting and harvesting information. Sixteen further surveys have missing values on at least one of the extended list of covariates used in some of the specifications, resulting in a final analytical sample of 493 farmers.

3.4. Descriptive statistics

Table 1 shows descriptive statistics for the households enrolled and followed up in the study by study arm. Average plot size was 4.15 ha (median 3.1) in 2009, and average harvest value in 2009 was US$ 577 (median US$ 463). With an average household size of close to six members, this implies average per-capita resources of approximately US$ 0.26 per day, placing the majority of these households well below the international US$ 1.25 per day poverty threshold, even when input-related expenses (such as cotton loans) are not accounted for. Cotton farms are on average slightly larger than other farms;5 at the national level, the average plot size among small- and medium-scale (non-commercial) farmers is 3.1 ha (Jayne et al., 2008). On average, farms owned one bed net at baseline; with a mean of three sleeping spaces per household, this implies that two thirds of household members were not covered by nets at the beginning of the study.

While the randomized assignment of net programs across clusters generated a fairly balanced sample with respect to household head characteristics, the same was unfortunately not true for farm size, with farms in the free net and net loan arms on average larger and more productive than farms in the control group. As Table 1 shows, the largest and most productive farms were found in the net loan group, followed by the free net group. Detailed data collected on 2009 harvest outcomes suggest that the average value of farm production (sum of all crops harvested multiplied by the median sales prices of the respective crops in the area in 2009; see Table 1 for further details) was US$ 453 in the control group, US$ 614 in the free net group, and US$ 649 in the net loan group. Similar differences were found for household size: on average, households in the loan and free net groups had 0.8 and 0.9 more household members in the 5–59 age range than households in the control group.

These differences between farmers in the control and farmers in the two intervention groups are large and statistically significant, and complicate statistical inference, since endline differences will be at least partially attributable to differences observed at baseline. The imbalance does not appear to be driven by differences in the spatial distribution or by spatial clustering of farms, but rather

4
Due to delays in funding and IRB approval, baseline surveys got pushed back to December, so that we decided to do the distribution of nets immediately after baseline to make sure farmers would benefit from them during the peak rainy season.
5
Cotton ginners recommend using at least one hectare of land for growing cotton, which means that farms growing cotton rarely use less than two hectares of land.
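The per-capita figure quoted in the descriptive statistics can be reproduced with simple arithmetic. The sketch below uses only the numbers given in the text (average 2009 harvest value of US$ 577, roughly six household members); the 365-day year and the function name are assumptions made for illustration.

```python
# Figures quoted in the text
AVG_HARVEST_VALUE_USD = 577.0   # average 2009 harvest value per household
HOUSEHOLD_SIZE = 6              # "close to six members"
POVERTY_LINE_USD = 1.25         # international poverty threshold per day
DAYS_PER_YEAR = 365             # assumption: resources spread over a full year


def per_capita_per_day(annual_value, members, days=DAYS_PER_YEAR):
    """Annual household resources expressed per person per day."""
    return annual_value / members / days


daily = per_capita_per_day(AVG_HARVEST_VALUE_USD, HOUSEHOLD_SIZE)
print(f"US$ {daily:.2f} per person per day")
print("below poverty line:", daily < POVERTY_LINE_USD)
```

Under these assumptions the result is about a fifth of the US$ 1.25 threshold, which is the comparison the text draws.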
Table 2
Program effect on ownership and coverage.

Panel A: ownership and usage of nets
Columns: (1) Number of nets received through program; (2) Number of nets owned at endline; (3) Number of nets used at endline^a; (4) Number of nets received through program^b; (5) Number of nets owned at endline^b; (6) Number of nets used at endline^b.

                          (1)        (2)        (3)        (4)        (5)        (6)
Loan program              0.811***   0.992***   0.823***   0.676***   0.710***   0.555***
                          (0.147)    (0.199)    (0.189)    (0.150)    (0.168)    (0.157)
Free nets                 2.413***   1.626***   1.517***   2.052***   1.571***   1.461***
                          (0.121)    (0.190)    (0.185)    (0.113)    (0.137)    (0.141)
Control group average     0.00       1.05       0.96       0.00       1.05       0.96
Controls included         No         No         No         Yes        Yes        Yes
Observations              493        493        493        493        493        493
R-squared                 0.46       0.23       0.20       0.64       0.45       0.46

Panel B: bed net coverage
Columns: (1) Fraction of nets used at endline^c; (2) Fraction of sleeping spaces covered; (3) Probability of not having any space covered; (4) Fraction of nets used^b; (5) Fraction of sleeping spaces covered^b; (6) Probability of not having any space covered^b.

                          (1)        (2)        (3)        (4)        (5)        (6)
Loan program              −0.0479    0.243***   −0.279***  −0.0459    0.248***   −0.258***
                          (0.0318)   (0.0687)   (0.0700)   (0.0364)   (0.0667)   (0.0688)
Free nets                 −0.0176    0.438***   −0.366***  −0.00924   0.472***   −0.383***
                          (0.0310)   (0.0625)   (0.0654)   (0.0359)   (0.0587)   (0.0674)
Control group average     0.93       0.41       0.39       0.93       0.41       0.39
Controls included         No         No         No         Yes        Yes        Yes
Observations              419        492        493        419        492        493
R-squared                 0.01       0.20       0.15       0.07       0.25       0.21
a Usage is defined as the number of nets observed hanging by interviewers as part of the endline survey in July 2010.
b Specifications control for plot sizes used for cotton, maize and other crops in 2009 as well as in 2010, family members under 5, family members 5–14, family members 60 and older, household wealth, mosquito nets at baseline, ownership of chickens, goats and sheep, farmer age, education and marital status, total maize harvest 2009, total cotton harvest 2009 and total economic value of production 2009.
c Fraction of nets used not defined for households without a bed net. Cluster-bootstrapped standard errors in parentheses.
p-Values based on cluster-bootstrapped standard errors. Each cluster corresponds to one distributor and 11 randomly selected farmers working with the distributor.
*** p < 0.01, ** p < 0.05, * p < 0.1.
seems to reflect presumably random variations in productivity level across farmers.6 While a large number of meta-reviews in the medical literature suggest that results from imbalanced trials controlling for baseline characteristics do on average not yield different results from fully balanced trials (Berger, 2010; Knottnerus and Tugwell, 2012; Riley et al., 2013), there is clearly a concern that observable differences may be correlated with other unobservable characteristics such as malaria knowledge or farming skills. To deal with these concerns, we follow the approach proposed by Glennerster and Takavarasha (2013) as well as by Bennett et al. (2014) and estimate both models where we control for lagged dependent variables and models where we use differences in outcome measures between the 2009 (pre-intervention) and 2010 (intervention) seasons as the dependent variable. These models exclusively identify changes in the outcome measures over time, and thus directly eliminate confounding or omitted variable bias concerns due to time-invariant farm-specific differences prior to the intervention. The resulting estimates will yield unbiased program impact assessments as long as the random treatment assignment is not correlated with changes in unobservable characteristics between baseline and endline conditional on initial values, which seems reasonable given the relatively short time period analyzed.

4. Program impact on bed net ownership, usage and coverage

Table 2 shows the main results for bed net ownership and usage. As described above, endline surveys were conducted in July and August 2010. During both visits, net status was verified by interviewers. We consider a bed net “used” if the net was observed hanging by interviewers during the visit.7 In Panel A of Table 2, we show the impact of the net programs on the number of nets received (columns 1 and 4), the number of nets owned at endline (columns 2 and 5) and the number of nets in use at endline (columns 3 and 6). In Panel B of Table 2 we show the impact of the two treatments on bed net coverage in the household, i.e. the number of nets hanging relative to the number of sleeping spaces used by the household. In columns 1–3 of both panels we show unadjusted differences between the three groups; in columns 4–6, we show estimates with a full set of baseline covariates to control for pre-treatment differences in household size and bed net ownership.

As documented in previous studies (Cohen and Dupas, 2010; Dupas, 2014; Tarozzi et al., 2014), the demand for bed nets is highly price elastic; on average, households in the loan group obtained 0.8 nets, compared to 2.4 nets in the free net group. It is worth

6
The baseline table looks virtually the same when specific areas (Eastern, Western or central parts) or villages are excluded.
7
Nets are generally tied to a knot during the day to keep them clean, which means that observing a net as hanging does not necessarily mean it was used the previous night.
highlighting that despite the high price elasticity, demand is strictly positive in this setting with credit financing even when the full price of the net is charged. This is rather different from the zero demand found for nets with prices over US$ 1 in settings where net acquisition requires an upfront cash payment (Cohen and Dupas, 2010; Dupas, 2014), but similar in magnitude to recent work by Tarozzi et al. (2014) who find that 52% of households purchase bed nets when financing options are provided.

Given that households in the control group had more nets at baseline, the differences in ownership at endline are smaller than the differences in the number of nets received; on average, households in the loan group owned 0.9 more nets than households in the control group, while households in the free net group owned 1.6 nets more than households in the control group.

Consistent with Cohen and Dupas (2010) as well as Tarozzi et al. (2014), we found no effect of net pricing on usage. On average, 90% of nets owned were actively used by the household during our second follow-up, with no differences in utilization rates between intervention and control groups. The remaining nets were generally found stored for future use in the households, which is consistent with the generally high levels of appreciation of nets in this population.

In terms of sleeping space coverage, both treatments had a sizeable impact. As shown in Panel B of Table 2, 41% of sleeping spaces were covered on average at endline in the control group; this fraction increased to 65% in the loan group, and to 88% in the free net group. Along the same lines, the likelihood of no sleeping space being covered by a bed net decreased from 39% in the control group to 13% in the loan group and to less than 1% in the free net group.

5. Impact on agricultural production

The main hypothesis investigated in this experiment is whether short-term fluctuations in labor supply generated by ill health lead to lower agricultural output. To measure production, we focus on three different measures: maize harvest, cotton harvest, and total production value. Maize is by far the most common crop in the country, and grown on approximately 50% of all plots in the sample. Maize is traded in standard bags of 50 kg, which makes measuring total production of maize relatively easy. The second most common crop in our sample is cotton. Cotton is used as a cash crop, and, as described above, sold to cotton ginners at the end of the harvesting season. While ginners pay by kilogram (2009 prices were US$ 0.30 per kg), farmers generally deliver cotton in large bags, which are referred to as “bales”, and generally contain about 80 kg each. Even though cotton and maize account for the large majority of farming land and production in this sample, most farmers use small plots to grow a variety of other plants such as sunflowers, beans, groundnuts, and sweet potatoes. The diversity in crop portfolios means that crop-specific quantities cannot easily be compared across farmers or treatment groups. On the other hand, not accounting for the resources generated by these crops would clearly mean that the effects of additional labor inputs may not be fully captured.

To measure total farm production, detailed harvest information was obtained from all crops, and then converted into monetary values using the median 2009 market prices reported among farmers who sold the respective crops. In theory, one may wish to use farm-specific crop prices to account for differences in market access or production quality; in practice, this is unfortunately not feasible since a large number of farms do not sell specific crops at all, but rather use them for their own consumption.8

In order to have a comprehensive measure of farm productivity, we defined the total economic value of production (TEVP) as

TEVP = \sum_{i=1}^{8} q_i P_{50,i}    (1)

where $q_i$ is the quantity of crop $i$ and $P_{50,i}$ is the median price for one unit of the respective crop.9 In the study sample, eight crops were grown: cotton, maize, ground nuts, sweet potato, sunflower, soy beans, tomato and cassava. In general, price variations across farmers were low: for the two major crops (cotton and maize) prices are negotiated and established at the national level by the government. For some of the minor crops like sunflowers and soy beans, prices are established at local markets. For all crops, individual prices reported rarely deviated by more than 10% from the most commonly reported market prices.

For our analysis, we focus on cotton and maize as the two most commonly grown crops as well as the aggregate TEVP variable. Fig. 4 shows kernel density estimates of all three variables in 2010 on an absolute (top panel) and logarithmic (bottom panel) scale.

The average cotton harvest in 2010 was 6 bales, which corresponds to about 480 kg of cotton harvested with an average plot of 1.3 ha. Average maize harvest was 28 bags, which implies an average yield of about 1400 kg for an area of 2 ha. Both yields are small when compared to industrial farmers, who frequently achieve yields of over 10 tons of maize per hectare (13 times the sample average) and over 2 tons of cotton per hectare (about 9 times the sample average).10 The average total economic value of all crops harvested in 2010 was 2.6 million Kwacha (US$ 517).

5.1. Empirical strategy

The basic empirical (intention-to-treat) model we estimate to identify the treatment effects of interest is given by

y_{ij} = \alpha + T_j \beta + X_{ij} \gamma + \varepsilon_{ij},    (2)

where $y_{ij}$ is the harvest outcome of interest observed for farm $i$ in cluster $j$ in 2010, $T_j$ is a vector of indicator variables capturing the distributor-level treatment assignment, and $X_{ij}$ is a vector of baseline covariates. Given the limited use of fertilizer in the region, yields tend to display mean-reverting patterns over time, with good yield years depleting the soil and being followed by less productive years. Fig. 5 illustrates these patterns, showing the year-over-year change in output as a function of total output in the 2009 (pre-intervention) period.

To control for baseline differences as well as the observed mean-reverting patterns, we first estimate models in which we include the lagged (2009) value of the outcome variable of interest. With lagged outcome variables, the main estimated equation becomes

y_{ijt} = \alpha + \delta y_{ij,t-1} + T_j \beta + \varepsilon_{ij},    (3)

where $y_{ij,t-1}$ is the lagged value of the dependent variable, i.e. the harvest outcome in the 2009 season. Following the approach taken in Bennett et al. (2014), we also test an alternative model where we take changes in the outcome variables as dependent variable, and control for an extensive set of baseline covariates, which is given by

\Delta y_{ij} = \alpha + T_j \beta + X_{ij} \gamma + \varepsilon_{ij}.    (4)

8
The only cash crop in the sample is cotton; all other crops are mostly used for own consumption, with some occasional sales to cover additional cash needs.
9
We assume that the quality of the produced crops is comparable across farmers; this assumption is empirically always true for cotton and maize (where prices are fixed), but may not necessarily reflect prices for local small-quantity trades.
10
See http://www.indexmundi.com/agriculture/ for a country ranking for crop productivity.
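The TEVP definition in Eq. (1) can be illustrated with a short sketch: take the median of the prices reported by farmers who sold each crop, then value every farm's harvest at those common prices. The function names and the toy data are hypothetical. The cotton price of US$ 24 per bale follows from the figures in the text (US$ 0.30 per kg, about 80 kg per bale); the maize prices are invented for illustration only.

```python
from statistics import median


def median_prices(reported_prices):
    """Median reported sale price per unit for each crop (P_50,i).

    reported_prices maps crop name -> list of prices reported by farmers
    who actually sold that crop; crops with no reported sale are skipped.
    """
    return {crop: median(p) for crop, p in reported_prices.items() if p}


def tevp(quantities, p50):
    """Total economic value of production: sum over crops of q_i * P_50,i."""
    return sum(q * p50[crop] for crop, q in quantities.items())


# Hypothetical price reports (US$ per bale of cotton, per bag of maize)
reported = {
    "cotton": [24.0, 24.0, 24.0],   # 80 kg x US$ 0.30 per kg, from the text
    "maize": [9.0, 10.0, 11.0],     # invented values
}
p50 = median_prices(reported)

# Hypothetical farm: 10 bales of cotton and 20 bags of maize
farm = {"cotton": 10, "maize": 20}
print(tevp(farm, p50))
```

Valuing own-consumed crops at the area median price is what allows farms that never sold a given crop to be included, which is the reason the text gives for not using farm-specific prices.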
[Figure 4: six kernel density panels (Epanechnikov kernel; vertical axis: density). Top row, absolute scale: Cotton (bales), bandwidth 0.9652; Maize (bags), bandwidth 4.4401; Total value (ZKR), bandwidth 355.2049. Bottom row, logarithmic scale: Ln(bales of cotton), bandwidth 0.1381; Ln(bags of maize), bandwidth 0.1921; Ln(total value (ZKR)), bandwidth 0.1668.]
Fig. 4. Agricultural outcomes: Kernel density estimates. Notes: total values are in rebased Zambian Kwachas (ZKR); one rebased Kwacha corresponds to 1000 “old” Zambian Kwachas.
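The change-in-outcome comparison of Eq. (4) and the cluster-level resampling behind the reported standard errors can be sketched as follows. This is a simplified illustration under stated assumptions, not the estimation code used in the study: it drops the covariate vector, so the estimate reduces to a difference in mean year-over-year changes between one treated arm and the control arm, and it assumes a bootstrap that resamples whole distributor clusters with replacement within each arm. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean, stdev


def diff_in_changes(clusters):
    """Mean change in treated farms minus mean change in control farms.

    clusters is a list of (treated, changes) pairs, one per distributor,
    where changes holds the 2010-minus-2009 outcome for each farm.
    """
    treated = [d for is_treated, ds in clusters if is_treated for d in ds]
    control = [d for is_treated, ds in clusters if not is_treated for d in ds]
    return mean(treated) - mean(control)


def cluster_bootstrap_se(clusters, reps=500, seed=0):
    """Bootstrap standard error that resamples entire clusters.

    Assumption: clusters are drawn with replacement separately within the
    treated and control arms, so every replicate contains both arms.
    """
    rng = random.Random(seed)
    treated = [c for c in clusters if c[0]]
    control = [c for c in clusters if not c[0]]
    draws = []
    for _ in range(reps):
        sample = (rng.choices(treated, k=len(treated))
                  + rng.choices(control, k=len(control)))
        draws.append(diff_in_changes(sample))
    return stdev(draws)


# Toy data: two treated and two control clusters, two farms each
toy = [
    (True, [10.0, 30.0]),
    (True, [20.0, 40.0]),
    (False, [-10.0, 0.0]),
    (False, [-20.0, 10.0]),
]
effect = diff_in_changes(toy)
se = cluster_bootstrap_se(toy)
print(effect, se)
```

Resampling clusters rather than individual farms keeps the within-distributor correlation intact in every replicate, which matters here because treatment was assigned at the distributor level.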
In this specification, $\Delta y_{ij}$ is the change (difference) in the outcome between the 2009 (pre-intervention) and the 2010 harvests, and $X_{ij}$ is a vector of baseline covariates. Table 3 shows the main treatment effect results. In Panel A of Table 3 we show unconditional differences across the three groups. The observed differences in total harvest value across the three groups are large, with farms in the loan and free net groups showing additional yields of US$ 180 and US$ 156, respectively (Panel A, column 3). Given the large differences in total yields at baseline, a substantial fraction of this differential is clearly attributable to differences in baseline covariates. In Panel B of Table 3, we directly control for these baseline differences in productivity by including lagged dependent variables in our model as outlined in Eq. (3). As expected, the lagged dependent variables are highly significant in all models, and explain a substantial fraction of the unadjusted differences observed in Panel A. Consistent with the convergence patterns seen in Fig. 5, all coefficients on lagged variables are strictly positive and smaller than one. The adjusted model suggests that the loan program increased total harvest value by an average of US$ 69, while the free net program increased yields by about US$ 65; only the latter is marginally significant (Panel B, column 3). Once we estimate Eq. (4) and control for baseline covariates in Panel C of Table 3, we get smaller point estimates for maize, and larger point estimates for cotton. Average cotton yields declined substantially from 2009 to 2010 (from 11.1 to 6.3 bales), which appears to be mostly attributable to less favorable rain patterns. Declines were substantially smaller in the intervention groups.

[Figure 5: scatter plot. Horizontal axis: Total harvest value 2009 in USD (0 to 1500). Vertical axis: Year-over-year change in output (%) (−100 to 300).]
Fig. 5. Agricultural outcomes: year-over-year mean reversion. Notes: the percentage (year-over-year) changes in output value between the 2009 and 2010 harvest seasons as a function of the total 2009 harvest value in US$.

In relative terms, this implies that the net programs increased cotton yields by about 25% (1.3 additional bales relative to the control group average of 5.2), while maize yields increased by 6% (loans) and 12% (free nets) compared to the control group average of 22 bags. The relatively larger impact on cotton is consistent with the idea of cotton being the more labor-intensive crop; it could also be interpreted as evidence of households prioritizing maize as the primary food crop, and therefore reducing (increasing) labor inputs on cotton rather than maize in cases where labor constraints become (less) binding. Overall, the fully adjusted models shown in Panel C suggest that the net loan program increased total harvest value by (a not statistically significant) US$ 45, while the free net program increased harvest value by US$ 76 (Panel C, column 3).

Table 3
Productivity impact.

Panel A: Unadjusted
Dependent variable        Bags of maize   Bales of cotton   Total harvest value
                          (1)             (2)               (3)
Loan program              10.25***        1.917*            179.9***
                          (3.826)         (1.101)           (54.71)
Free nets                 8.611***        1.179             155.9***
                          (3.057)         (0.751)           (44.38)
Control group average     22.1            5.2               202.0
R-squared                 0.022           0.014             0.032
Baseline covariates       No              No                No

Panel B: Controlling for lagged dependent variables
Dependent variable        Bags of maize   Bales of cotton   Total harvest value
Loan program              4.107           1.197             69.25
                          (2.839)         (0.799)           (44.34)
Free nets                 3.521           0.610             65.08*
                          (2.394)         (0.693)           (36.83)
Lagged dependent          0.633***        0.266***          0.565***
                          (0.0916)        (0.0899)          (0.0731)
Control group average     22.1            5.2               202.0
R-squared                 0.369           0.188             0.409
Baseline covariates       No              No                No

Panel C: Differences in outcomes with full set of covariates
Dependent variable        Bags of maize   Bales of cotton   Total harvest value
Loan program              1.439           1.371*            45.82
                          (2.972)         (0.737)           (42.83)
Free nets                 2.747           1.366*            75.60**
                          (2.701)         (0.733)           (35.85)
Control group average     1.99            −4.23             −51.19
R-squared                 0.331           0.688             0.438
Baseline covariates       Yes             Yes               Yes

Notes: Specifications in Panel A do not include covariates. Specifications in Panel B control for lagged dependent variables only. Specifications in Panel C control for lagged dependent variables as well as for plot sizes used for cotton, maize and other crops in 2009, family members under 5, family members 5–14, family members 60 and older, household wealth, mosquito nets at baseline, ownership of chickens, goats and sheep, farmer age, education and marital status. Cluster-bootstrapped standard errors in parentheses. Each cluster corresponds to one distributor and 11 randomly selected farmers working with the distributor. *** p < 0.01, ** p < 0.05, * p < 0.1

To provide a more direct sense of how overall production was affected by the program, we plot 2010 yields as fractional polynomial functions of 2009 yields by treatment arm in Fig. 6. As the figure illustrates, farmers in the loan and free net groups did consistently better in 2010 than farmers in the control group. These differences appear to be larger for very small farmers (less than US$ 300 production value) and appear to be particularly large for farms at the upper end of the yield distribution with more than US$ 1000 production value in 2009.

[Figure 6: two panels, “Control vs. loan” and “Control vs. free nets”, each showing fitted curves with 95% CI. Horizontal axis: Production value 2009 (US$). Vertical axis: Production value 2010 (US$).]
Fig. 6. Fractional polynomial predictions: farm yields 2010 as function of 2009 farm yields.

[Figure 7: box plots by arm (Control, Loans, Free nets). Vertical axis: YoY change in yields (cluster average), −200 to 300.]
Fig. 7. Changes in farm yields at the cluster level.

Given that the treatment assignment was made at the distributor level, one may find it more intuitive to analyze year-over-year changes at the distributor (cluster) level. In Fig. 7, we show the cluster-level distribution of year-over-year changes. Each observation in the plot corresponds to the average absolute change in production value in the cluster. The boxes at the center of the figure show the median, 25th and 75th percentile for villages in each arm;
the outside whiskers show the lower and upper adjacent values11 (Frigge et al., 1989). While the large majority of clusters in the control arm fared worse in 2010 than in 2009 (negative year-over-year change), year-over-year changes were positive for a majority of villages in the loan group, and were positive for close to 75% of villages in the free net group.

In order to make sure these results are not driven by individual farmers or clusters, we run a series of robustness checks in Table 4, based on the fully adjusted specifications shown in Panel C of Table 3. In column 1 of Table 4, we exclude households with heads who did not receive any schooling. In column 2, we restrict our analysis to farms with a medium degree of crop diversification (3–5 crops) to ensure that the year-over-year changes are not affected by differential exposure to price and crop risk. In order to deal more directly with pre-existing differences in productivity, we exclude the 20% least productive farmers in the baseline season from the regressions in column 3; in column 4 we exclude the most productive farmers in 2009, and last, in column 5, we exclude both the least and the most productive farmers from the analysis. Overall, the results appear rather robust, with average program estimates between US$ 44 and US$ 79 for net loan programs, and estimates between US$ 54 and US$ 110 for free net programs, both within the confidence intervals of our main impact estimates of US$ 46 and US$ 76 (Table 3, Panel C). In terms of program impact, the largest effects are observed for more educated and diversified households; in terms of baseline yields, effects seem to be larger at the very bottom and the very top of the distribution, a pattern which is consistent with the non-parametric estimates shown in Fig. 6.

6. Mechanisms and discussion

The primary mechanism through which the programs were intended to affect productivity was household health. In order to monitor the health of households, two follow-up visits were conducted, one in April 2010, and one in July/August 2010. During both follow-up visits, interviewers went over the household roster collected during the baseline assessment, and asked respondents to indicate for each household member whether they had been sick since the last interview. In a previous paper (Fink and Masiye, 2012), we used data on self-reported morbidity outcomes from both follow-ups to identify the health impact of the additional nets. Given the concerns with longer-term recall data highlighted in the recent literature (Arnold et al., 2013; Das et al., 2012), we restricted our analysis to morbidity episodes in the two weeks preceding the interview. Consistent with the estimates reported in the bed net literature (Lengeler, 2004), we found large health benefits attributable to the additional nets, with treated households reporting reductions in the incidence of health problems of any kind of about 40% and in the incidence of confirmed malaria of at least 50%.12

In order to directly assess the impact of ill health on farm labor inputs, farmers were asked to report the number of days of field work lost for each illness episode. Specifically, farmers were asked to report how many days of field work were lost because (i) the sick person was not able to work and (ii) because somebody else in the household had to take care of the sick person. Fig. 8 shows the average number of days of field work lost per malaria episode13 by age group.14 The average number of days of field work lost by malaria episode is 4.7; the burden is substantially lower for children (slightly less than 4 days taken off for taking care of children on average) than for adults, where the average number of days lost per episode is about 7, with a majority of days lost directly attributable to the patient’s absence from the field.

[Figure 8: stacked bar chart by age group (Under 5, Age 5-14, Age 15-59, Age 60+). Vertical axis: Average days of field work lost per bout of malaria (0 to 8). Series: Days of caretaker field work lost; Days of patient field work lost.]
Fig. 8. Work days lost by malaria episode.

Given that the second follow-up was done after the harvest with most farmers not working, we further restrict our analysis to illness episodes reported in the first follow-up. In total, only 85 households reported illness episodes in the 2-week period preceding the interview in April, which means that we have only very limited power to detect effects, despite the substantial differences in the likelihood of reporting an illness episode: in the control group, 32 out of 153 farms (21%) reported an illness episode in the two-week period prior to the first follow-up. In the loan group, 33 out of 185 (18%) reported a health problem, and last, in the free net group, only 20 out of 155 households (13%) reported an illness episode; in relative terms, this means that the likelihood of experiencing an acute morbidity episode was reduced by about 40% in the free net group. In absolute terms, these reductions are of course relatively small. Over a six-month period, the observed differences would correspond to approximately one more illness episode per household.

In Table 5, we analyze the program impact on field work labor, as well as total health expenditure. In the first four columns, we show unadjusted treatment effects. In columns 5–8, we show the same estimates when a full set of household covariates is included; different from the productivity impact estimates, the inclusion of covariates does not make much of a difference in these specifications. On average, we find that net programs reduce the number of days of field work lost by 0.3 days over a two-week period, which corresponds to about 3.6 days over the agricultural season. Given the small number of illness episodes, the confidence intervals around this estimate are wide, however, and the estimated coefficients are not statistically different from zero. In terms of the estimated effects, about 60% of the field work losses appear to result from the sick person not being able to complete field work, and 40% appear to result from field workers having to take care of other household members. It is worth pointing out here that these measures only capture complete absence from field work; no data was collected on worker strength or ability to work on the fields. In

11
Adjacent values are defined as the most extreme values within 1.5 interquartile ranges of the lower and upper quartile, respectively. See Frigge et al. (1989) for further details.
12
“Confirmed malaria” was defined as a fever episode where respondents indicate that they were tested for malaria and that the test was positive.
13
Malaria episodes include all recent illnesses attributed to malaria by respondents (based on the answer provided to the question “What health problem did the person have”).
14
Since no field work was reported during the second follow-up (scheduled in the post-harvest season in July), all numbers reflect the morbidity reports provided during the first follow-up round in April.
Table 4
Robustness checks and heterogeneous treatment effects.
Dependent variable: change in production value 2010–2009 (US$)

                          (1)        (2)        (3)        (4)        (5)
Net loans                 64.60      43.83      61.11      41.46      77.51
                          (47.27)    (48.78)    (53.41)    (48.49)    (59.06)
Free nets                 110.4**    100.8**    67.75      60.23**    53.70
                          (45.21)    (44.85)    (44.82)    (30.19)    (39.66)
Full set of covariates    Yes        Yes        Yes        Yes        Yes
Observations              341        395        386        400        293
R-squared                 0.467      0.456      0.470      0.392      0.414

Samples: (1) Education of head > 0; (2) Medium diversification: 3–5 crops; (3) Excluding bottom 2009 productivity quintile; (4) Excluding top 2009 productivity quintile; (5) Excluding clusters in top or bottom quintile in 2009.
Notes: All specifications control for lagged dependent variables as well as for plot sizes used for cotton, maize and other crops in 2009, family members under 5, family members 5–14, family members 60 and older, household wealth, mosquito nets at baseline, ownership of chickens, goats and sheep, farmer age, education and marital status. Cluster-bootstrapped standard errors in parentheses. Each cluster corresponds to one distributor and 11 randomly selected farmers working with the distributor. *** p < 0.01, ** p < 0.05, * p < 0.1
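The morbidity figures reported for the first follow-up can be checked with a few lines of arithmetic. The counts below are taken from the text; the variable names are hypothetical, and two conversions are assumptions made for illustration: a six-month window is treated as 13 two-week periods, and the agricultural season as 12 two-week periods (the latter is implied by the 0.3 and 3.6 day figures rather than stated).

```python
# (households reporting an illness episode, households in arm), first follow-up
counts = {"control": (32, 153), "loan": (33, 185), "free": (20, 155)}

# Two-week illness rates by arm
rates = {arm: sick / n for arm, (sick, n) in counts.items()}

# Relative reduction in the free net arm compared with the control arm
rel_reduction = 1 - rates["free"] / rates["control"]

# Assumption: six months is roughly 26 weeks, i.e. 13 two-week periods
periods_six_months = 26 / 2
episodes_gap = (rates["control"] - rates["free"]) * periods_six_months

# Assumption: the agricultural season spans 12 two-week periods
days_saved_season = round(0.3 * 12, 1)

for arm, r in rates.items():
    print(f"{arm}: {r:.0%}")
print(f"relative reduction (free vs control): {rel_reduction:.0%}")
print(f"episode gap over six months: {episode_gap:.2f}"
      if False else f"episode gap over six months: {episodes_gap:.2f}")
print(f"field work days over the season: {days_saved_season}")
```

The arm sizes sum to the analytical sample of 493 farms and the episode counts to the 85 households mentioned in the text, which is a useful consistency check on the reported percentages.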
Table 5
Labor supply and health expenditure effects.
Days of field Days of field Total days of Total health Days of field Days of field Total days of Total health
work lost due work lost due field work lost expenditure work lost due work lost due field work lost expenditure
to own sickness to other in last 2 weeks last two weeks to own sickness to other in last 2 weeks last two weeks
in last 2 weeks sickness in last in last 2 weeks sickness in last
2 weeks 2 weeks
Any programa) −0.174 −0.129 −0.303 0.0783 −0.170 −0.0818 −0.252 0.0740
(0.149) (0.129) (0.258) (0.0604) (0.154) (0.117) (0.234) (0.0573)
Loan program −0.206 −0.0764 −0.283 0.0842 −0.182 0.00153 −0.180 0.0972
(0.157) (0.136) (0.257) (0.0676) (0.145) (0.125) (0.239) (0.0668)
Free net −0.134 −0.192 −0.326 0.0713 −0.154 −0.196 −0.350 0.0423
program
(0.170) (0.138) (0.273) (0.0850) (0.192) (0.152) (0.292) (0.0905)
Control No No No No Yes Yes Yes Yes

included
Control group 0.391 0.360 0.752 0.072 0.391 0.360 0.752 0.072
mean
Observations 493 493 493 493 493 493 493 493
Notes: All specifications control for plot sizes used for cotton, maize and other crops in 2009 as well as in 2010, family members under 5, family members 5–14, family
members 60 and older, household wealth, mosquito nets at baseline, ownership of chickens, goats and sheep, farmer age, education and marital status, total maize harvest
2009, total cotton harvest 2009 and total economic value of production 2009. Cluster-bootstrapped standard errors in parentheses. *** p < 0.01, ** p < 0.05, * p < 0.1.
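As a reading aid for Table 5, the sketch below divides each coefficient by its standard error and compares the ratio with a two-sided 10% critical value. The normal approximation and the 1.645 cutoff are assumptions made for illustration; the table itself reports cluster-bootstrapped standard errors, so this is only a rough check that no estimate would earn a significance star. The variable and function names are hypothetical.

```python
# Coefficients and standard errors transcribed from Table 5.
# Column order: four outcomes without controls, then the same four with controls.
TABLE5 = {
    "Any program": (
        [-0.174, -0.129, -0.303, 0.0783, -0.170, -0.0818, -0.252, 0.0740],
        [0.149, 0.129, 0.258, 0.0604, 0.154, 0.117, 0.234, 0.0573],
    ),
    "Loan program": (
        [-0.206, -0.0764, -0.283, 0.0842, -0.182, 0.00153, -0.180, 0.0972],
        [0.157, 0.136, 0.257, 0.0676, 0.145, 0.125, 0.239, 0.0668],
    ),
    "Free net program": (
        [-0.134, -0.192, -0.326, 0.0713, -0.154, -0.196, -0.350, 0.0423],
        [0.170, 0.138, 0.273, 0.0850, 0.192, 0.152, 0.292, 0.0905],
    ),
}

# Assumption: normal approximation, two-sided test at the 10% level.
Z_CRIT_10PCT = 1.645

# Control group mean for total days of field work lost (third outcome column).
CONTROL_MEAN_TOTAL_DAYS = 0.752


def z_ratios(coefs, ses):
    """Return coefficient / standard error for each column."""
    return [c / s for c, s in zip(coefs, ses)]


# Largest absolute ratio across all rows and columns.
largest_abs_z = max(
    abs(z) for coefs, ses in TABLE5.values() for z in z_ratios(coefs, ses)
)

# Point estimate for "Any program" on total days lost, relative to the control mean.
relative_effect_total_days = TABLE5["Any program"][0][2] / CONTROL_MEAN_TOTAL_DAYS

print(f"largest |coef/se|: {largest_abs_z:.3f} (cutoff {Z_CRIT_10PCT})")
print(f"any program, total days lost vs control mean: {relative_effect_total_days:.1%}")
```

Under these assumptions every ratio stays below the cutoff, even though the point estimate for total days lost is sizeable relative to the control mean.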
highly endemic areas like the one studied, asymptomatic malaria infections are common, and likely to substantially reduce farmers’ ability to complete field work tasks even when they do not suffer from acute infections (Nur, 1993). As part of the follow-up interviews, we also collected information on health expenditure. Zambia’s health sector – in particular in rural areas – is dominated by public health facilities which generally provide basic health services (including malaria testing and drugs) for free. In our sample, 90% of respondents indicated zero out-of-pocket expenditure for household members getting treated when sick. Given this, the very small and not significant estimated program impact on total health expenditure found in Table 5 is not surprising.

7. Discussion

The estimates presented in this paper suggest a rather large positive impact of malaria prevention on agricultural productivity. While the data collected as part of the project does unfortunately not allow us to precisely identify the mechanism underlying this impact, increased labor inputs appear to be the most plausible causal pathway. Given that labor is in principle abundant in Zambia, one might argue that short-term labor supply shocks should not matter at all for farm production; in settings where labor can be freely hired, morbidity should not affect farming output, since any lacking labor could be hired in local markets. During the first follow-up rounds, we directly questioned farmers about labor substitution. In total, 167 instances were reported where the head of household was not able to work on the field because he or she was sick.15 Out of these 167 episodes, substitute labor was hired only in 10 cases (6%). In 7 out of these 10 substitution cases, farmers found somebody to work for free; only three farmers reported paying for labor, with wages ranging between 0.5 and 11 dollars per day. Anecdotally, local piece work (“ganyu”) labor is widely available (Fink et al., 2014); in practice, however, most small-scale farmers do not appear to have the resources to hire such labor.

15 Note that the total number of episodes here relates to the four-month period between the baseline and the midline survey, and thus is different from the 85 illness episodes reported for the two-week period preceding the interview analyzed in the previous section of the paper.

Even if one is willing to accept that hiring short-term labor may be hard for farmers, the estimated impact numbers appear large. As shown in Goldberg (2014), agricultural wages for day labor are frequently less than US$ 1 in rural Malawi. In the Zambian setting, wages appear slightly higher, with median daily wages varying between KR 12.5 and KR 25 (US$ 2.5–5) reported for the peak labor season in focus groups. Even at US$ 5, direct labor costs are unlikely to account for the full differences in output observed.
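The labor-market figures quoted in this discussion can be checked with a few lines of arithmetic. The sketch below is only an illustration with hypothetical variable names: it recomputes the share of sickness episodes with hired substitutes, the exchange rate implied by the KR and US$ wage ranges, and the direct labor cost of the 3–5 working days that the discussion attributes to full net coverage, valued at the US$ 5 upper-bound wage.

```python
# Figures taken from the discussion of labor substitution and wages.
episodes_unable_to_work = 167   # sickness episodes reported for household heads
substitute_cases = 10           # episodes in which substitute labor was found
unpaid_substitutes = 7          # substitutes who worked for free

paid_substitutes = substitute_cases - unpaid_substitutes
share_substituted = substitute_cases / episodes_unable_to_work

# Median daily wage range reported in focus groups, in both currencies.
wage_kr = (12.5, 25.0)
wage_usd = (2.5, 5.0)
implied_kr_per_usd = tuple(kr / usd for kr, usd in zip(wage_kr, wage_usd))

# Direct labor cost of the working days saved, valued at the upper-bound wage.
days_saved = (3, 5)
upper_wage_usd = wage_usd[1]
labor_cost_usd = tuple(days * upper_wage_usd for days in days_saved)

print(f"episodes with substitute labor: {share_substituted:.1%}")
print(f"paid substitutes: {paid_substitutes}")
print(f"implied exchange rate (KR per US$): {implied_kr_per_usd}")
print(f"labor cost of days saved (US$): {labor_cost_usd}")
```

The two wage endpoints imply the same exchange rate, and the cost range reproduces the US$ 15–25 figure used in the text.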
Our estimates suggest that having all sleeping spaces covered with bed nets will save the average farm approximately 3–5 working days, which translates to about US$ 15–25, and therefore to 30% of the estimated effects at most. Two factors may at least partially explain the remaining gap: first, recovery from malaria is a slow process, frequently taking more than two weeks, during which farm workers are likely less productive (Nur, 1993) even if they report back to work on the field and thus would not be counted as “able to work” in our analysis. A second possibility is that farmers in the study may have only reported major health events, so that the reported numbers do not fully capture the true program impact. In the 2007 Zambia Demographic and Health Survey (Macro International, 2007), 20% of children under 5 were reported to have suffered from fever or diarrhea over the 2-week period preceding the survey. Even if adults are substantially less prone to be sick, the reported morbidity prevalence seems very low: 32 episodes across 153 households in the control group implies about one episode for every 25 individuals or a fever prevalence of about 4%, which is about one fifth of the under-5 prevalence in the DHS.

One alternative explanation for the relatively large effects observed is local spillover effects, which could potentially also undermine the internal validity of the study: as demonstrated by Apouey and Picone (2014), social interactions in the realm of malaria and malaria prevention are likely, and may lead to stronger associations between health behaviors and health outcomes at the village or regional level. Even though it is quite likely that farmers interact with other farmers outside the 3 km radius chosen for the randomization, large spillover effects do not seem very likely in our setting: first, by working only with small-scale farmers having an outgrower contract with Dunavant, we covered only a small fraction (<25%) of farmers in each village, which means that changes in net coverage levels at the village level were relatively small. Theoretically, our intervention could also have increased knowledge or improved behavior. However, given that we did not provide any information or encourage usage of nets at all but simply provided access to subsidized nets, large changes in knowledge or behavior among untreated farms do not seem likely. In terms of the actual nets distributed, we closely monitored their usage and ownership, and found that virtually all nets (97%) remained in the households that originally acquired them throughout the study period, which makes us relatively confident that direct spillovers to control villages (nets reaching control villages) did not occur. One could also have expected the intervention to increase the appreciation and utilization of bed nets – we do not find any differences in utilization across groups.

Even though we think that the most likely mechanism from bed nets to agricultural productivity is the direct and indirect costs associated with ill-health, one alternative interpretation of the results presented is that subsidized or free nets constitute an upfront financial transfer, which allows farmers to spend money originally earmarked for bed nets on other farming related items. One could argue that farmers could sell bed nets, and use the resources for their own consumption or agriculture related expenditure. Given the data collected as part of this project, this appears rather unlikely. As discussed in Section 3 of the paper, less than 10% of nets available at baseline were actively purchased by farmers; in general, bed net purchases appear to be more of an exception. We also checked all households for the nets distributed: as stated above, out of 547 nets distributed in December 2009, 528 (97%) were located in the original households16 during the second follow-up interview in July, so that frequent sales of nets can be excluded as an alternative pathway.

16 Respondents were asked about the source of each net in their household as part of both follow-up surveys.

Table 6
Treatment and agricultural loan performance.
                    No payment   Partial payment   Cotton sales to Dunavant (kgs)
                    (4)          (5)               (6)
Net loan program    −0.003       −0.038            98.1
                    (0.06)       (0.02)            (92.6)
Free net program    −0.001       −0.006            89.3
                    (0.06)       (0.02)            (108.6)
Observations        450          450               450
R-squared           0.045        0.064             0.119
Notes: All specifications control for plot sizes used for cotton, maize and other crops in 2009 as well as in 2010, family members under 5, family members 5–14, family members 60 and older, household wealth, mosquito nets at baseline, ownership of chickens, goats and sheep, farmer age, education and marital status, total maize harvest 2009, total cotton harvest 2009 and total economic value of production 2009. Robust standard errors in parentheses are clustered at the cluster level. *** p < 0.01, ** p < 0.05, * p < 0.1.

Given that the field work and net distribution were supported by Dunavant and respondents may associate interviewers with the company (even though no information provided by the farmer was shared with the company), farmers in the loan groups might selectively under-report their cotton production in order to not be obliged to repay their full loan to Dunavant. To investigate this hypothesis, we compared repayment and cotton sales (the amount of cotton sold from farmers to Dunavant) across the three groups in Dunavant records.

Out of the 493 farmers in our main analytical sample, administrative records could be found for 450 farmers (91.3%), with no differences in tracking rates across arms. Table 6 compares three outcomes from Dunavant’s perspective: the likelihood of farmers’ default, the likelihood of a farmer’s partial default, and the total amount of cotton sold to Dunavant. While all the estimates suggest that farmers in the two nets arms did slightly better than farmers in the control arm (consistent with the documented productivity effects), we do not find any evidence of farmers in the loan program displaying higher default rates; if anything, the point estimates suggest that farmers in the loan program perform best of all three groups. It also seems important to highlight that general repayment levels are high, with 84% of farmers fully repaying loans right after the harvest, and an additional 3% making partial payments. This suggests that farmers are fully cognizant of their credit commitments, and unlikely to just have signed up for the loan programs with the expectation of not repaying them.

In terms of cotton production, the overall sales estimates are not statistically significant, but consistent with the numbers reported in Table 3. The numbers reported in column 3 of Table 6 indicate that farmers sold on average an additional 90 kg of cotton to Dunavant. Given that each bale corresponds to about 80 kg of cotton, these numbers are very similar in magnitude to the 1.37 bales reported in Panel C of Table 3.

8. Summary and conclusion

All results presented from our experiment suggest that preventive health investment in the form of LLITNs can lead to substantial improvements in agricultural output among small-scale farmers. The estimates presented in this paper suggest that a typical household can increase harvest revenues by approximately US$ 76 if it covers all sleeping places with bed nets. These numbers seem rather large in absolute terms, but also seem consistent with farmer perceptions: when prompted regarding the expected health benefits of reducing the burden of malaria at the beginning of the harvesting season, farmers indicated that their farms would on average have 30% higher yields if nobody in the household would fall sick; given that bed nets reduce the incidence of fevers by 30–50% (Fink and
Masiye, 2012; Lengeler, 2004), the 15% increase in yields observed in this study appears to be within the range expected by farmers. It is worth highlighting that the study site was intentionally identified as an area with traditionally high malaria exposure: even though the overall farm structure in the sample analyzed appears fairly similar to other small-scale farms in rural Zambia and neighboring countries in terms of their size and profitability, malaria prevalence and incidence differ largely across regions, and will almost certainly affect the overall impact of similar programs in other areas.

Nevertheless, the high expected and realized returns to health investment documented in this paper naturally raise the question of why private investment in bed nets remains low. From a private sector perspective, the provision of bed nets may seem an attractive investment for larger companies interacting with farmers such as cotton ginners, both to generate goodwill and higher yields. Following the study presented in this paper, our study partner, Dunavant Cotton, did indeed decide to distribute bed nets to close to half of the farmers with the support of the World Bank. After an initial review, Dunavant opted against a continuation of the program because the overall benefits in terms of contracts signed did not appear large enough to support net programs on a continued basis from the ginner’s perspective.17

17 See http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2358045 for a more detailed report on this initiative.

While ginning companies may not be able to capture a large enough fraction of additional farming outputs, the limited willingness of farming households to invest in bed nets and their own health appears a bit puzzling. Even though similarly low farmer willingness to invest has been documented for other high-yield investment options such as fertilizer or enhanced farming technologies (Duflo et al., 2008; Udry and Anagol, 2008), the low levels of (unsubsidized) adoption appear particularly puzzling in the context of nets. Nets are a technology well-known by farmers, and, if farmers’ own statements are to be believed, a technology widely recognized as effective. When prompted at baseline, farmers in the study indicated that the burden of malaria in their households could be reduced by approximately 50% if full bed net coverage was available, a number which largely coincides with the WHO’s current net effectiveness estimates (Lengeler, 2004). The fact that farmers of the study area believe in the effectiveness of nets in the area is also underlined by the fact that more than 90% of nets were used throughout the study – overall, lacking belief in net effectiveness does not appear to be a big issue in the studied setting.

Two possible explanations for the lack of preventive investment in the context of malaria are lacking capital markets or credit constraints more generally. The results presented in this study clearly show some evidence in support of this hypothesis. Compared to the very small fraction of women purchasing bed nets at prices above US$ 1 when upfront payments are required (Cohen and Dupas, 2010), the demand for nets appears substantially increased when financing options are made available as it was done in this study and in Tarozzi et al. (2014). These increases in demand are both consistent with lacking access to credit (Udry and Anagol, 2008) and with models of hyperbolic discounting (Duflo et al., 2010); demand for nets may also have been particularly strong because repayment was due during the harvesting season, when farmers are relatively well endowed with cash (Duflo et al., 2010).

However, even when full (zero-interest) financing was offered to farmers in this study, uptake was only modest. One possible explanation for this is that farmers may expect to receive bed nets for free due to the large number of free or highly subsidized distribution campaigns run by the Government of Zambia in the region over the past years. Dupas (2014) investigates this hypothesis empirically, but does not find any evidence for free distribution campaigns affecting subsequent willingness to pay in the context of bed nets. A similar argument would be that free nets do not change willingness to pay directly, but may affect the perceived probability of receiving free nets in the future. While we cannot fully rule out this hypothesis, waiting for future distributions appears to be rather risky as a strategy given that net benefits are immediate and go substantially beyond the productivity improvements observed.

Under the assumption that farmers are risk-averse, a more plausible explanation for the reluctant uptake of pre-financed nets might simply be that nets constitute a relatively large investment, which may not be attractive to farmers even if the mean return to the investment is positive and large. With an average disposable (cash) income of less than US$ 200 per year, committing to an end-of-season payment of US$ 10–15 may appear difficult to farmers given the already high uncertainty faced with respect to final harvest outcomes. Recent evidence from Ghana suggests that reducing risk exposure with insurance programs may be central to increasing farm investment and productivity (Karlan et al., 2012); further research in this area will be needed to better understand the determinants of farm behavior and to design optimal policies in this area.

Acknowledgements

The authors would like to thank the Milton Foundation for funding this project, as well as Dunavant Cotton and in particular Rodrick Masaiti for the invaluable logistical support during all stages of the field work. We would also like to thank Richard Sedlmayr and Felix Lam for their input into the study design and the coordination of the field work, Peter Mulenga for the coordination of data entry, and Jenny Aker, Nava Ashraf, David Atkin, Jessica Cohen, Erica Field, Maggie McConnell, Kelsey Jack, Michael Kremer, Zoe McLaren and John Strauss as well as the participants at the NEUDC conference and the development seminar at Bocconi University for their comments and suggestions.

References

Apouey, B., Picone, G., 2014. Social interactions and malaria preventive behaviors in sub-Saharan Africa. Health Economics 23, 994–1012.
Arnold, B.F., Galiani, S., Ram, P.K., Hubbard, A.E., Briceno, B., Gertler, P.J., Colford Jr., J.M., 2013. Optimal recall period for caregiver-reported illness in risk factor and intervention studies: a multicountry study. American Journal of Epidemiology 177, 361–370.
Ashraf, N., Fink, G., Weil, D.N., 2010. Evaluating the effects of large scale health interventions in developing countries. In: The Zambian Malaria Initiative (Ed.), NBER Working Paper, vol. 16069. National Bureau of Economic Research, Cambridge, MA.
Audibert, M., Etard, J-F., 2003. Productive benefits after investment in health in Mali. Economic Development and Cultural Change 51, 769–782.
Baranov, V., Bennett, D., Kohler, H.-P., 2012. The Indirect Impact of Antiretroviral Therapy. PSC Working Paper Series, 9-27-2012.
Behrman, J.R., Foster, A.D., Rosenzweig, M.R., 1997. The dynamics of agricultural production and the calorie-income relationship. Journal of Econometrics 77, 187–208.
Benjamin, D., 1992. Household composition labor markets, and labor demand: testing for separation in agricultural household models. Econometrica 60, 287–322.
Bennett, D., Naqviy, S.A.A., Schmidt, W-P., 2014. Learning, Hygiene and Traditional Medicine. Working Paper.
Berger, V.W., 2010. Testing for baseline balance: can we finally get it right? Journal of Clinical Epidemiology 63, 939–940, author reply 940-932.
Bleakley, H., 2007. Disease and development evidence from hookworm eradication in the American South, February 2007. Quarterly Journal of Economics 122, 73–117.
Bleakley, H., Lange, F., 2009. Chronic disease burden and the interaction of education, fertility and growth. Review of Economics and Statistics 91, 52–65.
Clarke, S.E., Jukes, M.C.H., Njagi, J.K., Khasakhala, L., Cundill, B., Otido, J., Crudde, C., Estambale, B.B.A., Brooke, S., 2008. Effect of intermittent preventive treatment of malaria on health and education in schoolchildren: a cluster-randomised, double-blind, placebo-controlled trial. Lancet, 127–138.
Cohen, J., Dupas, P., 2010. Free distribution or cost-sharing. Evidence from a randomized malaria prevention experiment. Quarterly Journal of Economics 125, 1–45.
Das, J., Hammer, J., Sánchez-Paramo, C., 2012. The impact of recall periods on reported morbidity and health seeking behavior. Journal of Development Economics 98, 76–88.
Duflo, E., Kremer, M., Robinson, J., 2008. How high are rates of return to fertilizer. Evidence from field experiments in Kenya. American Economic Review Papers and Proceedings 98, 482–488.
Duflo, E., Kremer, M., Robinson, J., 2010. Nudging farmers to use fertilizer: theory and experimental evidence from Kenya. American Economic Review 101, 2350–2390.
Dupas, P., 2014. Short-run subsidies and long-run adoption of new health products: evidence from a field experiment. Econometrica 82, 197–228.
Ehrhardt, S., Burchard, G., Mantel, C., Cramer, J., Kaiser, S., Kubo, M., Otchwemah, R., Bienzle, U., Mockenhaupt, F., 2006. Malaria, anemia, and malnutrition in African children – defining intervention priorities. Journal of Infectious Diseases 194, 108–114.
Fink, G., Masiye, F., 2012. Assessing the impact of scaling-up bednet coverage through agricultural loan programmes: evidence from a cluster randomised controlled trial in Katete, Zambia. Transactions of the Royal Society of Tropical Medicine and Hygiene 106, 660–667.
Fink, G., Jack, B.K., Masiye, F., 2014. Seasonal Credit Constraints and Agricultural Labor Supply: Evidence from Zambia. NBER Working Paper., pp. 20218.
Fox, M.P., Rosen, S., MacLeod, W.B., Wasunna, M., Bii, M., Foglia, G., Simon, J.L., 2004. The impact of HIV/AIDS on labour productivity in Kenya. Tropical Medicine & International Health 9, 318–324.
Frigge, M., Hoaglin, D.C., Iglewicz, B., 1989. Some implementations of the box plot. The American Statistician 43, 50–54.
Gilgen, D.D., Mascie-Taylor, C.G., Rosetta, L.L., 2001. Intestinal helminth infections, anaemia and labour productivity of female tea pluckers in Bangladesh. Tropical Medicine & International Health 6, 449–457.
Girardin, O., Daoa, D., Koudou, B.G., Essé, C., Cissé, G., Yao, T., N’Goran, E.K., Tschannen, A.B., Bordmannd, G., Lehmannc, B., Nsabimana, C., Keiser, J., Killeen, G.F., Singer, B.H., Tanner, M., Utzinger, J., 2004. Opportunities and limiting factors of intensive vegetable farming in malaria endemic Côte d’Ivoire. Acta Tropica 89, 109–123.
Glennerster, R., Takavarasha, K., 2013. Running Randomized Evaluations: A Practical Guide. Princeton University Press, Princeton, NJ.
Goeb, J.C. (Ed.), 2011. Impacts of Government Supports on Smallholder Cotton Production in Zambia, vol. Master of Science. Michigan State University, Lansing.
Goldberg, J., 2014. Kwacha gonna do? Experimental evidence about labor supply in rural Malawi. Working paper.
Hay, S.I., Snow, R.W., 2006. The malaria atlas project: developing global maps of malaria risk. PLOS Medicine 3, 473.
Jayne, T.S., Zulu, B., Kajoba, G., Weber, M.T., 2008. Access to land and poverty reduction in rural Zambia: connecting the policy issues. In: Food Security Research Project Working Paper, p. 34.
Karlan, D., Osei, R., Osei-Akoto, I., Udry, C., 2012. Agricultural Decisions after Relaxing Credit and Risk Constraints. Mimeo.
Knottnerus, J.A., Tugwell, P., 2012. Good baseline balance – a prerequisite for valid comparison. Journal of Clinical Epidemiology 65, 119–120.
Kremer, M., Miguel, E., 2004. Worms identifying impacts on education and health in the presence of treatment externalities. Econometrica 72, 159–217.
Lengeler, C., 2004. Insecticide treated bednets and curtains for preventing malaria. Cochrane Database of Systematic Reviews, CD000363.
Loureiro, M.L., 2009. Farmers’ health and agricultural productivity. Agricultural Economics 40, 381–388.
Macro International, 2007. Zambia: DHS, 2007 – Final Report (English). Macro International, Calverton, MD.
Morel, C.M., Thang, N., Xa, N., Hung, L.X., Thuan, L.K., Ky, P.V., Erhart, A., Mills, A.J., D’Alessandro, U., 2008. The economic burden of malaria on the household in south-central Vietnam. Malaria Journal, 7.
NMCC, 2007. Zambia Malaria Indicator Survey 2006. Zambia Ministry of Health, Lusaka.
NMCC (Ed.), 2009. Zambia Malaria Indicator Survey 2008. Zambia Ministry of Health, Lusaka, Zambia.
NMCC, 2010. Zambia Malaria Indicator Survey 2010. Zambia Ministry of Health, Lusaka.
NMCC (Ed.), 2011. ITN Distribution Data Base. NMCC, Lusaka.
Nolte, K., April 2012. Large scale agricultural investments under poor land governance systems: actors and institutions in the case of Zambia. In: World Bank Conference on Land and Poverty Paper 2012.
Nur, E.T.M., 1993. The impact of malaria on labour use and efficiency in the Sudan. Social Science & Medicine 37, 1115–1119.
Pitt, M.M., Rosenzweig, M.R., 1986. Agricultural prices, food consumption, and the health and productivity of Indonesian farmers. In: Singh, I.J., Squire, L., Strauss, J. (Eds.), Agricultural Household Models. Johns Hopkins University Press, Baltimore.
Riley, R.D., Kauser, I., Bland, M., Thijs, L., Staessen, J.A., Wang, J., Gueyffier, F., Deeks, J.J., 2013. Meta-analysis of randomised trials with a continuous outcome according to baseline imbalance and availability of individual participant data. Statistics in Medicine 32, 2747–2766.
Strauss, J., 1986. Does better nutrition raise farm productivity. Journal of Political Economy 94, 297–320.
Tarozzi, A., Mahajan, A., Blackburn, B., Kopf, D., Krishnan, L., Yoong, J., 2014. Micro-loans insecticide-treated bednets malaria evidence from a randomized controlled trial in Orissa (India). American Economic Review 104, 1909–1941.
Thomas, D., Frankenberg, E., Friedman, J., Habicht, J.-P., Ingwersen, N., McKelvey, C., Mohammed Hakimi, J., Pelto, G., Sikoki, B., Seeman, T., Smith, J.P., Sumantri, C., Suriastini, W., Wilopo, S., 2010. Causal Effect of Health on Labor Market Outcomes: Experimental Evidence.
Udry, C., Anagol, S., 2008. The Return to Capital in Ghana.
Ulimwengu, J., 2009. Farmers’ health and agricultural productivity in rural Ethiopia. African Journal of Agricultural and Resource Economics 3, 83–100.
WHO (Ed.), 2009. World Malaria Report 2008. WHO, Geneva.
World Bank (Ed.), 2007. World Bank Development Indicators CD-ROM.
Zambia Central Statistic Office (Ed.), 2011a. 2010 Census of Population and Housing. Zambia Central Statistic Office, Lusaka, Zambia.
Zambia Central Statistic Office (Ed.), 2011b. Living Conditions Monitoring Survey Report 2006 and 2010. Zambia CSO, Lusaka, Zambia.
Zambia Ministry of Health, 2006. A 6-year Strategic Plan A Road Map for Impact on Malaria in Zambia 2006–2011s. NMCC, Lusaka.
Zambia Ministry of Health (Ed.), 2012. Health Management Information System (HMIS). Zambia Ministry of Health, Lusaka.

Welfare implications of learning through solicitation versus diversification in health care
Anirban Basu a,b,∗,1
a Department of Pharmacy, Health Services and Economics, University of Washington, and The NBER, Cambridge, MA 1959 NE Pacific St, Box 357660, Seattle, WA 98195-7660, United States
b NBER, Cambridge, MA, United States

Article history:
Received 3 April 2014
Received in revised form 9 April 2015

Keywords:
Learning
Diversification
Comparative effectiveness research
Economic evaluation
Instrumental variables
Heterogeneity

JEL classification:
C1
C9
D6
I1

Abstract
Using Roy’s model of sorting behavior, I study welfare implications of learning about medical care quality through the current health care data production infrastructure that relies on solicitation of research subjects. Due to severe adverse-selection issues, I show that such learning could be biased and welfare decreasing. Direct diversification of treatment receipt may solve these issues but is infeasible. Unifying Manski’s work on diversified treatment choice under ambiguity and Heckman’s work on estimating heterogeneous treatment effects, I propose a new infrastructure based on temporary diversification of access that resolves the prior issues and can identify nuanced effect heterogeneity.
© 2015 Elsevier B.V. All rights reserved.

One of the fundamental challenges in health care markets is lack of information about the quality of medical care and technology (Arrow, 1963). Information on medical product quality is usually generated by employing an artificial form of ‘learning by doing’ mechanism where a selected group of individuals (doers) is allowed to consume alternative medical products (e.g. using standard statistical designs, such as randomized assignment of patients to products). Wisdom from their experiences is disseminated to other individuals, who will face the choice of using these medical products in the near future, and to inform other decision makers, who devise social policies on access2. Most public and private stakeholders that are engaged in data production on medical quality signals have employed such mechanisms. Recently, substantial public investments were made in the US, under the umbrella term “comparative effectiveness research” (CER) and patient-centered outcomes research (PCOR)3, to facilitate production of such data on alternative medical technologies that are currently being used in clinical practice, albeit with incomplete knowledge about their comparative qualities4.

2 There are situations where learning from one’s own doing is popular, such as the repeated use of pharmaceutical products in chronic illnesses.
3 Patient Protection and Affordable Care Act of 2009, H.R. 3590, 111th Congress §6301 (2010).
4 Throughout our paper, I assume the CER compares two medical technologies that have been approved for use based on meeting the minimum safety thresholds as those set by the Food and Drug Administration of the United States. Our discussions do not encompass evaluation of experimental therapies. Such discussions are delegated to future work. Also see Philipson (1997) and Malani (2008) who make distinct arguments about selection in trials of experimental therapies in the presence of health insurance.
∗ Corresponding author at: Department of Pharmacy, Health Services and Economics, University of Washington, Seattle, WA 98195-7660, United States. Tel.: +1 206 616 2986; fax: +1 206 543 3964.
E-mail address: basua@uw.edu
1 I am grateful for comments from Karl Claxton, David Meltzer, Justin Robertson and two anonymous reviewers and support from NIH research grants RC4CA155809 and R01CA155329. Opinions expressed are mine and do not reflect those of the University of Washington or the NBER.
166 A. Basu / Journal of Health Economics 42 (2015) 165–173
In this paper, using a simple Roy’s model (Roy, 1951) of sorting behavior, I prove that, when incremental treatment effects are heterogeneous across patients who have access to these treatments under insurance, a data production infrastructure for comparative medical quality that relies on soliciting voluntary participation of subjects fails to identify any interpretable treatment effect parameter. Therefore, evidence generated through this process fails to inform, objectively, either the individual patient on optimal medical care use or a social insurer on optimal medical care insurance coverage5.

Unfortunately, such a data production infrastructure is and has been the norm for CER randomized clinical trial (RCT) studies. There are many examples of such failures in the literature. For example, Ioannidis and Lau (1997) show that in human immunodeficiency virus-related trials and trials of magnesium in acute myocardial infarction, when the benefit or toxicity from a treatment varies with the baseline risk of each patient, the treatment effect may be markedly different in populations with a different representation of high- and low-risk patients. I show that such differential representation of the population in trials may be driven more fundamentally by patient and physician behaviors and therefore the problems of interpretation of trial results are systemic.

control/standard treatment for a population of N patients indexed by i. Standard treatment may also include the do-nothing option. Let the individual-level true treatment effects represent the benefits (net of harms) of the new treatment over the control and are denoted by $b_i$. Let p denote the price of the new treatment, which is also the marginal cost for manufacturing the new treatment6. Patients are members of risk classes $\Omega$, $\Omega = 1, 2, \ldots, k$; $k \le N$, which determine heterogeneity in treatment effects across individuals through a production function:

$$ b_i = \sum_k \alpha_k \times I(\Omega_i = k) \qquad (1) $$

where I() is an indicator function and $\alpha_k$ is interpreted as the true comparative effect of the new treatment over the standard treatment in risk class k. Let’s assume that this comparative effect is expressed in monetary terms. That is, the effectiveness unit is multiplied by some predefined threshold representing the monetary value of the marginal unit of benefit7.

A population-level average effect parameter is given as

$$ \text{(population-level average effect)} = \sum_k \Pr(\Omega = k) \times \alpha_k \qquad (2) $$
The implications of this finding are substantial. Incomplete com-
There are two types of decision makers, (1) the patient-
parative quality information generated by CER RCTs research has
physician dyad, which I will refer to as the individual decision
the potential to misguide treatment choices since ex-ante percep-
maker, is assumed to always have knowledge about their risk class;
tion of benefits do not coincide with the ex-post accrual of the same,
and (2) an insurer or social planner who decides the coinsurance
resulting in welfare losses (Basu, 2011). These inefficiencies in the
rate for providing health insurance coverage for the new treatment.
choice of medical products can also accentuate the inefficiencies
due to moral hazard stemming from health insurance (Arrow, 1963;
2. Data production and incompleteness in quality
Pauly, 1968), translating to higher premiums and less protection
information
against risk, in both competitive and non-competitive insurance
markets (Basu, 2011). In this paper, I focus on understanding why
A first-best scenario can be achieved under complete informa-
current data production infrastructure leads to incomplete infor-
tion, where both the insurer and the individuals are aware of the
mation.
risk classes and the production function and are able to perfectly
I begin in the next section by laying out our ideal target param-
predict bi . If individuals had full insurance they would choose treat-
eters in CER evaluations; those that we would like to obtain
ment if bi ≥ 0. Since the insurer can fully anticipate this individual
estimates for in order to guide treatment decisions at the individual
behavior, she can provide full coverage for treatment only for those
level and policy decisions on coverage at the social level. In Sec-
individuals who would experience benefits greater than cost and
tions 2 and 3, I highlight the current data production infrastructure
not provide coverage for the rest. Thus, there is no efficiency loss
and prove why it would produce incomplete information. I study
due to moral hazard.
the implications for such incompleteness on decision-making and
Under the second-best scenario, there exist asymmetry of infor-
welfare. In Section 4, I introduce a new framework for data produc-
mation where, even though, individuals are assumed to be aware
tion that can efficiently resolve the biases inherent in the current
of ˝k and b() and to be able to combine them to predict bi perfectly,
data production infrastructure by using diversification of access
the insurer cannot as they have either no or only partial informa-
to create a conduit for learning about meaningful and decision-
tion on ˝k (Arrow, 1963; Pauly and Blavin, 2008). Consequently,
relevant effect parameters. This work unifies two broad themes
the insurer cannot exclude patients from coverage who would get
in the econometrics literature, one based on Manski’s work on
treatment benefits lower than the cost of treatment (i.e. bi − p < 0).
treatment choice under ambiguity (Manski, 2000, 2004, 2009) that
This leads to moral hazard (Pauly, 2008). To counter this, the insurer
utilizes the concept of diversification of treatment and the other
may offer coverage with a fractional coinsurance rate (r), which is
based on Heckman, Vytlacil and others’ works on estimating het-
the fraction of price a patient must pay in order to receive treat-
erogeneous treatment effects (Heckman, 1997, 2001; Heckman and
ment. When r = 1, the new medical product is not covered through
Vytlacil, 1999, 2001; Heckman et al., 2006). I show how this frame-
insurance.
work can help overcome inefficiencies in health care markets that
I assume individuals choose treatment by maximizing a generic
stem from incomplete information.
Net-Benefit criterion that is based on their perceived benefits from
treatment net of the demand price they face in acquiring the treat-
1. Defining the true population average effect of a ment. I also assume that the social insurer’s goal is to maximize
treatment consumer surplus as is realized ex post based on individual level
choices. Throughout this paper, I express the realized population
Let us begin with a problem of evaluating the compara- level benefits under different levels of coverage for the new treat-
tive effectiveness of a new (approved) treatment compared to a ment as changes to the total outcomes had all patients taken the
5 6
Note that my assertions about optimality are very general and does not depend Assume for now that the marginal cost is constant.
7
on specific welfare functions. What I prove is that the structural target parameters Under the welfare economic foundations, this threshold is the inverse marginal
on which information is required to maximize any welfare function is not informed utility of income (Weinstein and Zeckhauser, 1973; Garber and Phelps, 1997;
by current data production infrastructure. Meltzer, 1997).
standard treatment. Under any co-insurance rate $r$, $r \in [0, 1]$, this population level benefit, $H_0$, is given as

$$ H_0 = \sum_i \sum_k I(\alpha_k - r \times p \ge 0) \times \alpha_k \times I(\Omega_i = k) \tag{3} $$

When individuals have complete information, they choose to receive the new treatment only if $\alpha_k \ge r \times p$. Individuals who would expect to get harmed by treatment (i.e. $\alpha_k < 0$) would not select treatment even if it were available to them for free, thereby self-limiting the magnitude of moral hazard.

From a social insurer's point of view, an optimal co-insurance rate may be expressed as a solution to maximizing $H_0$ net of costs and taking into account the social value of risk protection provided by the insurance (Manning and Marquis, 1996). Consequently, the moral hazard (welfare loss) under the optimal coinsurance rate ($r^*$) in a second-best scenario is given as

$$ L_{2nd}(r^*) = \sum_i \sum_k I(r^* \times p \le \alpha_k < p) \times (p - \alpha_k) \times I(\Omega_i = k) = \sum_{\alpha_k = r^* \times p}^{p} N_k \times (p - \alpha_k) \tag{4} $$

which constitutes the welfare loss due to the total number of individuals in each risk group ($N_k$) who would choose treatment given the lower demand price ($r^* \times p$) but ultimately obtain benefits less than the price of treatment, i.e. $r^* \times p \le \alpha_k < p$.

Reality, however, deviates from both the first- and second-best scenarios, because both individuals and the social decision maker face incomplete comparative information. To understand this incompleteness, one must study the data production mechanisms in place. I consider and compare the circumstances before and after a CER study. I begin by understanding the consequences of incomplete information before a CER is conducted and why added investments in data production, such as those provided for by recent legislation (2009 American Recovery and Reinvestment Act; Patient Protection and Affordable Care Act of 2009, H.R. 3590, 111th Congress §630 (2010)), are called for. I then study how the current mechanisms of CER may continue to propagate and even enhance the welfare losses due to incomplete information.

2.1. Pre-CER information and choices

Before a CER RCT is conducted, it is safe to assume that $\alpha_k$ is not known with certainty at either the individual or the societal level. However, prior knowledge, obtained from evidence (of size $n$) generated during the process of approving the use of this new medical product, would determine an individual patient's anticipated belief about the incremental benefits of treatment given one's own risk class. Let this evidence suggest that the average effect of treatment is $\bar{\alpha}$, a random draw from Normal$(\mu, \sigma^2/n)$,^8 where $\mu$ is the average effect parameter defined in (2) and $\sigma$ represents the heterogeneity and is the standard deviation of the effect in the population. Let individual beliefs, $\alpha_i$, be given as a single draw from the distribution Normal$(\bar{\alpha}, s^2)$ where $s$ is the estimated standard deviation from prior evidence. It is assumed that $s^2$ is a consistent estimator of $\sigma^2$. The schedule of $\alpha_i$ across individual patients determines the marginal benefits curve in the population in the absence of a CER, which is based on aggregation of individuals' perceived marginal benefits. Moreover, the social insurer may not have perfect information on either $\alpha_k$ or $\alpha_i$. However, she may have information about the average effect, $\bar{\alpha}$. The best a social insurer can do at this point is to calculate the average net monetary benefits of treatment,

$$ \bar{\alpha} - p \tag{5} $$

and recommend coverage if $\bar{\alpha} - p \ge 0$.^9

^8 I take a conservative approach in assuming that $\bar{\alpha}$ and $\alpha_i$ are consistent estimators of $\mu$. To the extent this is not true, the welfare losses described below may be higher.
^9 This is, in fact, the standard method used in most cost-effectiveness modeling studies that try to evaluate the cost-effectiveness of a new approved treatment for which there is no head-to-head comparison with its alternatives.

Without loss of generality,

Assumption 1. Let $\mu > 0$: the true population average treatment effect is positive, but the $\alpha_k$'s span the whole real line.

Assumption 2. Let $\bar{\alpha} - p \ge 0$ and full coverage was recommended, i.e. $r^* = 0$.

Theorem 1. Under Assumptions 1 and 2, $L_{PRE}(0) > L_{2nd}(r^*)$ for $\forall r^* \in [0, 1]$ if $\alpha_i \ge 0, \forall i$. The welfare loss under pre-CER information with full insurance coverage is strictly larger than the welfare loss under any second-best scenario as long as all individuals perceive a positive benefit from treatment.

Proof. Under the Pre-CER scenario, two groups of individuals making inefficient choices drive the welfare loss. The first group consists of people who fail to receive treatment because their $\alpha_i < 0$ but they belong to risk groups where the treatment produces incremental benefits that are more than the price of the treatment (i.e. $\alpha_k > p$). The second group consists of individuals who would receive treatment as they are led to believe that they would get a positive benefit ($\alpha_i \ge 0$) but realize a benefit less than the price (i.e. $\alpha_k < p$). Therefore total welfare loss is given by

$$ L_{PRE,CER} = \sum_i \sum_k I(\alpha_i < 0) \times I(\alpha_k - p > 0) \times (\alpha_k - p) \times I(\Omega_i = k) + \sum_i \sum_k I(\alpha_i \ge 0) \times I(\alpha_k - p < 0) \times (p - \alpha_k) \times I(\Omega_i = k) $$
$$ = \sum_{\alpha_k = p}^{\infty} \left[1 - \Phi\left(\frac{\bar{\alpha}}{s}\right)\right] \times N_k \times (\alpha_k - p) + \sum_{\alpha_k = -\infty}^{p} \Phi\left(\frac{\bar{\alpha}}{s}\right) \times N_k \times (p - \alpha_k) \tag{6} $$

where $\Phi(.)$ is a cumulative normal distribution. Comparing (6) to (4),

$$ L_{PRE,CER}(0) - L_{2nd}(0) = \sum_{\alpha_k = p}^{\infty} \left[1 - \Phi\left(\frac{\bar{\alpha}}{s}\right)\right] \times N_k \times (\alpha_k - p) + \sum_{\alpha_k = 0}^{p} \left[\Phi\left(\frac{\bar{\alpha}}{s}\right) - 1\right] \times N_k \times (p - \alpha_k) + \sum_{\alpha_k = -\infty}^{0} \Phi\left(\frac{\bar{\alpha}}{s}\right) \times N_k \times (p - \alpha_k) \tag{7} $$

The first and the third terms in (7) are the incremental losses due to incomplete information pre-CER. The first term is the same as in (6) and comprises individuals who fail to take treatment but would have benefited more than its price. The loss represented in the third term emanates from risk groups where $\alpha_k < 0$ and a fraction of individuals in these risk groups take treatment based on their perceived positive benefits, which was not the case under the second-best scenario.
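The decomposition of the pre-CER loss relative to the second-best loss, Eqs. (4), (6) and (7), can be checked numerically. The following is a minimal sketch, assuming four hypothetical risk classes; the function names and every number (class effects, class sizes, price, prior mean and standard deviation) are illustrative assumptions and are not taken from the paper.

```python
import math

def phi(x):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def loss_second_best(alphas, counts, p, r):
    """Eq. (4): loss from classes that take treatment at demand price r*p
    but realize a benefit below the full price p."""
    return sum(n * (p - a) for a, n in zip(alphas, counts) if r * p <= a < p)

def loss_pre_cer(alphas, counts, p, alpha_bar, s):
    """Eq. (6): a share phi(alpha_bar/s) of every class perceives a
    positive benefit and takes treatment under full coverage."""
    take = phi(alpha_bar / s)
    forgone = sum(n * (a - p) for a, n in zip(alphas, counts) if a >= p)
    overuse = sum(n * (p - a) for a, n in zip(alphas, counts) if a < p)
    return (1.0 - take) * forgone + take * overuse

def eq7_terms(alphas, counts, p, alpha_bar, s):
    """The three terms of Eq. (7), i.e. L_PRE,CER(0) - L_2nd(0)."""
    take = phi(alpha_bar / s)
    pairs = list(zip(alphas, counts))
    t1 = (1.0 - take) * sum(n * (a - p) for a, n in pairs if a >= p)
    t2 = (take - 1.0) * sum(n * (p - a) for a, n in pairs if 0 <= a < p)
    t3 = take * sum(n * (p - a) for a, n in pairs if a < 0)
    return t1, t2, t3

# Hypothetical inputs (assumptions for illustration only).
alphas = [-40.0, 10.0, 60.0, 120.0]   # class-specific effects alpha_k
counts = [100, 300, 400, 200]         # class sizes N_k
p = 50.0                              # price of the new treatment
alpha_bar, s = 55.0, 45.0             # prior mean and standard deviation

l_2nd = loss_second_best(alphas, counts, p, r=0.0)
l_pre = loss_pre_cer(alphas, counts, p, alpha_bar, s)
t1, t2, t3 = eq7_terms(alphas, counts, p, alpha_bar, s)
print(l_2nd, l_pre, (t1, t2, t3))
```

With these inputs the population mean effect is positive and $\bar{\alpha} - p \ge 0$, in line with Assumptions 1 and 2. The second term is never positive, which matches its reading as a negative loss, and letting `s` shrink toward zero pushes the share of takers to one, the case in which every individual perceives a positive benefit.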
The second term in (7) is a pervasive benefit of incomplete information compared to the second-best scenario (expressed as negative loss). The benefit emanates from risk groups where $0 < \alpha_k < p$ and a fraction of individuals in these risk groups forgo treatment based on their perceived benefits (i.e. their $\alpha_i < 0$), which was not the case under the second-best scenario, thereby generating welfare gains.

Under Assumptions 1 and 2, if one assumes that $\alpha_i \ge 0, \forall i$, that is, every individual perceives a positive benefit from treatment, Theorem 1 is proved from (7) as the first two terms drop out and $L_{PRE,CER}(0) > L_{2nd}(0)$. Thus, naturally, $L_{PRE,CER}(0) > L_{2nd}(r^*)$ for $\forall r^* \in [0, 1]$, since $L_{2nd}(0) > L_{2nd}(r^*)$ for $\forall r^* \in [0, 1]$.

2.2. An ideal role for CER

Often an ideal CER is construed as one having a larger sample size. In fact, much of the value of information literature in medicine has focused on estimating the marginal value of a trial with additional patients enrolled (see literature on the Expected Value of Sample Information, EVSI). However, it is not clear whether, in the presence of heterogeneity, such an approach to CER is welfare enhancing. For example, an increase in sample size cannot overcome the issues about selection into trials that we discuss later. Nevertheless, for now, let's assume that those concerns are not relevant. Then, as $n \to \infty$, $\bar{\alpha} \xrightarrow{p} \mu$ and $s^2 \xrightarrow{p} \sigma^2$. Consequently, following Eq. (6) and substituting the post-CER estimate of average effect in it, the welfare loss post CER with an infinite sample will be:

$$ L_{POST,CER(n \to \infty)} = \sum_{\alpha_k = p}^{\infty} \left[1 - \Phi\left(\frac{\mu}{\sigma}\right)\right] \times N_k \times (\alpha_k - p) + \sum_{\alpha_k = -\infty}^{p} \Phi\left(\frac{\mu}{\sigma}\right) \times N_k \times (p - \alpha_k) \tag{8} $$

Note that if $\Phi(\mu/\sigma) < \Phi(\bar{\alpha}/s)$, a CER of infinite size will decrease the magnitude of loss in the second term but increase the magnitude of loss in the first term as compared to $L_{PRE}$, and vice versa. Therefore, the value of such a CER study, even with infinite sample, is indeterminate.

An ideal role for CER would be when, even though the social insurer continues to believe in the average effect, it can enable individual beliefs, $\alpha_i$, to be a draw from the distribution Normal$(\alpha_k, \sigma_k^2)$, where $\sigma_k^2$ is the variance for risk-group-specific effects.^10 If coverage is recommended, welfare losses under the post ideal-CER scenario will be a modification of (6) to

$$ L_{POST*} = \sum_{\alpha_k = p}^{\infty} \left[1 - \Phi\left(\frac{\alpha_k}{\sigma_k}\right)\right] \times N_k \times (\alpha_k - p) + \sum_{\alpha_k = -\infty}^{p} \Phi\left(\frac{\alpha_k}{\sigma_k}\right) \times N_k \times (p - \alpha_k) \tag{9} $$

^10 In practice, even in the absence of CER such a situation may arise, when individuals learn by repeated consumption of therapy (e.g. pharmaceuticals) or physicians are able to anticipate effect heterogeneity based on baseline risks.

Note that both the terms in $L_{POST*}$ in (9) are smaller than the corresponding terms in $L_{PRE,CER}$ in (6). This is because, under Assumptions 1 and 2 and also assuming $\sigma_k = s, \forall k$, $\Phi(\alpha_k/\sigma_k) > \Phi(\bar{\alpha}/s)$ for $\alpha_k > p$, as by construction $\alpha_k > \bar{\alpha}$ for all $\alpha_k > p$, since $\bar{\alpha} - p \ge 0$. Similarly, $\Phi(\alpha_k/\sigma_k) < \Phi(\bar{\alpha}/s)$ for $\alpha_k \le p$, as by construction $\alpha_k < \bar{\alpha}$ for all $\alpha_k \le p$. Therefore, $L_{POST*}(r^*) < L_{PRE}(r^*)\ \forall r^* \in [0, 1]$.

This unambiguous dominance of an ideal CER over the pre-CER scenario arises because individuals are able to better self-select their optimal treatment based on the risk-group-specific knowledge generated from an ideal CER. In fact, as $\sigma_k \to 0$, $\Phi(\alpha_k/\sigma_k) \to 1$ for $\alpha_k \ge 0$ and $\Phi(\alpha_k/\sigma_k) \to 0$ for $\alpha_k < 0$. Consequently, $L_{POST*}(r) \to L_{2nd}(r)$. Therefore, one can potentially approach a second-best scenario under any level of insurance coverage if new CER studies are able to generate information that can enable individuals to better self-select treatments based on their risk classes, even if the social insurer is unaware of these heterogeneous effects. The growing awareness of the potential value of such CER has led to considerable federal investment in CER. New legislation such as the Affordable Care Act of 2009 has also identified the need to risk stratify comparative effectiveness.

However, the current data production infrastructure for CER may not be aligned with the goals of such legislation. The gold standard of data production in medical care involves controlled experiments, where alternative treatments under investigation are allocated to a selected group of patients by a chance mechanism. I will refer to such a mechanism as a randomized clinical trial (RCT) henceforth. I consider two issues within this data-generating infrastructure that contribute to the inability of current CER infrastructure to resolve incompleteness in information: selection in RCT enrollment and target parameters for RCTs.

2.3. Non-ideal design and implementation of CER studies: Understanding selection into randomized trials

Unlike evaluation of experimental therapy, where enrollments may be more likely driven by altruistic motives, CER and economic evaluation are about approved and existing therapies available to patients. Therefore, there must be a strong implicit selection process for patients who provide consent to enroll in a CER RCT, in which they have a non-trivial probabilistic expectation of receiving a treatment (most likely the new treatment) that they have some difficulty in accessing outside of the RCT. Such difficulties must arise because the cost of accessing the new treatments outside the RCT is high, either due to differential insurance coverage of the treatments or due to strong physician preferences for one therapy over the other. Consequently, this selection process into randomized trials implies that patients who anticipate that their expected incremental benefits from the new treatment, compared to the control, are more than the cost of acquiring the new treatment outside of the RCT would not enroll in the RCT and would avail themselves of the new treatment outside the RCT. Similarly, patients anticipating a very small incremental benefit from the new treatment, even compared to the subsidized cost of treatment within the RCT, would not enroll. That leaves patients who enroll in a CER RCT to be those who anticipate that their incremental benefits must be less than the cost of acquiring the new treatment outside of the RCT but more than the subsidized cost of the new treatment within an RCT.^11,12

^11 Often enrollees are paid a monetary sum to compensate for their time spent participating in the RCTs.
^12 It is not necessary for only the patient to be choosing the treatment. Even in acute settings, the physician choice may also be subject to the same selection principles where they have some anticipation of what works for individual patients. Therefore, physicians may often select to enroll patients to their trials.

This brings into question whether the anticipated benefits of treatment $\alpha_i$ are related in any form to the true benefits of treatment $b_i$, even in the absence of formal CER. Obviously, if $\alpha_i = b_i$, then the value of any additional CER becomes zero as each patient already knows their true benefits. On the other hand, the

value of CER is maximized when $\alpha_i \perp b_i$, where $\perp$ denotes statistical independence. In practice, however, it is common to find some dependency between $\alpha_i$ and $b_i$. Such dependencies may arise from biological knowledge about the treatment's mechanism of action, past experiences of physicians in using similar treatments on certain patient risk groups, and patients' own learning-by-doing mechanism in a chronic disease setting. Under such dependencies, the effect of selection into an RCT becomes non-trivial. Specifically, I show

Theorem 2. A CER randomized trial produces an unbiased estimate of the population average treatment effect ($\mu$ in Eq. (2)) if and only if $\alpha_i \perp b_i$. If $Corr(\alpha_i, b_i) > 0$, RCTs will typically find small positive benefits of treatment that are likely to be a biased estimate of the true average treatment effect.

Proof. I formalize selection into a CER RCT following Roy's model (Roy, 1951) of self-selection using the following notation

$$ S_i = I(U_i^* \ge 0) \tag{10} $$

where $S$ is an indicator for enrolling in an RCT that is driven by the latent utility $U^*$ for enrolling. Again, without loss of generality, $U_i^*$ is interpreted as the anticipated incremental net benefits (net of costs) of enrolling versus not enrolling in an RCT for individual $i$ given that the individual anticipates a positive benefit from treatment (i.e. $\alpha_i > 0$):^13

$$ U_i^* = (\pi_R \times \alpha_i - C_{RCT}) - (\alpha_i - C_{OUT}) = (C_{OUT} - C_{RCT}) - (1 - \pi_R) \times \alpha_i, \quad \text{if } \alpha_i > 0 \tag{11} $$

where $C_{RCT}$ and $C_{OUT}$ are the costs of accessing the treatment within and outside an RCT, respectively; $\pi_R$ is the known random probability of receiving the new treatment within the CER RCT.

^13 If an individual anticipates a negative benefit from treatment he would not consider enrolling in the first place.

In the presence of uncertainty, the population probability of an RCT enrollment is given by

$$ \pi = \Pr(S_i) = E\left[I(U_i^* \ge 0)\right] = \Pr\left(0 < \alpha_i < \frac{C_{OUT} - C_{RCT}}{1 - \pi_R}\right) \tag{12} $$

Therefore, only patients who anticipate positive benefits but whose magnitudes are less than the expected incremental cost of accessing treatment outside the RCT would enroll. Interestingly, when $C_{OUT} \approx C_{RCT}$, enrollment in CER can be quite difficult. In contrast, if $C_{OUT} \gg C_{RCT}$, then the probability of enrollment will approach one for everyone. Similarly, as $\pi_R$ decreases, it reduces the cost differential between accessing the new treatment outside and within the RCT, thereby lowering the probability of RCT enrollment. These factors severely limit the generalizability of results from CER RCTs. For example, in one of the few surveys ever conducted to understand the factors that determine RCT enrollment, it was found that only 2.7% of eligible patients enrolled in clinical oncology trials (Movsas et al., 2007).

Target Parameters for RCT. The goal of an RCT is to estimate a structurally meaningful population level parameter such as the average treatment effect (ATE) of treatment compared to the control. Instead, the target population ends up being defined by the RCT enrollees. Consequently, its target population defines the target parameter that an RCT tries to estimate. This parameter is an average effect that is a weighted average of risk-class-specific effects, where the weights are arbitrarily defined based on the risk-class-specific propensity to enroll in the RCT. Therefore, the target parameter for an RCT is given by

$$ \mu_{RCT} = \sum_k w_k \times \alpha_k \tag{13} $$

where the weights $w_k = \pi_k \times \Pr(\Omega = k) / \sum_k \pi_k \times \Pr(\Omega = k)$ and $\pi_k = \Pr\left(0 < \alpha_i < (C_{OUT} - C_{RCT})/(1 - \pi_R) \mid \Omega_i = k\right)$. The degree of selection in the trial determines the target parameter of an RCT. The touted internal validity of RCTs rests on obtaining a consistent estimate for this target parameter.

When $\alpha_i \perp b_i$, $F(\alpha_i \mid \Omega_i = k) = F(\alpha_i)\ \forall k$, where $F()$ is the cumulative distribution function. This implies $\pi_k = \pi, \forall k \Rightarrow w_k = \Pr(\Omega = k), \forall k \Rightarrow \mu_{RCT} = \mu$ (according to Eqs (2) and (13)). On the contrary, if $\alpha_i$ and $b_i$ are not independent, $\mu_{RCT} \ne \mu$ since the weights would vary depending on which risk classes are more likely to enroll in the RCT.

In fact, if perceived benefits are positively correlated with true benefits, i.e. $Corr(\alpha_i, b_i) > 0$, it implies $Corr(w_k, \alpha_k) < 0$ for $\alpha_i > 0$ and $Corr(w_k, \alpha_k) > 0$ for $\alpha_i \le 0$. Individuals who correctly anticipate large positive or negative benefits from treatment are less likely to enroll in RCTs. In fact, Eqs (12) and (13) suggest that the margin of individuals who enroll in an RCT anticipate a moderate positive magnitude of benefits from treatment. This implies that RCT results would typically find small positive benefits of a newer treatment and the generalizability of these results to the whole target population remains severely compromised.^14 ∎

^14 It is possible that under an ideal symmetric condition, the weights are such that equivalent portions of the risk-groups with large positive effects and those with large negative effects select out of enrolling and the average effect among the enrollees still reflects the population average. However, such a scenario is highly unlikely.

Consequently, in the presence of any anticipatory knowledge about true treatment effects, the average effect from an RCT is not a consistent estimator for either the population average effect or the average effect of any segment of the population: $E(\hat{\mu}_{RCT}) \ne \mu$ and $E(\hat{\mu}_{RCT}) \ne \alpha_k\ \forall k$. Next, I study how such results can mislead individual level decision-making and create inefficiencies both through population-level coverage decisions and individual treatment selections.

Assumption 3. In what follows, I will assume $Corr(\alpha_i, b_i) > 0$ even in the absence of a formal CER.

2.4. Implications of incompleteness for decision-making

Theorem 3.
(a) Under Assumption 3, a CER RCT may misguide a social planner to provide coverage on treatments with negative average net health benefits and to withhold coverage on treatments with positive average net health benefits.
(b) Under Assumption 3, $L_{POST}(0) \gtreqless L_{PRE}(0)$. The welfare loss under post-CER information can be larger than that under the pre-CER scenario with full insurance coverage.

Proof.
(a) Based on CER RCT results, the social planner updates her belief over the average effect of the new treatment using a Bayesian updating rule (Basu et al., 2011):

$$ \bar{\bar{\alpha}} = \lambda \times \bar{\alpha} + (1 - \lambda) \times \hat{\mu}_{RCT} \tag{14} $$

where the weight $\lambda$ is determined by a weighted average of prior uncertainty $\sigma^2$ and the sampling variance of $\hat{\mu}_{RCT}$, and calculates the average net monetary benefits of treatment to be $\bar{\bar{\alpha}} - p$. Under Assumption 3, Theorem 2 proves that $E(\hat{\mu}_{RCT}) > 0$ but $E(\hat{\mu}_{RCT}) \gtreqless \mu$. Therefore, since $E(\bar{\alpha}) =$

$\mu$, $E(\bar{\bar{\alpha}} - p) < E(\mu - p)$ if $E(\hat{\mu}_{RCT}) < \mu$ and $E(\bar{\bar{\alpha}} - p) > E(\mu - p)$ if $E(\hat{\mu}_{RCT}) > \mu$.

This implies that a CER RCT may misguide a social planner to provide coverage on treatments with negative average net health benefits or to withhold coverage on treatments with positive average net health benefits. This also highlights the fact that economic evaluations based on CER RCT studies can be misleading. ∎

(b) Individual beliefs, $\alpha_i$, about comparative effects will also evolve following the CER RCT using a similar Bayesian updating rule (Basu et al., 2011):

$$ \tilde{\alpha}_i = \lambda_i \times \alpha_i + (1 - \lambda_i) \times \hat{\mu}_{RCT} \tag{15} $$

where $\tilde{\alpha}_i$ is the updated belief and the weights $\lambda_i$ are determined by a weighted average of prior uncertainty $s^2$ and the sampling variance of $\hat{\mu}_{RCT}$. It is important to note that even though original beliefs may have been consistent, i.e., $E(\alpha_i) = b_i$, after CER, $E(\tilde{\alpha}_i) \ne b_i$. Most importantly, $\tilde{\alpha}_i > \alpha_i$ if $\alpha_i < 0$, under Assumption 3, since $E(\hat{\mu}_{RCT}) > 0$. That is, some patients who would originally have anticipated a negative effect from treatment, perhaps rightfully so, are now led to believe in a larger, presumably positive, effect from treatment. Similarly, patients who would have rightfully anticipated large benefits from treatment would have their updated anticipation moderated by the small effect size estimated in the RCTs. Thus, the average result from a CER study that is based on voluntary participation actually misleads individuals about their own comparative effectiveness. Following (6), the welfare loss with the post-CER information is given by

$$ L_{POST,CER} = \sum_{\alpha_k = p}^{\infty} \left[1 - \Phi\left(\frac{\mu_k(\tilde{\alpha}_i)}{s_k(\tilde{\alpha}_i)}\right)\right] \times N_k \times (\alpha_k - p) + \sum_{\alpha_k = -\infty}^{p} \Phi\left(\frac{\mu_k(\tilde{\alpha}_i)}{s_k(\tilde{\alpha}_i)}\right) \times N_k \times (p - \alpha_k), \tag{16} $$

where $\mu_k(\tilde{\alpha}_i) = E(\tilde{\alpha}_i \mid \Omega_i = k)$ and $s_k^2(\tilde{\alpha}_i) = Var(\tilde{\alpha}_i \mid \Omega_i = k)$. Since $\mu_k(\tilde{\alpha}_i)/s_k(\tilde{\alpha}_i) \gtreqless \bar{\alpha}/s$ for any $k$, it proves that the welfare loss under post-CER information can be larger than that under the pre-CER scenario with full insurance coverage. Only when $\mu_k(\tilde{\alpha}_i)/s_k(\tilde{\alpha}_i) \ge \bar{\alpha}/s$ for $\alpha_k \ge p$, implying that more individuals with $\alpha_k \ge p$ are drawn to take up the treatment, and $\mu_k(\tilde{\alpha}_i)/s_k(\tilde{\alpha}_i) < \bar{\alpha}/s$ for $\alpha_k < p$, implying that more individuals with $\alpha_k < p$ are drawn to not use the treatment, is the CER infrastructure welfare enhancing (i.e. $L_{POST,CER} < L_{PRE,CER}$). ∎

3. Learning through diversification (LtD): A new framework for data production

As I have shown in the previous sections, under some general assumptions, the current CER-RCT framework that relies on voluntary participation fails to consistently inform either the population-level or individual-level comparative effect parameters and cannot potentially lead us towards the second-best solutions (in fact, it may lower welfare through evidence-based misguidance).

Manski (2009) proposed that one way a social decision maker can maximize welfare is through fractional allocations, where a random fraction of the patient population receives one treatment while the other receives the alternative. Manski argued that, given the ambiguity of evidence on counterfactual outcomes, such an allocation would maximize a broad set of utilitarian welfare functions for the social decision maker. Manski (2009) also pointed out that such an allocation automatically creates randomized experiments, which are particularly important for learning treatment responses. The current proposal builds on this idea of "diversified treatment" proposed by Manski (2009). However, our proposal takes into account two realities in the context of health care.

First, it is almost impossible, at least in the United States, to completely restrict "receipt" of a treatment that has crossed the regulatory and evidentiary hurdles and has been approved on the basis of safety and efficacy. Therefore, diversification of treatment allocation in terms of "receipt", which is essential to answer CER and PCOR type questions, is usually not possible.

Second, the social decision maker in the context of health care is typically involved in deciding on insurance coverage of medical treatment, while individual subjects are typically left to decide on the choice of treatment given insurance coverage. Therefore, a social decision maker's problem can be viewed as a two-step process (Dehejia, 2005). Under any information set (i.e. alternative coverage decisions and CER information), first the physician decides whether to prescribe treatment for each individual. Second, given these potential allocations, the social decision maker decides on the level of coverage for treatment that would improve population health either by sustaining the individual choices or by incentivizing to alter them if needed. That is, a social decision maker
[Figure 1: diagram with boxes labeled "Fractional Coverage via Technology drafts", "Outcomes Evaluation", "Full Coverage for sub-groups", "No Coverage for sub-groups", "Sufficiency of Evidentiary Thresholds" and "Crossing Evidentiary Thresholds".]
Fig. 1. The Learning through diversification (LtD) infrastructure.

studies these potential allocations of treatments under different be used to identify Marginal Treatment Effect (MTE) parameters
information sets and makes the optimal coverage policy that would ((Heckman and Vytlacil, 1999):
generate the highest population benefits driven by the individual
allocation of treatments that would follow that policy. To the extent
∂Eϑ Y |Ð , ˝
= E (Y1 − Y0 ) |˝, V = v = MTE ˝, v , (17)
that one can combine the ideas of diversified treatment for learning ∂p
to that of the two-step process of social decision making on optimal coverage, one can improve the decision making for both the individual patients and the social decision maker. Here I propose a “Learning through Diversification” (LtD) infrastructure (Fig. 1) that can potentially mimic the ideal CER designs discussed in Section 2.2. The three main features of an LtD infrastructure are:

(1) Fractional coverage can be achieved using a technology lottery, Ð: For each new product, develop a random order based on, say, birth dates so that this new product with uncertain effectiveness profile will be paid at varying levels by insurance (in continuous fashion) in the first year. That is, such coverage creates a completely stochastic distribution of co-insurance rates in the population, F(b_i | Ð) = F(b_i). Note that the lottery is done anew for each new technology so that the probability that any one person would be denied coverage for all new technologies will approach zero with increasing number of technologies.
(2) Outcomes evaluation: Using the randomization inherent in the lottery, evaluating patient outcomes across different levels of coinsurance rates will directly answer the economic evaluation questions on expanding coverage for the target population. Additionally, the lottery would serve as a perfect instrumental variable (IV) to study the comparative effectiveness of receiving the new product versus its competitor and the heterogeneity in these effects in the population (Heckman, 1996, 1997, 2001; Heckman and Vytlacil, 1999, 2005; Heckman et al., 2006; Basu, 2014) without the challenges of voluntary participation in CER RCTs.
(3) Sequential decision making: Based on the outcome evaluation results, fractional allocation rules can be adapted over time for specific risk groups. Fractional allocation would continue within risk groups where ambiguity persists. Optimal stopping rules for fractional coverage can be developed using Bayesian methods.

3.1. Key Features of the Learning through Diversification Infrastructure

3.1.1. Coinsurance (demand price) as an instrument
Traditional IV analyses center on the debate on whether a chosen instrument is contaminated, given that the strength of the instrument is testable. In the LtD framework, the lottery, by design, is orthogonal to all confounders and therefore sidesteps the typical debates in this literature. The strength of the instrument is driven by variation in out-of-pocket payments by patients, which in turn will depend on the market price of the new technology and the price elasticity of demand. The LtD infrastructure would be most efficient in data production for CER for new technologies that come at a high price tag, which aligns with the notion that most welfare can be generated if we can properly identify people who would and would not benefit from the most expensive technologies.

3.1.2. The target parameters in the LtD infrastructure
Meaningful and interpretable structural parameters for evaluation can be recovered using data arising out of an LtD infrastructure (Heckman and Vytlacil, 1999, 2001; Heckman et al., 2006; Basu, 2014)¹⁵. For example, local instrumental variable approaches can

where Y = D × Y_1 + (1 − D) × Y_0 is the observed outcome, unobserved confounders are V ∼ Uniform[0,1] by construction, and the probability of treatment choice given the lottery can be represented by p(Ð, Ω). Basu (2014) extends the LIV methods to identify Person-centered treatment (PeT) effects, which, for persons who choose treatment, follow

$$E_{V \mid \Omega, P(\text{Ð}), D}\, E\left(Y_1 - Y_0 \mid \Omega, P(\text{Ð}), D = 1\right) = E\left(Y_1 - Y_0 \mid \Omega, V < P(\text{Ð})\right) = P(\text{Ð})^{-1} \int_0^{P(\text{Ð})} \mathrm{MTE}(\Omega, v)\, dv \tag{18}$$

Similarly, the conditional effect for a person who did not choose treatment is obtained by integrating MTEs over values of V greater than p. Mean treatment effect parameters, α_k (Eq. (1)) or (Eq. (2)) are readily obtained by averaging PeT effects over respective subgroups (Basu, 2014).

3.1.3. Welfare effects
Consider a two-period model in which the first period is the pre-CER period during which a CER study is being conducted. At the end of the first period, the CER study results are disseminated and therefore the second period represents the post-CER world. Therefore, under Assumptions 1–3, total welfare loss over the two periods in a CER-based data production world is given as:

$$L_{CER} = L_{PRE,CER} + L_{POST,CER} \tag{19}$$

where L_{PRE,CER} and L_{POST,CER} are given in Eqs. (7) and (16), respectively.

Under the LtD framework of data production, welfare losses in both periods will be different. Let the total welfare loss over the two periods in an LtD-based data production world be given as:

$$L_{LtD} = L_{PRE,LtD} + L_{POST,LtD}$$

Since the LtD infrastructure allows for consistent estimation of the mean treatment effect parameters, in the second period, subjective beliefs about the benefits of treatment will align with the true values for subjects in each risk group, $E(\hat{\alpha}_k) = \alpha_k$. Moreover, given that these estimates were generated using data at the population scale, $\Phi(\hat{\alpha}_k/\hat{\sigma}_k) \to 1$ for $\alpha_k \ge 0$ and $\Phi(\hat{\alpha}_k/\hat{\sigma}_k) \to 0$ for $\alpha_k < 0$. Consequently,

$$L_{POST,LtD} = \sum_{\alpha_k = p}^{\infty} \left(1 - \Phi\!\left(\frac{\hat{\alpha}_k}{\hat{\sigma}_k}\right)\right) \times N_k \times (\alpha_k - p) + \sum_{\alpha_k = -\infty}^{p} \Phi\!\left(\frac{\hat{\alpha}_k}{\hat{\sigma}_k}\right) \times N_k \times (p - \alpha_k) \to 0 \tag{20}$$

Therefore, an LtD infrastructure will always be welfare enhancing compared to the CER infrastructure as long as

$$L_{LtD} - L_{CER} = L_{PRE,LtD} + L_{POST,LtD} - \left(L_{PRE,CER} + L_{POST,CER}\right) < 0 \;\rightarrow\; L_{PRE,LtD} < L_{PRE,CER} + L_{POST,CER}$$

¹⁵ Note that it is important to pay close attention to dealing with essential heterogeneity within the LtD infrastructure. This is because treatment receipt will be correlated with factors such as income (because of the price differentials that the lottery creates), which in turn may be correlated with gains and losses from the new treatment.
That is, fractional allocation should be designed in a way that the loss during the initial data generation process is not greater than the combined losses under the CER infrastructure both during and after the data generation process. To obtain the most power for analyses and consistency within the LtD infrastructure, it may be useful to set the mean coinsurance rate to be 0.5. The welfare losses, if any, during the data production period (that may be one or a few years) can be easily recuperated from the welfare gains from LtD in the post data-production period that typically lasts for many years.

4. Conclusions

Regulatory bodies often approve a new medical treatment based on its potential safety profile and its incremental efficacy compared to either placebo or a basic control treatment. Often superiority of the new treatment is not established and its comparative effectiveness compared to current clinical practice remains ambiguous. Nevertheless, the treatment becomes available for consumption at a substantial price in anticipation of positive effectiveness claims based on efficacy results. Variability in the effectiveness profile remains far from known. Under such ambiguity, a social insurer faces the challenge of deciding whether to pay for the treatment. In the US, public health insurance providers like Medicare usually extend full coverage of these new treatments as long as there are positive efficacy signals. Other countries, like the UK and Canada, formally look at the budget impact of coverage by comparing the costs of treatments (inclusive of their price) to the projected effectiveness based on efficacy signals. When coverage is allowed, a large welfare loss may ensue even when the new treatment can genuinely produce higher effectiveness in a certain margin of the population. This is due to the lack of evidence of how to match patients to alternative treatments.

In this paper, I show that under the status quo policy of extending coverage to a new treatment in the absence of complete information on its effectiveness profile, welfare loss can be substantial. These losses can be minimized by investments in studies that aim at generating such evidence. However, I also show, following a Roy model of sorting behavior, that the current infrastructure on data production for this purpose suffers from severe self-selection issues, since the incentives to enroll in research studies are eroded by the low demand prices of obtaining medical care outside these studies. Consequently, the parameters identified from these studies do not inform any of the decision-relevant parameters, either at the individual or the population level. I show that if one takes the normative approach of a social insurer who is forward looking and wants to maximize any given social welfare function over a duration of time (typically over the longevity of the new technology being considered), then it makes sense for the social insurer, irrespective of what coverage decision is made today, to devise ways to learn about variations in incremental effectiveness of treatment in the population so that she can encourage/discourage appropriate subgroups to uptake/discard the new treatment. In fact, generating such public evidence can directly inform individuals within the population to use this new treatment appropriately without additional effort by the social insurer, thereby approaching the second-best solutions. Based on this normative framework, I propose a positive Learning through Diversification (LtD) infrastructure, through which a social insurer can achieve her objectives.

The LtD infrastructure comprises introducing the new treatment with fractional coverage based on random individual-level co-insurance rates. One then uses these co-insurance rates as an artificially created, but almost perfect, instrumental variable to study treatment effect heterogeneity based on a spectrum of econometric tools available to researchers. Both clinical guidelines and coverage decisions can then be sequentially revised to reflect this evidence. I show that under non-stringent conditions, the LtD infrastructure will be welfare enhancing compared to the current data production infrastructure, such as CER.

One aspect of the LtD infrastructure that would appear to be politically challenging is the notion of fractional coverage, even though it is for a short time during the introduction of the new treatment. A full legal and ethical consideration of such random allocation is beyond the scope of this paper. However, it is important to note that unlike earlier discussions in this line of reasoning that revolved around quasi-random treatment prescription (Manski, 2009), the LtD infrastructure does not withhold treatment from anyone, but rather changes the cost of accessing it in a random fashion. The potential for patient welfare and the richness of scientific and policy questions that this infrastructure can answer should play a part in deciding its ultimate feasibility.

References

Arrow, K.J., 1963. Uncertainty and the welfare economics of medical care. The American Economic Review 53 (5), 941–973.
Basu, A., 2011. Economics of individualization in comparative effectiveness research and a basis for a patient-centered health care. Journal of Health Economics 30 (3), 549–559.
Basu, A., 2014. Person-centered treatment (PeT) effects using instrumental variables: An application to evaluating prostate cancer treatments. Journal of Applied Econometrics 29 (4), 671–691.
Basu, A., Jena, A.B., Philipson, T.J., 2011. The impact of comparative effectiveness research on health and health care spending. Journal of Health Economics 30 (4), 695–706.
Dehejia, R.H., 2005. Program evaluation as a decision problem. Journal of Econometrics 125, 141–173.
Garber, A., Phelps, C., 1997. Economic foundations of cost-effectiveness analysis. Journal of Health Economics 16, 1–31.
Heckman, J.J., 1996. Randomization as an instrumental variable. The Review of Economics and Statistics 78 (2), 336–341.
Heckman, J.J., 1997. Instrumental variables: a study of implicit behavioral assumptions used in making program evaluations. Journal of Human Resources 32 (3), 441–462.
Heckman, J.J., 2001. Accounting for heterogeneity, diversity and general equilibrium in evaluating social programmes. The Economic Journal 111, F654–F699.
Heckman, J.J., Urzua, S., Vytlacil, E., 2006. Understanding instrumental variables in models with essential heterogeneity. Review of Economics and Statistics 88 (3), 389–432.
Heckman, J.J., Vytlacil, E., 1999. Local instrumental variables and latent variable models for identifying and bounding treatment effects. Proceedings of the National Academy of Sciences 96 (8), 4730–4734.
Heckman, J.J., Vytlacil, E., 2001. Local instrumental variables. In: Hsiao, C., Morimune, K., Powell, J.L. (Eds.), Nonlinear Statistical Modeling: Proceedings of the Thirteenth International Symposium in Economic Theory and Econometrics: Essays in the Honor of Takeshi Amemiya. Cambridge University Press, New York, NY, pp. 1–46.
Heckman, J.J., Vytlacil, E., 2005. Structural equations, treatment effects and econometric policy evaluation. Econometrica 73 (3), 669–738.
Ioannidis, J.P., Lau, J., 1997. The impact of high-risk patients on the results of clinical trials. Journal of Clinical Epidemiology 50 (10), 1089–1098.
Malani, A., 2008. Patient enrollment in medical trials: selection bias in a randomized experiment. Journal of Econometrics 144, 341–351.
Manning, W.G., Marquis, M.S., 1996. Health insurance: the trade-off between risk pooling and moral hazard. Journal of Health Economics 15 (5), 609–639.
Manski, C., 2000. Identification problems and decisions under ambiguity: empirical analysis of treatment response and normative analysis of treatment choice. Journal of Econometrics 95, 415–442.
Manski, C., 2004. Statistical treatment rules for heterogeneous populations. Econometrica 72, 1221–1246.
Manski, C., 2009. The 2009 Lawrence R. Klein Lecture: Diversified treatment under ambiguity. International Economic Review 50 (4), 1013–1041.
Meltzer, D., 1997. Accounting for future costs in medical cost-effectiveness analysis. Journal of Health Economics 16, 33–64.
Movsas, B., Moughan, J., Owen, J., Coia, L.R., Zelefsky, M.J., Hanks, G., Wilson, J.F., 2007. Who enrolls onto clinical oncology trials? A radiation patterns of care study analysis. International Journal of Radiation Oncology, Biology, Physics 68 (4), 1145–1150.
Pauly, M.V., 1968. The economics of moral hazard: comment. The American Economic Review 58 (3), 531–537.
Pauly, M.V., 2008. Adverse selection and moral hazard: implications for health insurance markets. In: Sloan, F., Kasper, H. (Eds.), Incentives and Choice in Health and Health Care. MIT Press, Cambridge, MA.
Pauly, M.V., Blavin, F.E., 2008. Moral hazard in insurance, value-based cost sharing, and the benefits of blissful ignorance. Journal of Health Economics 27, 1407–1417.
Philipson, T.J., 1997. The evaluation of new health care technology: the labor economics of statistics. Journal of Econometrics 76, 375–395.
Roy, A.D., 1951. Some thoughts on the distribution of earnings. Oxford Economic Papers 3 (2), 135–146.
Weinstein, M., Zeckhauser, R., 1973. Critical ratios and efficient allocation. Journal of Public Economics 2, 147–158.

Short- and medium-term effects of informal care provision on female caregivers’ health☆

Hendrik Schmitz a,∗, Matthias Westphal b
a University of Paderborn, RWI & CINCH, Germany
b Ruhr Graduate School in Economics, University of Duisburg-Essen & CINCH, Germany

Article history:
Received 24 October 2013
Received in revised form 13 January 2015

JEL classification: I10, I18, C21, J14

Keywords: Informal care; Regression adjusted matching; Propensity score matching; Mental health; Physical health

In this paper, we present estimates of the effect of informal care provision on female caregivers’ health. We use data from the German Socio-Economic Panel and assess effects up to seven years after care provision. The results suggest that there is a considerable negative short-term effect of informal care provision on mental health which fades out over time. Five years after care provision the effect is still negative but smaller and insignificant. Both short- and medium-term effects on physical health are virtually zero throughout. A simulation analysis is used to assess the sensitivity of the results with respect to potential deviations from the conditional independence assumption in the regression adjusted matching approach.

© 2015 Elsevier B.V. All rights reserved.
☆ We thank Martin Fischer, Pilar García-Gómez, Audrey Laporte, Jan Marcus, Jürgen Maurer, Alfredo Paloyo, Stefan Pichler, and Arndt Reichert for valuable suggestions. We further thank two anonymous referees and two editors for very helpful comments. Moreover, we are grateful for comments at the 22nd European Workshop on Econometrics and Health Economics (Rotterdam), the meeting of the health economics section of the VfS (Hamburg), the annual meeting of the dggö (Essen), the CINCH health economics seminar in Essen, the CINCH academy, the economics of disease conference in Darmstadt, and seminars in Bayreuth and Paderborn. All errors are our own. Financial support by the Fritz Thyssen Stiftung is gratefully acknowledged.
∗ Corresponding author at: University of Paderborn, Warburger Strasse 100, 33098 Paderborn, Germany. Tel.: +49 5251 603213.
E-mail address: hendrik.schmitz@uni-paderborn.de (H. Schmitz).

1. Introduction

Europe’s societies are getting older. Low birthrates and population ageing due to technological progress in medicine shift the age structure towards higher shares of elderly individuals. This has strong implications for labour markets and social security systems, with the long-term care sector as one important part of those. The World Alzheimer Report, for instance, expects, as a result of growing numbers of people in need of long-term care, publicly funded costs of long-term care in the European Union (EU 27) to increase from 1.2% of GDP in 2007 to 2.5% in 2060 (Alzheimer’s Disease International, 2013).

Already today, costs are one reason why many governments prefer informal care (care provision by close relatives and friends) over professional formal care provision. In Germany, for instance, the public long-term care insurance paid €700 per month in 2012 for care recipients of the highest care level who are cared for by family members and €1550 per month to the same recipient cared for by professional caregivers. Germany is a country in which long-term care is still predominantly regarded as the task of the family (Schulz, 2010) and informal care is more common than in comparable states like the Netherlands (Bakx et al., 2015). More than one million official care recipients (about 46% of all) are exclusively cared for by family members, rendering informal care the most important part of the German long-term care system.

However, provision of informal care is both mentally and physically challenging. We, therefore, analyse the question of whether there are some hidden costs – or costs often neglected in the public debate – that make informal care provision not as economic as often thought. This could be the case if informal care provision goes along with health impairments of the caregivers. Other costs (not
H. Schmitz, M. Westphal / Journal of Health Economics 42 (2015) 174–185 175
considered here but heavily analysed in the economic literature¹) are forgone income for those who leave the labour force to provide care.

¹ E.g., Carmichael and Charles, 2003; Heitmueller, 2007; Heitmueller and Inglis, 2007; Bolin et al., 2008; Leigh, 2010; Van Houtven et al., 2013; Meng, 2013.

The economic literature on health effects of caregiving is fairly scarce.² To the best of our knowledge, there are only three studies on the effect of care provision on health in a narrow sense. Coe and van Houtven (2009) estimate health effects of informal caregiving in the US using seven waves of the Health and Retirement Survey (HRS). They use sibling characteristics and the death of the mother as instrumental variables that control for selection into and out of caregiving in order to identify causal effects. They find that continued caregiving leads to a significant increase in depressive symptoms for both sexes while physical health does not seem to be affected. Do et al. (2015) use data from South Korea where informal care is quite common among females caring for their parents-in-law. The data allow identifying a health effect for daughters-in-law where selection into care is taken into account by instrumenting the informal care decision with parents-in-law’s health endowment. Their findings suggest that there is an increased probability of worse physical health by providing informal care. Di Novi et al. (2013) use the first two waves of SHARE to estimate the effect of caregiving on self-rated health and quality of life, measured by the CASP-12. They find positive effects of care provision on self-rated health (seen as a measure of physical health) and mixed evidence regarding quality of life (seen as a measure of mental health).

² In the medical literature, there is a fair amount of studies on the relationship of health and care provision. They mainly stem from the US (see e.g., Schulz et al., 1995; Stephen et al., 2001; Gallicchio et al., 2002; Tennstedt et al., 1992; Beach et al., 2000; Ho et al., 2009; Shaw et al., 1999; Lee et al., 2003; Dunkin and Anderson-Hanley, 1998; Colvez et al., 2002). In general, these studies use non-representative samples and widely disregard endogeneity problems. Furthermore, they often concentrate on more specific definitions of care, such as caring for people with dementia.

Two further papers evaluate the relationship of caregiving and caregiver drug utilisation. On the one hand, drug intake could be seen as an objective measure of poor health. On the other hand, it sheds light on direct costs of caregiving. Van Houtven et al. (2005) assess the impact of caring on the intake of drugs using data on caregivers for US veterans. One finding is that the intensive care margin is an important factor for drug intake. Schmitz and Stroka (2013) exploit data of a large German sickness fund that make it possible to consider prescriptions of anti-depressants and drugs to restore physical health. Their results support Van Houtven et al. (2005), providing some evidence that caregiving increases the intake of anti-depressants, in particular if coupled with having a job. Other studies look at broader welfare consequences of caring and use life satisfaction as a proxy (Bobinac et al., 2010; Van den Berg and Ferrer-i Carbonell, 2007; Leigh, 2010; van den Berg et al., 2014). One issue with these studies is that they do not address reverse causality and selection problems based on time-varying unobserved heterogeneity.

We use representative household data from the German Socio-Economic Panel to estimate the effects of informal care provision on female caregivers’ health. The outcome variables are mental and physical summary scale measures (called MCS and PCS) for the years 2002 to 2010 that capture the multidimensional nature of health. Our contributions to the literature on health and informal care are twofold: First, we use a different approach to address selection into and out of care provision. Except for Di Novi et al. (2013), previous studies that deal with endogeneity problems all use instrumental variables approaches. We try to identify the effect of caring using different assumptions that can put the literature on a broader basis and thereby complement it. Our approach is to fully exploit the time dimension and richness of panel data in order to justify the conditional independence assumption that would allow for a causal interpretation of the results. To be more precise, we use a regression adjusted matching approach. Although we argue below that, given our data, we can justify the conditional independence assumption, we allow for certain deviations from this assumption in a sensitivity analysis that follows Ichino et al. (2008).

Second, to the best of our knowledge, this is the first study that looks not only at contemporary, or short-term, effects of informal care provision on health, but also at medium-term effects of up to seven years after care provision. By medium-term effects we mean: if a woman provides care in a certain year, what is her expected change in health up to seven years afterwards? This adds to work by Coe and van Houtven (2009), who also discuss persistence of health effects but need to stick to a two-year period. Medium-term consequences could be more severe than instantaneous short-term health impacts restricted to the period of providing care. Moreover, knowledge about the persistence of health effects is arguably more important for policy makers than knowledge about short-run effects only.

The results suggest that there is a considerable negative short-term effect of informal care provision on mental health which, however, fades out over time. Five years after care provision the effect is still negative but smaller and insignificant. Both short- and medium-term effects on physical health are virtually zero throughout. The sensitivity analysis suggests that sensible deviations from the conditional independence assumption do not change these results.

The paper is organized as follows. Section 2 briefly outlines the institutional setting of long-term care in Germany. Section 3 discusses the empirical approach, Section 4 presents the data. The results are reported in Section 5 while Section 6 assesses the sensitivity of the results. Section 7 concludes.

2. Institutional background

The German social long-term care insurance system was introduced in 1995 as a pay-as-you-go system. It is financed by a mandatory payroll tax deduction of currently 2.35% of gross labour income (2.6% for employees without children). In order to qualify for benefits, individuals need to be officially defined as care recipients and be classified into one of three care levels. In care level one, individuals need support in physical activities for at least 90 min per day and household help several times a week. Individuals in need of more care are classified into care levels two or three, where the benefits increase in care levels.

Benefits also depend on the type of care, where monthly payments for informal care range from €235 (level one) to €700 (level three), for professional ambulatory care from €450 to €1550 and for professional nursing home care from €1023 to €1500. The latter, in particular, does not fully cover the expenses for nursing home visits and copayments of up to 50% are standard. Copayments for professional ambulatory care are smaller and amount to an average of €247 or about 20% (Schmidt and Schneekloth, 2011). Social welfare may step in if individuals are not able to bear the copayment. Thus, the decision for formal or informal ambulatory care is usually not driven by financial aspects, as each care recipient who is assigned a care level is entitled to benefits for all kinds of care.

The introduction of the insurance system in 1995 stressed the family as the main provider of care, as it is thought to provide care cheaper, more agreeably, and more efficiently. From the care recipient’s perspective, the decision to receive informal care typically expresses a preference for being cared for by familiar relatives or friends. In some cases, informal care recipients are additionally supported by professional carers. These are, on average, older recipients with a higher care level and, thus, a higher care burden
(Schulz, 2010). Apart from the care burden, a reason for professional care can be the absence of appropriate informal caregivers, either because they chose to only participate in the labour market or because their own physical or mental health conditions prohibit the full amount of necessary care provision.

From the caregiver’s perspective, affection and sense of responsibility towards a loved parent or spouse mainly drive the decision to provide care. Although the insurance benefits for informal care are often passed on to the care provider, this comparably small amount cannot be regarded as a financial incentive to provide care, as it is also needed to cover other expenses for care provision (see Schmidt and Schneekloth, 2011 for all points). However, the insurance funds do pay pension contributions for informal carers who provide care at least 14 h a week (Schulz, 2010). In 2002, people cared on average 14 h per week for care recipients whose assessment of needs is at least classified as the lowest official category (Schneekloth and Leven, 2003).

Between 2001 and 2011 there were only minor adjustments to the German long-term care system. They were minor because benefits were increased but only to keep pace with inflation (Rothgang, 2010) and, thus, did not change the incentives to provide care. As of 2008, employed individuals are allowed to take a 10-day (not repeatable) unpaid leave to organize or provide care in case of an incidence of care dependency in the family. However, only very few caregivers make use of this.³ Thus, the tasks of informal caregivers, the composition of caregivers and care recipients as well as financial incentives remained fairly similar over time.

3. Empirical strategy

We aim at estimating the effect of informal care provision on health. Certainly, the decision to provide care is not random per se. Given that someone close becomes care dependent, some individuals choose to provide care while others do not. The willingness to provide care depends on factors such as the financial and temporal affordability, own health endowment as well as innate tendencies such as personality traits.

To deal with this problem we apply the model of Rubin (1974). Following his notation we observe Y = T · Y_1 + (1 − T) · Y_0, where T indicates whether an individual is assigned to treatment (2 h of informal daily care provision, but we will also consider alternative definitions) or control group, Y is the outcome (health), and the index {0, 1} indicates the potential health outcome of being a caregiver or not. If we simply compare the realized outcomes, i.e., E(Y_1 | T = 1) − E(Y_0 | T = 0), selection bias will most likely arise due to the non-randomness of care provision. However, the average treatment effect on the treated (ATT) can be identified if the conditional independence assumption holds and assignment to treatment is random conditional on controls: Y_1, Y_0 ⊥ T | X. That is, if all the determinants that simultaneously influence the health outcome and the selection into treatment are observed. Then, ATT = E(Y_1 − Y_0 | T = 1, X) is the causal ceteris paribus impact of informal care provision on health.

We use propensity score methods to estimate this effect and combine matching with regression methods, thus employing the so-called regression adjusted matching approach (see, for example, Rubin, 1979). The advantage over using only matching or only linear regression is that it yields consistent estimates even if one of the two methods fails to remove the selection bias. This is called the double robustness property (Bang and Robins, 2005). Nevertheless, this method rests on the conditional independence assumption and if it does not hold and both the regression model and the propensity score estimation are wrongly specified, the estimates are biased. The estimation strategy is a two-step process, originally proposed by Bang and Robins (2005). As a first step, the probability of being a caregiver (the propensity score) conditional on relevant covariates is estimated with a probit model. Subsequently, treatment and control group are matched. We use an Epanechnikov kernel with a bandwidth of 0.03 in the basic specification. To further increase the comparability, the sample is restricted to the common support of the propensity scores of the treatment and control group.

As a second step, the health outcome is regressed on informal care and, again, all control variables where the observations are weighted by the kernel weights W estimated by the matching algorithm: $\hat{\beta} = (X'WX)^{-1}X'Wy$. Standard errors are computed according to the suggestion of Marcus (2014) who employs robust standard errors of the regression above since they are slightly more conservative but easier to estimate than bootstrapped standard errors that, in addition, are not formally justified.⁴ However, we cluster standard errors on the individual level since individuals appear several times in the data set.

We employ the time structure as presented in Fig. 1. Assignment to treatment T occurs in t = 0. We condition on a large set of covariates in t = −1, thus reducing the potential problem that covariates are affected by the treatment status. We, then, compute the treatment effect four times: 1 year after treatment, 3 years after treatment, 5 years after treatment, and 7 years after treatment. Note that conditioning variables and treatment group assignment are always the same and determined in t = −1 and t = 0, respectively. As explained in Section 4, the outcome variable is available every two years between 2002 and 2010 in our data set. Since we condition on pre-treatment outcome (see explanation below), the earliest possible treatment year is 2003. We use the maximum available information in the data and pool it to one sample. Then, individuals treated in 2003 (call this wave 1) can be followed until t = 7 in 2010 whereas individuals treated in 2009 (call this wave 4) can only be followed until t = 1. Hence, the effect in t = 1 will be measured more precisely than the one in t = 7.

Fig. 1. Basic time structure.

Even though we condition on a large set of covariates that are supposed to capture the process of the decision to provide care, there are probably some threats to the conditional independence assumption. First, there might be health driven selection into treatment. Individuals who are confronted with the question to provide care but are themselves in poor health might not be able to do so. As informal care provision is both physically and mentally challenging, this possible selection holds for both dimensions of health. If this is indeed the case and informal care provision has negative health effects, ignoring this reverse causality problem would lead to an underestimation of the true effects (in absolute values). We follow, e.g., Lechner (2009a) and García-Gómez (2011) and match individuals on pre-treatment outcomes (here, health status in t = −1), thus only comparing individuals of the same
3 Schmidt and Schneekloth (2011) report that only 9000 out of possibly 150,000 made use of this until 2011. The most frequent reason for not making use in their survey was that individuals were not aware of the possibility.
4 We can confirm this finding in our data. Bootstrapped standard errors yield slightly less conservative standard errors.
Table 1
Stratified sample.
Stratum    t = −1     t = 0
1          Care       Care
                      No care
2          No care    Care
                      No care

baseline health status before treatment. This rules out that individuals in the control group are in worse health due to a selection of healthy individuals into care provision.
A second issue is unobserved heterogeneity, confounders that both affect treatment and outcome, but are not observable for the researcher. As Lechner (2009a) suggests, this problem can be mitigated by stratifying the sample according to care provision in t = −1. Comparing only individuals with the same care status in t = −1 accounts for a lot of unobserved heterogeneity that affects treatment participation. Hence, the conditional independence assumption is much more likely to hold within the strata of previous care provision.5 Moreover, stratifying the sample at least mitigates the problem that control variables, though dated back to t = −1, could be determined by care provision in t = 0 through confounders that both affect past control variables and current treatment status.
Hence, we generate two samples based on information in t = −1 and estimate the treatment effects independently for each sample as depicted in Table 1. Both estimated treatment effects and their variances for each stratum are merged as weighted means.6
Note that treatment is only defined as care provision in t = 0 while we leave future care status unrestricted, as shown by way of example in Fig. 2(a) for care starters. The most important advantage of this is that selection out of care provision due to bad health is no problem in identifying medium-term effects of care provision because future health status (potentially affected by care and potentially leading to selection out of care provision in later years) does not affect the treatment group assignment at all. A drawback might be that in this static model sequential paths of care provision are not explicitly modelled. Fig. 2(b) shows some examples of paths after t = 0. Individuals who care in t = 0 might either stop in t = 1 or go on and stop later, or even stop and take up care provision again. The same holds for the control group that includes individuals who cared later on.
Lechner (2009b) and Lechner and Miquel (2010) present a dynamic matching model that makes it possible to compare the effect of, say, providing care in each period between t = 0 and t = 7 with, say, not caring in any of the periods.7 We do not make use of such a framework. Most importantly, because the numbers of observations in the 256 (= 2^8) different paths become very small in our sample, except for the path of never carers (0 − 0 − … − 0). E.g., only 23 observations in our data set provide care in each year between t = 0 and t = 7. Second, given that we condition on pre-treatment outcome, we would need to condition on the health status at each node in Fig. 2(b) in order to make the “dynamic conditional independence assumption” (Lechner and Miquel, 2010) credible. However, as the outcome variable is only available in every other year, we cannot, for instance, condition on health in t = 2 in modelling the transition of care provision between t = 2 and t = 3. Hence, we follow, e.g., Lechner et al. (2011) and use the standard static version. This enables us to estimate the average effect of care provision in t = 0 on health in later years. This effect is generated by dynamics in care provision which are not explicitly modelled but implicitly taken into account. The descriptive statistics in the next section show that the vast majority of care durations is between one and three years. Hence, in reality, there is much less heterogeneity in care durations than implied by all theoretically possible paths in Fig. 2(b).
Above, we have set out selection issues and the responses to those that are facilitated by our data. To assess the adequacy of our responses we report a sensitivity analysis in Section 6. We estimate a short- and a medium-term effect of care provision, where by medium-term effect we mean the expected health effect of care provision in a certain year, five or seven years after. Given that an individual cannot foresee her care provision path in the future, this expected effect (though probably a composite of effects from different paths) is arguably the most relevant one from an individual perspective when deciding about providing care in t = 0 or not.

4. Data

We use data from the German Socio-Economic Panel (SOEP) which is a yearly repeated representative longitudinal survey of households and persons living in Germany that started in 1984. The SOEP covers a wide range of questions on socio-economic status, such as work, education, health, and personal attitudes (see Wagner et al., 2007, for details). Currently, some 22,000 individuals above the age of 18 from more than 10,000 households are interviewed each year.
We restrict the sample to women who have complete information on treatment status in t = 0 and control variables in t = −1. Since caregiving among men is much less common and we observe considerably fewer male caregivers we drop men, as it turned out to be very difficult to properly model the treatment participation (the propensity scores yielded only very low values). Moreover, we drop female professional caregivers from the sample, as they might mix up professional and personal affairs. Beyond that, no further restrictions are imposed on the sample. Pooling all waves as shown in Fig. 1, we end up with a sample of 31,177 person-year observations in t = 0. The lowest line of Table 2 shows the number of observations in the sample. Of the 31,177 observations in t = 0, we observe the health status of 28,622 in t = 1, of 20,288 in t = 3, and so on. This number strongly drops over time, mainly because more and more episodes are right censored (again, see Fig. 1).
We identify caregivers depending on how individuals respond to the following question: “What is a typical day like for you? How many hours do you spend on care and support for persons in need of care on a typical weekday?” which has been included in the SOEP questionnaire since 2001.8 Answers to this question are also shown in Table 2. 862 or 41% of all individuals who care a positive number of hours per day, care for 1 h. 24% care 2 h, whereas 10% care 3 h per day. Note that the numbers in t = 1 (and later) do not refer to care provision in t = 1 but to the number of observations who care in t = 0 and are still observed in t = 1.
If an individual reports caring at least 2 h per day we consider her a caregiver. That is, the treatment indicator is the binary variable of caring for at least 2 h per day. This comes closest to other definitions in the literature, e.g., Leigh (2010) who defines care provision as caring at least for 10 h per week. Below we show that the results are robust to higher or lower thresholds. The question does

5 However, for stratum 1 (individuals who already provided care in t = −1) there is presumably more unobserved heterogeneity left, since here, all individuals that have been caring, potentially for several years, are pooled. Thus, we identify only an average effect over all conceivable care spells. This, however, holds for all studies that use cross-sectional data or panel data and do not explicitly model the dynamics of care.
6 $\widehat{ATT} = \frac{1}{n} \sum_{i \in \{1,2\}} n_i \cdot \widehat{ATT}_i$, $\widehat{se} = \sqrt{\frac{1}{n^2} \sum_{i \in \{1,2\}} n_i^2 \cdot \widehat{se}_i^2}$.
7 See also Augurzky et al. (2012) for an application.
8 This question does not refer to child care which is a separate category in the time use questionnaire. The Supplementary Material includes a paragraph on the justification for the validity of self-reported answers to these kinds of questions.
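
The pooling of the two stratum-specific estimates into weighted means (footnote 6) can be sketched as follows. This is a minimal illustration, assuming that n_i is the number of observations in stratum i and that the two stratum estimates are independent, so that their variances add after weighting. The function name `pool_strata` and the numbers in the usage line are hypothetical and not taken from the paper.

```python
from math import sqrt

def pool_strata(estimates):
    """Pool stratum estimates given as (n_i, att_i, se_i) tuples.

    ATT is the n_i-weighted mean of the stratum ATTs; the pooled variance is
    the sum of n_i^2 * se_i^2 divided by n^2 (strata assumed independent).
    """
    n = sum(n_i for n_i, _, _ in estimates)
    att = sum(n_i * att_i for n_i, att_i, _ in estimates) / n
    se = sqrt(sum((n_i * se_i) ** 2 for n_i, _, se_i in estimates)) / n
    return att, se

# Hypothetical inputs: (stratum size, stratum ATT, stratum standard error).
att, se = pool_strata([(100, -2.0, 0.5), (300, -1.0, 0.3)])
```

Because the weights are proportional to stratum size, the larger stratum dominates both the pooled point estimate and its standard error.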
Fig. 2. Group assignment rules. Note: 1 = providing care; 0 = not providing care; X = care status not specified (= either 1 or 0). Right panel does not include all possible paths
but only a small excerpt.
Table 2
Sample size.
t=0 t=1 t=3 t=5 t=7
Hours of care = 0 29,080 26,667 18,956 11,455 5,194

Hours of care = 1 862 (= 41%) 800 564 357 160
Hours of care = 2 507 (= 24%) 479 317 197 85
Hours of care = 3 203 (= 10%) 193 140 81 36
Hours of care = 4 167 (= 8%) 152 100 53 24
Hours of care > 4 358 (= 17%) 331 211 111 53
All observations 31,177 28,622 20,288 12,254 5,552
Source: SOEP, own calculations. Number in parentheses is the share among all individuals with positive hours of care. Hours of care are measured in t = 0 only.
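
As a quick consistency check, the shares in parentheses in Table 2 and the group sizes implied by the 2 h threshold can be recomputed from the t = 0 column. The sketch below uses only counts printed in Table 2; the variable names are illustrative.

```python
# Counts in t = 0 by reported hours of care per day (Table 2).
hours_t0 = {"0": 29080, "1": 862, "2": 507, "3": 203, "4": 167, ">4": 358}

# Women reporting a positive number of care hours.
positive = {h: n for h, n in hours_t0.items() if h != "0"}
n_positive = sum(positive.values())

# Share of each category among women with positive hours, in percent.
shares = {h: round(100 * n / n_positive) for h, n in positive.items()}

# Treatment indicator: at least 2 h of care per day; everyone else is a control.
n_treated = sum(n for h, n in positive.items() if h != "1")
n_controls = hours_t0["0"] + hours_t0["1"]
```

The resulting group sizes (1,235 treated and 29,942 controls) agree with the N row of Table 4 and add up to the 31,177 observations in t = 0.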
not allow for a link between caregiver and care recipient. Hence, we have no information on the care recipient and we are not able to stratify our analysis with respect to her (e.g., in order to evaluate differences between caring for spouses or parents). This is a common shortcoming in this literature.
Table 3 gives a notion of the duration of care episodes. It counts the consecutive years individuals provide care of at least 2 h per day. In presenting the numbers we distinguish between uncensored spells (of individuals that are observed to provide no care before and after a care episode) and censored spells (individuals that either enter the sample as caregivers or are caregivers at the end of the observation period). Due to the sample construction, there are many right censored individuals, which complicates the interpretation of the table somewhat. What should be taken away from it is that the vast majority has care spells of about one to three years. Therefore, the effects after seven years are mainly driven by individuals who had shorter caregiving episodes. Individuals who constantly care over many years hardly add to the results.9
The two outcome measures are a mental and a physical health score that are based on information from the SF-12v2 questionnaire, a component of the SOEP, which includes twelve questions on mental and physical health. All items capture the general current mental and physical health status since all questions relate to the past four weeks, see the questionnaire in Table A2 in the Appendix. Answers to these questions are collapsed into the Mental Component Summary Scale (MCS) and the Physical Component Summary Scale (PCS) by explorative factor analysis (see Andersen et al., 2007). Thus, both variables capture the multidimensional aspect of health. The scales range from 0 to 100, normalised to mean values of 50 and standard deviations of 10 in the 2004 reference sample. Higher values mean a better health status. MCS loads information on perceived melancholy, time pressure, mental balance and emotional problems into one summary scale.10 The SF-12 is commonly used to measure general health and functioning in epidemiological research (Ware et al., 1996). It includes information on subjective health but the component summary scales are correlated with actual health diagnoses. For example, Gill et al. (2007) find that MCS “is a useful screening instrument for depression and anxiety disorders in the general community, and thus, a valid measure of mental health”. This view is supported by Vilagut et al. (2013) who find “acceptable results for detecting both active and recent depressive disorders in general population samples”. This property could build a bridge from the short-term symptoms that are measured to longer-lasting health consequences that are thus also captured by this summary scale. Salyers et al. (2000) regard it as a valid and reliable instrument to measure health-related quality of life. Recently, MCS has also been used in the economic literature where it was shown to be correlated with, e.g., unemployment (Schmitz, 2011; Reichert and Tauchmann, 2011), and unemployment of spouses (Marcus, 2013). MCS and PCS were first introduced in the SOEP in 2002 and subsequently sampled every other year. This is why we restrict our observation period to the years 2002–2010.

9 This is due to the very low number of observations. Moreover, these 19 individuals caring throughout in our sample exhibit a mean MCS of 45.81 (compared to 49.38 overall). Thus, they do not affect the results in a quantitatively important way.
10 The physical component comprises: Physical fitness (2 Questions), general health, bodily pain, role physical (2). The mental component comprises: Mental health (2), role emotional (2), social functioning, vitality. See the questionnaire in Table A2 in the Appendix.
Table 3
Care duration.
Years of consecutive care as of t = 0      1     2     3     4     5     6     7     8   Total
Uncensored      Observations             348   107    35    29     8     7     6     -     542
                Share                    65%   19%    7%    6%    1%    1%    1%     -    100%
Censored        Observations             238   183    77    90    37    39    12    19     693
                Share                    35%   26%   12%   13%    5%    5%    1%    3%    100%
thereof:
Left censored   Observations              80    27    16    11    10     6     4    19     173
                Share                    46%   16%    9%    7%    6%    3%    2%   11%    100%
Total           Observations             586   290   112   119    45    46    18    19    1235
                Share                    47%   23%    9%   10%    4%    4%    1%    2%    100%
Source: SOEP, own calculations. Uncensored individuals did not provide care in t = −1 and stopped caregiving some time before t = 7. Therefore, the maximum observable care duration is 7 years. In contrast to the empirical analysis in the rest of the paper, this table uses information up to the wave of 2011 or t = 8 in order to be able to calculate the number of individuals who exactly care for 7 years.
We now turn to the selection of the control variables. Taking on the burden of care could theoretically be modeled as a three-stage process. Women provide care if (i) they need to. Given that they need to provide care, they (ii) must be willing to do so. Finally, (iii), they need to be able to provide care.11
At the first stage, the event that someone close becomes care dependent is a prerequisite of the need to provide informal care. This first stage in general depends on the age and the intra-familial social environment. We model the social environment by using indicators whether parents are alive, their age as well as the number of siblings.12 The latter can reduce the need to provide care for frail parents as siblings could step in. Variables on this stage are sometimes employed as instruments for care provision in other studies.
At the second stage, given that someone close is in need of care, the willingness to provide care can be modeled as a function of socio-economic characteristics and personality traits. Socio-economic characteristics grouped in here are, e.g., own age, marital status, employment status, and level of education. Note, however, that family background variables might also belong to the first stage. For instance, singles do not need to care for a spouse or parents-in-law. Furthermore, we use character traits measured in the Big Five Inventory (Big5), well-known in psychology for being a proxy of human personality (see McRae and John, 1992 or Dehne and Schupp, 2007) as well as positive and negative reciprocity. Although the SOEP captures each item of the Big5 with relatively few questions in the 2005 and 2009 questionnaires, surveys revealed sufficient validity and reliability (see Dehne and Schupp, 2007). The items of the Big5 are: neuroticism, the tendency to experience negative emotions; extraversion, the tendency to be sociable; openness, the tendency of being imaginative and creative; agreeableness, the dimension of interpersonal relations; and conscientiousness, the dimension of being moral and organized (see Budria and Ferrer-i Carbonell, 2012). There are three questions for each of these items which are gathered on a 7-item scale. Furthermore, there is positive reciprocity, the tendency of being cooperative, and negative reciprocity, the tendency of being retaliatory. For each personality measure, the score is generated by averaging over the outcome of the corresponding questions per individual. Although these questions are only prompted twice in the SOEP and in years after the treatment assignment,13 they are useful controls because these measures are supposed to be stable over a shorter period of time. The individual average of each measure is taken over all years as a proxy for time invariant personality.
Finally, at the third stage, the own health status determines the ability to provide care. As discussed in Section 2, we control for pre-treatment health (MCS and PCS). Moreover, we control for health satisfaction and life satisfaction. All control variables are listed in Table 4. Variables that might theoretically belong in the model but were not significant in the propensity score regression are left out. This holds, for instance, for income, the age of the father, the number of brothers, or calendar year dummies.

5. Results

5.1. Matching quality

Table 4 reports descriptive statistics of all covariates for different subgroups. It reveals that the mean as well as the standard deviation of the covariates are significantly different in the unweighted baseline sample. Column 4 gives the standardized difference between both means. Without matching almost all confounders are different at the 5% significance level between the carer and non-carer sample. In particular age, the age of the mother, and marital status exhibit large differences but also personality traits seem to be quite strong predictors of care provision. The kernel matching algorithm equalizes both samples by assigning different weights to each member of the control group. In order to compute these weights, we employ an Epanechnikov kernel with a bandwidth of 0.03. Whereas a bandwidth of 0.06 does not succeed in balancing all covariates, a bandwidth of half the size balances every control variable to a standardized bias around 5 or less.
As regards the propensity score, the regions of common support are roughly [0.04, 0.14] for the stratum of women who did not provide care in t = −1 and [0.23, 0.87] for those who did provide care. The overlap within each stratum is good as we do not lose treatment observations by restricting the sample to the common support.14 The low probabilities in the first stratum are simply due to the small number of caregivers. This indicates that there is a large unobserved component determining caregiving. But we argue that this unobserved heterogeneity is not a big concern given the estimation strategy outlined in Section 3. Yet, there is one advantage of this fuzziness: It brings about a sufficiently large number

11 Note that we do not explicitly model this three-stage process but that we just have it in mind. Which variable belongs to which stage is then just a matter of interpretation.
12 However, the number of brothers does not seem to play a role statistically. Thus, in the empirical model we only focus on the number of sisters. An alternative specification that, among others, also uses the number of brothers can be found in the Supplementary Material.
13 The Big5 are included in the surveys in 2005 and 2009, whereas questions on negative and positive reciprocity are asked in 2005 and 2010.
14 Of course, this also means that the required overlap condition stating that some randomness is needed is ensured in our model (see Heckman et al., 1998).
Table 4
Descriptive statistics according to treatment and matching status.
Columns: Treated (mean, SD); Controls (mean, SD); Matched controls (mean, SD); Standardized bias (unmatched sample; matched sample, bandwidth 0.06; matched sample, bandwidth 0.03)
Stage (I): care obligations

Age of mother
∈[30, 39] 0.01 0.09 0.02 0.16 0.01 0.10 −13.31 −5.15 −2.35
∈[40, 49] 0.03 0.18 0.10 0.30 0.04 0.20 −27.00 −9.33 −3.24
∈[50, 59] 0.08 0.27 0.13 0.34 0.08 0.28 −18.15 −6.47 −2.22
∈[60, 69] 0.12 0.32 0.12 0.32 0.11 0.32 0.25 0.96 1.12
∈[70, 79] 0.09 0.28 0.06 0.24 0.08 0.28 9.16 4.00 1.71
∈[80, 89] 0.09 0.28 0.02 0.14 0.08 0.28 30.58 6.81 1.15
∈[90, 99] 0.01 0.10 0.00 0.03 0.01 0.09 12.04 3.36 2.10
Mother alive 0.46 0.50 0.48 0.50 0.46 0.50 −5.72 −3.05 −0.97
Age of father
∈[30, 39] 0.00 0.07 0.01 0.09 0.01 0.07 −5.14 −2.17 −1.03
∈[40, 49] 0.02 0.12 0.08 0.27 0.03 0.16 −30.29 −10.98 −4.58
∈[50, 59] 0.04 0.20 0.10 0.30 0.05 0.22 −20.75 −7.05 −2.12
∈[60, 69] 0.07 0.25 0.08 0.27 0.07 0.25 −6.43 −2.13 −0.39
∈[70, 79] 0.04 0.19 0.04 0.19 0.04 0.19 −0.40 −0.84 −0.87
∈[80, 89] 0.01 0.11 0.01 0.10 0.01 0.11 2.34 0.44 −0.17
∈[90, 99] 0.00 0.05 0.00 0.01 0.00 0.03 7.10 4.80 4.80
Father alive 0.19 0.40 0.34 0.47 0.21 0.41 −32.50 −11.54 −4.01
Number of sisters 1.08 1.21 1.09 1.21 1.09 1.23 −1.12 −1.28 −1.20
Partner existent 0.81 0.39 0.68 0.47 0.80 0.40 29.91 9.71 2.28
Age of partner 47.73 26.02 35.53 27.08 46.56 25.96 45.93 15.60 4.40
Stage (II): willingness to provide care

NEURO 4.53 0.67 4.37 0.72 4.51 0.71 22.53 7.58 2.05
CONSC 6.04 0.74 5.97 0.79 6.04 0.77 9.76 2.74 0.29
AGREE 5.61 0.83 5.58 0.84 5.60 0.84 3.66 1.40 0.90
OPENN 4.37 1.15 4.51 1.12 4.40 1.13 −11.92 −4.86 −2.20
EXTRA 5.02 0.91 5.04 0.95 5.02 0.94 −1.97 −1.07 −0.49
Positive reciprocity 5.66 0.95 5.55 0.99 5.67 0.96 11.40 2.92 −0.39
Negative reciprocity 2.71 1.19 2.87 1.24 2.73 1.22 −12.81 −4.20 −1.34
Acceptance of private funding 3.31 0.81 3.29 0.8 3.31 0.82 2.68 1.12 0.44
Age 56.28 12.85 49.57 16.34 55.31 13.56 45.68 16.86 6.65
Age squared 3333.01 1419.70 2724.16 1691.40 3242.65 1471.10 38.99 14.54 5.79
Married 0.80 0.40 0.63 0.48 0.79 0.41 39.14 13.06 3.67
Divorced 0.07 0.25 0.09 0.28 0.07 0.25 −7.60 −2.67 −0.77
Single 0.07 0.25 0.17 0.38 0.08 0.27 −32.80 −11.03 −3.57
Children hh 0.18 0.38 0.30 0.46 0.19 0.40 −29.50 −11.17 −4.27
Educ general 0.17 0.37 0.17 0.38 0.17 0.37 −2.31 −0.99 −0.73
Educ middle 0.55 0.50 0.49 0.50 0.54 0.50 11.66 4.11 1.45
Foreign 0.04 0.20 0.06 0.24 0.05 0.21 −9.91 −4.15 −2.07
West 0.69 0.46 0.75 0.43 0.70 0.46 −13.54 −5.43 −2.10
Full time 0.13 0.34 0.26 0.44 0.15 0.35 −33.85 −11.66 −3.85
Stage (III): ability to provide care

MCS 47.38 10.52 49.47 10.12 47.59 10.91 −20.23 −6.82 −2.01
PCS 46.44 10.01 49.02 10.14 46.79 10.47 −25.57 −9.63 −3.48
Satisfaction health 6.19 2.21 6.58 2.17 6.25 2.24 −17.62 −6.56 −2.42
Satisfaction life 6.60 1.85 6.97 1.76 6.62 1.95 −20.64 −6.56 −1.07
N 1,235 29,942 29,942
The standardized difference is calculated according to: $\text{Diff} = 100 \cdot \frac{\bar{x}_1 - \bar{x}_0}{\sqrt{\frac{1}{2}\left(\sigma_1^2 + \sigma_0^2\right)}}$, where 0.06 and 0.03 refer to the employed kernel bandwidth. While the bandwidth of 0.06 is only shown for sake of illustration, 0.03 is used in the estimations.
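
The standardized difference reported in Table 4 and the kernel weights behind the matched-control columns can be sketched as follows. This is a minimal illustration rather than the estimation code used for the paper: the weights are the textbook Epanechnikov kernel weights, normalized per treated observation, and the propensity scores in the last line are hypothetical. Only the Age row values are taken from Table 4.

```python
from math import sqrt

def standardized_bias(mean_t, sd_t, mean_c, sd_c):
    """100 * difference in means over the root of the average variance."""
    return 100 * (mean_t - mean_c) / sqrt(0.5 * (sd_t ** 2 + sd_c ** 2))

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def kernel_weights(p_treated, p_controls, bandwidth=0.03):
    """Normalized weights of all controls for one treated observation."""
    raw = [epanechnikov((p_c - p_treated) / bandwidth) for p_c in p_controls]
    total = sum(raw)
    if total == 0:
        raise ValueError("no control within the bandwidth (off support)")
    return [w / total for w in raw]

# Age row of Table 4, unmatched sample. The result is close to the reported
# 45.68; small deviations are expected because the inputs are rounded.
bias_age = standardized_bias(56.28, 12.85, 49.57, 16.34)

# Hypothetical propensity scores: one treated woman and four controls.
weights = kernel_weights(0.10, [0.08, 0.10, 0.12, 0.20])
```

Controls whose propensity score lies more than one bandwidth away from the treated observation receive a weight of zero, which is why a smaller bandwidth compares each caregiver only with very similar non-caregivers.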
of observations in the control group having a similar value of the estimated propensity score. This provides a hint that the results are not sensitive to a different choice of the matching methods.

5.2. Estimation results

The baseline estimation results are reported in Fig. 3 for both outcome variables MCS (3(a)) and PCS (3(b)). For convenience, we restrict this section to a graphical presentation of the results. Table A1 in the Appendix gives an overview of all results shown in this section. The dotted lines denote 95% confidence bands for the corresponding effect. Fig. 3 reports the results for both pre-treatment strata separately. Care starters (black points) are those who did not care in t = −1 and care continuers (light grey points) those who did care in t = −1. The confidence bands are wider for care continuers, since this is a much smaller group. The weighted average over both effects has confidence bands comparable to the black ones in Fig. 3.
The effects are remarkably similar for both groups. If a woman cares at least 2 h per day, her mental health score decreases by 2.00 units (or 20% of a standard deviation, SD)15 in the first year, all other things equal. Three years after treatment assignment, this effect reduces to 16% of a SD before settling at below 12% five and seven years afterwards. That is, women who provide care in t = 0 can expect to have a reduced mental health score by 12% of a SD seven

15 For convenience we already report the average effect over both groups here.
(a) MCS (b) PCS
Fig. 3. Baseline results MCS and PCS. Source: SOEP. Own calculations. Note: The dotted lines indicate 95% confidence bands.
years after. The confidence bands indicate significant results at the 5% level one and three years after assignment to treatment. The effects five and seven years after are insignificant because the point estimates attenuate but in the first place because the numbers of observations strongly drop. The magnitude of the effect after seven years, however, is still 60% of the baseline effect and thus not negligible. All in all it is fair to note that, independent of the previous care status, there is a considerable short-term effect of care provision on mental health (in line with findings from previous studies, e.g., Coe and van Houtven, 2009) which decreases over time without being completely irrelevant in its extent to those who care.
In contrast, for PCS (right panel), there is basically a zero effect throughout all periods and for both strata, providing evidence for negligible effects of informal care provision on physical health. Given the absence of physical health effects, we restrict our analysis to mental health in the following. Moreover, we only report averaged effects of both strata of care provision in t = −1. Fig. 4 presents the results for alternative daily care intensities and different definitions of the control group.16 In Fig. 4(a) we compare the effect when care provision of at least 2 h per day is used to define the treatment indicator (light grey-dashed line, the baseline specification) with 1 h per day (black line) and 3 h per day (dark grey-dashed line). There are basically no differences in the effect between 1 and 2 h of care as a definition. As regards 3 h of care we find a considerably stronger short-term effect with a reduction of MCS by 31% of a SD. This probably reflects a higher burden of higher care intensities. Subsequently, however, the effect does not remain on this high level. It immediately drops back to regions similar to those for 1 and 2 h. Most notably, the qualitative result of a considerable short-term effect and a much smaller medium-term effect remains unchanged regardless of the care intensity.17
The definition of treatment and control group only in t = 0 allows for cases where individuals in the control group start to provide care in later years. This is in fact the case for some 15% of all observations in the control group. It might be suspected that these individuals suffer from a short-term mental health drop later which, compared with the effects in the treatment group, leads to the observed relative decline in the mental health drop of the treatment group. In order to test if this drives the results, we exclude all individuals from the control group that provided care in any year between t = 1 and t = 7. That is, we only use individuals in the lowest path in Fig. 2(b). In principle, this is not a desirable specification as it bases the control group definition on later outcomes. Thus, it should only be regarded as a brief check whether these individuals drive the results observed above. Fig. 4(b) shows that this is not the case. The results are largely the same.
The results suggest a significant short-term effect of informal care provision on mental health while there is a smaller and not significant medium-term effect. Given that the vast majority of individuals provide care for about one to three years, the main pathway of these effects is probably the following one. Contemporaneously, care provision is a mentally burdensome task. The short-term effects are mostly generated by individuals who just stopped providing care or who are still providing care in t = 1. As expected, this effect increases in care intensity. Yet, after the care episode has ceased, individuals recover and their mental health status approaches former levels.
The short-term effect is not necessarily entirely due to care provision. It might be a joint effect of care and the observation of the decline of a beloved person. Like most of the previous literature, we cannot disentangle the family effect from the active caregiving effect. As results of Bobinac et al. (2010) suggest, the overall effect is a mixture of both but a caregiving effect remains after controlling for the family effect. Yet, this does not affect the interpretation of the medium-run effect of almost no mental health impairment a couple of years after care provision. Given that the effect in t = 7 is very small, it can be concluded that there is less evidence for a scarring effect of care provision. Moreover, since only a handful of individuals in the sample care throughout the entire observation period, this result can apparently not be explained by an adaptation effect of care providers to their new situation.
In Section 4 we mentioned that we cannot stratify the analysis with respect to the care recipient as we do not have information on who is being cared for. We can, however, approach such an analysis by splitting the sample into caregivers below and above the age of 60. The former group has a higher likelihood to care for a parent while the latter should be more likely to care for a spouse.
Note that stronger restrictions such as an age cutoff at 70 or groups such as unmarried women with at least one parent alive are hardly feasible due to strongly reduced numbers of observations. Fig. 5 shows the effect over time for both subgroups. Initially, they coincide nearly perfectly. Five years after care is observed, they deviate

16 In the Supplementary Material, we also report the results for females caring 4 h and more. The results are comparable.
17 Although not shown here, also the PCS results are robust to these different definitions.
Fig. 4. Alternative definitions of treatment and control groups (MCS only). Source: SOEP. Own calculations. Note: The dotted lines indicate 95% confidence bands.
from each other. Whereas younger carers drop back almost to the initial level, for older carers the impact on their mental score is even stronger. The results could be interpreted such that the active caregiving effect does not depend on the care recipient. However, a likely family effect might arguably be stronger in case of care provision for a spouse than for oldest old parents. However, the effects come closer after seven years and due to large confidence bands one should interpret these results cautiously.
Altogether, the results from this section could be interpreted as good news. While there is a considerable negative short-term effect of contemporaneous caregiving, the scarring effect is less likely to be prevalent. One negative interpretation for these results could, however, be an increased consumption of antidepressants as found by Van Houtven et al. (2005) and Schmitz and Stroka (2013) for the short run. If this would hold for the long run, the mental health score might increase over time due to drug consumption and not due to improved health. Whether this is the case or not requires long-term data on care and drug consumption and is left for future research.
In the Supplementary Material we report results from alternative specifications of the propensity score, the treatment indicator and a subgroup analysis for unmarried women with at least one parent alive who arguably can be identified as caring for their parents.

6. Sensitivity analysis

Thus far, we have argued that our estimation strategy allows us to interpret the results in a causal manner since, by fully exploiting the panel information in the SOEP, the conditional independence assumption is likely to hold. However, this inherently untestable assumption might nevertheless fail. For example, in the context of care, it might be particularly challenging to properly control for intrinsic willingness to provide care. Yet, the conditional independence assumption is not necessarily an “all or nothing” assumption and there might be different degrees of its violation. To examine to what extent the magnitude and the significance of our results depend on the potential exclusion of a relevant variable, we follow an approach by Ichino et al. (2008) who refined the suggestions for sensitivity analyses by Rosenbaum and Rubin (1983) and Imbens (2003). This analysis is also in the spirit of the one suggested by Altonji et al. (2005) without the need to make strong parametric assumptions.
Assume that the conditional independence assumption does not hold but that the failure is due to an unobserved variable U. If we could condition on it, we would be able to restore conditional independence:
\[ Y_0 \perp\!\!\!\perp T \mid (X, U). \]
Hence, all the unobserved heterogeneity that results in bias is captured by U. For simplicity, Ichino et al. (2008) follow Rosenbaum and Rubin (1983), who proposed a binary U.
We simulate U by drawing 200 times from the Bernoulli distribution for each individual and estimate the ATT 200 times, conditioning on X as before, but also on U.18 In simulating U, we make sure that it is correlated with both T and Y such that leaving it out would result in a violation of the conditional independence assumption. Taking the average over all effects provides us with robust point estimates as well as standard errors of the average treatment effect on the treated.19

18 This section contains a non-technical and intuitive discussion of the analysis. A more detailed account is provided in the Supplementary Material published online. For an extensive treatment, refer to Ichino et al. (2008).

Fig. 5. Alternative definitions of treatment and control groups (MCS only). Source: SOEP. Own calculations. Note: The dotted lines indicate 95% confidence bands.
The major question is how strong, and in what direction, the correlation of U with Y and with T should be. We follow one
of the two approaches suggested by Ichino et al. (2008) and set it
such that we control the “outcome effect” (own effect of U on Y)
and the “selection effect” (effect of U on T). As an illustration, think
of U as general intrinsic willingness to provide care: U = 1 indicates
generally willing, U = 0 means not willing. This unobserved variable
certainly has a positive selection effect such that willing people are
more likely to provide care. It may also have a positive outcome
effect if the general willingness is positively correlated with health
endowment independent of treatment.
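The simulation idea described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the authors' implementation (they use a modified version of the Stata command sensatt): the data are synthetic, the outcome is dichotomised at its median to assign U, a simple regression adjustment fitted on the controls stands in for the regression-adjusted matching estimator, and the probabilities in p_positive are hypothetical values, not the calibration used in the paper.

```python
import numpy as np

def att_regression_adjustment(y, t, covariates):
    """ATT from an outcome regression fitted on controls (stand-in for matching)."""
    design = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(design[t == 0], y[t == 0], rcond=None)
    y0_hat = design[t == 1] @ beta          # predicted untreated outcome of the treated
    return float(np.mean(y[t == 1] - y0_hat))

def simulated_confounder_att(y, t, x, p, n_draws=200, seed=0):
    """Average the ATT over n_draws simulated binary confounders U.

    p[i][j] = Pr(U = 1 | T = i, Y above its median = j); hypothetical, analyst-chosen.
    Returns the mean ATT and the spread of the ATT across draws.
    """
    rng = np.random.default_rng(seed)
    t = np.asarray(t).astype(int)
    y_high = (y > np.median(y)).astype(int)
    prob_u = np.asarray(p)[t, y_high]       # one Bernoulli probability per person
    estimates = []
    for _ in range(n_draws):
        u = rng.binomial(1, prob_u)         # draw the unobserved confounder
        estimates.append(att_regression_adjustment(y, t, np.column_stack([x, u])))
    return float(np.mean(estimates)), float(np.std(estimates, ddof=1))

# Synthetic data: one observed covariate x, true effect of treatment = -2.
rng = np.random.default_rng(42)
n = 5000
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = 50 + 2 * x - 2.0 * t + rng.normal(scale=5, size=n)

baseline = att_regression_adjustment(y, t, x)
# Positive selection (U more common among the treated) and positive outcome
# effect (U more common among controls with high Y).
p_positive = [[0.3, 0.5], [0.6, 0.8]]
mean_att, sd_att = simulated_confounder_att(y, t, x, p_positive)
print(baseline, mean_att, sd_att)
```

In this toy setting, conditioning on a confounder with positive selection and positive outcome effects makes the estimated effect more negative than the baseline, which is the direction the text describes. The spread across draws reported here is not a full standard error, since it ignores the sampling variance within each draw.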
The magnitudes of outcome and selection effects could be arbi-
trarily chosen. One way to find reasonable values is to use observed
binary variables in the data set and calculate the observed selection
and the outcome effects of these variables. This gives an indica-
tion of the distribution of selection and outcome effects in the data.
To bound these effects, one could argue that the unobserved variable U should not have much larger selection and outcome effects than important observed variables, for which we have a long vector, including age, education, and initial health.
We compute these effects for all variables in the sample. Results are reported in the Supplementary Material. We then choose the parameters to simulate U such that it has an effect on treatment and outcome of the same magnitude as the control variable with the highest effect (which is the potential caregiver’s age). With these calibrations, no other confounder in the sample (except for age) features such a high effect on mental health and no other makes people select into treatment like the simulated binary confounder U. The first assumed selection effect reflects a positive selection into treatment, i.e., more people with high values of U will take the treatment. Together with a positive outcome effect, we should underestimate effects of care provision on health. The second pair of selection and outcome effects reflects a negative selection into treatment leading to overestimation if this was neglected by the analysis so far.
Fig. 6 presents the results of both specifications and the baseline specification for MCS. Including a confounder U with characteristics that lead to a positive selection into treatment (the dark grey-dashed line) leads to larger effects of care provision than in the baseline case while we find weaker effects when including a confounder that induces a negative selection (the light grey-dashed line). The lines are parallel-shifted by the confounder. However, in all three cases, we find a significant (both statistically and economically) short-term effect of care provision on mental health which reduces over time. After seven years, the effects are insignificant for most specifications, marginally significant though for the one with a confounder inducing a positive selection into treatment. Thus, if there are further confounders that point in the same direction as most of the variables in our sample, our result will define a lower bound. Furthermore, this would raise the likelihood of significant medium-term effects.
Thus, as long as unobserved effects that are necessarily left out in our analysis do not have a drastically higher impact than observed control variables, we find that the average treatment effects we obtained in the main analysis are robust. Given that we control for a large set of important determinants of care provision, it seems unlikely that there are actually unobserved variables that have such a drastic effect or an effect much stronger than observable covariates.

Fig. 6. Results of the sensitivity analysis (MCS). Source: SOEP. Own calculations. Note: The dotted lines indicate 95% confidence bands. Strong positive selection assumes a selection effect of s = 0.25 and an outcome effect of d = 0.25. See the Supplementary Material for exact definitions of s and d and justifications for these values. Strong negative selection assumes a selection effect of s = −0.25 and an outcome effect of d = 0.25.

19 We use a modified version of the user-written Stata command sensatt (Nannicini, 2007).
20 http://www.bmg.bund.de/ministerium/presse/english-version.html

7. Conclusion

This paper examines whether informal care provision affects the report of measures that indicate mental or physical strain among women. We use the German Socio-Economic Panel that identifies informal caregivers by the daily time spent caring. We define caregivers as women who care at least 2 h per weekday (but other definitions lead to a similar picture). We evaluate the impact of caregiving on health with the help of a regression-adjusted matching technique. The problems of unobserved heterogeneity and reverse causality are tackled by exploiting the panel structure of the data set and controlling for pre-treatment outcome as well as stratifying by pre-treatment care status.
While we do not find effects of informal care on physical health in the short- and in the medium-run, our results suggest that there are considerable short-term effects of informal care provision on mental health which, however, attenuate over time. Five years after care provision the effect is still negative but smaller and insignificant. It seems that, contemporaneously, care provision is a mental burden but there is not a large scarring effect. The sensitivity analysis according to Ichino et al. (2008) suggests that these results are stable even for considerable deviations from the conditional independence assumption: the effects are still similar in magnitude even if we have wrongly omitted a confounder that is stronger than any other one that we have controlled for before.
We contribute to the current debate on how to realign the care system in Germany and countries with similar demographic developments. Our results suggest that there are considerable short-term health effects and although it seems to be good news that the effects are abating, it should not be concluded that there is no need to improve the system as apart from health there are other additional effects of care provision (e.g., labour force participation and wages) that are not analysed here. The current German government put the enhancement of informal care high on the agenda.20 In particular, the supply of low-threshold services is planned to be expanded and increases in benefits from the long-term care insurance are meant to be spent on those. These services are, e.g., additional help in the household, contact persons in case of any
problems, or professional short-term care (also overnight) in case of short-term absence of informal care providers due to sickness, obligations in the job, or holidays. Thus, while family members will certainly continue to play an important role in care provision, these measures are thought to assist them and to reduce the most stressful aspects of care.
The measured effect in this study is an average effect over different groups of care providers. Schmitz and Stroka (2013), for instance, focus on individuals who not only provide informal care but also work full-time. This double burden might well also have health effects in the longer run. This question is left for future research. The main limitations in this study arise from the imperfect data set. Both measures of care provision as well as health indicators are self-reported and potentially measured with error. We do not observe any characteristics of the care recipient. Hence, we cannot distinguish between the family effect that occurs just because a close relative is in need of care and the caregiving effect. However, this should not qualitatively affect the interpretation of the already small medium-term effect. Likewise, as it is not observed whether care recipients receive additional professional care or only informal care, we cannot discriminate between cases in which the caregiver assists professional care and in which she is the only care provider. Moreover, due to data restrictions we are not able to identify the cumulative effect of care provision for many consecutive years. This might go along with even long-run health impairments. However, our representative data suggest that only a very small group of women is faced with the need (and willingness) to provide care for many consecutive years. Moreover, we argue that our approach allows us to answer a question that is more relevant from an individual perspective: if I provide care today, what is my expected health outcome in seven years (irrespective of future events that I cannot control today).

Appendix A.

Tables A1 and A2

Table A1
Table of results.
                                  t=1         t=3         t=5         t=7
Care 2 h per day (baseline)       −2.00***    −1.64***    −1.01       −1.19
                                  (0.39)      (0.47)      (0.62)      (0.86)
...care starters                  −2.03***    −1.67***    −1.02       −1.21
                                  (0.40)      (0.49)      (0.64)      (0.88)
...continued care                 −1.42**     −0.93*      −0.70       −0.94
                                  (0.61)      (0.74)      (0.89)      (1.51)
Care 3 h per day                  −3.02***    −1.44**     −1.00       −1.64
                                  (0.53)      (0.68)      (0.78)      (1.12)
Care 1 h per day                  −1.90***    −1.59***    −0.48       −0.97
                                  (0.31)      (0.39)      (0.48)      (0.67)
Observations                      28,622      20,288      12,254      5,552

Only never carers in control group
Care 2 h per day                  −2.08***    −1.56***    −1.25**     −0.787
                                  (0.27)      (0.34)      (0.46)      (0.69)
Observations                      25,914      18,464      11,301      5,166

PCS as outcome
Care 2 h per day                  0.14        0.14        0.08        −1.01
                                  (0.33)      (0.40)      (0.49)      (0.84)
Source: SOEP, own calculations. Note: * p < 0.1; ** p < 0.05; *** p < 0.01 indicate the corresponding significance level. Standard errors are in parentheses.
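As a reading aid for Table A1, the sketch below turns the baseline row (care of 2 h per day) into approximate 95% confidence bands of the kind drawn as dotted lines in the figures. It assumes a normal approximation with the critical value 1.96; this is an assumption made here for illustration, and the paper's own bands may have been obtained differently.

```python
# Baseline row of Table A1: point estimates and standard errors by years since care.
estimates = {1: -2.00, 3: -1.64, 5: -1.01, 7: -1.19}
std_errors = {1: 0.39, 3: 0.47, 5: 0.62, 7: 0.86}

Z95 = 1.96  # normal critical value for a two-sided 95% band (assumption)

bands = {
    year: (round(estimates[year] - Z95 * std_errors[year], 2),
           round(estimates[year] + Z95 * std_errors[year], 2))
    for year in estimates
}

for year, (lower, upper) in bands.items():
    excludes_zero = upper < 0 or lower > 0
    print(f"t={year}: [{lower}, {upper}] excludes zero: {excludes_zero}")
```

Under this approximation the bands for t=1 and t=3 lie entirely below zero, while those for t=5 and t=7 include zero, in line with the significance stars in the baseline row.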
Table A2
SF-12v2 questionnaire in the SOEP.

How would you describe your current health?
Response options: Very good / Good / Satisfactory / Poor / Bad

When you ascend stairs, i.e. go up several floors on foot: Does your state of health affect you greatly, slightly or not at all?
Response options: Greatly / Slightly / Not at all

And what about having to cope with other tiring everyday tasks, i.e. where one has to lift something heavy or where one requires agility: Does your state of health affect you greatly, slightly or not at all?
Response options: Greatly / Slightly / Not at all

Please think about the last four weeks. How often did it occur within this period of time, ...
Response options: Always / Often / Sometimes / Almost never / Never
♦ that you felt rushed or pressed for time?
♦ that you felt run-down and melancholy?
♦ that you felt relaxed and well-balanced?
♦ that you used up a lot of energy?
♦ that you had strong physical pains?
♦ that due to physical health problems
  ...you achieved less than you wanted to at work or in everyday tasks?
  ...you were limited in some form
♦ that due to mental health or emotional problems
  ...you achieved less than you wanted to
  ...you carried out your work or everyday tasks less thoroughly than usual?
♦ that due to physical or mental health problems you were limited socially, i.e., in contact with friends, acquaintances or relatives?
Note. Source: SOEP Individual question form. Available at: http://panel.gsoep.de/soepinfo2008/.

Appendix B. Supplementary Data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jhealeco.2015.03.002

References

Altonji, J.G., Elder, T.E., Taber, C.R., 2005. Selection on observed and unobserved variables: assessing the effectiveness of Catholic schools. Journal of Political Economy 113 (1), 151–184.
Alzheimer’s Disease International, 2013. World Alzheimer Report 2013. Journey of Caring – An Analysis of Long-Term Care for Dementia. Technical Report. Alzheimer’s Disease International (ADI), London.
Andersen, H.H., Mühlbacher, A., Nübling, M., Schupp, J., Wagner, G.G., 2007. Computation of standard values for physical and mental health scale scores using the SOEP version of SF-12v2. Schmollers Jahrbuch 127, 171–182.
Augurzky, B., Reichert, A., Schmidt, C.M., 2012. The effect of a bonus program for preventive health behavior on health expenditures. Ruhr Economic Papers 373, Essen.
Bakx, P., de Meijer, C., Schut, F., van Doorslaer, E., 2015. Going formal or informal, who cares? The influence of public long-term care insurance. Health Economics 24 (6), 631–643.
Bang, H., Robins, J.M., 2005. Doubly robust estimation in missing data and causal inference models. Biometrics 61 (4), 962–973.
Beach, S.R., Schulz, R., Yee, J.L., Jackson, S., 2000. Negative and positive health effects of caring for a disabled spouse: longitudinal findings from the caregiver health effects study. Psychology and Aging 15 (2), 259–271.
Bobinac, A., van Exel, N.J.A., Rutten, F.F., Brouwer, W.B., 2010. Caring for and caring about: disentangling the caregiver effect and the family effect. Journal of Health Economics 29 (4), 549–556.
Bolin, K., Lindgren, B., Lundborg, P., 2008. Your next of kin or your own career? Caring and working among the 50+ of Europe. Journal of Health Economics 27 (3), 718–738.
Budria, S., Ferrer-i Carbonell, A., 2012. Income comparisons and non-cognitive skills. In: SOEPpapers No. 441., pp. 1–29.
Carmichael, F., Charles, S., 2003. The opportunity costs of informal care: does gender matter? Journal of Health Economics 22 (5), 781–803.
Coe, N.B., van Houtven, C.H., 2009. Caring for mom and neglecting yourself? The health effects of caring for an elderly parent. Health Economics 18 (9), 991–1010.
Colvez, A., Joel, M.-E., Ponton-Sanchez, A., Royer, A.-C., 2002. Health status and work burden of Alzheimer patients’ informal caregivers: comparisons of five different care programs in the European Union. Health Policy 60 (3), 219–233.
Dehne, M., Schupp, J., 2007. Persoenlichkeitsmerkmale im Sozio-ökonomischen Panel (SOEP) – Konzept, Umsetzung und empirische Eigenschaften. Technical report.
Di Novi, C., Jacobs, R., Migheli, M., 2013. The Quality of Life of Female Informal Caregivers: From Scandinavia to the Mediterranean Sea. CHE Research Paper 84. Centre for Health Economics, University of York.
Do, Y.K., Norton, E.C., Stearns, S., Houtven, C.H.V., 2015. Informal care and caregiver’s health. Health Economics 24 (2), 224–237.
Dunkin, J.J., Anderson-Hanley, C., 1998. Dementia caregiver burden – a review of the literature and guidelines for assessment and intervention. Neurology 51 (1), 53–60.
Gallicchio, L., Siddiqi, N., Langenberg, P., Baumgarten, M., 2002. Gender differences in burden and depression among informal caregivers of demented elders in the community. International Journal of Geriatric Psychiatry 17 (2), 154–163.
García-Gómez, P., 2011. Institutions, health shocks and labour market outcomes across Europe. Journal of Health Economics 30 (1), 200–213.
Gill, S.C., Butterworth, P., Rodgers, B., Mackinnon, A., 2007. Validity of the mental health component scale of the 12-item short-form health survey (MCS-12) as measure of common mental disorders in the general population. Psychiatry Research 152 (1), 63–71.
Heckman, J., Ichimura, H., Smith, J., Todd, P., 1998. Characterizing selection bias using experimental data. Econometrica 66 (5), 1017–1098.
Heitmueller, A., 2007. The chicken or the egg? Endogeneity in labour market participation of informal carers in England. Journal of Health Economics 26 (3), 536–559.
Heitmueller, A., Inglis, K., 2007. The earnings of informal carers: wage differentials and opportunity costs. Journal of Health Economics 26 (4), 821–841.
Ho, S.C., Chan, A., Woo, J., Chong, P., Sham, A., 2009. Impact of caregiving on health and quality of life: a comparative population-based study of caregivers for elderly persons and noncaregivers. The Journals of Gerontology, Series A: Biological Sciences and Medical Sciences 64 (8), 873–879.
Ichino, A., Mealli, F., Nannicini, T., 2008. From temporary help jobs to permanent employment: what can we learn from matching estimators and their sensitivity. Journal of Applied Econometrics 23, 305–327.
Imbens, G.W., 2003. Sensitivity to exogeneity assumption in program evaluation. American Economic Review 93 (2), 126–132.
Lechner, M., 2009a. Long-run labour market and health effects of individual sport activities. Journal of Health Economics 28, 839–854.
Lechner, M., 2009b. Sequential causal models for the evaluation of labor market programs. Journal of Business and Economic Statistics 27, 71–83.
Lechner, M., Miquel, R., 2010. Identification of the effects of dynamic treatments by sequential conditional independence assumptions. Empirical Economics 39 (1), 111–137.
Lechner, M., Miquel, R., Wunsch, C., 2011. Long-run effects of public sector sponsored training in West Germany. Journal of the European Economic Association 9 (4), 742–784.
Lee, S., Colditz, G.A., Berkman, L.F., Kawachi, I., 2003. Caregiving and risk of coronary heart disease in U.S. women: a prospective study. American Journal of Preventive Medicine 24 (2), 113–119.
Leigh, A., 2010. Informal care and labor market participation. Labour Economics 17 (1), 140–149.
Marcus, J., 2013. The effect of unemployment on the mental health of spouses – evidence from plant closures in Germany. Journal of Health Economics 32, 546–558.
Marcus, J., 2014. Does job loss make you smoke and gain weight? Economica 324 (81), 626–648.
McRae, R.R., John, O.P., 1992. An introduction to the five factor model and its applications. Journal of Personality and Social Psychology 60 (2), 175–215.
Meng, A., 2013. Informal home care and labor-force participation of household members. Empirical Economics 44 (2), 959–979.
Nannicini, T., 2007. Simulation-based sensitivity analysis for matching estimators. Stata Journal 7 (3), 334–350.
Reichert, A., Tauchmann, H., 2011. The Causal Impact of Fear of Unemployment on Psychological Health. Ruhr Economic Papers 266.
Rosenbaum, P.R., Rubin, D.B., 1983. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society, Series B: Methodological 45 (2), 212–218.
Rothgang, H., 2010. Social insurance for long-term care: an evaluation of the German model. Social Policy and Administration 44 (4), 436–460.
Rubin, D.B., 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 56 (5), 688–701.
Rubin, D.B., 1979. Using multivariate matched sampling and regression adjustment to control bias in observational studies. Journal of the American Statistical Association 74 (366), 318–328.
Salyers, M.P., Bosworth, H.B., Swanson, J.W., Lamb-Pagone, J., Osher, F.C., 2000. Reliability and validity of the SF-12 health survey among people with severe mental illness. Medical Care 38 (11), 1141–1150.
Schmidt, M., Schneekloth, U., 2011. Abschlussbericht zur Studie “Wirkungen des Pflege-Weiterentwicklungsgesetzes”.
Schmitz, H., 2011. Why are the unemployed in worse health? The causal effect of unemployment on health. Labour Economics 18 (1), 71–78.
Schmitz, H., Stroka, M.A., 2013. Health and the double burden of full-time work and informal care provision: evidence from administrative data. Labour Economics 24, 305–322.
Schneekloth, U., Leven, I., 2003. Hilfe und Pflegebedürftige in Privathaushalten in Deutschland 2002.
Schulz, E., 2010. The Long-Term Care System for the Elderly in Germany. DIW Discussion Paper 1039, DIW Berlin.
Schulz, R., O’Brien, A., Bookwala, J., Fleissner, K., 1995. Psychiatric and physical morbidity effects of dementia caregiving: prevalence, correlates, and causes. Gerontologist 35 (6), 771–791.
Shaw, W.S., Patterson, T.L., Ziegler, M.G., Dimsdale, J.E., Semple, S.J., Grant, I., 1999. Accelerated risk of hypertensive blood pressure recordings among Alzheimer caregivers. Journal of Psychosomatic Research 46 (3), 215–227.
Stephen, M.A., Townsend, A.L., Martire, L.M., Druley, J.A., 2001. Balancing parent care with other roles: interrole conflict of adult daughter caregivers. The Journals of Gerontology, Series B: Psychological Sciences and Social Sciences 56 (1), P24–P31.
Tennstedt, S., Cafferata, G.L., Sullivan, L., 1992. Depression among caregivers of impaired elders. Journal of Ageing and Health 4 (1), 58–76.
Van den Berg, B., Ferrer-i Carbonell, A., 2007. Monetary valuation of informal care: the well-being valuation method. Health Economics 16 (11), 1227–1244.
van den Berg, B., Fiebig, D.G., Hall, J., 2014. Well-being losses due to care-giving. Journal of Health Economics 35 (0), 123–131.
Van Houtven, C., Wilson, M., Clipp, E., 2005. Informal care intensity and caregiver drug utilization. Review of Economics of the Household 3 (4), 415–433.
Van Houtven, C.H., Coe, N.B., Skira, M.M., 2013. The effect of informal care on work and wages. Journal of Health Economics 32 (1), 240–252.
Vilagut, G., Forero, C.G., Pinto-Meza, A., Haro, J.M., de Graaf, R., Bruffaerts, R., Kovess, V., de Girolamo, G., Matschinger, H., Ferrer, M., Alonso, J., 2013. The mental component of the short-form 12 health survey (SF-12) as a measure of depressive disorders in the general population: results with three alternative scoring methods. Value in Health 16 (4), 564–573.
Wagner, G.G., Frick, J.R., Schupp, J., 2007. The German Socio-Economic Panel Study (SOEP), scope, evolution, and enhancements. Journal of Applied Social Science Studies (Schmollers Jahrbuch: Zeitschrift für Wirtschafts- und Sozialwissenschaften) 127 (1), 139–169.
Ware, J.E., Kosinski, M., Keller, S.D., 1996. A 12-item short-form health survey: construction of scales and preliminary tests of reliability and validity. Medical Care 34 (3), 220–233.

Tobacco control campaign in Uruguay: Impact on smoking cessation during pregnancy and birth weight

Jeffrey E. Harris a,∗, Ana Inés Balsa b, Patricia Triunfo c
a Department of Economics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
b Departamento de Economía, Facultad de Ciencias Empresariales y Economía, Universidad de Montevideo, Montevideo 11500, Uruguay
c Departamento de Economía, Facultad de Ciencias Sociales, Universidad de la República, Montevideo 11200, Uruguay
∗ Corresponding author. Tel.: +1 617 253 2677; fax: +1 617 253 6915.
E-mail addresses: jeffrey@mit.edu (J.E. Harris), abalsa@um.edu.uy (A.I. Balsa), patricia.triunfo@cienciassociales.edu.uy (P. Triunfo).

Article history: Received 23 April 2014; Received in revised form 13 April 2015
JEL classification: I18; I12; D12
Keywords: Economic evaluation; Cigarette taxes; Package warnings; Advertising bans; Tobacco control

Abstract

We analyzed a nationwide registry of all pregnancies in Uruguay during 2007–2013 to assess the impact of three types of tobacco control policies: (1) provider-level interventions aimed at the treatment of nicotine dependence, (2) national-level increases in cigarette taxes, and (3) national-level non-price regulation of cigarette packaging and marketing. We estimated models of smoking cessation during pregnancy at the individual, provider and national levels. The rate of smoking cessation during pregnancy increased from 15.4% in 2007 to 42.7% in 2013. National-level non-price policies had the largest estimated impact on cessation. The price response of the tobacco industry attenuated the effects of tax increases. While provider-level interventions had a significant effect, they were adopted by relatively few health centers. Quitting during pregnancy increased birth weight by an estimated 188 g. Tobacco control measures had no effect on the birth weight of newborns of non-smoking women.

1. Introduction

The tobacco epidemic continues to represent a serious public health threat throughout the world. By one recent estimate, the worldwide annual mortality burden has already reached 5 million deaths from direct tobacco smoking and another 600,000 deaths attributable to the effects of environmental smoke (World Health Organization, 2012). Within the next 20 years, annual deaths from tobacco are projected to continue to rise to 8 million, of which more than 80% will occur in low- and middle-income countries (Mathers et al., 2008).
Beginning in 2005, Uruguay instituted a series of aggressive anti-smoking measures that placed this small South American country of 3.3 million inhabitants in the forefront of tobacco control policy worldwide. By 2012, the Uruguayan government had prohibited smoking in enclosed public spaces and workspaces, banned nearly all advertising and promotion of tobacco products, mandated that pictograms with warnings cover 80% of the front and back of every pack, banned misleading marketing terms such as “light” and “mild,” and outlawed multiple versions of the same brand such as Silver or Blue. Tobacco taxes were increased, and all healthcare providers were required to offer treatment for nicotine dependence.
In a previous report, two of us (JH and PT) found that Uruguay’s comprehensive nationwide antismoking campaign was associated with a substantial, unprecedented decrease in tobacco use (Abascal et al., 2012). During 2005–2011, per capita cigarette consumption decreased by 4.3% per year, while the 30-day prevalence of cigarette use among students aged 13–17 years and the overall population prevalence of current tobacco use declined at annual rates of 8.0% and 3.3%, respectively. The observed declines in each of these three indicators of tobacco use were significantly larger than those seen in neighboring Argentina, a culturally similar country that had not conducted a comprehensive antismoking campaign and served as a control.
While our previous study contributed to the evaluation of the overall impact of Uruguay’s tobacco control campaign, it did
J.E. Harris et al. / Journal of Health Economics 42 (2015) 186–196 187
not address the quantitative contributions of individual campaign components. Pursuing that objective here, we classify the interventions implemented in Uruguay during 2007–2013 into three categories: (1) provider-level interventions aimed at the treatment of nicotine dependence, (2) national-level increases in cigarette taxes, and (3) national-level non-price regulation of cigarette packaging and marketing. We study the effects of these individual campaign components on a critical target population – pregnant women.
Studying the population of pregnant women is important not only for the well-recognized adverse health consequences of smoking during pregnancy (Permutt and Hebel, 1989; da Veiga and Wilder, 2008; McCowan et al., 2009), but also for the narrow nine-month window during which pregnant women have heightened susceptibility to health-related interventions. We take advantage of a continuous nationwide registry of all live pregnancies from 2007 to 2013 to study the effects of the campaign on two main outcomes: the probability that a pregnant smoker will quit smoking by her third trimester and her infant’s birth weight.
To identify the effect of the provider-level interventions, we use a difference-in-differences (DID) approach, exploiting the fact that these policies were implemented at different health centers at different times. To assess the effect of taxes, we rely upon a series of discrete tax increases during our study period. Finally, to assess the effects of non-price regulation of packaging and marketing, we take advantage of the fact that these nationwide measures went into effect at different times. As an additional control, we compare the effect of these interventions on the birth weight of children whose mothers smoked during pregnancy with the corresponding effect, if any, on the offspring of mothers who did not smoke.
Our study contributes to an extensive literature evaluating the impact of such tobacco control policies as tax increases, control of environmental tobacco smoke, cigarette pack warnings, restrictions on cigarette marketing, regulation of tobacco constituents, mass media anti-smoking campaigns, and the treatment of addiction (Saffer and Chaloupka, 2000; Wakefield and Chaloupka, 2000; Powell et al., 2005; Blecher, 2008; Carpenter and Cook, 2008; DeCicca et al., 2008; Anger et al., 2011; Hammond, 2011; Hoek et al., 2011; Chaloupka et al., 2012; Emery et al., 2012; Mons et al., 2013). Our work is distinguishable in that we exploit an extensive micro database to evaluate the relative impacts of multiple types of interventions in the context of a nationwide tobacco control campaign conducted in a developing country.
We find persuasive evidence on the impact of each of the three policy categories analyzed – provider-level interventions, taxes, and non-price policies – on the likelihood of quitting smoking during pregnancy and on birth weight. In terms of the relative contributions of each of these policies to the observed increase in quit rates, the regulation of marketing and packaging had the strongest effect, accounting for 71% of the total observed variation in quit rates during 2007–2013. While interventions to treat nicotine dependence had a strong effect at the level of the indi-

2. Background and data

2.1. Nationwide anti-smoking policies

In 2005, one year after the legislature had ratified the Framework Convention on Tobacco Control, Uruguay’s newly elected administration launched a National Program for Tobacco Control that formed the basis for a succession of progressively more stringent tobacco control policies (Abascal et al., 2012). In March 2006, all enclosed public spaces and all public and private workspaces were declared 100% smoke-free. In June 2008, the scope of tobacco-free spaces was extended to taxis, buses, airplanes and other public transport.
These curbs on environmental tobacco smoke were paralleled by a series of advertising restrictions on tobacco products. In May 2005 the government banned cigarette advertising on television during children’s viewing hours (before 9:30 pm) and prohibited advertising, promotion or sponsorship by tobacco companies of all sporting events. These restrictions were subsequently codified in March 2008, when comprehensive tobacco control legislation (Law 18.256) prohibited all advertising and promotion of tobacco products except at point of sale. In October 2008, logos, trademarks and other tobacco-related symbols were banned on non-tobacco products. In May 2014, all advertising was prohibited, even at the point of sale.
In addition, the Uruguayan government promulgated warning requirements on cigarette packages and imposed restrictions on manufacturers’ branding practices. A May 2005 ministerial decree banned all references to “light,” “ultra light,” “mild,” “low tar” and other descriptors that might misleadingly imply reduced harm. The decree also mandated a series of rotating warnings with images covering 50% of the front and back of each cigarette pack. The deadline for compliance with the first round of these rotating warnings was April 2006. Subsequent rounds had respective deadlines of December 2007, February 2009, February 2010, January 2012, and April 2013. A “single presentation rule,” issued as a ministerial decree along with the third round of warnings, barred the marketing of multiple versions of the same brand, such as Silver or Blue. Finally, a 2009 decree mandated that the size of the warnings be increased to 80% of the front and back of each pack. This requirement was implemented with the fourth round of warnings and became effective by February 2010.1
Fig. 1 shows a timeline summarizing the major nationwide non-price regulatory measures from 2005 to 2013. The blue text describes each of the six rounds of package warnings, while the boldface red text describes regulatory measures other than the mandated warnings. The black lines point to the compliance deadlines for each regulatory measure.2
Fig. 2 further describes the six rounds of rotating package warnings. In each round, we show only one of several mandated images. The relative sizes of the images in the figure correspond to their relative sizes on each pack, with the last three rounds reflecting the required increase from 50% to 80% of the front and back surfaces.

2.2. Smoking cessation programs directed at healthcare providers
vidual provider, relatively few prenatal care centers adopted these
interventions during the study period, thus contributing little to In 2008, the comprehensive tobacco control law mandated that
the overall increase in the quit rate. Tax increases, on the other every primary care provider, whether public or private, incorporate
hand, explained an estimated 25% of the variation in quit rates dur-
ing 2007–2013. While real taxes increased 122% during that time,
the tobacco-industry passed on only a fraction of the tax increases 1
This “80% rule” was promulgated 3 months before the issuance of the fourth
to consumers, so that real cigarette price increased by only 17%. round of images. However, we have no evidence of significant compliance with the
80% rule before the deadline for compliance with the fourth round of images.
Finally, we find that smoking cessation was associated with a sig- 2
With the exception of the comprehensive tobacco control law, all measures
nificant increase in birth weight. By contrast, the tobacco control provided for a 180-day compliance period. By specifying the end of the compliance
policies under study had no effect on the birth weight of offspring period as the effective date of each measure, we assumed that tobacco manufactur-
of mothers who did not smoke. ers waited until each deadline to comply.
188 J.E. Harris et al. / Journal of Health Economics 42 (2015) 186–196
[Figure 1: timeline graphic spanning 2005–2014. Labeled events: smoking prohibited in all enclosed public spaces and all public and private work spaces; 1st round of package warnings (warnings must cover 50% of both front and back); 2nd round of package warnings; comprehensive tobacco control legislation (all advertising except point-of-sale banned); 3rd round of package warnings (brands restricted to a single presentation); 4th round of package warnings (warnings must cover 80% of both front and back); 5th round of package warnings; 6th round of package warnings.]
Fig. 1. Timeline of nationwide non-price tobacco control measures. The blue text refers to the deadlines for each of the six rounds of rotating package warnings, while the
boldface red text refers to other tobacco control measures.
the diagnosis and treatment of tobacco dependence into its menu of basic services. Pursuant to this legislation, in 2009 the Ministry of Public Health and the National Resource Fund (“Fondo Nacional de Recursos” or FNR), the governmental agency responsible for financing resource-intensive medical technologies, established national guidelines for primary care providers on the diagnosis and treatment of nicotine dependence. Through a set of agreements (“convenios”) with the FNR, healthcare institutions were eligible to receive training and free nicotine patches and bupropion in return for setting up a smoking cessation program with little or no patient copayments (Esteves et al., 2011). Health centers that had no agreements with the FNR were required to provide smoking cessation services in accordance with the guidelines, but they were permitted to charge nontrivial copayments to patients. In what follows, we refer to these agreements between health centers and the FNR as “provider-level agreements.”

Among all sites providing prenatal care to pregnant women, the proportion with FNR agreements increased from 7 to 12% during 2005–2013. Concurrently, the proportion of all pregnant women receiving prenatal care at sites with FNR agreements increased from 13% in 2005 to 36% in 2007, but then declined to 31% by 2013.
Fig. 2. Timeline of six rounds of rotating package warnings. Each round displays only one of several mandated images. The relative sizes of the images correspond to their
relative sizes on each pack, with the last three rounds reflecting the required increase from 50% to 80% of the front and back surfaces.
[Figure 3: line chart. Vertical axis: Uruguayan pesos per pack of 20 cigarettes (base = December 2010), 0–80. Horizontal axis: year, 2001–2013. Series: real price; real taxes (excise + VAT). Vertical lines labeled A–H.]
Fig. 3. Real price and real taxes per pack of cigarettes, 2001–2013. The vertical lines show the timing of non-price nationwide policy measures, as described in Fig. 1.
(A) Smoking prohibited in all enclosed public spaces and all public and private workspaces. (B) 1st round of package warnings. (C) 2nd round of package warnings. (D)
Comprehensive tobacco control legislation. (E) 3rd round of package warnings; brands restricted to a single presentation. (F) 4th round of package warnings; warnings must
cover 80% of front and back. (G) 5th round of package warnings. (H) 6th round of package warnings.
2.3. Cigarette tax increases

In addition to the foregoing policy interventions, the Uruguayan government increased its indirect taxes on tobacco products. Imposed solely at the national level, these taxes consist of an excise tax (“impuesto específico interno” or IMESI) and a value added tax (“impuesto al valor agregado” or IVA). The IMESI, which was first applied to cigarettes in 1993, underwent a series of discrete increases in June 2002, May 2003, July 2007, June 2009, and February 2010. The IVA, by contrast, was first applied to cigarettes in July 2007 and since then has constituted 22% of the pre-tax price including the IMESI or, equivalently, 18% of the retail price.

Fig. 3 shows the estimated real price and real tax on a pack of cigarettes during 2001–2013. Only 45% of the abrupt increase in cigarette taxes in July 2007 was passed on to consumers in the form of higher retail prices. The construction of the real price and tax series is described in our working paper (Harris et al., 2014).

In Uruguay, an estimated 99% of tobacco users smoke manufactured cigarettes, hand-rolled cigarettes, or both (Abascal et al., 2012). Manufactured cigarettes, in particular, make up more than 85% of taxable cigarette consumption (Dirección General Impositiva, 2012). During 2004–2012, by one estimate, contraband cigarette sales constituted approximately 12% of total cigarette consumption on average (Curti, 2013). With the possible exception of less densely populated provinces (“departamentos”) along Uruguay’s borders with Brazil and Argentina, where contraband tobacco use appears more prevalent, there has been little effective geographical variation in retail price.

2.4. Perinatal information system (SIP)

Our source of micro data on the smoking practices of pregnant women was the Perinatal Information System (“Sistema Informático Perinatal” or SIP), a mandatory nationwide electronic registry operating in all prenatal care clinics in Uruguay since 1990. Developed and overseen by the Latin American Center for Perinatology (“Centro Latinoamericano de Perinatología” or CLAP) of the Pan American Health Organization, the database contained information at the level of the individual pregnancy on maternal characteristics, self-reported smoking behavior, current and past obstetric history, the timing of prenatal care, the sites of prenatal care and delivery, and birth outcomes including birth weight (CLAP, 2001). In 2012, the SIP covered an estimated 94% of all live births in Uruguay.

Our analyses relied upon the following individual-level maternal characteristics, derived from the SIP registry: the timing of the first prenatal visit (first-trimester prenatal care); the mother’s age (<16, 17–19, 20–34, 35–39, and 40+ years), marital status (single, married, cohabiting, other), and educational attainment (primary, secondary, university); the number of prior deliveries (0, 1, 2, 3, 4+); number of prior abortions; a history of diabetes or hypertension; whether any complications of pregnancy were observed, in particular, the presence of preeclampsia or eclampsia; the mother’s body mass index based on her self-reported height and weight prior to the pregnancy (underweight, normal weight, overweight, obese); the mother’s use of alcohol or illicit drugs; the sites of prenatal care; and the newborn’s sex and birth weight.3

Prior to 2007, each individual record in the SIP database contained the pregnant woman’s smoking status only at the time of initiation of prenatal care. It did not show changes in smoking, if any, during the course of her pregnancy. Under a new data entry system beginning in 2007, the prenatal record noted the woman’s smoking status separately in each trimester of her pregnancy. For example, if a woman initiated prenatal care in her second trimester, the healthcare provider recorded her smoking status in the first trimester, based on her recall, as well as in the current trimester. Her smoking status would subsequently be recorded in a follow-up prenatal visit during her third trimester. The perinatal data derived from this new system, which we refer to as the “new SIP,” were the focus of our analysis.

3 To avoid loss of observations, we included dummy variables equal to 1 when some maternal characteristics were missing. For further details on maternal and pregnancy characteristics, including descriptive statistics, see our working paper (Harris et al., 2014).
3. Principal endpoints: smoking cessation and birth weight

To assess the impact of Uruguay’s tobacco control campaign, we focused our attention primarily on pregnant women who smoked cigarettes at any time from the first prenatal visit onward. Within this target group of pregnant smokers, our principal endpoint was smoking cessation by the third trimester. Our analysis of this endpoint was confined to the interval from 2007 to 2013, when data on smoking habits during each trimester were available through the new SIP system. Fig. 4 shows the progressive increase in the annual mean quit rate among pregnant smokers from 15.4% in 2007 to 42.7% in 2013. Our main research objective was to determine what proportion of this substantial rise in quit rates could be attributed to Uruguay’s multi-component campaign.

Why did we not focus on the prevalence of smoking at the onset of pregnancy? Unfortunately, within the limitations of Uruguay’s SIP system, this alternative endpoint was subject to a critical source of potential measurement bias. Starting in July 2008, the Uruguayan Ministry of Health established a new system of financial incentives for providers to increase the completeness of data reporting on SIP records. As Fig. 5 shows, the prevalence of smoking at the first prenatal visit remained stable at about 25% from 2000 to 2005. Thereafter, with the onset of the tobacco control campaign, the prevalence had declined toward 15% by the early months of 2009. In April 2009, however, nine months after the institution of the new system of financial incentives, the prevalence abruptly increased. At the same time, as shown in Fig. 5, the proportion of records with missing data declined dramatically from about 20% in 2009 to 1–2% by 2013.

The best explanation for the abrupt break in prevalence is that women with missing data on smoking at the onset of pregnancy were more likely to be smokers. Moreover, as the Ministry of Health imposed increasingly strict goals for data completeness, the effect of including these previously unreported smokers in the prevalence calculation is likely to have grown ever larger. While we did have some data on the Ministry’s data-completeness goals, we concluded that the task of correcting for this missing data bias and at the same time identifying the effects of post-2009 tobacco control measures would prove intractable.

On the other hand, we did not detect any comparable break in the monthly time series of smoking cessation rates. We thus considered quitting to be a more reliable and sensitive endpoint than smoking prevalence. Still, to the extent that the new system of data-completeness goals recorded an increasing number of hard-core smokers, our estimates in Fig. 4 of the quit rate during 2009–2013 would be biased downward. We stress that our principal endpoint represents the rate of cessation conditional upon smoking on or after the first prenatal visit.

Smoking cessation during pregnancy is known to reduce the adverse health effects of smoking. Accordingly, as an additional endpoint, we studied the impact of Uruguay’s tobacco control campaign on birth weight. As shown in Fig. 6, the difference in birth weight between non-smokers and continuing smokers was on the order of 210 g, while the corresponding difference between non-smokers and quitters was only about 25 g. Although the smaller number of quitters in the SIP database during 2007–2008 decreased the precision of the mean birth weight estimates, Fig. 6 still shows a background increase in birth weight for all three groups. The analysis of birth weight also helped us address the concern that the SIP registry contained self-reported data on smoking habits. If women had falsely reported having quit smoking with increasing frequency as the campaign progressed, we would have expected to see a growing reduction in the apparent favorable effects of cessation on birth weight.

3.1. Impact on quitting during pregnancy

3.1.1. Individual-level analysis

We first investigate the effect of the provider-level agreements and national-level tobacco control policies on the quit rate during pregnancy. To that end, we begin with a linear probability model based upon observations at the level of the individual pregnant woman:

$$y_{ist} = c_i \alpha_0 + x_{st} \alpha_1 + z_t \alpha_2 + D_s + e_{ist} \qquad (1)$$

where the subscript $i$ indexes each woman, the subscript $s$ refers to the health center where she received prenatal care, and the subscript $t$ refers to the calendar date corresponding to the midpoint of her third trimester.4 The data $y_{ist}$ are binary variables representing smoking cessation, where $y_{ist} = 1$ if the woman quit and $y_{ist} = 0$ if she continued to smoke through her third trimester. The vector of exogenous variables $c_i$ represents individual-level maternal characteristics, while $x_{st}$ represents the presence or absence of a provider-level agreement at health center $s$ on calendar date $t$, and the vector $z_t$ represents those national-level policy variables, including cigarette taxes, that were in effect at calendar date $t$. The parameters $D_s$ represent health center-specific fixed effects. We assume that the unobserved error terms $e_{ist}$ are uncorrelated with the observed explanatory variables and have zero means.

4 For the individual-level model of Eq. (1), we assigned a woman to calendar date $t$ based on the midpoint of her prenatal care visits during her third trimester of pregnancy. If she had no prenatal visits, we assigned her a date $t$ equal to 30 days prior to delivery.

Since the presence or absence of a provider-level agreement $x_{st}$ varied by health center and calendar date, the impact of these programs was identifiable via a DID model. With respect to the national-level policies $z_t$, however, we needed to be careful about what policy impacts could and could not be identified from the data. Successive increases in cigarette taxes during the 2007–2013 observation period, beginning with the inclusion of tobacco in the value-added tax on July 1, 2007 (Fig. 3), permitted us to identify the impact of this policy. On the other hand, we could not identify the impacts of the two non-price policies that went into effect before the start of our observation period: the prohibition of smoking in public places and enclosed workspaces (March 1, 2006) and the requirement that all packs contain rotating images with warnings covering 50% of the front and back of each pack (April 18, 2006). Moreover, the effective dates of the second, third and fourth rounds of warnings were either close to or coincident with other policy measures, and thus their impacts could not be separately identified without additional strong assumptions concerning their effects over time (Fig. 1).

That left us with five non-price, national-level policies: the comprehensive tobacco legislation banning nearly all advertising (in effect from March 6, 2008 onward); the single presentation rule (in effect from February 14, 2009 onward); the increase in the warning size from 50% to 80% of the front and back of each pack (in effect from February 28, 2010 onward); the fifth round of warnings (in effect from January 7, 2012 through April 7, 2013); and the sixth round of warnings (in effect from April 8, 2013 onward).

Column (A) of Table 1 shows our results. We estimated the parameters of Eq. (1) by ordinary least squares (OLS) with Huber-White robust standard errors implemented at the individual level. The presence of an agreement between the woman’s health center and the FNR increased the probability of quitting by an estimated 4.8 percentage points (p = 0.024). The coefficient of the log tax per pack was 0.079 (p = 0.031). At the sample mean value of the dependent variable equal to 0.377, the estimated elasticity of smoking cessation with respect to cigarette taxes was
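[Figure 4: point plot with 95% confidence bars. Vertical axis: proportion of pregnant smokers who quit by third trimester (%), 0–50. Horizontal axis: year of midpoint of third trimester, 2007–2013. Point labels, as extracted from top to bottom of the plot: 6623, 6581, 6845, 6583, 5888, 2582, 638.]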
Fig. 4. Increase in mean quit rate among pregnant smokers, 2007–2013. Vertical bars represent 95% confidence intervals. Adjacent to each point is the number of pregnant
smokers in the SIP database with data on smoking status in the third trimester. Each smoker was assigned to the calendar year of the mean date of all her third-trimester
prenatal visits.
0.079/0.377 = 0.21. All five of the non-price nationwide policies had significant effects, with the comprehensive tobacco control law assuming the dominant role. All five non-price policies combined increased the probability of quitting by an estimated 26.9 percentage points (p < 0.001).

We used the results of our individual-level analysis to compute the relative contributions of each of the three categories of tobacco control policy to the overall change in smoking cessation rates observed during the 2007–2013 study period. To that end, we first computed the predicted values $\hat{y}_{ist}$ derived from Eq. (1) and then calculated the corresponding sum $\hat{Y} = \sum_{i,s,t} \hat{y}_{ist}$ over all observations. We then recomputed the predicted values $\hat{y}_{ist}$ and corresponding sums $\hat{Y}$, successively setting all values of the provider-level agreement variable $x_{st}$ equal to 0, then all values of the log real tax rate equal to the initial level in January 2007, and then all indicators of non-price policies $z_t$ equal to 0. Based on this decomposition procedure, we estimated the following attributable proportions: provider-level agreements, 4.3%; tax increases, 25.2%; and regulations of packaging and marketing, 70.5%.

3.2. Health center level analysis

It is arguable that an individual-level analysis of Eq. (1) overstates the precision of estimated policy impacts. The central thrust of this criticism is that the agreements for smoking cessation services were made with health centers rather than with individual patients. Moreover, the tax increases and non-price policies were carried out at the national rather than the individual level. To address these concerns, we performed a series of aggregate analyses at both the health center level and national level. Following
[Figure 5: monthly scatter plot. Left axis: prevalence of smoking at first prenatal visit (%). Right axis: proportion of records with missing data (%). Horizontal axis: month of birth, 2000–2014. Series: % prevalence; % missing data.]
Fig. 5. Monthly prevalence of smoking at the first prenatal visit. Monthly proportion of records with missing data. Smoking prevalence is measured on the left axis, while
the proportion with missing data is measured on the right. The diameter of each data point is proportional to the number of observations.
[Figure 6: line plot with 95% confidence intervals. Vertical axis: birth weight (gm), 3000–3400. Horizontal axis: year of birth, 2007–2013. Series: non-smokers; smokers who quit; continuing smokers.]
Fig. 6. Annual mean birth weight among non-smoking pregnant women, continuing smokers, and smokers who quit, 2007–2013. Non-smokers were pregnant women who
did not report smoking at any prenatal visit. Continuing smokers were pregnant women who reported smoking at one or more prenatal visits, but had not quit smoking by
the third trimester. Smokers who quit likewise reported smoking at one or more prenatal visits, but had quit by the third trimester.
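The decomposition procedure described in Section 3.1.1 (successively switching off provider-level agreements, tax increases, and non-price policies in the fitted linear probability model) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code. The function name `policy_shares`, the synthetic inputs, and the definition of each share as the step-wise drop in the summed predictions divided by the total drop are our assumptions; the coefficients are taken from Table 1, column (A), for illustration only.

```python
import numpy as np


def policy_shares(coef_x, coef_tax, coef_z, x, log_tax, z, log_tax0):
    """Attributable shares from successive counterfactual predictions.

    Only the policy terms of the linear model are summed: intercepts,
    maternal covariates and fixed effects are identical in every scenario
    and cancel when the sums of predictions are differenced.
    """
    def total(x_, tax_, z_):
        return float(np.sum(coef_x * x_ + coef_tax * tax_ + z_ @ coef_z))

    tax_initial = np.full_like(log_tax, log_tax0)
    y_full = total(x, log_tax, z)                       # all policies on
    y_no_x = total(np.zeros_like(x), log_tax, z)        # agreements off
    y_no_tax = total(np.zeros_like(x), tax_initial, z)  # tax at initial level
    y_none = total(np.zeros_like(x), tax_initial, np.zeros_like(z))
    drops = np.array([y_full - y_no_x, y_no_x - y_no_tax, y_no_tax - y_none])
    return drops / drops.sum()


# Synthetic illustration (hypothetical data, not the SIP registry).
rng = np.random.default_rng(0)
n = 1000
x = (rng.random(n) < 0.3).astype(float)        # provider-level agreement
log_tax = np.log(np.linspace(20.0, 40.0, n))   # log real tax per pack
z = (rng.random((n, 5)) < 0.5).astype(float)   # five non-price indicators
coef_z = np.array([0.094, 0.075, 0.028, 0.037, 0.035])
shares = policy_shares(0.048, 0.079, coef_z, x, log_tax, z, log_tax[0])
print(dict(zip(["agreements", "taxes", "non-price"], shares.round(3))))
```

Because the model is linear, the three step-wise drops are additive and the resulting shares do not depend on the order in which the policies are switched off.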
Table 1
Estimated effects on probability of quitting by the third trimester of pregnancy.
| Independent variable | (A) | (B) | (C) | (D) | (E) |
|---|---|---|---|---|---|
| Provider-level agreements | 0.048** (0.021) | 0.056* (0.034) | 0.056** (0.025) | 0.056** (0.025) | |
| Log real tax per pack | 0.079** (0.036) | 0.106** (0.053) | 0.105* (0.054) | | 0.136** (0.055) |
| Tobacco control law | 0.094*** (0.017) | 0.138*** (0.038) | 0.138*** (0.026) | | 0.139*** (0.023) |
| Single presentation rule | 0.075*** (0.014) | 0.053** (0.023) | 0.053*** (0.019) | | 0.039** (0.015) |
| 80% rule | 0.028*** (0.011) | 0.029* (0.015) | 0.030* (0.015) | | 0.032* (0.016) |
| Fifth round of warnings | 0.037*** (0.008) | 0.030*** (0.011) | 0.030*** (0.011) | | 0.036*** (0.007) |
| Sixth round of warnings | 0.035*** (0.011) | 0.0355** (0.0147) | 0.035** (0.015) | | 0.038*** (0.010) |
| Five non-price measures combined | 0.269*** (0.020) | 0.286*** (0.044) | 0.285*** (0.027) | | 0.283*** (0.022) |
| No. observations | 31,230 | 1422 | 1400 | 1400 | 28 |

* Significant at p < 0.10.
**
***
A. OLS estimation on individual maternal-level observations, based on Eq. (1). White-Huber robust standard errors. Coefficients of individual maternal characteristics and
health center fixed effects not shown.
B. OLS estimation on observations grouped by health center and calendar quarter, based on Eq. (3). Standard errors adjusted for clustering at level of health center. Observations
weighted by inverse of standard errors of fixed effects derived from Eq. (2).
C. FGLS estimation on observations grouped by health center and calendar quarter, based on Eq. (3). Standard errors adjusted for clustering at level of health center. Uniform
first-order serial correlation estimated to equal 0.0210. Observations weighted by inverse of standard errors of fixed effects derived from Eq. (2).
D. FGLS estimation on observations grouped by health center and calendar quarter, based on Eq. (4). Standard errors adjusted for clustering at level of health center. Uniform
first-order serial correlation estimated to equal 0.0237. Observations weighted by inverse of standard errors of fixed effects derived from Eq. (2).
E. OLS estimation on observations grouped by calendar quarter, based upon Eq. (4). Newey-West standard errors adjusted for heteroskedasticity and serial correlation.
Observations weighted by inverse of standard errors of fixed effects derived from Eq. (2).
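The elasticities of quitting quoted in the text follow from the log-tax coefficients in Table 1. In a linear probability model with the log of the tax as a regressor, the coefficient is the change in the quit probability per unit change in log tax, so dividing it by the mean quit rate gives the proportional change in quitting per proportional change in the tax. The sketch below reproduces the two elasticities reported in the text; the function name `quit_elasticity` is ours, and the use of the sample mean 0.377 for column (E) is inferred from the reported value of 0.36.

```python
def quit_elasticity(coef_log_tax: float, mean_quit_rate: float) -> float:
    """Elasticity of quitting with respect to the tax in a linear
    probability model with a log-tax regressor: (dP/dlog tax) / P."""
    return coef_log_tax / mean_quit_rate


MEAN_QUIT_RATE = 0.377  # sample mean of the dependent variable (from the text)

# Log real tax coefficients from Table 1, columns (A) and (E).
for column, coef in {"A": 0.079, "E": 0.136}.items():
    print(column, round(quit_elasticity(coef, MEAN_QUIT_RATE), 2))
# A 0.21
# E 0.36
```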
(Amemiya, 1978; Hansen, 2007; Imbens and Wooldridge, 2014) and others, we specified the following two-stage model.

$$y_{ist} = c_i \beta_0 + F_{st} + u_{ist} \qquad (2)$$

$$F_{st} = x_{st} \beta_1 + z_t \beta_2 + G_s + v_{st} \qquad (3)$$

Here, the subscript $s = 1, \ldots, S$ continues to index health centers, while the subscript $t = 1, \ldots, T$ now indexes calendar quarters. The subscript $i = 1, \ldots, N_{st}$ now indexes women who received prenatal care at health center $s$ and whose third trimester occurred in calendar quarter $t$.

In the first stage (Eq. (2)), the data $y_{ist}$ are binary variables representing smoking cessation and the exogenous variables $c_i$ represent individual-level maternal characteristics, while $F_{st}$ are fixed effects for each of the $ST$ combinations of health center and calendar quarter. In the second stage (Eq. (3)), the data $x_{st}$ represent the presence or absence of a provider-level agreement at health center $s$ during calendar quarter $t$, while the data $z_t$ denote the national-level policies in effect in calendar quarter $t$. The parameters $G_s$ are $S$ health center-specific fixed effects. We assumed that the unobserved error terms $u_{ist}$ and $v_{st}$ were uncorrelated with the observed explanatory variables and with each other, and had zero means.

To estimate the model parameters, we first ran OLS on Eq. (2), thus obtaining estimates $\hat{F}_{st}$ of the parameters $F_{st}$. In effect, these OLS estimates represented the predicted quit rate in each health center $s$ and calendar quarter $t$ of a pregnant smoker in the reference category of maternal characteristics.5 We then estimated the parameters of the DID model (3), where the dependent variable $F_{st}$ was replaced by its estimated value $\hat{F}_{st}$. We weighted the observations by the inverse of the standard errors of the estimates $\hat{F}_{st}$.

5 A woman in the reference category was married, aged 20–34 years, had less than a high school education, did not seek prenatal care in her first trimester, had no prior abortions or deliveries, a pre-pregnancy body mass index 18.5–24.9 kg/m2, no history of diabetes, hypertension, eclampsia or pre-eclampsia, and gave birth to a singleton female. For further details on maternal and pregnancy characteristics, including descriptive statistics, see our working paper (Harris et al., 2014).

Within the context of this aggregate model, researchers have expressed concerns that traditional OLS estimation of DID models ignores serial correlation of errors and thus overstates the precision of the coefficient estimates (Bertrand et al., 2004; Cameron et al., 2008). In our context, serial correlation would arise when health centers were subject to common unobserved shocks that persisted for more than a calendar quarter. To address these concerns, we estimated the parameters of Eq. (3) by feasible generalized least squares (FGLS) under varying specifications of the covariance matrix $V = E[v_{st} v_{s't'}]$.

Columns (B) and (C) in Table 1 show our regression estimates of the parameters $\beta_1$ and $\beta_2$ of Eq. (3) under two different assumptions concerning the covariance matrix. In column (B), we assumed clustering of errors within each health center. In column (C), we assumed first-order temporally correlated errors with a uniform correlation coefficient among health centers. We found quite similar results when we estimated Eq. (3) under the assumption that the coefficient of serial correlation varied by health center (not shown in Table 1). In comparison with the maternal-level estimates (column A), we observed larger coefficients for two policies: the log real cigarette tax and the comprehensive tobacco control law. However, their corresponding standard errors were also increased, so that we could not reject the hypothesis that their impacts equalled those estimated from the individual-level data. Finally, in both columns (B) and (C), the combined effect of the five non-price policies was indistinguishable from the effect estimated from individual-level data.

3.3. National level analysis

It is arguable that the 2-stage model of Eqs. (2) and (3) still overstates the precision of the estimated impacts of cigarette taxes and non-price regulations of cigarette packaging and marketing, as these policies were carried out at the national level. To address this criticism, we extended our two-stage model of Eqs. (2) and (3) to three stages. In particular, we continued to specify the first-stage model of Eq. (2), but replaced Eq. (3) with the following two equations:

$$F_{st} = x_{st} \gamma_1 + H_s + J_t + \varepsilon_{st} \qquad (4)$$

$$J_t = z_t \gamma_2 + \eta_t \qquad (5)$$

As before, we assumed that the unobserved error terms in (2), (4) and (5) were uncorrelated with their respective explanatory variables.

The second-stage Eq. (4) differs from (3) in that $F_{st}$ now depends on the presence or absence of a provider-level agreement $x_{st}$ at health center $s$ during calendar quarter $t$, as well as health center fixed effects $H_s$, calendar quarter fixed effects $J_t$, and an unobserved random error $\varepsilon_{st}$ with mean zero. To estimate the policy impact parameters $\gamma_1$ in (4), we replaced the fixed effects $F_{st}$ with their estimates $\hat{F}_{st}$ from Eq. (2). As in the two-stage model, Eq. (4) was estimated by feasible GLS under various assumptions concerning the covariance matrix of the errors, where the observations were weighted by the inverse of the standard errors of the estimates $\hat{F}_{st}$.

In third-stage Eq. (5), the calendar quarter fixed effects $J_t$ in turn depend on national level policies as well as an unobserved random error $\eta_t$ with mean zero. To estimate the policy impact parameters $\gamma_2$ in (5), we replaced the fixed effects $J_t$ with their estimates $\hat{J}_t$ from the second stage Eq. (4). We then estimated the linear model (5) by OLS with Newey–West standard errors with a maximum lag of 4 calendar quarters to take account of possible heteroskedasticity and serial correlation of the error terms (Newey and West, 1987). Similarly, we weighted the observations by the inverse of the standard errors of the estimates $\hat{J}_t$.

Fig. 7 motivates the logic underlying the third stage of our analysis. The vertical axis measures the estimated fixed effects $\hat{J}_t$, which represent calendar quarter-specific quit rates adjusted for individual maternal characteristics, health center fixed effects, and the presence or absence of a provider-level agreement. The horizontal axis measures the corresponding calendar quarter $t$. Highlighted are the dates on which the value added tax was imposed on tobacco, comprehensive tobacco legislation was passed, only brands with a single presentation were permitted, images with warnings were mandated to cover 80% of the front and back of each pack, and the fifth and sixth rounds of images went into effect.

Columns (D) and (E) show the results of our three-stage procedure. Column (D) shows the estimate of $\gamma_1$ in Eq. (4) in the case of FGLS with serially correlated errors. As in the previous models, the presence of a provider-level agreement increased the probability of smoking cessation by 5.6 percentage points (p = 0.027). Column (E) shows the estimates of the policy impact parameters $\gamma_2$ in Eq. (5). The estimated policy impacts are comparable to those derived from the 2-stage model (columns B and C). The point estimate of the impact of cigarette taxes is larger and statistically significant (p = 0.045), implying an estimated elasticity of quitting equal to 0.36. Still, we could not reject the hypothesis that the estimated tax impact equalled that estimated from maternal-level data. Finally, the estimated effect of the five non-price policies combined was indistinguishable from the estimates based on the individual-level and two-stage models.

4. Impact on birth weight

We relied upon maternal-level data to assess the effect of quitting smoking during pregnancy on birth weight. Following Permutt and Hebel (1989), we adopted a simultaneous equation framework in which Uruguay’s anti-smoking policies served as instruments for the endogenous variable of smoking cessation. In addition, we took advantage of the birth-weight endpoint to strengthen our identification of the causal effect of Uruguay’s tobacco control campaign. Employing the population of non-smoking pregnant women as a control group, we performed a falsification test to determine whether the same policy instruments had any effect on the birth weight of infants delivered by non-smokers.

We estimated the following equation on our sample of smoking pregnant women:

$$w_{ist} = \delta_1 y_{ist} + c_i \delta_0 + K_s + \varsigma_{ist} \qquad (6)$$

where the variable $w_{ist}$ denotes the birth weight of the infant delivered by mother $i$ in health center $s$ and the calendar date $t$ refers to the midpoint of her third trimester of pregnancy. In Eq. (6), birth weight depends on smoking cessation ($y_{ist}$), individual maternal characteristics ($c_i$), a health center fixed effect ($K_s$), as well as a random error term ($\varsigma_{ist}$) with zero mean.

If the random error $\varsigma_{ist}$ is correlated with $y_{ist}$, then the coefficient $\delta_1$ will be biased when Eq. (6) is estimated by OLS. There is, in fact, good reason to believe that this is the case. For example, a woman with a propensity to engage in risky behaviors will tend not to quit smoking and deliver a low-weight baby. In that case, failure to account for the unobserved heterogeneity will overestimate the parameter $\delta_1$. Alternatively, a woman who runs into complications during her pregnancy will be under pressure to quit smoking and
tend to deliver a low-weight baby. In that case, the parameter $\delta_1$ will be underestimated.

To address the potential endogeneity of the quit variable $y_{ist}$, we therefore estimated Eq. (6) by two-stage least squares (2SLS), where the first stage is defined by Eq. (1) and the instruments for smoking cessation are the policy variables $x_{st}$ and $z_t$. As shown in Table 2, estimation of Eq. (6) by OLS, with clustering of standard errors by health center, gave an estimate of $\hat{\delta}_1 = 123.2$ g (p < 0.001). By contrast, estimation of (6) by 2SLS, similarly with standard errors clustered by health center, gave an estimate of $\hat{\delta}_1 = 187.9$ g (p = 0.028). While 2SLS increased the estimated standard error of $\hat{\delta}_1$, all tests rejected the hypotheses of weak instruments or over-identification. Although the 95% confidence intervals of the two estimates of $\delta_1$ overlapped substantially, the results still suggest that the OLS estimate may be biased downward.

For our falsification test, we used the parameters estimated from the first-stage Eq. (1) to predict the values of $y_{ist}$ for those women who did not report smoking during pregnancy. We then estimated the model of Eq. (6) on all women in this comparison group, replacing $y_{ist}$ with its predicted value. If the tobacco control policies captured in the variables $x_{st}$ and $z_t$ in fact improved birth weight by increasing the rate of quitting, then the estimate of $\hat{\delta}_1$ in (6) should be indistinguishable from zero in the comparison group of non-smoking pregnant women. On the other hand, if these policies are simply correlated with other unobserved factors that improved birth weight, then the estimate of $\hat{\delta}_1$ in (6) should be significantly positive for non-smokers as well. In fact, as shown in Table 2, the estimate of $\hat{\delta}_1$ was indistinguishable from zero among non-smokers (−27.8 g, p = 0.527).

[Figure 7. Vertical axis: quit rate fixed effects (0 to 0.5). Horizontal axis: calendar quarter (2007 to 2014). Marked events: value added tax imposed on cigarettes; anti-smoking law; advertising ban; single presentation rule; 80% rule; 5th round warnings; 6th round warnings.]

Fig. 7. Trend in quit rate estimated from national-level model. The vertical axis measures the fixed effects $\hat{J}_t$ estimated from Eq. (4), which represent calendar quarter-specific quit rates adjusted for individual maternal characteristics, health center fixed effects, and the presence or absence of a provider agreement. The horizontal axis measures the corresponding calendar quarter $t$. Highlighted are the dates on which various national-level tobacco control measures went into effect.

Table 2
Estimated effect of smoking cessation on birth weight (grams; standard errors in parentheses).a,b

Population         OLS               2SLS              Falsification test
Smokers            123.2*** (10.8)   187.9** (85.5)
Non-smokers                                            −27.8 (43.8)
No. observations   31,186            31,186            126,504

a All estimates correspond to the parameter $\delta_1$ in Eq. (6).
b All standard errors adjusted for clustering by health center.

5. Discussion and conclusions

To assess the impact of Uruguay’s nationwide tobacco control campaign, we analyzed a comprehensive nationwide registry of pregnancies ending in a live birth during 2007–2013. We focused sharply on smoking cessation among those women who reported smoking at any time during pregnancy, as well as the consequences of smoking cessation for birth weight. We observed a striking increase in the proportion of pregnant smokers who had quit by their third trimester, from 15.4% in 2007 to 42.7% in 2013.

We employed a difference-in-differences approach to evaluate the effects of provider-level agreements on quit rates during pregnancy. We found that smoking cessation programs established under agreements between health centers and the National Resource Fund increased quit rates by between 4.8 and 5.6 percentage points. Unfortunately, no more than one-third of pregnant smokers received care at health centers with contractual agreements during the period of analysis, and by 2013, the proportion of women exposed to such treatments had declined. As a result, these provider-level agreements contributed little to the overall increase in smoking cessation observed during 2007–2013.

Although real taxes per pack increased by 122% during our study period (Fig. 3), tax increases alone explained only about 20% of the overall rise in smoking cessation during pregnancy. In addition, our estimates of the tax elasticity of quitting, ranging from 0.21 to 0.36, were lower than those reported in the U.S. (Ringel and Evans, 2001; Colman et al., 2003). The principal explanation for this limited influence is that manufacturers moderated their retail prices in response to the application of the value added tax to cigarettes in July 2007 and other non-price regulatory policies enacted during 2007–2009 (Fig. 3) (Harris et al., 2014). As a result, the real retail price of cigarettes increased by only 17% during 2007–2013. Endogenous responses of the tobacco industry to state and nationwide tax increases have been previously documented in the U.S. (Harris, 1987; Harris et al., 1996; Chaloupka et al., 2010; Miura, 2010).

By contrast, most of the observed increase in quit rates was attributable to non-price regulation of the marketing and packaging of cigarettes. While the combined effect of all five non-price policies
was indistinguishable across the models that we tested (Table 1), it was nonetheless difficult to identify with precision the quantitative impacts of the individual component policies. We could not distinguish immediate versus long-run effects of policies, nor could we identify synergies between policies. Our analysis at best identified the average impacts of the tobacco control measures over time, conditional upon a specific temporal sequence of policy interventions. Thus, the increase in the size of warnings from 50 to 80% of the front and back of each pack of cigarettes, in effect since February 28, 2010, was associated with a 3 percentage point increase in the cessation rate (Table 1). However, this estimated impact was in the context of the coincident fourth round of warnings, as well as a ban on smoking in public places and private workspaces (March 1, 2006), the comprehensive tobacco control law (March 6, 2008), the single presentation rule (February 14, 2009) and other policies in effect at the same time (Figs. 1 and 2).

While we had micro data on smoking cessation at the individual level, the tobacco control policies that we evaluated were carried out at either the health center level or the national level. Accordingly, if there were significant inter-correlation of quitting behavior among women who attended the same health center or who were pregnant in the same calendar quarter, then the effective number of observations could be far less than the approximately 31,000 shown in column A of Table 1. However, our estimates on data aggregated at the health center level (columns B and C) and at the national level (columns D and E) generally confirmed our individual-level results.

Some economists have suggested that the impact of smoking cessation during pregnancy on birth weight, as estimated from cross-sectional databases, may be exaggerated by the presence of unobserved heterogeneity (Lien and Evans, 2005; Abrevaya, 2006; Abrevaya and Dahl, 2008; Walker et al., 2009; Juarez and Merlo, 2013). That is, a woman who tends to engage in risky behaviors will continue to smoke during pregnancy and have lower weight babies. However, the presence of unobserved heterogeneity can also result in an underestimate of the impact of smoking cessation. Thus, a woman who encounters complications in the third trimester, such as intrauterine growth retardation, will quit smoking and have a lower weight baby. Our OLS estimate of the effect of quitting on birth weight was 123 g (95% CI, 102–145). When we used Uruguay’s tobacco control policies as instruments for quitting smoking, our 2SLS estimate was 188 g (95% CI, 20–356). While our 2SLS estimate reinforces the conclusion that smoking cessation during pregnancy, even in the third trimester, has a significant positive effect on birth weight, the confidence interval surrounding that estimate was too wide to draw definitive conclusions about the direction of bias, if any, in the OLS estimate. Our results confirm that even delayed cessation of smoking can reduce the adverse effects of smoking during pregnancy (Lieberman et al., 1994; Raatikainen et al., 2007; Batech et al., 2013; Yan and Groothuis, 2013).

As we have already noted, Uruguay’s tobacco control campaign was associated with declines in per capita cigarette consumption, adult smoking prevalence and adolescent cigarette use that were significantly greater than those observed in Argentina, a country with a common international border, language and culture that served as a control (Abascal et al., 2012). Unfortunately, we were unable to locate reliable nationwide data from Argentina on the smoking practices of pregnant women during 2007–2013, and thus could not construct a comparable external control group for this study. Still, our falsification test took advantage of an internal control group, namely, pregnant Uruguayan women who did not smoke during pregnancy. We found, in particular, that the tobacco control policies implemented in Uruguay during 2007–2013 had no effect on the birth weight of infants delivered by mothers in this non-smoking control group.

Our data on smoking from the SIP registry were self-reported. It is conceivable that Uruguay’s tobacco control campaign increased the social pressure on a pregnant woman to deny smoking when queried by her obstetrician. While we cannot completely rule out misreporting bias, we note that if women had falsely reported having quit smoking with increasing frequency as the campaign progressed, we would have expected to see a growing reduction in the apparent favorable effect of quitting on birth weight. That is not what we observed in Fig. 6. Moreover, if the apparent increase in smoking cessation during 2007–2013 were solely the result of increasing misreporting, then we would have observed no relation between quitting and birth weight. That is not what we observed in Table 2. In the context of smoking during pregnancy, a number of authors have found a strong correlation between self-reported cigarette consumption and objectively measured levels of nicotine metabolites (Castellanos et al., 2000; Althabe et al., 2008; Himes et al., 2013).

During 2005–2009, the prevalence of smoking during pregnancy declined from approximately 25% to 15% (Fig. 5). Unfortunately, missing data biases seriously complicated the interpretation of prevalence trends thereafter. Moreover, there is evidence that some women who quit during pregnancy later resumed smoking and then quit once again during the next pregnancy (Harris et al., 2014). Still, if the trend observed during 2005–2009 is an accurate indicator of an overall decline in prevalence, then our results on smoking cessation may substantially understate the overall impact of Uruguay’s tobacco campaign.

Our results have important implications for future research and for the future design of tobacco control policies. Our findings suggest that enhanced targeting of healthcare providers in the implementation of tobacco cessation programs, as well as increased recruitment of patients, could have a high payoff. At the same time, our findings strongly suggest that non-price policies, in particular regulation of marketing and packaging, can have an important impact in reducing tobacco use.

Financial support

We gratefully acknowledge the financial support of the Bloomberg Foundation through an unrestricted grant to the Ministry of Public Health (Ministerio de Salud Pública), Uruguay. Neither the Bloomberg Foundation nor the Ministry of Public Health exerted any influence on the conduct of this study or the drafting of this manuscript.

Conflicts of interest

We have no conflicts of interest to declare.

Authors’ contributions

All three coauthors contributed to the conceptualization and design of this study, the analysis of the data, and the writing of this report.

Acknowledgments

We thank the Area Sistema Informático Perinatal from the Epidemiology Division of the Ministry of Public Health (UINS) for providing us with the perinatal data. We acknowledge valuable inputs from Winston Abascal, Rafael Aguirre, Wanda Cabella, Fernando Esponda, Elba Estévez, Marinés Figueroa, Ana Lorenzo, Luis Mainero, Anna Mikusheva, and Giselle Tomasso. The opinions expressed in this paper are ours and ours alone.
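
As an illustration of the endogeneity argument in Section 4, the following minimal sketch simulates synthetic data in which an unobserved pregnancy complication both raises the probability of quitting and lowers birth weight, so that OLS understates the effect of quitting. It is not the authors' estimation code and does not use the registry data. All numerical values (the 150 g true effect, the quit probabilities, the 300 g penalty) are hypothetical. The instrumental-variable estimate shown is the Wald ratio, which coincides with 2SLS only in the special case of a single binary instrument and no covariates; Eq. (6) as estimated in the paper also includes maternal characteristics, health center fixed effects and several policy instruments.

```python
import random


def simulate(n=40000, true_effect=150.0, seed=0):
    """Return synthetic (z, y, w) rows. All parameter values are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0      # policy in force (instrument)
        c = 1 if rng.random() < 0.2 else 0      # unobserved complication
        p_quit = 0.2 + 0.3 * z + 0.4 * c        # complications push women to quit
        y = 1 if rng.random() < p_quit else 0   # quit indicator (endogenous)
        # Birth weight in grams: quitting helps, complications hurt.
        w = 3000.0 + true_effect * y - 300.0 * c + rng.gauss(0.0, 200.0)
        rows.append((z, y, w))
    return rows


def diff_in_means(rows, group_idx, value_idx):
    """Mean of column value_idx where group == 1 minus mean where group == 0."""
    ones = [r[value_idx] for r in rows if r[group_idx] == 1]
    zeros = [r[value_idx] for r in rows if r[group_idx] == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def ols_estimate(rows):
    """OLS slope of w on the binary regressor y (equals a difference in means)."""
    return diff_in_means(rows, 1, 2)


def iv_estimate(rows):
    """Wald/IV estimate: reduced-form effect of z on w over first-stage effect of z on y."""
    reduced_form = diff_in_means(rows, 0, 2)
    first_stage = diff_in_means(rows, 0, 1)
    return reduced_form / first_stage


rows = simulate()
ols = ols_estimate(rows)
iv = iv_estimate(rows)
print(f"OLS: {ols:.1f} g, IV: {iv:.1f} g (true effect: 150 g)")
```

Under these assumptions the OLS estimate falls well below the true effect while the IV estimate is centred on it, though with a larger sampling error, which mirrors the qualitative pattern in Table 2.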
References

Abascal, W., Esteves, E., Goja, B., Gonzalez Mora, F., Lorenzo, A., Sica, A., Triunfo, P., Harris, J.E., 2012. Tobacco control campaign in Uruguay: a population-based trend analysis. Lancet 380 (9853), 1575–1582.
Abrevaya, J., 2006. Estimating the effect of smoking on birth outcomes using a matched panel data approach. Journal of Applied Econometrics 21 (4), 489–519.
Abrevaya, J., Dahl, C.M., 2008. The effects of birth inputs on birthweight: evidence from quantile estimation on panel data. Journal of Business and Economic Statistics 26 (4), 379–397.
Althabe, F., Colomar, M., Gibbons, L., Belzan, J.M., Buekens, P., 2008. Tabaquismo durante el embarazo en Argentina y Uruguay [Smoking during pregnancy in Argentina and Uruguay]. Medicina (Buenos Aires) 68, 48–54.
Amemiya, T., 1978. A note on a random coefficient model. International Economic Review 19 (3), 793–796.
Anger, S., Kvasnicka, M., Siedler, T., 2011. One last puff? Public smoking bans and smoking behavior. Journal of Health Economics 30 (3), 591–601.
Batech, M., Tonstad, S., Job, J.S., Chinnock, R., Oshiro, B., Allen Merritt, T., Page, G., Singh, P.N., 2013. Estimating the impact of smoking cessation during pregnancy: the San Bernardino County experience. Journal of Community Health 38 (5), 838–846.
Bertrand, M., Duflo, E., Mullainathan, S., 2004. How much should we trust differences-in-differences estimates? Quarterly Journal of Economics 119 (1), 249–275.
Blecher, E., 2008. The impact of tobacco advertising bans on consumption in developing countries. Journal of Health Economics 27 (4), 930–942.
Cameron, A.C., Gelbach, J.B., Miller, D.R., 2008. Bootstrap-based improvements for inference with clustered errors. Review of Economics and Statistics 90 (3), 414–427.
Carpenter, C., Cook, P.J., 2008. Cigarette taxes and youth smoking: new evidence from national, state, and local Youth Risk Behavior Surveys. Journal of Health Economics 27 (2), 287–299.
Castellanos, M.E., Munoz, M.I., Nebot, M., Paya, A., Rovira, M.T., Planasa, S., Sanroma, M., Carreras, R., 2000. Validez del consumo declarado de tabaco en el embarazo [Validity of the declared tobacco consumption in pregnancy]. Aten Primaria 26 (9), 629–632.
Chaloupka, F.J., Peck, R., Tauras, J.A., Xu, X., Yurekli, A., 2010. Cigarette Excise Taxation: the Impact of Tax Structure on Prices, Revenues, and Cigarette Smoking. Cambridge, Massachusetts, National Bureau of Economic Research, Working Paper 16287, August.
Chaloupka, F.J., Yurekli, A., Fong, G.T., 2012. Tobacco taxes as a tobacco control strategy. Tobacco Control 21 (2), 172–180.
CLAP, 2001. Sistema Informático Perinatal en el Uruguay 15 Años de Datos 1985–1999. Centro Latinoamericano de Perinatologia y Desarrollo Humano, Publicación Científica del CLA, Montevideo, Uruguay, pp. P1485.
Colman, G., Grossman, M., Joyce, T., 2003. The effect of cigarette excise taxes on smoking before, during and after pregnancy. Journal of Health Economics 22 (6), 1053–1072.
Curti, D., 2013. El comercio ilícito en Uruguay y su relación con los impuestos: resultados de investigación [Illicit trade in Uruguay and its relation to taxes: research results]. Centro de Investigación para la Epidemia de Tabaquismo (CIET), May 9, Montevideo.
da Veiga, P.V., Wilder, R.P., 2008. Maternal smoking during pregnancy and birthweight: a propensity score matching approach. Maternal Child Health Journal 12 (2), 194–203.
DeCicca, P., Kenkel, D., Mathios, A., 2008. Cigarette taxes and the transition from youth to adult smoking: smoking initiation, cessation, and participation. Journal of Health Economics 27 (4), 904–917.
Dirección General Impositiva, 2012. Volúmenes físicos de bienes gravados por el IMESI – Series anuales (archivo xls). Montevideo Dirección General Impositiva, República Oriental del Uruguay.
Emery, S., Kim, Y., Choi, Y.K., Szczypka, G., Wakefield, M., Chaloupka, F.J., 2012. The effects of smoking-related television advertising on smoking and intentions to quit among adults in the United States: 1999–2007. American Journal of Public Health 102 (4), 751–757.
Esteves, E., Gambogi, R., Saona, G., Cenández, A., Palacio, T., 2011. Tratamiento de la dependencia al tabaco: experiencia del Fondo Nacional de Recursos [Treatment of tobacco dependence: experience of the National Resource Fund]. Revista Uruguaya de Cardiología 26 (3), 78–83.
Hammond, D., 2011. Health warning messages on tobacco products: a review. Tobacco Control 20 (5), 327–337.
Hansen, C.B., 2007. Generalized least squares inference in panel and multilevel models with serial correlation and fixed effects. Journal of Econometrics 140, 670–694.
Harris, J.E., 1987. The 1983 increase in the federal excise tax on cigarettes. In: Summers, L.H. (Ed.), Tax Policy and the Economy, vol. 1. MIT Press, Cambridge, MA, pp. 87–111.
Harris, J.E., Balsa, A.I., Triunfo, P., January 2014. Tobacco Control Campaign in Uruguay: Impact on Smoking Cessation During Pregnancy and Birth Weight. National Bureau of Economic Research, Working Paper No. 19878, Cambridge, MA.
Harris, J.E., Connolly, G.N., Brooks, D., Davis, B., 1996. Cigarette smoking before and after an excise tax increase and an antismoking campaign – Massachusetts, 1990–1996. MMWR Morbidity and Mortality Weekly Report 45 (44), 966–970.
Himes, S.K., Stroud, L.R., Scheidweiler, K.B., Niaura, R.S., Huestis, M.A., 2013. Prenatal tobacco exposure, biomarkers for tobacco in meconium, and neonatal growth outcomes. Journal of Pediatrics 162 (5), 970–975.
Hoek, J., Wong, C., Gendall, P., Louviere, J., Cong, K., 2011. Effects of dissuasive packaging on young adult smokers. Tobacco Control 20 (3), 183–188.
Imbens, G., Wooldridge, J.M., 2014. New Developments in Econometrics. Econometrics of Cross Section and Panel Data. Lecture 7. Cluster Sampling. Centre for Microdata Methods and Practice (CEMMAP), London.
Juarez, S.P., Merlo, J., 2013. Revisiting the effect of maternal smoking during pregnancy on offspring birthweight: a quasi-experimental sibling analysis in Sweden. PLOS ONE 8 (4), e61734.
Lieberman, E., Gremy, I., Lang, J.M., Cohen, A.P., 1994. Low birthweight at term and the timing of fetal exposure to maternal smoking. American Journal of Public Health 84 (7), 1127–1131.
Lien, D.S., Evans, W.N., 2005. Estimating the impact of large cigarette tax hikes: the case of maternal smoking and infant birth weight. Journal of Human Resources 40 (2), 373–392.
Mathers, C.D., Boerma, T., Ma Fat, D., 2008. The Global Burden of Disease: 2004 Update. World Health Organization, Geneva.
McCowan, L.M., Dekker, G.A., Chan, E., Stewart, A., Chappell, L.C., Hunter, M., Moss-Morris, R., North, R.A., 2009. Spontaneous preterm birth and small for gestational age infants in women who stop smoking early in pregnancy: prospective cohort study. British Medical Journal 338, b1081.
Miura, M., 2010. Regulating Tobacco Product Pricing: Guidelines for State and Local Governments. Tobacco Control Legal Consortium, Saint Paul, MN.
Mons, U., Nagelhout, G.E., Allwright, S., Guignard, R., van den Putte, B., Willemsen, M.C., Fong, G.T., Brenner, H., Potschke-Langer, M., Breitling, L.P., 2013. Impact of national smoke-free legislation on home smoking bans: findings from the International Tobacco Control Policy Evaluation Project Europe Surveys. Tobacco Control 22 (e1), e2–e9.
Newey, W.K., West, K.D., 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55 (3), 703–708.
Permutt, T., Hebel, J.R., 1989. Simultaneous-equation estimation in a clinical trial of the effect of smoking on birth weight. Biometrics 45 (2), 619–622.
Powell, L.M., Tauras, J.A., Ross, H., 2005. The importance of peer effects, cigarette prices and tobacco control policies for youth smoking behavior. Journal of Health Economics 24 (5), 950–968.
Raatikainen, K., Huurinainen, P., Heinonen, S., 2007. Smoking in early gestation or through pregnancy: a decision crucial to pregnancy outcome. Preventive Medicine 44 (1), 59–63.
Ringel, J.S., Evans, W.N., 2001. Cigarette taxes and smoking during pregnancy. American Journal of Public Health 91 (11), 1851–1856.
Saffer, H., Chaloupka, F., 2000. The effect of tobacco advertising bans on tobacco consumption. Journal of Health Economics 19 (6), 1117–1137.
Wakefield, M., Chaloupka, F., 2000. Effectiveness of comprehensive tobacco control programmes in reducing teenage smoking in the USA. Tobacco Control 9 (2), 177–186.
Walker, M.B., Tekin, E., Wallace, S., 2009. Teen Smoking and Birth Outcomes. Southern Economic Journal 75 (3), 892–907.
World Health Organization, 2012. WHO Global Report: Mortality Attributable to Tobacco. World Health Organization, Geneva.
Yan, J., Groothuis, P.A., August 2013. Timing of Prenatal Smoking Cessation or Reduction and Infant Birth Weight: Evidence from the United Kingdom Millennium Cohort Study. Appalachian State University, Department of Economics Working Paper, Number 13–16, Boone, NC.

Financing and funding health care: Optimal policy and political implementability☆

Robert Nuscheler 1, Kerstin Roeder ∗
University of Augsburg, Department of Economics, Universitätsstraße 16, 86159 Augsburg, Germany

☆ The paper gained from discussions with participants of the following workshops, seminars and conferences: EHEW in Sevilla, ECHE in Zurich, Conference of the German Economic Association in Göttingen and seminars at the HCHE and the University of Siegen. We are particularly grateful to Mathias Kifmann, Friedrich Breyer, and Hartmut Kliemt for their comments and suggestions.
∗ Corresponding author. Tel.: +49 821 598 4477; fax: +49 821 598 4175.
E-mail addresses: robert.nuscheler@wiwi.uni-augsburg.de (R. Nuscheler), kerstin.roeder@wiwi.uni-augsburg.de (K. Roeder).
1 Tel.: +49 821 598 4202; fax: +49 821 598 4232.

Article history:
Received 2 July 2014
Received in revised form 10 March 2015
Available online 15 May 2015

JEL classification: H24, I14, I18

Keywords: Health care financing; Provider payment; Service quality; Cost containment; Political economy

Abstract

Health care financing and funding are usually analyzed in isolation. This paper combines the corresponding strands of the literature and thereby advances our understanding of the important interaction between them. We investigate the impact of three modes of health care financing, namely, optimal income taxation, proportional income taxation, and insurance premiums, on optimal provider payment and on the political implementability of optimal policies under majority voting. Considering a standard multi-task agency framework we show that optimal health care policies will generally differ across financing regimes when the health authority has redistributive concerns. We show that health care financing also has a bearing on the political implementability of optimal health care policies. Our results demonstrate that an isolated analysis of (optimal) provider payment rests on very strong assumptions regarding both the financing of health care and the redistributive preferences of the health authority.

1. Introduction

Health care funding traditionally receives great attention from health economists. The primary topic of interest is optimal provider payment and how the environment, e.g., competition and information, shapes the optimal reimbursement system (see Chalkley and Malcomson, 2000, for an overview). The question of how the required revenue to reimburse providers should be raised, that is, how health care financing should be organized, has received less attention. The normative literature typically asks whether a social health care system is suited to improve social welfare and if so what size of the system is optimal (see, e.g., Blomqvist and Horn, 1984; Cremer and Pestieau, 1996, and Breyer and Haufler, 2000). Positive financing frameworks investigate how the politico-economic equilibrium depends on voter heterogeneity (see, e.g., Epple and Romano, 1996a,b, and Gouveia, 1997). Surprisingly, health care financing and funding are mostly analyzed in isolation: research on provider payment ignores how health care is being financed and the financing literature neglects how the funds are being used. By simultaneously analyzing health care financing and funding our study fills this gap and thereby advances the understanding of the important health care financing and funding interaction. Additionally, the current article analyzes both the optimal and the political allocation. This allows us to identify inefficiencies that are rooted in the political decision-making process and to assess whether optimal policies are politically feasible.

We investigate the impact of three modes of health care financing, namely, optimal income taxation, proportional income taxation, and insurance premiums, on optimal provider payment and on the political implementability of optimal policies under majority voting. We consider a standard multi-task agency framework where the health care provider chooses the quality of care and cost-reducing effort. More quality in health care is beneficial to patients and to the provider who is considered (partially) altruistic. More quality increases treatment costs and cost reduction effort lowers them. Individuals differ along two dimensions, i.e., risk and
198 R. Nuscheler, K. Roeder / Journal of Health Economics 42 (2015) 197–208
income. Given this heterogeneity an allocation is assessed along three dimensions: quality, effort, and the distribution of income (or, equivalently, the numéraire commodity). If optimal income taxation is feasible (or in the absence of redistributive concerns) and if quality and effort are contractible, the first-best allocation can be implemented. When quality and effort are non-contractible, the health authority uses a linear cost-sharing arrangement to steer the provider’s incentives to invest in quality and to exert effort. As the health authority has two margins but only one instrument, the first-best allocation can no longer be implemented. The health authority then uses the cost-sharing parameter to optimally trade off the inefficiencies in quality and effort. If health care financing is through optimal income taxes, this tradeoff is not blurred by any redistributive consequences which the financing of health care provision might have.

The second-best allocation is then contrasted with allocations under alternative financing regimes, namely, proportional income taxes and insurance premiums. We call the resulting allocations third-best. With proportional income taxation, income is redistributed from high-income agents to low-income ones and from low-risk agents to high-risk ones. Depending on the distributional characteristics of risk and income the third-best policy may imply more cost-sharing than the second-best policy and with it higher quality and less effort, causing health care expenses to be higher. When health care financing is through insurance premiums the second-best quality-effort tradeoff is affected if and only if insurance premiums involve some pooling. Then, premiums redistribute income from low-risk agents to high-risk ones with the extent being governed by the degree of pooling. The comparison between the second-best and third-best allocations hinges on the distributional characteristic of risk and on the extent of pooling.

To complete the picture, we derive the allocations under majority voting and contrast them with the optimal policies for both proportional income taxes and insurance premiums. While the redistributive preferences of the health authority are governed by the distributional characteristics, the preferences of the median voter depend on individual heterogeneity. This implies that only in knife-edge cases can the optimal (third-best) policies be implemented as political equilibria. For the case of proportional income taxes the comparison of the two allocations depends on how the relative inequity between risk and income compares to the relative distributional characteristics between these two dimensions. For insurance premiums it is only the inequity in risk together with the extent of pooling and its relation to the distributional characteristic of risk that matters.

Finally, it should be noted that, rather remarkably, risk-rated premiums imply second-best optimal health care provision for both the third-best allocation and the political outcome. The reason

environment. It may relate to the quality elasticity of demand as in the first three papers, or to the complementarity between the different dimensions under consideration (the latter two papers). We use a simplified version of their models enabling us to integrate health care financing. There are only two articles we are aware of that consider a median voter approach to provider payment, namely, Gravelle (1999) and Nuscheler (2003). These papers look at how optimal capitation payments for physicians relate to the ones that would be implemented by majority voting. Neither paper considers a multi-task agency framework, and both remain silent about health care financing.

Second, the health care financing literature. The normative literature typically takes an optimal income taxation approach and asks whether there is a case for redistributive social health care financing in the presence of progressive income taxation. Blomqvist and Horn (1984) and Cremer and Pestieau (1996), for instance, show that the desirability of social health insurance in parallel to an optimal income taxation scheme crucially depends on the correlation between income and health risk. For the empirically relevant case of a negative correlation, a redistributive public health care system can improve on a purely private health care market.2 Breyer and Haufler (2000) advocate a strict separation of income redistribution and health care financing as this would allow for better health insurance contracts (in terms of ex post moral hazard) and more efficient public financing in general (lower shadow costs of public funds). Political feasibility of optimal policies and provider reimbursement are ignored. The positive literature on health care financing aims at explaining the existence of public health care, its size and its form of financing. Epple and Romano (1996a) and Gouveia (1997) were the first to address these issues.3 The former paper considers agent heterogeneity in income and shows that there is an ‘ends against the middle’ equilibrium when public health care can be topped up by actuarially fair private health insurance. Gouveia (1997) shows that this result continues to hold when heterogeneity in risk is added to the framework. Both papers derive conditions under which a mixed health care system with public and private health care financing arises. The mode of public financing, however, is taken as given. We explicitly analyze the consequences of alternative financing regimes on economic allocations. Rather than taking a reduced form approach where a health good is uniformly distributed to those who need it, we add a multi-task provider payment setting to the model. Finally, Epple and Romano (1996a) and Gouveia (1997) offer no normative analysis. By contrast, the current article studies normative and positive allocations and demonstrates how they compare to one another. Kifmann (2005) extends Gouveia’s analysis by introducing a constitutional stage where voters have a say on the mode of health care financing. But, again, a normative analysis is missing as well as the integration
being that risk-rated premiums preclude any form of redistribution. of provider payment.
There is then no conflict in the electorate about how to shape the Finally, our paper relates to the normative literature that ana-
health care system and second-best health care provision results. lyzes both, health care financing and funding. Zeckhauser (1970)
From the normative end it does not pay off to distort the opti- was the first to simultaneously analyze provider payment and
mal policy away from the second-best as the associated efficiency health care financing. Ma and McGuire (1997) generalized this
losses are not compensated by redistributive gains. As the resulting framework. These papers analyze optimal health insurance in an
income distribution may not be optimal the equilibrium alloca- ex post moral hazard setting. We consider a multi-task agency set-
tion may not be second-best efficient. Our results demonstrate that up instead and investigate a much richer set of financing regimes.
studies on optimal provider payment that neglect health care finan- Moreover, their frameworks are normative in nature. An analysis
cing rest on very strong assumptions regarding the redistributive
motives of the health authority, or on health care financing.
This article relates to two strands of the health economics
literature. First, provider payment. The papers of Chalkley and 2
Kifmann and Roeder (2011) extend the analysis to premium subsidies and exam-
Malcomson (1998a,b), Ma (1994), and, more recently, Eggleston ine whether this approach is superior to social health insurance from a welfare
(2005), and Kaarbøe and Siciliani (2011), show that mixed payment perspective. For a negative correlation they find that combining premium subsidies
with social health insurance is the optimal policy.
systems, i.e., a combination of capitation payments and cost- 3
Epple and Romano (1996b) is another example. In this article the authors inves-
sharing, will generally be optimal. Whether quality incentives tigate a framework where individuals can opt out the public plan and buy private
are high powered or low powered depends on the respective health insurance. As a result, preferences are no longer single-peaked.
R. Nuscheler, K. Roeder / Journal of Health Economics 42 (2015) 197–208 199
of how their outcomes relate to those that would be implemented in a political process is missing. Our model is considerably richer in this respect, allowing us to analyze the important interaction between health care financing and funding in normative and positive settings in great detail and thereby to contribute to a rather slim financing and funding literature.

The remainder of the paper is organized as follows. Section 2 introduces the basic framework, followed by a normative analysis in Section 3. Political outcomes are derived and compared to the respective normative allocations in Section 4. We discuss model extensions in Section 5. Section 6 concludes.

2. The model

2.1. Individuals

We consider a continuum of individuals of size one who differ along the two most important dimensions when it comes to health care financing and health care provision, namely, income and risk. There are two income types $i = r, p$ (rich and poor) with income levels $y_r > y_p > 0$. The probability of falling ill is denoted $\theta_j$ and assumes the value $\theta_l \in (0, 1)$ for low-risk individuals and $\theta_h \in (\theta_l, 1)$ for high-risk individuals. Both income and risk are exogenously given. The two-dimensional heterogeneity gives rise to $ij$-types and we denote their share in the population $\pi_{ij} \in [0, \tfrac{1}{2})$, where the upper bound is introduced for the sake of interest.⁴ To ease notation we define $\pi_i \equiv \pi_{il} + \pi_{ih}$ and $\pi_j \equiv \pi_{pj} + \pi_{rj}$. In the following, we assume $\pi_p > 0.5$, that is, median income, $y_p$, is below average income, $\bar{y} = \pi_p y_p + \pi_r y_r$. In addition, the majority of individuals is exposed to a low health risk, $\pi_l > 0.5$. This implies that median risk, $\theta_l$, is smaller than average risk, $\bar{\theta} = \pi_l \theta_l + \pi_h \theta_h$. Table 1 summarizes.

Table 1
Agent heterogeneity.

                        Income
                    p              r
Health risk   l     $\pi_{pl}$     $\pi_{rl}$     $\pi_l > 0.5$
              h     $\pi_{ph}$     $\pi_{rh}$     $\pi_h < 0.5$
                    $\pi_p > 0.5$  $\pi_r < 0.5$  1

⁴ In our political game an $ij$-type with $\pi_{ij} \ge \tfrac{1}{2}$ could dictate the allocation.

Sickness inflicts a disutility $L > 0$ on individuals. Through the receipt of care these costs can be mitigated but not eliminated. We refer to this reduction as the benefit of treatment and denote it $b \in [0, L)$. There is one health care provider (HCP), e.g., a hospital, specialist, or general practitioner, who treats all patients in need of care. We summarize all provider activities that aim at increasing $b$ in a single, one-dimensional quality index $q$.⁵ We let $b_q > 0$ and $b_{qq} \le 0$, that is, an improvement in health care quality increases the benefits from treatment but at a decreasing rate.

⁵ This assumption rules out multi-task quality issues when it comes to optimal provider payment (Chalkley and Malcomson, 1998a,b; Kaarbøe and Siciliani, 2011).

Health care is publicly financed. We distinguish between three financing regimes indexed $n \in \{T, t, p\}$. The government either uses individualized lump-sum transfers $T_{ij}$ (optimal income taxation, $n = T$), proportional income taxes with tax rate $t \ge 0$ ($n = t$), or insurance premiums ($n = p$) to generate the revenue required to reimburse the HCP. Insurance premiums can be either risk-based, pooled, or a mixture of both. More precisely, the insurance premium is given by $[(1 - \phi)\theta_j + \phi\bar{\theta}]p$, where $p$ is the price of health care and $\phi \in [0, 1]$ the extent of pooling.

In addition to the (expected) benefits from health care, agents derive utility from consumption of a numéraire commodity denoted $x$. Assuming quasi-linear preferences, the optimization problem of individual $ij$ can be written as⁶

\[ \max_{x_{ij}} \; U_{ij} = x_{ij} + \theta_j [b(q) - L], \tag{1} \]

where the budget constraint amounts to $x_{ij} = y_i - T_{ij}$, $x_{ij} = (1 - t)y_i$, and $x_{ij} = y_i - [(1 - \phi)\theta_j + \phi\bar{\theta}]p$ for financing regimes $T$, $t$, and $p$, respectively.

⁶ We assume that patients passively accept the quality of treatment the HCP is willing to provide.

2.2. The health care provider

When treating a patient the provider incurs treatment costs $K$. As health care always includes a random element, these costs are uncertain. From an ex ante perspective only expected treatment costs, $c = E(K)$, matter. Despite this uncertainty the HCP has an influence on costs and we suggest that $c$ is a function of the quality of care, $q$, and the provider’s effort to keep treatment costs down, $e$.⁷ We let the expected treatment cost function $c(q, e)$ satisfy $c_q > 0$, $c_{qq} > 0$, $c_e < 0$, $c_{ee} \ge 0$ and $c_{eq} = 0$. Expected treatment costs, thus, increase with quality and do so at an increasing rate. More effort implies lower treatment costs in expectation but at a decreasing rate.⁸ The provider’s effort to contain costs involves a disutility $v(e)$ per patient, where $v_e > 0$ and $v_{ee} > 0$. Finally, the HCP incurs a fixed cost $F$.

⁷ Although we are not concerned with multiple quality dimensions we have a multi-task agency problem: the HCP chooses quality and effort.

⁸ The assumption that the cross derivative vanishes is made for analytical convenience.

Total provider reimbursement is denoted $P$ and the actual payment is determined by a simple linear cost-sharing arrangement, $P = \gamma K + \tau$. The parameter $\gamma \in [0, 1]$ is the extent to which the health authority (HA) is willing to share treatment costs. As an additional compensation the HCP may receive a lump-sum payment $\tau \in \mathbb{R}$ per patient treated. The expected reimbursement is given by

\[ E(P) = \gamma c(q, e) + \tau. \tag{2} \]

In addition to reimbursement the HCP derives utility from the patients’ benefits generated through treatment. This gives rise to the HCP’s expected payoff:

\[ H(q, e) = \bar{\theta}\,[\alpha b(q) - c(q, e) - v(e) + E(P)] - F, \tag{3} \]

where $\bar{\theta}$ is the share of agents treated, that is, all individuals who are sick. The parameter $\alpha$ captures the HCP’s degree of altruism towards patients’ health benefits (see, e.g., Ellis and McGuire, 1990; Eggleston, 2005; Jack, 2005). For $\alpha = 1$, the HCP is a ‘perfect agent’ and for $\alpha < 1$ an imperfect one.⁹

⁹ Note that even in the case of perfect agency, the individuals’ well-being does not fully enter the HCP’s utility function. The reason is that even though the HCP considers the patients’ utility from treatment, it does not take into account the financial costs of service delivery, that is, the taxes or premiums that are borne by the patients (see also Jack, 2005). So, the HCP’s and the patient’s valuation of health care services may differ even if $\alpha = 1$.

Substituting $E(P)$ in Eq. (3) by Eq. (2) we arrive at the HCP’s optimization program

\[ \max_{q, e} \; H(q, e) = \bar{\theta}\,[\alpha b(q) - (1 - \gamma) c(q, e) - v(e) + \tau] - F. \tag{4} \]

Our assumptions about the benefit function $b(q)$ and the cost functions $c(q, e)$ and $v(e)$ ensure that the HCP’s problem is concave. The first-order conditions with respect to $q$ and $e$ amount to

\[ \frac{\partial H(q, e)}{\partial q} = 0 \;\Leftrightarrow\; \alpha b_q - (1 - \gamma) c_q = 0, \tag{5} \]
\[ \frac{\partial H(q, e)}{\partial e} = 0 \;\Leftrightarrow\; -(1 - \gamma) c_e - v_e = 0. \tag{6} \]

The above system of equations yields the optimal quality of health services $q(\gamma; \alpha)$ and the optimal level of cost reducing effort $e(\gamma; \alpha)$ as functions of the cost-sharing parameter $\gamma$ and the degree of altruism $\alpha$. The optimal quality of health care services is determined such that the internalized marginal benefit of treatment (the first term of Eq. (5)) is equal to the marginal cost of providing quality (the second term). The optimal cost reducing effort is chosen to equalize the marginal benefit of lower treatment costs (the first term of Eq. (6)) with the marginal disutility of effort (the second term).

The actual quality and effort levels the HCP is willing to provide depend on the contract he has. Specifically, they depend on the cost-sharing component $\gamma$. With the help of Cramer’s rule (see Appendix A.2), we get

\[ q_\gamma \equiv \frac{dq(\gamma; \alpha)}{d\gamma} = \frac{-c_q}{\alpha b_{qq} - (1 - \gamma) c_{qq}} > 0, \tag{7} \]

\[ e_\gamma \equiv \frac{de(\gamma; \alpha)}{d\gamma} = \frac{c_e}{(1 - \gamma) c_{ee} + v_{ee}} < 0. \tag{8} \]

The more cost-based the HCP’s payment, the larger the quality he is willing to supply and the smaller the effort he exerts to reduce cost. The first effect is due to the lower price the HCP has to pay for quality improvements when the cost-sharing parameter $\gamma$ increases. The less expensive the provision of quality, the more the HCP is willing to offer. The second effect is a moral hazard effect. An increase in the share of reimbursed treatment costs undermines the HCP’s incentives to contain costs. Additionally, $q(\gamma; \alpha)$ positively depends on $\alpha$: the more the HCP cares about the patients’ health care benefits, the higher the quality of health care services he delivers.

2.3. The public health care scheme

The government or the health authority (who is the purchaser of health care services) faces two constraints, a participation constraint and a budget constraint. We assume throughout that the benefit from treatment is sufficiently large so that the HA always wants to contract with the HCP. As the HCP cannot be forced to provide health care services, the lump-sum transfer per patient $\tau$ and the cost-sharing parameter $\gamma$ must be chosen such that the HCP is willing to accept the contract. With a reservation utility of zero the participation constraint reads as¹⁰

\[ \bar{\theta}\,[\alpha\beta b(q) - (1 - \gamma) c(q, e) - v(e) + \tau] - F \ge 0. \tag{9} \]

¹⁰ A strictly positive reservation utility would simply add to the fixed costs of being active in the market.

The parameter $\beta \in \{0, 1\}$ allows us to distinguish between two scenarios. For $\beta = 0$ the monetary part of the HCP’s utility needs to be non-negative. In contrast, for $\beta = 1$ the participation constraint is less demanding as the (partial) internalization of patients’ health benefits allows for negative monetary payoffs. The functional form of (9) and the parameters $\alpha$ and $\beta$ are common knowledge.

As already noted above, the expected health care expenses, $HCE = \bar{\theta}\,[\gamma c(q, e) + \tau]$, can either be financed by optimal income taxes, proportional income taxes, or insurance premiums. To balance the public budget, public revenues need to be equal to the expected health care expenses. We have

\[ \sum_{ij} \pi_{ij} T_{ij} \equiv \bar{T} = HCE, \tag{10} \]

\[ t \sum_{i} \pi_i y_i \equiv t\bar{y} = HCE, \tag{11} \]

\[ p \sum_{j} \pi_j [(1 - \phi)\theta_j + \phi\bar{\theta}] = p\bar{\theta} = HCE, \tag{12} \]

for regimes $T$, $t$, and $p$, respectively. In equilibrium the participation constraint will be binding so that we can solve Eq. (9) for $\tau$. Inserting into the budget constraints (10)–(12) we obtain one condition for each regime that guarantees both a balanced public budget and participation of the HCP:

\[ \bar{T}(\gamma) = \bar{\theta}\,[c(q, e) + v(e) - \alpha\beta b(q)] + F, \tag{13} \]

\[ t(\gamma) = \frac{\bar{\theta}}{\bar{y}}\,[c(q, e) + v(e) - \alpha\beta b(q)] + \frac{F}{\bar{y}}, \tag{14} \]

\[ p(\gamma) = [c(q, e) + v(e) - \alpha\beta b(q)] + \frac{F}{\bar{\theta}}. \tag{15} \]

Since $e = e(\gamma)$ and $q = q(\gamma)$, the budget balancing revenue depends on the cost-sharing parameter only indirectly, that is, via the influence cost-sharing has on the HCP’s optimally chosen quality and effort. With the help of the implicit function theorem and observing the first order conditions of the HCP, Eqs. (5) and (6), we have

\[ \frac{d\bar{T}(\gamma)}{d\gamma} = \bar{y}\,\frac{dt(\gamma)}{d\gamma} = \bar{\theta}\,\frac{dp(\gamma)}{d\gamma} = \bar{\theta}\,[(c_q - \alpha\beta b_q) q_\gamma + (c_e + v_e) e_\gamma] > 0. \tag{16} \]

A higher cost-sharing parameter requires higher public revenues. This is a direct consequence of the HCP’s response to an increase in cost-sharing: more cost-sharing implies higher quality (7) and less cost reducing effort (8), both increasing expected health care spending and with it the revenues needed to balance the public budget.

2.4. The economic equilibrium

The following definition introduces the notion of economic equilibrium into our model economy.

Definition 1. (Economic equilibrium)
An allocation $(x_{pl}, x_{ph}, x_{rl}, x_{rh}, q, e)$ with policy instruments $T_{ij}/t/p$, $\gamma$ and $\tau$ constitutes an equilibrium of the economy if the following conditions hold:

(i) the utility of all agents is maximized, i.e., the program given in Eq. (1) is solved,
(ii) the health care provider’s participation constraint (9) is satisfied and its payoff is maximized, i.e., the program given in Eq. (4) is solved, and
(iii) the government’s budget constraint is balanced, i.e., for optimal income taxation Eq. (10) holds, for proportional income taxation Eq. (11) holds, and for insurance premiums Eq. (12) holds.

In an economic equilibrium the utility level obtained by type-$ij$ agents can be expressed by their indirect utility function $V_{ij}^n(\gamma)$, where

\[ V_{ij}^T(\gamma) = y_i - T_{ij}(\gamma) + \theta_j [b(q(\gamma)) - L], \tag{17} \]

\[ V_{ij}^t(\gamma) = (1 - t(\gamma)) y_i + \theta_j [b(q(\gamma)) - L], \tag{18} \]

\[ V_{ij}^p(\gamma) = y_i - [(1 - \phi)\theta_j + \phi\bar{\theta}]\,p(\gamma) + \theta_j [b(q(\gamma)) - L]. \tag{19} \]

The indirect utility function represents an individual’s preferences over the cost-sharing parameter, $\gamma$. Inspection of Eqs. (17)–(19) reveals that these preferences depend on the mode of health care financing. This already points to the different distributional properties of the three financing regimes. With individualized lump-sum transfers the government can perfectly redistribute between individuals so as to equalize their (marginal) utilities. With proportional
income taxation, income can still be redistributed from rich to poor using health care as a vehicle (see, e.g., Gouveia, 1997). While any form of redistribution is ruled out when premiums are risk-based ($\phi = 0$), partial pooling, $\phi \in (0, 1]$, implies redistribution from low-risk to high-risk agents, with the extent of redistribution increasing in $\phi$. In Sections 3 and 4 we carefully analyze how these differences affect the allocations in normative and political economy environments.

3. Optimal financing and funding

In this section we study the optimal interaction between health care financing and funding considering four different allocations: first-best, second-best, and two third-best allocations. To determine the first-best allocation we consider quality and effort contractible and let optimal income taxation be feasible, that is, health care financing is through individualized lump-sum transfers. By maintaining the optimal income taxation assumption and introducing non-contractibility for both quality and effort we obtain the second-best allocation. This allocation is then compared to two third-best allocations where quality and effort are still considered non-contractible but alternative health care financing regimes are considered, namely, proportional income taxation and insurance premiums.

Throughout our analysis we assume a HA or government who may have redistributive concerns, that is, who may aim at redistributing to the double disadvantaged in society, that is, to high-risk low-income individuals. We incorporate redistributive considerations into the analysis by letting the HA’s objective function be

\[ W = \sum_{ij} \pi_{ij} \Psi(U_{ij}), \tag{20} \]

where $\Psi(\cdot)$ is a strictly increasing and weakly concave function of individual utility levels, i.e. $\Psi'(\cdot) > 0$ and $\Psi''(\cdot) \le 0$. This formulation comprises a utilitarian HA as limiting case, $\Psi''(\cdot) = 0$. Due to quasi-linearity every income distribution would then be optimal. In other words, all redistributive concerns of the HA are contained in $\Psi$. An immediate implication is that a necessary condition for a difference between the second-best and the two third-best allocations is the presence of redistributive concerns. In the following, we investigate how these concerns shape optimal health policy.

3.1. Normative benchmarks: optimal income taxation

3.1.1. First-best

The HA maximizes (20) with respect to $T_{ij}$, $q$ and $e$ subject to the HCP’s participation constraint (9) and the public budget constraint (10). The optimization problem is¹¹

\[ \max_{T_{ij}, \tau, q, e} \; W^*(T_{ij}, \tau, q, e) = \sum_{ij} \pi_{ij} \Psi\big(y_i - T_{ij} + \theta_j [b(q) - L]\big) \quad \text{s.t. (13)}. \tag{21} \]

¹¹ Chalkley and Malcomson (1998b) argue that this welfare function includes double counting of the patient’s health care benefits. Our results do not change qualitatively when the altruistic term is dropped from the HCP’s participation constraint, that is, when $\beta = 0$ (see also Kaarbøe and Siciliani, 2011).

Inserting the participation constraint into the budget constraint and taking the derivative with respect to $T_{ij}$, $q$ and $e$ yields

\[ \frac{\partial W^*(T_{ij}, q, e)}{\partial T_{ij}} = -\pi_{ij} \Psi'_{ij} + \lambda \pi_{ij} = 0 \;\Leftrightarrow\; \Psi'_{ij} = \lambda \quad \forall\, ij, \tag{22} \]

\[ \frac{\partial W^*(T_{ij}, q, e)}{\partial q} = -c_q + [1 + \alpha\beta]\, b_q = 0, \tag{23} \]

\[ \frac{\partial W^*(T_{ij}, q, e)}{\partial e} = c_e + v_e = 0, \tag{24} \]

where $\lambda$ is the Lagrangean multiplier on the budget constraint. The first condition states that individualized lump-sum transfers should be chosen such that marginal utilities are equalized. For $\Psi'' < 0$ this implies that, for a given risk type, high-income individuals have to pay more taxes than low-income agents: $T_{rj} > T_{pj}$. Similarly, for a given income type, low-risk agents have to pay more taxes than high-risk agents as the latter suffer the utility loss from illness with a higher probability than the former: $T_{il} > T_{ih}$. We get the following ordering of individual lump-sum taxes: $T_{rl} > \max\{T_{pl}, T_{rh}\} \ge \min\{T_{pl}, T_{rh}\} > T_{ph}$. Whether $T_{pl} \lessgtr T_{rh}$ depends on the relative inequity between income and risk. Without redistributive concerns, $\Psi'' = 0$, all $T_{ij}$ that balance the public budget would be optimal.

Eq. (23) states that first-best quality, $q^*$, should be expanded until the marginal benefit to patients, $b_q$, is equal to the marginal costs of providing health care services, $c_q - \alpha\beta b_q$. Obviously, the costs of expanding quality are lower when the participation constraint of the HCP includes an altruistic component ($\beta = 1$) as compared to a situation where the HCP has to break even in monetary terms ($\beta = 0$). As a result, the first-best efficient quality level is higher in the former case and the quality differential is increasing with $\alpha$. The first-best cost reduction effort, $e^*$, equates the marginal disutility of exerting effort, $v_e$, to the marginal expected treatment cost savings, $-c_e$; see Eq. (24).

3.1.2. Second-best

The government can still impose optimal income taxes but both quality and effort are non-contractible.¹² Using the reimbursement system the HA can steer the HCP’s incentives to invest in quality and to contain costs. We focus on linear contracts as given in Eq. (2) that specify a lump-sum transfer per patient $\tau$ and a cost-sharing parameter $\gamma$.

¹² While this assumption is rather plausible for cost reducing effort it is at odds with the increasing importance of pay-for-performance schemes. Nevertheless, the assumption can be justified on a number of grounds. First, the assumption is fairly standard in the earlier literature on provider payment (see, e.g., Chalkley and Malcomson, 1998a; Chalkley and Malcomson, 1998b). Second, even the more recent contracting literature acknowledges that there are unobservable quality dimensions (see, e.g., Kaarbøe and Siciliani, 2011) and we concentrate on those. Third, measuring quality is a complicated endeavor and creates problems in its own right (as a response to report cards Dranove et al. (2003), for instance, find selection behavior on the side of providers).

Again, the HA maximizes the welfare function subject to the participation constraint (9) of the HCP and the public budget constraint (10). The HA, however, no longer optimizes over $T_{ij}$, $q$ and $e$, but over $T_{ij}$ and $\gamma$. In other words, she loses one degree of freedom as compared to the first-best problem. In determining the optimal cost-sharing parameter the HA has to consider the HCP’s optimal quality and effort responses to cost-sharing arrangements as given by Eqs. (5) and (6). Specifically, the optimization problem is given by

\[ \max_{T_{ij}, \gamma} \; W^T(T_{ij}, \gamma) = \sum_{ij} \pi_{ij} \Psi\big(y_i - T_{ij} + \theta_j [b(q) - L]\big) \quad \text{s.t. (5), (6) and (13)}. \tag{25} \]

After some rearrangements, the first order conditions with respect to $T_{ij}$ and $\gamma$ can be written as

\[ \frac{\partial W^T(T_{ij}, \gamma)}{\partial T_{ij}} = -\pi_{ij} \Psi'_{ij} + \lambda \pi_{ij} = 0 \;\Rightarrow\; \Psi'_{ij} = \lambda \quad \forall\, ij, \tag{26} \]

\[ \frac{\partial W^T(T_{ij}, \gamma)}{\partial \gamma} = \big[-c_q + [1 + \alpha\beta]\, b_q\big]\, q_\gamma - (c_e + v_e)\, e_\gamma = 0. \tag{27} \]

As in the first-best solution, individualized lump-sum transfers should be chosen such that marginal utilities are equalized. Eq. (27) yields the optimal degree of cost-sharing, $\gamma^T$, which, using the HCP’s first order conditions (5) and (6), dictates $q^T$ and $e^T$:

\[ \alpha b_q(q^T) = (1 - \gamma^T)\, c_q(q^T) \quad \text{and} \quad -(1 - \gamma^T)\, c_e(e^T) = v_e(e^T). \tag{28} \]

Implementation of the first-best allocation is generally impossible as the HA has only one instrument, the cost-sharing parameter $\gamma$, but two margins, namely, $q$ and $e$. It can easily be verified that a cost-sharing parameter $\gamma = 0$ implements the efficient effort level as the provider is the residual claimant for his cost savings. Quality provision, however, will then be inefficient unless $\alpha(1 - \beta) = 1$. As $\alpha \in [0, 1]$ and $\beta \in \{0, 1\}$, first-best quality requires perfect altruism $\alpha = 1$ and $\beta = 0$ when $\gamma = 0$. If $\alpha(1 - \beta) \ne 1$ we get $\gamma \ne 0$ and with it a distortion of the HCP’s incentives to exert effort. The HA can, thus, either implement the first-best cost reducing effort or the first-best level of quality but not both. The optimal cost-sharing parameter, $\gamma^T$, optimally trades off the inefficiencies along the two dimensions.

3.2. Third-best: the optimal financing and funding interaction

In addition to the resource constraints and the non-contractibility of quality and effort the HA now faces financing constraints. When optimal income taxation is no longer feasible the HA has to resort to alternative financing regimes, proportional income taxation and insurance premiums being the most prominent arrangements. We investigate the resulting third-best allocations in turn.

3.2.1. Proportional income taxes

With proportional income taxes the policy instruments are given by the tax rate $t$ and the cost-sharing parameter $\gamma$. The optimization problem reads as

\[ \max_{t, \gamma} \; W^t(t, \gamma, \tau) = \sum_{ij} \pi_{ij} \Psi\big((1 - t) y_i + \theta_j [b(q) - L]\big) \quad \text{s.t. (5), (6) and (14)}. \tag{29} \]

Inserting the constraints, the optimization problem simplifies to

\[ W^t(\gamma) = \sum_{ij} \pi_{ij} \Psi\!\left( \left[ 1 - \frac{\bar{\theta}}{\bar{y}} \left( c(q(\gamma), e(\gamma)) + v(e(\gamma)) - \alpha\beta b(q(\gamma)) + \frac{F}{\bar{\theta}} \right) \right] y_i + \theta_j [b(q(\gamma)) - L] \right). \]

The optimal cost-sharing parameter, $\gamma^t$, is implicitly determined by the following first order condition

\[ \frac{dW^t(\gamma)}{d\gamma} = \sum_{ij} \pi_{ij} \Psi'_{ij} \left\{ -\frac{\bar{\theta}}{\bar{y}}\, y_i \big[ (c_q - \alpha\beta b_q)\, q_\gamma + (c_e + v_e)\, e_\gamma \big] + \theta_j b_q q_\gamma \right\} = 0, \tag{30} \]

where $q_\gamma$ and $e_\gamma$ are defined as in Eqs. (7) and (8). To gain a better understanding of how redistributive concerns affect optimal cost-sharing under proportional income taxation $\gamma^t$ and thus optimal quality $q^t$ and effort $e^t$, we introduce the distributional characteristics of health risk and income (see Feldstein, 1972):

\[ \rho_\theta \equiv \frac{\sum_{ij} \pi_{ij} \Psi'_{ij} \theta_j - \sum_{ij} \pi_{ij} \Psi'_{ij} \sum_{ij} \pi_{ij} \theta_j}{\sum_{ij} \pi_{ij} \Psi'_{ij} \sum_{ij} \pi_{ij} \theta_j}, \tag{31} \]

\[ \rho_y \equiv \frac{\sum_{ij} \pi_{ij} \Psi'_{ij} y_i - \sum_{ij} \pi_{ij} \Psi'_{ij} \sum_{ij} \pi_{ij} y_i}{\sum_{ij} \pi_{ij} \Psi'_{ij} \sum_{ij} \pi_{ij} y_i}. \tag{32} \]

These measures amount to the standardized covariances between the welfare weights the government attaches to a particular $ij$-type and health risk and income, respectively. For a utilitarian HA, $\Psi'' = 0$, all individuals have the same welfare weight irrespective of health and income, that is, $\Psi'_{ij} = \Psi' > 0 \;\forall\, ij$. The absence of redistributive concerns is reflected in the distributional characteristics both assuming the value zero, $\rho_\theta = \rho_y = 0$. Similarly, without agent heterogeneity the distributional characteristics would be zero: $\theta_l = \theta_h$ implies $\rho_\theta = 0$ and $y_p = y_r$ implies $\rho_y = 0$.¹³

¹³ Interpretation of the distributional characteristics is not straightforward. Note that we made no assumptions about the correlation between income and risk. This implies that any assumption on the sign of the distributional characteristics implicitly includes assumptions on the sign and size of this correlation.

Using Eqs. (31) and (32), Eq. (30) can be rewritten as

\[ \big[-c_q + [\Omega + \alpha\beta]\, b_q\big]\, q_\gamma - (v_e + c_e)\, e_\gamma = 0, \tag{33} \]

where $\Omega \equiv (1 + \rho_\theta)/(1 + \rho_y)$. The following proposition states how the third-best allocation with proportional income taxes relates to the second-best outcome.

Proposition 1. (Optimal policy with proportional income taxes)

(i) $\rho_\theta \ge \rho_y \Leftrightarrow \gamma^t \ge \gamma^T \Leftrightarrow q^t \ge q^T$ and $e^t \le e^T$.
(ii) The third-best allocation with proportional income taxes is identical to the second-best allocation if and only if $\rho_y = \rho_\theta = 0$.

The intuition for part (i) is as follows.¹⁴ Whenever $\Omega > 1$, which is equivalent to $\rho_\theta > \rho_y$, the marginal benefit of quality provision is higher under proportional income taxes than under optimal income taxation. The reason lies in the distributional consequences of improvements in quality. High-risk individuals are more likely to benefit from higher quality care but, ceteris paribus, do not pay more taxes than low-risk individuals. Additionally, rich individuals contribute more to the financing of better quality without benefitting more than poor individuals. The increase in quality, thus, implies more redistribution towards poor and high-risk individuals. These redistributive benefits have to be weighed against the additional distortion of effort incentives and the third-best optimal degree of cost-sharing does so optimally. For $\Omega < 1$ cost-sharing in the third-best is less pronounced than in the second-best and quality and effort levels compare accordingly. Optimal cost-sharing in the third-best and second-best optima is identical if and only if $\Omega = 1$. If $\rho_\theta = \rho_y \ne 0$, then health care provision is second-best efficient. As distributional concerns matter, however, the allocation is not second-best efficient. As a consequence of suboptimal health care financing the income redistribution is inefficient. Only in the absence of redistributive concerns, $\Psi'' = 0$, or when there is no need for redistribution, $\theta_l = \theta_h$ and $y_p = y_r$, do the third-best and second-best allocations coincide (ii).

¹⁴ Analytically, the result follows, like in all other propositions and corollaries, from a simple comparison of the coefficients on quality in the respective first order conditions, here Eqs. (27) and (33).

One can certainly speculate about how the distributional characteristics compare to one another. We conjecture that the HA attaches a higher welfare weight to the disadvantaged, that is, to individuals with high health risks, $\mathrm{Cov}(\Psi'_{ij}, \theta_j) > 0$, and low income, $\mathrm{Cov}(\Psi'_{ij}, y_i) < 0$. This is equivalent to $\rho_\theta > 0$ and $\rho_y < 0$, which implies $\Omega > 1$ and with it more cost-sharing, higher quality and less effort under proportional income taxation as compared to optimal income taxation.
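As a purely illustrative numerical sketch of Eqs. (31)–(33) and Proposition 1(i), the following Python snippet computes the two distributional characteristics and their ratio for a hypothetical population and then solves the second-best and third-best first order conditions for the cost-sharing parameter. Everything specific here is an assumption made for illustration and not taken from the article: the population shares, risks, incomes and marginal welfare weights, the functional forms b(q) = q, c(q, e) = c0 + q²/2 − e and v(e) = e²/2, and all function and variable names.

```python
from math import isclose

# Hypothetical population; keys are (income type, risk type).
share = {"pl": 0.4, "ph": 0.2, "rl": 0.3, "rh": 0.1}   # each share below 1/2
theta = {"l": 0.2, "h": 0.6}                            # illness probabilities
income = {"p": 1.0, "r": 3.0}                           # income levels

def dist_characteristics(weight):
    """Standardized covariances of the marginal welfare weights with risk and income."""
    mean_w = sum(share[k] * weight[k] for k in share)
    mean_theta = sum(share[k] * theta[k[1]] for k in share)
    mean_y = sum(share[k] * income[k[0]] for k in share)
    w_theta = sum(share[k] * weight[k] * theta[k[1]] for k in share)
    w_y = sum(share[k] * weight[k] * income[k[0]] for k in share)
    return w_theta / (mean_w * mean_theta) - 1.0, w_y / (mean_w * mean_y) - 1.0

def omega(weight):
    """Ratio (1 + rho_theta) / (1 + rho_y) that enters Eq. (33)."""
    rho_theta, rho_y = dist_characteristics(weight)
    return (1.0 + rho_theta) / (1.0 + rho_y)

def provider_response(gamma, alpha):
    """Provider choices from Eqs. (5)-(6) under the assumed functional forms."""
    return alpha / (1.0 - gamma), 1.0 - gamma            # q(gamma), e(gamma)

def foc(gamma, alpha, beta, om):
    """Left-hand side of Eq. (33); om = 1 reproduces the second-best condition (27)."""
    q, e = provider_response(gamma, alpha)
    q_gamma = alpha / (1.0 - gamma) ** 2                 # Eq. (7) for these forms
    e_gamma = -1.0                                       # Eq. (8) for these forms
    c_q, b_q, c_e, v_e = q, 1.0, -1.0, e
    return (-c_q + (om + alpha * beta) * b_q) * q_gamma - (c_e + v_e) * e_gamma

def solve_gamma(alpha, beta, om, tol=1e-12):
    """Bisection on [0, 1): the condition is positive at 0 and negative near 1."""
    lo, hi = 0.0, 1.0 - 1e-6
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid, alpha, beta, om) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha, beta = 0.5, 0.0
# Hypothetical weights favouring poor and high-risk types.
weights = {"pl": 1.0, "ph": 1.4, "rl": 0.6, "rh": 0.8}

rho_theta, rho_y = dist_characteristics(weights)
gamma_T = solve_gamma(alpha, beta, 1.0)                  # optimal income taxation
gamma_t = solve_gamma(alpha, beta, omega(weights))       # proportional income tax
q_T, e_T = provider_response(gamma_T, alpha)
q_t, e_t = provider_response(gamma_t, alpha)

print(f"rho_theta={rho_theta:.4f}, rho_y={rho_y:.4f}, Omega={omega(weights):.4f}")
print(f"second-best: gamma={gamma_T:.4f}, q={q_T:.4f}, e={e_T:.4f}")
print(f"third-best:  gamma={gamma_t:.4f}, q={q_t:.4f}, e={e_t:.4f}")
```

With weights that favour poor and high-risk types, the characteristic for risk is positive and the one for income negative, so the ratio exceeds one and the sketch yields more cost-sharing, higher quality and less effort than under optimal income taxation, in line with part (i). Setting all weights equal makes the ratio one and the two solutions coincide, as in part (ii).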
3.2.2. Insurance premiums

With insurance premiums the optimization problem of the HA is given by

\[ \max_{p, \gamma} \; W^p(p, \gamma, \tau) = \sum_{ij} \pi_{ij} \Psi\big(y_i - [(1 - \phi)\theta_j + \phi\bar{\theta}]\,p + \theta_j [b(q) - L]\big) \quad \text{s.t. (5), (6) and (15)}. \tag{34} \]

Employing the same approach as in the previous section and using Eq. (31), the first order condition for the optimal cost-sharing parameter $\gamma^p$ can be written as

\[ \left[ -c_q + \left( \frac{1 + \rho_\theta}{1 + (1 - \phi)\rho_\theta} + \alpha\beta \right) b_q \right] q_\gamma - (c_e + v_e)\, e_\gamma = 0. \tag{35} \]

This condition allows us to state our next proposition where we compare the results with insurance premiums to the second-best allocation.

Proposition 2. (Optimal policy with insurance premiums)

(i) When insurance premiums entail some redistribution, $\phi > 0$, then: $\rho_\theta \ge 0 \Leftrightarrow \gamma^p \ge \gamma^T \Leftrightarrow q^p \ge q^T$ and $e^p \le e^T$.
(ii) For purely risk-based premiums, $\phi = 0$, health care provision is second-best efficient while health care financing is not.
(iii) The third-best allocation with insurance premiums is identical to the second-best allocation if and only if $\rho_y = \rho_\theta = 0$.

The intuition is similar to the one of Proposition 1. Suppose that insurance premiums imply some redistribution from low-risk to high-risk individuals as stated in part (i) of the proposition. Then an increase in quality triggered by more cost-sharing benefits high-risk individuals with a higher probability than low-risk individuals. The resulting increase in insurance premiums, however, is not shared accordingly when premiums are partially pooled. High-risk individuals disproportionately benefit from an increase in quality, making such improvements desirable whenever the HA attaches a higher welfare weight to high-risk individuals than to low-risk individuals, i.e., whenever $\rho_\theta > 0$. In this case it pays off to distort quality and effort away from their second-best levels as the resulting improvement in the distribution of income outweighs the associated losses in efficiency. There is no such distributional benefit when premiums are purely risk-rated, (ii). This result carries an important policy message: second-best optimal health care provision can be achieved by both financing regimes, optimal income

independent of the extent of pooling $\phi$. As a result, health care quality in tax based systems is expected to exceed quality in insurance financed systems. It is the other way round for cost reducing effort, leading to higher expected health care expenses in the former systems than in the latter.

4. Political implementability of optimal policies

In the previous section we assumed a normative perspective and investigated the optimal health care policies under optimal income taxation (first-best and second-best) and contrasted them with alternative financing regimes (third-best). Implementing an optimal cost-sharing parameter, however, may not be feasible politically. Under majority voting, policy makers will commit to policies that maximize the number of votes rather than the social objective. Taking a median voter approach, this section derives the political equilibria under proportional income taxation and insurance premiums and contrasts them with the respective third-best and second-best policies.

With a continuum of individuals each voting agent has zero mass, so that no individual vote can change the outcome of the election. We let all agents alive cast a ballot over the cost-sharing parameter $\gamma$. Although it appears more natural to let voters decide on the parameter which determines the financing of health care rather than on one parameter of the reimbursement scheme, this approach is without loss of generality. Remember that in an economic equilibrium only one of the three policy instruments can be set freely. The other two are then residually determined by the HCP’s participation constraint and the HA’s budget constraint. It is thus irrelevant whether voters decide on $\gamma$, $t/p$ or $\tau$. We opt for the former as it allows for a direct comparison of the equilibrium allocations of the two approaches, normative and positive.

4.1. Proportional income taxes

Individuals maximize their indirect utility function (18) with respect to the cost-sharing parameter $\gamma$ subject to the constraint on the tax rate $t$ given by Eq. (14). The first order condition of an $ij$-type amounts to¹⁵

\[ \frac{dV_{ij}^t(\gamma)}{d\gamma} = \big(-\delta_i c_q + [\delta_j + \alpha\beta\delta_i]\, b_q\big)\, q_\gamma - \delta_i (c_e + v_e)\, e_\gamma = 0, \tag{36} \]

where we defined $\delta_i \equiv y_i/\bar{y}$ and $\delta_j \equiv \theta_j/\bar{\theta}$. The above equation implicitly determines the most preferred cost-sharing parameter $\gamma_{ij}^t$ of a type-$ij$ individual.¹⁶ To see how the most preferred
taxation and risk-rated insurance premiums. It emphasizes that an cost-sharing parameter depends on income and risk we apply the
isolated (second-best) analysis of provider payment rests on strong implicit function theorem which, observing the first order condi-
assumptions regarding the financing of health care — unless, (iii), tions of the HCP, Eqs. (5) and (6), yields
the HA has no redistributive concerns.
Combining Propositions 1 and 2 we find that the optimal policy dijt (cq + ˛(1 − ˇ)bq )q + ce e
= < 0 and (37)
is sensitive to the mode of health care financing when redistribu- dıi SOC ij
tive concerns matter. Only in the special case of no redistributive
concerns is the redistribution of income, implied by a change from dijt bq q
=− > 0. (38)
one health care financing regime to the other, welfare neutral. We dıj SOC ij
emphasize this in the following corollary.
These inequalities allow us to order the four types according to their
Corollary 1. (Optimal policy: proportional income taxes vs. insur- most preferred cost-sharing parameter and with it to identify the
ance premiums) median voter.
(1 − ϕ) ≥ y ⇔ t ≥ p ⇔ qt ≥ qp and et ≤ ep .
In general, the comparison of cost-sharing across financing

regimes depends on the welfare weights the HA attaches to the
15
For convenience we neglect the constraint ∈ [0, 1]. Both, the ordering of types
with respect to their most preferred cost-sharing parameters as well as the com-
different types. We already suggested above, that the HA may wish
parison of allocations, normative and positive, remain valid.
to redistribute towards the disadvantaged in the society, namely, 16
In the following, we assume that the second order condition is satisfied,
to high-risk low-income agents. This implies > 0 and y < 0 and, 2
d Vijt /d 2 ≡ SOC ij < 0, implying that preferences are single-peaked (see Appendix
following the corollary, less cost-sharing with insurance premiums A.1.2).
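The ordering argument can be illustrated with a small numerical sketch. It is not part of the original analysis: the population shares are hypothetical, and the code assumes only what the comparative statics deliver, namely that rich low-risk (rl) agents prefer the lowest and poor high-risk (ph) agents the highest cost-sharing parameter, while the ranking of pl and rh agents is left open. Provided neither the rich nor the high-risk agents form a majority, the voter at the median should be a pl-type under either ranking.

```python
from itertools import accumulate


def median_voter(ordering, shares):
    """Return the type whose cumulative population share first reaches one half.

    ordering: types sorted by most preferred cost-sharing parameter (ascending).
    shares:   dict mapping each type to its population share (summing to one).
    """
    cumulative = accumulate(shares[t] for t in ordering)
    for voter_type, mass in zip(ordering, cumulative):
        if mass >= 0.5:
            return voter_type
    raise ValueError("shares must sum to one")


# Hypothetical shares: the rich (rl + rh) and the high-risk (rh + ph) each
# make up 40% of the population, so neither group forms a majority.
SHARES = {"rl": 0.15, "pl": 0.45, "rh": 0.25, "ph": 0.15}

# The two rankings that are compatible with the comparative statics.
PL_BEFORE_RH = ["rl", "pl", "rh", "ph"]
RH_BEFORE_PL = ["rl", "rh", "pl", "ph"]

median_if_pl_first = median_voter(PL_BEFORE_RH, SHARES)
median_if_rh_first = median_voter(RH_BEFORE_PL, SHARES)
```

With these shares both rankings select the same pivotal type, which is why the ranking of pl and rh agents does not need to be resolved in order to identify the median voter.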
Lemma 1. (Median voter: proportional income taxes) < ⇔ ıl < 1 + . High-risk individuals disproportionately benefit
The most preferred cost-sharing parameters can be ordered as fol- from quality improvements. As the median voter is a low-risk agent,
lows: rlt < min{pl
t , t } ≤ max{ t , t } < t . The median voter is
rh pl rh ph
he aims at limiting the redistribution towards high-risk individuals
t = t .
a pl-type agent, i.e., m and he does so by implementing a low cost-sharing rate. As the dis-
pl
tributional preferences of the HA will usually point in the opposite
Eq. (37) implies rjt < pj t . For a given risk type high-income
direction, > 0, cost-sharing is more pronounced in the third-best
agents contribute more to the financing of the health care scheme allocation than in the political outcome. These arguments extend to
than low-income agents. As the risk type is given, the bene- the case with inequities in both, income and risk. > : the cost-
fit from more cost-sharing in terms of higher quality is uniform sharing benefit accruing to the median voter is larger than in the
across income types making cost-sharing less attractive for high- social objective as, due to the sufficiently low health risk inequal-
income individuals. Similarly, Eq. (38) implies ilt < iht . For a given ity as compared to income inequality, the benefits of better quality
income, the expected benefits from quality improvements trig- care are distributed more evenly. In that sense the redistribution is
gered by more cost-sharing are higher for high-risk individuals better targeted towards the median voter. The HA’s redistributive
than for low-risk individuals. As financing costs are uniform across concerns are lower so that her cost-sharing incentives are weaker.
risk types, high-risk agents prefer more cost-sharing than low-risk = : although this is a knife-edge case it is noteworthy that
agents. Although the comparison of the most preferred cost-sharing the distributional preferences of the median voter and the HA are
parameters of pl-types and rh-types generally depends on the aligned so that the political outcome is identical to the third-best
distribution of types, identification of the median voter is straight- allocation.
forward: as neither the high-risks nor the rich can form a majority If the inequality in health risk and income is identical, = 1,
( h < 0.5 and r < 0.5, respectively), the median voter is a type-pl then the political outcome with health care financing through pro-
agent who either forms a majority with low-income high-risk indi- portional income taxation leads to second-best efficient health care
viduals or with high-income low-risk individuals. provision. The reason is that the median voter’s benefits from cost-
Inserting the median voter’s type into Eq. (36) and dividing by sharing through improvements in the distribution of income are
ıp allows a direct comparison with the third-best policy under pro- exactly compensated by the worsening of the distribution of health
portional income taxation. benefits. Finally, this allocation is second-best efficient if = y = 0.
We argued above that the HA likely attaches a higher welfare
(−cq + [ + ˛ˇ]bq )q − (ce + ve )e = 0, (39)
weight to high-risk or low-income individuals than to low-risk or
where ≡ ıl /ıp . While ıl ≤ 1 and ıp ≤ 1 measure the inequity in high-income ones. But then > 1 and the third-best can only be
health and income respectively, their fraction, , is a measure of implemented through a majority voting process if the inequality
the relative inequity between the two dimensions. in health risk is smaller than the inequality in income such that
= > 1. In general, however, the second-best cannot be achieved
Proposition 3. (Political outcome with proportional income taxes) as a political equilibrium.
(i) The relationship between cost-sharing arrangements in the third-

best optimum and the political equilibrium depends on how the 4.2. Insurance premiums
relative inequity along income and risk relates to the distribu-
tional characteristics of cost-sharing with respect to these two When health care financing is through insurance premiums,
dimensions. More precisely, we have: ⇔ t m t ⇔ individual preferences over the cost-sharing parameter are given
t t t
q qm and e em . t
by Eq. (19). The corresponding first order condition for is given
(ii) Health care provision is second-best efficient if and only if = 1. by
Second-best efficiency of the allocation requires the absence of
redistributive concerns, = y = 0. p
dVij ()
=− [(1 − ϕ)j + ϕ] cq q + ce e + ve e − ˛ˇbq q + j bq q = 0
d
Comparison to the first order condition for the cost-sharing (40)

parameter in the third-best, Eq. (33), shows that the difference j
⇔ −cq + + ˛ˇ bq q − (ce + ve )e = 0.
between these two equations depends on and . We discuss (1 − ϕ)j + ϕ
the three possible cases in turn. < : the benefits of the quality
of care receive a lower weight in the median voter’s first order con- First, we note that for purely risk-based premiums, ϕ = 0, all
dition than in the HA’s first order condition. Since quality provision individuals have identical most preferred cost-sharing rates so that
can be stimulated by cost-sharing, q > 0, the cost-sharing parame- there is no conflict in the electorate on how to shape health care
ter will be larger in the normative equilibrium than in the political funding. If premiums include a redistributive element, ϕ > 0, then
one.17 Suppose there is no inequity in risk, then < ⇔ ıp > 1 + y . high-risk individuals benefit more from health care provision and
This condition can only be satisfied if the HA aims at redistributing vote for a higher cost-sharing parameter than low-risk agents:
towards the poor, that is, if y < 0. The smaller y the more pro- p
dij /dj > 0. This allows us to order the four types by their most
nounced are the redistributive concerns of the HA. If the inequality
preferred cost-sharing parameter and, as a result, to identify the
in income is sufficiently small as compared to the distributional
median voter.
preferences of the HA, ıp > 1 + y , so are the benefits accruing to
the median voter from improvements in quality. This, in turn, Lemma 2. (Median voter: insurance premiums)
implies less cost-sharing in the political outcome than in the opti- The most preferred cost-sharing parameters can be ordered as fol-
mal one. Similarly, suppose there is no inequity in income, then p p p p
lows: rl = pl ≤ rh = ph . The median voter is an l-type agent, i.e.,
p p
m = il for i = r, p.
17
To see this formally observe As rich low-risk individuals share their cost-sharing preferences
t
dm bq q with poor low-risk individuals there will always be a majority for
=− > 0.
d SOC ij their most preferred cost-sharing parameter. Inserting l into Eq.
(40) we arrive at our next proposition.
Proposition 4. (Political outcome with insurance premiums)
(i) When insurance premiums entail some redistribution, ϕ > 0, then the relationship between cost-sharing arrangements in the third-best optimum and the political equilibrium depends on how the inequity in risk relates to the distributional characteristic of risk. More precisely, we have: 1 + ≥ δ_l ⇔ ^p ≥ _m^p ⇔ q^p ≥ q_m^p and e^p ≤ e_m^p.
(ii) Health care provision is second-best efficient if and only if ϕ = 0. Second-best efficiency of the allocation requires the absence of redistributive concerns, = y = 0.
If the HA aims at redistributing towards high-risk individuals, > 0, then the condition formulated in part (i) of the proposition always holds. High cost-sharing parameters imply generous redistribution towards high-risk individuals. Unlike the HA, the median voter, a low-risk agent, dislikes redistribution towards high-risk types. Accordingly, cost-sharing is less pronounced in the political outcome than under the third-best policy. This ordering can only be reversed if the HA has a sufficiently strong preference to redistribute towards low-risk agents. Remarkably, for purely risk-based premiums, ϕ = 0, the incentives of voters are aligned with those of the HA in both the second-best and the third-best environment.18 Although this result appears somewhat surprising, the intuition is relatively straightforward. The optimal allocations trade off inefficiencies along cost reduction effort and quality, and this tradeoff is not blurred by redistributive motives. Similarly, in the political game redistribution plays no role as risk-based premiums preclude any form of redistribution. The third-best allocation can thus be implemented through a majority voting process. In contrast to the proportional income taxation case, this result holds independent of agent heterogeneity. Finally, note that for purely risk-based premiums health care provision is second-best efficient. The resulting income distribution may still be inefficient unless there are no redistributive concerns.
Combining Propositions 3 and 4 we arrive at the following corollary that facilitates comparison of political outcomes under alternative financing regimes.
Corollary 2. (Political outcome: proportional income taxes vs. insurance premiums)
δ_p − δ_l ≤ ϕ(1 − δ_l) ⇔ _m^t ≥ _m^p ⇔ q_m^t ≥ q_m^p and e_m^t ≤ e_m^p.
The comparison of cost-sharing across financing regimes hinges on how risk pooling, as measured by ϕ, relates to the relative inequity between income and risk. Only in the knife-edge case, δ_p − δ_l = ϕ(1 − δ_l), is health care provision under majority voting invariant to a change from insurance premiums to proportional income taxes.
5. Discussion
One can think of many extensions of the framework analyzed in this article. Here we discuss two important ones: progressive income taxation and endogenous choice of the health care financing regime. We discuss these extensions in turn.19
5.1. Progressive income taxation
When considering tax-financed health care we look at two polar cases, namely, optimal income taxation and proportional income taxation. Although somewhat unrealistic, the former financing arrangement offers a valuable benchmark (the second-best allocation) against which the allocations under alternative financing regimes can be compared. The latter financing arrangement, proportional premiums, is the rule rather than the exception in the financing of social health insurance systems. 14 out of 16 OECD countries with such a system use proportional contributions to finance health care.20 The only exceptions are Hungary and Switzerland. While the former country relies, in part, on progressive income taxes, Switzerland is an example of a regressive financing scheme that collects partially pooled insurance premiums. Redistribution across risk types is limited insofar as individuals can choose from a set of contracts that differ in both premiums and deductibles. As individual selection is, among other things, driven by risk, pooling is incomplete, ϕ < 1. Income is relevant insofar as individuals below a canton-specific threshold receive premium subsidies.21
Tax-financed national health service systems tend to use progressive taxation to finance health care. Such a system is more redistributive than a proportional financing regime and likely less redistributive than a system that applies optimal income taxes. Therefore, the resulting allocation under progressive taxation will be some sort of mixture between the allocations under proportional income taxation and optimal income taxation. In the following, we investigate the normative and positive allocations for a simplified progressive income tax scheme.
Suppose that there are two proportional income tax rates and that the one for the rich exceeds the one for the poor. Such a scheme is more redistributive than a proportional income tax system, mitigating the role of health care in redistributive policies. This implies that the HA would optimally implement a lower degree of cost-sharing.
For the positive analysis of progressive income taxes we first have to identify the median voter. Rich low-risk individuals are still the ones who stand to benefit the least from publicly financed health care and poor high-risk individuals are the ones who benefit the most. As rich high-risk individuals can never form a majority with risk types or income types of their kind, poor low-risk individuals remain pivotal. It is, thus, the most preferred cost-sharing level of the type-pl agent that is being implemented. With progressive income taxation their net marginal benefit of health care provision is larger than with proportional income taxation as high-income individuals now contribute more to the financing of the health care scheme. This implies more cost-sharing under progressive than under proportional income taxation.
We can now compare the normative to the positive outcome. As with proportional income taxation, the comparison hinges on the marginal incentives for redistribution. As argued above, in the normative setting, these incentives are likely to be smaller under progressive income taxation than under proportional income taxation. This is the other way round in the positive setting. Consequently, the set of parameter values for which the positive outcome entails more cost-sharing than the normative outcome is larger under progressive income taxation (see Proposition 3).
18 Analytically this is the case because Eqs. (27), (35) and (40) coincide.
19 We thank two anonymous reviewers for suggesting to extend the analysis along these lines. Technical details are available as an online supplement to this article.
20 Austria*, Belgium, Czech Republic*, Estonia, Germany*, Greece, Japan*, Korea*, Luxembourg, The Netherlands*, Poland, Slovakia*, Slovenia, and Turkey. The asterisk indicates that income thresholds exist above which no additional premiums are being charged. In that sense, these schemes are proportional only up to the threshold and then regressive (Source: HiT country profiles of the European Observatory on Health Systems and Policies, available at http://www.euro.who.int/en/about-us/partners/observatory.)
21 Kifmann (2005) argues that the Swiss arrangement is similar to, e.g., the German setting where premiums are proportional up to an income ceiling.
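The inequality in Corollary 2 can be checked with a short numerical sketch. It is illustrative only and rests on a reading of the median voter's first order conditions: the weight on the quality benefit is taken to be δ_l/δ_p under proportional income taxes (Eq. (39)) and δ_l/((1 − ϕ)δ_l + ϕ) under insurance premiums (Eq. (40) for a low-risk voter, normalised by average risk). Function names and grid values are hypothetical. Since a larger weight goes along with a higher most preferred cost-sharing parameter (see footnote 17), comparing the two weights should reproduce the corollary's condition.

```python
from fractions import Fraction
from itertools import product


def weight_tax(delta_p, delta_l):
    """Weight on the quality benefit for the pl-type median voter (taxes)."""
    return delta_l / delta_p


def weight_premium(delta_l, phi):
    """Weight on the quality benefit for a low-risk median voter (premiums)."""
    return delta_l / ((1 - phi) * delta_l + phi)


def taxes_give_more_cost_sharing(delta_p, delta_l, phi):
    """True if the voter's benefit weight is at least as large under taxes."""
    return weight_tax(delta_p, delta_l) >= weight_premium(delta_l, phi)


def corollary_condition(delta_p, delta_l, phi):
    """Left-hand inequality of Corollary 2."""
    return delta_p - delta_l <= phi * (1 - delta_l)


def check_grid():
    """Compare both criteria on a hypothetical grid of exact rational values."""
    inequity = [Fraction(k, 4) for k in range(1, 5)]   # 1/4, 1/2, 3/4, 1
    pooling = [Fraction(k, 4) for k in range(0, 5)]    # 0, 1/4, ..., 1
    return all(
        taxes_give_more_cost_sharing(dp, dl, phi) == corollary_condition(dp, dl, phi)
        for dp, dl, phi in product(inequity, inequity, pooling)
    )
```

Exact fractions are used so that the knife-edge case, where both weights coincide, is not obscured by floating-point rounding. For instance, δ_p = 3/4, δ_l = 1/2 and ϕ = 1/2 satisfy the condition with equality.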
5.2. Endogenous choice of the health care financing regime
Our model considers health care financing arrangements to be exogenous and shows how these arrangements interact with
provider payment. One may argue that this approach is incomplete
as neither the HA nor the voters have a say on the mode of health
care financing. In what follows we look at the optimal financing
regime and compare it to the one that would be implemented by
majority voting.
We start with the normative analysis by assuming that a fraction
of health care expenses are financed through proportional income
taxes and the remaining part through insurance premiums.22 Sup-
pose that health insurance contracts are (partially) pooled, i.e.,
ϕ > 0. Then, health care financing via insurance premiums implies
redistribution across risk types but not across income types. If
> 0 the redistributive effect of insurance premiums is valued
by the HA.23 This redistribution is still intact when health care
financing is through proportional income taxation: taxes depend
not on risk but on income. Proportional income taxes not only
redistribute from low-risk to high-risk agents but also from high-
income to low-income ones. If > 0 and y < 0 (as we conjectured
in Section 3.2.1) the HA values both redistributive effects rendering
proportional income taxation unambiguously more attractive than insurance premiums. From the perspective of the HA this implies that health care should exclusively be financed through income taxes. Cost-sharing incentives are, thus, governed by , see Eq. (33) and Proposition 1.
For the political economy analysis we suggest sequential voting where individuals first cast a ballot on the health care financing regime (stage 1) and then on the cost-sharing parameter (stage 2). With backward induction, voting behavior at stage 1 will take the voting outcome of stage 2 into account. Suppose that health care financing is (partially) through insurance premiums. As the median voter is a type-pl agent, there will be no risk pooling as type-l individuals are not prepared to subsidize type-h individuals. But then the decision at stage 1 boils down to a comparison between proportional income taxation and risk-rated insurance premiums. While the latter scheme precludes any form of redistribution, the former comes along with redistribution across both income and risk. The median voter benefits from the redistribution across income but is negatively affected by the redistribution across risk types. If the inequity in income is sufficiently large as compared to the inequity in risk, then the type-pl agent is in favor of proportional income taxation. In this case, type-ph agents would also be in favor of proportional income taxes as they benefit from both redistribution channels. The positive analysis then predicts proportional income taxation. If, however, the inequity in income is sufficiently low as compared to the inequity in risk, then the type-pl agent will be in favor of risk-rated premiums. He would team up with type-rl agents so that the political outcome predicts risk-rated premiums.24
Comparison of equilibria is now straightforward. With income taxation in both settings, normative and positive, the difference in allocations hinges on how the relative distributional characteristics compare to the relative inequity in income and risk. More precisely, there will be more cost-sharing in the normative outcome than in the political one if and only if > (see Proposition 3). More interesting is the case where the mode of health care financing differs in the two frameworks. With risk-rated premiums in the political outcome and proportional income taxation in the normative one we get more cost-sharing in the latter if and only if > 1 (see the discussion following Proposition 1).25
Finally, this analysis demonstrates that, for political economy reasons, the prevailing health care financing regime may differ from the optimal one. This is the case when the median voter prefers risk-rated insurance premiums but the HA prefers proportional income taxes.
Fig. 1. Allocations under optimal policies and their political implementation under majority voting.
22 Optimal income taxation is considered infeasible. So we are in the third-best environment.
23 For a social health insurance system an immediate implication is that the HA would optimally choose flat premiums, that is, full pooling (ϕ = 1).
24 These results are similar to the ones obtained in the premium risk framework by Kifmann (2005), who identifies conditions under which the electorate will unanimously vote for proportional income taxes.
25 Recall that risk-rated premiums imply second-best health care provision, see Proposition 3 (ii).
6. Conclusion
Usually health care financing and health care funding are studied in isolation. This paper simultaneously analyzed the financing of health care and provider payment and thereby filled a gap in the literature. Considering a multi-task agency framework we derived the optimal allocations under varying constraints. These included alternative financing regimes yielding the first-best, second-best and third-best allocations. We then investigated whether the optimal policies can be implemented as political equilibria under majority voting. A pairwise comparison of allocations allowed us to identify conditions under which allocations coincide. In other words, we showed when the mode of health care financing is irrelevant and when optimal policies can be implemented as a median voter equilibrium (Fig. 1 in Appendix A.3 summarizes the respective conditions).
More precisely, when the quality of care and cost-reducing effort are non-contractible but optimal income taxation is feasible, the optimal allocation is second-best efficient. While the income distribution is optimal, the inefficiencies along quality and effort are, using supply-side cost-sharing, optimally traded off against one another. If optimal income taxation is not feasible, and if the health authority has redistributive concerns, health care financing has redistributive consequences which, in turn, will affect optimal provider payment.
With proportional income taxation health care financing implies a redistribution from high-income to low-income individuals and
from low-risk to high-risk individuals. The extent to which the health authority distorts provider payment away from its second-best (and with it health care provision) depends on individual heterogeneity in risk and income and on the welfare weights the health authority attaches to the different types. We show that Feldstein’s (1972) distributional characteristics can be used to characterize the third-best allocation. If the distributional characteristics of income and risk are identical, so are second-best and third-best health care provision. As income distributions might differ, the third-best allocation is not second-best efficient unless both distributional characteristics are nil. This is the case when the health authority has no redistributive concerns, or in the absence of individual heterogeneity. The political outcome is governed by the preferences of the median voter who happens to be a poor, low-risk agent. With proportional income taxation the median voter contributes less to the financing of the health care scheme than the average voter. But, at the same time, he needs health care with a lower probability than the average voter. The median voter’s preferences for cost-sharing are thus driven by the relative inequity between risk and income. These preferences will likely differ from the ones of the health authority so that the third-best allocation can generally not be implemented as a median voter equilibrium. Fig. 1 below gives the condition under which the two are identical. It should be noted that the political outcome may yield second-best health care provision. This is the case whenever the inequality in risk is identical to the inequality in income. Only in the absence of individual heterogeneity or redistributive concerns is this allocation second-best efficient.
When considering insurance premiums as a mode of health care financing, redistributive consequences are driven by the extent of risk pooling. In the extreme case of risk-based premiums (no pooling) the third-best allocation yields second-best health care provision. The reason is that risk-based premiums preclude any form of redistribution so that the health authority refrains from distorting health care provision away from its second-best. But this also implies that there is no conflict in the electorate on how to shape health care provision. The third-best allocation can thus be implemented as a political equilibrium and health care provision is second-best efficient. As the income distribution might not be optimal the resulting allocations will generally not be second-best efficient. In the event of (partially) pooled premiums, health care financing through insurance premiums involves a redistributive element. Income is redistributed from low-risk to high-risk individuals. Redistribution from high- to low-income individuals is ruled out. Consequently, the third-best allocation and the political outcome will depend on the distributional characteristic of risk and the inequity in risk, respectively. Only in the event of aligned cost-sharing preferences of the health authority and the median voter can the third-best allocation be implemented by majority voting. Again, the respective condition is given in Fig. 1.
Finally, we derived conditions under which health care financing is irrelevant in third-best and political economy environments. It turns out that health care financing will generally matter unless risk pooling relates to the distributional characteristics and individual heterogeneity in the way laid out in Fig. 1.
The framework we adopted only considers public health care and abstracts from additional public programs like, most importantly, an income transfer scheme. An immediate implication is that the former program implicitly assumes the redistributive role of the latter program (Cremer and Gahvari, 1997). Our results are largely robust to adding an income transfer scheme in the sense that individual heterogeneity, redistributive preferences, and health financing would still matter. The reason is that, apart from risk-based premiums, health care financing is inherently redistributive. Only in rather extreme and unrealistic cases would the size of the public health care program be invariant to health care financing and be independent of individual heterogeneity and redistributive preferences. More precisely, if no additional redistribution over and above the redistribution via the transfer scheme was beneficial (the transfer scheme is financed by optimal income taxes26) or if the public health care program and the transfer scheme have identical redistributive properties (e.g., if both programs are financed by proportional income taxes and transfers within the income scheme are perfectly correlated with risk).
Our results make very clear that an isolated analysis of provider payment rests on very strong assumptions regarding the mode of health care financing (optimal income taxation or risk-based premiums) or the redistributive concerns of the health authority (none). This has a bearing on the applicability of the results derived in the optimal provider payment literature. The quality-cost containment tradeoff will generally differ internationally. In principle, our framework could be used to derive testable hypotheses regarding the relationship between health care financing and supply-side cost-sharing. Actual hypothesis testing would require accurate data on the degree of cost-sharing, Feldstein’s distributional characteristics, and individual heterogeneity. To the best of our knowledge, such data is currently not available so that we leave the empirical analysis for future research.
26 This is, of course, equivalent to assuming health care financing through optimal income taxes.
Appendix A.
A.1. Second-order conditions
A.1.1. The health care provider
Since the two leading principal minors alternate in sign:
|H_qq| = |αb_qq − (1 − )c_qq| < 0
| H_qq  H_qe |   | αb_qq − (1 − )c_qq   0                 |
| H_eq  H_ee | = | 0                    (1 − )c_ee + v_ee | > 0
due to b_qq < 0, c_qq > 0, c_ee ≥ 0 and v_ee > 0, the HCP’s chosen quality level, q( ; α), and cost-reducing effort, e( ; α), are a maximum of the objective function H(q, e).
A.1.2. Individuals
The second-order condition of individual-ij amounts to
d²V_ij^t()/d ² = (−δ_i c_qq + (δ_j + αβδ_i)b_qq)(q )² + (−δ_i c_q + (δ_j + αβδ_i)b_q)q − δ_i(c_ee + v_ee)(e )² − δ_i(c_e + v_e)e ,
where the first, third and fourth terms are negative (< 0) and the second term is of ambiguous sign (≷ 0).
A.2. Comparative statics
Taking the total derivative of the HCP’s first order conditions (6) and (5) yields
(αb_qq − (1 − )c_qq)dq = −c_q d − b_q dα (41)
((1 − )c_ee + v_ee)de = c_e d (42)
as we assume c_qe = 0. The above system of equations can be written as
| dq |         | (1 − )c_ee + v_ee   0                  | | −c_q d − b_q dα |
| de | = (1/D) | 0                   αb_qq − (1 − )c_qq | | c_e d           | , (43)
where D = (˛bqq − (1 − )cqq )((1 − )cee + vee ) > 0. Hence, we Chalkley, M., Malcomson, J.M., 2000. Government purchasing of health services.
have Handbook of Health Economics 1a, 847–890.
Cremer, H., Gahvari, F., 1997. In-kind transfers, self-selection and optimal tax policy.
dq −((1 − )cee + vee )cq −cq European Economic Review 41, 97–114.
= = >0 (44) Cremer, H., Pestieau, P., 1996. Redistributive taxation and social insurance. Interna-
d D ˛bqq − (1 − )cqq
tional Tax and Public Finance 3, 281–295.
de (˛bqq − (1 − )cqq )ce ce Dranove, D., Kessler, D., McClelland, M., Satterthwaite, M., 2003. Is more information
= = < 0. (45) better? The effects of “Report Cards” on health care providers. Journal of Political
d D (1 − )cee + vee Economy 111, 555–588.
Eggleston, K., 2005. Multitasking and mixed systems for provider payment. Journal
Additionally, we have of Health Economics 24, 211–223.
Ellis, R.P., McGuire, T.G., 1990. Optimal payment systems for health services. Journal
d2 q cq cqq of Health Economics 9, 375–396.
= >0 (46)
d 2 (˛bqq − (1 − )cqq )
2 Epple, D., Romano, R.E., 1996a. Ends against the middle: determining public service
provision when there are private alternatives. Journal of Public Economics 62,
297–325.
d2 e ce cee
= < 0. (47) Epple, D., Romano, R.E., 1996b. Public provision of private goods. Journal of Political
d 2 ((1 − )cee + vee )2 Economy 104, 57–84.
Feldstein, M.S., 1972. Distributional equity and the optimal structure of public prices.
American Economic Review 62, 32–36.
A.3. The health care financing and funding interaction Gouveia, M., 1997. Majority rule and the public provision of a private good. Public
Choice 93, 221–244.
Appendix B. Supplementary data Gravelle, H., 1999. Capitation contracts: access and quality. Journal of Health Eco-
nomics 18, 315–340.
Jack, W., 2005. Purchasing health care services from providers with unknown altru-
Supplementary data associated with this article can be found, in ism. Journal of Health Economics 24, 73–93.
the online version, at http://dx.doi.org/10.1016/j.jhealeco.2015.04. Kaarbøe, O.M., Siciliani, L., 2011. Multitasking, quality and pay for performance.
Health Economics 20, 225–238.
003 Kifmann, M., 2005. Health insurance in a democracy: why is it public and why are
premiums income related? Public Choice 124, 283–308.
Kifmann, M., Roeder, K., 2011. Premium subsidies and social health insurance:
References substitutes or complements? Journal of Health Economics 30, 1207–1218.
Ma, C.A., 1994. Health care payment systems: cost and quality incentives. Journal of
Blomqvist, A., Horn, H., 1984. Public health insurance and optimal income taxation. Economics & Management Strategy 3, 93–112.
Journal of Public Economics 24, 353–371. Ma, C.A., McGuire, T.G., 1997. Optimal health insurance and provider payment.
Breyer, F., Haufler, A., 2000. Health care reform: separating insurance from income American Economic Review 87 (4), 685–704.
redistribution. International Tax and Public Finance 7, 445–461. Nuscheler, R., 2003. Physician reimbursement, time consistency, and the quality of
Chalkley, M., Malcomson, J.M., 1998a. Contracting for health services with unmon- care. Journal of Institutional and Theoretical Economics, 302–322.
itored quality. The Economic Journal 108, 1093–1110. Zeckhauser, R., 1970. Medical insurance: a case study of the tradeoff between
Chalkley, M., Malcomson, J.M., 1998b. Contracting for health services when patient risk spreading and appropriate incentives. Journal of Economic Theory 2,
demand does not reflect quality. Journal of Health Economics 17, 1–19. 10–26.
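
As a rough numerical check on the comparative statics in Eqs. (44) and (45), the sketch below compares the two formulas with finite-difference derivatives. Everything specific in it is an assumption made for illustration and is not taken from the article: the functional forms b(q) = ln q, c(q, e) = q²/2 − ln(1 + e) and v(e) = e²/2 (chosen so that c_qe = 0), the first-order conditions α b_q = (1 − γ) c_q and −(1 − γ) c_e = v_e that Eqs. (41) and (42) appear to differentiate, and the name `gamma` for the parameter that enters as (1 − γ).

```python
import math

# Hypothetical functional forms (assumptions, not from the article):
#   b(q) = ln(q), c(q, e) = q**2 / 2 - ln(1 + e), v(e) = e**2 / 2, so c_qe = 0.

def quality(alpha, gamma):
    # Solves the assumed first-order condition alpha / q = (1 - gamma) * q.
    return math.sqrt(alpha / (1.0 - gamma))

def effort(gamma):
    # Solves the assumed first-order condition (1 - gamma) / (1 + e) = e.
    return (-1.0 + math.sqrt(1.0 + 4.0 * (1.0 - gamma))) / 2.0

def dq_dgamma_formula(alpha, gamma):
    # Eq. (44): dq/dgamma = -c_q / (alpha * b_qq - (1 - gamma) * c_qq).
    q = quality(alpha, gamma)
    b_qq = -1.0 / q**2
    c_q, c_qq = q, 1.0
    return -c_q / (alpha * b_qq - (1.0 - gamma) * c_qq)

def de_dgamma_formula(gamma):
    # Eq. (45): de/dgamma = c_e / ((1 - gamma) * c_ee + v_ee).
    e = effort(gamma)
    c_e = -1.0 / (1.0 + e)
    c_ee = 1.0 / (1.0 + e) ** 2
    v_ee = 1.0
    return c_e / ((1.0 - gamma) * c_ee + v_ee)

def central_diff(f, x, h=1e-6):
    # Symmetric finite-difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2.0 * h)

if __name__ == "__main__":
    alpha, gamma = 1.0, 0.2  # illustrative parameter values only
    print("dq/dgamma:", dq_dgamma_formula(alpha, gamma),
          central_diff(lambda g: quality(alpha, g), gamma))
    print("de/dgamma:", de_dgamma_formula(gamma), central_diff(effort, gamma))
```

Under these assumed forms the formula values and the finite-difference values should agree up to numerical error, with quality rising and effort falling in `gamma`, which matches the signs stated in Eqs. (44) and (45).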


The Journal of Health Economics has no page charges

Contents lists available at ScienceDirect

Journal of Health Economics

Editorial statement on negative findings


Information disclosure and peer effects in the use of antibiotics

(a) Clinic Peer Effect (b) Consumer Learning Effect

Fig. 1. Effect of information disclosure and interaction with competition.

4. Data

Our empirical analysis is based on the quarterly individual clinic

N Mean SD Min Max

Antibiotic Antibiotic prescription (%) 42,020 52.97 28.72 0 100

Clinic Medical speciality

Township (dong) # of clinics 42,020 8.74 3.75 1 19

(1) (2) (3) (4) (5)

Disclosure −9.6725*** −9.6682*** −9.5442*** −9.3494*** −8.9836***

Observations 42,020 41,160 41,286 42,020 28,035

Fig. 4. Distribution of the changes in prescription rates. Panel (b): balanced sample. Axis label: change in prescription rate.

5.2. Clinic heterogeneity

Even though the average antibiotic prescription rates have

Fig. 5. Change in prescription rate: by each clinic in a township.

Fig. 6. Change in prescription rates: by prescription rate before information disclosure. Panels: (a) 0<prescription<25th; (b) 25th<prescription<50th; (c) 50th<prescription<75th; (d) 75th<prescription<100. Axis range in each panel: −100 to 100.

(1) (2) (3) (4)

P50 −0.2753***

Observations 2272 1538 2272 1538

(1) (2) (3) (4)

Disclosure −9.9817*** −9.6624*** −4.0114*** −19.7177***

Observations 41,878 27,949 49,405 28,044

(1) (2) (3) (4) (5) (6)

Disclosure −0.1591*** −0.0872***

Observations 17,220 14,473 9550 4923 9550 4923

(1) (2) (3) (4) (5) (6) (7)

Disclosure −0.0872***

Observations 14,473 14,251 13,410 14,473 14,425 14,473 13,206

(1) (2) (3) (4) (5) (6) (7)

D (=disclosure) −9.3494***

Observations 42,020 41,189 38,956 41,878 41,878 41,878 38,311


Recessions, healthy no more?

1. Introduction

the state unemployment rate was estimated to decrease total

∗ Tel.: +1 434 243 3729.

many recent investigations of macroeconomic variations in health

Type of mortality 1976–1995 1991–2010 Difference

All −0.0043 (0.0009)***

Cause of death 1976–1995 1991–2010 Difference

Diseases −0.0033 (0.0010)*** −0.0014 (0.0010) 0.0019 (0.0013)

External causes −0.0148 (0.0020)*** 0.0078 (0.0028)*** 0.0226 (0.0023)***

Other accidents −0.0173 (0.0033)*** 0.0086 (0.0049)* 0.0259 (0.0048)***

Fig. 5. Unemployment coefficients for disease versus external of deaths.

Type of mortality Share of deaths Predicted in no. of deaths

Point estimate 95% Conf. Interval

All deaths 1.00000 7253*** 1882–12,637

Diseases 0.9293 3988 −1221–9210

External causes 0.0707 3582*** 2845–4322

Other accidents 0.0243 1418*** 899–1942

Fig. 7. Trends in accidental poisoning mortality, by age.
