Natural and Quasi-Natural Experiments To Evaluate Cybersecurity Policies

Benjamin Dean
25 September 2016
Published in:
The Journal of International Affairs
Volume no.: 70, Issue no.: 1, Page range: 139-160, Year: 2016
GALE-476843510; ISSN: 0022-197X
Abstract
Over the past decade, numerous countries across the world have developed and implemented national
cybersecurity strategies. Each strategy comprises a set of objectives and various programs to achieve those
objectives. Tens of billions of dollars in taxpayer funds have been diverted from other purposes to pay for
these strategies. A number of countries' recent strategies are revisions of previous ones.
Unfortunately, there are still no definitive answers to questions such as: Have these strategies achieved
their overall objectives? Which programs contributed the most to these objectives? By how much (or little)?
Where have funds been most effectively spent? Where might improvements be made?
By most accounts, the cybersecurity situation globally is getting worse, in spite of the many measures
being taken. There is a real need to improve assessment and evaluation of cybersecurity policies so as to
inform and guide policy change.
With a new generation of cybersecurity strategies now being rolled out, it is timely to consider what
evaluation techniques might be employed at the outset, so as to better track the performance of programs
and the cost-effectiveness of funds spent. In doing so, public policies might better address the present state
of cybersecurity nationally and globally in the future.
One promising technique for the evaluation of some cybersecurity programs is the use of natural and
quasi-natural experiments. These broad groups of research designs and methods avoid the potentially high
cost, possible ethical issues, and impracticality of randomized controlled trials in a domain like cybersecurity.
At the same time, they provide relatively robust measures of the counterfactual and net social/economic
impact of policy decisions.
This paper will start with a background on national cybersecurity strategies. This will be followed by
an explanation of common evaluation techniques with a special emphasis on natural and quasi-natural
experiments. Finally, the paper will identify instances in which such research designs and methods might
most effectively be used to evaluate certain programs that commonly comprise cybersecurity strategies and
how such evaluations might be done in practice.
Introduction
Over the past decade, at least 20 countries have developed and implemented national cybersecurity
strategies1,2. At least five of them have updated their strategies since their first edition (Australia, Czech
Republic, Estonia, Netherlands and the United Kingdom).
The objectives within these strategies are broadly similar across countries. According to the European
Union Agency for Network and Information Security (ENISA) in 2014, the most commonly recurring
objectives of the strategies in Europe include: developing cyber defense policies and capabilities; achieving
cyber resilience; reducing cybercrime; supporting industry on cybersecurity; and securing critical
information infrastructures. These objectives are broadly similar to those in the strategies of other non-
European countries.
Countries fund and implement various activities or programs to achieve these objectives. Some
activities involve implementing new or revised legislation. Others involve discrete programs such as
research and development grants, training, awareness campaigns, or capacity building of target segments
such as small and medium enterprises.
Large amounts of public funds are being reallocated from other policy areas, such as education or
healthcare, to fund these cybersecurity strategies. For instance, in the United States, an estimated $19 billion
is expected to be allocated to cybersecurity measures in the 2017 White House budget proposal 3. While not
strictly a national cybersecurity strategy, this level of funding nonetheless amounts to a great deal given that
annual totals have exceeded $10 billion for each of the past five years. Australia's most recent
cybersecurity strategy involves a more modest AU$57.5 million per annum over four years (Australian
Government 2016). Some strategies, unfortunately, do not mention how much will be spent. Examples
include the Canadian Cybersecurity Strategy of 2010 and the European Union's Cybersecurity Strategy of 2013.
Program Evaluation: A Primer
Program evaluation is a mainstay of the evidence-based policymaker's toolkit. In their review of policy
evaluation in innovation and technology, Papaconstantinou and Polt (1997) define evaluation as a process
that seeks to determine as systematically and objectively as possible the relevance, efficiency and effect of
an activity in terms of its objectives, including the analysis of the implementation and administrative
management of such activities.
Policy decisions should lead to outcomes where the social and economic benefits of a policy
intervention outweigh the related costs. In the event that the policy intervention does not deliver a net
economic and social benefit at least equal to the long-term bond rate, then public resources are not being
used in an optimal way4. The task of evaluation methods is to determine these costs and benefits in a
statistically robust way.
There are many benefits to be gained from program evaluation. For instance, in the United States in
2013, a Government Accountability Office (GAO) survey found that of the 37 percent of federal managers
who had undertaken evaluations in the past, 80 percent reported that those evaluations contributed to a
moderate or greater extent to improving program management or performance and to assessing program
effectiveness or value5. Referring specifically to cybersecurity policy evaluations, ENISA (2014) claims
1
(OECD 2012)
2
(ENISA 2014)
3
(White House 2016)
4
(OECD 2011)
5
(GAO, 2013)
that such evaluations lead to benefits in terms of greater accountability of public action, increased credibility
domestically and with international partners, providing an evidence-based input to long-term planning, and
providing support for outreach and enhancement of public image, amongst many others.
In short, program evaluation leads to better policy decisions and improved policy outcomes over time.
Given the current state of cybersecurity, and the explicit objective of cybersecurity strategies to improve
this state, program evaluation plays an integral role in achieving these objectives over time.
Yet in the United States, evaluation of public programs is not widely used. The same GAO study found
that only 37 percent of federal managers had completed an evaluation of any program, operation, or project.
Of those who had undertaken evaluations, the factor most commonly cited as having hindered evaluations
to a great or very great extent was lack of resources to implement evaluation findings (33 percent). More
attention to and funding for the undertaking of evaluations and implementation of evaluation findings would
go a long way towards improving policy outcomes.
While in-depth financial audits have been undertaken of cybersecurity strategies in the United States
and United Kingdom, these serve a different purpose than that of program evaluations. Rather than
examining the impacts of the strategies along the stated objectives, these audits tend to focus on verifying
the correctness of financial statements, and assessing how economically, effectively and efficiently the
funds are spent 6.
The European Commission undertook an extensive ex ante impact assessment of its cybersecurity
strategy in 2013. The impact assessment is notable for its clear problem definition, identification of drivers
behind the problem, identification of shortcomings of the status quo, clear justification for policy
intervention, clear objectives, policy alternatives, and the identification of indicators/metrics to eventually
evaluate the progress in achieving the stated objectives. This is a solid foundation on which to conduct future
policy evaluation. However, the cost-benefit calculations of the assessment reveal the dearth of robust
evidence on which to base policy decisions. Many assumptions were made to estimate even the costs of
policy options. The benefits are extremely difficult to estimate for a number of reasons, including the
difficulty to assess to what extent enhanced NIS would mitigate the negative impact of security incidents 7.
There is much to be done to reliably measure what these benefits might be in the future.
Evaluation of cybersecurity strategies has been urged by various organizations in the past. The Business
Industry Advisory Committee (BIAC) to the Organisation for Economic Co-operation and Development
(OECD) recommended that efficient national cybersecurity strategies and policies should be periodically
evaluated and updated so that improvements can be implemented to face new security threats (OECD
2012). As far back as 2009, the GAO suggested that there were opportunities to enhance federal
cybersecurity in the United States through many measures, including enhancing independent annual
evaluations. These recommendations were not acted upon. In a recent example, in January 2016 the GAO
reported that although the Department of Homeland Security (DHS) had developed metrics for measuring
the performance of the National Cybersecurity Protection System (NCPS, also known as EINSTEIN), they
do not gauge the quality, accuracy, or effectiveness of the system's intrusion detection and prevention
capabilities. After spending more than $6 billion in funding over the last 10 years, DHS was thus unable
to describe the value provided by NCPS.
While the majority of European countries have provisions in their cybersecurity strategies for
monitoring and evaluation, the listed evaluation methods include progress reports, updates, and
questionnaires amongst stakeholders, among others8. What are not mentioned, however, are evaluation
6
(ENISA 2014)
7
(EC 2013)
8
(ENISA 2014)
methods of an advanced enough level to estimate the counterfactual, i.e. what would have happened were
the policy intervention not to have been pursued. This is important because without such an estimation, we
are unable to determine the impacts of the policy intervention(s) or the cost-effectiveness of the funds
spent on said intervention(s).
A number of past efforts have attempted to evaluate the cost effectiveness of cybersecurity approaches
by testing them with Red Cell attacks. More commonly referred to as penetration testing (or simply
pen testing) in cybersecurity, these exercises involve the simulation of an attack on a system, network,
piece of equipment or other facility, with the objective of proving how vulnerable that system or target
would be to a real attack9. One notable past example of such an exercise at the national level in the United
States was Eligible Receiver in 1997, which involved a red team from the NSA infiltrating the Pentagon's
systems10. Such exercises are intended to identify and then address existing vulnerabilities in systems and to
better develop incident responses or risk mitigation measures in the future. However, such exercises do
not provide an understanding of the cost-effectiveness, counterfactual or net social/economic benefits of the
security measures or programs put in place. They are thus not able to be used for the evaluation of many of
the programs or policies that appear in current cybersecurity strategies.
A number of additional approaches and techniques have been suggested for the evaluation of
cybersecurity strategies in the past. In the impact assessment for the EU Cybersecurity Strategy, the final
section of the report covers monitoring and evaluation. Of the methods proposed to evaluate the strategy's
three objectives, the recurring methods include surveys of competent authorities, comparative
implementation reports, progress reports, and outcome assessments. The monitoring indicators provided
only cover input metrics, such as the number of Member States having appointed a NIS competent authority
which is adequately staffed and equipped to carry out EU-level cooperation, rather than output metrics
such as reduction in breach incidents or reductions in economic losses from cybercrime.
The BIAC to the OECD recommended that evaluation of cybersecurity strategies might be achieved
through the compilation of periodic comparisons between the strategies of different nations, the production
of country reports to share information about security incidents and the level of damages, and the planning
of recurrent cybersecurity risk assessments11. Unfortunately, these recommended measures lacked specific
techniques for evaluation. In the same report, the Civil Society Internet Society Advisory Council (CSISAC)
to the OECD suggested that assessment might be achieved by measuring cybersecurity strategies against
their impacts on metrics such as values recognised by democratic societies, such as freedom of expression,
privacy, due process, and transparency. These values, however, are not metrics in the strict sense of
the word12.
ENISA (2014) has previously advocated for a logical framework (log frame) model to evaluate
cybersecurity policies. This model involves identifying the program objectives, inputs (financial and
personnel), activities, outputs, outcomes and impacts, and final evaluation. The final evaluation then flows
back into the future policymaking process. However, none of these approaches advocates the use of a
research design that employs the rigor that is required to accurately and reliably determine the net social
and economic costs of a cybersecurity policy intervention.
Complicating Factors for Evaluation of National Cybersecurity Strategies
Inherent in any attempt at evaluating national cybersecurity strategies are the complications that arise
from overlapping and sometimes contradictory outcomes at different levels of cybersecurity. One could
9
(Henry 2012)
10
(Beidleman 2009)
11
(OECD 2012)
12
A standard of measurement being a commonly accepted definition of metrics.
conceive of cybersecurity as existing across an international level (inter-state relations), national level,
organizational level (broken up into public and private organizations, then again by size of the organization
[by headcount or revenue]), and the individual level. Improving cybersecurity for entities or actors at one
level may come at the expense of cybersecurity for those at other levels.
This has been very clearly demonstrated in the decades-long tug-of-war between national law
enforcement/intelligence services and civil liberty/private sector entities over encryption (also known as the
Crypto-wars)13. The most recent chapter of this story is the FBI/Apple case regarding the phone of one
of the perpetrators of the San Bernardino shootings in the United States. On the one hand, permitting law
enforcement/intelligence services to insert backdoors into encryption standards, so as to permit the
decryption of communications by designated bad actors, may result in greater overall security (e.g. in the
form of thwarted terrorist attacks). On the other hand, such a measure comes at the known expense of the
(cyber)security of an individual's communications and exchanges with private sector entities through, for
example, online shopping or banking (ibid). Thus, implementing measures to improve (cyber)security at
one level comes at the expense of (cyber)security at other levels.
Such conflicts or contradictions can also be seen at the international level. For instance, all countries
might benefit on a net basis from international cooperation around increasing cybersecurity and fighting
cybercrime. Yet such cooperation does not occur organically, even between countries or regions with
commonly held values and interests (e.g. the European Union and United States). To some extent, this
divergence is driven by the interests of the nation states, or organizations within those nation states such as
intelligence agencies, which may have an interest in continuing espionage that is facilitated by poor
cybersecurity.
Finally, it is difficult to simulate relatively more beneficial alternative courses of actions based purely
on the evaluation of past empirical experience. At a national level, it may make sense to pursue policies that
in the past have effectively addressed safety concerns brought on by drastic technological change (e.g.
United States automobile safety and product liability in the 1950s and 1960s). Likewise, it may be relatively
more beneficial to transition away from old cybersecurity paradigms, such as firewalls (building higher
walls), towards a cloud-based paradigm that removes the need for security at an individual or user level.
Yet such efforts may not be considered due to strong interests in maintaining the status quo at an
organizational level, particularly in the private sector.
Overcoming contradictions and trade-offs such as these at different levels of cybersecurity policy
analysis is the challenge that faces policymakers today. Evaluation of the outcomes of policies at different
levels will thus be essential so as to develop policies that maximize net social and economic benefits, across
some or all levels, in the future.
Randomized Controlled Trials
A number of methodologies and research designs might be deployed to evaluate policy interventions
and thereby determine the counterfactual and the net social and economic costs or benefits of a policy option.
The gold standard for research design to evaluate policies is the randomized controlled trial 14.
The simplest form of this method involves randomly allocating a population into two groups: a control
group and a treatment group. As the names suggest, the treatment group will receive the intended
intervention (in the case of cybersecurity policy, this might be training or a health check assessment). The
control group will receive nothing. The two groups will be tracked across certain variables. Once the study
13
(Abelson et al. 2015).
14
(Sanson-Fisher et al 2007, Lee & Lemieux 2013)
has ended and the results have been analyzed, the difference between the variables for the two groups will
represent the impact of the intervention. The control group represents the counterfactual, that is, what would have
happened without the intervention. This allows for an estimation of the effects of the policy intervention,
which in turn permits an estimation of its cost-effectiveness.
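The logic of the simplest randomized controlled trial can be sketched in a few lines of Python. This is an illustrative simulation only: the population of organizations, the outcome (security incidents per year), the size of the treatment effect, and the program cost are all hypothetical values chosen for the example, not figures from any study.

```python
import random
import statistics


def simulate_rct(n_orgs=1000, true_effect=-2.0, seed=0):
    """Randomly split hypothetical organizations into two groups and return
    the difference in mean outcomes (treatment minus control)."""
    rng = random.Random(seed)

    # Random allocation: shuffle the population and treat the first half.
    org_ids = list(range(n_orgs))
    rng.shuffle(org_ids)
    treated = set(org_ids[: n_orgs // 2])

    treated_outcomes, control_outcomes = [], []
    for org in range(n_orgs):
        # Hypothetical outcome: security incidents per year, centered on 10.
        incidents = rng.gauss(10.0, 3.0)
        if org in treated:
            treated_outcomes.append(incidents + true_effect)
        else:
            control_outcomes.append(incidents)

    # The control-group mean stands in for the counterfactual.
    return statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)


estimated_effect = simulate_rct()

# Hypothetical program cost per organization, used only to show how an
# effect estimate feeds into a cost-effectiveness figure.
cost_per_org = 5000.0
if estimated_effect < 0:
    cost_per_incident_avoided = cost_per_org / -estimated_effect
else:
    cost_per_incident_avoided = float("inf")
```

Because assignment is random, the two groups differ on average only in whether they received the intervention, so the difference in means recovers the assumed effect up to sampling noise.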
To date, randomized controlled trials have not been proposed, much less attempted, for cybersecurity
policies. This could be due to the various constraints of this research method. Writing on health care
interventions, which share some characteristics with cybersecurity policy interventions, Sanson-Fisher et al.
(2007) cite these limitations as the limited availability of sufficient populations, the time available for
follow-up, threats to external validity, contamination of the control or treatment groups, cost, and ethical
concerns.
One underexplored set of methods, which overcomes these limitations and provides a more rigorous
analysis than simply recording inputs or asking a competent authority for progress reports, consists of
natural or quasi-natural experiments 15.
Natural and Quasi-natural Experiments
A natural experiment involves identifying a random event that has an impact on the subject of study but
no impact on the variable one intends to measure. Put more formally, in a natural experiment, the treatment
(the independent variable of interest) varies through some naturally occurring or unplanned event that
happens to be exogenous to the outcome (the dependent variable of interest) (ibid). Naturally occurring
events might be extreme weather, like a hurricane, or the outbreak of an epidemic. For example, many studies have
been conducted in New Orleans following Hurricane Katrina in 2005. These studies have examined
outcomes ranging from recidivism rates 16 to labor markets and wages17. An unplanned event might be mass
migration, such as the Mariel boatlift, in which large numbers of Cubans arrived in Florida in 1980.
Natural experiments compare the outcomes of two groups of subjects that are separated into groups
because of the introduction of the exogenous variable. They resemble randomized controlled trials except
that the researcher has no control over the random assignment characteristic. A key point is that the groups
do not self-select into behavior. Their reception of the treatment is random or close to random. Natural
experiments require that the two groups be broadly comparable (at least with regard to the characteristics
pertinent to the study), along with a way to record relevant metrics18.
Quasi-natural experiments, by contrast,
do not involve random application of a treatment. Instead, a treatment is applied due to social or political
factors, such as a change in laws or implementation of a new government program. The recipients of the
treatment are thus not randomly but intentionally chosen according to some predetermined criteria. The
group that does not receive the treatment in a quasi-natural experiment is called the comparison group instead of the
control group19.
Because natural and quasi-natural experiments are conducted in the real world, that
is, outside of a laboratory or other artificial setting, their generalizability and relevance for policy and
decision making is enhanced (ibid). However, they suffer from a lower ability to attribute causation to a
treatment relative to randomized controlled trials. Moreover, if policy changes constitute responses on the
part of political decision makers to changes in a variable of interest, analysis of these changes as a natural
15
(Remler & Van Ryzin 2015)
16
(Kirk 2015)
17
(De Silva et al 2010)
18
(Remler & Van Ryzin 2015)
19
(ibid)
or quasi-natural experiment may yield biased estimates of the impacts of the policy20,21. Nevertheless, these
elements can be controlled for using statistical methods.
Natural and quasi-natural experiments can be conducted using a number of methods and designs.
Shadish et al. (2002) identify 18 such methods. The table below provides a non-exhaustive overview of the
more well-established methods and some of their pros and cons. As a general though not absolute rule of
thumb, as one proceeds down the list, the robustness and generalizability of the results increase at the
expense of practicality and/or cost. The methods can be broadly differentiated according to the following
characteristics/configurations:
- Prospective/retrospective studies, which either look forward on some phenomenon (sometimes
  termed a pretest) or backward (posttest) on some past phenomenon.
- Use of a control group, a comparison group or neither. The configuration of such groups can be
  done in a multitude of ways, though the key difference between a control and comparison group
  is random selection in the former's case.
- Use of longitudinal (or panel) data to study phenomena and their impact over time or use of
  cross-sectional data to compare one or more groups across specific variables or characteristics
  at a single point in time.
20
(Besley & Case 2000, Kubik & Moran 2001)
21
Consider for example the case of police numbers and crime rates. If cities are increasing police numbers, they are
likely doing this in response to rising crime rates. A naive analysis of such an environment might conclude that
increasing the number of police officers increases the crime rate (Levitt 1997).
Table 1. Methods for natural and quasi-natural experiments
| Design | Description | Pros | Cons |
|---|---|---|---|
| Designs without control groups | | Intuitive findings at a low cost. | Because such experiments are not conducted in a lab, researchers do not have the ability to hold all relevant surroundings constant. |
| One-group posttest-only | One posttest observation on respondents who experienced a treatment, but there are neither control groups nor pretests. | | |
| One-group pretest-posttest (before-after studies) | A single pretest observation is taken on a group of respondents, treatment then occurs, and a single posttest observation on the same measure follows. | | |
| Removed-treatment | This design adds a third posttest to the one-group pretest-posttest design and then removes the treatment before a final measure is made. The aim is to demonstrate that the outcome rises and falls with the presence or absence of treatment, a result that could be otherwise explained only by a threat to validity that similarly rose and fell over the same time. | | |
| Repeated-treatment | Introduce, remove, and reintroduce treatment over time so as to study how treatment and outcome co-vary over time. | | |
| Designs with control group but no pretest | | Increased robustness of findings vis-à-vis no control group. | In a case-control study, the outcome of interest is often rare, and so studies typically start with an available set of cases. This may introduce selection bias. |
| Posttest-only with nonequivalent groups | Add a control group to the one-group posttest-only design. | | |
| Constructing contrasts other than with independent control groups | Construct contrasts to mimic the function of an independent control group. Three such contrasts could be (1) a regression extrapolation that compares actual and projected posttest scores, (2) a normed comparison that compares treatment recipients to normed samples, and (3) secondary data that compares treatment recipients to samples drawn from previously gathered data, such as population-based surveys. | | |
| Case-control | One group consists of cases that have the outcome of interest, and the other group consists of controls that do not have it. Cases and controls are then compared using retrospective data to see if cases experienced the hypothesized cause more often than controls. | | |
| Designs that use both control groups and pretests | | | |
| Untreated control group with dependent pretest and posttest samples | Frequently called the non-equivalent comparison group design, this adds a pretest, posttest and comparison group to make it easier to examine certain threats to validity. | The external validity, robustness or accuracy of the estimated impacts of treatment, etc. | Added cost associated with each additional element. |
| Difference-in-differences | Two before-after comparisons, one for the treatment and another for the comparison group. | Greater internal validity than before-after or cross-sectional comparisons. Works well even if the two groups are not perfect matches. | In real-world applied social policy research, the perfect comparison group is rarely available. |
| Matching | Comparison of two groups where individuals are identified who are as close as possible to those in the treatment group. | If the two groups were truly identical in all ways other than the program being evaluated, then these differences would give an unbiased estimate of the treatment effect. | A researcher can only match on variables that are measured and available. If it turns out that the groups were not comparable across the desired variables, conclusions about the effect of the treatment will be less certain. Group-level matching can be imprecise because of all the unique characteristics that often crop up within groups. |
| Cohort-control | Use of cohorts, successive groups that go through processes (e.g. classes in schools), as the matching group. | | |
| Designs that combine many elements | A wide variety of methods that might add any combination of untreated matched controls with multiple pretests and posttests, non-equivalent dependent variables, and removed and repeated treatments. | Depends on the elements added. The external validity, robustness or accuracy of the estimated impacts of treatment, etc. | Added cost associated with each additional element. |
| Interrupted time series | Time series refers to a large series of observations made on the same variable consecutively over time. | An improvement over before-after studies as it helps answer the question of what the trend in the outcome variable looked like before the intervention. | Time-series analysis can be complicated because what happens in one period may be driven by what happened earlier (so-called 'autocorrelation'). |
| Simple interrupted time series | One treatment group with many observations before and after a treatment. | | |
| Panel data for difference-in-differences | Two before-after comparisons, one for the treatment and another for the comparison group. Repeated measurements on the same individuals over several time periods. | Panel data can also represent schools, organizations, neighborhoods, cities, states, or even nations over time. | Generalizability: It is difficult to determine what population the study's findings apply to. The time scale might not be sufficient for all the independent variables to affect the dependent variable. The potential endogeneity of the changes observed. |
| Variants | A wide variety of methods that involve adding a control group, dependent variables, multiple replications or switching replications. | | |
| Regression discontinuity | A cut-off point for a single quantitative assignment variable is used to create two groups: one on either side of the cut-off point. | An especially strong quasi-experiment, although it applies only in rather specific circumstances. | |
| Instrumental variable | Introduction of a variable that induces change in the explanatory variable but not directly in the dependent variable. | | Estimates may not generalize to the entire study population. |
Source: Designs and descriptions from Shadish, Cook & Campbell (2002). Pros and cons from Remler & Van Ryzin
2015.
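As one illustration of the designs in Table 1, a simple interrupted time series can be approximated by fitting a trend to the pre-intervention observations, projecting it forward, and comparing the projection with what was actually observed. The sketch below is a minimal version of that idea under stated assumptions: the quarterly incident counts are invented, the pre-intervention trend is assumed to be linear, and autocorrelation (the complication noted in the table) is ignored.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interrupted_time_series_effect(series, intervention_index):
    """Average gap between observed post-intervention values and the
    values projected from the pre-intervention trend."""
    pre_x = list(range(intervention_index))
    slope, intercept = fit_line(pre_x, series[:intervention_index])

    # The projected trend plays the role of the counterfactual.
    post_x = range(intervention_index, len(series))
    projected = [intercept + slope * x for x in post_x]
    observed = series[intervention_index:]

    gaps = [obs - proj for obs, proj in zip(observed, projected)]
    return sum(gaps) / len(gaps)


# Hypothetical quarterly incident counts: a rising trend of 2 per quarter,
# then a level drop of 10 once a program starts in quarter 8.
series = [50 + 2 * t for t in range(8)] + [50 + 2 * t - 10 for t in range(8, 12)]
effect = interrupted_time_series_effect(series, intervention_index=8)
```

A plain before-after comparison of these numbers would understate the drop, because incidents were already rising before the program began; projecting the earlier trend is what separates the two.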
This wide range of possible methods demonstrates the flexibility of natural and quasi-natural
experiments. This flexibility means that the evaluation method can be customized to accommodate the
specific nature of the policy intervention. This flexibility thus makes such research methods a promising
avenue for evaluation of a variety of cybersecurity policies. Yet each of these methodologies has relative
weaknesses and strengths. Some are thus more suitable for evaluating certain types of activities or policies
over others.
Applying natural or quasi-natural experiments to cybersecurity policies
This section examines some of the methods used for natural or quasi-natural experiments, recounts past
studies in other domains where the methods have been used effectively, then identifies cybersecurity
activities or programs for which these methods might be applied for future evaluation.
The examples below are meant to be illustrative of the possibilities for natural or quasi-natural
experiments in evaluating cybersecurity policies already in place within major cybersecurity strategies
worldwide. It is by no means an exhaustive list.
There are countless other ways in which these proposals could be reconfigured based on cost, ethical,
or political constraints to allow for certain methods to be used. There are also countless other programs or
activities within these strategies that could be evaluated with such methods. Finally, a number of natural
experiments arise in the form of real intrusions into key information systems. Two high-profile past
incidents linked to state-sponsored attackers include Moonlight Maze and Titan Rain (Pawlak et al. 2015).
These are the dogs that didn't bark in a sense. 22 While they did not aim at disruption, these incidents could
have caused disruption. They are thus observationally equivalent to more damaging attacks and could be
used as a part of potential control or comparison groups.
Evaluating Computer Emergency Response Teams in the EU
Among many proposed activities, the EU Cybersecurity Strategy calls for the establishment of national
computer emergency readiness teams (also sometimes referred to as computer emergency response teams)
in countries that do not already have one (e.g. Cyprus, Ireland and Poland) 23. It is estimated that each of
these teams will cost 2.5 million euros to set up 24. This creates conditions that might permit a natural or
quasi-natural experiment to evaluate the impact of this policy option.
One way to do this would be through a difference-in-differences method. This involves having two
groups, only one of which receives the policy intervention, monitoring certain characteristics of
these groups, then comparing the groups before and after the intervention. The change between the
two periods shows the impact of the intervention together with any background trend (for the
treatment group) and the impact of maintaining the status quo (for the control group). The
difference between these two changes is the estimated effect of the intervention 25.
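As a minimal sketch of the arithmetic, assuming hypothetical outcome data, the difference-in-differences estimate can be computed from the four group-period means. The function name and all figures below are illustrative assumptions and are not drawn from any study or from the EU strategy.

```python
from statistics import mean


def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Return the difference-in-differences estimate from four samples."""
    # Change over time in the group that received the intervention.
    treat_change = mean(treat_after) - mean(treat_before)
    # Change over time in the group that kept the status quo.
    control_change = mean(control_after) - mean(control_before)
    # The control group's change stands in for the background trend.
    return treat_change - control_change


# Hypothetical annual counts of detected intrusions (illustrative only).
treated_before = [40, 44, 42]
treated_after = [30, 34, 32]
control_before = [38, 42, 40]
control_after = [36, 40, 38]

effect = diff_in_diff(treated_before, treated_after, control_before, control_after)
print(effect)  # -8 with these illustrative figures
```

In this made-up case the treated group falls by 10 and the control group by 2, so the estimate attributes a reduction of 8 to the intervention. The estimate is only credible if the two groups would have followed similar trends without the intervention.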
This method was used by Colman, Joyce and Kaestner (2008) in a study evaluating the effect of a
parental notification law on abortion and birth rates in Texas. Shortcomings of this approach include the
difficulties in ensuring that the findings can be applied to a certain population and the difficulty in measuring
outcomes that only accrue in the long term 26. These shortcomings can be limited through adjustments to the
experimental design but should be kept front of mind in any attempt to utilize this evaluation method.
22 This sentence was included in the final, published version. It is barred in this manuscript because it is not
correct. The dog that didn't bark is in fact a clue that leads to a problem (or mystery) being solved.
23 (EC 2013)
24 (ibid.)
25
26
If one wanted to evaluate whether the creation of a computer emergency readiness team achieves its
objectives (e.g. reducing the frequency of detected intrusions on government networks annually, reducing
the days that companies cannot earn revenue due to an information security failure, etc.), one might stagger
the creation of the teams over time across three countries. The order in which the countries receive the
intervention would be determined at random: the first country drawn would receive a computer emergency
readiness team, while the remaining two, for a certain time period, would not. This would create a situation
where one country is the treatment group and the two remaining countries are a control group. The output
metrics in each of the three countries, such as the number of days that companies are disabled after an
information security incident, could be tracked and compared over time to see whether or not the computer
emergency readiness team has an impact on incidents as compared with countries that do not possess one.
Note that this approach does not preclude the control group from eventually receiving a computer
emergency readiness team. It just delays the implementation of the initiative so as to determine the likely
net costs or benefits of the intervention.
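The random ordering described above can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed procedure: the function names, the fixed seed, and the use of the three example countries are all hypothetical choices made for the sketch.

```python
import random


def staggered_rollout(countries, seed=None):
    """Return a randomly ordered rollout schedule mapping phase -> country."""
    rng = random.Random(seed)  # a recorded seed makes the draw auditable
    order = list(countries)
    rng.shuffle(order)
    return {phase: country for phase, country in enumerate(order, start=1)}


def groups_at_phase(schedule, phase):
    """Split countries into treated and not-yet-treated at a given phase."""
    treated = [c for p, c in schedule.items() if p <= phase]
    control = [c for p, c in schedule.items() if p > phase]
    return treated, control


# Hypothetical draw over the three example countries named in the text.
schedule = staggered_rollout(["Cyprus", "Ireland", "Poland"], seed=2016)
treated, control = groups_at_phase(schedule, phase=1)
print(schedule)
print("treated:", treated, "control:", control)
```

At phase 1 one country is treated and two serve as the control group. At later phases the control group shrinks as each remaining country receives its team.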
One could relax the randomization requirement, thereby creating a quasi-natural experiment, if needed.
This would reduce the internal validity of the evaluation but, given political constraints, might be the only
feasible option. Regardless, the evaluation would likely yield more informative results than the
proposed methods in the EU impact assessment for its cybersecurity strategy, which go no further than
measuring input metrics through imprecise methods such as surveys of those interested parties who receive
the funding. Such an approach could also be used to evaluate similar initiatives in other countries 27.
Evaluating Cybersecurity Health Checks in Australia
The Australian Cybersecurity Strategy of 2016 calls for two activities to bolster the cyber defenses of
Australian companies. The first is the introduction of national voluntary Cybersecurity Governance health
checks to enable boards and senior management to better understand their cybersecurity status. These
health checks are to be modelled on the United Kingdom's FTSE 350 governance health checks. In time, it
is intended that they will be available for public and private organizations, tailored to size and sector. The
second is to provide support for small businesses to have their cybersecurity tested by certified
practitioners.
Scant detail is given on the budgets for these activities, the metrics for their monitoring, and potential
methods for evaluation. Fortunately, policy interventions such as these create ripe conditions for natural
or quasi-natural experiments.
A cross-sectional comparison method could be used to estimate the impact of these health checks or
cybersecurity testing. Such a method simply involves separating the participants into two comparable
groups, delivering the intervention to only one of these groups, monitoring the two groups across certain
metrics, then measuring the difference between outcomes of the two groups following the intervention.
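A minimal sketch of such a comparison follows, assuming a single hypothetical outcome metric (days of lost revenue per firm after a security incident). The function name and the figures are invented for illustration, and the unequal-variance standard error is one common choice rather than anything the strategy specifies.

```python
from math import sqrt
from statistics import mean, variance


def cross_sectional_difference(treated, comparison):
    """Return the difference in mean outcomes and its standard error."""
    diff = mean(treated) - mean(comparison)
    # Standard error of a difference in means with unequal variances.
    se = sqrt(variance(treated) / len(treated)
              + variance(comparison) / len(comparison))
    return diff, se


# Hypothetical days of lost revenue per firm (illustrative only).
checked = [2, 3, 1, 2, 2]      # firms that received a health check
unchecked = [4, 5, 3, 4, 4]    # comparable firms that did not

diff, se = cross_sectional_difference(checked, unchecked)
print(diff, round(se, 3))
```

Because outcomes are measured only after the intervention, the difference can be read as a treatment effect only to the extent that the two groups were comparable beforehand.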
The U.S. Department of Housing and Urban Development (HUD) used such a research design to
determine the impact of a grant program to encourage resident management of low-income public housing
projects in the 1990s 28. The key part of this study was that the participants self-selected into the study by
requesting the grant; allocation was not random. This made it a quasi-natural experiment. The researchers
thus had to find a comparable group of public housing projects that did not receive the grant, track those
housing projects with the same metrics, then determine the difference between the two so as to identify the
27 E.g. in the U.S., the plan for the Department of Homeland Security to increase the number of Federal civilian
cyber defense teams (White House 2016a).
28 (Van Ryzin 1996)
treatment effect. This requirement could potentially be overcome if recipients of the grant are chosen at
random.
There will be a limited budget available for the two Australian interventions. Presumably, the companies
that wish to take advantage of these programs will have to apply for them. One way to implement this
intervention would be to open up a limited number of spots for firms and accept applications in excess of
the number of spots. A lottery could be held to determine which companies receive the intervention and
which do not. This randomly allocates companies into a control and a treatment group. It does not address
the issue of self-selection, given that companies have come forward to apply for the program, although
this can be controlled for in the subsequent analysis. To monitor the two groups, as a condition of
applying for the intervention, applicants could be required to report on a variety of metrics regardless of
their ultimate selection for the program. This would allow monitoring and subsequent analysis of the
companies that are not chosen for the intervention (the control group). The companies that are randomly
chosen would be the treatment group. The difference between the outcomes for the two groups would be
the treatment effect.
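The lottery step can be sketched as follows. This assumes a hypothetical list of applicant firms and a fixed number of funded spots. The identifiers, the seed, and the function name are illustrative and do not describe any actual Australian program design.

```python
import random


def run_lottery(applicants, spots, seed=None):
    """Randomly split applicants into treatment (winners) and control."""
    rng = random.Random(seed)  # recording the seed makes the draw reproducible
    winners = set(rng.sample(applicants, spots))
    treatment = [a for a in applicants if a in winners]
    control = [a for a in applicants if a not in winners]
    return treatment, control


# Hypothetical pool: ten applicant firms competing for four funded spots.
applicants = [f"firm_{i:02d}" for i in range(1, 11)]
treatment, control = run_lottery(applicants, spots=4, seed=7)
print("treatment:", treatment)
print("control:", control)
```

Both lists consist of firms that chose to apply, so the comparison is between like and like. The result still speaks only to the kind of firm that applies, which is the self-selection caveat noted above.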
As with the HUD housing study, the requirement for random allocation of the intervention could
be relaxed if a comparable comparison group of companies can be found. This could be accomplished
similarly by requiring applicants, regardless of their selection into the program, to report certain metrics
over time. The carrot in this scheme for those who participate but do not initially receive the intervention
is that they will, at a point in the future, receive it, though this would introduce bias into the results that
would have to be corrected for 29. Such an approach could also be used to evaluate similar initiatives in
other countries 30.
Mandatory Data Breach Notification Laws in the EU and the United States
In the absence of a federal law, U.S. data breach notification laws differ from state to state. While there
are federal laws concerning the protection of consumer data, these differ across industries. This environment
creates opportunities for the conduct of natural or quasi-natural experiments.
Using a difference-in-differences methodology, one could conduct a quasi-natural experiment to
determine the impact of mandatory data breach notification laws and regulations in the U.S.
Card and Krueger (1994) used such a research design to estimate the impact of changes in the minimum
wage in New Jersey. In 1992, New Jersey increased its minimum wage from $4.25 to $5.05 per hour. To
evaluate the impact of the law on employment, the scholars surveyed fast food restaurants in New Jersey
and neighboring Pennsylvania (where the minimum wage was kept constant). Fast food restaurants in New
29 If companies know that they will receive future interventions for cybersecurity health checks or testing by
certified practitioners, they might not purchase these services when they otherwise might have and thereby expose
themselves to greater risk of information security failure. This would, in turn, influence the results because the control
group may be subject to information security failures that they would not otherwise have incurred had they purchased
the services privately. Moreover, there's a possibility of a silent graveyard, that is, companies that go bankrupt due
to an information security failure in the meantime and so never end up receiving the treatment later on.
In such a scenario, when comparing the control and treatment groups, the treatment effect would be greater than it
might otherwise have been had the promise of future support not been made.
30 E.g. in the U.S., the plan for the Small Business Administration (SBA), partnering with the Federal Trade
Commission, the National Institute of Standards and Technology (NIST), and the Department of Energy, to offer
cybersecurity training to reach over 1.4 million small businesses and small business stakeholders through 68 SBA
District Offices, 9 NIST Manufacturing Extension Partnership Centers, and other regional networks (White House
2016a).
Jersey were thus the treatment group and those in Pennsylvania the comparison group 31. A key part of this
study is that, although the participants in each group were not randomly allocated, the two groups were
broadly comparable.
Data breach notification laws change frequently in states across the United States 32. The changes
typically involve tightening the requirements for notification, such as the length of time before a company
must notify interested parties of a breach or the civil penalties for non-compliance. Estimating the impact
of changes to these laws would involve surveying firms in a concerned industry in one state where the law
changed and surveying a similar cohort of firms in the same industry in another state where the laws were
broadly similar but did not change, then comparing the changes in outcomes between the two groups over a
specified period of time spanning the change. Outcomes that could be measured include the incidence of
breaches at the firms themselves or the number of parties affected by data breaches.
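One common way to write such a comparison down, offered here only as an illustrative sketch, is a regression with an interaction term. The notation is an assumption introduced for this example and does not appear in the studies cited: Y_{ist} is the outcome (for instance, breach incidence) for firm i in state s at time t, Treat_s equals 1 for the state whose law changed, and Post_t equals 1 for periods after the change.

```latex
Y_{ist} = \alpha + \beta\,\mathrm{Treat}_s + \gamma\,\mathrm{Post}_t
        + \delta\,(\mathrm{Treat}_s \times \mathrm{Post}_t) + \varepsilon_{ist}
```

Under the assumption that the two states would have followed parallel trends had the law not changed, the coefficient \delta is the difference-in-differences estimate of the law's effect. \beta captures pre-existing differences between the states and \gamma captures changes over time common to both.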
Such a design could also be modified to determine the impact of new mandatory data breach notification
rules to be introduced as a part of the EU General Data Protection Regulation in 2018. The key here would
be to compare cohorts across EU countries with similar cohorts in countries where the rules will not be
implemented (e.g. Switzerland or perhaps the U.K.).
Conclusion
There are substantial benefits to program evaluation. There have been consistent calls from numerous
organizations for evaluations of cybersecurity policies to be undertaken. Yet, in spite of substantial
investment of public funds in cybersecurity strategies over the past decade, very few of the programs that
comprise these strategies have been evaluated.
Natural and quasi-natural experiments provide a cost-effective, robust, and flexible set of methods for
the evaluation of the programs and activities that comprise cybersecurity strategies worldwide. The very
fact that cybersecurity programs involve the introduction of an exogenous intervention means that, at a very
minimum, quasi-natural experiments can be feasibly and affordably undertaken.
This paper has highlighted three examples where natural or quasi-natural experiments might be used to
evaluate existing cybersecurity programs: computer emergency readiness teams in the EU, health checks
in Australia, and data breach notification laws in the United States. These are just a few examples of the
many possible options that could be pursued in the future if policymakers wish to determine the
effectiveness of their cybersecurity policy interventions and thus improve these interventions in the future.
Moreover, methods that would be conducive to the evaluation of cybersecurity policies beyond natural or
quasi-natural experiments also exist. Simulations, commonly used in war gaming, or permissioned testing
of network infrastructure, already commonly used in the form of penetration testing, also hold great
potential.
Given the present state of cybersecurity worldwide, and the strong similarities between policy
interventions across countries, there are enormous benefits to be had here. Even marginal improvements in
cybersecurity policies would lead to much-needed improvements in overall cybersecurity. Lessons learned
in one country could be readily applied to other countries where the same or similar interventions are
pursued. With so many countries presently implementing new or revised cybersecurity strategies, now is an
ideal time to begin undertaking robust evaluations of the policies that comprise these strategies using natural
or quasi-natural experiments.
31 Card and Krueger also separated their samples into restaurants that were already paying relatively high wages,
i.e. >US$5. They found no impact from the raising of the minimum wage.
32 (Shen & Eisner 2014)
References
Abelson H., Anderson R., Bellovin S. M., Benaloh J., Blaze M., Diffie W., Gilmore J., Green M.,
Landau S., Neumann P. G., Rivest R. L., Schiller J. I., Schneier B., Specter M., Weitzner D. J. (2015), Keys
Under Doormats: Mandating insecurity by requiring government access to all data and communications,
Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, Report
number: MIT-CSAIL-TR-2015-026.
Commonwealth of Australia (2016), Australia's Cybersecurity Strategy: Enabling Innovation, Growth
and Prosperity, Prime Minister and Cabinet, ISBN 978-1-925238-62-4.
Beidleman S. W. (2009), Defining and deterring cyberwar, Thesis dissertation: U.S. Army War College,
available from: http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA500795 (accessed 25 September 2016).
Besley T. & Case A. (2000), Unnatural Experiments? Estimating the Incidence of Endogenous
Policies, The Economic Journal 110, pp672-694.
Card & Krueger (1994), Minimum Wages and Employment: A Case Study of the Fast-Food Industry
in New Jersey and Pennsylvania, American Economic Review, pp772-793.
Colman, Joyce and Kaestner (2008), Minors' Behavioral Responses to Parental Involvement Laws:
Delaying Abortion Until Age 18, Perspectives on Sexual and Reproductive Health, Vol 41(2), pp199-126.
De Silva D. G., McComb R. P., Moh Y., Schiller A. R. & Vargas A. J. (2010), The Effect of Migration
on Wages: Evidence from a Natural Experiment, The American Economic Review, Vol 100(2), pp 321-326.
ENISA (2014), An Evaluation Framework for National Cybersecurity Strategies,
European Commission (EC) (2013), Impact Assessment: Proposal for a Directive of the European
Parliament and of the Council Concerning measures to ensure a high level of network and information
security across the Union, Commission Staff Working Document, COM(2013) 48 final.
GAO (2009), Continued Efforts Are Needed to Protect Information Systems from Evolving Threats,
GAO-10-230T.
GAO (2013), Strategies to Facilitate Agencies' Use of Evaluation in Program Management and Policy
Making, GAO-13-570.
GAO (2016), DHS Needs to Enhance Capabilities, Improve Planning, and Support Greater Adoption
of Its National Cybersecurity Protection System, GAO-16-294.
Henry K. M. (2012), Penetration Testing: Protecting Networks and Systems, IT Governance Ltd., ISBN:
9781849283731.
Lee D. S. & Lemieux T. (2013), Regression Discontinuity Designs in Social Sciences in Best H. &
Wolf C. (2015) The SAGE Handbook of Regression Analysis and Causal Inference, SAGE Publications
Ltd, ISBN: 9781446252444.
Levitt, S. D. (1997), Using Electoral Cycles in Police Hiring to Estimate the Effect of Police
on Crime. American Economic Review, 87. pp270-290.
Kirk D. S. (2015), A natural experiment of the consequences of concentrating former prisoners in the
same neighborhoods, Proceedings of the National Academy of Sciences of the United States of America,
Vol 112(22), pp 6943-6948.
Kubik J. D. & Moran J. R. (2001), Can Policy Changes be Treated as Natural Experiments?: Evidence
from State Excise Taxes, Center for Policy Research Paper No. 39, Center for Policy Research, Maxwell
School of Citizenship and Public Affairs, Syracuse University, ISSN : 1525-3066.
OECD (2012), Cybersecurity Policy Making at a Turning Point: Analysing a new generation of national
cybersecurity strategies for the internet economy, OECD Publishing: Paris.
OECD (2011), OECD Studies of SMEs and Entrepreneurship: Thailand, OECD Publishing: Paris.
Papaconstantinou, G. and Polt, W. (1997), Policy Evaluation in Innovation and Technology: An
Overview, in Policy Evaluation in Innovation and Technology: Towards Best Practices, OECD, Paris.
Pawlak P. and Petkova (2015), State-sponsored hackers: hybrid armies?, European Union Institute for
Security Studies, January 2015, available from:
http://www.iss.europa.eu/uploads/media/Alert_5_cyber___hacktors_.pdf (accessed 25 September 2015).
Remler D. & Van Ryzin G. G. (2015), Research methods in practice: strategies for description and
causation, 2nd edition, SAGE: Los Angeles.
Sanson-Fisher R. W., Bonevski B., Green L. W. & D'Este C. (2007), Limitations of the randomized
controlled trial in evaluating population-based health interventions, American Journal of Preventive
Medicine, 33(2), pp155-162.
Shadish W. R., Cook T. D., and Campbell D. T. (2002), Experimental and quasi-experimental designs
for generalized causal inference, Boston: Houghton Mifflin.
Shen L. & Eisner R. (2014), United States: New and Proposed U.S. Data Breach Notification Laws,
Mayer Brown, available from:
http://www.mondaq.com/unitedstates/x/326416/Data+Protection+Privacy/New+and+Proposed+US+Data
+Breach+Notification+Laws (accessed 4 August 2016).
Van Ryzin, G. G. (1996), The impact of resident management on residents' satisfaction with public
housing: A process analysis of quasi-experimental data, Evaluation Review, 20 (June), pp485-505.
The White House (2016a), Fact Sheet: The Cybersecurity National Action Plan, Office of the Press
Secretary, available from: https://www.whitehouse.gov/the-press-office/2016/02/09/fact-sheet-
cybersecurity-national-action-plan (accessed 5 August 2016).
The White House (2016b), The President's Budget for Fiscal Year 2017, available from:
https://www.whitehouse.gov/omb/budget/ (accessed 12 February, 2016)