A Controlled History of Randomized Assignment in Social Science*

Julian C. Jamison
August 2015

Abstract:
Although the concept of randomized assignment in order to control for extraneous factors reaches back
hundreds of years, the first known empirical use occurred in 1884 in an experiment on psychophysics by
C. S. Peirce. Meanwhile, the first use of a control group in order to test the effect of an intervention
occurred in medicine, informally in 1768 and then more carefully (but without randomization) in 1898.
Remarkably, the combination of the two, a randomized control trial, was first instantiated in four
different domains between 1924 and 1931, likely independently. These fields were agricultural science
(with the celebrated contributions of R. A. Fisher); clinical medicine; educational psychology; and, perhaps
most surprisingly, political science, in a voting experiment by H. F. Gosnell. Although this approach did not
immediately become popular within social science broadly, there was a resurgence of interest in the
1950s and 1960s for both social policy experiments and formal laboratory experiments.

Keywords: randomization, RCT, field experiment, lab experiment, selection bias, causality, history of
economic thought, psychology and economics
JEL codes: B16, C91, C93, C18, D04

* I thank Art Boylston, Don Green, Dean T. Jamison, Dean Karlan, Chris Lysy, Jack Molyneaux, Andreas Ortmann,
Lior Pachter, Al Roth, and especially Charlie Plott for helpful discussions and input. My interest was first piqued
when I read an article by Druin Burch in Natural History (June 2013) that mentioned van Helmont's early
contribution to the topic. The usual caveats apply, including the fact that these are not necessarily the views of the
CFPB or of the United States government.

Office of Research, Consumer Financial Protection Bureau; email julison@gmail.com

Development of Western science is based on two great achievements: the
invention of the formal logical system (in Euclidean geometry) by the Greek
philosophers, and the discovery of the possibility to find out causal relationships
by systematic experiment (during the Renaissance).
Albert Einstein (1953)

1. Introduction

The quote above is taken from Pearl (2000), a comprehensive reference on the statistics of
causality. In an entertaining history of the art and science of cause and effect, Pearl refers to the
randomized experiment as the only scientifically proven method of testing causal relations from data,
and to this day, the one and only causal concept permitted in mainstream statistics. Interestingly,
although Einstein dates the idea of causal experiments (the question of randomization goes
unanswered by him) to the Renaissance, Pearl claims that it waited upon Fisher in the 1930s. As we
shall see, Einstein was more nearly correct.

Einstein and Pearl agreed upon the central role of rigorous experiments in determining
causality, which has long been understood and accepted in the physical and biological sciences but has
undergone a more recent rise in the social sciences [1]. The basic idea is straightforward: Suppose you wish
to test the relative effect of treatment (or intervention, broadly construed) A vs. treatment B, one of
which could be a null treatment or status quo. Take a large number of subjects (individuals, schools,
firms, villages, etc.) and divide them randomly into two groups. The first group gets A and the second
group gets B; other than that their experiences are identical. Since the division was random and the
sample size was large, we can be highly confident that the two groups started out with the same
average levels of all relevant characteristics, both observable and unobservable. Therefore any
aggregate differences between the groups measured after the experiment can be causally identified
with the corresponding treatments.
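
The logic of the preceding paragraph can be illustrated with a small simulation. The sketch below is
only an illustration under stated assumptions: the function names, the simulated population, and the
additive treatment effect are hypothetical and do not come from any study discussed in this paper.
Each simulated subject carries a hidden baseline characteristic. Because the assignment ignores it,
the two groups should end up with nearly the same average baseline, and the difference in mean
outcomes should land close to the assumed effect.

```python
import random
import statistics


def randomized_assignment(subjects, rng):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def simulate_trial(n_subjects, true_effect, rng):
    """Simulate a two-arm trial in which every subject has a hidden baseline.

    Assumption (hypothetical): outcome = baseline + true_effect under
    treatment A, and outcome = baseline under treatment B (the status quo).
    """
    baselines = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    group_a, group_b = randomized_assignment(baselines, rng)
    outcomes_a = [baseline + true_effect for baseline in group_a]
    outcomes_b = list(group_b)
    # How different were the groups before treatment?
    baseline_gap = statistics.mean(group_a) - statistics.mean(group_b)
    # Difference in mean outcomes: the estimate of the treatment effect.
    estimated_effect = statistics.mean(outcomes_a) - statistics.mean(outcomes_b)
    return baseline_gap, estimated_effect


if __name__ == "__main__":
    rng = random.Random(2015)  # fixed seed so the run is repeatable
    gap, estimate = simulate_trial(n_subjects=10_000, true_effect=2.0, rng=rng)
    print(f"baseline gap between groups: {gap:.3f}")
    print(f"estimated treatment effect:  {estimate:.3f}")
```

With a large simulated sample the baseline gap should be small relative to the assumed effect, which
is the sense in which the groups "started out the same"; with only a handful of subjects the gap can
be substantial.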

Naturally there are assumptions to be made, and there are many complications that arise in
specific instantiations of this approach. Some of the risks, costs, and drawbacks will be discussed briefly
in section 7 below. Randomized experiments are the most compelling form of evidence available, and as
such they occupy a special, if hotly contested, position in the pantheon of science. They are not,
however, always the right tool for the job. Imagine the idea of testing the efficacy of parachutes versus a
control group on mortality rates when disembarking an airplane at a height of 10,000 feet (3,000 meters)
above ground level. Not only would a randomized experiment be unethical, it would be completely
unnecessary. That experiment has never been undertaken (see Smith and Pell 2003 for a review of the
literature), and yet we are convinced that we know the actual relative efficacy of the two approaches.

[1] For example, Banerjee and Duflo (2011) enthusiastically espouse this approach in development economics, with
many examples. Deaton (2010) disagrees with the strong conclusions but acknowledges the basic premise and the
trend in importance.

There are sometimes other equally compelling ways to acquire knowledge; in fact one of the
themes of Pearl's book on causality is that in a Bayesian framework scientists can and ought to use prior
beliefs, subjective judgments, and nonexperimental data in addition to rigorous experimentation in
order to draw their final conclusions, as long as they are transparent about each step. Even in the
natural sciences not all rigorous experimentation involves randomization. A typical example, also dealing
with falling bodies, is parameter estimation. Suppose that we have already established that all objects
fall at the same rate (near the surface of the Earth and in a vacuum), and we wish to estimate this force
of gravitation. There is no need for randomization, since we are not comparing two treatments or two
competing theories; we simply drop many objects and perform the calculation.

At this point it is worth stepping back for a moment in order to be careful about what we do and
do not mean by randomization in the present context. We are not referring to stimuli that are inherently
random, such as monetary lotteries or gambles that may be used to assess subjects' risk attitudes. We
are also not referring to the random sampling of subjects from a larger population in order to draw
conclusions that are representative of the entire population. This can rightly be called randomization,
and it too has an important place in social science, but the rationale and history are distinct from the
focus here; see Fienberg and Tanur (1987). Although this may seem quite different in purpose and
application, there has been some confusion over the years both in the philosophy of science literature [2]
and in changing historical usage, as mentioned in section 5 below.

Another use of randomization, more closely tied to the one we are considering, is for the
purpose of ensuring the validity of specific statistical tests. For example many tests are predicated on
certain assumptions regarding the data-generating process which can only be satisfied, or are at least
more easily satisfied, when there has been random allocation into treatments. This is conceptually quite
distinct from the use of randomized assignment for the purpose of ensuring that each data point (i.e.
each constellation of subject plus treatment, plus any other relevant attribute) is a priori comparable to
every other data point, with no bias. We use the term "randomized" rather than simply "random" in
order to emphasize that the experimenter has consciously performed the randomization, and we use
the broad term "assignment" rather than e.g. "allocation" in order to allow for the randomization to
enter along various dimensions. Often randomization for statistical integrity will result in randomized
assignment, so in practice the two notions are highly correlated and overlapping, but they are truly
conceptually different and indeed they are far from identical in practice. [3] Again to be clear:
randomization in our sense serves a deeper purpose than simply impartially dividing a sample into
subsamples, and it may apply even when causality is not a central concern: it guarantees us that any two
observations we collect are entirely comparable in every dimension other than those we know about
and vary in a controlled manner.

[2] Urbach (1985) claims that from a Bayesian perspective, randomization can be of no use in testing statistical
hypotheses; Papineau (1994) rightly responds that that only holds true for random sampling and not for
randomized experimentation of the type considered here.
[3] A third conceptual rationale for randomization that is also correlated in practice with randomized assignment is
the idea of fairness: in many circumstances limited resources should be distributed randomly in order to give
everyone the same chance. Although this is rightfully important in the medical and social policy contexts, it is
beyond the scope of the present discussion. The origins presumably go back to the earliest humans, or beyond.


The primary focus of this paper is randomized assignment, but since no history of randomization
would be complete without mention of Ronald Fisher, who is the originator and proselytizer of
statistical randomization, let us pause for a moment to summarize his contributions. Fisher took a
position in 1919 as statistician at Rothamsted agricultural research station, where his main job was to
analyze the piles (literally) of existing data from previous experiments. However he started to develop
theories of his own about how to optimally run experiments, culminating in the publication of his classic
book on the design of experiments (Fisher 1935). Although he had first advocated randomization (in the
sense of e.g. randomly allocating different seeds or fertilizers to different plots of land) as a theoretical
concept by 1925, his first empirical publication that used randomization as a technique was two years
later (Eden and Fisher 1927). He was far from the first on the theoretical side, but he was close to being
the first to apply randomized treatments (though not randomized assignment more broadly) to actual
data. Indeed the late 1920s saw several disciplines independently and nearly simultaneously put this
path-breaking idea into practice, as we shall explore.

Apart from Fisher's statistical contributions, his main role in our story is that he explicitly and
tirelessly advocated for rigorous experimentation and evaluation, including randomization, and that he
gave other more applied researchers the tools and techniques they needed to make this happen. Given
that the basic idea of randomized allocation had arisen many years earlier, his efforts clearly made a
difference. [4] In fact, not everyone liked the idea of randomization, and the subjects were often plants,
so the objections were hardly ethical in nature! Fisher's great statistical contemporary Gosset (who
published as "Student" due to restrictions imposed by his employer, the Guinness brewery) felt that it
was better to match data points on as many observable characteristics as possible, with randomization
simply adding unwanted and unnecessary noise to the data. For small sample sizes this may well be
true [5], but Fisher's approach has carried the day, albeit more quickly in some fields than in others.

The remainder of the paper proceeds as follows. Section 2 provides an early historical
background tracing some of the various building blocks that eventually gave rise to rigorous
experimentation. Section 3 briefly discusses the history of randomized assignment in clinical medicine,
the field with which it is most closely associated. Then we turn to social science proper, beginning with
psychology in Section 4; social policy and field experiments in Section 5; and finally experimental
economics in Section 6. Section 7 provides concluding remarks.

2. Prelude

In the Bible, Proverbs 18:18 reads "The lot causeth disputes to cease, and it decideth between
the mighty." If only! But it's a nice thought: although presumably this refers to randomization as a way
to solve disputes directly rather than as a way to help determine who is actually right, rigorous
experimentation later helped to end several controversies. Meanwhile, in the Book of Daniel (1:8-16),
King Nebuchadnezzar wishes for Daniel and his three friends to consume the royal food and wine, but
Daniel does not wish to defile himself that way. The official in charge is worried that if he feeds Daniel
anything lesser, Daniel will weaken and the king will be upset. So Daniel suggests a test: he and his
friends will eat only pulse (i.e. legumes; this may also have included vegetables and fruits) and drink only
water for ten days, after which the official can compare their health to that of the young men
consuming the royal fare. Naturally they ended up stronger and better nourished than the others, so
they were allowed to continue with their alternate diet.

[4] As Roth (1993) notes, we should perhaps pay more attention to the last person to invent or discover something
rather than to the first person to do so.
[5] For instance, see Ross (1934) for a state-of-the-art approach at the time to matching pairs.

This story has been suggested as the first controlled trial, and it does contain some of the
hallmarks: a control group and a predefined outcome measure. Of course one worries about
endogeneity and selection bias, and the powers that be seem to have at least implicitly understood this,
because although they allow Daniel et al. to continue with a spartan diet they do not actively suggest
that others take it up. Randomization would have solved many of these problems nicely, although
heterogeneity is always an issue. Furthermore, in terms of chronology, although Charles Darwin did not
advance his theory of natural selection until the mid-19th century (Darwin 1859), Nature had
conveniently begun to experiment via randomization in the context of allopatric speciation after
vicariance to test his theory some millions of years earlier [6].

It is likely that scholars in antiquity understood the basic idea of comparing two similar groups in
order to reliably test interventions. However, the first written documentation in circulation [7] is Petrarch
(1364) in a letter to Boccaccio:
"I solemnly affirm and believe, if a hundred or a thousand men of the same age,
same temperament and habits, together with the same surroundings, were
attacked at the same time by the same disease, that if one half followed the
prescriptions of the doctors of the variety of those practicing at the present day,
and that the other half took no medicine but relied on Nature's instincts, I have
no doubt as to which half would escape."
Although there is no mention of randomization and no concrete suggestion to collect data, it is clear
that the goal was to devise two groups that were as similar as possible. It is also clear what Petrarch
thought of doctors.

And so we arrive at the uncontested first surviving instance of a (theoretical) randomized
assignment mechanism, due to Flemish chemist and physician Jan Baptist van Helmont. Everyone at the
time, including van Helmont, believed that bloodletting was a fantastic cure for most ailments. However
he believed that evacuation (i.e. inducing vomiting and defecation) was an even better approach, and he
proposed a simple way to settle the argument once and for all:
"Let us take out of the Hospitals, out of the Camps, or from elsewhere, 200 or 500
poor People, that have Fevers, Pleurisies, etc. Let us divide them in halfes, let us
cast lots, that one half of them may fall to my share, and the other to yours; I
will cure them without bloodletting ... we shall see how many Funerals both of us
shall have."
For better or worse, there is no evidence that this test was ever put into practice, but the idea (if not
quite the ethics) is up to modern standards [8]. Unfortunately it would not be resurrected for centuries.
When was this written? Nobody knows precisely. Most articles cite van Helmont (1662), but that is
simply the first English translation (from which the above quote is taken) of the original Latin publication
(van Helmont 1648). Even that is clearly too late, since van Helmont died in 1644; some of his writings
were controversial, so the corpus did not see the light of day until his son brought them out
posthumously. The best guess for this historical moment appears to be sometime in the 1630s.

[6] Lior Pachter brought this prehistoric example to my attention.
[7] Thanks go to http://www.jameslindlibrary.org, which is a wonderful resource for the relevant medical history.

Despite van Helmont's mistaken (but typical) views on clinical practice, he was an inquisitive and
thoughtful researcher, a Renaissance man befitting Einstein's quote above. This will be a theme for
many of those who intersect the origins of randomization, suggesting that each successive development
was not nearly as simple as it appears in retrospect. Along those lines, we proceed by mentioning two
more notables in the history of clinical trials, albeit unrandomized.

James Lind was a Scottish naval surgeon (i.e. doctor) who was an early believer in the theory
that citrus fruits could help cure scurvy, which is indeed caused by a deficiency of vitamin C. He provided
a partial test of this claim on a voyage in 1747 (published in Lind 1753), when he divided 12 afflicted
sailors into six pairs and gave each pair a different treatment, one of which was two oranges and one
lemon daily [9]. He made a point of the fact that the men were similar to begin with and were treated
identically in all ways apart from the experimental variation:
"Their cases were as similar as I could have them. They all in general had putrid
gums, the spots and lassitude, with weakness of their knees. They lay together in
one place, being a proper apartment of the sick in the forehold; and had one
diet common to all, viz. water-gruel sweetened with sugar in the morning; fresh
mutton broth oftentimes for dinner; at other times puddings, boiled biscuit with
sugar, etc.; and for supper, barley and raisins, rice and currants, sago and wine,
or the like."
While Lind did not include an untreated control group, Watson (1768) did exactly that in a study
of smallpox inoculation: as he put it, "it was proper also to be informed of what nature unassisted, not
to say undisturbed, would do for herself." Although both men explicitly attempted to perform their tests
on a homogeneous population, as well as to maintain parity apart from the treatments of interest,
neither of them suggests randomization.

[8] It's not perfect because he appears to suggest that the patients first be divided into two groups, without
specifying how that is to be done, after which randomization occurs. He may well implicitly be imagining a random,
or at least a systematic, approach to the division. But even if not, it constitutes randomizing at what is now referred
to as the cluster level: admittedly low-powered in this case, but still more rigorous than many modern papers.
[9] Other treatment arms included seawater, sulfuric acid, and spicy paste plus barley water. These did not prove
effective.


Finally, it is worth mentioning a somewhat flamboyant experiment performed by famous
microbiologist Louis Pasteur in 1881. He was attempting to publicly prove that he had developed an
animal anthrax vaccine, so he asked for 60 sheep and split them into three groups: 10 would be left
entirely alone; 25 would be given his vaccine and then exposed to a deadly strain of the disease; and 25
would be untreated but also exposed to the same strain. It is unclear whether sheep have more or less natural
variation than Fisher's plots of land, but there is no mention of randomization or selection bias in the
paper (Pasteur 1881). Perhaps this was not a major issue given the stark results: all of the exposed but
untreated sheep died, while all of the vaccinated sheep survived healthily.

3. Medicine

Many people associate the RCT (randomized control trial) with medicine, where it has come to
be viewed as the "gold standard" [10]. As described above, it was primarily clinicians who took the first
steps in this direction, and by the 18th century they had already devised most of the necessary
ingredients but had yet to put them all together in one serving. The next step in the process was taken
by Fibiger (1898), a young Danish doctor who was studying diphtheria [11]. He too understood the
importance of avoiding any sort of imbalance or selection bias between treatment groups, and his
approach was to alternate treatments depending on the day that the patient arrived at the hospital. As
he wrote:
"In many cases a trustworthy verdict can only be reached when a large number of
randomly selected patients are treated with the new remedy and, at the same
time, an equally large number of randomly selected patients are treated as
usual." [12]
Of course alternation does not technically produce random selection, as it is subject to strategic or
accidental biases, e.g. doctors suggesting that certain patients be admitted on a particular day. In this
case not only was the underlying idea there (indeed Fibiger appears to have believed that he was
randomizing), but the resultant outcome was almost certainly random grouping.

Three decades later, Colebrook (1929) [13] used drawing lots to decide which kids would get
irradiated (it's not as bad as it sounds), but if the parents refused consent then those children were
added to the control group, which undoes much of the point of randomization. The author also notes
that subjects in the treatment group, knowing that they would have to strip to receive therapy, were
also more likely to bathe at home the night before: a classic instance of confounding. Shortly
thereafter, Amberson et al. (1931) divided tuberculosis patients into two groups and a coin was flipped
to determine which one got the new treatment. Since this was not at the individual level, the outcome
in practice may well have been less valid than Fibiger (1898).

[10] This term was apparently borrowed from monetary economics, where it refers to the actual metal gold. The
analogy is that both approaches describe the best measure or method of comparison available at a given time. See
Claassen (2005).
[11] Fibiger won the Nobel Prize in 1927 for his work on cancer, continuing our theme of illustrious contributors.
[12] Translation and discussion may be found in Hróbjartsson et al. (1998).
[13] Colebrook is sadly unusual in our list of pioneers on account of her gender.

It seems, therefore, that the first fully rigorous individual-level RCT in medicine is Doull (1931), a
study of the effect of ultraviolet light on the common cold. Doull worked at the Johns Hopkins School of
Public Health and needed to figure out how to allocate his subjects into three groups in a manner that
would allow for valid comparisons and analysis. According to Marks (2008), he consulted with a local
biostatistician with a doctorate in mathematics, who suggested using colored dice to randomly allocate
the patients. Note the similar timing for early randomized evaluations in clinical medicine (1929-1931)
as in agriculture (Eden and Fisher 1927).

The final piece of the medical puzzle falls into place with the famous streptomycin trial for
tuberculosis (Medical Research Council 1948) [14]. This is probably the most famous RCT in history, and
many people have erroneously claimed that it was in fact the first RCT in history. The design for it was
the brainchild of Austin Bradford Hill, whose degree was in economics (earned while recovering from
tuberculosis himself) but who worked as a biostatistician and epidemiologist [15]. In addition to the
important step of highlighting the need for randomization and of promoting it (he later wrote down
influential formal criteria for imputing causality), Bradford Hill did introduce one key aspect in the 1948
paper: the explicit idea of also using randomization to consciously conceal foreknowledge, i.e. to blind
the experimenter to treatment status whenever possible. See Chalmers (2001) for further discussion of
this point, although note that, as we shall see in the next section, this idea had already existed for several
decades in a different context.

4. Psychology

Human sensation was for most of history not considered a domain susceptible to quantitative
scientific analysis. That began to change with the work of Gustav Fechner in the mid-19th century, who
initiated the field of psychophysics (Fechner 1860) along with Ernst Weber. In particular Fechner studied
sensitivity of physical perception: e.g. how finely can a subject distinguish two masses, as a function of
the base weight and the marginal difference between them? Although he deserves much credit for
introducing concepts such as empirical experimentation and mathematical data analysis to this entire
field, his methods were far from perfect. In particular Fechner experimented on himself; for example in
the perception experiments he knew all the relative weights in advance. He believed that he could
consciously control for any resulting bias.

Müller (1879) took the next step, splitting the roles of subject and experimenter. He
concurrently emphasized the notion of presenting stimuli in an irregular order ("in buntem Wechsel"; see
Dehue 1997), but neither he nor Fechner employed randomization, although Müller did eventually
start to promote the use of explicit randomization around the turn of the century. Meanwhile
randomization was used by Richet (1884) but only as an inherent component of the stimulus itself. This
is because he was testing telepathy, a topic that was all the rage in Europe at the time and which was
eminently suitable for rigorous evaluation. [16] Randomly chosen playing cards were studied intently by
one person, who tried to mentally pass the information to another. Thus, the randomization was not
carried out in order to compare different treatments (indeed nobody hypothesized varying levels of
success depending on the specific card drawn), although it was entirely appropriate for the task at
hand.

[14] The experiments for whooping cough reported in Medical Research Council (1951), although published three
years later, used an exactly analogous experimental approach and were actually begun several months earlier than
those in the classic paper.
[15] Bradford Hill later earned fame for his [nonrandomized] work exhibiting a link between smoking and lung
cancer; he was knighted in 1961.

We turn now to one of the main protagonists in our drama, Charles S. Peirce. According to
Stigler (1992), Peirce was educated at home by his father, a mathematics professor at Harvard. He was
ambidextrous and had the habit of writing questions with his left hand while writing the answers with
his right hand. By December 1883, when he began the series of experiments described below, he was on
the faculty at Johns Hopkins, where he was primarily known as a philosopher but also worked in physics,
mathematics, cartography and psychology.

Fechner had postulated that for any given base weight, there was a minimum additional weight
below which it was impossible to perceive any difference, i.e. where the two felt exactly the same.
Peirce disagreed, believing that even for very small differences, if subjects were forced to choose which
one they thought was heavier [17], they would be correct slightly more often than they were wrong. Along
with a student of his named Jastrow, he proceeded to test his hypothesis in a series of experiments in
1884. They took turns as experimenter and subject, with the experimenter drawing playing cards to
determine which weight came first on any given trial: if red the base weight came first; if black the
supplemented weight was first. As Peirce and Jastrow (1885) note in their paper:
"A slight disadvantage in this mode of proceeding arises from the long runs of
one particular kind of change, which would occasionally be produced by chance
and would tend to confuse the mind of the subject. But it seems clear that this
disadvantage was less than that which would have been occasioned by his
knowing that there would be no such long runs if any means had been taken to
prevent them."
This is precisely the type of concern that Fisher and Gosset would argue about almost 50 years later in a
very different context: the tradeoff between either forcing regularity where possible and thereby
reducing noise, versus using randomization to equalize absolutely everything but only in expectation.
We still worry about such things today. [18]

[16] Hacking (1988) provides illuminating historical details on this development. As far as results go, Richet was the
first of many authors not to find evidence for supernatural powers.
[17] Forced choice was an innovation along with randomization, albeit not as momentous. Additionally, subjects
were asked to express confidence in their choice on a scale of 0-3, which was a further innovation that is still
underutilized today.

Was this an example of randomized allocation into treatment and control groups? Clearly not.
Forsetlund et al. (2007) argue that Peirce's randomization served only to blind the subject and not to
assess the effect of an intervention on an outcome. But this seems like a false dichotomy: Peirce was
randomizing not merely to blind the subject (as Richet had) but also to allow for comparisons of "like
with like", in the phrase favored by Chalmers (2001). Because of the structure of the experiment, there
were two possible conditions (base weight first or supplemented weight first), and Peirce wanted to
ensure that the two corresponding sets of observations were identical on average. This certainly
required the subject not to know which one came first; but even if the subject didn't know, it could have
been the case that one condition was systematically different from the other (e.g. maybe it is easier to
perceive increasing than decreasing weights) or that there is differential learning over time.
Randomization solves this problem neatly in a way that no deterministic ordering, however carefully
balanced and thought out, can do. [19]

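
The contrast between a card-drawn ordering and a deterministic one can be sketched in a few lines.
This is an illustration only, not a reconstruction of the 1884 protocol: the function names are
hypothetical, and the sketch assumes independent draws with an equal chance of red and black, whereas
drawing from a physical deck without replacement would behave slightly differently.

```python
import random


def card_draw_order(n_trials, rng):
    """Decide the presentation order for each trial by a simulated card draw.

    Red means the base weight comes first; black means the supplemented
    weight comes first. Assumption: draws are independent and equally likely.
    """
    orders = []
    for _ in range(n_trials):
        card_color = rng.choice(["red", "black"])
        orders.append("base_first" if card_color == "red" else "supplemented_first")
    return orders


def alternating_order(n_trials):
    """A deterministic, perfectly balanced alternative: strict alternation."""
    return ["base_first" if i % 2 == 0 else "supplemented_first" for i in range(n_trials)]


def longest_run(orders):
    """Length of the longest streak of identical consecutive orders."""
    longest = 0
    current = 0
    previous = None
    for order in orders:
        current = current + 1 if order == previous else 1
        longest = max(longest, current)
        previous = order
    return longest


if __name__ == "__main__":
    rng = random.Random(1884)  # fixed seed so the run is repeatable
    randomized = card_draw_order(50, rng)
    print("share base_first (randomized):", randomized.count("base_first") / 50)
    print("longest run (randomized):", longest_run(randomized))
    print("longest run (alternating):", longest_run(alternating_order(50)))
```

A subject facing the alternating schedule could predict every trial after the first. The occasional
long run in the card-drawn schedule is the price of unpredictability, which is the disadvantage that
Peirce and Jastrow judged to be the lesser one.
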
Speaking of experiments, consider the following thought experiment: take a large number of
subjects, randomly divide them into two groups, and ask each subject to respond to only one stimulus of
the type above. One group would (unknowingly) receive the base weight first, while the other group
would receive the supplemented weight first, not because the experimenter was necessarily interested
in directly comparing the two sequencings, but in order to be sure that the sequencing itself did not
affect the estimate of interest. Instead, for a variety of natural reasons, Peirce maintained comparability
across observations by randomizing over the stimuli rather than randomizing over the subjects. [20] This is
part of the rationale for using the term "randomized assignment" in this paper rather than "randomized
allocation", and Peirce therefore seems to have implemented the first documented instance of
randomized assignment. He may not have been conscious of all the nuances, but from his background
and description it is clear that he understood the power of randomization; unfortunately it did not
immediately catch on with others. Peirce's hypothesis was confirmed in the data, and the paper
concludes in part (with a certain degree of extrapolation required):
"Th[is] general fact has highly important practical bearings, since it gives new
reason for believing that we gather what is passing in one another's minds in
large measure from sensations so faint that we are not fairly aware of having
them, and can give no account of how we reach our conclusions about such
matters."

[18] Prominent behavioral economist Matthew Rabin has even suggested that sometimes it might be better not to
randomize over long sequences, precisely in order to convince typical subjects that the sequence is random, since
most people do not expect long runs in nature. See Jamison et al. (2008) for related discussion.
[19] One could argue that Richet's experiment had similar features, but intentions matter. Richet used randomization
within the stimuli because there was quite literally no alternative, whereas Peirce was breaking with the standard
protocol of Fechner and even Müller. He consciously introduced randomization in order to be certain that
everything other than what he was interested in studying would be controlled for, just as Fisher did many years
later.
[20] In modern parlance Peirce was essentially utilizing a within-subjects design (admittedly with a paucity of
subjects), where the order of treatment application is randomized, rather than a between-subjects design. The
latter is often the only feasible choice in clinical and field experiments, but both are commonly used in lab
experiments.

5. Field Experiments

Early efforts to apply experimental techniques in applied settings outside the lab also lay with
psychologists, although in this case it was educational psychology at the forefront. Starting around the
turn of the century there were many studies of learning in classrooms, and a book by McCall (1923) on
experimental design in education highlights randomization as a particularly efficient approach for
avoiding selection bias and other spurious influences. However, no empirical studies from that era
involving actual randomization have been found; all extant sources are either silent on the matter or use
some form of matching to create a control group for comparison, which was itself an important
innovation. Somewhat later Remmers (1928) used alternation (à la Fibiger (1898) in Section 3) as a
technique, and finally Walters (1931) appears to be the first to randomize in this domain. In a university
study on the effectiveness of using older students to counsel and mentor younger ones, he writes: "The
220 delinquent freshmen were divided into two groups by random sampling." A more modern but still
impressively early, especially for being both careful and large, randomized intervention in education is
the Perry preschool program, which began in 1962 and is still the subject of active research.

A major and fascinating early experiment in industrial psychology took place at the Hawthorne
factory of the Western Electric Company, near Chicago. From the mid-1920s to the early 1930s, various
environmental factors (such as lighting level) were systematically, not randomly, varied and analyzed
in terms of productivity. In an elegantly titled book, Mayo (1933) reports some early results, which are
often described as consisting of increases in productivity every time an external factor is varied,
whatever the nature of the change! This pattern has been interpreted as arising from the novelty of
being studied, which is now referred to as the Hawthorne (or observer) effect, although reanalysis of
the original data casts doubt on whether that conclusion was accurate for the original experiments [21].

[21] See List and Rasul (2011) for discussion.

Turning to social policy experiments, a surprising candidate for the position of first RCT in history
comes from the field of political science. Leading up to the US presidential election of 1924, Harold
Gosnell worked on a project whose goal was to increase voting rates in Chicago. The primary
intervention was a mailed postcard describing the necessity of registration prior to voting, and the
results were encouraging. However, there have been conflicting opinions in the scholarly literature as to
whether he used randomization to achieve those results. In the full report (Gosnell 1927), he himself
writes:
"The second step in the process of sampling was the division of the citizens in
each of the districts canvassed into two groups, one of which was to be
experimented upon while the other was not. It was assumed that the non-
experimental groups could be used as a sort of control. [...] In order to avoid
possible contacts between the experimental and the control groups, the dividing
lines between the two groups were as sharply drawn as possible."
This strongly suggests that each of the 12 districts where the study was carried out was divided into two
parts, one of which was somehow chosen as treatment and one as control. Forsetlund et al. (2007)
acknowledge that Gosnell mentioned using random sampling as a method to control for
non-experimental variables, but they conclude from the description above (and from the lack of any explicit
affirmative discussion of how randomization was introduced) that random allocation is very unlikely to
have been used to create the comparison groups. Indeed there is no proof, but that conclusion seems
overly pessimistic. In particular, they and others may have been unaware of Gosnell's original short
report on the project (Gosnell 1926), in which he states:
"In order to set up this experiment it was necessary to keep constant, within
reasonable limits, all the factors that enter into the electoral process except the
particular stimuli which were to be tested. [...] The method of random sampling
was used to control these factors during the testing of the particular stimuli used
in the experiment."
Although the phrase "random sampling" refers in modern parlance to choosing a representative subset
of a population, which is very distinct from randomized allocation, this was not true at the time. There
are multiple examples of random sampling being used in the context of randomly dividing a group of
subjects into two parts, including Walters (1931) discussed above. Indeed it is clear from Gosnell's own
description that he was not referring to sampling in the modern sense: "Special efforts were made to list
all the eligible voters in these areas." (Gosnell 1926) The most likely conclusion, albeit circumstantial, is
that Gosnell did indeed randomize but at the cluster level, i.e. within each district, in order to
determine which half of the matched pair would receive the intervention and which would not.
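
The cluster-level design conjectured in the preceding paragraph can be made concrete with a short sketch. The Python below is purely illustrative and is not Gosnell's documented procedure: it assumes that each of the 12 districts has already been split into two halves (hypothetically labelled "A" and "B") and uses a seeded coin flip within each district to decide which half receives the postcard. The district names, the labels, and the seed are all assumptions made for the example.

```python
import random

def assign_within_districts(districts, seed=0):
    """For each district, pick one of its two halves at random as the treated half.

    Hypothetical illustration of matched-pair (cluster-level) randomization;
    the halves are assumed to be labelled "A" and "B".
    """
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    assignment = {}
    for district in districts:
        treated = rng.choice(("A", "B"))
        control = "B" if treated == "A" else "A"
        assignment[district] = {"treatment": treated, "control": control}
    return assignment

# Twelve districts, matching the number reported in the text; names are made up.
districts = [f"district_{i}" for i in range(1, 13)]
plan = assign_within_districts(districts, seed=1926)

for district, halves in plan.items():
    print(district, halves)
```

Because the coin is flipped once per district rather than once per voter, the unit of randomization here is the half-district, which is why any analysis of such a design would need to account for clustering.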

The remainder of the study is remarkably forward-looking as well. Gosnell, for whom an
important prize for excellence in political methodology is named, checked for baseline balance across
treatment and control by using the percentage distributions of potentially relevant demographic and
other observable characteristics. He performed multiple heterogeneity analyses of the data. And he
cared about the policy implications, in particular understanding that it was important to send the card
not just in English but also in Polish, Czech, and Italian. Apropos of modern debates, within the results
section we find this: "In other words, the nonpartisan get-out-the-vote canvass had great influence
upon the negroes and the foreign-born citizens who could not read English." (Gosnell 1926)

From a wider perspective, two points become clear. First is that something was in the air in the
mid-to-late 1920s, given the first randomized control studies we have seen in agriculture, medicine,
educational psychology, and (almost certainly) political fieldwork. Although these were most likely
formally independent of one another, the intellectual milieu must have been fertile for this particular
innovation in experimental design. Second is the divergence in paths afterward: whereas agriculture and
medicine for the most part accepted the technique and saw it become the gold standard for research
(albeit never without controversy or detractors), social science did not follow the same path. Only
recently has randomized assignment flourished in field experiments, although lab experiments in both
psychology (as we saw in Section 4) and in economics (as we shall see in Section 6) were faster to
incorporate this idea.

Within political science, Gosnell himself did not pursue this methodological approach. Eldersveld
(1956) explicitly randomizes in a similar get-out-the-vote experiment, which still deserves credit for
being an early adoption of RCTs in the policy space, but the approach did not really become popular or
mainstream in political science until the turn of the 21st century (see Green and Gerber 2003). Note that
some lab experimentalists (e.g. Fiorina and Plott 1978) had studied political issues such as majority rule
relatively early in this arc, using random assignment across conditions and even within positions on a
committee.

Many attempts have been made to analyze the development of rigorous experimentation in
social policy,22 and some of this work points to randomized evaluations going back well into the first half
of the 20th century. Unfortunately, as with the field of educational psychology, most such claims turn out
to be incorrect (typically involving instead careful but nonrandomized choice of the control group) or
simply unverifiable.

The first clearly randomized social experiment was the Cambridge-Somerville youth study. This
was devised by Richard Clarke Cabot, a physician and pioneer in advancing the field of social work.
Running from 1942 to 1945, the study randomized approximately 500 young boys who were at risk for
delinquency into either a control group or a treatment group, the latter receiving counseling, medical
treatment, and tutoring. Results (Powers and Witmer 1951) were highly disappointing, with no
differences reported. Sociology and criminology continued to be early adopters in the use of random
experimentation, with studies by Reimer and Warren (1957) on parole caseload levels, by Hanson and
Marks (1958) on interviewer accuracy in the 1950 US Census, and by Ares et al. (1963) on the large-scale
Manhattan bail project.

Another creative and early use of randomization was carried out by Mahalanobis (1946) in India
while surveying factory workers. He had five enumerators and five areas from which he desired to
obtain data. Instead of assigning one enumerator to each area, as would have been natural, he divided
each area into five independent random samples and thus had each enumerator work in every area.
This is a nice example of embedding experimental design into survey design,23 and in this case it allowed
him not only to get less noisy and more consistent estimates of the relevant conditions in each area but
also to potentially compare enumerator effects.
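
The survey layout just described for Mahalanobis (1946) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reconstruction of his actual procedure: the area names, enumerator names, the 20 units per area, and the seed are all hypothetical, and the random subsamples are formed simply by shuffling each area's units and dealing them out in turn.

```python
import random

def interpenetrating_design(units_by_area, enumerators, seed=0):
    """Split every area into one random subsample per enumerator.

    Each enumerator ends up working in every area, so area effects and
    enumerator effects are not confounded with one another.
    """
    rng = random.Random(seed)
    k = len(enumerators)
    plan = {}
    for area, units in units_by_area.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # random order within the area
        # Deal the shuffled units out in turn: enumerator i gets units i, i+k, i+2k, ...
        plan[area] = {e: shuffled[i::k] for i, e in enumerate(enumerators)}
    return plan

# Hypothetical inputs: five areas of 20 sampling units each, five enumerators.
areas = {f"area_{a}": [f"area_{a}_unit_{u}" for u in range(20)] for a in range(1, 6)}
enumerators = [f"enumerator_{e}" for e in range(1, 6)]
survey_plan = interpenetrating_design(areas, enumerators, seed=1946)

for area, workloads in survey_plan.items():
    print(area, {e: len(units) for e, units in workloads.items()})
```

With this layout, averaging over enumerators within an area estimates conditions in that area, while averaging over areas for a given enumerator allows enumerator effects to be compared.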
Meanwhile public health followed medicine's lead and occasionally used randomization even in
large-scale interventions. A noteworthy early example was testing the effectiveness of Jonas Salk's polio
vaccine in the early 1950s, when there was a debate about whether to implement comprehensive
vaccination. In order to evaluate its efficacy (and its potential risks), given that the disease had a
relatively low incidence rate (especially for so-called paralytic polio), a large sample was needed: in this
case over a million children. Some local health departments were hesitant to randomize and preferred
an approach in which all second graders would be vaccinated, with the first and third graders serving as
controls. Other health departments felt that this would not be sufficiently rigorous and therefore not
sufficiently compelling, thereby wasting all the money and effort spent, so they preferred a randomized
(double-blind) placebo approach. In the end about half of the participants ended up using each method,
which illustrates some common difficulties of using RCTs in the field. Results (see Francis 1955 but also
Meier 1972 for a broader perspective) were highly encouraging, and polio vaccination has been
standard ever since.24 Unfortunately even convincing evidence of an effective and inexpensive vaccine
has not led to the eradication of polio. Now 60 years after the trial above, Nigeria may have seen its last
case of the disease, leaving only Afghanistan and Pakistan with endemic areas.

22 See for example Logan (1973), Boruch et al. (1978), Farrington (1983), Oakley (2000), and Greenberg and
Shroder (2004).
23 The intersecting development of randomized assignment and random sampling is discussed in Fienberg and
Tanur (1987), from which this example is taken.

Another very impressive early randomized experiment in the realm of public health involved
family planning in Taiwan (see Population Council 1963 for the experimental design and Takeshita 1964
for results). The city of Taichung was divided into three roughly matched sectors, each of which
consisted of hundreds of neighborhoods (of 25-30 families). Individual neighborhoods were randomized
into either a control treatment; a treatment involving information by mail only; or one of two more
exhaustive treatments (which included group meetings and personalized home visits), either with the
wife only or with both spouses. The relevance of the sectors is that the percentage randomized into the
exhaustive treatments differed across sectors, from 20% up to 50%. This allowed the researchers to look
at intensity of treatment and to examine what they called "circulation effects" (i.e. spillovers), an
extraordinarily sophisticated protocol for the time. A slightly later family planning experiment (Chang et
al. 1972), also in Taiwan, randomized ten experimental counties in which field workers received a
monetary bonus for every woman who accepted birth control,25 versus ten control counties. Testing
marginal financial incentives for health or similar workers is now once again an active area of research.
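
The two-level structure of the Taichung design (neighborhoods randomized into arms, with the share of intensive treatment varying by sector) may be easier to see in code. The sketch below is only an illustration under loudly stated assumptions: the text gives just the range of 20% to 50% for the intensive share, so the middle value of 35%, the 100 neighborhoods per sector, the even split of the remaining neighborhoods between control and mail-only, the even split of intensive neighborhoods between the two visit arms, and all names are hypothetical.

```python
import random
from collections import Counter

def assign_sector(neighborhoods, intensive_share, rng):
    """Randomly assign one sector's neighborhoods to the four arms."""
    shuffled = list(neighborhoods)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_intensive = round(n * intensive_share)   # sector-specific treatment intensity
    n_wife_only = n_intensive // 2             # assumed even split of the visit arms
    n_control = (n - n_intensive) // 2         # assumed even split of the remainder
    labels = (
        ["visit_wife_only"] * n_wife_only
        + ["visit_both_spouses"] * (n_intensive - n_wife_only)
        + ["control"] * n_control
        + ["mail_only"] * (n - n_intensive - n_control)
    )
    return dict(zip(shuffled, labels))

def assign_city(sectors, shares, seed=0):
    """Apply the within-sector randomization to every sector."""
    rng = random.Random(seed)
    return {name: assign_sector(units, shares[name], rng) for name, units in sectors.items()}

# Hypothetical city: three sectors of 100 neighborhoods each.
sectors = {name: [f"{name}_{i}" for i in range(100)] for name in ("light", "medium", "heavy")}
shares = {"light": 0.20, "medium": 0.35, "heavy": 0.50}  # only 0.20 and 0.50 come from the text
city_plan = assign_city(sectors, shares, seed=1963)
counts = {name: Counter(arms.values()) for name, arms in city_plan.items()}

for name, tally in counts.items():
    print(name, dict(tally))
```

Under a layout like this, comparing control neighborhoods across sectors with different intensive shares gives one handle on spillovers, since the controls differ only in how densely treated their surroundings are.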
Economics, on the other hand, lagged somewhat behind most of these other disciplines.
Heather Ross, an MIT graduate student at the time, initiated in the 1960s what is considered to be the
first field experiment in economics. She proposed to study the effects of a negative income tax (i.e.
phased income supplementation by the government for very low incomes) in what became the New
Jersey Income Maintenance Experiment. The experiment randomized, at the household level, both the
level of guaranteed minimum income and the (negative) tax rate. Ross (1970) finds little evidence of a
concomitant reduction in labor supply, although later analysis of the data suggests that it does in fact
exist. This project was followed by several other major randomized social experiments in economics, for
instance Brook et al. (1983) comparing outcomes of free as opposed to merely low-cost health care in
the RAND Health Insurance Experiment. But enthusiasm (or perseverance) waned overall.

The one arena in which economists can perhaps claim to have been at the forefront of
randomization, and which continues to be one of the most fruitful areas of application, is in the field of
international development. The Radio Mathematics Project in Nicaragua began in 1974 as an effort to
study the efficacy of teaching math skills via radio, initiated by education economists. The first
publication from this study, comparing test scores and finding generally positive effects, was Searle et al.
(1978). A later paper vividly highlighted the importance of randomized evaluation, this time in the
context of students repeating school years, by showing that the full (rigorous) results ran contrary to
early results reported using only the original pilot data, which had not been randomized; see Jamison
(1980).

24 Modern vaccines, however, use live virus as opposed to Salk's killed virus.
25 $0.50 if pills or condoms; $2.50 if loop (IUD).

6. Experimental Economics

Unlike their more traditional brethren, laboratory experimentalists in economics took to
randomization very quickly, as had their counterparts in psychology. Although somewhat late to the
game in the grand scheme of things, these researchers tended to be deeply careful about their
hypotheses and assumptions, which led to multiple distinct uses of randomization, some but not all of
which can be termed randomized assignment. In addition, like some of the early agricultural
experimenters, they tended to focus on the role of theory in their models and analysis; sometimes there
was no need for a control group because theoretical predictions provided the point of comparison.

Chamberlin (1948) is often considered the first laboratory experiment in economics, although
Roth (1993) points to an even earlier paper by Thurstone (1931), published in a psychology journal, in
which indifference curves were studied by asking subjects hypothetical questions about consumption
tradeoffs between everyday goods. Thurstone did not randomize, because his central goal was to test
for within-subject consistency of choices.26 Similarly, Mosteller and Nogee (1951) estimated utility
functions for individuals from observing decisions between risky gambles (lotteries) involving real
money. Outcomes were random, although not in the sense of randomized assignment; this was typical
of individual choice experiments where there was no particular concern about selection and no
comparison of conditions. However, we find a creative and early use of randomization in Davidson et al.
(1955): in the context of measuring utilities and subjective probabilities, they made their own dice with
nonsense syllables (such as ZEJ) on which subjects were asked to bet. In order to be absolutely certain
that the results weren't driven by people choosing on the basis of e.g. innate preference or familiarity
for a particular sequence of letters, the choice of winning nonsense syllable was randomized. This is
precisely the idea of randomization in order to control for unobservable factors.

Meanwhile Chamberlin (1948) reported on a market experiment with demand and supply
curves induced by assigning separate values to individuals who served as either buyers or sellers.
Implicit in his procedure was that this was done randomly; Smith (1962) reports on a series of market
experiments from the late 1950s in which the separation is explicit: "The group of subjects is divided at
random into two subgroups, a group of buyers and a group of sellers." This certainly constitutes
randomized assignment, but note that the purpose was not to compare buyers against sellers or to
avoid selection bias. In many ways it is reminiscent of Peirce and Jastrow (1885): randomization is
consciously used to control for any potential bias or asymmetry, including on the part of the
experimenter, but it is not used to specifically compare treatments or interventions.

26 Rousseas and Hart (1951), who improved upon Thurstone's design in part by using food as the object of
decisions and requiring subjects to consume whatever they chose, did use randomization in one respect: "So as to
avoid the possibility of product differentiation, each individual was told that his eggs were to be scrambled."

The third major topic within early experimental economics, in addition to individual choice and
competitive markets, was game theory: models of strategic interaction. Kalisch et al. (1952) studied
multiplayer games of cooperation, comparing the predictive ability of various equilibrium solution
concepts. They were interested in one-shot games rather than the effects of repeated coalitions, so they
rotated the players after each trial; this wasn't quite randomization but it served a related purpose. In
terms of disciplinary background, this was a collaboration of mathematicians turned game theorists. A
few years later Atkinson and Suppes (1957), also not economists by training,27 analyzed different
learning models in two-person zero-sum games, and they explicitly randomly assigned pairs of
subjects into one of three different treatment groups. This is the earliest instance of random assignment
in experimental economics, for purposes of comparing treatments, that has been found to date.

The mix of disciplines in the early years of experimental economics was broad and clearly
invigorating. In addition to mathematicians and philosophers (with both Davidson and Suppes in the
latter camp) bringing experience in mathematical decision theory, there were importantly the
psychologists such as Atkinson and especially Sidney Siegel, a coauthor in the Davidson et al. (1955)
paper. Economics was more often interested in testing the implications and predictions of specific
theories, which does not necessarily require any comparison at all, or in comparing and contrasting the
fidelity of various theories to data. In order to optimally organize all these experiments, there were a
large number of methodological procedures that came from psychology. Siegel was a proponent of
many of them, although with no special focus on randomization, and he worked hard to make these
new techniques available to the world of economics, including a fruitful collaboration with economist
Lawrence Fouraker on studies of bargaining and cooperation (Siegel and Fouraker 1960).

Although Siegel and others were publishing in psychology journals, most of the economics
papers discussed here ended up as unpublished manuscripts or book chapters. Chamberlin (1948)
appeared in an economics journal, but does not explicitly mention randomization. On the other hand,
Smith (1962) is in an economics journal, discusses randomized assignment, and became highly
influential in the development of the field.28 Although Smith always gave much general methodological
credit to Siegel (see Smith 2008), who unfortunately died prematurely, it is not clear whether the notion
of randomized assignment was directly borrowed from psychology or was instituted independently as a
natural reaction to the environment. What is clear is that he and the rest of the first generation of
economists who were full-time experimentalists, such as Charles Plott, continued to use randomization
not only for basic division into treatment groups but also (like many others mentioned in this survey) to
control for anything unexpected that may have caused different outcomes in different trials.29

27 Remarkably, Patrick Suppes was also a coauthor in Searle et al. (1978), the first RCT in development economics,
which was discussed previously; he was also a coauthor in the Davidson et al. (1955) article mentioned just above.
Suppes was an analytic philosopher who worked in fields as diverse as quantum mechanics, decision theory, and
psychology.
28 Suppes and Carlsmith (1962) came out slightly earlier that year, albeit in a less widely read economics journal,
and also explicitly randomized subjects into one of two experimental groups. Partly because the topic of that paper
and related ones above did not flourish to the same extent, and partly because the authors went on to other work,
it has not had the same impact within experimental economics as the oeuvre of Smith, who went on to win a
Nobel Prize for his contributions in this area.

7. Conclusion

This paper has argued that a single notion of randomized assignment captures not only the
usual application of random allocation into treatment and control groups, but also more broadly any
randomization meant to control for observable and unobservable factors that may influence the
outcome. This allows for the legitimate comparison and/or pooling of different observations (data
points). Under this definition the first use of randomized assignment was in the field of psychology, and
in particular psychophysics, in 1884. One of the novel observations presented here is that not only did it
take another 40 years before random allocation into groups was first documented, but that when that
did happen it occurred in four different disciplines within a few years of each other: agricultural science,
medicine, educational psychology, and political science. In fact, the field experiment on voting behavior
(Gosnell 1926), which contrary to some of the previous scholarly literature we claim was indeed
randomized, appears to be the very first RCT (randomized control trial). While medicine continued to
use randomized assignment as standard practice, this was not the case in the social sciences. A modest
wave of policy-relevant experiments in sociology, criminology, public health, political science, education,
and eventually economics did sweep through from the 1940s to the 1970s, but this was not sustained in
many of the fields. In the last 15 years, since approximately the turn of the century, we have started to
observe a second wave of randomized experiments in economics, political science, global health, and
especially aid and development.

Although there has been much discussion over the decades of potential drawbacks from
research using randomized assignment, which may explain some of the sputtering timeline above,
economists have been particularly vocal while attempting to maintain a balanced perspective.30 An
immediate question surrounds ethics: on the one hand nothing could be fairer than randomization; on
the other hand nothing could use less information about where to optimally allocate limited resources
than randomization. If there is relatively good information about interventions, and the default behavior
involves allocation that is relatively efficient from a social perspective, then randomization is harder to
defend. One response that is common in medical trials is to use the current best practice for the control
group (rather than e.g. a placebo). If there is less information in the first place, or if the default
allocation mechanism is suboptimal (e.g. via corruption and nepotism), then randomized assignment
starts to look much more attractive on its own merits.

29 "My use of randomness was often to protect me from myself or maybe a grad student [...] to make sure that the
results were not a consequence of some subtle experimental procedure." (Charles Plott, personal communication)
30 See for instance Harrison and List (2004), Deaton (2010), and List and Rasul (2011).

A more practical, less philosophical concern involves selection bias into the experiment in the
first place, potentially casting doubt on external validity. Randomization absolves the researcher from
any selection issues between treatment groups, but it may introduce bias prior to that point if subjects
know that it will occur, as is typically the case with clinical trials, social experiments (but not natural field
experiments), and laboratory experiments. Subjects themselves may be worried about ethical
considerations, or they may not like the idea of being experimented upon, or they may believe that
they tend to be unlucky, or they may have any of a number of other concerns. Kramer and Shapiro (1984)
claim that it is much harder to recruit subjects for randomized than for nonrandomized drug trials. By
definition this possible effect is difficult to test, although one can try to compare characteristics of
subjects who respond to varied recruitment approaches; see Gazzale et al. (2013) in the context of
laboratory experiments in economics.

Of course, there are also many environments where randomized assignment is simply infeasible:
imagine nationwide health systems, although even in such settings creative experimental designs can
begin to nibble around the edges. Despite these occasional shortcomings, randomized trials are here to
stay and remain the gold standard in the eyes of many. Their historical development appears to have
been somewhat slow, in the sense that practice has often lagged behind theory, but it has
been inexorable and it has shown convergence across multiple fields of enquiry. We should expect to
see a continued rise in the numbers and types of randomized experiments over time, and a continued
imperialism toward the domains in which they can be applied.

References
Amberson JB, McMahon BT, and Pinner M (1931) "A Clinical Trial of Sanocrysin in Pulmonary Tuberculosis" American Review of Tuberculosis 24, 401-35.
Ares CE, Rankin A, and Sturz H (1963) "The Manhattan Bail Project: An Interim Report on the Use of Pre-trial Parole" New York University Law Review 38, 67-95.
Atkinson RC and Suppes P (1957) "An Analysis of Two-Person Game Situations in Terms of Statistical Learning Theory" Office of Naval Research Contract NR171034 Technical Report 8, April 25.
Banerjee AV and Duflo E (2011) Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty PublicAffairs, New York.
Boruch RF, McSweeny AJ, and Soderstrom EJ (1978) "Randomized Field Experiments for Program Planning, Development, and Evaluation" Evaluation Quarterly 2:4, 655-95.
Brook RH and others (1983) "Does Free Care Improve Adults' Health? Results from a Randomized Controlled Trial" New England J of Medicine 309, 1426-34.
Chalmers I (2001) "Comparing Like with Like: Some Historical Milestones in the Evolution of Methods to Create Unbiased Comparison Groups in Therapeutic Experiments" International J of Epidemiology 30, 1156-64.
Chamberlin EH (1948) "An Experimental Imperfect Market" J of Political Economy 56:2, 95-108.
Chang MC, Cernada GP, and Sun TH (1972) "A Fieldworker Incentive Experimental Study" Studies in Family Planning 3:11, 270-2.
Claassen JAHR (2005) "The Gold Standard: Not a Golden Standard" British Medical J 330, 1121.
Colebrook D (1929) "Irradiation and Health" Medical Research Council Special Report 131, 413.
Darwin C (1859) On the Origin of Species by Means of Natural Selection John Murray, London.
Davidson D, Siegel S, and Suppes P (1955) "Some Experiments and Related Theory on the Measurement of Utility and Subjective Probability" Office of Naval Research Contract NR171034 Technical Report 1, August 15.
Deaton A (2010) "Instruments, Randomization, and Learning about Development" J of Economic Literature 48, 424-55.
Dehue T (1997) "Deception, Efficiency, and Random Groups: Psychology and the Gradual Origination of the Random Group Design" Isis 88:4, 653-73.
Doull JA, Hardy M, Clark JH, and Herman MB (1931) "The Effect of Irradiation with Ultraviolet Light on the Frequency of Attacks of Upper Respiratory Disease" American J of Hygiene 13, 460-77.
Eden T and Fisher RA (1927) "Studies in Crop Variation, IV. The Experimental Determination of the Value of Top Dressings with Cereals" J of Agricultural Science 17, 548-62.
Eldersveld SJ (1956) "Experimental Propaganda Techniques and Voting Behavior" American Political Science Review 50, 154-65.
Farrington DP (1983) "Randomized Experiments on Crime and Justice" Crime and Justice 4, 257-308.
Fechner G (1860) Elemente der Psychophysik von Breitkopf & Haertel, Leipzig.
Fibiger J (1898) "Om Serumbehandling af Difteri" Hospitalstidende 6: 309-25.
Fienberg SE and Tanur JM (1987) "Experimental and Sampling Structures: Parallels Diverging and Meeting" International Statistical Review 55:1, 75-96.
Fiorina MP and Plott CR (1978) "Committee Decisions under Majority Rule: An Experimental Study" American Political Science Review 72:2, 575-98.
Fisher RA (1935) The Design of Experiments Oliver and Boyd, London.
Forsetlund L, Chalmers I, and Bjørndal A (2007) "When Was Random Allocation First Used to Generate Comparison Groups in Experiments to Assess the Effects of Social Interventions?" Economics of Innovation and New Technology 16:5, 371-84.
Francis T Jr and others (1955) "An Evaluation of the 1954 Poliomyelitis Vaccine Trials" American J of Public Health 45:5 (pt 2), 1-63.
Gazzale R, Jamison J, Karlan A, and Karlan D (2013) "Ambiguous Solicitation: Ambiguous Prescription" Economic Inquiry 51:1, 1002-11.
Gosnell HF (1926) "An Experiment in the Stimulation of Voting" American Political Science Review 20:4, 869-74.
Gosnell HF (1927) Getting out the Vote: An Experiment in the Stimulation of Voting University of Chicago Press, Chicago.
Green DP and Gerber AS (2003) "The Underprovision of Experiments in Political Science" Annals of the American Academy of Political and Social Science 589, 94-112.
Greenberg D and Shroder M (2004) The Digest of Social Experiments (3rd ed) Urban Institute Press, Washington DC.
Hacking I (1988) "Telepathy: Origins of Randomization in Experimental Design" Isis 79:3, 427-51.
Hanson EH and Marks ES (1958) "Influence of the Interviewer on the Accuracy of Survey Results" J of American Statistical Association 53:283, 635-55.
Harrison GW and List JA (2004) "Field Experiments" J of Economic Literature 42, 1009-55.
van Helmont JB (1648) Ortus Medicinæ. Id Est, Initia Physicæ Inaudita Elsevier, Amsterdam.
van Helmont JB (1662) Oriatrike, or Physick Refined: The Common Errors Therein Refuted and the Whole Art Reformed and Rectified Lodowick Loyd, London.
Hróbjartsson A, Gøtzsche PC, and Gluud C (1998) "The Controlled Clinical Trial Turns 100 Years: Fibiger's Trial of Serum Treatment of Diphtheria" British Medical J 317, 1243-5.
Jamison DT (1980) "Radio Education and Student Failure in Nicaragua: A Further Note" in Radio Mathematics in Nicaragua (Friend J, Searle B, and Suppes P, eds) Institute for Mathematical Studies in the Social Sciences, Stanford CA, 225-36.
Jamison J, Karlan D, and Schechter L (2008) "To Deceive or not to Deceive: The Effect of Deception on Behavior in Future Laboratory Experiments" J of Economic Behavior & Organization 68, 477-88.
Kalisch G, Milnor JW, Nash J, and Nering ED (1952) "Some Experimental n-Person Games" RAND Research Memorandum 948, Aug 25.
Kramer M and Shapiro S (1984) "Scientific Challenges in the Application of Randomized Trials" J of the American Medical Association 252:19, 2739-45.
Lind J (1753) A Treatise of the Scurvy. In Three Parts. Containing an Inquiry into the Nature, Causes and Cure, of that Disease. Together with a Critical and Chronological View of what has been Published on the Subject Kincaid and Donaldson, Edinburgh.
List JA and Rasul I (2011) "Field Experiments in Labor Economics" in Handbook of Labor Economics Vol 4a (Ashenfelter O and Card D, eds) North-Holland, Amsterdam, 103-228.
Logan CH (1973) "Evaluation Research in Crime and Delinquency: A Reappraisal" J of Criminal Law and Criminology 63:3, 378-87.
Mahalanobis PC (1946) "Recent Experiments in Statistical Sampling in the Indian Statistical Institute" J of the Royal Statistical Society 109, 325-78.
Marks HM (2008) "James Angus Doull and the Well-controlled Common Cold" J of the Royal Society of Medicine 101:10, 1179.
Mayo E (1933) The Human Problems of an Industrial Civilization Macmillan, New York.
McCall WA (1923) How to Experiment in Education Macmillan, New York.
Medical Research Council (1948) "Streptomycin Treatment of Pulmonary Tuberculosis: A Medical Research Council Investigation" British Medical J 2, 769-82.
Medical Research Council (1951) "Prevention of Whooping-cough by Vaccination: A Medical Research Council Investigation" British Medical J 1, 1463-71.
Meier P (1972) "The Biggest Public Health Experiment Ever: The 1954 Field Trial of the Salk Poliomyelitis Vaccine" in Statistics: A Guide to the Unknown Holden-Day, San Francisco, 2-13.
Mosteller F and Nogee P (1951) "An Experimental Measurement of Utility" J of Political Economy 59:5, 371-404.
Müller GE (1879) "Über die Maassbestimmungen des Ortsinnes der Haut Mittels der Methode der Richtigen und Falschen Fälle" Archiv für die Gesammte Physiologie des Menschen und der Thiere 19, 191-235.
Oakley A (2000) "A Historical Perspective on the Use of Randomized Trials in Social Science Settings" Crime & Delinquency 46:3, 315-29.
Papineau D (1994) "The Virtues of Randomization" British J for the Philosophy of Science 45:2, 437-50.
Pasteur L (1881) "Compte rendu Sommaire des Expériences Faites à Pouilly-le-Fort, près Melun, sur la Vaccination Charbonneuse" Comptes Rendus de l'Académie des Sciences 92: 1378-83.
Pearl J (2000) Causality Cambridge University Press, Cambridge.
Peirce CS and Jastrow J (1885) "On Small Differences of Sensation" Memoirs of the National Academy of Sciences for 1884 3, 75-83.
Petrarch J (1364) "Letter to Boccaccio (V.3)" Rerum Senilium Libri. Liber XIV: Epistola 1.
Population Council (1963) "The Taichung Program of Pre-Pregnancy Health" Studies in Family Planning 1:1, 10-12.
Powers E and Witmer H (1951) An Experiment in the Prevention of Juvenile Delinquency: The Cambridge-Somerville Youth Study Columbia University Press, New York.
Reimer E and Warren M (1957) "Special Intensive Parole Unit" National Probation and Parole Association J 3, 222-9.
Remmers HH (1928) "A Diagnostic and Remedial Study of Potentially and Actually Failing Students at Purdue University" Bulletin of Purdue University: Studies in Higher Education 9:29, #12.
Richet C (1884) "La Suggestion Mentale et le Calcul des Probabilités" Revue Philosophique de la France et de l'Étranger 18, 609-74.
Ross H (1970) "An Experimental Study of the Negative Income Tax" Child Welfare 49:10, 562-9.
Ross RT (1934) "Optimum Orders for the Presentation of Pairs in the Method of Paired Comparisons" J of Educational Psychology 25:5, 375-82.
Roth AE (1993) "On the Early History of Experimental Economics" J of the History of Economic Thought 15, 184-209.
Rousseas SW and Hart AG (1951) "Experimental Verification of a Composite Indifference Map" J of Political Economy 59:4, 288-318.
Searle B, Matthews P, Suppes P, and Friend J (1978) "Formal Evaluation of the 1976 First-Grade Instructional Program" in The Radio Mathematics Project: Nicaragua 1976-77 (Suppes P, Searle B, and Friend J, eds) Institute for Mathematical Studies in the Social Sciences, Stanford CA, 97-124.
Siegel S and Fouraker LE (1960) Bargaining and Group Decision-making: Experiments in Bilateral Monopoly McGraw-Hill, New York.
Smith GCS and Pell JP (2003) "Parachute Use to Prevent Death and Major Trauma Related to Gravitational Challenge: Systematic Review of Randomised Controlled Trials" British Medical J 327, 1459-61.
Smith VL (1962) "An Experimental Study of Competitive Market Behavior" J of Political Economy 70:2, 111-37.
Smith VL (2008) Discovery - A Memoir AuthorHouse, Bloomington.
Stigler SM (1992) "A Historical View of Statistical Concepts in Psychology and Educational Research" American J of Education 101, 60-70.
Suppes P and Carlsmith JM (1962) "Experimental Analysis of a Duopoly Situation from the Standpoint of Mathematical Learning Theory" International Economic Review 3:1, 60-78.
Takeshita J (1964) "The Taichung Program of Pre-Pregnancy Health" Studies in Family Planning 1:4, 10-12.
Thurstone LL (1931) "The Indifference Function" J of Social Psychology 2, 139-67.
Urbach P (1985) "Randomization and the Design of Experiments" Philosophy of Science 52, 256-73.
Walters JE (1931) "Seniors as Counselors" J of Higher Education 2, 446-8.
Watson W (1768) An Account of a Series of Experiments, Instituted with a View of Ascertaining the Most Successful Method of Inoculating the Smallpox J Nourse, London.
