Lower Bounds on Sample Size in Structural Equation Modeling


Published in Electronic Commerce Research and Applications, forthcoming Dec 2010, PII: S1567-4223(10)00054-2, DOI: 10.1016/j.elerap.2010.07.003 (with software downloadable from Elsevier)
J. Christopher Westland
Professor, Information & Decision Sciences
University of Illinois, Chicago
601 S. Morgan Street, Chicago, IL 60607-7124
(312) 860-0587, email: westland@uic.edu
JULY 2010
ABSTRACT
Computationally intensive structural equation modeling (SEM) approaches have been in development over much of the 20th century, initiated by the seminal work of Sewall Wright. To this day, sample size requirements remain a vexing question in SEM-based studies. The information demanded by structural model estimation increases with the number of potential combinations of latent variables, while the information supplied for estimation increases with the number of measured parameters times the number of observations in the sample; both relationships are non-linear. This alone would imply that requisite sample size is not a linear function solely of indicator count, even though such heuristics are widely invoked in justifying SEM sample size. This paper develops two lower bounds on sample size in SEM, the first as a function of the ratio of indicator variables to latent variables, and the second as a function of minimum effect, power and significance. The algorithm is applied to a meta-study of a set of research published in five of the top MIS journals. The study shows a systematic bias towards choosing sample sizes that are significantly too small. Actual sample sizes averaged only 50% of the minimum needed to draw the conclusions the studies claimed. Overall, 80% of the research articles in the meta-study drew conclusions from insufficient samples. Lacking accurate sample size information, researchers are inclined to economize on sample collection with inadequate samples that hurt the credibility of research conclusions. Guidelines are provided for applying the algorithms developed in this study, and companion software encapsulating the paper's formulae is made available for download. (261 words)
Keywords: Structural equation modeling, SEM, Partial least squares, PLS, LISREL, AMOS, sample size, Gini correlation, common factor bias, rule of 10

1. INTRODUCTION
The past two decades have seen a remarkable acceleration of interest in structural equation modeling (SEM) methods in management research, including partial least squares (PLS) and implementations of Jöreskog's SEM algorithms (LISREL, AMOS, EQS). The breadth of application of SEM methods has been expanding, with SEM increasingly applied to exploratory, confirmatory and predictive analysis with a variety of ad hoc topics and models. SEM is particularly useful in the social sciences, where many if not most key concepts are not directly observable. Because many key concepts in the social sciences are inherently latent, questions of construct validity and methodological soundness take on a particular urgency.
To this day, methodologies for assessing suitable sample size requirements remain a vexing question in SEM-based studies. The number of degrees of freedom consuming information in structural model estimation increases with the number of potential combinations of latent variables, while the information supplied in estimation increases with the number of measured parameters (i.e., indicators) times the number of observations (i.e., the sample size); both are non-linear in model parameters. This should imply that requisite sample size is not a linear function solely of indicator count, even though such heuristics are widely invoked in justifying SEM sample size. Monte Carlo simulation in this field has lent support to the non-linearity of sample size requirements, though research to date has not yielded a sample size formula suitable for SEM. This paper proposes a set of necessary conditions (thus lower bounds) for SEM sample adequacy.
The exposition proceeds as follows. Section 2 describes the historical context, commenting on how particular research objectives and computational limitations resulted in our current SEM toolsets. Section 3 summarizes the prior literature on sample adequacy results from Monte Carlo simulations. Section 4 develops an algorithm for computing the minimum sample size needed to detect a minimum effect at given power and significance levels in the structural equation model. Section 5 discusses these results with an application to research articles whose conclusions rest on confirmatory SEM analyses, and assesses whether the sample sizes used are adequate.
2. PRIOR LITERATURE
SEM evolved in three different streams: (1) systems of equation regression methods developed mainly at the Cowles Commission; (2) iterative maximum likelihood algorithms for path analysis developed mainly at the University of Uppsala; and (3) iterative least squares fit algorithms for path analysis, also developed at the University of Uppsala. Figure 1 provides a chronology of the pivotal developments in latent variable statistics in terms of method (pre-computer, computer-intensive and SEM) and objectives (exploratory/prediction or confirmation).
INSERT FIGURE 1: DEVELOPMENT OF STRUCTURAL EQUATION MODEL ESTIMATION
Both LISREL and PLS were conceived as iterative computer algorithms, with an emphasis from the start on creating an accessible graphical and data entry interface and on extending Wright's (1921) path analysis. Early Cowles Commission work on simultaneous equations estimation centered on Koopmans and Hood's (1953) algorithms from the economics of transportation and optimal routing, with maximum likelihood estimation and closed-form algebraic calculations, as iterative solution search techniques were limited in the days before computers. Anderson and Rubin (1949, 1950) developed the limited information maximum likelihood estimator for the parameters of a single structural equation, which indirectly included the two-stage least squares estimator and its asymptotic distribution (Anderson 2005; Farebrother 1999). Two-stage least squares was originally proposed as a method of estimating the parameters of a single structural equation in a system of linear simultaneous equations, being introduced by Theil (1953a, 1953b, 1961) and more or less independently by Basmann (1957) and Sargan (1958). Anderson's limited information maximum likelihood estimation was eventually implemented in a computer search algorithm, where it competed with other iterative SEM algorithms. Of these, two-stage least squares was by far the most widely used method in the 1960s and the early 1970s.
LISREL and PLS path modeling approaches were championed at Cowles mainly by Nobelist Trygve Haavelmo (1943). However, the underlying assumptions of LISREL and PLS were challenged by economists such as Freedman (1987), who argued that their failure to distinguish among causal assumptions, statistical implications, and policy claims has been one of the main reasons for the suspicion and confusion surrounding quantitative methods in the social sciences (see also Wold's (1987) response). Haavelmo's path analysis never gained a large following among U.S. econometricians, but was successful in influencing a generation of Haavelmo's fellow Scandinavian statisticians, including Hermann Wold, Karl Jöreskog, and Claes Fornell. Fornell introduced LISREL and PLS techniques to many of his Michigan colleagues through influential papers in accounting (Fornell and Larcker 1981) and information systems (Davis et al. 1989). Dhrymes (1971; Dhrymes et al. 1974) provided evidence that PLS estimates asymptotically approached those of two-stage least squares with exactly identified equations. This point is more of academic than practical importance, because most empirical studies overidentify. But in one sense, all of the limited information methods (OLS excluded) yield similar results.
3. SAMPLE SIZE AND THE RATIO OF INDICATORS TO LATENT VARIABLES
Structural equation modeling in MIS has taken a casual attitude towards the choice of sample size. Since the early 1990s, MIS researchers have alluded to an ad hoc rule of thumb of choosing 10 observations per indicator in setting a lower bound for the adequacy of sample sizes. Justifications for this rule of 10 appear in several frequently cited publications (Barclay et al. 1995; Chin 1998; Chin and Newsted 1999; Kahai and Cooper 2003), though none of these researchers refers to the original articulation of the rule by Nunnally (1967), who suggested (without providing supporting evidence) that in SEM estimation a good rule is to have at least ten times as many subjects as variables.
Within the MIS field, Goodhue et al. (2006, 2007) studied the rule of 10 using Monte Carlo simulation to compare sample sizes of 40, 90, 150, and 200, along with varying effect sizes (large, medium, small and no effect), to determine the adequacy of this rule for a given significance and power of tests. They concluded that: "In fact, for simple [SEM] models with normally distributed data and relatively reliable measures, none of the three techniques have adequate power to detect small or medium effects at small sample sizes… These findings run counter to extant suggestions in MIS literature" (Goodhue et al. 2006, p. 202). This finding is not completely unexpected, as similar SEM rules of thumb have been investigated since Nunnally's (1967) proposal, and the debate has evolved significantly since then.
The rule of 10 couches the sample size question in terms of the ratio of observations (sample points) to free parameters; for example, Bollen (1989) stated that "though I know of no hard and fast rule, a useful suggestion is to have at least several cases per free parameter," and Bentler (1989) suggested a 5:1 ratio of sample size to number of free parameters. But is this the right question? Typically these parameters were considered to be the indicator variables in the model, but unlike early path analysis, structural equation models today are typically estimated in their entirety, and the number of unique entries in the covariance matrix is $p(p+1)/2$ when $p$ is the number of indicators. It would be reasonable to assume that the sample size should be proportional to $p(p+1)/2$ rather than to $p$. Unfortunately, Monte Carlo studies conducted in the 1980s and 1990s showed that the problem is somewhat more subtle and complex than that, and that sample size and estimator performance are generally uncorrelated with either $p(p+1)/2$ or $p$.
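To make the contrast concrete, the short Python sketch below (function names are illustrative, not taken from any SEM package) tabulates the rule-of-10 minimum, $10p$, against the number of distinct variances and covariances, $p(p+1)/2$, for a few indicator counts. The first grows linearly in $p$ and the second quadratically.

```python
def rule_of_10(p: int) -> int:
    """Heuristic minimum sample size: ten observations per indicator."""
    return 10 * p


def unique_covariance_entries(p: int) -> int:
    """Distinct variances and covariances among p indicators: p(p+1)/2."""
    return p * (p + 1) // 2


if __name__ == "__main__":
    for p in (6, 12, 24):
        print(f"p={p:>2}  rule of 10: {rule_of_10(p):>3}  "
              f"unique moments: {unique_covariance_entries(p):>3}")
```

At $p = 24$ the 300 distinct moments already exceed the rule-of-10 figure of 240, so the two candidate bases diverge quickly as models grow.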
Difficulties arise because the $p$ indicator variables are used to estimate the $k$ latent variables (the unobserved variables of interest) in the SEM, and even though there may be $p(p+1)/2$ free parameters, these are not individually the focus of SEM estimation. Rather, free parameters are clustered around a much smaller set of latent variables, which are the focus of the estimation (or, alternatively, the correlations between these unobserved latent variables are the focus of estimation). Tanaka (1987) argued that sample size should depend on the number of estimated parameters (the latent variables and their correlations) rather than on the total number of indicators, a view mirrored in other discussions of minimum sample sizes (Browne and Cudeck 1989, 1993; Geweke and Singleton 1980; Gerbing and Anderson 1985). Velicer and Fava (1987, 1989, 1994) went further: after reviewing a variety of such recommendations in the literature, they concluded that there was no support for rules positing a minimum sample size as a function of indicators. They showed that, for a given sample size, convergence to proper solutions and goodness of fit were favorably influenced by (1) a greater number of indicators per latent variable and (2) greater saturation (higher factor loadings). Marsh and Bailey (1991) concluded that the ratio of indicators to latent variables, rather than just the number of indicators as suggested by the rule of 10, is a substantially better basis on which to calculate sample size, reiterating conclusions reached by Boomsma (1982), who suggested using the ratio $r = p/k$ of indicators to latent variables. Information input to the SEM estimation increases both with more indicators per latent variable and with more sample observations. A series of studies (Ding et al. 1995) found that the probability of rejecting true models at a significance level of 5% was close to 5% for $r = 2$ (where $r$ is the ratio of indicators to latent variables) but rose steadily as $r$ increased; for $r = 6$, rejection rates were 39% for a sample size of 50, 22% for a sample size of 100, 12% for a sample size of 200, and 6% for a sample size of 400.
Boomsma's (1982) simulations suggested that a ratio of indicators to latent variables of $r = 4$ would require a sample size of at least 100 for adequate analysis, and that $r = 2$ would require a sample size of at least 400. Marsh et al. (1988, 1996, 1998) ran 35,000 Monte Carlo simulations on LISREL CFA analysis, yielding data that suggested that $r = 3$ would require a sample size of at least 200, $r = 2$ a sample size of at least 400, and $r = 12$ a sample size of at least 50. Consolidation and summarization of these results suggest sample sizes of
$$n \ge 50r^2 - 450r + 1100$$
where $r$ is the ratio of indicators to latent variables. Furthermore, Marsh et al. (1996) recommend $r = 6$ to $10$ indicators per latent variable, assuming 25–50% of the initial choices add no explanatory power, which they found to often be the case in their studies. They note that this is a substantially larger ratio than found in most SEM studies, which tend to limit themselves to 3–4 indicators per latent variable. It is possible that a sample size rule of ten observations per indicator may indeed bias researchers towards selecting smaller numbers of indicators per latent variable in order to control the cost of a study or the length of a survey instrument.
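The consolidated bound is easy to evaluate. The following minimal Python sketch (function names are hypothetical) computes $r = p/k$ and the quadratic lower bound; evaluating it at $r = 2, 3, 4$ reproduces the anchor values 400, 200 and 100 quoted above.

```python
def indicator_ratio(p: int, k: int) -> float:
    """Ratio r = p / k of indicator variables to latent variables."""
    if p <= 0 or k <= 0:
        raise ValueError("p and k must be positive")
    return p / k


def min_n_from_ratio(r: float) -> float:
    """Consolidated Monte Carlo lower bound n >= 50 r^2 - 450 r + 1100."""
    return 50 * r**2 - 450 * r + 1100


if __name__ == "__main__":
    # Hypothetical model: 12 indicators measuring 4 latent variables (r = 3).
    r = indicator_ratio(12, 4)
    print(r, min_n_from_ratio(r))
```

The quadratic passes exactly through the $r = 2, 3, 4$ points, but it reaches its minimum at $r = 4.5$ and rises again beyond it (at $r = 12$ it gives 2,900 rather than the 50 reported by Marsh et al.), so it should not be extrapolated far outside the range of ratios typical of published studies.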
4. SAMPLE SIZE WITH PAIRED LATENT VARIABLES
This section develops an algorithm for computing the lower bound on the sample size required to confirm or reject the existence of a minimum effect in an SEM at given significance and power levels. Where SEM studies are directed towards hypothesis testing for complex models, with some level of significance $\alpha$ and power $1-\beta$, calculating the power first requires specifying the effect size $\delta$ you want to detect. Funding agencies, ethics boards and research review panels frequently request that a researcher perform a power analysis; the argument is that if a study is inadequately powered, there is no point in completing the research. Additionally, in the framework of SEM the assessment of power is affected by the variable information contained in social science data. Table 1 summarizes the notation used.
INSERT TABLE 1: NOTATION USED IN THE PAPER
DECONSTRUCTION
This research asks: "What is the lower bound on sample size $n$ for confirmatory testing of SEM as a function of these design parameters?" We want to detect a minimum correlation (effect) $\delta$ in estimating $k$ latent (unobserved) variables, at significance and power levels $(\bar{\alpha}, 1-\beta)$. In other words, we seek an algorithm $f(\cdot)$ such that $n = f(k, \delta \mid \bar{\alpha}, \beta)$.
In this section, we adopt the standard targets for our required Type I and II errors under Neyman-Pearson hypothesis testing of $\bar{\alpha} = .05$ and $\beta = .20$, but these requirements can be relaxed for a more general solution. Structural equation models are characterized here as a collection of pairs of canonically correlated latent variables, and adhere to the standard normality assumption on indicator variables. This leads naturally to a deconstruction of the SEM into an overlapping set of bivariate normal distributions. Assume that an arbitrarily selected pair of latent variables, call them $X$ and $Y$, are bivariate normal with density function:
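$$f(x, y \mid \mu_x, \mu_y, \rho, \sigma_x, \sigma_y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\}$$
and covariance structure is
$$\Sigma = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}$$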
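As a check on the transcription, the density and covariance matrix can be coded directly. This is a stdlib-only Python sketch with illustrative function names; at the means of standardized variables the density reduces to $1/(2\pi\sqrt{1-\rho^2})$.

```python
import math


def bivariate_normal_pdf(x, y, mu_x=0.0, mu_y=0.0,
                         sigma_x=1.0, sigma_y=1.0, rho=0.0):
    """Density f(x, y | mu_x, mu_y, rho, sigma_x, sigma_y) of a bivariate normal."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    one_minus_rho2 = 1.0 - rho * rho
    # Quadratic form inside the exponent, already divided by (1 - rho^2).
    quad = (zx * zx + zy * zy - 2.0 * rho * zx * zy) / one_minus_rho2
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_rho2)
    return math.exp(-0.5 * quad) / norm


def covariance_matrix(sigma_x, sigma_y, rho):
    """Covariance structure [[sx^2, rho sx sy], [rho sx sy, sy^2]]."""
    off = rho * sigma_x * sigma_y
    return [[sigma_x ** 2, off], [off, sigma_y ** 2]]
```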
COMBINATORICS OF HYPOTHESIS TESTS ON LINKS, AND SIGNIFICANCE LEVEL
It is typical in the literature to predicate an SEM analysis with the caveat that one needs to make strong arguments for the complex models constructed from the unobserved, latent constructs tested with the particular SEM, in order to support the particular links that are included in the model. This is usually interpreted to mean that each proposed (and tested) link in the SEM needs to be supported with references to prior research, anecdotal evidence and so forth. This may simply mean the wholesale import of a preexisting model (e.g., the Technology Acceptance Model) based on the success of that model in other contexts, without specifically building on the particular effects under investigation. Unfortunately, it is uncommon to see any discussion of the particular links (causal or otherwise), or combinations of links, that are excluded (either implicitly or explicitly) from the SEM. Ideally, similarly strong arguments should also be made for the inapplicability of omitted links or omitted combinations of links.
We can formalize these observations by letting $l$ be the number of potential links between latent variables. We extend the individual-link minimum sample size to a minimum sample size for the entire SEM, building up from pairs of latent variables by determining the number of possible combinations of the $l$ pairs, each with an effect that needs detection. Each effect can be dichotomized:
$$\text{link} = \begin{cases} 0: & \text{no effect (link absent)} \\ 1: & \text{effect (link present)} \end{cases}$$
Our problem is to compute, using combinatorial analysis, the number of distinct structural equation models that can exist in terms of the {0, 1} values of their links.
INSERT FIGURE 2: AN EXAMPLE OF A STRUCTURAL EQUATION MODEL WITH SIX LATENT VARIABLES AND FIVE CORRELATIONS
INSERT FIGURE 3: THE SEM EXAMPLE IN FIGURE 2 WITH ALL POSSIBLE PAIRED LINKS SHOWN
Since $k$ latent variables admit $l = k(k-1)/2$ pairs, each combination of {0, 1} values for the links, which our tests of the SEM as a whole require us to discriminate amongst, provides a binary number with $k(k-1)/2$ digits (see Figures 2 and 3), each representing a unique combination of links among the latent variables. The unique model hypothesized in any particular study will be some model (binary number) that is exactly one out of the $2^{k(k-1)/2}$ possible ways of connecting these latent variables; testing must discriminate this path from the $2^{k(k-1)/2} - 1$ other paths, which collectively define the alternative hypothesis.
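The counting argument can be sketched in a few lines of Python. Encoding a hypothesized model as a bitmask over the $k(k-1)/2$ unordered pairs, in the order produced by `itertools.combinations`, is an illustrative assumption rather than the numbering used in Figures 2 and 3; the function names are likewise hypothetical.

```python
from itertools import combinations


def possible_links(k: int) -> int:
    """Number of unordered pairs of k latent variables: k(k-1)/2."""
    return k * (k - 1) // 2


def number_of_models(k: int) -> int:
    """Distinct 0/1 link patterns, i.e. candidate models: 2^(k(k-1)/2)."""
    return 2 ** possible_links(k)


def encode_model(k: int, links) -> int:
    """Binary number whose set bits mark the hypothesized links."""
    index = {pair: i for i, pair in enumerate(combinations(range(k), 2))}
    code = 0
    for a, b in links:
        code |= 1 << index[(min(a, b), max(a, b))]
    return code


if __name__ == "__main__":
    k = 6  # six latent variables, as in the Figure 2 example
    print(possible_links(k), number_of_models(k), number_of_models(k) - 1)
```

For six latent variables there are 15 possible links, so a hypothesized model is one of $2^{15} = 32{,}768$ candidates and the alternative hypothesis comprises the other 32,767.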
For hypothesis testing with a significance of $\bar{\alpha}$ (which we have by default set to $\bar{\alpha} = .05$), it is necessary to correct the effective significance level $\alpha$ applied on each link, because the test must differentiate one possible model from all other structural equation models that could be hypothesized. The Šidák correction is a commonly used alternative to the Bonferroni correction when an experimenter is testing a set of hypotheses on a dataset while controlling the family-wise error rate. In the context of the current research, the Šidák correction provides the most accurate results. For the following analysis, a Šidák correction gives
$$\alpha = \alpha(k) = 1 - (1-\bar{\alpha})^{2/(k(k-1))}$$
where the power of the test can be held at $1-\beta = .8$ over the entire SEM with no modification.
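A minimal Python sketch of this correction (function names are hypothetical) computes the per-link level $\alpha(k)$ and, for comparison, the Bonferroni level $\bar{\alpha}/[k(k-1)/2]$; by construction, $1-(1-\alpha)^{k(k-1)/2}$ recovers $\bar{\alpha}$.

```python
def sidak_alpha(k: int, alpha_bar: float = 0.05) -> float:
    """Per-link significance alpha(k) = 1 - (1 - alpha_bar)^(2 / (k(k-1)))."""
    if k < 2:
        raise ValueError("at least two latent variables are needed")
    links = k * (k - 1) / 2
    return 1.0 - (1.0 - alpha_bar) ** (1.0 / links)


def bonferroni_alpha(k: int, alpha_bar: float = 0.05) -> float:
    """Bonferroni per-link level alpha_bar / (k(k-1)/2), for comparison."""
    return alpha_bar / (k * (k - 1) / 2)


if __name__ == "__main__":
    for k in (2, 4, 6, 8):
        print(k, round(sidak_alpha(k), 5), round(bonferroni_alpha(k), 5))
```

With six latent variables (15 links) the per-link level is about 0.0034, slightly less conservative than the Bonferroni value $0.05/15 \approx 0.0033$.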
MINIMUM EFFECT SIZE
Minimum effect, in the context of structural equation models, is the smallest correlation between latent variables that we wish to be able to detect with our sample and model. Small effects are more difficult to detect than large effects, as they require more information to be collected. Information may be added to the analysis by collecting more sample observations, by adding parameters, and by constructing a better model.
INSERT FIGURE 4: SIGNIFICANCE AND POWER FOR THE MINIMUM EFFECT THAT NEEDS TO BE DETECTED
Sample size for hypothesis testing is typically determined from a critical value (see Figure 4) that defines the boundary between the rejection (set by $\alpha$) and non-rejection (set by $\beta$) regions. The minimum sample size that can differentiate between $H_0$ and $H_A$ occurs where the critical value is exactly the same under the null and alternative hypotheses. The approach to computing sample size here is analogous to standard univariate calculations (Cochran 1977; Kish 1955; Lohr 1999; Snedecor and Cochran 1989; Westland and See-to 2007), but uses a formulation for variance customized to this problem.
In the context of structural equation models, canonical correlation between latent variables should be seen simply as correlation, the "canonical" qualifier referring to the particulars of its calculation in SEM, since the latent variables are unobserved and thus cannot be directly measured. Correlation is interpreted as the strength of the statistical relationship between two random variables obeying a joint probability distribution (Kendall and Gibbons 1990), such as a bivariate normal. Several methods exist to compute correlation: Pearson's product moment correlation coefficient (Fisher 1921, 1990), Spearman's rho and Kendall's tau (Kendall and Gibbons 1990) are perhaps the most widely used (Mari and Kotz 2001). Besides these three classical correlation coefficients, various estimators based on M-estimation (Shevlyakov and Vilchevski 2002) and order statistics (Schechtman and Yitzhaki 1987) have been proposed in the literature. The strengths and weaknesses of the various correlation coefficients must be considered in decision making. The Pearson coefficient, which utilizes all the information contained in the variates, is optimal when measuring the correlation between bivariate normal variables (Stuart and Ord 1991). However, it can perform poorly when the data are attenuated by non-linear transformations. The two rank correlation coefficients, Spearman's rho and Kendall's tau, are not as efficient as the Pearson correlation under the bivariate normal model; nevertheless, they are invariant under increasing monotone transformations, and are thus often considered robust alternatives to the Pearson coefficient when the data deviate from the bivariate normal model. Despite their robustness and stability in non-normal cases, the M-estimator-based correlation coefficients suffer great losses (up to 63% according to Xu et al. 2010) of asymptotic relative efficiency with respect to the Pearson coefficient for normal samples, and such a heavy loss of efficiency might not be compensated by their robustness in practice. Schechtman and Yitzhaki (1987) proposed a correlation coefficient based on order statistics for the bivariate distribution, which they call Gini correlation (because it is related to Gini's mean difference in a way that is similar to the relationship between the Pearson correlation coefficient and the variance).
INSERT FIGURE 5: BIVARIATE NORMAL SCATTER PLOTS FOR $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ WITH n = 5
As a measure of such strength, correlation should be large and positive if there is a high probability that large or small values of one variable occur (respectively) in conjunction with large or small values of another, and it should be large and negative if the direction is reversed (Gibbons and Chakraborti 1992). Figure 5 provides a rug plot of bivariate normal scatter plots, generated by the R mvtnorm package, that gives a visual description of the clustering and behavior of particular values of correlation $\rho$ between the latent variables.
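Figure 5 relies on the R mvtnorm package. As a rough stand-in, the stdlib-only Python sketch below (not a port of mvtnorm; names are hypothetical) draws standardized bivariate normal pairs via $Y = \rho X + \sqrt{1-\rho^2}\,Z$ with independent standard normal $X$ and $Z$, and computes the sample Pearson correlation.

```python
import math
import random


def sample_bivariate_normal(n: int, rho: float, seed=None):
    """Draw n standardized pairs with correlation rho: Y = rho X + sqrt(1 - rho^2) Z."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        pairs.append((x, rho * x + scale * z))
    return pairs


def pearson(pairs):
    """Sample Pearson product-moment correlation of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Several small samples at rho = 0.37 show how widely r scatters.
    for seed in range(5):
        print(round(pearson(sample_bivariate_normal(30, 0.37, seed)), 2))
```

Printing the correlation of several samples of size 30 drawn at $\rho = 0.37$ gives a direct feel for the dispersion the figure is meant to convey.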
We will use a standard definition of the minimum effect size to be detected: the strength of the relationship between two variables in a statistical population, as measured by the correlation $\rho$ for paired latent variables, following conventions articulated in Wilkinson (1999), Nakagawa et al. (2007) and Brand et al. (2008). Where we are assessing completed research, we can substitute for $\delta$ the smallest correlation (effect size) on all of the links between latent variables in the SEM. Cohen (1988, 1992) provides the following guidelines for the social sciences: small effect size, $|\rho| = 0.1$–$0.23$; medium, $|\rho| = 0.24$–$0.36$; large, $|\rho| = 0.37$ or larger. Figure 5 gives us a feel for Cohen's recommendations: $|\rho| = 0.37$ still has a great deal of dispersion, and we might find it difficult to determine the correlation merely by looking at a scatter plot where the variables on the two axes have correlation $|\rho| = 0.37$.
ESTIMATOR FOR CORRELATION IN A BIVARIATE NORMAL DISTRIBUTION
Let $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, be a random sample of size $n$ of independent and identically distributed (i.i.d.) data pairs from the bivariate normal population of $(X, Y)$ with continuous joint cumulative distribution function. Let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ be the order statistics (where the first subscript is the rank and the second the sample size) of the $X$ sample values; let $Y_{1:n} \le Y_{2:n} \le \cdots \le Y_{n:n}$ be the order statistics of the $Y$ sample values; and let $Y_{[i:n]}$ be the $Y$ sample value associated with the $X_{i:n}$ sample value in the sample pairs $(X_i, Y_i)$. $Y_{[i:n]}$ is called the concomitant of the $i$th order statistic (Balakrishnan and Rao 1998). Reversing the roles of $X$ and $Y$, we can also obtain the associated $X_{[i:n]}$.
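Extending the work of Schechtman and Yitzhaki (1987), Xu et al. (2010) show that the two Gini correlations with respect to $(X_i, Y_i)$ are
$$\rho_G^{XY}(X, Y) = \frac{\frac{1}{n(n-1)}\sum_{i=1}^{n}(2i-n-1)\,X_{[i:n]}}{\frac{1}{n(n-1)}\sum_{i=1}^{n}(2i-n-1)\,X_{i:n}}$$
and
$$\rho_G^{YX}(Y, X) = \frac{\frac{1}{n(n-1)}\sum_{i=1}^{n}(2i-n-1)\,Y_{[i:n]}}{\frac{1}{n(n-1)}\sum_{i=1}^{n}(2i-n-1)\,Y_{i:n}}$$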
In general $\rho_G^{XY}(X, Y)$ is not symmetric; that is, $\rho_G^{XY}(X, Y) \ne \rho_G^{YX}(Y, X)$. Such asymmetry violates the axioms of correlation measurement (Gibbons and Chakraborti 1992; Mari and Kotz 2001), which are assumed in SEM estimation. Xu et al. (2010) provide a symmetrical estimator (which we use here) obtained from their linear combination:
$$\rho_G(Y, X) = \frac{1}{2}\left[\rho_G^{YX}(Y, X) + \rho_G^{XY}(X, Y)\right]$$
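The two directional estimators and their symmetric average can be computed directly from the order statistics and concomitants. The following Python sketch assumes continuous data without ties (ties would make the concomitant ordering ambiguous); the function names are hypothetical, and the common factor $1/(n(n-1))$ is omitted because it cancels in each ratio.

```python
def _directional_gini(a, b):
    """sum (2i-n-1) a_[i:n] / sum (2i-n-1) a_{i:n}, where a_[i:n] is the a
    value paired with the i-th smallest b (its concomitant)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    weights = [2 * i - n - 1 for i in range(1, n + 1)]
    concomitants = [a_val for _, a_val in sorted(zip(b, a))]
    numerator = sum(w * v for w, v in zip(weights, concomitants))
    denominator = sum(w * v for w, v in zip(weights, sorted(a)))
    if denominator == 0:
        raise ValueError("a sample is constant; Gini correlation undefined")
    return numerator / denominator


def gini_correlation(x, y):
    """Symmetric estimator 0.5 * [rho_G^{YX}(Y, X) + rho_G^{XY}(X, Y)]."""
    rho_xy = _directional_gini(x, y)  # X values ordered by their paired Y
    rho_yx = _directional_gini(y, x)  # Y values ordered by their paired X
    return 0.5 * (rho_yx + rho_xy)
```

For perfectly monotone increasing data the estimator returns exactly 1, and negating one variable flips its sign.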
The Gini correlation $\rho_G$ possesses the following general properties (Schechtman and Yitzhaki 1987):
1) $\rho_G \in [-1, 1]$.
2) $\rho_G(Y, X) = \rho_G(X, Y) = \pm 1$ if $Y$ is a monotone increasing (decreasing) function of $X$.
3) $\rho_G(Y, X)$ is asymptotically unbiased, and the expectations of $\rho_G(Y, X)$ and $\rho_G(X, Y)$ are zero when $Y$ is independent of $X$.
4) $\rho_G(+Y, +X) = -\rho_G(-Y, +X) = -\rho_G(+Y, -X) = \rho_G(-Y, -X)$ for both $\rho_G(Y, X)$ and $\rho_G(X, Y)$.
5) $\rho_G(Y, X)$ is invariant under all strictly monotone transformations of $X$.
6) $\rho_G(Y, X)$ is scale and shift invariant with respect to both $X$ and $Y$.
7) $\sqrt{n}\,(\rho_G - \rho) \xrightarrow{D} N(0, \sigma_G^2)$; i.e., it converges in distribution to a normal distribution with mean zero and variance $\sigma_G^2$. (This is from Schechtman and Yitzhaki (1987), applying methods developed by Hoeffding (1948).)
8) The Spearman rho measure of correlation is a special case of $\rho_G(Y, X)$ (Xu et al. 2010).
Xu et al. (2010) showed that Gini correlations are asymptotically normal with the following mean and variance¹:
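$$\mu_G = \mu(\rho_G) = \rho - \frac{2(n-2)}{n(n-1)}\left[\arcsin\left(\frac{\rho}{2}\right) + \rho\sqrt{4-\rho^2} - \rho\sqrt{3}\right] + \frac{\pi}{3}\,\frac{\rho(n+1)}{n(n-1)} + \frac{2}{n(n-1)}\left(\arcsin(\rho) + \rho\sqrt{1-\rho^2}\right) + o(n^{-1})$$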
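$$\sigma_G^2 = \sigma^2(\rho_G) = \frac{1-\rho^2}{n(n-1)}\left[1-\rho^2-\rho\arcsin(\rho)+\frac{\pi(n+1)}{6}\right] + \frac{(n-2)(1-\rho^2)}{n(n-1)}\left[\frac{1-\rho^2}{\sqrt{4-\rho^2}} - \rho\arcsin\left(\frac{\rho}{2}\right)\right] + o(n^{-1})$$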
Xu et al. (2010) used Monte Carlo simulations to verify these asymptotic results (using asymptotic relative efficiency and root mean square error as performance metrics), showing that they are applicable to data of even relatively small sample sizes (down to around 30 sample points). Their simulations confirm and extend He and Nagaraja's (2009) Monte Carlo simulations exploring the behavior of nine distinct estimators of the bivariate normal correlation coefficient, including the estimator $\rho_G$, the sample correlation for the bivariate normal, and estimators based on order statistics. The estimator $\rho_G$ was generally found to reduce bias and improve efficiency as well as or better than the other correlation estimators in the study. Xu et al. (2010) also compared $\rho_G$ with three other closely related correlation coefficients: (1) the classical Pearson product moment correlation coefficient, (2) Spearman's rho, and (3) order statistics correlation coefficients. Gini correlation bridges the gap between the order statistics correlation coefficient and Spearman's rho, and its estimators are more mathematically tractable than Spearman's rho, whose variance involves complex elliptic integrals that cannot be expressed in elementary functions. Their efficiency analysis showed that the loss of efficiency of the estimator $\rho_G$ is between 4.5% and 11.3%, much less than that of Spearman's rho, which ranges from 8.8% to 30.5%.
¹ $o(n^{-1})$ convergence implies that the remaining terms $\varepsilon(n)$ go to zero faster than $n^{-1}$; that is, $n\,\varepsilon(n) \to 0$ as $n \to \infty$.

CALCULATION OF SAMPLE SIZE ON A SINGLE LINK
Construct a hypothesis test to just detect the minimum effect size $\delta$:
$$H_0: \rho - \rho_0 = 0$$
$$H_A: \rho - \rho_0 = \delta$$
The one-sample, two-sided formulation (see Figure 4) that reconciles the null and alternative hypothesis tests for the estimator $\rho_G = \rho_G(n)$ is
$$0 + z_{1-\alpha/2}\,\sigma_G(n) = \delta + z_{1-\beta}\,\sigma_G(n)$$
Xu et al. (2010) show that $|\rho_G - \rho| \to 0$ quickly as $n \to \infty$: for $n > 30$ from a bivariate normal population, they show that we can assume $|\rho_G - \rho| \approx 0$. Similarly, for $n > 30$ we can assume that $z$-values are adequate approximations for $t$-values in the formula. Even under the very weak assumptions of the rule of 10, a sample of $n = 30$ implies a model of at most three variables, significantly simpler than the majority of published models. Rearranging to place all terms with $n$ on the left-hand side:
$$\sigma_G^2(n) = \left(\frac{\delta}{z_{1-\alpha/2} - z_{1-\beta}}\right)^2 \equiv H$$
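In code, $H$ is a one-liner once the normal quantiles are available. The Python sketch below uses `statistics.NormalDist` from the standard library; the function name is hypothetical. One caution on signs: in the usual two-sided power calculation with both quantiles taken as positive upper-tail values ($z_{0.8} \approx 0.84$), the critical value is $z_{1-\alpha/2}\sigma_G$ under $H_0$ and $\delta - z_{1-\beta}\sigma_G$ under $H_A$, which gives $H = [\delta/(z_{1-\alpha/2} + z_{1-\beta})]^2$. The difference form written above agrees with this only if $z_{1-\beta}$ is read as the lower-tail quantile $\Phi^{-1}(\beta) < 0$; this sketch assumes that reading and adds positive quantiles. The per-link significance `alpha` passed in would be the Šidák-corrected $\alpha(k)$.

```python
from statistics import NormalDist


def h_target(delta: float, alpha: float, beta: float = 0.20) -> float:
    """Variance H that sigma_G^2(n) must reach to detect effect delta.

    Uses positive upper-tail quantiles, so the denominator is
    z_{1-alpha/2} + z_{1-beta} (equivalently z_{1-alpha/2} - Phi^{-1}(beta)).
    """
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0 and delta > 0.0):
        raise ValueError("need 0 < alpha, beta < 1 and delta > 0")
    std = NormalDist()
    z_alpha = std.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = .05
    z_power = std.inv_cdf(1.0 - beta)         # about 0.84 for beta = .20
    return (delta / (z_alpha + z_power)) ** 2


if __name__ == "__main__":
    print(h_target(0.3, alpha=0.05))
```

A stricter per-link significance level raises $z_{1-\alpha/2}$ and therefore lowers $H$, which in turn demands a larger sample.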
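Thus, to within little $o(n^{-1})$ and using the formula for $\sigma_G^2$:
$$H = \sigma_G^2(n, \rho) = \frac{1-\rho^2}{n(n-1)}\left[1-\rho^2-\rho\arcsin(\rho)+\frac{\pi(n+1)}{6}\right] + \frac{(n-2)(1-\rho^2)}{n(n-1)}\left[\frac{1-\rho^2}{\sqrt{4-\rho^2}} - \rho\arcsin\left(\frac{\rho}{2}\right)\right]$$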
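We want to restate this as some function that calculates sample size $n = g(H, \rho)$. Solve for $n$ by simplifying in terms of:
$$A = 1-\rho^2$$
$$B = \rho\arcsin\left(\frac{\rho}{2}\right)$$
$$C = \rho\arcsin(\rho)$$
$$D = \frac{A}{\sqrt{3+A}} = \frac{1-\rho^2}{\sqrt{4-\rho^2}}$$
$$H = \left(\frac{\delta}{z_{1-\alpha/2} - z_{1-\beta}}\right)^2$$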
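Then $n = \dfrac{-E \pm \sqrt{E^2 - 4F}}{2}$ are the solutions of the quadratic equation that restates $H - \sigma_G^2(n, \rho) = 0$:
$$n^2 - \frac{A\left(\frac{\pi}{6} - B + D\right) + H}{H}\,n - \frac{A\left(\frac{\pi}{6} + A + 2B - C - 2D\right)}{H} = n^2 + En + F = 0$$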
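Or, in terms of $A$, $B$, $C$, $D$ and $H$, and taking the largest root:
$$n = \frac{1}{2H}\left[A\left(\frac{\pi}{6} - B + D\right) + H + \sqrt{\left(A\left(\frac{\pi}{6} - B + D\right) + H\right)^2 + 4AH\left(\frac{\pi}{6} + A + 2B - C - 2D\right)}\right]$$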
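The closed-form root can be checked numerically. The Python sketch below (function names are hypothetical) transcribes $\sigma_G^2(n, \rho)$ as written above, assuming square roots in the $\sqrt{4-\rho^2}$ terms, evaluates $A$ through $D$, returns the largest root, and lets one verify that substituting it back reproduces $H$. Calling it with $\rho = \delta$ (the minimum effect) is an assumption of this sketch.

```python
import math


def gini_variance(n: float, rho: float) -> float:
    """sigma_G^2(n, rho) with the o(1/n) remainder dropped."""
    a = 1.0 - rho ** 2
    nn = n * (n - 1.0)
    first = a / nn * (a - rho * math.asin(rho) + math.pi * (n + 1.0) / 6.0)
    second = (n - 2.0) * a / nn * (a / math.sqrt(4.0 - rho ** 2)
                                   - rho * math.asin(rho / 2.0))
    return first + second


def min_sample_size(h: float, rho: float) -> float:
    """Largest root n of H - sigma_G^2(n, rho) = 0, via A, B, C, D."""
    if h <= 0.0 or not -1.0 < rho < 1.0:
        raise ValueError("need h > 0 and -1 < rho < 1")
    a = 1.0 - rho ** 2                 # A
    b = rho * math.asin(rho / 2.0)     # B
    c = rho * math.asin(rho)           # C
    d = a / math.sqrt(3.0 + a)         # D = (1 - rho^2) / sqrt(4 - rho^2)
    lin = a * (math.pi / 6.0 - b + d) + h
    const = math.pi / 6.0 + a + 2.0 * b - c - 2.0 * d
    return (lin + math.sqrt(lin ** 2 + 4.0 * a * h * const)) / (2.0 * h)


if __name__ == "__main__":
    h, rho = 0.005, 0.3                # illustrative values only
    n = min_sample_size(h, rho)
    print(math.ceil(n), gini_variance(n, rho))
```

With the illustrative values $H = 0.005$ and $\rho = 0.3$ the root is roughly 172, and rounding up gives the integer sample size; smaller $H$ (a smaller effect or stricter significance) pushes the root up.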
5. META-STUDY AND DISCUSSION
This research constructed two necessary conditions for sample adequacy:
1. Section 3 determined the sample size needed to compensate for the ratio of the number of indicator variables to latent variables (summarized from Monte Carlo simulations that have appeared in the literature); and
2. Section 4 determined the sample size required to assure the existence or non-existence of a minimum effect (correlation) on each possible pair of latent variables in the SEM (determined analytically).
Of course, neither of these conditions is sufficient to assure sample adequacy for a particular choice of $(\alpha, \beta)$, because there are so many other factors that can affect estimation and sample size: multicollinearity, appropriateness of data sets, and so forth. Additionally, the information contained in the sample and indicator variables must be adequate to compensate for variations in particular SEM estimation methodologies. For example, partial least squares (PLS) approaches generate parameter estimates that lack consistency. Dhrymes (1970); Schneeweiß (1990, 1991, 1993); Thomas et al. (2005); and Fhr (1989) all demonstrate that the IV/2SLS techniques converge to the same estimators, but are more robust. Jöreskog (1967, 1970; Jöreskog and Sörbom 1996) suggests that departures from normal distribution for the indicators will demand larger samples, and that non-normal indicators require samples one, two or three orders of magnitude larger, depending on the distribution.
From a practical viewpoint, sample size questions can take three forms:
1. A priori: asks what sample size will be sufficient, given the researcher's prior beliefs about the minimum effect that the tests will need to detect.
2. Ex posteriori: asks what sample size should have been taken in order to detect the minimum effect that the researcher actually detected in an existing (either sufficient or insufficient) test. If the ex posteriori measured effect is smaller than the researcher's prior beliefs about the minimum effect (in 1.), then the sample size needs to be increased commensurately.
3. Sequential test optimal stopping: couched in a sequential-test, optimal-stopping context, where the sample size is incremented until it is considered sufficient to stop testing.
In this section, we report on an ex posteriori meta-study that applies the algorithms developed in this paper to a specific body of SEM research studies published in five core journals in MIS and e-Commerce (ISR, MISQ, Management Science, Decision Sciences and JMIS) between 1989 (the date of the seminal study by Davis et al. 1989) and 2007. We assumed that the link with the smallest effect actually observed in these studies determines $\delta$. This is a conservative assumption, because the researchers would very likely have wanted to detect even smaller effects than those actually observed, but the model and data would only have had sufficient resolution to capture the minimum effect observed.
Additionally, many of the studies listed in Appendix A analyzed Likert scale data that are not normally distributed; nevertheless, the assumption of normality is a common one in SEM studies, even where the data are clearly not normal, for example where survey data return discrete Likert scale values censored at 0 and derived from a mass function that is likely to be skewed. Because estimator behavior is best understood for normal data, we can assume that in these non-normal data studies our lower bound on sample size needs a non-normality risk premium for sample adequacy; results on departures from a normal weight matrix in LISREL suggest that the required sample may be two to three orders of magnitude larger than the sample size required for normal data.
Sample sizes actually used in drawing conclusions in the studies were compared with our computed lower bound, and the difference was taken as a percentage (the far right-hand column of Appendix A). Histograms of sample adequacy, $(\text{Actual SS} - \text{Required SS})/\text{Actual SS}$, show a significant systematic bias towards too small a sample size in the papers surveyed. In the meta-study, the average sample was 770% too small; with the removal of three outliers, this dropped to 400% too small (Figures 6 and 7). Actual sample sizes in these 74 research articles were on average only 50% of the minimum needed to draw the conclusions the studies claimed; the median sample size was 38% of the minimum required, reflecting a substantial negative skew in the under-sampling, and the standard deviation was 29%. Overall, 80% of the research articles in this meta-study drew conclusions from samples that were smaller than the lower bounds on sample size computed here. Because each additional observation increases the cost of the study in time, effort and monetary terms, an inclination to economize on data collection is understandable. The conclusion that seems most appropriate from our meta-study is that MIS researchers have been given inadequate guidance, and have not been well served by existing sample size heuristics. Lacking the sample size information they need, researchers may be inclined to skimp on sample collection. When samples are too large, studies are more costly than they need to be to draw particular conclusions; when samples are too small, the credibility of their conclusions is weakened.
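The two summary measures used in this section, the adequacy ratio $(\text{Actual SS} - \text{Required SS})/\text{Actual SS}$ and the actual sample size as a share of the required minimum, can be sketched in Python as follows. Function names are hypothetical, and the study tuples are placeholders, not data from Appendix A.

```python
from statistics import mean, median


def adequacy_pct(actual: int, required: int) -> float:
    """(Actual SS - Required SS) / Actual SS, in percent; negative means undersized."""
    return 100.0 * (actual - required) / actual


def share_of_required_pct(actual: int, required: int) -> float:
    """Actual sample size as a percentage of the required lower bound."""
    return 100.0 * actual / required


def summarize(studies):
    """Mean/median adequacy and share, plus the fraction of undersized studies."""
    adequacy = [adequacy_pct(a, r) for a, r in studies]
    shares = [share_of_required_pct(a, r) for a, r in studies]
    undersized = sum(1 for a, r in studies if a < r) / len(studies)
    return {
        "mean_adequacy": mean(adequacy),
        "median_adequacy": median(adequacy),
        "mean_share": mean(shares),
        "median_share": median(shares),
        "fraction_undersized": undersized,
    }


if __name__ == "__main__":
    hypothetical = [(100, 200), (150, 150), (50, 400)]  # (actual, required)
    print(summarize(hypothetical))
```

Because the adequacy ratio divides by the actual rather than the required sample size, a single badly undersized study (here 50 against 400, or $-700\%$) dominates its mean, which is how that figure can reach hundreds of percent while the share-of-required measure stays between 0 and 100% for undersized studies.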
INSERT FIGURE 6: PERCENT ERROR IN SAMPLE SIZE FOR 74 STUDIES IN THE ENTIRE META-STUDY (MEAN = 770, STANDARD DEVIATION = 25, SKEWNESS = 6.5, KURTOSIS = 47)
INSERT FIGURE 7: PERCENT ERROR IN SAMPLE SIZE FOR 74 STUDIES IN THE META-STUDY REMOVING OUTLIERS < −25% (MEAN = 400, STANDARD DEVIATION = 642, SKEWNESS = 2.5, KURTOSIS = 7.6)

We should not be surprised, given our review of the prior literature, that existing sample size heuristics are misleading researchers in this area. Numerous studies have concluded that linear heuristics like the rule of 10 are poor guides to the fit and explanatory power of the model or to the adequacy of the sample size (Browne and Cudeck 1989, 1993; Geweke and Singleton 1980; Gerbing and Anderson 1985; Velicer and Fava 1987, 1989, 1994; Marsh and Bailey 1991; Boomsma 1982; Ding et al. 1995).
As noted earlier, neither of the conditions developed here is sufficient to assure sample adequacy for a particular choice of $(\alpha, \beta)$, because there are so many factors that can affect estimation and sample size in something as complex as a structural equation model. Consequently, the necessary sample size for accurate estimation will in most cases exceed the lower bound computed here. But the review of actual sample sizes summarized in Figures 6 and 7 suggests that, at its most unambitious, this lower bound will guard against the very erratic under-sizing of samples that seems common in SEM analysis.
11
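The lower bound referred to above combines two conditions. Every row of Appendix A is consistent with reading the "Sample bound (section 3)" column as n = 50r² - 450r + 1100, where r = p/k is the ratio of indicator variables to latent variables, rounded to the nearest integer, and the "Sample size lower bound" column as the larger of the section 3 and section 4 bounds. The sketch below encodes that reading of the table; the formula is inferred from the table values rather than copied from section 3, the section 4 (minimum effect, power and significance) bound is taken as an input rather than recomputed, and the function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def ratio_bound(p: int, k: int) -> int:
    """Bound from the indicator-to-latent ratio r = p/k (the table's section 3 column).

    Evaluates 50 r^2 - 450 r + 1100 with exact rational arithmetic and rounds
    half up, so values such as 137.5 are not disturbed by floating-point error.
    """
    if p <= 0 or k <= 0:
        raise ValueError("p and k must be positive")
    r = Fraction(p, k)
    return floor(50 * r * r - 450 * r + 1100 + Fraction(1, 2))


def sample_size_lower_bound(p: int, k: int, effect_bound: int) -> int:
    """Larger of the ratio bound and a separately computed effect-size (section 4) bound."""
    return max(effect_bound, ratio_bound(p, k))


if __name__ == "__main__":
    # (k, p, section 4 bound) taken from three Appendix A rows
    for k, p, effect_bound in [(6, 35, 1177), (3, 24, 238), (4, 46, 275)]:
        print(k, p, ratio_bound(p, k), sample_size_lower_bound(p, k, effect_bound))
```

Because the quadratic reaches its minimum of 87.5 at r = 4.5, the ratio bound is smallest for models with roughly four to five indicators per latent variable and grows as the ratio moves away from that point in either direction; a fixed multiple of the indicator count cannot express this nonlinearity.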
Future research on sample size choice should be conducted along lines specific to the various algorithms used to estimate SEM: PLS's principal components analysis algorithms; LISREL's and AMOS's gradient search algorithms; and systems-of-equations regression algorithms. Indeed, seminal research in each of these areas alluded to this decades ago. Wold (1980, 1981) went even further in advising that PLS is more suitable for exploratory model specification searches than for hypothesis testing, and introduced the concept of plausible causality for that very reason. Thus in PLS the sample size question is probably both less relevant and less critical, because hypothesis testing is better left to LISREL and systems-of-equations approaches.
The problem in building the structural model completely on theory, without reference to the data, is that the latent constructs chosen by the researcher may be substantially different from those that would drop out of an exploratory factor analysis. Researchers have developed a test for this, Harman's one-factor test (Podsakoff and Organ 1986), which is commonly used to check for common factor bias in SEM (and is often conducted a posteriori). Common factor bias appears because inherent clustering results from the particular distance measure used to position data points in n-dimensional space; for example, principal components analysis designs a distance measure to minimize the variance not explained by the main components (clusters). But SEM imposes prior beliefs on the data, in the form of the structure of latent variables. Thus the data are assumed to cluster around the latent constructs, and the factor loadings determine how this clustering occurs. SEM models are often constructed without reference to clustering in the underlying data given a particular distance measure; the model is entirely theory-driven, though this is not in itself a bad thing. Common factor bias reflects this divergence between the model and the data and, if it is too extreme, may indicate that the data are incomplete or that the model is misspecified.
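Harman's one-factor test is often run as an unrotated factor or principal components analysis of all indicators, checking whether a single factor accounts for most of the variance. A minimal sketch, assuming NumPy, a principal components approximation rather than a full factor analysis, and the conventional (not universal) rule of thumb that a first-factor share above 50% signals a problem; the function names and the threshold are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def first_factor_share(data: np.ndarray) -> float:
    """Share of total standardized variance captured by the first principal component.

    `data` is an (observations x indicators) array. The trace of the indicator
    correlation matrix equals the number of indicators p, so the share is lambda_1 / p.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)  # ascending order for a symmetric matrix
    return float(eigenvalues[-1] / eigenvalues.sum())


def harman_flag(data: np.ndarray, threshold: float = 0.5) -> bool:
    """True when one component dominates; the 0.5 threshold is an assumed convention."""
    return first_factor_share(data) > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    common = rng.normal(size=(400, 1))
    dominated = common + 0.3 * rng.normal(size=(400, 6))  # one source drives all six indicators
    independent = rng.normal(size=(400, 6))
    print(round(first_factor_share(dominated), 2), harman_flag(dominated))
    print(round(first_factor_share(independent), 2), harman_flag(independent))
```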
Common factor bias can be avoided a priori through a pretest of the clustering of the indicator data. Common factor bias occurs because procedures that should be a standard part of model specification are, in practice, left until after data collection and confirmatory analysis. Jöreskog developed PRELIS for these sorts of pretests and model respecifications. If this clustering shows that the indicators are providing information on fewer variables than the researcher's latent SEM contains, this is an indication that more indicators need to be collected that will provide (1) additional information about the latent constructs that do not show up in the cluster analysis, and (2) additional information to split one exploratory factor into the two or more latent constructs the researcher needs to complete the hypothesized model. In exploratory factor analysis, the two tests most useful for this are the Kaiser (1960) criterion, which retains factors with eigenvalues greater than one (unless a factor extracts at least as much information as the equivalent of one original variable, we drop it), and the scree test proposed by Cattell (1966), which compares the difference between two successive eigenvalues and stops taking factors when this difference drops below a certain level. In either case, the suggested factors are not necessarily the latent factors that the researcher's theory would suggest; rather, they represent the information actually provided in the data, and this information is the main justification for the cost of data collection. So in practice, either test sets a maximum number of latent factors in the SEM if that SEM is to be explored with one's own particular data set.
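The two retention rules described above can be sketched as follows, assuming NumPy and the eigenvalues of the indicator correlation matrix. The scree rule is encoded as one possible reading of the description (the flat "scree" begins after the last drop between successive eigenvalues that is at least `min_drop`), and `min_drop` is a hypothetical, analyst-chosen level rather than a value given in the paper.

```python
import numpy as np


def eigenvalues_desc(data: np.ndarray) -> np.ndarray:
    """Eigenvalues of the indicator correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]


def kaiser_count(eigenvalues) -> int:
    """Kaiser (1960) criterion: retain factors whose eigenvalue exceeds 1."""
    return int(np.sum(np.asarray(eigenvalues) > 1.0))


def scree_count(eigenvalues, min_drop: float) -> int:
    """Scree-style rule (one reading): retain the factors before the last successive
    drop of at least `min_drop`; with no such drop, retain a single factor."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    drops = ev[:-1] - ev[1:]
    large = np.flatnonzero(drops >= min_drop)
    return int(large[-1] + 1) if large.size else 1


if __name__ == "__main__":
    ev = [2.6, 1.4, 0.4, 0.35, 0.25]  # hypothetical eigenvalues for five indicators
    print(kaiser_count(ev), scree_count(ev, min_drop=0.5))
```

Either count can then be compared with the number of latent variables in the hypothesized SEM, as the paragraph above suggests.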
When SEMs are built around valid real-world constructs (even if these are unobservable), the algorithms proposed in this paper impose only weak additional assumptions on the indicators and latent variables in order to compute sample sizes adequate for estimation. Our limited application to a window of IS and e-commerce publications has shown that concerns about existing SEM sample size calculations are warranted, and that we need to remain suspicious of conclusions reached in studies based on inadequate sample sizes. Furthermore, a large number of studies in our sample devised their tests without first committing to the minimum effect size they were trying to detect, or without indicating the proportion of nonresponse in their surveys. It is clear that journal referees need to begin asking for survey response rates, the minimum effect size δ, and a justification of the sample size. By incorporating these suggestions, it is argued, the research community will enhance the credibility and applicability of its research, with a commensurate improvement in its impact and influence in both industry and academe.
Note: I want to thank the reviewers and editors who persevered through several revisions of this paper and helped nurture it to completion. Any remaining errors are fully my own responsibility.
Note on software: A software package that computes the lower bounds developed in this paper may be downloaded from Elsevier's ECRA site. This software is written in Windows C# Forms to run on Windows platforms; together with a number of packages in the R language, it was used to calculate the results in this paper.
APPENDIX A: SAMPLE ADEQUACY IN A SET OF E-COMMERCE AND MIS SEM STUDIES
INSERT APPENDIX A
REFERENCES
Anderson, T. Origins of the limited information maximum likelihood and two-stage least squares estimators. Journal of Econometrics 127, 2005, 1-16.
Anderson, T., and Rubin, H. Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 20, 1949, 46-63.
Anderson, T., and Rubin, H. The asymptotic properties of estimates of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 21, 1950, 570-582.
Balakrishnan, N., and Rao, C. R. Order Statistics: Applications. Handbook of Statistics, Vol. 17. New York: Elsevier, 1998.
Barclay, D. W., Higgins, C., and Thompson, R. The partial least squares (PLS) approach to causal modeling: Personal computer adoption and use as an illustration. Technology Studies 2(2), 1995, 285-309.
Basmann, R. A generalized classical method of linear estimation of coefficients in a structural equation. Econometrica 25, 1957, 77-83.
Bentler, P. M. EQS Structural Equations Program Manual, Program Version 3.0. Los Angeles: BMDP Statistical Software, Inc., 1989, p. 6.
Bollen, K. A. Structural Equations with Latent Variables. New York: Wiley, 1989, p. 268.
Boomsma, A. Robustness of LISREL against small sample sizes in factor analysis models. In K. G. Jöreskog and H. Wold (Eds.), Systems under Indirect Observation: Causality, Structure, Prediction (Part 1). Amsterdam: North-Holland, 1982, pp. 149-173.
Brand, A., Bradley, M. T., Best, L. A., and Stoica, G. Accuracy of effect size estimates from published psychological research. Perceptual and Motor Skills 106(2), 2008, 645-649.
Browne, M. W., and Cudeck, R. Alternative ways of assessing model fit. In K. A. Bollen and J. S. Long (Eds.), Testing Structural Equation Models. Newbury Park, CA: Sage, 1993, pp. 136-162.
Browne, M. W., and Cudeck, R. Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research 24, 1989, 445-455.
Cattell, R. B. Handbook of Multivariate Experimental Psychology. Chicago: Rand McNally, 1966.
Chin, W. W. The partial least squares approach to structural equation modeling. In G. A. Marcoulides (Ed.), Modern Methods for Business Research. Mahwah, New Jersey: Lawrence Erlbaum Associates, 1998, pp. 295-336.
Chin, W. W., and Newsted, P. R. Structural equation modeling analysis with small samples using partial least squares. In R. Hoyle (Ed.), Statistical Strategies for Small Sample Research. Sage Publications, 1999, pp. 307-341.
Cochran, W. G. Sampling Techniques, 3rd ed. New York: Wiley, 1977.
Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Lawrence Erlbaum Associates, 1988.
Cohen, J. A power primer. Psychological Bulletin 112, 1992, 155-159.
Davis, F. D., Bagozzi, R. P., and Warshaw, P. R. User acceptance of computer technology: A comparison of two theoretical models. Management Science 35(8), 1989, 982-1003.
Dhrymes, P. J., Berner, R., and Cummins, D. A comparison of some limited information estimators for dynamic simultaneous equations models with autocorrelated errors. Econometrica 42(2), 1974, 311-332.
Dhrymes, P. J. Distributed Lags: Problems of Estimation and Formulation. San Francisco: Holden-Day, 1971.
Dhrymes, P. J. Econometrics: Statistical Foundations and Applications. New York, Evanston and London: Harper & Row, 1970, p. 53.
Ding, L., Velicer, W. F., and Harlow, L. L. The effects of estimation methods, number of indicators per factor and improper solutions on structural equation modeling fit indices. Structural Equation Modeling 2, 1995, 119-144.
Farebrother, R. Fitting Linear Relationships: A History of the Calculus of Observations 1750-1900. New York: Springer, 1999.
Fhr, K. Comparison of LISREL and PLS Estimation Methods in Latent Variable Models. Introducing Latent Variables into Econometric Models, Manuscript SFB 303, University of Bonn, Bonn, 1989.
Fisher, R. A. On the 'probable error' of a coefficient of correlation deduced from a small sample. Metron 1, 1921, 3-32.
Fisher, R. A. Statistical Methods, Experimental Design, and Scientific Inference. New York: Oxford University Press, 1990.
Fornell, C., and Larcker, D. F. Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research 18, 1981, 39-50.
Freedman, D. As others see us: A case study in path analysis (with discussion). Journal of Educational Statistics 12, 1987, 101-223.
Gerbing, D. W., and Anderson, J. C. The effects of sampling error and model characteristics on parameter estimation for maximum likelihood confirmatory factor analysis. Multivariate Behavioral Research 20, 1985, 255-271.
Gibbons, J. D., and Chakraborti, S. Nonparametric Statistical Inference, 3rd ed. New York: Marcel Dekker, 1992.
Goodhue, D., Lewis, W., and Thompson, R. Statistical power in analyzing interaction effects: Questioning the advantage of PLS with product indicators (Research Note). Information Systems Research 18(2), 2007, 211-227.
Goodhue, D., Lewis, W., and Thompson, R. PLS, small sample size, and statistical power in MIS research. Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06), Vol. 8, 2006, p. 202b.
Haavelmo, T. The statistical implications of a system of simultaneous equations. Econometrica 11, 1943, 1-12.
He, Q., and Nagaraja, H. N. Correlation estimation using concomitants of order statistics from bivariate normal samples. Communications in Statistics: Theory and Methods 38(12), 2009, 2003-2015.
Hoeffding, W. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics 19, 1948, 293-325.
Jöreskog, K. G. Some contributions to maximum likelihood factor analysis. Psychometrika 32(4), 1967, 443-482.
Jöreskog, K. G., and Sörbom, D. LISREL 8 User's Reference Guide. Chicago: Scientific Software International, 1996.
Jöreskog, K. G. A general method for analysis of covariance structures. Biometrika 57, 1970, 239-251.
Kahai, S. S., and Cooper, R. B. Exploring the core concepts of media richness theory: The impact of cue multiplicity and feedback immediacy on decision quality. Journal of Management Information Systems 20(1), 2003, 263-299.
Kendall, M., and Gibbons, J. D. Rank Correlation Methods, 5th ed. New York: Oxford University Press, 1990.
Kish, L. Survey Sampling. New York: Wiley, 1995.
Koopmans, T., and Hood, W. The estimation of simultaneous linear economic relationships. In W. Hood and T. Koopmans (Eds.), Studies in Econometric Method. Cowles Foundation Monograph 14. New Haven: Yale University Press, 1953.
Lohr, S. L. Sampling: Design and Analysis. Duxbury, 1999.
Mari, D. D., and Kotz, S. Correlation and Dependence. London, U.K.: Imperial College Press, 2001.
Marsh, H. W., Balla, J. R., and McDonald, R. P. Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin 103, 1988, 391-410.
Marsh, H. W., Balla, J. R., and Hau, K. T. An evaluation of incremental fit indices: A clarification of mathematical and empirical properties. In G. A. Marcoulides and R. E. Schumacker (Eds.), Advanced Structural Equation Modeling: Issues and Techniques. Mahwah, NJ: Lawrence Erlbaum Associates, 1996, pp. 315-353.
Marsh, H. W., Hau, K. T., Balla, J. R., and Grayson, D. Is more ever too much? The number of indicators per factor in confirmatory factor analysis. Multivariate Behavioral Research 33, 1998, 181-220.
Marsh, H. W., and Bailey, M. Confirmatory factor analyses of multitrait-multimethod data: A comparison of alternative models. Applied Psychological Measurement 15(1), 1991, 47-70.
Nakagawa, S., and Cuthill, I. C. Effect size, confidence interval and statistical significance: A practical guide for biologists. Biological Reviews of the Cambridge Philosophical Society 82, 2007, 591-605.
Nunnally, J. C. Psychometric Theory. New York: McGraw-Hill, 1967, p. 355.
Podsakoff, P. M., and Organ, D. W. Self-reports in organizational research: Problems and prospects. Journal of Management 12, 1986, 531-544.
Sargan, J. The estimation of economic relationships using instrumental variables. Econometrica 26, 1958, 393-415.
Schechtman, E., and Yitzhaki, S. A measure of association based on Gini's mean difference. Communications in Statistics: Theory and Methods 16, 1987, 207-231.
Schneeweiß, H. Models with latent variables: LISREL versus PLS. Contemporary Mathematics 112, 1990, 33-40.
Schneeweiß, H. Models with latent variables: LISREL versus PLS. Statistica Neerlandica 45, 1991, 145-157.
Schneeweiß, H. Consistency at large in models with latent variables. In K. Hagen, D. J. Bartholomew, and M. Deistler (Eds.), Statistical Modelling and Latent Variables. Elsevier, 1993, pp. 299-320.
Shevlyakov, G. L., and Vilchevski, N. O. Robustness in Data Analysis: Criteria and Methods. Modern Probability and Statistics. Utrecht, The Netherlands: VSP, 2002.
Snedecor, G. W., and Cochran, W. G. Statistical Methods, 8th ed. Ames: Iowa State University Press, 1989.
Tanaka, J. S. "How big is big enough?": Sample size and goodness of fit in structural equation models with latent variables. Child Development 58, 1987, 134-146.
Tanaka, J. S. Multifaceted conceptions of fit in structural equation models. In K. A. Bollen and J. S. Long (Eds.), Testing Structural Equation Models. Newbury Park, CA: Sage, 1993, pp. 10-39.
Theil, H. Repeated Least Squares Applied to Complete Equation Systems. The Hague: Central Planning Bureau, 1953a.
Theil, H. Estimation and Simultaneous Correlation in Complete Equation Systems. The Hague: Central Planning Bureau, 1953b.
Theil, H. Economic Forecasts and Policy, 2nd ed. Amsterdam: North-Holland, 1961.
Thomas, D. R., Lu, I. R. R., and Cedzynski, M. Partial least squares: A critical review and a potential alternative. Proceedings of the Annual Conference of the Administrative Sciences Association of Canada, Management Science Division, Toronto, 2005.
Velicer, W. F., and Fava, J. L. Effects of variable and subject sampling on factor pattern recovery. Psychological Methods 3, 1998, 231-251.
Westland, J. C., and See-To, W. K. The short-run price-performance dynamics of microcomputer technologies. Research Policy 36(5), 2007, 591-604.
Wilkinson, L., and APA Task Force on Statistical Inference. Statistical methods in psychology journals: Guidelines and explanations. American Psychologist 54, 1999, 594-604.
Wold, H. The fix-point approach to interdependent systems: Review and current outlook. In H. Wold (Ed.), The Fix-Point Approach to Interdependent Systems. Amsterdam: North-Holland, 1981, pp. 1-35.
Wold, H. Response to D. A. Freedman. Journal of Educational Statistics 12(2), 1987, 202-205.
Wright, S. Correlation and causation. Journal of Agricultural Research 20, 1921, 557-585.
Xu, W., Hung, Y. S., Niranjan, M., and Shen, M. Asymptotic mean and variance of Gini correlation for bivariate normal samples. IEEE Transactions on Signal Processing 58(2), 2010.
APPENDIX A: SAMPLE ADEQUACY IN A SET OF E-COMMERCE AND MIS SEM STUDIES
Studies | Latent variables | Indicator variables | Sample points | Minimum effect observed | Šidák-corrected α | Sample bound (section 4) | Sample bound (section 3) | Sample size lower bound
Choudhury and Karahanna, MISQ, V.32(1) | 6 | 35 | 499 | 0.12 | 0.0034 | 1177 | 176 | 1177
Kanawattanachai and Yoo, MISQ, V.31(4) | 3 | 11 | 146 | 0.21 | 0.0170 | 264 | 122 | 264
Damien, et al., MISQ, V.31(3) | 6 | 43 | 31 | 0.08 | 0.0034 | 2694 | 443 | 2694
Webster and Ahuja, MISQ, V.30(3) | 6 | 38 | 207 | 0.17 | 0.0034 | 568 | 256 | 568
Tanriverdi, MISQ, V.30(1) | 10 | 16 | 356 | 0.11 | 0.0011 | 1661 | 508 | 1661
Awad and Krishnan, MISQ, V.30(1) | 8 | 24 | 532 | 0.28 | 0.0018 | 206 | 200 | 206
Van der Heijden and Hans, MISQ, V.28(4) | 4 | 15 | 1144 | 0.15 | 0.0085 | 628 | 116 | 628
Barua, et al., MISQ, V.28(4) | 12 | 45 | 1125 | 0.109 | 0.0008 | 1783 | 116 | 1783
Straub, et al., MISQ, V.27(1) | 8 | 34 | 213 | 0.1 | 0.0018 | 1886 | 91 | 1886
Susarla, et al., MISQ, V.27(1) | 8 | 32 | 256 | 0.067 | 0.0018 | 4253 | 100 | 4253
Tarafdar, et al., JMIS, V.24(1) | 10 | 21 | 256 | 0.08 | 0.0011 | 3181 | 376 | 3181
Fuller, et al., JMIS, V.23(3) | 5 | 22 | 318 | 0.3 | 0.0051 | 148 | 88 | 148
Kearns, et al., JMIS, V.23(3) | 9 | 44 | 269 | 0.11 | 0.0014 | 1610 | 95 | 1610
Wang, et al., JMIS, V.23(2) | 5 | 22 | 149 | 0.12 | 0.0051 | 1098 | 88 | 1098
Lopes, et al., JMIS, V.23(2) | 4 | 13 | 392 | 0.24 | 0.0085 | 226 | 166 | 226
Hess, et al., JMIS, V.22(3) | 8 | 36 | 233 | 0.28 | 0.0018 | 206 | 88 | 206
Johnson, et al., JMIS, V.22(2) | 7 | 16 | 202 | 0.19 | 0.0024 | 472 | 333 | 472
Bhatt, et al., JMIS, V.22(2) | 5 | 26 | 202 | 0.22 | 0.0051 | 303 | 112 | 303
Chang, et al., JMIS, V.22(1) | 3 | 12 | 476 | 0.65 | 0.0170 | 9 | 100 | 100
Wallace, et al., Dec. Sci., V.35(2) | 6 | 27 | 507 | 0.13 | 0.0034 | 997 | 88 | 997
Rabinovich, et al., Dec. Sci., V.34(1) | 2 | 14 | 840 | 0.126 | 0.0500 | 587 | 400 | 587
Abdinnour-Helm, et al., Dec. Sci., V.36(2) | 5 | 12 | 176 | 0.1 | 0.0051 | 1596 | 308 | 1596
Escrig-Tena, et al., Dec. Sci., V.36(2) | 9 | 37 | 231 | 0.1 | 0.0014 | 1957 | 95 | 1957
Pullman and Gross, Dec. Sci., V.35(3) | 3 | 15 | 400 | 0.52 | 0.0170 | 24 | 100 | 100
Tu, et al., Dec. Sci., V.35(2) | 3 | 24 | 303 | 0.22 | 0.0170 | 238 | 700 | 700
Droge, et al., Dec. Sci., V.34(3) | 4 | 17 | 437 | 0.16 | 0.0085 | 548 | 91 | 548
Janz and Prasarnphanich, Dec. Sci., V.34(2) | 5 | 15 | 231 | 0.41 | 0.0051 | 65 | 200 | 200
Hong and Tam, ISR, V.17(2) | 5 | 27 | 1328 | 0.1 | 0.0051 | 1596 | 128 | 1596
Dinev and Hart, ISR, V.17(1) | 5 | 18 | 369 | 0.15 | 0.0051 | 690 | 128 | 690
Jarvenpaa, et al., ISR, V.15(3) | 6 | 30 | 136 | 0.28 | 0.0034 | 187 | 100 | 187
Pavlou and Gefen, ISR, V.15(1) | 10 | 33 | 274 | 0.1 | 0.0011 | 2020 | 160 | 2020
Bassellier, et al., ISR, V.14(4) | 7 | 38 | 404 | 0.2 | 0.0024 | 422 | 131 | 422
Choo, et al., MS, V.53(3) | 5 | 14 | 951 | 0.38 | 0.0051 | 81 | 232 | 232
Sabherwal, et al., MS, V.52(12) | 10 | 48 | 121 | 0.11 | 0.0011 | 1661 | 92 | 1661
Bagozzi and Dholakia, MS, V.52(7) | 14 | 24 | 402 | 0.17 | 0.0006 | 736 | 476 | 736
de Jong, et al., MS, V.51(11) | 5 | 29 | 60 | 0.037 | 0.0051 | 11884 | 172 | 11884
Kim and Malhotra, MS, V.51(5) | 4 | 8 | 189 | 0.14 | 0.0085 | 725 | 400 | 725
Vickery, et al., MS, V.50(8) | 4 | 14 | 113 | 0.212 | 0.0085 | 299 | 138 | 299
Balasubramanian, et al., MS, V.49(7) | 5 | 22 | 428 | 0.12 | 0.0051 | 1098 | 88 | 1098
Au, et al., MISQ, V.32(1) | 6 | 20 | 922 | 0.17 | 0.0034 | 568 | 156 | 568
Hsieh, et al., MISQ, V.32(1) | 12 | 31 | 451 | 0.21 | 0.0008 | 447 | 271 | 447
Nadkarni and Gupta, MISQ, V.31(3) | 6 | 30 | 452 | 0.1 | 0.0034 | 1711 | 100 | 1711
Liang, et al., MISQ, V.31(1) | 7 | 21 | 77 | 0.277 | 0.0024 | 202 | 200 | 202
Komiak and Benbasat, MISQ, V.30(4) | 7 | 16 | 100 | 0.12 | 0.0024 | 1242 | 333 | 1242
Bhattacherjee and Sanford, MISQ, V.30(4) | 7 | 24 | 81 | 0.26 | 0.0024 | 235 | 145 | 235
Karahanna, et al., MISQ, V.30(4) | 8 | 29 | 278 | 0.1 | 0.0018 | 1886 | 126 | 1886
Srite and Karahanna, MISQ, V.30(3) | 8 | 27 | 181 | 0.29 | 0.0018 | 190 | 151 | 190
Stewart and Gosain, MISQ, V.30(2) | 8 | 63 | 51 | 0.13 | 0.0018 | 1099 | 657 | 1099
Moores and Chang, MISQ, V.30(1) | 5 | 17 | 243 | 0.47 | 0.0051 | 43 | 148 | 148
Pavlou and Fygenson, MISQ, V.30(1) | 6 | 18 | 312 | 0.12 | 0.0034 | 1177 | 200 | 1177
Ahuja and Thatcher, MISQ, V.29(3) | 4 | 12 | 263 | 0.17 | 0.0085 | 482 | 200 | 482
Ko, et al., MISQ, V.29(1) | 5 | 14 | 96 | 0.134 | 0.0051 | 874 | 232 | 874
Bhattacherjee and Premkumar, MISQ, V.28(2) | 7 | 27 | 77 | 0.1 | 0.0024 | 1806 | 108 | 1806
Gemino, JMIS, V.24(3) | 7 | 51 | 223 | 0.154 | 0.0024 | 739 | 476 | 739
Son, et al., JMIS, V.24(1) | 11 | 38 | 625 | 0.16 | 0.0009 | 783 | 142 | 783
Klein, et al., Dec. Sci., V.38(4) | 5 | 30 | 91 | 0.2 | 0.0051 | 373 | 200 | 373
Wang and Wei, Dec. Sci., V.38(4) | 4 | 46 | 150 | 0.22 | 0.0085 | 275 | 2538 | 2538
Keil, et al., Dec. Sci., V.38(3) | 3 | 14 | 178 | 0.15 | 0.0170 | 543 | 89 | 543
Ettlie and Pavlou, Dec. Sci., V.37(2) | 4 | 31 | 72 | 0.15 | 0.0085 | 628 | 616 | 628
Looney, et al., Dec. Sci., V.37(2) | 5 | 31 | 414 | 0.153 | 0.0051 | 662 | 232 | 662
Brown and Chin, Dec. Sci., V.35(3) | 7 | 20 | 240 | 0.14 | 0.0024 | 902 | 222 | 902
Teigland and Wasko, Dec. Sci., V.34(2) | 9 | 19 | 83 | 0.2 | 0.0014 | 458 | 373 | 458
Saraf, et al., ISR, V.18(3) | 9 | 45 | 63 | 0.254 | 0.0014 | 268 | 100 | 268
Pavlou and Dimoka, ISR, V.17(4) | 4 | 10 | 1665 | 0.05 | 0.0085 | 5904 | 288 | 5904
Nicolaou and McKnight, ISR, V.17(4) | 6 | 32 | 69 | 0.247 | 0.0034 | 250 | 122 | 250
Pavlou and El Sawy, ISR, V.17(3) | 4 | 10 | 507 | 0.14 | 0.0085 | 725 | 288 | 725
Pavlou and Gefen, ISR, V.16(4) | 10 | 30 | 1031 | 0.14 | 0.0011 | 1009 | 200 | 1009
Wixom and Todd, ISR, V.16(1) | 7 | 17 | 465 | 0.1 | 0.0024 | 1806 | 302 | 1806
Zhu and Kraemer, ISR, V.16(1) | 11 | 34 | 624 | 0.04 | 0.0009 | 13213 | 187 | 13213
Malhotra, et al., ISR, V.15(4) | 6 | 18 | 449 | 0.12 | 0.0034 | 1177 | 200 | 1177
Karimi, et al., ISR, V.15(2) | 5 | 20 | 286 | 0.04 | 0.0051 | 10163 | 100 | 10163
Mun and Davis, ISR, V.14(2) | 7 | 54 | 95 | 0.18 | 0.0024 | 530 | 604 | 604
Venkatesh and Agarwal, MS, V.52(3) | 5 | 21 | 757 | 0.1 | 0.0051 | 1596 | 92 | 1596
Ahuja, et al., MS, V.49(1) | 4 | 5 | 1781 | 0.17 | 0.0085 | 482 | 616 | 616
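For every number of latent variables k that appears in the table, the Šidák-corrected α column matches 1 - (1 - 0.05)^(1/m), where m = k(k - 1)/2 is the number of possible paired links among k latent variables (the links drawn in Figure 3). A sketch of that calculation, assuming a family-wise α of 0.05 (inferred from the k = 2 row, where only one link exists); the function name is hypothetical.

```python
from math import comb


def sidak_alpha(k: int, alpha: float = 0.05) -> float:
    """Šidák-corrected significance across all pairwise links among k latent variables."""
    if k < 2:
        raise ValueError("at least two latent variables are needed to form a link")
    links = comb(k, 2)  # k(k-1)/2 possible paired links
    return 1 - (1 - alpha) ** (1 / links)


if __name__ == "__main__":
    for k in (2, 3, 6, 10, 14):
        print(k, comb(k, 2), round(sidak_alpha(k), 4))
```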
[Figure 1 diagram nodes: Exploratory Factor Analysis, Lawley (1940); Confirmatory Factor Analysis, Jöreskog (1969); Factor Analysis (PCA) through iterated OLS, Wold (1966); Path Analysis, Wright (1921); PLS-SEM through iterated OLS, Wold (1978); LISREL-SEM, Jöreskog (1969); Systems of Linear Equations Estimation, Koopmans (1950); Instrumental Variables and 2SLS, Theil (1953); 3SLS and full-information regression SEM, Zellner (1962); Model Specification Searches]
FIGURE 1: DEVELOPMENT OF STRUCTURAL EQUATION MODEL ESTIMATION
FIGURE 2: AN EXAMPLE OF A STRUCTURAL EQUATION MODEL WITH SIX LATENT VARIABLES AND FIVE CORRELATIONS
FIGURE 3: THE SEM EXAMPLE IN FIGURE 2 WITH ALL POSSIBLE PAIRED LINKS SHOWN
FIGURE 4: SIGNIFICANCE AND POWER FOR THE MINIMUM EFFECT THAT NEEDS TO BE DETECTED

[Figure 5 image: ten bivariate Normal scatterplot panels of x[,2] against x[,1], with panel labels "Negative Correlation" and "Positive Correlation"]
Figure 5: Bivariate Normal Scatterplots
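Bivariate Normal samples like those in Figure 5 are the setting for the Gini correlation that the notation in Table 2 builds on. A minimal sketch, assuming NumPy and the Schechtman and Yitzhaki (1987) form of the Gini correlation written with concomitants: with weights 2i - n - 1, the estimate is the weighted sum of the concomitants Y[i:n] (the Y value paired with the i-th smallest X) divided by the same weighted sum of the order statistics Y(i:n). This ratio estimates cov(Y, F_X(X)) / cov(Y, F_Y(Y)); the function name and the simulation settings are illustrative, not the paper's.

```python
import numpy as np


def gini_correlation(x, y) -> float:
    """Sample Gini correlation cov(Y, F_X(X)) / cov(Y, F_Y(Y)) computed via concomitants.

    Y[i:n] is the concomitant of the i-th order statistic of X (the Y value paired
    with it); Y(i:n) is the i-th order statistic of Y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 2 or y.size != n:
        raise ValueError("x and y must be paired samples with n >= 2")
    weights = 2 * np.arange(1, n + 1) - n - 1        # 2i - n - 1 for i = 1..n
    concomitants = y[np.argsort(x, kind="stable")]   # Y[i:n]
    order_stats = np.sort(y)                         # Y(i:n)
    return float(weights @ concomitants / (weights @ order_stats))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho = 0.6
    sample = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=5000)
    # For bivariate Normal data both the Gini correlation and Pearson's r estimate rho.
    print(round(gini_correlation(sample[:, 0], sample[:, 1]), 3),
          round(np.corrcoef(sample, rowvar=False)[0, 1], 3))
```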
[Figure 6 image: histogram; y-axis: frequency (0 to 50); x-axis ticks from -20000 to 0]
FIGURE 6: PERCENT ERROR IN SAMPLE SIZE FOR 74 STUDIES IN THE ENTIRE META-STUDY (MEAN = 770, STANDARD DEVIATION = 25, SKEWNESS = 6.5, KURTOSIS = 47)
FIGURE 7: PERCENT ERROR IN SAMPLE SIZE FOR 74 STUDIES IN THE META-STUDY REMOVING OUTLIERS < -25% (MEAN = 400, STANDARD DEVIATION = 642, SKEWNESS = 2.5, KURTOSIS = 7.6)
[Figure 7 image: histogram; y-axis: frequency (0 to 40); x-axis ticks from -2500 to 500]
$p$: Number of parameters (indicators) in the SEM
$k$: Number of latent variables in the SEM
$n$: Computed sample size lower bound
$(X, Y)$ and $(x, y)$: Bivariate Normal random latent variables (and their realizations) in the SEM
$X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$: Order statistics of the $X$ sample values; the first index is rank, and the second is sample size
$Y_{[i:n]}$: Concomitant of the $i$th order statistic; $Y_{[i:n]}$ is the $Y$ sample value associated with the $X_{i:n}$ sample value in the sample pairs $(X_i, Y_i)$
$\delta$: Minimum effect size that our computed sample size can detect
$\rho$: Unknown correlation for a bivariate Normal random vector $(X, Y)$
$\hat{\rho}_G$: Estimator of the Gini correlation $\rho_G$
$(\mu_{\hat{\rho}_G};\ \sigma_{\hat{\rho}_G})$: Mean and standard deviation estimators for the Gini correlation
$(\alpha;\ 1-\beta)$: Significance and power of test
$\alpha$: The Šidák-corrected significance for discriminations between possible SEM link combinations at a resolution of $\delta$
$(z_{1-\alpha};\ z_{1-\beta})$: Rejection bound at significance $\alpha$ and non-rejection bound at power $1-\beta$; we substitute the quantile function (inverse cumulative Normal) $\Phi^{-1}(x)$ for $z_x$ in calculations
TABLE 2: NOTATION USED IN THE PAPER
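Table 2 replaces each bound z_x with the Normal quantile function Φ^{-1}(x). A minimal sketch of that substitution using the standard-library `statistics.NormalDist` (Python 3.8 or later); the α = 0.05 and power 0.80 values are illustrative conventional choices, not settings taken from the paper, and the helper name `z` is hypothetical.

```python
from statistics import NormalDist


def z(x: float) -> float:
    """z_x computed as the standard Normal quantile function Phi^{-1}(x)."""
    return NormalDist().inv_cdf(x)


if __name__ == "__main__":
    alpha, beta = 0.05, 0.20  # illustrative significance and (1 - power)
    print(f"z_(1-alpha) = {z(1 - alpha):.4f}")  # rejection bound at significance alpha
    print(f"z_(1-beta)  = {z(1 - beta):.4f}")   # non-rejection bound at power 1 - beta
```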
