
Understanding and Adjusting for Complex Sample Designs in the Eurasian Context
Presented by: Dr. Jane Zavisca, University of Arizona

Adjusting for sampling design in Stata

The most accurate way of handling complex sampling designs is to use the svy commands in Stata. First, run the svyset command to tell Stata about the sampling design. The basic syntax is:

    svyset psu [pweight=weight], strata(strata)

The terms psu, weight, and strata stand for variables whose actual names will vary depending on your dataset:

    weight is the variable containing the probability weights (pweight is Stata's keyword for this type of weight)
    psu is the variable identifying the "primary sampling units" (the initial stage of clustering)
    strata is the variable indicating the sampling strata

Note that not all complex survey designs involve all of these elements (weights, clustering, and stratification). There are also many additional options, for which you need to check the manual. In particular, for multiple-stage designs you can tell Stata what each of the stages is. You can also include finite population corrections if you have the relevant variable and if the sampling fractions are large enough to warrant such a correction.

Once you have declared the survey design with svyset, you do not have to tell Stata about the design every time you run a command, as long as you use the "svy" prefix when doing your analysis. For example:

    svy: mean x
    svy: regress y x

Stata will then automatically correct the estimates based on the parameters previously specified with the svyset command. If you save the dataset after running svyset, the settings are saved with the dataset (you can also reset them at any time).

If you find that svy does not work with a given command, see whether you can at least add weights or clustering directly as options to that command. When you have the choice, the svy commands are preferred because they are more accurate and are being continuously updated and developed.
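The workflow described above can be tried end to end on one of Stata's example datasets. The following is only a sketch under stated assumptions: it uses the NHANES II file that webuse downloads (internet access required), and it assumes that file names its design variables psu, strata, and finalwgt and contains bmi, age, and female. Your own survey will use different names, and cluster_id is a helper variable introduced here for illustration.

```stata
* Sketch only: NHANES II example data distributed by StataCorp.
* Assumption: the design variables in this file are psu, strata, finalwgt.
webuse nhanes2, clear

* Declare the design once: PSU, probability weight, and strata.
svyset psu [pweight=finalwgt], strata(strata)

* Check how many PSUs and observations fall in each stratum.
svydescribe

* Later commands only need the svy prefix.
svy: mean bmi

* Design effects (DEFF and DEFT) for the mean just estimated.
estat effects

svy: regress bmi age female

* Fallback for a command that does not accept the svy prefix:
* pass the weight and the clustering directly as options.
* PSU codes repeat across strata in this file, so build a
* unique cluster identifier first (cluster_id is hypothetical).
egen long cluster_id = group(strata psu)
regress bmi age female [pweight=finalwgt], vce(cluster cluster_id)
```

The fallback regression should reproduce the point estimates of svy: regress, but its standard errors ignore the stratification, which is one reason to prefer the svy prefix when it is available.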

Some rules of thumb (based on Heeringa 2010)

Stata has by far the best-developed approach to complex samples. Once you identify the design variables for Stata (weight, psu, and/or strata), the defaults will work well for most applications. It can also accommodate more complex designs and adjustments.

Read the survey documentation carefully to be sure you are using the correct design variables to correct for probability of selection (weights) and to correct the variance estimates. Large surveys now routinely release several types of weights and sample variance correction factors. Which ones you use should depend on the purpose of your analysis (especially the unit of analysis and the implicit target population) and on the approach the survey designers take to identifying clustering (which can create confidentiality issues).

You should always correct for sampling design when reporting univariate statistics.

You should usually correct for sampling design when reporting other types of inferential statistics (at least if you are using Stata). Note that there has been some debate on whether it is necessary or desirable to adjust for sampling design when modeling relationships between variables. An alternative is to include the variables used to create the weights as predictors in the model. Winship 1994 explains under what conditions this works, and when it may even be preferable to applying weights. However, Stata's default settings for complex samples today solve the main problem with weights that he identifies (note that SPSS does NOT; you need the Complex Samples add-on module). See Heeringa et al. 2010.

You can usually safely ignore the finite population correction (fpc) when the sampling fraction is below .05.

If the fpc is ignorable, you can usually safely treat the primary sampling unit as the only clustering variable when adjusting for sampling design. For example, individuals may be nested within families, within census tracts, within regional primary sampling units (cities/towns). But from the perspective of sampling error adjustments, only the regional PSU need be identified.

It is best NOT to delete cases that are not of interest when analyzing subpopulations, as this may produce incorrect standard errors. Rather, identify the subpopulation; software such as Stata will make the appropriate adjustments. Consult Heeringa ch. 4 and the Stata complex surveys manual for details.

Comparing models that either a) use weights or b) include the variables used to create the weights as predictors is a way of testing your core model: major changes in point estimates indicate misspecification. The RLMS website has a nice guide to doing and interpreting this: http://www.cpc.unc.edu/projects/rlms-hse/project/samprep/index.html
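Two of the rules above, subpopulation analysis and the weighted-versus-covariates comparison, can be sketched in Stata as follows. This is an illustration under assumptions, not the RLMS procedure: it again uses the NHANES II example file from webuse (assumed to contain psu, strata, finalwgt, bmi, age, female, region, and race), and region and race are only hypothetical stand-ins for the variables that would actually have been used to build a survey's weights.

```stata
* Sketch only: NHANES II example data distributed by StataCorp.
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Subpopulation analysis: keep every case in memory and flag
* the group of interest instead of dropping the other cases.
svy, subpop(female): mean bmi

* A subpopulation can also be defined with an if condition.
svy, subpop(if age >= 60): regress bmi age female

* Specification check, model (a): design-based, weighted estimates.
svy: regress bmi age female
estimates store weighted

* Model (b): unweighted, adding stand-ins for the weighting variables.
* region and race are hypothetical stand-ins chosen for illustration.
regress bmi age female i.region i.race
estimates store covariates

* Side-by-side coefficients and standard errors for the two models.
estimates table weighted covariates, b(%9.3f) se(%9.3f)
```

If the coefficients on the substantive predictors (here age and female) differ sharply between the two columns, that is the kind of change in point estimates the rule above treats as a sign of misspecification.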

Resources

http://www.ats.ucla.edu/stat/stata/faq/svy_howtochoose.htm
    Website with general guidelines on how to implement adjustments for various sample designs in Stata and other software.

Heeringa, Steven G., et al. 2010. Applied Survey Data Analysis. Chapman and Hall.
    Accessible and comprehensive explanation of complex sample designs and how to handle them. Oriented toward the user of secondary surveys. Appendix reviews software. Book also has worked examples.

Levy, Paul S. and Stanley Lemeshow. 2008. Sampling of Populations: Methods and Applications. 4th edition. Wiley.
    Technical text on the above issues. Oriented toward sample designers (but also useful for users).

Groves, Robert M., et al. 2009. Survey Methodology. 2nd edition. Hoboken: Wiley & Sons.
    General textbook on the total survey design approach.

Biemer, Paul and Lars Lyberg. 2003. Introduction to Survey Quality. Wiley & Sons.
    Technical text on total survey design, with a focus on quantifying various forms of error.

Fowler, Floyd. 2009. Survey Research Methods. 4th edition. Sage.
    Good general text, especially on questionnaire design.
