Cleaning data in Stata | data.library.utoronto.ca
Cleaning data in Stata
Table of Contents
Some useful tips before you get started
Creating a number of smaller subsets based on research criteria
Dropping observations
Dropping variables
Transforming variables
Dealing with outliers
Creating new variables
Moving variables
Labelling variables
Renaming variables
A few last words
Cleaning data is a rather broad term that applies to the preliminary manipulations on a dataset prior to analysis. It will very often be the first assignment of a research assistant and is the tedious part of any research project that makes us wish we HAD a research assistant. Stata is a good tool for cleaning and manipulating data, regardless of the software you intend to use for analysis. Your first pass at a dataset may involve any or all of the following:
Creating a number of smaller subsets based on research criteria
Dropping observations
Dropping variables
Transforming variables
Dealing with outliers
Creating new variables
Moving variables
Labelling variables
Renaming variables
Whether this is your first time cleaning data or you are a seasoned "data monkey", you might find some useful tips by reading more.
Some useful tips before you get started [1]
Use the Stata help file. Stata has a built-in feature that allows you to access the user manual as well as help files on any given command. Simply type help in the command window, followed by the name of the command you need help with, and press the Enter key:
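As a minimal illustration, the lines below ask for the help file of drop, a command used later in this guide, and for the help file on operators mentioned in footnote [9]. Any other command name can be substituted.

```stata
* Open the help file for the drop command
help drop

* help also accepts topics: this opens the help file on operators
help operator
```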
http://data.library.utoronto.ca/cleaningdatastata
6/9/2015
Write a do-file. Never clean a dataset by blindly entering commands (or worse, clicking buttons). You want to write the commands in a do-file, and then run it. This way, if you make a mistake, you will not have ruined your entire dataset and you will not need to start again from scratch. This is general advice that applies to any work you do in Stata. Working from do-files lets other people see what you did if you ever need advice, it makes your work reproducible, and it allows you to correct small mistakes somewhat painlessly.
To start a do-file, click on the icon that looks like a notepad on the top left corner of your Stata viewer [2].
In the preliminary stages of your work, you may feel that a do-file is more of a hindrance than a help. For example, if you are not so familiar with a command, you may prefer to try it first. One simple way to do that and still have discipline about writing do-files is to write your do-file in stages, writing only a few commands before executing them, correcting mistakes as you go. In order to execute a number of commands rather than the whole do-file, simply highlight the ones you want to execute, and click on the "Execute Selection (do)" icon on the top of your do-file editor, at the far right.
As you become more proficient with programming in Stata, you won't need to try out commands anymore, and you'll discover the joy of writing a do-file and having it run without a glitch. To run a whole do-file, do not highlight any part of it and click on the "Execute Selection (do)" icon.
You may wonder about the commands clear, set more off and set mem 15000 in the screenshot example. These three commands are administrative commands that are quite useful to have at the beginning of a do-file. The first, clear, is used to clear any previous dataset you may have been working on. The command set more off tells Stata not to pause or display the --more-- message. Finally, the command set mem 15000 increases the memory available to Stata from your computer; here we will need it, as the size of the dataset we downloaded from <odesi> [3] is larger than the 10 MB allocated to data by default.
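A minimal sketch of such a do-file header is shown below. The note about Stata versions is an addition to the original text: from Stata 12 onward memory is managed automatically, so the last command matters only in older versions.

```stata
* Administrative commands at the top of a do-file

* Remove any dataset currently in memory
clear

* Do not pause output with the --more-- message
set more off

* Only needed in Stata 11 and earlier. Stata 12 and later manage
* memory automatically and ignore this setting
set mem 15000
```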
One last comment about do-files: if you double-click a saved do-file, it will not open for editing, but rather Stata will run that do-file, which can be a bit annoying. To reopen a do-file from a folder without executing the commands in it, right-click on it and select "edit" rather than "open".
Always keep a log. Again, this is a general rule of thumb in Stata. Keeping a log means you can go back and look at what you did without having to do it again. Starting a log is just a matter of adding a command at the top of your do-file that tells Stata to log, as well as where you want the log to be saved:
log using "whatever path you want:\pick a name for your log.smcl" [4], replace [5]
Note how logs are saved under the .smcl extension.
Do not forget to close your log before starting a new one. The last command in your do-file [6] will usually be log close.
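Put together, the logging commands might frame a do-file as in the sketch below. The folder and file name are hypothetical placeholders, and the opening capture log close is an addition to the original advice: it closes a log left open by an interrupted run (the situation described in footnote [6]) and does nothing if no log is open.

```stata
* Close any log left open by an earlier, interrupted run.
* capture suppresses the error Stata would raise if no log is open
capture log close

* Hypothetical path and file name: replace with your own
log using "C:\Users\me\Desktop\stata_work\cleaning.smcl", replace

* ... cleaning commands go here ...

log close
```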
Save as you go. Computers crash, power goes out, stuff happens. Save your do-files every few minutes as you write them. Saving a do-file is done the same way as saving any text editor document: either click on the diskette icon, or press CTRL+S.
You should also save your dataset as you modify it, but make sure to keep one version of the original dataset, in case you need to start over. The command to save a dataset in Stata is save, followed by the path where you want the dataset to be saved, and the [optional] option replace.
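A minimal sketch of this pattern follows, assuming a hypothetical folder and hypothetical file names. The point is that the cleaned data are written to a new file, so the original stays untouched.

```stata
* Hypothetical paths and file names: replace with your own
use "C:\Users\me\Desktop\stata_work\cchs_original.dta", clear

* ... cleaning commands go here ...

* Save under a new name so the original file is left untouched
save "C:\Users\me\Desktop\stata_work\cchs_clean.dta", replace
```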
Note how the extension for Stata data is .dta, and also note how the new dataset has a different name from the original [7].
Become familiar with your dataset. Datasets come with codebooks. You should know what each variable is, how it's coded, how missing values are identified. A good practice is to actually look at the data, so that you understand the structure of the information. To do so, you can click on "Data" in the top left corner of your viewer and select "Data editor", then "Data editor (browse)". A new window will open and you can see your data.
You can also use the command browse, either by typing it directly in the command window, or from a do-file.
One of the distinguishing features of <odesi> is that when you download a dataset, it comes with labels. Variable labels are descriptions of variables, and value labels are used to describe the way variables are coded. Basically, the value label sits on top of the code, so that when you browse, you see what the code means rather than what it is. To make this clearer, let's look at the data with no labels. Look, for example, at the GEOPRV variable.
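One way to make this comparison is with the nolabel option of browse, sketched here for the GEOPRV variable mentioned above.

```stata
* Show the data with value labels displayed (the default)
browse GEOPRV

* Show the underlying numeric codes instead of the value labels
browse GEOPRV, nolabel
```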
Creating a number of smaller subsets based on research criteria
There are many reasons why you may want a smaller subset of your data, but the main one is that the bigger the dataset, the harder it is for Stata to manage, which slows down your system. Your goal is to make your dataset as small as possible, while keeping all the relevant information. Your research agenda determines what your final dataset will contain.
Let's say you have data on the health habits of Canadians aged 12 and up, but your research question is specific to women of reproductive age living in Ontario [8]. You clearly don't need to keep the men in your dataset, and you won't need to keep the residents of provinces other than Ontario. Furthermore, you can probably drop women under 15 and over 55 years old. Now, let's look at how you would do that.
Dropping observations
To drop observations, you need to combine one of two Stata commands (keep or drop) with the if qualifier.
Make sure you have saved your original dataset before you get started.
The keep command should be used with caution (or avoided altogether) because it will drop all but what you specifically keep. This can be a problem if you are not 100% certain of what you want to keep.
The drop command will drop from your dataset what you specifically ask Stata to drop.
The if qualifier restricts the scope of the command to those observations for which the value of an expression is true. The syntax for using this qualifier is quite simple:
command if exp
Where command in this case would be drop, and exp is the expression that needs to be true for the drop command to apply [9].
Using the example of women of reproductive age in Ontario, the first highlighted line drops men, the second line drops any observation not in Ontario, while the last line drops observations in age groups older or younger than our subset of interest.
You have to be careful with logical operators; notice the syntax in the third line. A common mistake is to ask Stata to drop if DHHGAGE>10 & DHHGAGE<2. There are no individuals in the dataset who are older than 55 AND younger than 15. We want to drop if older than 55 OR younger than 15.
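The three lines described above might look like the sketch below. Only the age-group condition is taken from the text. The sex variable name DHH_SEX, its coding (1 = male) and the Ontario code 35 for GEOPRV are assumptions for illustration and should be checked against your codebook.

```stata
* DHH_SEX, its coding (1 = male), and GEOPRV == 35 for Ontario are
* assumptions for illustration: check your codebook
drop if DHH_SEX == 1
drop if GEOPRV != 35

* Age groups above 10 OR below 2 fall outside the subset of interest
drop if DHHGAGE > 10 | DHHGAGE < 2
```

Keep in mind that Stata treats missing values as larger than any number, so a condition such as DHHGAGE > 10 is also true for observations where DHHGAGE is missing.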
Here is a list of operators in expressions. You would mostly use logical and relational operators in conjunction with if:
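For reference, the operators covered in help operator can be summarised as comments, followed by two illustrative uses with if. The count command is used only because it leaves the data unchanged.

```stata
* Arithmetic:  +  -  *  /  ^
* Relational:  >  <  >=  <=  ==  !=   (~= is a synonym for !=)
* Logical:     &  (and)   |  (or)   !  (not, with ~ as a synonym)

* Relational and logical operators combined with if
count if DHHGAGE >= 2 & DHHGAGE <= 10
count if !(DHHGAGE > 10 | DHHGAGE < 2)
```

Both count commands select the same observations: the second simply negates the condition used for dropping.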
Dropping variables
Another way in which you may need to make your dataset smaller is by dropping variables that are not useful to your research. It may be that the information contained in a given variable is duplicated (i.e. another variable provides the same info), or maybe all the observations for a variable are missing, or a variable just happens to be in your dataset but is irrelevant to your research. Dropping variables is very straightforward: simply use the drop command.
Looking at the data from CCHS, the variable SLP_01 (Number of hours spent sleeping per night) is coded as .a (NOT APPLICABLE) for each observation in the dataset.
Clearly we will not learn anything from that variable, so we can drop it. The syntax for dropping variables is simple:
drop varlist
Where varlist is the list of variables you would like to drop. It's easy to drop a number of variables at a time this way. Here I am dropping all the variables that were coded as "Not Applicable" for more than 95% of observations [10]:
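The full list of variables dropped in the original example is not recoverable, so the sketch below uses only SLP_01 and adds a check before dropping. The tabulate step is an addition to the text.

```stata
* Confirm how the variable is coded before dropping it.
* The missing option includes extended missing values such as .a
tabulate SLP_01, missing

* Drop the variable (further names can be appended to the list)
drop SLP_01
```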
Transforming variables
Sometimes variables are not coded the way you want them to be. In this section we will look at two transformations you may need to do on some variables before using them: recode and destring.
The recode command changes the values of numeric variables according to the rules specified. In the CCHS dataset, many variables have missing values coded as .a or .d. This is convenient because it will not affect calculations you might do using the data (for example if you calculate an average). However, many datasets use 999 as a missing value code, and that might be problematic. We might want to recode these as . in order to not have them affect any calculations we plan on doing with the data. The syntax for this command is:
recode varlist (old value(s) = new value) [11]
Let's recode the height and BMI variables from the CCHS data (for the sake of illustration, since it's really not necessary in this case):
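A sketch of that recode is shown below, using the variable names hwtghtm (height) and hwtgbmi (BMI) that appear later in the guide. The value 999 is illustrative only.

```stata
* Recode the numeric code 999 to system missing (.) in both variables.
* 999 is illustrative: as noted above, the CCHS data do not need this
recode hwtghtm hwtgbmi (999 = .)
```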
The destring command allows you to convert data saved in the string format (i.e. alphanumeric) into a numerical format. The CCHS dataset does not contain any string variable. In order to see what a string variable looks like, we can use the converse command, tostring, to create a string variable. We will then convert that variable back to a numerical format.
A string variable shows up in red in the data editor:
Although it may look the same as the variable CIH_2, Stata cannot do any calculations on the string variable (since its format is telling Stata that it is made of letters or other symbols). Let's destring it:
Notice the use of the options generate and replace. When we created the fake string variable, we used generate because we wanted a new separate variable. Now, when we destring, we are replacing the string variable by its numerical counterpart. How you choose to do this in your own dataset depends on how you plan to use the variables. Will you still have any use for the string variable? If so, generate a new one when you destring. Do you just want that variable to not be in string format? Then replace it with the new one.
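The round trip described above can be sketched as follows. The name cih_2_str is a hypothetical choice for the temporary string variable, and the describe step is an addition for checking storage types.

```stata
* Create a string copy of CIH_2 (cih_2_str is a hypothetical name)
tostring CIH_2, generate(cih_2_str)

* Convert it back to numeric in place
destring cih_2_str, replace

* Alternative to the line above: keep the string variable and
* create a separate numeric copy instead
* destring cih_2_str, generate(cih_2_num)

* Check the storage type of both variables
describe CIH_2 cih_2_str
```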
Here, we can see that our variable string is now completely identical to the variable CIH_2:
(We can drop that variable now)
Dealing with outliers
Outliers deserve their own section because there is often confusion as to what exactly constitutes an outlier. An outlier is NOT an observation with an unusual but possible value for a variable [12]; rare events do occur. The outliers you should be concerned about are the ones that come from coding error. How do you tell which is which? Common sense goes a long way here.
First, look at your data using the data editor (browse). Outliers tend to jump at you. If you have a small dataset, you can also tabulate each of your variables:
tab varlist [13]
Tabulating a variable will give you a list of all the possible values that variable takes in the dataset. Outliers will be the extreme values. Look at the order of magnitude. Are these values believable?
If the dataset is very big, however, it may not be practical to stare at all the values a variable can take. In fact, Stata will not tabulate if there are too many different values.
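As a small sketch, the commands below tabulate two variables used elsewhere in this guide. Note that tabulate with two variable names produces a two-way table, so tab1 is used here to get one table per variable.

```stata
* Frequency table of every distinct value of the marital status variable
tabulate dhhgms

* tab1 produces a separate one-way table for each variable listed
tab1 dhhgms DHHGAGE
```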
You can look at your data in a scatter plot:
In the CCHS dataset, caseid is the individual id, while hwtghtm is the height in meters. The graph tells us there are no outliers in this dataset:
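A scatter plot of this kind can be produced with a single command. In Stata's scatter syntax the first variable goes on the vertical axis and the second on the horizontal axis. Which variable was on which axis in the original graph is an assumption.

```stata
* Height in meters (vertical axis) against individual id (horizontal axis)
scatter hwtghtm caseid
```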
Another way to look for outliers is to summarize the observations for a variable, using the detail option:
The result window will show the main percentiles of the distribution (including the median, 50%), the first four moments, as well as the four smallest and four largest observations:
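For the height variable, the command might look like the sketch below. The display lines are an addition: they use results that summarize stores after it runs to express the largest value in standard deviations from the mean.

```stata
summarize hwtghtm, detail

* r(p99), r(max), r(mean) and r(sd) are stored by summarize, detail
display "99th percentile: " r(p99) "   maximum: " r(max)
display "max in SDs above the mean: " (r(max) - r(mean)) / r(sd)
```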
Clearly, there are no outliers. Let's imagine for a moment that the 99th percentile of the height distribution includes an observation with 5.2 m entered as the height. Is it plausible that there really was a 5.2 m woman recorded in this dataset? Look at the order of magnitude by which this observation would differ from the second largest. It's almost 50 standard deviations bigger...
What should you do with such an observation? There are a number of solutions but none is perfect:
Drop it from your dataset (drop if hwtghtm>1.803)
Use the if qualifier to exclude it when generating statistics that use the height variable (command if hwtghtm<=1.803)
Ignore it if the height variable is not actually that important in your research and the rest of the variables for this observation are coded just fine
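The first two options can be sketched as follows. They are alternatives, not steps to run in sequence. The !missing() guard in the first is an addition to the command given above, because Stata treats missing values as larger than any number and would otherwise drop them too.

```stata
* Option 1: drop the observation. The !missing() guard is an addition:
* without it, observations with missing height would also be dropped
drop if hwtghtm > 1.803 & !missing(hwtghtm)

* Option 2: keep the observation but exclude it from a given command
summarize hwtghtm if hwtghtm <= 1.803
```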
Creating new variables
There are two main commands you need to know to generate new variables: gen is for the basics, while egen allows you to get pretty fancy. You can combine these with qualifiers such as if or in, as well as prefixes such as by and bysort [14].
For example, say you want to create a variable that tells you whether the women in the dataset have a live-in partner. While there is no sure-fire way to establish that, we will approximate it by assuming that women who indicated their marital status as married or common-law actually live with their spouse or common-law partner:
The first line creates the variable livein and assigns it a value of 1 if the value of the marital status variable (dhhgms) is either 1 (married) or 2 (common-law). The second line replaces the missing value code by 0, making the livein variable binary.
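The two lines described above are not reproduced in this copy. A reconstruction consistent with the description might be:

```stata
* 1 if married (1) or common-law (2), missing otherwise
gen livein = 1 if dhhgms == 1 | dhhgms == 2

* Turn the remaining missing values into 0
replace livein = 0 if livein == .
```

With this approach, any observation where dhhgms itself is missing also ends up coded 0. If that matters for your analysis, a further line such as replace livein = . if missing(dhhgms) would restore the missing values.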
Now, let's say you would like to create a categorical variable that tells you, by age group, if a woman is below or above average in terms of body mass index (BMI).
The first line of command creates a variable (meanbmi) which takes on a unique value for each age group, the average BMI for that age group. The prefix bysort is a combination of by and sort; you could equivalently break it into two commands:
sort DHHGAGE
by DHHGAGE: egen meanbmi = mean(HWTGBMI)
The sort part of the command organizes the observations according to the variable DHHGAGE, from smallest to largest, a step required before doing any action by the variable. It's usually easier to just use bysort.
The second and third lines (starting with gen) create a binary variable which equals 0 if an observation has a BMI lower than the average for her age group, and 1 if her BMI is above her age group average.
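A reconstruction of the three lines described above is sketched below. The variable name bmicat is taken from later in the guide, and the exact form of the second and third lines is an assumption. Stata variable names are case-sensitive and this guide writes the BMI variable both as HWTGBMI and hwtgbmi, so use whichever form your dataset has.

```stata
* Average BMI within each age group
bysort DHHGAGE: egen meanbmi = mean(HWTGBMI)

* 0 if below the age-group average, 1 if at or above it
gen bmicat = 0 if HWTGBMI < meanbmi
replace bmicat = 1 if HWTGBMI >= meanbmi
```

Because Stata treats missing values as larger than any number, the last line also assigns 1 where HWTGBMI is missing. The guide returns to this problem under Moving variables.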
Moving variables
Now that you have created these new variables, it would be nice to make sure that the rules by which you generated them were correct. Ideally, you would like to look at livein (the new variable based on marital status) and dhhgms (the marital status variable). However, it's hard to compare two variables unless they are side by side. You can use the order command to move a variable (i.e. move a column of your dataset).
When you create a variable, by default it becomes the last column of your dataset. You can move it next to another variable instead:
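A sketch using the after() option of order is shown below. The cross-tabulation is an addition, offered as a quick way to check the coding without scrolling through the data editor.

```stata
* Place livein immediately after the marital status variable
order livein, after(dhhgms)

* Cross-tabulate the old and new variables, including missing values
tabulate dhhgms livein, missing
```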
Now if we look at our dataset, we can compare the new variable to the old and make sure that we coded it properly:
Similarly, since our two new variables pertaining to BMI are now the last columns, let's move the original BMI variable to the end of the dataset:
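This can be done with the last option of order, sketched here with the BMI variable written as hwtgbmi (adjust the case to match your dataset).

```stata
* Move the original BMI variable to the last column
order hwtgbmi, last
```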
It is now easy to glance at our new variables:
Do you notice the problem on line 8? The variable bmicat should not be coded 1 if the original BMI variable is coded as a missing value. We can fix this with a quick replace:
replace bmicat = . if hwtgbmi == .d
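The command above handles only the .d code. If the BMI variable could contain other missing codes, a more general variant (an addition to the original) uses the missing() function:

```stata
* missing() is true for . and for every extended missing value (.a to .z)
replace bmicat = . if missing(hwtgbmi)
```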
Labelling variables
Whenever you create a new variable, it is a good idea to label it. Why? Having your variables labelled makes it easy for you or anyone else using your dataset to quickly see what each variable represents. You should think of your work as something that people should be able to reproduce. Labelling your variables is a small task that makes it much easier for others to use your data [15].
The syntax for labelling variables is as follows:
label variable varname "label"
In our previous example, the command would look like this:
Note that you can abbreviate this command to lab var:
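For the variables created earlier, the commands might look like the sketch below. The label texts are hypothetical wording, and the value-label step at the end is an addition to the text, with yesno as a hypothetical label name.

```stata
label variable livein "Lives with spouse or common-law partner"
lab var bmicat "BMI relative to age-group average"

* Value labels (an addition to the text): describe what 0 and 1 mean.
* The label name yesno is a hypothetical choice
label define yesno 0 "No" 1 "Yes"
label values livein yesno
```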
Renaming variables
You may find that you work faster if your variables have names that you recognize at first glance. In most cases this is by no means a necessary task in cleaning data, but if you use data from another country, for example, you may find that the variable names are in a foreign language, making them very hard to remember. The syntax is as easy as can be:
rename oldname newname
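As a small sketch, the marital status variable could be given a more memorable name. The new name marital is hypothetical, and the commented-out line shows an option available in Stata 12 and later.

```stata
* marital is a hypothetical new name for the marital status variable
rename dhhgms marital

* Stata 12 and later can also change the case of many names at once,
* for example converting every variable name to lowercase:
* rename *, lower
```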
Let's see the final do-file
Your do-file may be slightly different from this but it should result in the same final dataset:
Let's try running it in one go to see if it works. Do not highlight any command and click on "Execute (Do)". Note that whenever Stata encounters the command browse, a data editor will pop up on your screen. Have a look at your data, then close the data editor in order for Stata to continue running the do-file.
Let's also take the time to open our log to see what it looks like and how it could be useful.
Finally, let's look at our final dataset and make sure it contains all the right variables, in the right format.
A few last words
This concludes our workshop but it's only the beginning for you. Learning to use statistical software involves a lot of trial and error, angry googling, and desperately trying to find someone who knows how to write a loop. Listed below are a few excellent resources to further your working knowledge of Stata:
UCLA: http://www.ats.ucla.edu/stat/stata/default.htm
Princeton: http://data.princeton.edu/stata/default.html
http://www.princeton.edu/~otorres/Stata/statnotes
LSE: http://personal.lse.ac.uk/lembcke/ecStata/2009/MResStataNotesJan2009PartA.pdf
http://personal.lse.ac.uk/lembcke/ecStata/2009/MResStataNotesFeb2009PartB.pdf
University of North Carolina at Chapel Hill: http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial
Stata: http://www.stata.com/support/faqs/
[1] There is an assumption here that you already have a dataset. If you do not and you need assistance assembling data, please visit the data library (THIS COMMENT NEEDS TO REFERENCE THE GUIDE ON HOW TO DOWNLOAD A DATASET FROM SDA)
[2] You can use other text editors to create and manage do-files. For example, Smultron is open-source software that works well with Stata.
[3] You can see the size of a dataset by right-clicking on it, then selecting "properties".
[4] You should create a folder in an easy-to-remember location (desktop works well) for your Stata work. Then check its properties by right-clicking on it, and copy the location. That's your path.
[5] , replace is optional here but rather useful if you want to keep just one log per do-file. If you don't have the , replace option, you will need to modify the name of the log every time you run the do-file.
[6] However, if a do-file is interrupted because of an error and a log is open, you will need to close it before running the same do-file again, because one of the first commands of the do-file is to start a log, which will result in an error message unless the previous log is closed. Simply type the command log close in the command window, or highlight it and execute it from your do-file.
[7] Note to users of this guide: this command would typically be located towards the end of the do-file. I have created a screenshot here with a new do-file only to show one command alone. All the examples in this guide that similarly use a new do-file with only one command were done that way to save space. The goal of this workshop is to learn to create a cleaning do-file, in which commands are listed one after the other. I trust that users can understand the commands well enough by the end of the workshop to assemble them in the order that is logical for the purpose of their own task.
[8] The examples in this guide were created using a customized subset of the Canadian Community Health Survey (CCHS), annual component, 2007-2008, available through the Data Liberation Initiative (DLI) and downloaded using SDA@CHASS.
[9] See the Stata help files on expressions and operators: type help exp and help operator in the command screen.
[10] There is no rule of thumb at play here; I simply picked a list of variables that contained little useful information. Sometimes, the fact that only a small number of observations contain information IS informative, in and of itself. Do not drop variables that tell you something important.
[11] Note that you can also use this command to make groups. The CCHS dataset already has age by age group, but if you had a variable for actual age, you could generate an age group variable using recode. See the Stata help sheet (help recode) for more options.
[12] Admittedly, these are indeed outliers, just not the type we want to do anything about. Leave those alone. Dealing with true events in any way is likely to do more harm than good, as you would truncate your dataset, potentially creating bias in your analysis later.
[13] You replace varlist with the list of the variables you want tabulated, as in the drop example.
[14] All of these commands, qualifiers and prefixes have Stata help files. Have a look at them for a more in-depth presentation.
[15] Knowing how to label variables can also be useful if the data was not provided to you with a dictionary file; you can then use the questionnaire to build labels for all your variables of interest, just as a dictionary file would do.