What is Statistics?
Statistics is a way to get information from data.
Data → Statistics → Information
Descriptive Statistics
Descriptive statistics deals with methods of organizing, summarizing, and presenting data in a convenient and informative way.
One form of descriptive statistics uses graphical techniques, which allow statistics practitioners to present data in ways that make it easy for the reader to extract useful information.
Chapters 2 and 3 introduce several graphical methods.
Another form of descriptive statistics uses numerical techniques to summarize data.
The mean and median are popular numerical techniques to describe the location of the data.
The range, variance, and standard deviation measure the variability of the data.
Chapter 4 introduces several numerical statistical measures that describe different features of the data.
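As a rough illustration of these numerical techniques, the sketch below computes the measures named above with Python's standard `statistics` module. The data values are invented for illustration and do not come from the course material.

```python
import statistics

# Hypothetical data set, invented purely for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central location
mean = statistics.mean(data)
median = statistics.median(data)

# Measures of variability
data_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # sample standard deviation

print(f"mean={mean}, median={median}, range={data_range}")
print(f"variance={variance:.3f}, std dev={std_dev:.3f}")
```

The sample versions (`variance`, `stdev`) are used here on the assumption that the data are a sample; `pvariance` and `pstdev` would apply if the data were the whole population.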
A university with 50,000 students has offered Pepsi-Cola an exclusivity agreement that would give Pepsi exclusive rights to sell its products at all university facilities for the next year, with an option for future years.
In return, the university would receive 35% of the on-campus revenues and an additional lump sum of $200,000 per year.
Pepsi has been given 2 weeks to respond.
Sample
Parameters
Statistics
Statistical inference
Confidence level
Significance level
Pepsi currently sells an average of 22,000 cans per week (over the 40 weeks of the year that the university operates).
The cans sell for an average of 1 dollar each. The costs, including labor, are 30 cents per can.
Pepsi is unsure of its market share but suspects it is considerably less than 50%.
The only problem is that we do not know how many soft drinks are sold weekly at the university.
Accordingly, she organizes a survey that asks 500 students to keep track of the number of soft drinks they purchase in the next 7 days.
The responses are stored in a file on the disk that accompanies this book (Case 12.1).
Inferential Statistics
The information we would like to acquire in Case 12.1 is an estimate of annual profits from the exclusivity agreement. The data are the numbers of cans of soft drinks consumed in 7 days by the 500 students in the sample.
We want to know the mean number of soft drinks consumed by all 50,000 students on campus.
To accomplish this goal we need another branch of statistics: inferential statistics.
Inferential statistics is a body of methods used to draw conclusions or inferences about characteristics of populations based on sample data.
The population in question in this case is the soft drink consumption of the university's 50,000 students.
The cost of interviewing each student would be prohibitive and extremely time-consuming.
Statistical techniques make such endeavors unnecessary.
Instead, we can sample a much smaller number of students (the sample size is 500) and infer from the data the number of soft drinks consumed by all 50,000 students. We can then estimate annual profits for Pepsi.
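The arithmetic behind such a profit estimate can be sketched as follows. This is only one plausible reading of the case, under loudly stated assumptions: every soft drink the surveyed students buy would be a Pepsi product sold on campus, price and cost per can stay unchanged, the 35% applies to gross revenue, and the sample mean is used directly as a point estimate. The function names and the sample mean of 1.5 cans per student per week are hypothetical; the actual survey data are not reproduced here.

```python
# Figures quoted in Case 12.1
STUDENTS = 50_000
WEEKS_PER_YEAR = 40
PRICE_PER_CAN = 1.00
COST_PER_CAN = 0.30
UNIVERSITY_SHARE = 0.35        # share of on-campus revenue paid to the university
LUMP_SUM = 200_000             # additional annual payment to the university
CURRENT_CANS_PER_WEEK = 22_000


def current_annual_profit():
    """Annual profit without the agreement, from current weekly sales."""
    return CURRENT_CANS_PER_WEEK * WEEKS_PER_YEAR * (PRICE_PER_CAN - COST_PER_CAN)


def exclusive_annual_profit(mean_cans_per_student_per_week):
    """Estimated annual profit under the agreement (assumptions in the lead-in)."""
    cans = mean_cans_per_student_per_week * STUDENTS * WEEKS_PER_YEAR
    revenue = cans * PRICE_PER_CAN
    return revenue - UNIVERSITY_SHARE * revenue - cans * COST_PER_CAN - LUMP_SUM


# Hypothetical sample mean, NOT taken from the Case 12.1 data file.
sample_mean = 1.5
print(f"current profit:   {current_annual_profit():,.0f}")
print(f"exclusive profit: {exclusive_annual_profit(sample_mean):,.0f}")
```

Comparing the two figures for the sample mean actually observed in the survey would indicate whether the agreement looks worthwhile under these assumptions.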
Example 12.5
When an election for political office takes place, the television networks cancel regular programming and instead provide election coverage.
Usually the ballots are counted and the results are reported. This takes time.
However, for important offices such as president or senator in large states, the networks actively compete to see which will be the first to predict a winner.
This is done through exit polls, wherein a random sample of voters who exit the polling booth is asked for whom they voted.
From the data, the sample proportion of voters supporting the candidates is computed.
A statistical technique is applied to determine whether there is enough evidence to infer that the leading candidate will garner enough votes to win.
The exit poll results from the state of Florida during the 2000 elections were recorded (only the votes for the Republican candidate George W. Bush and the Democrat Albert Gore).
Suppose that the results (765 people who voted for either Bush or Gore) were stored in a file on the disk, Xm1205 (1 = Gore and 2 = Bush).
The network analysts would like to know whether they can conclude that George W. Bush will win the state of Florida.
Example 12.5 describes a very common application of statistical inference.
The population the television networks wanted to make inferences about is the approximately 5 million Floridians who voted for Bush or Gore for president.
The sample consisted of the 765 people randomly selected by the polling company who voted for either of the two main candidates.
The characteristic of the population that we would like to know is the proportion of the total electorate that voted for Bush.
Specifically, we would like to know whether more than 50% of the electorate voted for Bush (counting only those who voted for either the Republican or Democratic candidate).
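The text does not name the statistical technique applied to the exit poll. One standard choice for asking whether a population proportion exceeds 50% is a one-sided z-test for a proportion, sketched below as an illustration only. The function names are hypothetical, and the count of 407 Bush votes is invented for the example; it is not read from the Xm1205 file.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_sided_proportion_z_test(successes, n, p0=0.5):
    """Test H0: p = p0 against H1: p > p0 using the normal approximation."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 1.0 - normal_cdf(z)
    return p_hat, z, p_value


n_voters = 765     # sample size given in Example 12.5
bush_votes = 407   # HYPOTHETICAL count, for illustration only

p_hat, z, p_value = one_sided_proportion_z_test(bush_votes, n_voters)
print(f"sample proportion = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.3f}")
```

A small p-value (below the chosen significance level) would be read as enough evidence to infer that more than 50% of the two-candidate electorate voted for Bush.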
Because we will not ask every one of the 5 million actual voters for whom they voted, we cannot predict the outcome with 100% certainty.
A sample that is only a small fraction of the size of the population can lead to correct inferences only a certain percentage of the time.
You will find that statistics practitioners can control that fraction and usually set it between 90% and 99%.
Sample
A sample is a subset of data drawn from the population.
Potentially very large, but less than the population.
E.g. a sample of 765 voters exit-polled on election day.
[Diagram: the sample is a subset of the population; the parameter belongs to the population, the statistic to the sample.]
Populations have parameters; samples have statistics.
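The distinction can be made concrete with a small simulation. In the sketch below the population is synthetic (weekly soft-drink counts generated at random, purely as an assumption for illustration), so the parameter can be computed directly and compared with the statistic from a sample of 500.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# HYPOTHETICAL population: weekly soft-drink counts for 50,000 students.
population = [random.randint(0, 6) for _ in range(50_000)]

# Parameter: a descriptive measure of the whole population.
parameter = statistics.mean(population)

# Statistic: the same measure computed from a sample of 500.
sample = random.sample(population, 500)
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.3f}")
print(f"sample mean (statistic):     {statistic:.3f}")
```

In practice the parameter is unknown, which is why the statistic is used to make inferences about it.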
Descriptive Statistics
Descriptive statistics are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:
Graphical techniques (Chapters 2 and 3), and
Numerical techniques (Chapter 4).
The actual method used depends on what information we would like to extract. Are we interested in
measure(s) of central location? and/or
measure(s) of variability (dispersion)?
Descriptive statistics helps to answer these questions.
Inferential Statistics
Descriptive statistics describes the data set that's being analyzed, but doesn't allow us to draw any conclusions or make any inferences about the data. Hence we need another branch of statistics: inferential statistics.
Inferential statistics is also a set of methods, but it is used to draw conclusions or inferences about characteristics of populations based on data from a sample.
Statistical Inference
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
[Diagram: Population (parameter) → Sample (statistic) → Inference about the population]
What can we infer about a population's parameters based on a sample's statistics?
We use statistics to make inferences about parameters.
Therefore, we can make an estimate, prediction, or decision about a population based on sample data.
Thus, we can apply what we know about a sample to the larger population from which it was drawn!
Rationale:
Large populations make investigating each member impractical and expensive.
It is easier and cheaper to take a sample and make estimates about the population from the sample.
However:
Such conclusions and estimates are not always going to be correct.
For this reason, we build into the statistical inference measures of reliability, namely confidence level and significance level.
When the purpose of the statistical inference is to draw a conclusion about a population, the significance level measures how frequently the conclusion will be wrong in the long run.
E.g. a 5% significance level means that, in the long run, this type of conclusion will be wrong 5% of the time.
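The long-run reading of a significance level can be checked by simulation. The sketch below is an illustration under assumptions, not a method taken from the text: it repeatedly draws samples of 765 voters from a population in which exactly 50% support a candidate (so any conclusion that support exceeds 50% is wrong), applies a one-sided z-test at the 5% level, and counts how often the wrong conclusion is drawn. The names, the number of trials, and the rounded critical value 1.645 are choices made for this example.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable

N = 765             # sample size, borrowed from Example 12.5
Z_CRITICAL = 1.645  # approximate one-sided critical value for a 5% level
TRIALS = 2000       # number of simulated polls (arbitrary choice)


def rejects_null(successes, n, p0=0.5):
    """Return True if the one-sided z-test concludes that p > p0."""
    z = (successes / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z > Z_CRITICAL


false_rejections = 0
for _ in range(TRIALS):
    # Each simulated voter supports the candidate with probability exactly 0.5.
    successes = sum(random.random() < 0.5 for _ in range(N))
    if rejects_null(successes, N):
        false_rejections += 1

error_rate = false_rejections / TRIALS
print(f"share of wrong conclusions: {error_rate:.3f}")
```

The printed share should land near 0.05, though not exactly on it, because the number of trials is finite and the vote counts are discrete.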
In this case, our confidence level is 95% (19/20 = 0.95), while our significance level is 5%.