Stat Review - Keller

What is Statistics?
"Statistics is a way to get information from data."
Data -> Statistics -> Information
Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.
Statistics is a tool for creating new understanding from a set of numbers.
Definitions: Oxford English Dictionary
Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Key Statistical Concepts
Population
A population is the group of all items of interest to a statistics practitioner.
Frequently very large; sometimes infinite.
E.g. all 5 million Florida voters, per Example 12.5.
Sample
A sample is a set of data drawn from the population.
Potentially very large, but less than the population.
E.g. a sample of 765 voters exit polled on election day.
Parameter
A descriptive measure of a population.
Statistic
A descriptive measure of a sample.

[Figure: the sample is a subset of the population; a parameter describes the population, a statistic describes the sample.]
Populations have Parameters,
Samples have Statistics.

Descriptive Statistics
Descriptive statistics are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:
Graphical Techniques (Chapter 2), and
Numerical Techniques (Chapter 4).
The actual method used depends on what information we would like to extract. Are we interested in
measure(s) of central location? and/or
measure(s) of variability (dispersion)?
Descriptive statistics helps to answer these questions.

Statistical Inference
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
[Figure: a sample is drawn from the population; inference runs from the sample statistic back to the population parameter.]
What can we infer about a population's parameters based on a sample's statistics?
Definitions
A variable is some characteristic of a population or sample.
E.g. student grades.
Typically denoted with a capital letter: X, Y, Z
The values of the variable are the range of possible values for a variable.
E.g. student marks (0..100)
Data are the observed values of a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}

Interval Data
Interval data
Real numbers, e.g. heights, weights, prices, etc.
Also referred to as quantitative or numerical.
Arithmetic operations can be performed on interval data, thus it is meaningful to talk about 2*Height, or Price + $1, and so on.

Nominal Data
Nominal data
The values of nominal data are categories.
E.g. responses to questions about marital status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
Because the numbers are arbitrary, arithmetic operations don't make any sense (e.g. does Widowed / 2 = Married?!)
Nominal data are also called qualitative or categorical.

Ordinal Data
Ordinal data appear to be categorical in nature, but their values have an order; a ranking to them:
E.g. college course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it is still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, order is maintained no matter what numeric values are assigned to each category.
Graphical & Tabular Techniques for Nominal Data
The only allowable calculation on nominal data is to count the frequency of each value of the variable.
We can summarize the data in a table that presents the categories and their counts, called a frequency distribution.
A relative frequency distribution lists the categories and the proportion with which each occurs.
Refer to Example 2.1

Nominal Data (Tabular Summary)

Nominal Data (Frequency)
Bar charts are often used to display frequencies.
Nominal Data
It is all the same information (based on the same data).
Just a different presentation.

Graphical Techniques for Interval Data
There are several graphical methods that are used when the data are interval (i.e. numeric, non-categorical).
The most important of these graphical methods is the histogram.
The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.

Building a Histogram
1) Collect the data.
2) Create a frequency distribution for the data.
3) Draw the histogram.
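The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's procedure: the data are the student marks listed earlier in these notes, while the class boundaries (starting at 40, width 20) and the function name `frequency_distribution` are assumptions chosen only for the example.

```python
def frequency_distribution(data, lower, width, n_classes):
    """Count observations in classes [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        i = int((x - lower) // width)  # index of the class containing x
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts

# Step 1: collect the data (student marks from the Definitions slide).
marks = [67, 74, 71, 83, 93, 55, 48]

# Step 2: create the frequency distribution (assumed classes: 40-60, 60-80, 80-100).
counts = frequency_distribution(marks, lower=40, width=20, n_classes=3)

# Step 3: draw a crude text histogram, one '#' per observation.
for i, c in enumerate(counts):
    lo = 40 + 20 * i
    print(f"{lo} to under {lo + 20}: {'#' * c} ({c})")
```

Changing the class width changes the shape of the histogram, so the choice of classes is part of the analysis rather than a mechanical step.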

Histogram and Stem & Leaf

Ogive
An ogive is a graph of a cumulative frequency distribution.
We create an ogive in three steps:
1) Calculate relative frequencies.
2) Calculate cumulative relative frequencies by adding the current class' relative frequency to the previous class' cumulative relative frequency.
(For the first class, its cumulative relative frequency is just its relative frequency.)
3) Graph the cumulative relative frequencies.

Cumulative Relative Frequencies
first class: .355
next class: .355 + .185 = .540
:
:
last class: .930 + .070 = 1.00
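The running sum can be sketched as follows. The class counts below are hypothetical, chosen only so that the first two relative frequencies come out as .355 and .185 and the last as .070, matching the figures on this slide; the actual data set is not reproduced in these notes.

```python
from itertools import accumulate

# Hypothetical class counts (assumption, not the textbook data set).
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]
total = sum(frequencies)

# Step 1: relative frequency of each class.
relative = [f / total for f in frequencies]

# Step 2: each cumulative value is the previous cumulative value plus
# the current class' relative frequency.
cumulative = list(accumulate(relative))

for rel, cum in zip(relative, cumulative):
    print(f"{rel:.3f}  {cum:.3f}")
```

The last cumulative value should always be 1, which is a quick check that no class was dropped.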

Ogive
The ogive can be used to answer questions like:
What telephone bill value is at the 50th percentile?
Around $35.
(Refer also to Fig. 2.13 in your textbook)
Scatter Diagram
Example 2.9: A real estate agent wanted to know to what extent the selling price of a home is related to its size.
1) Collect the data.
2) Determine the independent variable (X = house size) and the dependent variable (Y = selling price).
3) Use Excel to create a scatter diagram.

Scatter Diagram
It appears that in fact there is a relationship, that is, the greater the house size the greater the selling price.

Patterns of Scatter Diagrams
Linearity and direction are two concepts we are interested in.
[Figure: three example scatter diagrams: Positive Linear Relationship, Negative Linear Relationship, Weak or Non-Linear Relationship.]

Time Series Data
Observations measured at the same point in time are called cross-sectional data.
Observations measured at successive points in time are called time-series data.
Time-series data are graphed on a line chart, which plots the value of the variable on the vertical axis against the time periods on the horizontal axis.

Numerical Descriptive Techniques
Measures of Central Location
Mean, Median, Mode
Measures of Variability
Range, Standard Deviation, Variance, Coefficient of Variation
Measures of Relative Standing
Percentiles, Quartiles
Measures of Linear Relationship
Covariance, Correlation, Least Squares Line

Measures of Central Location
The arithmetic mean, a.k.a. average, shortened to mean, is the most popular and useful measure of central location.
It is computed by simply adding up all the observations and dividing by the total number of observations:
$\text{Mean} = \dfrac{\text{Sum of the observations}}{\text{Number of observations}}$

Arithmetic Mean
Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$

Statistics is a pattern language
          Population   Sample
Size      $N$          $n$
Mean      $\mu$        $\bar{x}$

The Arithmetic Mean
It is appropriate for describing measurement data, e.g. heights of people, marks of student papers, etc.
It is seriously affected by extreme values called "outliers".
E.g. as soon as a billionaire moves into a neighborhood, the average household income increases beyond what it was previously!

Measures of Variability
Measures of central location fail to tell the whole story about the distribution; that is, how much are the observations spread out around the mean value?
For example, two sets of class grades are shown. The mean (= 50) is the same in each case.
But the red class has greater variability than the blue class.

Range
The range is the simplest measure of variability, calculated as:
Range = Largest observation - Smallest observation
E.g.
Data: {4, 4, 4, 4, 50}         Range = 46
Data: {4, 8, 15, 24, 39, 50}   Range = 46
The range is the same in both cases, but the data sets have very different distributions.

Statistics is a pattern language
          Population   Sample
Size      $N$          $n$
Mean      $\mu$        $\bar{x}$
Variance  $\sigma^2$   $s^2$

Variance
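The variance of a population is:
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the population mean and $N$ is the population size.
The variance of a sample is:
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean.
Note! The denominator is the sample size ($n$) minus one!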

Application
Example 4.7. The following sample consists of the number of jobs six randomly selected students applied for: 17, 15, 23, 7, 9, 13.
Find its mean and variance.
What are we looking to calculate?
Because the data are a sample, we want $\bar{x}$ and $s^2$, as opposed to $\mu$ or $\sigma^2$.
Sample Mean & Variance
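Sample mean:
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{17 + 15 + 23 + 7 + 9 + 13}{6} = \dfrac{84}{6} = 14$
Sample variance:
$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{9 + 1 + 81 + 49 + 25 + 1}{5} = \dfrac{166}{5} = 33.2$
Sample variance (shortcut method):
$s^2 = \dfrac{1}{n - 1}\left[\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}\right] = \dfrac{1}{5}\left[1342 - \dfrac{84^2}{6}\right] = \dfrac{166}{5} = 33.2$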
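As a cross-check on Example 4.7, the sketch below computes the sample mean and the sample variance both from the definition and with the shortcut method, then compares against Python's standard `statistics` module. The variable names are illustrative only.

```python
import statistics
from math import sqrt

jobs = [17, 15, 23, 7, 9, 13]  # sample from Example 4.7
n = len(jobs)

# Sample mean: sum of the observations divided by n.
mean = sum(jobs) / n

# Sample variance from the definition (note the n - 1 denominator).
var_definition = sum((x - mean) ** 2 for x in jobs) / (n - 1)

# Sample variance via the shortcut: [sum(x^2) - (sum x)^2 / n] / (n - 1).
var_shortcut = (sum(x * x for x in jobs) - sum(jobs) ** 2 / n) / (n - 1)

# Sample standard deviation is the square root of the sample variance.
std_dev = sqrt(var_definition)

print(mean, var_definition, var_shortcut, round(std_dev, 2))
print(statistics.mean(jobs), statistics.variance(jobs))
```

Both routes should agree; the shortcut only avoids computing each deviation from the mean separately.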

Standard Deviation
The standard deviation is simply the square root of the variance, thus:
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$

Standard Deviation
Consider Example 4.8, where a golf club manufacturer has designed a new club and wants to determine if it is hit more consistently (i.e. with less variability) than with an old club.
Using Tools > Data Analysis > Descriptive Statistics in Excel (Data Analysis may need to be added in first), we produce the following tables for interpretation.
You get more consistent distance with the new club.

The Empirical Rule
If the histogram is bell shaped:
Approximately 68% of all observations fall within one standard deviation of the mean.
Approximately 95% of all observations fall within two standard deviations of the mean.
Approximately 99.7% of all observations fall within three standard deviations of the mean.
Chebysheff's Theorem
Not often used because the interval is very wide.
A more general interpretation of the standard deviation is derived from Chebysheff's Theorem, which applies to all shapes of histograms (not just bell shaped).
The proportion of observations in any sample that lie within $k$ standard deviations of the mean is at least:
$1 - \dfrac{1}{k^2}$ for $k > 1$
For $k = 2$ (say), the theorem states that at least 3/4 of all observations lie within 2 standard deviations of the mean. This is a "lower bound" compared to the Empirical Rule's approximation (95%).
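A small sketch of the bound, assuming the usual statement $1 - 1/k^2$ for $k > 1$. The helper names are hypothetical, and the data set is the second one from the Range slide, reused here only as a convenient non-bell-shaped sample.

```python
import statistics

def chebysheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def proportion_within(data, k):
    """Observed proportion of data within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if abs(x - mean) <= k * s]
    return len(inside) / len(data)

data = [4, 8, 15, 24, 39, 50]
for k in (2, 3):
    print(k, chebysheff_lower_bound(k), proportion_within(data, k))
```

The observed proportion can be far above the bound, which is why the theorem is described as rarely used in practice.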

Box Plots
These box plots are based on data in Xm04-15.
Wendy's service time is shortest and least variable.
Hardee's has the greatest variability, while Jack-in-the-Box has the longest service times.

Methods of Collecting Data
There are many methods used to collect or obtain data for statistical analysis. Three of the most popular methods are:
Direct Observation,
Experiments, and
Surveys.

Sampling
Recall that statistical inference permits us to draw conclusions about a population based on a sample.
Sampling (i.e. selecting a subset of a whole population) is often done for reasons of cost (it is less expensive to sample 1,000 television viewers than 100 million TV viewers) and practicality (e.g. performing a crash test on every automobile produced is impractical).
In any case, the sampled population and the target population should be similar to one another.

Sampling Plans
A sampling plan is just a method or procedure for specifying how a sample will be taken from a population.
We will focus our attention on these three methods:
Simple Random Sampling,
Stratified Random Sampling, and
Cluster Sampling.

Simple Random Sampling
A simple random sample is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen.
Drawing three names from a hat containing all the names of the students in the class is an example of a simple random sample: any group of three names is as likely to be picked as any other group of three names.
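Drawing names from a hat can be imitated with Python's standard `random` module. The class list below is made up for illustration, and the fixed seed is only there so the sketch is repeatable; a real draw would not fix it.

```python
import random

# Hypothetical class list (assumption for illustration).
students = ["Ana", "Ben", "Chen", "Dee", "Eli", "Fay", "Gus", "Hal"]

rng = random.Random(42)  # seeded generator so the example is reproducible

# random.sample draws without replacement; every group of 3 is equally likely.
sample = rng.sample(students, k=3)
print(sample)
```

Because sampling is without replacement, no student can appear twice in the same sample.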

Stratified Random Sampling
After the population has been stratified, we can use simple random sampling to generate the complete sample:
If we only have sufficient resources to sample 400 people in total, we would draw 100 of them from the low income group.
If we are sampling 1,000 people, we'd draw 50 of them from the high income group.

Cluster Sampling
A cluster sample is a simple random sample of groups or clusters of elements (vs. a simple random sample of individual objects).
This method is useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically.
Cluster sampling may increase sampling error due to similarities among cluster members.

Sampling Error
Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.
Another way to look at this is: the differences in results for different samples (of the same size) are due to sampling error.
E.g. two samples of size 10 of 1,000 households. If we happened to get the highest income level data points in our first sample and all the lowest income levels in the second, this delta is due to sampling error.

Nonsampling Error
Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly. Three types of nonsampling errors:
Errors in data acquisition,
Nonresponse errors, and
Selection bias.
Note: increasing the sample size will not reduce this type of error.
Approaches to Assigning Probabilities
There are three ways to assign a probability, $P(O_i)$, to an outcome, $O_i$, namely:
Classical approach: make certain assumptions (such as equally likely, independence) about the situation.
Relative frequency: assigning probabilities based on experimentation or historical data.
Subjective approach: assigning probabilities based on the assignor's judgment.

Interpreting Probability
One way to interpret probability is this:
If a random experiment is repeated an infinite number of times, the relative frequency for any given outcome is the probability of this outcome.
For example, the probability of heads in the flip of a balanced coin is .5, determined using the classical approach. The probability is interpreted as being the long-term relative frequency of heads if the coin is flipped an infinite number of times.

Conditional Probability
Conditional probability is used to determine how two events are related; that is, we can determine the probability of one event given the occurrence of another related event.
Conditional probabilities are written as P(A | B), read as "the probability of A given B", and calculated as:
$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$

Independence
One of the objectives of calculating conditional probability is to determine whether two events are related.
In particular, we would like to know whether they are independent, that is, if the probability of one event is not affected by the occurrence of the other event.
Two events A and B are said to be independent if
P(A | B) = P(A)
or
P(B | A) = P(B)

Complement Rule
The complement of an event A is the event that occurs when A does not occur.
The complement rule gives us the probability of an event NOT occurring. That is:
$P(A^C) = 1 - P(A)$
For example, in the simple roll of a die, the probability of the number "1" being rolled is 1/6. The probability that some number other than "1" will be rolled is $1 - 1/6 = 5/6$.
Multiplication Rule
The multiplication rule is used to calculate the joint probability of two events. It is based on the formula for conditional probability defined earlier:
$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$
If we multiply both sides of the equation by P(B) we have:
P(A and B) = P(A | B) P(B)
Likewise, P(A and B) = P(B | A) P(A)
If A and B are independent events, then P(A and B) = P(A) P(B)

Addition Rule
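Recall: the addition rule was introduced earlier to provide a way to compute the probability of event A or B or both A and B occurring; i.e. the union of A and B.
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
Why do we subtract the joint probability P(A and B) from the sum of the probabilities of A and B?
Because outcomes that belong to both A and B are counted once in P(A) and again in P(B); subtracting P(A and B) removes the double count.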
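The conditional, multiplication, complement and addition rules can be checked by brute-force counting on a single die roll. The events A ("even") and B ("greater than 3") are illustrative choices, not taken from the text, and `prob` is a hypothetical helper that assumes equally likely outcomes.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair die

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}  # assumed event: the roll is even
B = {4, 5, 6}  # assumed event: the roll is greater than 3

p_a_and_b = prob(A & B)
p_a_given_b = p_a_and_b / prob(B)            # conditional probability
p_a_or_b = prob(A) + prob(B) - p_a_and_b     # addition rule
p_not_one = 1 - prob({1})                    # complement rule
independent = p_a_given_b == prob(A)         # independence check

print(p_a_given_b, p_a_or_b, p_not_one, independent)
```

Using exact fractions avoids rounding noise, so the addition rule can be compared directly with the probability of the union computed by counting.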

Addition Rule for Mutually Exclusive Events
If A and B are mutually exclusive, the occurrence of one event makes the other one impossible. This means that
P(A and B) = 0
The addition rule for mutually exclusive events is
P(A or B) = P(A) + P(B)
We often use this form when we add some joint probabilities calculated from a probability tree.
Two Types of Random Variables
Discrete Random Variable
One that takes on a countable number of values.
E.g. values on the roll of dice: 2, 3, 4, ..., 12
Continuous Random Variable
One whose values are not discrete, not countable.
E.g. time (30.1 minutes? 30.10000001 minutes?)
Analogy:
Integers are discrete, while real numbers are continuous.

Laws of Expected Value
1. E(c) = c
The expected value of a constant (c) is just the value of the constant.
2. E(X + c) = E(X) + c
3. E(cX) = cE(X)
We can "pull" a constant out of the expected value expression (either as part of a sum with a random variable X or as a coefficient of random variable X).

Laws of Variance
1. V(c) = 0
The variance of a constant (c) is zero.
2. V(X + c) = V(X)
The variance of a random variable plus a constant is just the variance of the random variable (per 1 above).
3. V(cX) = $c^2$V(X)
The variance of a random variable with a constant coefficient is the coefficient squared times the variance of the random variable.
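The laws of expected value and variance can be verified numerically on a small discrete distribution. The sketch below uses a fair die and the constant c = 3 purely as an illustration; `expected`, `variance` and `transform` are hypothetical helper names.

```python
from fractions import Fraction

# Probability distribution of a fair die: value -> probability.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expected(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = expected(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def transform(dist, f):
    """Distribution of f(X); probabilities of equal images are added."""
    out = {}
    for x, p in dist.items():
        out[f(x)] = out.get(f(x), 0) + p
    return out

c = 3
shifted = transform(die, lambda x: x + c)  # X + c
scaled = transform(die, lambda x: c * x)   # cX
constant = {c: Fraction(1)}                # the constant c itself

print(expected(die), variance(die))
print(expected(shifted), variance(shifted))
print(expected(scaled), variance(scaled))
```

Shifting moves the centre but not the spread; scaling stretches the spread by the square of the coefficient.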

Binomial Distribution
The binomial distribution is the probability distribution that results from doing a "binomial experiment". Binomial experiments have the following properties:
1. Fixed number of trials, represented as n.
2. Each trial has two possible outcomes, a "success" and a "failure".
3. P(success) = p (and thus: P(failure) = 1 - p), for all trials.
4. The trials are independent, which means that the outcome of one trial does not affect the outcomes of any other trials.
Binomial Random Variable
The binomial random variable counts the number of successes in n trials of the binomial experiment. It can take on values 0, 1, 2, ..., n. Thus, it is a discrete random variable.
To calculate the probability associated with each value we use combinatorics:
$P(x) = \dfrac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}$ for $x = 0, 1, 2, \ldots, n$

Binomial Table
What is the probability that Pat fails the quiz?
i.e. what is $P(X \le 4)$, given P(success) = .20 and n = 10?
$P(X \le 4) = .967$
Binomial Table
What is the probability that Pat gets two answers correct?
i.e. what is $P(X = 2)$, given P(success) = .20 and n = 10?
$P(X = 2) = P(X \le 2) - P(X \le 1) = .678 - .376 = .302$
Remember, the table shows cumulative probabilities.
=BINOMDIST() Excel Function
There is a binomial distribution function in Excel that can also be used to calculate these probabilities. For example:
What is the probability that Pat gets two answers correct?
The arguments of BINOMDIST are, in order: # successes, # trials, P(success), and cumulative (i.e. do we want $P(X \le x)$?).
$P(X = 2) = .3020$
=BINOMDIST() Excel Function
There is a binomial distribution function in Excel that can also be used to calculate these probabilities. For example:
What is the probability that Pat fails the quiz?
The arguments of BINOMDIST are, in order: # successes, # trials, P(success), and cumulative (i.e. do we want $P(X \le x)$?).
$P(X \le 4) = .9672$
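The same two probabilities can be reproduced outside Excel. This is a minimal sketch using only Python's standard library (`math.comb` needs Python 3.8 or later); the function names `binom_pmf` and `binom_cdf` are illustrative, not part of any library.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and P(success) = p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x): sum of the individual probabilities from 0 through x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.20  # Pat's quiz: 10 questions, P(success) = .20

p_two_correct = binom_pmf(2, n, p)  # non-cumulative probability
p_fails = binom_cdf(4, n, p)        # cumulative probability

print(round(p_two_correct, 4), round(p_fails, 4))
```

The cumulative/non-cumulative distinction mirrors the last argument of the spreadsheet function: the first call answers P(X = 2), the second P(X ≤ 4).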
Binomial Distribution
As you might expect, statisticians have developed general formulas for the mean, variance, and standard deviation of a binomial random variable. They are:
$\mu = np$
$\sigma^2 = np(1 - p)$
$\sigma = \sqrt{np(1 - p)}$

Poisson Distribution
Named for Simeon Poisson, the Poisson distribution is a discrete probability distribution and refers to the number of events (a.k.a. successes) within a specific time period or region of space. For example:
The number of cars arriving at a service station in 1 hour. (The interval of time is 1 hour.)
The number of flaws in a bolt of cloth. (The specific region is a bolt of cloth.)
The number of accidents in 1 day on a particular stretch of highway. (The interval is defined by both time, 1 day, and space, the particular stretch of highway.)

The Poisson Experiment
Like a binomial experiment, a Poisson experiment has four defining characteristic properties:
1. The number of successes that occur in any interval is independent of the number of successes that occur in any other interval.
2. The probability of a success in an interval is the same for all equal-size intervals.
3. The probability of a success is proportional to the size of the interval.
4. The probability of more than one success in an interval approaches 0 as the interval becomes smaller.

The Poisson random variable is the number of successes that occur in a period of time or an interval of space in a Poisson experiment.
E.g. on average, 96 trucks (successes) arrive at a border crossing every hour (time period).
E.g. the number of typographic errors (successes?!) in a new textbook edition averages 1.5 per 100 pages (interval).
Poisson Probability Distribution
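The probability that a Poisson random variable assumes a value of x is given by:
$P(x) = \dfrac{e^{-\mu} \mu^x}{x!}$ for $x = 0, 1, 2, \ldots$
where $\mu$ is the mean number of successes in the interval and $e$ is the natural logarithm base.
FYI: $e \approx 2.71828$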

Example 7.12
The number of typographical errors in new editions of textbooks varies considerably from book to book. After some analysis, an instructor concludes that the number of errors is Poisson distributed with a mean of 1.5 per 100 pages. The instructor randomly selects 100 pages of a new book. What is the probability that there are no typos?
That is, what is $P(X = 0)$ given that $\mu = 1.5$?
$P(0) = \dfrac{e^{-1.5} (1.5)^0}{0!} = .2231$
There is about a 22% chance of finding zero errors.
As mentioned on the Poisson experiment slide: the probability of a success is proportional to the size of the interval.
Thus, knowing an error rate of 1.5 typos per 100 pages, we can determine a mean value for a 400-page book as:
$\mu = 1.5(4) = 6$ typos / 400 pages.

Example 7.13
For a 400-page book, what is the probability that there are no typos?
$P(X = 0) = \dfrac{e^{-6} (6)^0}{0!} = .002479$
There is a very small chance there are no typos.

Example 7.13
Excel is an even better alternative:
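The spreadsheet output is not reproduced in these notes. As a stand-in, the two Poisson probabilities from Examples 7.12 and 7.13 can be computed with Python's standard `math` module; `poisson_pmf` is an illustrative name, not a library function.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)

mu_100_pages = 1.5               # mean typos per 100 pages (Example 7.12)
mu_400_pages = mu_100_pages * 4  # mean scales with the size of the interval

p_no_typos_100 = poisson_pmf(0, mu_100_pages)
p_no_typos_400 = poisson_pmf(0, mu_400_pages)

print(round(p_no_typos_100, 4), round(p_no_typos_400, 6))
```

Quadrupling the interval quadruples the mean, which is what makes a typo-free 400-page book so much less likely than a typo-free 100-page sample.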

Probability Density Functions
Unlike a discrete random variable, which we studied in Chapter 7, a continuous random variable is one that can assume an uncountable number of values.
We cannot list the possible values because there is an infinite number of them.
Because there is an infinite number of values, the probability of each individual value is virtually 0.

Point Probabilities are Zero
Because there is an infinite number of values, the probability of each individual value is virtually 0.
Thus, we can determine the probability of a range of values only.
E.g. with a discrete random variable like tossing a die, it is meaningful to talk about P(X = 5), say.
In a continuous setting (e.g. with time as a random variable), the probability that the random variable of interest, say task length, takes exactly 5 minutes is infinitesimally small, hence P(X = 5) = 0.
It is meaningful to talk about $P(X \le 5)$.
Probability Density Function
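A function f(x) is called a probability density function (over the range $a \le x \le b$) if it meets the following requirements:
1) $f(x) \ge 0$ for all x between a and b, and
2) The total area under the curve between a and b is 1.0
[Figure: curve f(x) plotted over the range a to b on the x axis, with the area under the curve equal to 1.]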

The Normal Distribution
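The normal distribution is the most important of all probability distributions. The probability density function of a normal random variable is given by:
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$
It looks like this:
Bell shaped,
Symmetrical around the mean $\mu$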
The Normal Distribution
Important things to note:
The normal distribution is fully defined by two parameters: its standard deviation and mean.
The normal distribution is bell shaped and symmetrical about the mean.
Unlike the range of the uniform distribution ($a \le x \le b$), normal distributions range from minus infinity to plus infinity.
Standard Normal Distribution
A normal distribution whose mean is zero and standard deviation is one is called the standard normal distribution.
[Figure: standard normal curve centred at 0 with standard deviation 1.]
As we shall see shortly, any normal distribution can be converted to a standard normal distribution with simple algebra. This makes calculations much easier.
Calculating Normal Probabilities
We can use the following function to convert any normal random variable to a standard normal random variable:
$Z = \dfrac{X - \mu}{\sigma}$
Some advice: always draw a picture!

Example: The time required to build a computer is normally distributed with a mean of 50 minutes and a standard deviation of 10 minutes.
What is the probability that a computer is assembled in a time between 45 and 60 minutes?
Algebraically speaking, what is $P(45 < X < 60)$?
With a mean of 50 minutes and a standard deviation of 10 minutes:
$P(45 < X < 60) = P\left(\dfrac{45 - 50}{10} < Z < \dfrac{60 - 50}{10}\right) = P(-.5 < Z < 1)$
We can use Table 3 in Appendix B to look up probabilities $P(0 < Z < z)$.
We can break up $P(-.5 < Z < 1)$ into:
$P(-.5 < Z < 0) + P(0 < Z < 1)$
The distribution is symmetric around zero, so we have:
$P(-.5 < Z < 0) = P(0 < Z < .5)$
Hence: $P(-.5 < Z < 1) = P(0 < Z < .5) + P(0 < Z < 1)$
How to use Table 3
This table gives probabilities $P(0 < Z < z)$
First column = integer + first decimal
Top row = second decimal place
$P(0 < Z < 0.5) = .1915$
$P(0 < Z < 1) = .3413$
$P(-.5 < Z < 1) = .1915 + .3413 = .5328$
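The table lookup can be cross-checked with `statistics.NormalDist` from Python's standard library (available from Python 3.8). This is only a sketch of the same calculation; the variable names are illustrative.

```python
from statistics import NormalDist

build_time = NormalDist(mu=50, sigma=10)  # minutes to build a computer

# Direct route: area under the N(50, 10) curve between 45 and 60.
p_direct = build_time.cdf(60) - build_time.cdf(45)

# Standardised route: convert to Z = (X - mu) / sigma first.
z_low = (45 - 50) / 10
z_high = (60 - 50) / 10
standard = NormalDist()  # mean 0, standard deviation 1
p_standard = standard.cdf(z_high) - standard.cdf(z_low)

print(z_low, z_high, round(p_direct, 4), round(p_standard, 4))
```

Both routes give the same area, which is the point of standardising: one table for Z serves every normal distribution.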

Using the Normal Table (Table 3)
What is $P(Z > 1.6)$?
$P(0 < Z < 1.6) = .4452$
$P(Z > 1.6) = .5 - P(0 < Z < 1.6)$
$= .5 - .4452$
$= .0548$
What is $P(Z < -2.23)$?
By symmetry, $P(Z < -2.23) = P(Z > 2.23)$
$= .5 - P(0 < Z < 2.23)$
$= .0129$
What is $P(Z < 1.52)$?
$P(Z < 0) = .5$
$P(Z < 1.52) = .5 + P(0 < Z < 1.52)$
$= .5 + .4357$
$= .9357$
What is $P(0.9 < Z < 1.9)$?
$P(0.9 < Z < 1.9) = P(0 < Z < 1.9) - P(0 < Z < 0.9)$
$= .4713 - .3159$
$= .1554$
Finding Values of Z
Other Z values are
$z_{.05} = 1.645$
$z_{.01} = 2.33$

Using the values of Z
Because $z_{.025} = 1.96$ and $-z_{.025} = -1.96$, it follows that we can state
$P(-1.96 < Z < 1.96) = .95$
Similarly
$P(-1.645 < Z < 1.645) = .90$
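These critical values can be recovered with the inverse CDF of the standard normal distribution. A minimal sketch using `statistics.NormalDist` (Python 3.8+); `z_critical` is a hypothetical helper name for the value with area A to its right.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_critical(area_right):
    """Value z_A such that P(Z > z_A) = area_right."""
    return standard.inv_cdf(1 - area_right)

def central_area(z):
    """P(-z < Z < z)."""
    return standard.cdf(z) - standard.cdf(-z)

for a in (0.05, 0.025, 0.01):
    print(a, round(z_critical(a), 3))

print(round(central_area(1.96), 2), round(central_area(1.645), 2))
```

Printed to three decimals the values differ slightly from the rounded table entries (for example 2.326 rather than 2.33), which is ordinary table rounding.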

Other Continuous Distributions
Three other important continuous distributions which will be used extensively in later sections are introduced here:
Student t Distribution,
Chi-Squared Distribution, and
F Distribution.

Student t Distribution
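Here the letter t is used to represent the random variable, hence the name. The density function for the Student t distribution is as follows:
$f(t) = \dfrac{\Gamma\left[(\nu + 1)/2\right]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \left[1 + \dfrac{t^2}{\nu}\right]^{-(\nu + 1)/2}$
$\nu$ (nu) is called the degrees of freedom, and
$\Gamma$ (Gamma function) is $\Gamma(k) = (k - 1)(k - 2)\cdots(2)(1)$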

Student t Distribution
In much the same way that $\mu$ and $\sigma$ define the normal distribution, $\nu$, the degrees of freedom, defines the Student t distribution:
Figure 8.24
As the number of degrees of freedom increases, the t distribution approaches the standard normal distribution.
Determining Student t Values
The Student t distribution is used extensively in statistical inference. Table 4 in Appendix B lists values of $t_{A,\nu}$.
That is, values of a Student t random variable with $\nu$ degrees of freedom such that:
$P(t > t_{A,\nu}) = A$
The values for A are predetermined "critical" values, typically in the 10%, 5%, 2.5%, 1% and 1/2% range.
Using the t table (Table 4) for values
For example, if we want the value of t with 10 degrees of freedom such that the area under the Student t curve is .05:
Area under the curve value ($t_A$): COLUMN
Degrees of freedom: ROW
$t_{.05,10} = 1.812$

F Distribution
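The F density function is given by:
$f(F) = \dfrac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\dfrac{\nu_1}{\nu_2}\right)^{\nu_1/2} \dfrac{F^{(\nu_1 - 2)/2}}{\left(1 + \frac{\nu_1 F}{\nu_2}\right)^{(\nu_1 + \nu_2)/2}}, \quad F > 0$
Two parameters define this distribution, and like we've already seen these are again degrees of freedom.
$\nu_1$ is the "numerator" degrees of freedom and
$\nu_2$ is the "denominator" degrees of freedom.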

Determining Values of F
For example, what is the value of F for 5% of the area under the right-hand "tail" of the curve, with a numerator degree of freedom of 3 and a denominator degree of freedom of 7?
Solution: use the F lookup (Table 6).
There are different tables for different values of A. Make sure you start with the correct table!
Numerator degrees of freedom: COLUMN
Denominator degrees of freedom: ROW
$F_{.05,3,7} = 4.35$
Determining Values of F
For areas under the curve on the left-hand side of the curve, we can leverage the following relationship:
$F_{1-A,\nu_1,\nu_2} = \dfrac{1}{F_{A,\nu_2,\nu_1}}$
Pay close attention to the order of the terms!

Chapter 9
Sampling Distributions

Sampling Distribution of the Mean
A fair die is thrown infinitely many times, with the random variable X = # of spots on any throw.
The probability distribution of X is:
x      1    2    3    4    5    6
P(x)   1/6  1/6  1/6  1/6  1/6  1/6
and the mean and variance are calculated as well:
$\mu = \sum x P(x) = 3.5$
$\sigma^2 = \sum (x - \mu)^2 P(x) = 2.92$

Sampling Distribution of Two Dice
A sampling distribution is created by looking at all samples of size n = 2 (i.e. two dice) and their means.
While there are 36 possible samples of size 2, there are only 11 values for $\bar{x}$, and some (e.g. $\bar{x} = 3.5$) occur more frequently than others (e.g. $\bar{x} = 1$).
Sampling Distribution of Two Dice
The sampling distribution of $\bar{x}$ is shown below:
$\bar{x}$   $P(\bar{x})$
1.0    1/36
1.5    2/36
2.0    3/36
2.5    4/36
3.0    5/36
3.5    6/36
4.0    5/36
4.5    4/36
5.0    3/36
5.5    2/36
6.0    1/36
[Figure: bar chart of $P(\bar{x})$ against $\bar{x}$ from 1.0 to 6.0, rising from 1/36 to a peak of 6/36 at 3.5 and falling back to 1/36.]

Compare
Compare the distribution of X (values 1 to 6, all equally likely) with the sampling distribution of $\bar{X}$ (values 1.0 to 6.0, peaked at 3.5).
As well, note that:
$\mu_{\bar{x}} = \mu$ and $\sigma^2_{\bar{x}} = \sigma^2 / 2$
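The two-dice sampling distribution can be built by enumerating all 36 samples. The sketch below uses exact fractions so the mean and variance of the sample mean can be compared with those of a single die without rounding; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Every ordered pair of throws is one equally likely sample of size 2.
counts = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))
sampling_dist = {xbar: Fraction(c, 36) for xbar, c in sorted(counts.items())}

# Mean and variance of a single die, X.
mu = sum(Fraction(x, 6) for x in faces)
var = sum((x - mu) ** 2 * Fraction(1, 6) for x in faces)

# Mean and variance of the sample mean of two dice.
mu_xbar = sum(x * p for x, p in sampling_dist.items())
var_xbar = sum((x - mu_xbar) ** 2 * p for x, p in sampling_dist.items())

for xbar, p in sampling_dist.items():
    print(float(xbar), p)
print(mu, var, mu_xbar, var_xbar)
```

The sample mean keeps the same centre as a single throw but has half the variance, because n = 2 here.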

Central Limit Theorem
The sampling distribution of the mean of a random sample drawn from any population is approximately normal for a sufficiently large sample size.
The larger the sample size, the more closely the sampling distribution of $\bar{X}$ will resemble a normal distribution.

Central Limit Theorem
If the population is normal, then $\bar{X}$ is normally distributed for all values of n.
If the population is non-normal, then $\bar{X}$ is approximately normal only for larger values of n.
In many practical situations, a sample size of 30 may be sufficiently large to allow us to use the normal distribution as an approximation for the sampling distribution of $\bar{X}$.

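Sampling Distribution of the Sample Mean
1. $\mu_{\bar{x}} = \mu$
2. $\sigma^2_{\bar{x}} = \sigma^2 / n$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
3. If X is normal, $\bar{X}$ is normal. If X is non-normal, $\bar{X}$ is approximately normal for sufficiently large sample sizes.
Note: the definition of "sufficiently large" depends on the extent of non-normality of X (e.g. heavily skewed; multimodal).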

Example 9.1(a)
The foreman of a bottling plant has observed that the amount of soda in each "32-ounce" bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce.
If a customer buys one bottle, what is the probability that the bottle will contain more than 32 ounces?

Example 9.1(a)
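We want to find $P(X > 32)$, where X is normally distributed with $\mu = 32.2$ and $\sigma = .3$:
$P(X > 32) = P\left(Z > \dfrac{32 - 32.2}{.3}\right) = P(Z > -.67) = .7486$
There is about a 75% chance that a single bottle of soda contains more than 32 oz.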

Example 9.1(b)
The foreman of a bottling plant has observed that the amount of soda in each "32-ounce" bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce.
If a customer buys a carton of four bottles, what is the probability that the mean amount of the four bottles will be greater than 32 ounces?

Example 9.1(b)
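We want to find $P(\bar{X} > 32)$, where X is normally distributed with $\mu = 32.2$ and $\sigma = .3$
Things we know:
1) X is normally distributed, therefore so will $\bar{X}$ be.
2) $\mu_{\bar{x}} = \mu = 32.2$ oz.
3) $\sigma_{\bar{x}} = \sigma / \sqrt{n} = .3 / \sqrt{4} = .15$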

Example 9.1(b)
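If a customer buys a carton of four bottles, what is the probability that the mean amount of the four bottles will be greater than 32 ounces?
$P(\bar{X} > 32) = P\left(Z > \dfrac{32 - 32.2}{.15}\right) = P(Z > -1.33) = .9082$
There is about a 91% chance the mean of the four bottles will exceed 32 oz.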
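Both parts of Example 9.1 can be checked with `statistics.NormalDist` (Python 3.8+). This sketch works with unrounded z-scores, so the third decimal may differ slightly from a table lookup, but the results should round to the 75% and 91% figures quoted above.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3  # ounces per bottle
n = 4                  # bottles in a carton

# (a) One bottle: X ~ Normal(mu, sigma).
single_bottle = NormalDist(mu, sigma)
p_single = 1 - single_bottle.cdf(32)

# (b) Mean of four bottles: X-bar ~ Normal(mu, sigma / sqrt(n)).
standard_error = sigma / sqrt(n)
carton_mean = NormalDist(mu, standard_error)
p_mean = 1 - carton_mean.cdf(32)

print(round(standard_error, 2), round(p_single, 2), round(p_mean, 2))
```

The mean of four bottles is less variable than a single bottle, so it is more likely to stay above 32 oz.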

Graphically Speaking
[Figure: two normal curves, both with mean = 32.2. Left panel: "What is the probability that one bottle will contain more than 32 ounces?" Right panel: "What is the probability that the mean of four bottles will exceed 32 oz?"]

Sampling Distribution: Difference of two means
The final sampling distribution introduced is that of the difference between two sample means. This requires:
independent random samples be drawn from each of two normal populations.
If this condition is met, then the sampling distribution of the difference between the two sample means, i.e. $\bar{x}_1 - \bar{x}_2$, will be normally distributed.
(Note: if the two populations are not both normally distributed, but the sample sizes are "large" (> 30), the distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal.)
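Sampling Distribution: Difference of two means
The expected value and standard deviation of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ are given by:
mean: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
standard deviation: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
(also called the standard error of the difference between two means)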
Estimation
There are two types of inference: estimation and hypothesis testing; estimation is introduced first.
The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.
E.g., the sample mean ($\bar{x}$) is employed to estimate the population mean ($\mu$).

Estimation
The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.
There are two types of estimators:
Point Estimator
Interval Estimator

Point & Interval Estimation
For example, suppose we want to estimate the mean summer income of a class of business students. For n = 25 students, $\bar{x}$ is calculated to be 400 $/week (a point estimate).
An alternative statement is:
The mean income is between 380 and 420 $/week (an interval estimate).

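Estimating $\mu$ when $\sigma$ is known
We established in Chapter 9:
$P\left(\mu - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \bar{x} < \mu + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
Thus, the probability that the interval:
$\left(\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$ (the confidence interval; the sample mean is in the center of the interval)
contains the population mean $\mu$ is $1 - \alpha$. This is a confidence interval estimator for $\mu$.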

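Four commonly used confidence levels
(cut & keep handy!)
Table 10.1
Confidence level $1 - \alpha$   $\alpha$   $\alpha/2$   $z_{\alpha/2}$
.90                             .10        .05          1.645
.95                             .05        .025         1.96
.98                             .02        .01          2.33
.99                             .01        .005         2.575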
Example 10.1
A computer company samples demand during lead time over 25 time periods:
235 374 309 499 253
421 361 514 462 369
394 439 348 344 330
261 374 302 466 535
386 316 296 332 334
It is known that the standard deviation of demand over lead time is 75 computers. We want to estimate the mean demand over lead time with 95% confidence in order to set inventory levels.

Example 10.1 CALCULATE
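In order to use our confidence interval estimator, we need the following pieces of data:
$\bar{x} = 370.16$        Calculated from the data
$z_{\alpha/2} = 1.96$     Given (95% confidence)
$\sigma = 75$             Given
$n = 25$                  Given
therefore:
$\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96\,\dfrac{75}{\sqrt{25}} = 370.16 \pm 29.40$
The lower and upper confidence limits are 340.76 and 399.56.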
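The calculation for Example 10.1 can be reproduced from the raw data. This is a minimal sketch using the standard library; it obtains the z value from `statistics.NormalDist` rather than a table, which changes the limits only beyond the second decimal.

```python
from math import sqrt
from statistics import NormalDist, mean

# Demand during lead time over 25 periods (Example 10.1).
demand = [235, 374, 309, 499, 253,
          421, 361, 514, 462, 369,
          394, 439, 348, 344, 330,
          261, 374, 302, 466, 535,
          386, 316, 296, 332, 334]

sigma = 75         # known population standard deviation
confidence = 0.95  # desired confidence level

n = len(demand)
x_bar = mean(demand)
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96

half_width = z * sigma / sqrt(n)
lcl, ucl = x_bar - half_width, x_bar + half_width

print(round(x_bar, 2), round(lcl, 2), round(ucl, 2))
```

Changing `confidence` or `sigma` shows directly how the interval widens or narrows around the sample mean.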

Example 10.1 INTERPRET
The estimate for the mean demand during lead time lies between 340.76 and 399.56; we can use this as input in developing an inventory policy.
That is, we estimated that the mean demand during lead time falls between 340.76 and 399.56, and this type of estimator is correct 95% of the time. That also means that 5% of the time the estimator will be incorrect.
Incidentally, the media often refer to the 95% figure as "19 times out of 20," which emphasizes the long-run aspect of the confidence level.

Interval Width
A wide interval provides little information.
For example, suppose we estimate with 95% confidence that an accountant's average starting salary is between $15,000 and $100,000.
Contrast this with: a 95% confidence interval estimate of starting salaries between $42,000 and $45,000.
The second estimate is much narrower, providing accounting students more precise information about starting salaries.

Interval Width
The width of the confidence interval estimate is a function of the confidence level, the population standard deviation, and the sample size.

Selecting the Sample Size
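We can control the width of the interval by determining the sample size necessary to produce narrow intervals.
Suppose we want to estimate the mean demand "to within 5 units"; i.e. we want the interval estimate to be: $\bar{x} \pm 5$
Since the interval estimator is: $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
It follows that $z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 5$
Solve for n to get the requisite sample size!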
Selecting the Sample Size
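Solving the equation:
$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{5}\right)^2 = \left(\dfrac{(1.96)(75)}{5}\right)^2 = 864.36$
That is, to produce a 95% confidence interval estimate of the mean ($\pm 5$ units), we need to sample 865 lead time periods (vs. the 25 data points we have currently).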

Sample Size to Estimate a Mean
The general formula for the sample size needed to estimate a population mean with an interval estimate of:
$\bar{x} \pm W$
requires a sample size at least this large:
$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{W}\right)^2$

Example 10.2
A lumber company must estimate the mean diameter of trees to determine whether or not there is sufficient lumber to harvest an area of forest. They need to estimate this to within 1 inch at a confidence level of 99%. The tree diameters are normally distributed with a standard deviation of 6 inches.
How many trees need to be sampled?

Example 10.2
Things we know:
Confidence level = 99%, therefore $\alpha = .01$ and $z_{\alpha/2} = z_{.005} = 2.575$
We want $\bar{x} \pm 1$, hence W = 1.
We are given that $\sigma = 6$.
Example 10.2
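We compute
$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{W}\right)^2 = \left(\dfrac{(2.575)(6)}{1}\right)^2 = 238.7$
That is, we will need to sample at least 239 trees to have a 99% confidence interval of $\bar{x} \pm 1$.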
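The sample-size rule amounts to squaring z·σ/W and rounding up to the next whole observation. A minimal sketch, with `sample_size` as a hypothetical helper name and the z values taken as given inputs:

```python
from math import ceil

def sample_size(z, sigma, w):
    """Smallest n for which the interval x-bar +/- w holds at the confidence level implied by z."""
    return ceil((z * sigma / w) ** 2)

# Mean demand to within 5 units at 95% confidence (sigma = 75).
n_demand = sample_size(z=1.96, sigma=75, w=5)

# Example 10.2: tree diameter to within 1 inch at 99% confidence (sigma = 6).
n_trees = sample_size(z=2.575, sigma=6, w=1)

print(n_demand, n_trees)
```

Rounding up rather than to the nearest integer matters: rounding 864.36 down to 864 would give an interval slightly wider than requested.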

Nonstatistical Hypothesis Testing
A criminal trial is an example of hypothesis testing without the statistics.
In a trial a jury must decide between two hypotheses. The null hypothesis is
H0: The defendant is innocent
The alternative hypothesis or research hypothesis is
H1: The defendant is guilty
The jury does not know which hypothesis is true. They must make a decision on the basis of evidence presented.
There are two possible errors.
A Type I error occurs when we reject a true null hypothesis. That is, a Type I error occurs when the jury convicts an innocent person.
A Type II error occurs when we don't reject a false null hypothesis. That occurs when a guilty defendant is acquitted.

The probability of a Type I error is denoted as $\alpha$ (Greek letter alpha). The probability of a Type II error is $\beta$ (Greek letter beta).
The two probabilities are inversely related. Decreasing one increases the other.

The critical concepts are these:
1. There are two hypotheses, the null and the alternative hypotheses.
2. The procedure begins with the assumption that the null hypothesis is true.
3. The goal is to determine whether there is enough evidence to infer that the alternative hypothesis is true.
4. There are two possible decisions:
Conclude that there is enough evidence to support the alternative hypothesis.
Conclude that there is not enough evidence to support the alternative hypothesis.
5. Two possible errors can be made.
Type I error: Reject a true null hypothesis.
Type II error: Do not reject a false null hypothesis.

P(Type I error) = $\alpha$
P(Type II error) = $\beta$

Concepts of Hypothesis Testing (1)
There are two hypotheses. One is called the null hypothesis
and the other the alternative or research hypothesis. The
usual notation is:
H0: the null hypothesis (pronounced "H nought")
H1: the alternative or research hypothesis
The null hypothesis (H0) will always state that the parameter
equals the value specified in the alternative hypothesis (H1).
Concepts of Hypothesis Testing
Consider Example 10.1 (mean demand for computers during
assembly lead time) again. Rather than estimate the mean
demand, our operations manager wants to know whether the
mean is different from 350 units. We can rephrase this
request into a test of the hypothesis:
H0: $\mu = 350$
Thus, our research hypothesis becomes:
H1: $\mu \neq 350$ (this is what we are interested in determining)

Concepts of Hypothesis Testing (4)
There are two possible decisions that can be made:
Conclude that there is enough evidence to support the
alternative hypothesis
(also stated as: rejecting the null hypothesis in favor of the
alternative)
Conclude that there is not enough evidence to support the
alternative hypothesis
(also stated as: not rejecting the null hypothesis in favor of
the alternative)
NOTE: we do not say that we accept the null hypothesis.
Concepts of Hypothesis Testing
Once the null and alternative hypotheses are stated, the next
step is to randomly sample the population and calculate a test
statistic (in this example, the sample mean).
If the test statistic's value is inconsistent with the null
hypothesis we reject the null hypothesis and infer that the
alternative hypothesis is true.
For example, if we're trying to decide whether the mean is
not equal to 350, a large value of $\bar{x}$ (say, 600) would provide
enough evidence. If $\bar{x}$ is close to 350 (say, 355) we could not
say that this provides a great deal of evidence to infer that the
population mean is different from 350.
Types of Errors
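A Type I error occurs when we reject a true null hypothesis
(i.e. reject H0 when it is TRUE).

                    H0 is TRUE       H0 is FALSE
Reject H0           Type I error     correct decision
Do not reject H0    correct decision Type II error

A Type II error occurs when we don't reject a false null
hypothesis (i.e. do NOT reject H0 when it is FALSE).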
Recap I
1) Two hypotheses: H0 & H1
2) ASSUME H0 is TRUE
3) GOAL: determine if there is enough evidence to infer that
H1 is TRUE
4) Two possible decisions:
Reject H0 in favor of H1
Do NOT reject H0 in favor of H1
5) Two possible types of errors:
Type I: reject a true H0 [P(Type I) = $\alpha$]
Type II: not reject a false H0 [P(Type II) = $\beta$]

Example 11.1
A department store manager determines that a new billing
system will be cost effective only if the mean monthly
account is more than $170.
A random sample of 400 monthly accounts is drawn, for
which the sample mean is $178. The accounts are
approximately normally distributed with a standard deviation
of $65.
Can we conclude that the new system will be cost effective?

Example 11.1
The system will be cost effective if the mean account balance
for all customers is greater than $170.
We express this belief as our research hypothesis, that is:
H1: $\mu > 170$ (this is what we want to determine)
Thus, our null hypothesis becomes:
H0: $\mu = 170$ (this specifies a single value for the
parameter of interest)
Example 11.1
What we want to show:
H1: $\mu > 170$
H0: $\mu = 170$ (we'll assume this is true)
We know:
n = 400,
$\bar{x} = 178$, and
$\sigma = 65$
Hmm. What to do next?!

Example 11.1
To test our hypotheses, we can use two different approaches:
The rejection region approach (typically used when
computing statistics manually), and
The p-value approach (which is generally used with a
computer and statistical software).
We will explore both in turn.

Example 11.1 Rejection Region
The rejection region is a range of values such that if the test
statistic falls into that range, we decide to reject the null
hypothesis in favor of the alternative hypothesis.
$\bar{x}_L$ is the critical value of $\bar{x}$ to reject H0.
Example 11.1
All that's left to do is calculate $\bar{x}_L$ and compare it to the
observed sample mean of 178.
We can calculate this based on any level of
significance ($\alpha$) we want.

Example 11.1
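At a 5% significance level (i.e. $\alpha = 0.05$), we get
$\frac{\bar{x}_L - 170}{65/\sqrt{400}} = z_{.05} = 1.645$
Solving, we compute $\bar{x}_L = 175.34$
Since our sample mean (178) is greater than the critical value we
calculated (175.34), we reject the null hypothesis in favor of H1, i.e.
that $\mu > 170$ and that it is cost effective to install the new billing
system.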

Example 11.1 The Big Picture
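[Figure: sampling distribution of $\bar{x}$ under H0: $\mu = 170$, with the
critical value $\bar{x}_L = 175.34$ and the observed $\bar{x} = 178$ marked.
The observed mean lies in the rejection region: reject H0 in favor of
H1: $\mu > 170$.]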
Standardized Test Statistic
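An easier method is to use the standardized test statistic:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
and compare its result to $z_\alpha$ (rejection region: $z > z_\alpha$):
$z = \frac{178 - 170}{65/\sqrt{400}} = 2.46$
Since z = 2.46 > 1.645 ($z_{.05}$), we reject H0 in favor of H1.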
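
As a rough check of Example 11.1, the sketch below computes both the critical sample mean and the standardized statistic for a right-tail z test. The function and variable names are illustrative assumptions; the critical value is taken from the standard library's normal distribution, so it can differ from the table-based 175.34 in the second decimal place.

```python
import math
from statistics import NormalDist


def right_tail_z_test(x_bar, mu0, sigma, n, alpha):
    """Return (critical sample mean, z statistic, reject H0?) for H1: mu > mu0."""
    standard_error = sigma / math.sqrt(n)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)   # about 1.645 for alpha = .05
    x_bar_critical = mu0 + z_alpha * standard_error
    z = (x_bar - mu0) / standard_error
    return x_bar_critical, z, z > z_alpha


# Department store example: n = 400, x-bar = 178, sigma = 65, H0: mu = 170.
x_bar_critical, z, reject_h0 = right_tail_z_test(178, 170, 65, 400, alpha=0.05)
print(f"critical mean = {x_bar_critical:.2f}, z = {z:.2f}, reject H0: {reject_h0}")
```

Comparing the sample mean with the critical mean and comparing z with the critical z are the same decision expressed on two scales.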

PLOT POWER CURVE

p-Value
The p-value of a test is the probability of observing a test
statistic at least as extreme as the one computed given that
the null hypothesis is true.
In the case of our department store example, what is the
probability of observing a sample mean at least as extreme
as the one already observed (i.e. $\bar{x} = 178$), given that the null
hypothesis (H0: $\mu = 170$) is true?
p-value = $P(\bar{X} > 178) = P(Z > 2.46) = .0069$

Interpreting the p-value
The smaller the p-value, the more statistical evidence exists
to support the alternative hypothesis.
If the p-value is less than 1%, there is overwhelming
evidence that supports the alternative hypothesis.
If the p-value is between 1% and 5%, there is strong
evidence that supports the alternative hypothesis.
If the p-value is between 5% and 10%, there is weak
evidence that supports the alternative hypothesis.
If the p-value exceeds 10%, there is no evidence that
supports the alternative hypothesis.
We observe a p-value of .0069, hence there is
overwhelming evidence to support H1: $\mu > 170$.

Interpreting the p-value
Compare the p-value with the selected value of the
significance level:
If the p-value is less than $\alpha$, we judge the p-value to be
small enough to reject the null hypothesis.
If the p-value is greater than $\alpha$, we do not reject the null
hypothesis.
Since p-value = .0069 < $\alpha$ = .05, we reject H0 in favor of H1.
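
The p-value approach for the same department store example can be sketched as follows. The helper name is an assumption made for illustration; the calculation is simply the upper-tail area of the standard normal distribution beyond the observed z.

```python
import math
from statistics import NormalDist


def right_tail_p_value(x_bar, mu0, sigma, n):
    """P(Z > z) for the observed sample mean, assuming H0: mu = mu0 is true."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    return 1.0 - NormalDist().cdf(z)


alpha = 0.05
p_value = right_tail_p_value(x_bar=178, mu0=170, sigma=65, n=400)
reject_h0 = p_value < alpha   # decision rule: reject when p-value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Because z is not rounded to 2.46 before the lookup, the result can differ from the table value in the fourth or fifth decimal place.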

Chapter-Opening Example
The objective of the study is to draw a conclusion about the
mean payment period. Thus, the parameter to be tested is the
population mean. We want to know whether there is enough
statistical evidence to show that the population mean is less
than 22 days. Thus, the alternative hypothesis is
H1: $\mu < 22$
The null hypothesis is
H0: $\mu = 22$

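The test statistic is
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
We wish to reject the null hypothesis in favor of the
alternative only if the sample mean and hence the value of
the test statistic is small enough. As a result we locate the
rejection region in the left tail of the sampling distribution.
We set the significance level at 10%.

Rejection region: $z < -z_{\alpha} = -z_{.10} = -1.28$
From the data in SSA we compute
$\bar{x} = \frac{\sum x_i}{220} = \frac{4,759}{220} = 21.63$
and
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{21.63 - 22}{6/\sqrt{220}} = -.91$
p-value = $P(Z < -.91) = .5 - .3186 = .1814$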

Conclusion: There is not enough evidence to infer that the
mean is less than 22.
There is not enough evidence to infer that the plan will be
profitable.
Since $z = -.91 > -z_{.10} = -1.28$,
we fail to reject H0: $\mu = 22$
at a 10% level of significance.
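
The left-tail test in the chapter-opening example can be reproduced with a minimal sketch. The function name is an assumption; the inputs (sum of observations 4,759, n = 220, $\sigma = 6$, $\alpha = .10$) are taken from the worked example above.

```python
import math
from statistics import NormalDist


def left_tail_z_test(x_bar, mu0, sigma, n, alpha):
    """Return (z statistic, p-value, reject H0?) for H1: mu < mu0."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p_value = NormalDist().cdf(z)                 # area to the left of z
    critical = NormalDist().inv_cdf(alpha)        # -z_alpha, about -1.28 for .10
    return z, p_value, z < critical


sample_mean = 4759 / 220
z, p_value, reject_h0 = left_tail_z_test(sample_mean, mu0=22, sigma=6, n=220, alpha=0.10)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```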

PLOT POWER CURVE


Right-Tail Testing
Calculate the critical value of the mean ($\bar{x}_L$) and compare
against the observed value of the sample mean ($\bar{x}$).

Left-Tail Testing
Calculate the critical value of the mean ($\bar{x}_L$) and compare
against the observed value of the sample mean ($\bar{x}$).

Two-Tail Testing
Two-tail testing is used when we want to test a research
hypothesis that a parameter is not equal ($\neq$) to some value.

Example 11.2
AT&T argues that its rates are such that customers won't
see a difference in their phone bills between them and their
competitors. They calculate the mean and standard deviation
for all their customers at $17.09 and $3.87 (respectively).
They then sample 100 customers at random and recalculate a
monthly phone bill based on competitors' rates.
What we want to show is whether or not:
H1: $\mu \neq 17.09$. We do this by assuming that:
H0: $\mu = 17.09$
Example 11.2
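The rejection region is set up so we can reject the null
hypothesis when the test statistic is large or when it is small.
That is, we set up a two-tail rejection region. The total area
in the rejection region must sum to $\alpha$, so we divide this
probability by 2.
Example 11.2
At a 5% significance level (i.e. $\alpha = .05$), we have
$\alpha/2 = .025$. Thus, $z_{.025} = 1.96$ and our rejection region is:
$z < -1.96$ or $z > 1.96$
[Figure: standard normal curve centered at 0, with rejection regions
below $-z_{.025}$ (statistic is small) and above $+z_{.025}$ (statistic is large).]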

Example 11.2
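From the data, we calculate $\bar{x} = 17.55$
Using our standardized test statistic:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
We find that:
$z = \frac{17.55 - 17.09}{3.87/\sqrt{100}} = 1.19$
Since z = 1.19 is not greater than 1.96, nor less than −1.96,
we cannot reject the null hypothesis in favor of H1. That is,
there is insufficient evidence to infer that there is a
difference between the bills of AT&T and the competitor.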
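
A two-tail version of the z test, applied to the AT&T figures, might look like the sketch below. The function name is an assumption, and the two-tail p-value (twice the one-tail area) is included for illustration even though the slides use the rejection region here.

```python
import math
from statistics import NormalDist


def two_tail_z_test(x_bar, mu0, sigma, n, alpha):
    """Return (z statistic, two-tail p-value, reject H0?) for H1: mu != mu0."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    z_half_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for .05
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value, abs(z) > z_half_alpha


# AT&T example: x-bar = 17.55, mu0 = 17.09, sigma = 3.87, n = 100.
z, p_value, reject_h0 = two_tail_z_test(17.55, 17.09, 3.87, 100, alpha=0.05)
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```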
PLOT POWER CURVE

Summary of One- and Two-Tail Tests
[Figure: three panels: One-Tail Test (left tail), Two-Tail Test,
One-Tail Test (right tail).]

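Inference About A Population [SIGMA UNKNOWN]
[Figure: a sample is drawn from the population; the sample statistic
is used to make an inference about the population parameter.]
We will develop techniques to estimate and test three
population parameters:
Population Mean $\mu$
Population Variance $\sigma^2$
Population Proportion p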

Inference With Variance Unknown
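Previously, we looked at estimating and testing the
population mean when the population standard deviation ($\sigma$)
was known or given:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
But how often do we know the actual population variance?
Instead, we use the Student t statistic, given by:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$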

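Testing $\mu$ when $\sigma$ is unknown
When the population standard deviation is unknown and the
population is normal, the test statistic for testing hypotheses
about $\mu$ is:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
which is Student t distributed with $\nu = n - 1$ degrees of
freedom. The confidence interval estimator of $\mu$ is given
by:
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$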

Example 12.1
Will new workers achieve 90% of the level of experienced
workers within one week of being hired and trained?
Experienced workers can process 500 packages/hour, thus if
our conjecture is correct, we expect new workers to be able
to process .90(500) = 450 packages per hour.
Given the data, is this the case?

Example 12.1 IDENTIFY
Our objective is to describe the population of the numbers of
packages processed in 1 hour by new workers, that is we
want to know whether the new workers' productivity is more
than 90% of that of experienced workers. Thus we have:
H1: $\mu > 450$
Therefore we set our usual null hypothesis to:
H0: $\mu = 450$

Example 12.1 COMPUTE
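Our test statistic is:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
With n = 50 data points, we have n − 1 = 49 degrees of freedom.
Our hypothesis under question is:
H1: $\mu > 450$
Our rejection region becomes:
$t > t_{\alpha,\nu} = t_{\alpha,49}$
Thus we will reject the null hypothesis in favor of the
alternative if our calculated test statistic falls in this region.
From the data, we calculate $\bar{x} = 460.38$, s = 38.83 and thus:
$t = \frac{460.38 - 450}{38.83/\sqrt{50}} = 1.89$
Since t = 1.89 falls in the rejection region,
we reject H0 in favor of H1, that is, there is sufficient
evidence to conclude that the new workers are producing at
more than 90% of the average of experienced workers.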
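
The t statistic for Example 12.1 can be recomputed from the summary figures. This is a sketch under stated assumptions: the Python standard library has no Student t distribution, so the critical value is passed in by hand, and the 1.676 used below is an assumed table value for a 5% right-tail test with about 50 degrees of freedom (the significance level is not stated in the surrounding text).

```python
import math


def one_sample_t_statistic(x_bar, mu0, s, n):
    """t = (x-bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1


# Example 12.1 summary statistics: x-bar = 460.38, s = 38.83, n = 50, H0: mu = 450.
t_stat, degrees_of_freedom = one_sample_t_statistic(460.38, 450, 38.83, 50)

# ASSUMPTION: table value t_{.05,50} = 1.676, used as a stand-in for t_{.05,49}.
t_critical = 1.676
reject_h0 = t_stat > t_critical
print(f"t = {t_stat:.2f}, df = {degrees_of_freedom}, reject H0: {reject_h0}")
```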

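Can we estimate the return on investment for companies that
won quality awards?
We are given a random sample of n = 83 such companies.
We want to construct a 95% confidence interval for the mean
return, i.e. what is: $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$ ?

From the data, we calculate:
For the $t_{\alpha/2}$ term, $\alpha/2 = .025$ with $\nu = n - 1 = 82$ degrees of freedom,
and so: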

Check Requisite Conditions
The Student t distribution is robust, which means that if the
population is nonnormal, the results of the t-test and
confidence interval estimate are still valid provided that the
population is not extremely nonnormal.
To check this requirement, draw a histogram of the data and
see how bell shaped the resulting figure is. If a histogram
is extremely skewed (say in the case of an exponential
distribution), that could be considered extremely
nonnormal and hence t-statistics would not be valid in
this case.

Inference About Population Variance
If we are interested in drawing inferences about a
population's variability, the parameter we need to
investigate is the population variance: $\sigma^2$
The sample variance ($s^2$) is an unbiased, consistent and
efficient point estimator for $\sigma^2$. Moreover,
the statistic $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ has a chi-squared distribution,
with n − 1 degrees of freedom.
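Testing & Estimating Population Variance
Combining this statistic:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
With the probability statement:
$P\left( \chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2} \right) = 1 - \alpha$
Yields the confidence interval estimator for $\sigma^2$:
lower confidence limit: $LCL = \frac{(n-1)s^2}{\chi^2_{\alpha/2}}$
upper confidence limit: $UCL = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$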

Consider a container filling machine. Management wants a
machine to fill 1 liter (1,000 cc's) so that the variance of the
fills is less than 1 cc². A random sample of n = 25 1-liter fills
was taken. Does the machine perform as it should at the 5%
significance level?

We want to show that:
H1: $\sigma^2 < 1$
(so our null hypothesis becomes: H0: $\sigma^2 = 1$). We will use
this test statistic:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$

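Since our alternative hypothesis is phrased as:
H1: $\sigma^2 < 1$
We will reject H0 in favor of H1 if our test statistic falls into
this rejection region:
$\chi^2 < \chi^2_{1-\alpha,\,n-1} = \chi^2_{.95,24} = 13.85$
We compute the sample variance to be: $s^2 = .8088$
And thus our test statistic takes on this value:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(25-1)(.8088)}{1} = 19.41$
Compare: 19.41 is not less than 13.85, so the test statistic does not
fall into the rejection region.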
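
The variance test for the filling machine can be sketched from the summary numbers. The standard library has no chi-squared distribution, so the left-tail critical value is supplied by hand; 13.8484 is the usual table value of $\chi^2_{.95,24}$ and is treated here as an assumed input rather than something the code derives.

```python
def chi_squared_statistic(sample_variance, n, hypothesized_variance):
    """chi^2 = (n - 1) s^2 / sigma_0^2, with n - 1 degrees of freedom."""
    return (n - 1) * sample_variance / hypothesized_variance


# Filling machine example: n = 25 fills, s^2 = .8088, H0: sigma^2 = 1.
chi2_stat = chi_squared_statistic(sample_variance=0.8088, n=25, hypothesized_variance=1.0)

# ASSUMPTION: table value chi^2_{.95,24} = 13.8484 for a 5% left-tail test.
chi2_critical = 13.8484
reject_h0 = chi2_stat < chi2_critical   # left-tail test: reject for small values
print(f"chi-squared = {chi2_stat:.2f}, reject H0: {reject_h0}")
```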
Example 12.4
As we saw, we cannot reject the null hypothesis in favor of
the alternative. That is, there is not enough evidence to infer
that the claim is true.
Note: the result does not say that the variance is greater than
1, rather it merely states that we are unable to show that the
variance is less than 1.
We could estimate (at 99% confidence say) the variance of
the fills.

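In order to create a confidence interval estimate of the
variance, we need these formulae:
lower confidence limit: $LCL = \frac{(n-1)s^2}{\chi^2_{\alpha/2}}$
upper confidence limit: $UCL = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
We know $(n-1)s^2 = 19.41$ from our previous calculation, and
we have from Table 5 in Appendix B:
$\chi^2_{\alpha/2,\,n-1} = \chi^2_{.005,24} = 45.5585$ and $\chi^2_{1-\alpha/2,\,n-1} = \chi^2_{.995,24} = 9.88623$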
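
The 99% interval for the fill variance follows directly from the two formulae. In this sketch the chi-squared table values for 24 degrees of freedom (45.5585 and 9.88623) are assumed look-ups entered by hand, since the standard library cannot compute them; the function name is likewise illustrative.

```python
def variance_confidence_interval(sum_of_squares, chi2_upper_tail, chi2_lower_tail):
    """Return (LCL, UCL) for sigma^2 given (n - 1) s^2 and two chi-squared values.

    chi2_upper_tail is chi^2_{alpha/2} (the larger value) and
    chi2_lower_tail is chi^2_{1 - alpha/2} (the smaller value).
    """
    return sum_of_squares / chi2_upper_tail, sum_of_squares / chi2_lower_tail


# (n - 1) s^2 = 24 * .8088, from the filling machine example.
# ASSUMPTION: table values chi^2_{.005,24} = 45.5585 and chi^2_{.995,24} = 9.88623.
lcl, ucl = variance_confidence_interval(24 * 0.8088, 45.5585, 9.88623)
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```

Note that the larger chi-squared value produces the lower limit, because it appears in the denominator.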

Comparing Two Populations
Previously we looked at techniques to estimate and test
parameters for one population:
Population Mean $\mu$, Population Variance $\sigma^2$
We will still consider these parameters when we are looking
at two populations, however our interest will now be:
The difference between two means.
The ratio of two variances.

Difference of Two Means
In order to test and estimate the difference between two
population means, we draw random samples from each of
two populations. Initially, we will consider independent
samples, that is, samples that are completely unrelated to one
another.
Because we are comparing two population means, we use the
statistic: $\bar{x}_1 - \bar{x}_2$

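Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
1. $\bar{x}_1 - \bar{x}_2$ is normally distributed if the original populations
are normal, or approximately normal if the populations are
nonnormal and the sample sizes are large ($n_1, n_2 > 30$)
2. The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$
3. The variance of $\bar{x}_1 - \bar{x}_2$ is $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
and the standard error is: $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$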

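Making Inferences About $\mu_1 - \mu_2$
Since $\bar{x}_1 - \bar{x}_2$ is normally distributed if the original
populations are normal, or approximately normal if the
populations are nonnormal and the sample sizes are large ($n_1$,
$n_2 > 30$), then:
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
is a standard normal (or approximately normal) random
variable. We could use this to build test statistics or
confidence interval estimators for $\mu_1 - \mu_2$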

Making Inferences About $\mu_1 - \mu_2$
except that, in practice, the z statistic is rarely used since
the population variances are unknown.
Instead we use a t-statistic. We consider two cases for the
unknown population variances: when we believe they are
equal and conversely when they are not equal.

When are variances equal?
How do we know when the population variances are equal?
Since the population variances are unknown, we can't know
for certain whether they're equal, but we can examine the
sample variances and informally judge their relative values
to determine whether we can assume that the population
variances are equal or not.

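Test Statistic for $\mu_1 - \mu_2$ (equal variances)
1) Calculate the pooled variance estimator as
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
2) and use it here:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
degrees of freedom: $\nu = n_1 + n_2 - 2$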

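CI Estimator for $\mu_1 - \mu_2$ (equal variances)
The confidence interval estimator for $\mu_1 - \mu_2$ when the
population variances are equal is given by:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$
where $s_p^2$ is the pooled variance estimator and the degrees of freedom are $\nu = n_1 + n_2 - 2$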
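
The equal-variances procedure (pool the variances, then form the t statistic) can be sketched as below. The function name and the summary statistics in the usage lines are hypothetical, chosen only to make the arithmetic easy to follow; they are not data from the slides.

```python
import math


def pooled_t_statistic(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Equal-variances t test for mu1 - mu2.

    Returns (pooled variance, t statistic, degrees of freedom).
    var1 and var2 are sample variances (s^2), not standard deviations.
    """
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / dof
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return pooled_var, t, dof


# HYPOTHETICAL summary statistics, for illustration only.
pooled_var, t_stat, dof = pooled_t_statistic(6.0, 4.0, 10, 5.0, 6.0, 12)
print(f"s_p^2 = {pooled_var:.2f}, t = {t_stat:.3f}, df = {dof}")
```

The confidence interval estimator uses the same standard error, multiplied by the table value $t_{\alpha/2}$ for `dof` degrees of freedom.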

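Test Statistic for $\mu_1 - \mu_2$ (unequal variances)
The test statistic for $\mu_1 - \mu_2$ when the population variances
are unequal is given by:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
degrees of freedom: $\nu = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
Likewise, the confidence interval estimator is:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$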
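
For the unequal-variances case, the only awkward part is the degrees-of-freedom formula, so a small sketch may help. The function name and the input values are hypothetical; the same illustrative summary statistics as a pooled example would use are chosen so the two approaches can be compared.

```python
import math


def unequal_variances_t_statistic(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Unequal-variances t test for mu1 - mu2.

    Returns (t statistic, approximate degrees of freedom).
    var1 and var2 are sample variances (s^2).
    """
    a = var1 / n1
    b = var2 / n2
    t = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(a + b)
    # Degrees of freedom are generally not an integer in this case.
    dof = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, dof


# HYPOTHETICAL summary statistics, for illustration only.
t_stat, dof = unequal_variances_t_statistic(6.0, 4.0, 10, 5.0, 6.0, 12)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

When looking up a table value, the fractional degrees of freedom are usually rounded to a whole number.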

Two methods are being tested for assembling office chairs.
Assembly times are recorded (25 times for each method). At
a 5% significance level, do the assembly times for the two
methods differ?
That is, H1: $(\mu_1 - \mu_2) \neq 0$
Hence, our null hypothesis becomes: H0: $(\mu_1 - \mu_2) = 0$
Reminder: This is a two-tailed test.

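The assembly times for each of the two methods are
recorded and preliminary data is prepared.
The sample variances are similar, hence we will assume that
the population variances are equal.
Recall, we are doing a two-tailed test, hence the rejection
region will be:
$t < -t_{\alpha/2,\nu}$ or $t > t_{\alpha/2,\nu}$
The number of degrees of freedom is:
$\nu = n_1 + n_2 - 2 = 25 + 25 - 2 = 48$
Hence our critical values of t (and our rejection region)
become:
$t < -t_{.025,48}$ or $t > t_{.025,48}$, where $t_{.025,48} \approx 2.01$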

In order to calculate our t-statistic, we need to first calculate
the pooled variance estimator, followed by the t-statistic.

Since our calculated t-statistic does not fall into the rejection
region, we cannot reject H0 in favor of H1, that is, there is not
sufficient evidence to infer that the mean assembly times
differ.

Excel, of course, also provides us with the information:
compare the t-statistic with the critical value, or look at the p-value.

Confidence Interval
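We can compute a 95% confidence interval estimate for the
difference in mean assembly times as:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$
That is, we estimate the mean difference between the two
assembly methods to be between −.36 and .96 minutes. Note: zero
is included in this confidence interval.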
Matched Pairs Experiment
Previously when comparing two populations, we examined
independent samples.
If, however, an observation in one sample is matched with
an observation in a second sample, this is called a matched
pairs experiment.
To help understand this concept, let's consider Example 13.4.

Identifying Factors
Factors that identify the t-test and estimator of $\mu_D$:

Inference about the ratio of two
variances
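So far we've looked at comparing measures of central
location, namely the mean of two populations.
When looking at two population variances, we consider the
ratio of the variances, i.e. the parameter of interest to us is: $\sigma_1^2/\sigma_2^2$
The sampling statistic: $F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ is F distributed with
$\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.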
Inference about the ratio of two
variances
Our null hypothesis is always:
H0: $\sigma_1^2/\sigma_2^2 = 1$
(i.e. the variances of the two populations will be equal, hence
their ratio will be one)
Therefore, our statistic simplifies to:
$F = \frac{s_1^2}{s_2^2}$
df1 = $n_1 - 1$
df2 = $n_2 - 1$
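
Under H0 the F statistic is just the ratio of the two sample variances, which a few lines can illustrate. The function name and the numbers in the usage lines are hypothetical; the critical values for the rejection region would still come from an F table or statistical software, since the standard library has no F distribution.

```python
def f_ratio_statistic(var1, n1, var2, n2):
    """Return (F, df1, df2) for testing H0: sigma1^2 / sigma2^2 = 1.

    var1 and var2 are sample variances (s^2); F = s1^2 / s2^2.
    """
    return var1 / var2, n1 - 1, n2 - 1


# HYPOTHETICAL sample variances and sample sizes, for illustration only.
f_stat, df1, df2 = f_ratio_statistic(var1=4.0, n1=10, var2=8.0, n2=12)
print(f"F = {f_stat:.2f} with df1 = {df1} and df2 = {df2}")
```

For a two-tail test, H0 is rejected when F falls below the lower critical value or above the upper one.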

In Example 13.1, we looked at the variances of the samples
of people who consumed high fiber cereal and those who did
not and assumed they were not equal. We can use the ideas
just developed to test if this is in fact the case.
We want to show: H1: $\sigma_1^2/\sigma_2^2 \neq 1$
(the variances are not equal to each other)
Hence we have our null hypothesis: H0: $\sigma_1^2/\sigma_2^2 = 1$

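Since our research hypothesis is: H1: $\sigma_1^2/\sigma_2^2 \neq 1$
We are doing a two-tailed test, and our rejection region is:
$F > F_{\alpha/2,\nu_1,\nu_2}$ or $F < F_{1-\alpha/2,\nu_1,\nu_2}$

Our test statistic is:
$F = \frac{s_1^2}{s_2^2}$
[Figure: F distribution with the critical values .58 and 1.61 marked on the F axis.]
Hence there is sufficient evidence to reject the null
hypothesis in favor of the alternative; that is, there is a
difference in the variance between the two populations.
We may need to work with the Excel output before drawing
conclusions.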
Our research hypothesis H1: $\sigma_1^2/\sigma_2^2 \neq 1$ requires two-tail testing,
but Excel only gives us values for one-tail testing.
If we double the one-tail p-value Excel gives us, we have the p-value of
the test we're conducting (i.e. 2 x 0.0004 = 0.0008). Refer to
the text and CD Appendices for more detail.
