Outline
Course Briefing
Introduction
  What is Statistics
  Data & Data Sources
  Populations and Samples
  Examples
Tabular & Graphical Methods of Data Presentation
  Frequency Distribution
  Histograms & Pareto Charts
  Stem Plots & Scatter Plots
Course Information
COURSE INSTRUCTORS
Dr. Michael Li, S3-B1A-19, 67904659, zfli@ntu.edu.sg
Dr. Chen Shaoxiang, S3-B2A-30, 67906143, aschen@ntu.edu.sg
COURSE ASSESSMENT
Components                       Marks
Coursework                       40%
Final Examination (Open-book)    60%
Total                            100%

Coursework Components            Marks
Class Participation              20%
Case Study (Group)               30%
Two In-Class Quizzes             50%
Sub-Total                        100%
COURSE DELIVERY
12 lectures + 12 tutorials (please pay attention to MI: mobility initiative)
Two in-class quizzes: during Tutorial 7 (week 9, after recess) & Tutorial 11 (week 13) respectively
Statistical software knowledge (required): SPSS (a very powerful/useful statistics software), Excel (add-on for statistical analysis), TreePlan (decision trees)
Course Coverage
Making Sense of Data and Summarizing Data
Concept of Probability
Bayes' Theorem
Random Variables & Probability Distributions
  Binomial, Uniform, Normal, Covariance (Appendix B)
Decision Analysis
Sampling Distributions
Statistical Inference: Confidence Intervals & Hypothesis Testing
Design of Experiment & Analysis of Variance
Regression Models
  Simple & Multiple Regressions
Required textbook
  Bruce L. Bowerman, Richard T. O'Connell and Emily S. Murphree. Business Statistics in Practice, Sixth Edition, McGraw-Hill/Irwin, 2012
What Is Statistics?
Statistics is the science of data. It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information.
Data Analysis
  1. Collecting Data, e.g., Survey
  2. Presenting Data, e.g., Charts & Tables
  3. Characterizing Data, e.g., Average
Why? Decision Making
Basic Concepts
Data: facts and figures from which conclusions can be drawn
Data set: the data that are collected for a particular study
Elements: may be people, objects, events, or other entities
Time series data: data collected over different time periods
  Most economics data are time series data, e.g., inflation, unemployment rate, CPI, exchange rate, etc.
  Periodic (monthly, quarterly, or yearly) corporate sales figures are also time series data
Cross-Sectional Data: SG Example
Source: Singapore Population 2012 (Department of Statistics)
Time Series Data: SG Example
Data Sources
Existing sources (secondary): data already gathered by public or private sources
  Library
  Government
  Data collection agency
  Internet
Experimental and observational studies (primary): data that we collect ourselves for a specific purpose
  Response variable: the main variable of interest, e.g., salary
  Factors: other variables related to the response variable, e.g., education, experience, etc.
Data Sources from NTU Library
NTU Library Business Databases (some examples):
Compustat Global
  Currency, statement, balance sheet, flow of funds, and supplemental data items of listed global companies from 1989 onwards
Business Monitor International
  Country risks and business environment
Datamonitor 360
  Intelligence on companies, industries, products and countries, etc.
Global Market Information Database (GMID)
  Business intelligence on countries, consumers and industries
International Financial Statistics (IMF)
  Statistics on exchange rates, international reserves, banking, balance of payments, government finances, prices, etc. for most countries in the world
Singapore Government Data Sources
Statistics Singapore
  Economic data, sector-level data, demographic data, household survey data, national census data
Housing & Development Board (HDB)
  Resale flat prices
Urban Redevelopment Authority (URA)
  Private residential transactions
Land Transport Authority (LTA) One Motoring
  Vehicle population, COE prices, real-time traffic, etc.
Singapore Tourism Board (STB)
  Annual, quarterly and monthly tourism statistics
Key Concepts: Populations and Samples
Population: The set of all elements about which we wish to draw conclusions (people, objects or events)
Census: An examination of the entire population of measurements
Sample: A selected subset of the units of a population
Statistical Methods
  Descriptive Statistics
  Inferential Statistics
Example 1: Estimating Cell Phone Costs (p. 8)
A bank wishes to decide whether to hire a cellular management service to choose its employees' calling plans.
  Over 10,000 employees, on different types of calling plans
Cell Phone Costs (cont.)
Selecting a random sample (from 2,136 employees)
  A random sample of 100 employees on the 500-minute plan
  Key observation: many overages and underages
Excel function: COUNTIF(range, criteria)
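The sampling step can be sketched in Python. This is a minimal illustration, assuming the 2,136 employees on the 500-minute plan are simply numbered 1 to 2,136; the numbering scheme and the seed are assumptions made for the sketch, not part of the lecture. `random.sample` draws without replacement, so no employee is selected twice.

```python
import random

POPULATION_SIZE = 2136  # employees on the 500-minute plan (from the example)
SAMPLE_SIZE = 100       # sample size used in the example

# Assumption: employees are identified by the numbers 1..2136.
employee_ids = range(1, POPULATION_SIZE + 1)

# Arbitrary fixed seed so that the same sample is drawn on every run.
rng = random.Random(2012)

# Draw 100 distinct employee numbers (sampling without replacement).
sample_ids = rng.sample(employee_ids, SAMPLE_SIZE)

print(sorted(sample_ids)[:10])  # a few of the selected employee numbers
```

Each selected number would then be matched to that employee's billing record before counting overages and underages.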
Example 2: Rating a New Design
A branding company is studying whether changes should be made in the bottle design for a popular soft drink.
  Respondents are shoppers from a large shopping mall on a particular Saturday
  Exposed to the new bottle design and asked to rate it:
    Five items with a 7-point Likert scale (survey instrument)
    A composite score is the sum of all five items
    Rule of thumb: a score of 25 is the smallest score for a success
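The composite score and the rule of thumb can be expressed in a few lines of Python. The ratings below are made up for illustration (they are not the 60 responses in the Design worksheet); only the five-item structure, the 1 to 7 scale, and the threshold of 25 come from the example.

```python
# Hypothetical ratings: one row per respondent, five items each,
# every item rated on a 1-7 Likert scale. These values are invented.
responses = [
    [6, 7, 5, 6, 7],
    [4, 5, 4, 5, 6],
    [7, 7, 6, 7, 7],
    [3, 4, 5, 4, 4],
]

SUCCESS_THRESHOLD = 25  # smallest composite score counted as a success

# Composite score = sum of the five item ratings for each respondent.
composite_scores = [sum(ratings) for ratings in responses]

# Number and share of respondents whose composite score is at least 25.
successes = sum(score >= SUCCESS_THRESHOLD for score in composite_scores)
share = successes / len(composite_scores)

print(composite_scores, successes, share)
```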
Rating a New Design (cont.)
Sampling method: interception method
  Not a completely random sample, but can generate an approximately random sample (how?)
  A sample size of 60
Worksheet: Design
Key observations: 57 of 60 (i.e., 95%) composite scores are at least 25
Example 3: Estimating Car Gas Mileage
Study of tax credit offered by the federal government to automakers for improving fuel economy of gasoline-powered midsize cars
Automaker has introduced a new model and wishes to demonstrate it qualifies for the tax credit
US EPA Fuel Economy: http://www.epa.gov/fueleconomy/
  Market average: 26 miles per gallon (mpg) (year 2009)
  Tax incentive goal: an improvement of 5 mpg, i.e., at least 31 mpg
Estimating Car Gas Mileage (cont.)
An approximately random sample of 50 cars
  One car from each of 50 consecutive production shifts
  Each selected car is subject to an EPA test
    7.5-mile city driving trip & a 10-mile highway driving trip
    A combined mileage for the car
  Mileages vary from 29.8 mpg to 33.3 mpg
  38 out of 50 (76%) of the mileages are at least 31 mpg.
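The counts on this slide can be checked against the 50 mileages shown in the stem-and-leaf display of the Car Mileage Example later in this lecture. The sketch below rebuilds the values from that display (stem = whole mpg, leaf = tenths) and counts how many reach the 31 mpg goal; the variable names are illustrative only.

```python
# Leaves read off the lecture's stem-and-leaf display of the 50 mileages.
leaves_by_stem = {
    29: "8",
    30: "13455677888",
    31: "0012334444455667778899",
    32: "01112334455778",
    33: "03",
}

# Rebuild each mileage: stem + leaf/10, e.g. stem 29 and leaf 8 -> 29.8.
mileages = [
    stem + int(leaf) / 10
    for stem, leaves in leaves_by_stem.items()
    for leaf in leaves
]

GOAL_MPG = 31  # tax incentive goal from the example

at_least_goal = sum(m >= GOAL_MPG for m in mileages)  # includes exactly 31.0
above_goal = sum(m > GOAL_MPG for m in mileages)      # strictly greater

print(len(mileages), min(mileages), max(mileages))
print(at_least_goal, 100 * at_least_goal / len(mileages))
print(above_goal)
```

With these values, 38 of the 50 mileages (76%) are at least 31 mpg, while 36 are strictly greater than 31 mpg, since two cars recorded exactly 31.0.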
Data Presentation Techniques
Graphically Summarizing Qualitative Data
  Frequency distribution, bar chart, pie chart, Pareto chart
Graphically Summarizing Quantitative Data
  Frequency distribution, histograms, ogives
Frequency Distribution for Qualitative Data
With qualitative data, names identify the different categories
This data can be summarized using a frequency distribution
Frequency distribution:
  A table that summarizes the number of items in each of several non-overlapping classes
Example 2.1: 2006 Jeep Purchasing Patterns
Table 2.1 lists all 251 vehicles sold in 2006 by the Jeep dealers
  It does not reveal much useful information
A frequency distribution is a useful summary
  Simply count the number of times each model appears in Table 2.1
Worksheet: Jeep Sales
Relative Frequency and Percent Frequency
Relative frequency summarizes the proportion of items in each class
  For each class, divide the frequency of the class by the total number of observations
  Multiply by 100 to obtain the percent frequency
Worksheet: Jeep Sales
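The three summaries (frequency, relative frequency, percent frequency) can be sketched with Python's `collections.Counter`. The purchase list below is hypothetical and uses placeholder model names; it is not the 251-vehicle data of Table 2.1.

```python
from collections import Counter

# Hypothetical purchase records, one model label per vehicle sold.
# Placeholder labels and counts; NOT the Jeep data from Table 2.1.
purchases = ["A", "B", "C", "A", "D", "A", "C", "D", "A", "B"]

# Frequency: how many times each model appears.
frequency = Counter(purchases)
total = sum(frequency.values())

# Relative frequency: class frequency divided by the number of observations.
relative = {model: count / total for model, count in frequency.items()}

# Percent frequency: relative frequency multiplied by 100.
percent = {model: 100 * share for model, share in relative.items()}

for model in sorted(frequency):
    print(model, frequency[model], relative[model], percent[model])
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on the table.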
Bar Charts and Pie Charts
Bar chart: A vertical or horizontal rectangle represents the frequency for each category
  Height can be frequency, relative frequency, or percent frequency
Excel Bar and Pie Chart of the Jeep Sales Data
Worksheet: Jeep Sales
Pareto Chart
Pareto chart: A bar chart having the different kinds of defects listed on the horizontal scale
  Bar height represents the frequency of occurrence
  Bars are arranged in decreasing height from left to right
  Sometimes augmented by plotting a cumulative percentage point for each bar
Worksheet: Labels
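The numbers behind a Pareto chart (bars in decreasing order plus a cumulative percentage for each bar) can be computed as follows. The defect categories and counts are invented for this sketch and are not the contents of the Labels worksheet.

```python
from itertools import accumulate

# Hypothetical defect counts; names and numbers are illustrative only.
defect_counts = {
    "Crooked label": 78,
    "Missing label": 36,
    "Printing error": 9,
    "Loose label": 21,
    "Wrinkled label": 6,
}

# Arrange the bars in decreasing height (frequency), left to right.
ordered = sorted(defect_counts.items(), key=lambda item: item[1], reverse=True)

total = sum(defect_counts.values())

# Running total of the ordered frequencies, used for the cumulative points.
cumulative = list(accumulate(count for _, count in ordered))
cumulative_percent = [100 * running / total for running in cumulative]

for (defect, count), pct in zip(ordered, cumulative_percent):
    print(f"{defect:<15} {count:>4} {pct:6.1f}%")
```

Plotting `ordered` as bars and `cumulative_percent` as points above them gives the augmented Pareto chart described above.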
Graphically Summarizing Quantitative Data
Often need to summarize and describe the shape of the distribution
One way is to group the measurements into classes of a frequency distribution
  Classify and count
  The frequency distribution is a table
Then display the data in the form of a histogram
  The histogram is a picture of the frequency distribution
Constructing a Frequency Distribution
Steps in making a frequency distribution:
  1. Find the number of classes
  2. Find the class length
  3. Form non-overlapping classes of equal width
  4. Tally and count
  5. Graph the histogram
Example 2.2: Payment time
  A sample of 65 observations, min = 10 days, max = 29 days
Number of Classes & Class Length
Number of Classes
  Group all of the n data into K number of classes
  K is the smallest whole number for which 2^K > n (a guide only)
  In Example 2.2, n = 65
    For K = 6, 2^6 = 64, < n
    For K = 7, 2^7 = 128, > n
    So use K = 7 classes
Class length
  Find the length of each class as the largest measurement minus the smallest, divided by the number of classes found earlier (K)
  For Example 2.2, (29 - 10)/7 = 2.7143
    Because payments are measured in days, round to three days
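The two calculations for Example 2.2 can be reproduced in Python. This is a sketch under one assumption: the class length is rounded up to a whole number of days (here 2.7143 becomes 3), so that the K classes are certain to cover the whole range. The function name is illustrative.

```python
import math

def number_of_classes(n):
    """Smallest whole number K with 2**K > n (a guide only)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

# Values from Example 2.2 (payment times in days).
n, smallest, largest = 65, 10, 29

k = number_of_classes(n)                # 2**6 = 64 < 65, 2**7 = 128 > 65
raw_length = (largest - smallest) / k   # (29 - 10) / 7
class_length = math.ceil(raw_length)    # assumption: round up to whole days

# Lower boundaries of the K classes plus the final upper boundary.
boundaries = [smallest + i * class_length for i in range(k + 1)]

print(k, round(raw_length, 4), class_length)
print(boundaries)
```

The resulting boundaries define the classes 10 < 13, 13 < 16, and so on up to 28 < 31, each three days wide.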
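Histogram Using Excel
[Figure: Excel histogram of the payment time data. Horizontal axis: payment time classes (10 < 13, 13 < 16, ...); vertical axis: frequency, 0 to 25. Surviving bar labels: 3, 14, 23.]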
Histogram Using SPSS
SPSS data file: Lect01PaymentTime.sav
Note: Most statistical software generates histograms automatically, so there is no unique histogram; any version is acceptable so long as the graph shows the data pattern.
Histograms: Three General Cases
Symmetrical: The right and left tails of the histogram appear to be mirror images of each other
Cumulative Distributions
Another way to summarize a distribution is to construct a cumulative distribution
To do this, use the same number of classes, class lengths, and class boundaries used for the frequency distribution
Rather than a count, we record the number of measurements that are less than the upper boundary of that class; in other words, a running total.
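A running total is straightforward to compute with `itertools.accumulate`. In the sketch below the upper class boundaries follow the three-day classes of the payment time example, while the class frequencies are illustrative values chosen to total 65; they are not copied from the worksheet.

```python
from itertools import accumulate

# Upper boundaries of the classes 10<13, 13<16, ..., 28<31 (days).
upper_boundaries = [13, 16, 19, 22, 25, 28, 31]

# Illustrative class frequencies (assumed for this sketch), totalling 65.
frequencies = [3, 14, 23, 12, 8, 4, 1]

# Cumulative frequency: number of measurements below each upper boundary.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

# The same running total expressed as relative and percent frequencies.
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * c / n for c in cumulative]

for bound, c, pct in zip(upper_boundaries, cumulative, cumulative_percent):
    print(f"less than {bound}: {c:>2} ({pct:5.1f}%)")
```

Plotting each cumulative value above its upper class boundary and joining the points with line segments gives the graph of the cumulative distribution.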
Ogive
Ogive: A graph of a cumulative distribution
  Plot a point above each upper class boundary at height of cumulative frequency
  Connect points with line segments
  Can also be drawn using
    Cumulative relative frequencies
    Cumulative percent frequencies
Worksheet: PayTime
Stem-and-Leaf Displays
Purpose is to see the overall pattern of the data, by grouping the data into classes
  the variation from class to class
  the amount of data in each class
  the distribution of the data within each class
Best for small to moderately sized data distributions
Car Mileage Example
The stem-and-leaf display:
  29 | 8                         (29 + 0.8 = 29.8)
  30 | 13455677888
  31 | 0012334444455667778899
  32 | 01112334455778
  33 | 03                        (33 + 0.3 = 33.3)
Looking at the stem-and-leaf display, the distribution appears almost symmetrical
  The upper portion (29, 30, 31) is almost a mirror image of the lower portion of the display (31, 32, 33)
  But not exactly a mirror reflection
SPSS data file: Lect01GasMiles.sav
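A display like this one can also be built programmatically. The sketch below is one possible approach, assuming one-decimal data with the whole-number part as the stem and the first decimal as the leaf; the function name is illustrative. The 50 mileages are listed in sorted order as read off the display, since the original ordering of the sample is not given here.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines: stem = whole mpg, leaf = first decimal digit."""
    leaves = defaultdict(list)
    for value in values:
        # Work in tenths to avoid floating-point surprises: 29.8 -> 298.
        stem, leaf = divmod(round(value * 10), 10)
        leaves[stem].append(leaf)
    return [
        f"{stem} | {''.join(str(leaf) for leaf in sorted(leaves[stem]))}"
        for stem in sorted(leaves)
    ]

# The 50 car mileages (mpg) recovered from the lecture's display.
mileages = [
    29.8,
    30.1, 30.3, 30.4, 30.5, 30.5, 30.6, 30.7, 30.7, 30.8, 30.8, 30.8,
    31.0, 31.0, 31.1, 31.2, 31.3, 31.3, 31.4, 31.4, 31.4, 31.4, 31.4,
    31.5, 31.5, 31.6, 31.6, 31.7, 31.7, 31.7, 31.8, 31.8, 31.9, 31.9,
    32.0, 32.1, 32.1, 32.1, 32.2, 32.3, 32.3, 32.4, 32.4, 32.5, 32.5,
    32.7, 32.7, 32.8,
    33.0, 33.3,
]

lines = stem_and_leaf(mileages)
print("\n".join(lines))
```

The length of each row shows at a glance how the data pile up around 31 to 32 mpg.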
Constructing a Stem-and-Leaf Display
No rules dictate the number of stem values
  Can split the stems as needed
  Use SPSS (Excel cannot generate stem plots)
SPSS data file: Lect01PaymentTime.sav
Stem-and-Leaf Display: SPSS
Stem-and-leaf display for Payment Time data
Stem-and-leaf display for Car Mileage data
Note: Stem-and-leaf displays are NOT unique!
Crosstabulation Tables
Classifies data on two dimensions
  Rows classify according to one dimension
  Columns classify according to a second dimension
Requires three variables
  1. The row variable
  2. The column variable
  3. The variable counted in the cells
SPSS can easily create crosstabulation tables
Example 2.5: Investor Satisfaction
The raw data: fund type & satisfaction level
Investor Satisfaction: Crosstabulation
A crosstabulation table of fund type vs. satisfaction level
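The counting behind a crosstabulation can be sketched in plain Python. The records, fund type labels, and satisfaction levels below are all assumed for illustration; they are not the data in Lect01Invest.sav.

```python
from collections import Counter

# Hypothetical raw data: one (fund type, satisfaction level) pair per
# investor. Labels and values are assumptions made for this sketch.
records = [
    ("Bond fund", "High"),
    ("Stock fund", "High"),
    ("Stock fund", "Medium"),
    ("Bond fund", "Medium"),
    ("Tax-deferred annuity", "Low"),
    ("Stock fund", "High"),
    ("Tax-deferred annuity", "Medium"),
    ("Bond fund", "Medium"),
]

# Count how many investors fall in each (row, column) cell.
cell_counts = Counter(records)

row_labels = sorted({fund for fund, _ in records})
column_labels = ["High", "Medium", "Low"]  # fixed order for readability

# One row of cell counts per fund type; missing cells count as 0.
table = {
    row: [cell_counts[(row, col)] for col in column_labels]
    for row in row_labels
}

print(f"{'':<22}" + "".join(f"{col:>8}" for col in column_labels))
for row, counts in table.items():
    print(f"{row:<22}" + "".join(f"{c:>8}" for c in counts))
```

Here the fund type is the row variable, the satisfaction level is the column variable, and the number of investors is the variable counted in the cells.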
Crosstabulations Using SPSS
Analyze → Descriptive Statistics → Crosstabs
SPSS data file: Lect01Invest.sav
Scatter Plots
Used to study relationships between two variables
  Place one variable on the x-axis
  Place a second variable on the y-axis
  Place a dot on the pair coordinates
Software
  Excel: easy & simple
  SPSS: easy & sophisticated!
Types of Relationships
  Linear: A straight-line relationship between the two variables
    Positive: When one variable goes up, the other variable goes up
    Negative: When one variable goes up, the other variable goes down
  No Linear Relationship: There is no coordinated linear movement between the two variables
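Besides reading the direction off the plot, one numerical check is the sign of the sample covariance, a measure that appears later in the course coverage rather than on this slide. The sketch below uses made-up data and an illustrative helper function; a positive value suggests a positive linear relationship and a negative value a negative one.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical paired observations (invented for this sketch).
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 6]    # tends to rise as x rises
y_down = [9, 7, 6, 4, 2]  # tends to fall as x rises

cov_up = sample_covariance(x, y_up)
cov_down = sample_covariance(x, y_down)

print(cov_up, cov_down)
```

A covariance close to zero indicates no linear relationship only; the scatter plot should still be inspected, because the variables may be related in a nonlinear way.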
Scatter Plots Using Excel
Worksheet: SalesPlot
End of Lecture 1
NEXT LECTURE: CHAPTER 3 DESCRIPTIVE STATISTICS