
Saint Joseph's University
BUSINESS STATISTICS

PURPOSE
Explain:
What Statistics is
The types of data
The different methods of sampling

INTRODUCTION

In God We Trust; All Others Use Data


Information derives from the analysis of data.
Analysis refers to extracting larger meaning from data to support evaluation and decision making.
Data are also used as key inputs to decision models.

Statistics
Statistics: the science of collecting, organizing, analyzing, interpreting, and presenting data for the purpose of gaining insight and making better decisions.
Applications abound in all business disciplines, manufacturing and quality control, health care, sports, and daily life.

Statistical Thinking
All work occurs in a system of interconnected processes: systematic ways of doing things that achieve desired results.
Variation exists in all processes.
Variation must be understood and reduced.

Types of Business Data


Balanced Scorecard

Financial Perspective: profitability, revenue growth, ROI, EPS
Internal Perspective: quality levels, productivity, process yields, cycle time, cost
Customer Perspective: service levels, satisfaction ratings, repeat business, complaints
Innovation and Learning Perspective: intellectual assets, employee satisfaction, market innovation, training effectiveness, supplier performance

Using a Balanced Scorecard


Lagging measures (outcomes)
Leading measures (performance drivers)
Statistical relationships
Examples
Sears: employee attitudes predict behavior, which predicts customer retention, which predicts financial performance.
IBM Rochester: causal relationships between people skills, quality, customer satisfaction, and financial/market share performance.

Application Areas
Accounting: auditing, costing
Finance: financial trends, forecasting
Management: describing employees, quality improvement
Marketing: consumer preferences, marketing mix effects

Applications in Business and Economics

Accounting
Public accounting firms use statistical
sampling procedures when conducting
audits for their clients.

Economics
Economists use statistical information
in making forecasts about the future of
the economy or some aspect of it.

Marketing
Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.

Production
A variety of statistical quality
control charts are used to monitor
the output of a production process.

Finance
Financial advisors use price-earnings ratios and dividend yields to guide their investment recommendations.

Statistical Methods
Descriptive Statistics
Inferential Statistics
Predictive Statistics

Statistical Methodology

Descriptive statistics: collection, organization, and description of data
Statistical inference: drawing conclusions about unknown characteristics of a population based on samples
Predictive statistics: inferring future values based on historical data

Descriptive Statistics

Involves:
Collecting data
Summarizing data
Presenting data
Purpose: describe data
[Figure: quarterly bar chart (Q1 to Q4) summarized by x̄ = 30.5 and s² = 113]
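The two summary measures on this slide, the sample mean x̄ and the sample variance s², can be computed with Python's standard `statistics` module. This is a minimal sketch: the four quarterly values are hypothetical and are not the data behind the chart, so the results differ from 30.5 and 113.

```python
import statistics

# Hypothetical quarterly values (illustration only; not the chart's data).
quarterly = {"Q1": 12, "Q2": 18, "Q3": 30, "Q4": 24}
values = list(quarterly.values())

x_bar = statistics.mean(values)      # sample mean: sum(x) / n
s2 = statistics.variance(values)     # sample variance: sum((x - x_bar)^2) / (n - 1)

# The same variance computed directly from its definition
s2_manual = sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

print(f"x_bar = {x_bar}, s^2 = {s2}")
```

Note the n − 1 divisor: `statistics.variance` is the sample variance, while `statistics.pvariance` divides by n and gives the population variance.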

Inferential Statistics

Involves:
Estimation
Hypothesis testing
Purpose: make decisions about population characteristics based on a sample
[Figure: a sample drawn from a population whose characteristics are unknown]

Predictive Statistics

Understanding relationships
Predicting future values

Data and Data Sets


Data are the facts and figures collected, summarized, analyzed, and interpreted.
The data collected in a particular study are referred to as the data set.

Elements, Variables, and Observations
The elements are the entities on which data are collected.
A variable is a characteristic of interest for the elements.
The set of measurements collected for a particular element is called an observation.
The total number of data values in a data set is the number of elements multiplied by the number of variables.

Data, Data Sets, Elements, Variables, and Observations
Data set: element names in the first column, one variable per remaining column, one observation per row.

Company        Stock Exchange   Annual Sales ($M)   Earnings/Share ($)
Dataram        AMEX             73.10               0.86
EnergySouth    OTC              74.00               1.67
Keystone       NYSE             365.70              0.86
LandCare       NYSE             111.40              0.33
Psychemedics   AMEX             17.60               0.13
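A minimal Python sketch of the counting rule (total data values = elements × variables), using the five-company data set: each dictionary entry is one element and its observation, and each inner key is one variable. The key names (`exchange`, `annual_sales_musd`, `eps_usd`) are hypothetical labels chosen for this example.

```python
# Five-company data set: one entry per element; inner keys are the variables.
data_set = {
    "Dataram":      {"exchange": "AMEX", "annual_sales_musd": 73.10,  "eps_usd": 0.86},
    "EnergySouth":  {"exchange": "OTC",  "annual_sales_musd": 74.00,  "eps_usd": 1.67},
    "Keystone":     {"exchange": "NYSE", "annual_sales_musd": 365.70, "eps_usd": 0.86},
    "LandCare":     {"exchange": "NYSE", "annual_sales_musd": 111.40, "eps_usd": 0.33},
    "Psychemedics": {"exchange": "AMEX", "annual_sales_musd": 17.60,  "eps_usd": 0.13},
}

n_elements = len(data_set)                            # rows (elements)
n_variables = len(next(iter(data_set.values())))      # columns after the name
total_values = n_elements * n_variables

print(f"{n_elements} elements x {n_variables} variables = {total_values} data values")
```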

Collection of Data

Methods of Collecting Data
The reliability and accuracy of the data affect the validity of the results of a statistical analysis.
The reliability and accuracy of the data depend on the method of collection.
Three of the most popular sources of statistical data are:
Published data
Observational studies
Experimental studies

Data Sources
Primary: experiment, survey, observation
Secondary: published (& on-line)

Published Data
This is often a preferred source of data due to low cost and convenience.
Published data is found as printed material, tapes, disks, and on the Internet.
Data published by the organization that has collected it is called PRIMARY DATA.
For example: data published by the US Bureau of Census.
Data published by an organization different from the organization that has collected it is called SECONDARY DATA.
For example: The Statistical Abstract of the United States, which compiles data from primary sources.
For example: Compustat, which sells a variety of financial data tapes compiled from primary sources.

Data Sources

Existing Sources
Within a firm: almost any department
Business database services: Dow Jones & Co.
Government agencies: U.S. Department of Labor
Industry associations: Travel Industry Association of America
Special-interest organizations: Graduate Management Admission Council
Internet: more and more firms

Statistical Studies
In experimental studies the variables of interest are first identified. Then one or more factors are controlled so that data can be obtained about how the factors influence the variables.
In observational (nonexperimental) studies no attempt is made to control or influence the variables of interest. A survey is a good example.

Data Acquisition Considerations


Time Requirement
Searching for information can be time consuming.
Information may no longer be useful by the time it is available.
Cost of Acquisition
Organizations often charge for information even when it is not their primary business activity.
Data Errors
Using any data that happen to be available, or data that were acquired with little care, can lead to poor and misleading information.

Surveys
Surveys solicit information from people.
Surveys can be made by means of:
Personal interview
Telephone interview
Self-administered questionnaire

Surveys
A good questionnaire must be well designed:
Keep the questionnaire as short as possible.
Ask short, simple, and clearly worded questions.
Start with demographic questions to help respondents get started comfortably.
Use dichotomous and multiple choice questions.
Use open-ended questions cautiously.
Avoid using leading questions.
Pretest a questionnaire on a small number of people.
Think about the way you intend to use the collected data when preparing the questionnaire.

Survey Steps

Define purpose
Design questionnaire
Select sample design (sample type, sample size)
Collect data (field work)
Prepare data (edit, code)
Analyze data
Interpret findings
Report results

Questionnaire Design

Question content
Mode of response
Question wording
Question sequence
Layout
Pilot testing


Scales of Measurement
Scales of measurement include:
Nominal
Ordinal
Interval
Ratio
The scale determines the amount of information contained in the data.
The scale indicates the data summarization and statistical analyses that are most appropriate.

Scales of Measurement
Nominal
Data are labels or names used to identify an attribute of the element.
A nonnumeric label or numeric code may be used.

Scales of Measurement

Nominal
Example:
Students of a university are classified by the school in which they are enrolled using a nonnumeric label such as Business, Humanities, Education, and so on.
Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).

Scales of Measurement
Ordinal
The data have the properties of nominal data and the order or rank of the data is meaningful.
A nonnumeric label or numeric code may be used.

Scales of Measurement
Ordinal
Example:
Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes Freshman, 2 denotes Sophomore, and so on).

Scales of Measurement
Interval
The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.
Interval data are always numeric.

Scales of Measurement
Interval
Example:
Melissa has an SAT score of 1205, while Kevin has an SAT score of 1090. Melissa scored 115 points more than Kevin.

Scales of Measurement
Ratio
The data have all the properties of interval data and the ratio of two values is meaningful.
Variables such as distance, height, weight, and time use the ratio scale.
This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.

Scales of Measurement
Ratio
Example:
Melissa's college record shows 36 credit hours earned, while Kevin's record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa.

How Are Data Measured?


Qualitative
1. Nominal Scale
2. Ordinal Scale

Quantitative
3. Interval Scale
4. Ratio Scale

How Are Data Measured?


1. Nominal Scale
Categories/labels, e.g., male/female
Data are nonnumeric or numeric
No arithmetic operations; counting only
2. Ordinal Scale
All of the above, plus ordering implied, e.g., high/low
(Nominal and ordinal scales are qualitative.)
3. Interval Scale
Equal intervals; no true 0
Data are always numeric, e.g., degrees Celsius
Arithmetic operations; multiples not meaningful
4. Ratio Scale
Properties of the interval scale, plus a true 0 and meaningful ratios, e.g., height in inches
(Interval and ratio scales are quantitative.)
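A short Python sketch, assuming temperatures of 10 and 20 degrees Celsius chosen only for illustration, shows why multiples are not meaningful on an interval scale but are on a ratio scale. Kelvin is used here as the ratio-scale counterpart because its zero is a true zero; the SAT and credit-hour figures are the Melissa and Kevin examples from the scale slides.

```python
# Interval scale: degrees Celsius. Differences are meaningful; ratios are not,
# because 0 degrees C does not mean "no temperature".
low_c, high_c = 10.0, 20.0   # hypothetical temperatures for illustration
print("Difference in C:", high_c - low_c)    # 10.0 degrees
print("Naive ratio in C:", high_c / low_c)   # 2.0, but NOT "twice as hot"


def celsius_to_kelvin(c: float) -> float:
    """Kelvin has a true zero, so ratios of Kelvin values are meaningful."""
    return c + 273.15


low_k, high_k = celsius_to_kelvin(low_c), celsius_to_kelvin(high_c)
print("Difference in K:", round(high_k - low_k, 2))   # differences survive the shift
print("Ratio in K:", round(high_k / low_k, 3))        # the real ratio is far from 2

# Interval example from the slides: SAT scores, where differences are meaningful.
melissa_sat, kevin_sat = 1205, 1090
print("SAT difference:", melissa_sat - kevin_sat)

# Ratio example from the slides: credit hours, where zero means none earned.
melissa_hours, kevin_hours = 36, 72
print("Credit-hour ratio:", kevin_hours / melissa_hours)
```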

Discrete/Continuous? What scale?
Variable        Example values      Discrete/Continuous   Scale
Gender          Male, Female        Discrete              Nominal
Weight          123, 140.2, etc.    Continuous            Ratio
Auto Speed      78, 64, 45, etc.    Continuous            Ratio
Temperature     78, 33, 85, etc.    Continuous            Interval
# of Siblings   0-2, 3-5, 6+        Discrete              Ordinal
Letter Grade    A, B, C, etc.       Discrete              Ordinal

How Are Data Measured?


Most of the time we will use ratio data. Some of the time we will use nominal and ordinal data (qualitative or categorical). We rarely use interval data.

Data Types
Data
Quantitative (numerical): discrete or continuous
Qualitative (categorical)

Scales of Measurement
Qualitative data
Numerical: nominal, ordinal
Nonnumerical: nominal, ordinal
Quantitative data
Numerical: interval, ratio

Data Classification

Discrete or continuous
Attributes: discrete data obtained from counting
E.g., number of defects per unit of production, percentage of on-time flight arrivals, number of complaints per customer, percentage of top box responses in a satisfaction survey
Variables: continuous numerical data obtained from a measurement process
E.g., delivery time, number of ounces in a bottle of beer, monthly revenues, diameter of a drilled hole, balance in your checking account, time spent on homework

Data Type Examples

Quantitative
Discrete:
To how many magazines do you currently subscribe? ___ (number)
How many cars went through the toll booth?
Continuous:
How tall are you? ___ (inches)
How much does the box weigh?
Qualitative
Do you own savings bonds? __ Yes __ No
What is your religion?

Data Classification

Type of Data
Cross-sectional: measurements taken at one time period
Time series: data collected over time
Number of Variables
Univariate: data consisting of a single variable to measure some entity
Multivariate: data consisting of two or more variables to measure some entity

Cross-Sectional/TimeSeries Data

Cross-sectional data are collected at a certain point in time, e.g.:
Marketing survey (observe preferences by gender, age)
Test scores in a statistics course
Starting salaries of MBA program graduates
Time series data are collected over successive points in time, e.g.:
Weekly closing price of gold
Amount of crude oil imported monthly


Combining the two classifications gives four types of data: cross-sectional univariate, cross-sectional multivariate, time series univariate, and time series multivariate.

Populations and Samples

Population: all items of interest for a particular decision or investigation, e.g.:
All married drivers in the U.S. over age 25
All individuals who do not own a cell phone
Sample: a subset of a population, e.g.:
Nielsen samples of TV viewers
Accounting department samples of invoices for audits
Samples are used:
To reduce costs of data collection
When a full census cannot be taken

Why Collect Data Samples?


1. Destruction of test units (quality control)
2. Accurate and reliable results
3. Pragmatic reasons: time, cost

Why Collect Data Samples?


In most cases, the primary reasons for taking a sample are time and cost.

Key Terms

Population (universe): all items of interest
Sample: portion of population
Parameter: summary measure about population
Statistic: summary measure about sample
(P in Population & Parameter; S in Sample & Statistic)

Standard Notation
Measure              Sample    Population
Mean                 x̄         μ
Standard deviation   s         σ
Variance             s²        σ²
Size                 n         N

Statistical Computer Packages
Typical software: SAS, SPSS, MINITAB, Excel
Need statistical understanding: assumptions, limitations

Sampling Methods

Types of Samples
Nonprobability samples: judgement, focus groups, convenience
Probability samples: simple random, systematic, stratified, cluster

Probability Samples
The two Rs
Representative of the population
Random

Simple Random Sample


1. Each population element has an equal chance of being selected.
2. Selecting one subject does not affect selecting others.
3. May use a random number table, lottery, or fish bowl.

Simple Random Sampling


In simple random sampling all the samples with the same size are equally likely to be chosen.
To conduct simple random sampling:
assign a number to each element of the chosen population (or use already given numbers);
randomly select the sample numbers (members), using a random numbers table or a software package.

Simple Random Sampling

Example
A government income-tax auditor is responsible for 1,000 tax returns.
The auditor will randomly select 40 returns to audit.
Use Excel's random number generator to select the returns.
Solution
We generate 50 numbers between 1 and 1000 (we need only 40 numbers, but the extras can be used if duplicate numbers are generated).

Simple Random Sampling
Step 1: The random number generator produces 50 numbers uniformly distributed between 0 and 1:
0.3820002, 0.1006806, 0.5964843, 0.8991058, 0.8846095, 0.9584643, 0.0144963, 0.4074221, 0.8632466, 0.1385846, 0.2450331, ...
Step 2: Multiplying each by 1000 gives 50 random numbers between 0 and 1000:
382.00018, 100.68056, 596.48427, 899.10581, 884.60952, 958.46431, 14.496292, 407.4221, 863.24656, 138.58455, 245.03311, ...
Step 3: Rounding up gives 50 random, uniformly distributed whole numbers between 1 and 1000, each with a probability of 1/1000 of being selected:
383, 101, 597, 900, 885, 959, 15, 408, 864, 139, 246, ...
The auditor should select 40 files numbered 383, 101, ...
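The rescale-and-round-up procedure can be sketched in Python, assuming the standard `random` module stands in for Excel's generator (the seed values are arbitrary and only make a run repeatable). The sketch keeps drawing until 40 distinct return numbers are found, which is why the slide generates 50 numbers when only 40 are needed.

```python
import math
import random
from typing import List, Optional

POPULATION_SIZE = 1000   # tax returns numbered 1..1000
SAMPLE_SIZE = 40         # returns the auditor will audit


def to_return_number(u: float, population_size: int = POPULATION_SIZE) -> int:
    """Scale a uniform number in [0, 1) by N and round up, as on the slide:
    0.3820002 -> 382.0002 -> 383."""
    return math.ceil(u * population_size)


def draw_returns(seed: Optional[int] = None) -> List[int]:
    """Draw distinct return numbers until SAMPLE_SIZE of them are selected.

    Duplicates are skipped, as is the edge case u == 0.0 (which would map to 0).
    """
    rng = random.Random(seed)
    chosen: List[int] = []
    while len(chosen) < SAMPLE_SIZE:
        number = to_return_number(rng.random())
        if 1 <= number <= POPULATION_SIZE and number not in chosen:
            chosen.append(number)
    return chosen


audit_list = draw_returns(seed=2024)
print(sorted(audit_list))

# Python's built-in sampling without replacement gives the same design directly.
direct = random.Random(2024).sample(range(1, POPULATION_SIZE + 1), SAMPLE_SIZE)
```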


Systematic Sample
If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population.
We randomly select one of the first N/n elements from the population list.
We then select every (N/n)th element that follows in the population list.

Systematic Sample
This method has the properties of a simple random sample, especially if the list of the population elements is a random ordering.
Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used.
Example: Selecting every 100th listing in a telephone book after the first randomly selected listing.
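A minimal Python sketch of systematic sampling, assuming the interval N/n is rounded down to a whole number k when N is not a multiple of n (a choice made for this example). The list of 5,000 telephone listings is hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(population: Sequence[T], n: int,
                      seed: Optional[int] = None) -> List[T]:
    """Random start among the first k elements, then every k-th element, k = N // n."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    k = N // n                                 # sampling interval N/n (rounded down)
    start = random.Random(seed).randrange(k)   # index of the random first pick
    # start + (n - 1) * k <= N - 1, so every index below is valid
    return [population[start + i * k] for i in range(n)]


# Hypothetical "telephone book" of 5,000 listings; n = 50 gives k = 100,
# i.e. every 100th listing after a random start.
listings = [f"listing-{i:04d}" for i in range(1, 5001)]
picked = systematic_sample(listings, 50, seed=3)
print(picked[:3])
```

Only the starting point is random; once it is drawn, the whole sample is fixed. That is why the method works best when the list order is unrelated to the variable being studied.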


Stratified Sample
1. Divide the population into subgroups (strata) that are:
mutually exclusive,
collectively exhaustive, and
share at least 1 common characteristic of interest.
2. Select simple random samples from the subgroups.
[Figure: all students divided into commuters and residents, with a sample drawn from each group]

Stratified Random Sampling
The population is first divided into groups of elements called strata.
Each element in the population belongs to one and only one stratum.
Best results are obtained when the elements within each stratum are as much alike as possible (i.e. a homogeneous group).

Stratified Random Sampling
A simple random sample is taken from each stratum.
Formulas are available for combining the stratum sample results into one population parameter estimate.
Advantage: If strata are homogeneous, this method is as precise as simple random sampling but with a smaller total sample size.
Example: The basis for forming the strata might be department, location, age, industry type, and so on.

Stratified Random Sampling

There are several ways to build the stratified sample. For example, keep the proportion of each stratum in the population.
A sample of size 1,000 is to be drawn:

Stratum   Income              Population proportion   Stratum size
1         under $15,000       25%                     250
2         $15,000-29,999      40%                     400
3         $30,000-50,000      30%                     300
4         over $50,000        5%                      50
Total                                                 1,000
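Proportional allocation and the within-stratum draws can be sketched in Python. The income strata and their shares come from the table; the household population is hypothetical, and the rule for assigning leftover units when shares do not divide evenly is an assumption of this sketch, not a formula from the slides.

```python
import random
from typing import Dict, List, Optional

# Income strata and population shares (percent) from the allocation table.
STRATA_PCT: Dict[str, int] = {
    "under $15,000": 25,
    "$15,000-29,999": 40,
    "$30,000-50,000": 30,
    "over $50,000": 5,
}


def proportional_allocation(total_n: int, pct_by_stratum: Dict[str, int]) -> Dict[str, int]:
    """Stratum sample sizes proportional to each stratum's population share."""
    if sum(pct_by_stratum.values()) != 100:
        raise ValueError("stratum shares must add up to 100 percent")
    sizes = {s: total_n * pct // 100 for s, pct in pct_by_stratum.items()}
    # Integer division can leave a few units unassigned; give them to the
    # largest strata (a simple rule chosen for this sketch).
    shortfall = total_n - sum(sizes.values())
    for s in sorted(pct_by_stratum, key=pct_by_stratum.get, reverse=True)[:shortfall]:
        sizes[s] += 1
    return sizes


def stratified_sample(population: Dict[str, List[str]], sizes: Dict[str, int],
                      seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Take a simple random sample of the allotted size within each stratum."""
    rng = random.Random(seed)
    return {s: rng.sample(population[s], sizes[s]) for s in sizes}


allocation = proportional_allocation(1000, STRATA_PCT)
print(allocation)

# Hypothetical population of 10,000 households with the same income shares.
households = {s: [f"{s} #{i}" for i in range(pct * 100)] for s, pct in STRATA_PCT.items()}
sample = stratified_sample(households, allocation, seed=11)
```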

Types of Samples
Nonprobability samples: judgement, quota, convenience
Probability samples: simple random, systematic, stratified, cluster

Cluster Sample

Divide the population into clusters: if managers are elements, then companies are clusters.
Select clusters randomly.
Survey all or a random sample of elements in each selected cluster.

Cluster Sampling
The population is first divided into separate groups of elements called clusters.
Ideally, each cluster is a representative small-scale version of the population (i.e. a heterogeneous group).
A simple random sample of the clusters is then taken.
All elements within each sampled (chosen) cluster form the sample.

Cluster Sampling
Example: A primary application is area sampling, where clusters are city blocks or other well-defined areas.
Advantage: The close proximity of elements can be cost effective (i.e. many sample observations can be obtained in a short time).
Disadvantage: This method generally requires a larger total sample size than simple or stratified random sampling.
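A sketch of one-stage cluster sampling in Python, in the spirit of the area-sampling example: a simple random sample of clusters is drawn, and every element of each chosen cluster enters the sample. The 20 city blocks and their households are hypothetical.

```python
import random
from typing import Dict, List, Optional


def cluster_sample(clusters: Dict[str, List[str]], n_clusters: int,
                   seed: Optional[int] = None) -> Dict[str, List[str]]:
    """One-stage cluster sampling: a simple random sample of clusters,
    then every element inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # SRS of cluster names
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical area sample: 20 city blocks with 5 to 8 households each.
blocks = {
    f"block-{b:02d}": [f"block-{b:02d}/house-{h}" for h in range(1, 6 + b % 4)]
    for b in range(1, 21)
}
selected = cluster_sample(blocks, n_clusters=4, seed=5)
households_surveyed = [h for members in selected.values() for h in members]
print(sorted(selected), len(households_surveyed))
```

Randomness enters only at the cluster level, which is why a cluster sample usually needs more total observations than a simple or stratified random sample to reach the same precision.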


Nonprobability Samples
1. Judgement: use experience to select the sample, e.g., test markets
2. Focus groups: an objective moderator introduces a topic to a group of respondents and directs their discussion of it in a nonstructured and natural fashion
3. Convenience: use the elements most available

Convenience Sampling
It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected.
The sample is identified primarily by convenience.
Example: A professor conducting research might use student volunteers to constitute a sample.

Convenience Sampling

Advantage: Sample selection and data collection are relatively easy.
Disadvantage: It is impossible to determine how representative of the population the sample is.

Judgment Sampling

The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population.
It is a nonprobability sampling technique.
Example: A reporter might sample three or four senators, judging them as reflecting the general opinion of the senate.

Judgment Sampling
Advantage: It is a relatively easy way of selecting a sample.
Disadvantage: The quality of the sample results depends on the judgment of the person selecting the sample.

Example: St. Andrew's
St. Andrew's College receives 900 applications annually from prospective students. The application form contains a variety of information including the individual's scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing.



The director of admissions would like to know the following information:
the average SAT score for the 900 applicants, and
the proportion of applicants that want to live on campus.



We will now look at three alternatives for obtaining the desired information:
Conducting a census of the entire 900 applicants
Selecting a sample of 30 applicants, using a random number table
Selecting a sample of 30 applicants, using Excel

Conducting a Census

If the relevant data for the entire 900 applicants were in the college's database, the population parameters of interest could be calculated using the formulas presented in Chapter 3.
We will assume for the moment that conducting a census is practical in this example.

Conducting a Census

Population Mean SAT Score
$$\mu = \frac{\sum x_i}{900} = 990$$
Population Standard Deviation for SAT Score
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$$
Population Proportion Wanting On-Campus Housing
$$p = \frac{648}{900} = .72$$

Simple Random Sampling
Now suppose that the necessary data on the current year's applicants were not yet entered in the college's database.
Furthermore, the Director of Admissions must obtain estimates of the population parameters of interest for a meeting taking place in a few hours.
She decides a sample of 30 applicants will be used.
The applicants were numbered, from 1 to 900, as their applications arrived.

Simple Random Sampling: Using a Random Number Table
Taking a Sample of 30 Applicants
Because the finite population has 900 elements, we will need 3-digit random numbers to randomly select applicants numbered from 1 to 900.
We will use the last three digits of the 5-digit random numbers in the third column of the textbook's random number table, and continue into the fourth column as needed.

Simple Random Sampling: Using a Random Number Table
Taking a Sample of 30 Applicants
The numbers we draw will be the numbers of the applicants we will sample unless
the random number is greater than 900, or
the random number has already been used.
We will continue to draw random numbers until we have selected 30 applicants for our sample.
(We will go through all of column 3 and part of column 4 of the random number table, encountering in the process five numbers greater than 900.)

Simple Random Sampling: Using a Random Number Table
Use of Random Numbers for Sampling
3-Digit Random Number   Applicant Included in Sample
744                     No. 744
436                     No. 436
865                     No. 865
790                     No. 790
835                     No. 835
902                     Number exceeds 900
190                     No. 190
836                     No. 836
. . . and so on
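The two selection rules (skip numbers greater than 900 and numbers already used) can be sketched in Python as a filter over a stream of 3-digit random numbers. The stream below holds only the first eight draws listed in the table; the rest of the textbook's random number table is not reproduced. Treating 000 (from a 5-digit number ending in 000) as a skip is an assumption added here, since no applicant is numbered 0.

```python
from typing import Iterable, List, Tuple


def select_applicants(random_numbers: Iterable[int], population_size: int = 900,
                      sample_size: int = 30) -> Tuple[List[int], List[int]]:
    """Return (selected, skipped) following the random-number-table rules."""
    selected: List[int] = []
    skipped: List[int] = []
    for number in random_numbers:
        if len(selected) == sample_size:
            break                                 # sample is complete
        if not 1 <= number <= population_size:    # e.g. 902 (or 000, by assumption)
            skipped.append(number)
        elif number in selected:                  # already used: no repeats
            skipped.append(number)
        else:
            selected.append(number)
    return selected, skipped


first_draws = [744, 436, 865, 790, 835, 902, 190, 836]   # first draws in the table
chosen, rejected = select_applicants(first_draws)
print(chosen)    # [744, 436, 865, 790, 835, 190, 836]
print(rejected)  # [902]
```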

Simple Random Sampling: Using a Random Number Table
Sample Data
No.   Random Number   Applicant         SAT Score   Live On-Campus
1     744             Conrad Harris     1025        Yes
2     436             Enrique Romero    950         Yes
3     865             Fabian Avante     1090        No
4     790             Lucila Cruz       1120        Yes
5     835             Chan Chiang       930         No
.     .               .                 .           .
30    498             Emily Morse       1010        No

Simple Random Sampling: Using a Computer
Taking a Sample of 30 Applicants
Computers can be used to generate random numbers for selecting random samples.
For example, Excel's function
= RANDBETWEEN(1,900)
can be used to generate random whole numbers between 1 and 900 (any repeated number is skipped).
Alternatively, assign each applicant a random number with Excel's =RAND() function; then we choose the 30 applicants corresponding to the 30 smallest random numbers as our sample.
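The approach of giving every applicant its own random number and keeping the 30 applicants with the smallest numbers can be mirrored in Python. This is a sketch that assumes Python's `random` module is an acceptable stand-in for Excel; the seed is arbitrary. Because the keys are independent uniform numbers, every set of 30 applicants is equally likely to end up with the 30 smallest keys, so the result is a simple random sample.

```python
import random
from typing import List, Optional


def sample_by_random_keys(n_population: int = 900, n_sample: int = 30,
                          seed: Optional[int] = None) -> List[int]:
    """Give every applicant 1..N a uniform random key, then keep the
    applicants with the n_sample smallest keys."""
    rng = random.Random(seed)
    keys = {applicant: rng.random() for applicant in range(1, n_population + 1)}
    return sorted(keys, key=keys.get)[:n_sample]


applicants = sample_by_random_keys(seed=8)
print(sorted(applicants))
```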

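Point Estimation
x̄ as point estimator of μ:
$$\bar{x} = \frac{\sum x_i}{30} = \frac{29{,}910}{30} = 997$$
s as point estimator of σ:
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\frac{163{,}996}{29}} = 75.2$$
p̄ as point estimator of p:
$$\bar{p} = \frac{20}{30} = .67$$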

Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.
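The point estimates can be checked from the sample summaries of the St. Andrew's example (n = 30, Σxᵢ = 29,910, Σ(xᵢ − x̄)² = 163,996, and 20 sampled applicants wanting on-campus housing). This sketch also compares each estimate with the census values (990, 80, .72) to show the sampling error; note that 20/30 rounds to .67.

```python
import math

# Sample summaries from the St. Andrew's example (n = 30).
n = 30
sum_x = 29_910          # sum of the 30 SAT scores
sum_sq_dev = 163_996    # sum of squared deviations (x_i - x_bar)^2
n_on_campus = 20        # sampled applicants wanting on-campus housing

x_bar = sum_x / n                       # estimates mu
s = math.sqrt(sum_sq_dev / (n - 1))     # estimates sigma; note the n - 1 divisor
p_bar = n_on_campus / n                 # estimates p

print(f"x_bar = {x_bar:.0f}, s = {s:.1f}, p_bar = {p_bar:.2f}")
# -> x_bar = 997, s = 75.2, p_bar = 0.67

# Census values from the example, used only to show each estimate's sampling error.
mu, sigma, p = 990, 80, 0.72
print(f"errors: {x_bar - mu:+.0f}, {s - sigma:+.1f}, {p_bar - p:+.2f}")
```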

Summary of Point Estimates Obtained from a Simple Random Sample
Population Parameter                               Parameter Value   Point Estimator                                    Point Estimate
μ = Population mean SAT score                      990               x̄ = Sample mean SAT score                          997
σ = Population std. deviation for SAT score        80                s = Sample std. deviation for SAT score            75.2
p = Population proportion wanting campus housing   .72               p̄ = Sample proportion wanting campus housing       .67

Sampling Distribution of x̄

Process of Statistical Inference
Population with mean μ = ?
A simple random sample of n elements is selected from the population.
The sample data provide a value for the sample mean x̄.
The value of x̄ is used to make inferences about the value of μ.


Errors Due to Sampling

Sampling Error: occurs because a sample is taken instead of a census
Errors are due to chance
Equally likely to be too high or too low
Improve by increasing sample size
Nonsampling Error: bias
A directional error
Cannot be reduced by increasing sample size

Sampling and Nonsampling errors


Two major types of errors can arise when a sampling procedure is performed.
Sampling Error
Sampling error refers to differences between the sample and the population that arise because of the specific observations that happen to be selected.
Sampling error is expected to occur when making a statement about the population based on the sample taken.

Sampling Errors
[Figure: population income distribution showing the population mean μ and a sample mean x̄; the gap between them is the sampling error.]
The sample mean falls where it does only because certain randomly selected observations were included in the sample.

Non-sampling Errors
Non-sampling errors occur due to mistakes made in the process of acquiring data.
Increasing the sample size will not reduce this type of error.
There are three types of non-sampling errors:
Errors in data acquisition
Non-response errors
Selection bias

Data Acquisition Error
If an observation taken from the population into the sample is recorded wrongly, then the sample mean is affected: the total error becomes sampling error + data acquisition error.

Non-Response Error
No response from some of the selected members of the sample may lead to biased results about the population.

Selection Bias
When parts of the population cannot be selected, the sample cannot represent the whole population.
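Selection bias can be illustrated with a simulated sampling frame that leaves out part of the population. Everything here is hypothetical: a population with 7,000 members in group A (values around 50) and 3,000 in group B (values around 80), and a flawed frame that contains only group A. Both samples use the same random mechanism and the same n; only the frame differs.

```python
import math
import random

rng = random.Random(7)


def mean(xs):
    xs = list(xs)
    return math.fsum(xs) / len(xs)


# Hypothetical population: group A (values around 50) and group B (around 80).
population = ([("A", rng.gauss(50, 10)) for _ in range(7000)]
              + [("B", rng.gauss(80, 10)) for _ in range(3000)])
values = [v for _, v in population]
mu = mean(values)

# Flawed sampling frame: members of group B can never be selected.
frame = [v for group, v in population if group == "A"]

# Same sample size and random mechanism; only the frame differs.
x_bar_frame = mean(rng.sample(frame, 2000))
x_bar_full = mean(rng.sample(values, 2000))
print(f"mu = {mu:.1f}  frame-only x̄ = {x_bar_frame:.1f}  full-population x̄ = {x_bar_full:.1f}")
```

A larger sample from the same frame would not help, because group B is still excluded.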

Errors Due to Sampling


Sampling error is expected and is
OK!!
Nonsampling error is not expected
and is BAD!!

Major Understanding of Statistics
(The 2Rs)
In most cases, we do not know the value of a population parameter, such as μ (the population mean).
So, we take a sample to obtain estimates of the population parameters.

We want the sample to be:
Representative of the population we are interested in
Random: each member of the population has an equal chance of being included
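The "Random" requirement can be checked empirically. Python's `random.sample` draws n distinct members uniformly without replacement, so each member of a population of size N should appear in about n / N of the samples. The population size, sample size, and number of trials below are hypothetical.

```python
import random
from collections import Counter

N, n, trials = 10, 3, 30_000  # hypothetical population size, sample size, repetitions
rng = random.Random(0)

counts = Counter()
for _ in range(trials):
    # Draw n distinct members; every subset of size n is equally likely.
    counts.update(rng.sample(range(N), n))

# Each member should be included in roughly n / N = 30% of the samples.
for member in range(N):
    print(member, round(counts[member] / trials, 3))
```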

Classic Example
If you were going to conduct a telephone poll for the national election in Florida, who would you call?
(Neither Katherine Harris nor Jeb Bush!)
You want a representative/random sample. In this case, what does it mean to be representative/random?
What is your population of interest?
Total population
Registered voters
Registered voters who plan to VOTE!!

Major Understanding of Statistics
(Sample Statistics)
Let's say we are interested in the average GMAT score of graduating MBA students in the United States.
Obviously, N, the population size, is very large.
It would be too time-consuming and costly to survey everyone, so we take a sample.
And of course, this sample is 2R (representative/random).

However, let's assume we know the population average GMAT score (yes, I know it's unlikely, but let's assume) and it happens to be 550 (i.e., μ = 550).
Now, we take a 2R sample of 2000 students (n = 2000).


QUESTION:
Does the sample mean, x̄, of these 2000 students have to equal 550 exactly?
YES? WRONG!
NO? RIGHT!!!

You are not sampling the entire population, so it is unlikely that the sample mean, x̄, will equal the population mean, μ, exactly (i.e., it is unlikely that μ = x̄). BUT...

We hope the sample mean is close, i.e., statistically close, and statistics tells us what "close" means.
As a result, two important statistical concepts follow.

(1) What if we took another 2R sample of 2000 graduating MBA students (NO, we did not get the same 2000 students as in the 1st sample!)? Will the sample mean from the 2nd sample, x̄₂, equal the sample mean from the 1st sample, x̄₁, i.e., will x̄₂ = x̄₁?
HIGHLY UNLIKELY.

This phenomenon of closeness, with statistical values changing from sample to sample, also holds for other statistics, such as the sample standard deviation and the sample variance.
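Concept (1) can be simulated: two independent 2R samples of 2000 from the same population give different values of x̄ and of s. Assumptions: the population is simulated with mean near 550 (the slide's μ), while the spread σ = 100 and the size N = 200,000 are hypothetical. Because the two samples are drawn independently, they may share a few students.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical population of GMAT scores: mean near 550 (from the slide);
# σ = 100 and N = 200,000 are assumptions for illustration.
population = [rng.gauss(550, 100) for _ in range(200_000)]

# Two independent simple random samples of n = 2000.
sample1 = rng.sample(population, 2000)
sample2 = rng.sample(population, 2000)

for label, sample in (("1st", sample1), ("2nd", sample2)):
    print(label, "x̄ =", round(statistics.mean(sample), 2),
          " s =", round(statistics.stdev(sample), 2))
```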

(2) Again, back to the average GMAT score of all graduating MBA students in the United States. What if we took another 2R sample, but this time our sample size is 20,000 students (n = 20,000)? We would expect the sample mean from this sample of 20,000 most likely to be closer to, and a better estimate of, the true population mean than the sample means from the samples of 2000.
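Concept (2) can be checked by repeating the sampling many times and measuring how much x̄ varies for n = 2000 versus n = 20,000. Assumptions: a simulated population with mean near 550 (from the slide), a hypothetical σ = 100 and N = 200,000. For comparison the code also prints σ/√n, the usual standard error of the mean; the simulated spread should come out close to it (slightly smaller, since samples are drawn without replacement from a finite population).

```python
import math
import random

rng = random.Random(11)

# Hypothetical population: mean near 550 (slide); σ = 100 and N = 200,000 assumed.
population = [rng.gauss(550, 100) for _ in range(200_000)]


def spread_of_means(n, reps=100):
    """Standard deviation of x̄ across `reps` repeated random samples of size n."""
    means = [math.fsum(rng.sample(population, n)) / n for _ in range(reps)]
    centre = math.fsum(means) / reps
    return math.sqrt(math.fsum((m - centre) ** 2 for m in means) / (reps - 1))


se_2000 = spread_of_means(2000)
se_20000 = spread_of_means(20_000)
print(f"n=2000: {se_2000:.2f}  (σ/√n = {100 / math.sqrt(2000):.2f})")
print(f"n=20000: {se_20000:.2f}  (σ/√n = {100 / math.sqrt(20_000):.2f})")
```

Ten times the sample size shrinks the spread of x̄ by roughly a factor of √10, which is why the larger sample is expected to be closer to μ.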
