Introduction and Descriptive Statistics

1 Introduction and Descriptive Statistics

Using Statistics
Percentiles and Quartiles
Measures of Central Tendency
Measures of Variability
Grouped Data and the Histogram
Skewness and Kurtosis
Relations between the Mean and Standard Deviation
Methods of Displaying Data
Exploratory Data Analysis
Using the Computer
1-1 Using Statistics (Two Categories)
Descriptive Statistics
Collect
Organize
Summarize
Display
Analyze
Inferential Statistics
Predict and forecast values of population parameters
Test hypotheses about values of population parameters
Make decisions
Types of Data - Two Types
Qualitative - Categorical or Nominal
Examples: Color, Gender, Nationality
Quantitative - Measurable or Countable
Examples: Temperatures, Salaries, Number of points scored on a 100-point exam
Scales of Measurement
Nominal Scale - groups or classes
Example: Gender
Ordinal Scale - order matters
Example: Ranks
Interval Scale - difference or distance matters; has an arbitrary zero value
Example: Temperatures
Ratio Scale - ratio matters; has a natural zero value
Example: Salaries
Samples and Populations
A population consists of the set of all measurements in which the investigator is interested.
A sample is a subset of the measurements selected from the population.
A census is a complete enumeration of every item in a population.
Simple Random Sample
Sampling from the population is often done randomly, such that every possible sample of equal size (n) will have an equal chance of being selected.
A sample selected in this way is called a simple random sample or just a random sample.
A random sample allows chance to determine its elements.
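
As a minimal sketch of the idea, the snippet below draws a simple random sample in Python. The population of 100 numbered measurements and the helper name simple_random_sample are assumptions made for illustration; they do not come from the slides. The standard-library call random.sample selects without replacement, so every subset of size n is equally likely.

```python
import random

# Hypothetical population: 100 measurements labeled 1..100 (illustrative only).
population = list(range(1, 101))

def simple_random_sample(items, n, seed=None):
    """Return n distinct items; every size-n subset has the same chance."""
    rng = random.Random(seed)      # a seed makes the draw reproducible
    return rng.sample(items, n)    # sampling without replacement

sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Fixing the seed is only a convenience for reproducing a draw; leaving it as None lets chance determine the elements on each run.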
Samples and Populations
[Figure: Population (N) and Sample (n)]
Why Sample?
A census of a population may be:
Impossible
Impractical
Too costly
1-2 Percentiles and Quartiles
Given any set of numerical observations, order them according to magnitude.
The Pth percentile in the ordered set is that value below which lie P% (P percent) of the observations in the set.
The position of the Pth percentile is given by (n + 1)P/100, where n is the number of observations in the set.
Example 1-2
A large department store collects data on sales made by each of its salespeople. The number of sales made on a given day by each of 20 salespeople is shown on the next slide, together with the same data sorted by magnitude.
Example 1-2 (Continued) - Sales and Sorted Sales
Sales:         9  6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17
Sorted Sales:  6  9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24
Example 1-2 (Continued) - Percentiles
Find the 50th, 80th, and 90th percentiles of this data set.
To find the 50th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(50/100) = 10.5.
Thus, the percentile is located at the 10.5th position.
The 10th observation is 16, and the 11th observation is also 16.
The 50th percentile lies halfway between the 10th and 11th values and is thus 16.

Example 1-2 (Continued) - Percentiles
To find the 80th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(80/100) = 16.8.
The 16th observation is 19, and the 17th observation is 20.
The 80th percentile is a point lying 0.8 of the way from 19 to 20 and is thus 19.8.

Example 1-2 (Continued) - Percentiles
To find the 90th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(90/100) = 18.9.
The 18th observation is 21, and the 19th observation is 22.
The 90th percentile is a point lying 0.9 of the way from 21 to 22 and is thus 21.9.
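
The position rule and the interpolation step used in Example 1-2 can be sketched in Python as follows. The function name percentile and its error handling are assumptions for illustration; only the (n + 1)P/100 rule and the sales data come from the slides.

```python
def percentile(data, p):
    """Pth percentile via the (n + 1)P/100 position rule with interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100          # 1-based position, may be fractional
    if position < 1 or position > n:
        raise ValueError("position falls outside the ordered data")
    lower = int(position)                 # whole part: rank of the lower neighbor
    fraction = position - lower           # how far to move toward the next value
    if lower == n:                        # position is exactly the last rank
        return float(ordered[-1])
    low_value, high_value = ordered[lower - 1], ordered[lower]
    return low_value + fraction * (high_value - low_value)

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

for p in (50, 80, 90):
    print(p, round(percentile(sales, p), 2))
# 50 16.0
# 80 19.8
# 90 21.9
```

Other software may use a different position rule, so percentiles from a spreadsheet or statistics package need not match these values exactly.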
Quartiles - Special Percentiles
Quartiles are the percentage points that break down the ordered data set into quarters.
The first quartile is the 25th percentile. It is the point below which lie 1/4 of the data.
The second quartile is the 50th percentile. It is the point below which lie 1/2 of the data. This is also called the median.
The third quartile is the 75th percentile. It is the point below which lie 3/4 of the data.
Quartiles and Interquartile Range
The first quartile, Q1, (25th percentile) is often called the lower quartile.
The second quartile, Q2, (50th percentile) is often called the median or the middle quartile.
The third quartile, Q3, (75th percentile) is often called the upper quartile.
The interquartile range is the difference between the third and the first quartiles (Q3 - Q1).
Example 1-3: Finding Quartiles
Sales:         9  6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17
Sorted Sales:  6  9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24

Quartile          Position (n+1)P/100       Value
First Quartile    (20+1)25/100 = 5.25       13 + (.25)(1) = 13.25
Median            (20+1)50/100 = 10.5       16 + (.5)(0) = 16
Third Quartile    (20+1)75/100 = 15.75      18 + (.75)(1) = 18.75
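
For a quick check of these quartiles, Python's standard library offers statistics.quantiles, whose "exclusive" method places the cut points at (n + 1)P/100, the same rule used here. This is a sketch for verification, not the template referred to in the slides.

```python
import statistics

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(sales, n=4, method="exclusive")
iqr = q3 - q1   # interquartile range

print(q1, q2, q3)   # 13.25 16.0 18.75
print(iqr)          # 5.5
```

Passing method="inclusive" instead would apply a different position rule and give different quartiles for the same data.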
Example 1-3: Using the Template
[Template screenshot: quartiles computed with (n+1)P/100]
Example 1-3 (Continued): Using the Template
[Template screenshot: quartiles computed with (n+1)P/100]
This is the lower part of the same template from the previous slide.
Summary Measures: Population Parameters, Sample Statistics
Measures of Central Tendency
Median
Mode
Mean
Measures of Variability
Range
Interquartile range
Variance
Standard Deviation
Other summary measures:
Skewness
Kurtosis
1-3 Measures of Central Tendency or Location
Median
Middle value when sorted in order of magnitude
50th percentile
Mode
Most frequently occurring value
Mean
Average
Example - Median (Data is used from Example 1-2)
Sales:         9  6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17
Sorted Sales:  6  9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24
See slide # 19 for the template output.
Median = 50th Percentile
Position: (20+1)50/100 = 10.5
Value: 16 + (.5)(0) = 16
The median is the middle value of data sorted in order of magnitude. It is the 50th percentile.
Example - Mode (Data is used from Example 1-2)
[Figure: dot plot of the sales data, axis from 6 to 24]
Mode = 16
The mode is the most frequently occurring value. It is the value with the highest frequency.
Arithmetic Mean or Average
The mean of a set of observations is their average: the sum of the observed values divided by the number of observations.
Population Mean
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample Mean
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Example - Mean (Data is used from Example 1-2)
Sales:  9  6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17
Sum of sales: 317
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{317}{20} = 15.85$
Example - Mode (Data is used from Example 1-2)
[Figure: dot plot of the sales data, axis from 6 to 24, with mean, median, and mode marked]
Mean = 15.85
Median and Mode = 16
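
The three measures of central tendency for the sales data can be reproduced with Python's standard statistics module. This is a small verification sketch; the variable names are chosen here for illustration.

```python
import statistics

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

mean = statistics.mean(sales)       # sum of values divided by their count
median = statistics.median(sales)   # average of the 10th and 11th sorted values
mode = statistics.mode(sales)       # most frequently occurring value

print(sum(sales), len(sales))   # 317 20
print(mean, median, mode)       # 15.85 16.0 16
```

Here the mean lies slightly below the median and mode because the two smallest values (6 and 9) pull it down.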

1-4 Measures of Variability or Dispersion
Range
Difference between maximum and minimum values
Interquartile Range
Difference between third and first quartile (Q3 - Q1)
Variance
Average* of the squared deviations from the mean
Standard Deviation
Square root of the variance
*Definitions of population variance and sample variance differ slightly.
Example - Range and Interquartile Range (Data is used from Example 1-2)
Sales:         9  6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17
Sorted Sales:  6  9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24
Rank:          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Minimum = 6 (rank 1)
Maximum = 24 (rank 20)
Range = Maximum - Minimum = 24 - 6 = 18
First Quartile: Q1 = 13 + (.25)(1) = 13.25
Third Quartile: Q3 = 18 + (.75)(1) = 18.75
Interquartile Range = Q3 - Q1 = 18.75 - 13.25 = 5.5
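Variance and Standard Deviation
Population Variance
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$
$\sigma = \sqrt{\sigma^2}$
Sample Variance
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$
$s = \sqrt{s^2}$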
Calculation of Sample Variance
x       x - x̄     (x - x̄)^2    x^2
6       -9.85     97.0225      36
9       -6.85     46.9225      81
10      -5.85     34.2225      100
12      -3.85     14.8225      144
13      -2.85     8.1225       169
14      -1.85     3.4225       196
14      -1.85     3.4225       196
15      -0.85     0.7225       225
16      0.15      0.0225       256
16      0.15      0.0225       256
16      0.15      0.0225       256
17      1.15      1.3225       289
17      1.15      1.3225       289
18      2.15      4.6225       324
18      2.15      4.6225       324
19      3.15      9.9225       361
20      4.15      17.2225      400
21      5.15      26.5225      441
22      6.15      37.8225      484
24      8.15      66.4225      576
Total:
317     0         378.5500     5403

Using the definition:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{378.55}{20 - 1} = \frac{378.55}{19} = 19.923684$

Using the computational form:
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{5403 - \frac{317^2}{20}}{20 - 1} = \frac{5403 - \frac{100489}{20}}{19} = \frac{5403 - 5024.45}{19} = \frac{378.55}{19} = 19.923684$

$s = \sqrt{s^2} = \sqrt{19.923684} = 4.46$
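
Both routes to the sample variance can be checked with a few lines of Python. This is a sketch for verification; the variable names are illustrative, and statistics.variance is used only as a cross-check because it also divides by n - 1.

```python
import math
import statistics

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
n = len(sales)
mean = sum(sales) / n

# Route 1: definition, sum of squared deviations divided by n - 1.
sum_sq_dev = sum((x - mean) ** 2 for x in sales)
var_definition = sum_sq_dev / (n - 1)

# Route 2: computational form, using only the sum of x and the sum of x squared.
sum_x = sum(sales)
sum_x2 = sum(x * x for x in sales)
var_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

std_dev = math.sqrt(var_definition)

print(sum_x, sum_x2)               # 317 5403
print(round(sum_sq_dev, 2))        # 378.55
print(round(var_definition, 6))    # 19.923684
print(round(var_shortcut, 6))      # 19.923684
print(round(std_dev, 2))           # 4.46
```

The computational form avoids computing each deviation, which is convenient by hand, but on a computer the definition form is usually the more numerically stable of the two.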
Example: Sample Variance Using the Template
[Template screenshot: quartiles computed with (n+1)P/100]
Note: This is just a replication of slide #19.
1-5 Grouped Data and the Histogram
Dividing data into groups or classes or intervals
Groups should be:
Mutually exclusive
Not overlapping - every observation is assigned to only one group
Exhaustive
Every observation is assigned to a group
Equal-width (if possible)
First or last group may be open-ended
Frequency Distribution
Table with two columns listing:

Each and every group or class or interval of values
Associated frequency of each group
Number of observations assigned to each group
Sum of frequencies is number of observations
N for population
n for sample
Class midpoint is the middle value of a group or class or interval
Relative frequency is the proportion of total observations in each class
Sum of relative frequencies = 1
Example 1-7: Frequency Distribution
Spending Class ($), x     Frequency (number of customers), f(x)     Relative Frequency, f(x)/n
0 to less than 100        30                                        0.163
100 to less than 200      38                                        0.207
200 to less than 300      50                                        0.272
300 to less than 400      31                                        0.168
400 to less than 500      22                                        0.120
500 to less than 600      13                                        0.070
Total                     184                                       1.000
Example of relative frequency: 30/184 = 0.163
Sum of relative frequencies = 1
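
A short Python sketch of the relative-frequency column: each class frequency is divided by the total number of observations. The class labels and counts are taken from Example 1-7; the variable names are illustrative.

```python
classes = [
    "0 to less than 100",
    "100 to less than 200",
    "200 to less than 300",
    "300 to less than 400",
    "400 to less than 500",
    "500 to less than 600",
]
frequencies = [30, 38, 50, 31, 22, 13]

n = sum(frequencies)                        # total number of customers
relative = [f / n for f in frequencies]     # f(x)/n for each class

for label, f, r in zip(classes, frequencies, relative):
    print(f"{label:<22}{f:>5}{r:>8.3f}")
print(f"{'Total':<22}{n:>5}{sum(relative):>8.3f}")
```

Computed this way, the last class rounds to 0.071 rather than the 0.070 shown in the table; the tabled value was presumably adjusted so that the rounded column adds to exactly 1.000.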
Cumulative Frequency Distribution
Spending Class ($), x     Cumulative Frequency, F(x)     Cumulative Relative Frequency, F(x)/n
0 to less than 100        30                             0.163
100 to less than 200      68                             0.370
200 to less than 300      118                            0.641
300 to less than 400      149                            0.810
400 to less than 500      171                            0.929
500 to less than 600      184                            1.000
The cumulative frequency of each group is the sum of the frequencies of that and all preceding groups.
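
The running sum that defines the cumulative frequency can be sketched with itertools.accumulate from the Python standard library. The class frequencies come from Example 1-7; the variable names are illustrative.

```python
from itertools import accumulate

frequencies = [30, 38, 50, 31, 22, 13]   # class frequencies from Example 1-7
n = sum(frequencies)

# Each entry is the sum of that class's frequency and all preceding ones.
cumulative = list(accumulate(frequencies))
cumulative_relative = [round(F / n, 3) for F in cumulative]

print(cumulative)            # [30, 68, 118, 149, 171, 184]
print(cumulative_relative)   # [0.163, 0.37, 0.641, 0.81, 0.929, 1.0]
```

The last cumulative frequency always equals n, so the last cumulative relative frequency is always 1.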
Histogram
A histogram is a chart made of bars of different heights.
Widths and locations of bars correspond to widths and locations of data groupings
Heights of bars correspond to frequencies or relative frequencies of data groupings
Histogram Example
[Figure: Frequency Histogram]
Histogram Example
[Figure: Relative Frequency Histogram]
1-6 Skewness and Kurtosis
Skewness
Measure of asymmetry of a frequency distribution
Skewed to left
Symmetric or unskewed
Skewed to right
Kurtosis
Measure of flatness or peakedness of a frequency distribution
Platykurtic (relatively flat)
Mesokurtic (normal)
Leptokurtic (relatively peaked)
Skewness
[Figure: Skewed to left]
Skewness
[Figure: Symmetric]
Skewness
[Figure: Skewed to right]
Kurtosis
[Figure: Platykurtic - flat distribution]
Kurtosis
[Figure: Mesokurtic - not too flat and not too peaked]
Kurtosis
[Figure: Leptokurtic - peaked distribution]
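
The slides describe skewness and kurtosis only qualitatively. As a rough numerical sketch, one common convention (an assumption here, not a formula from the slides) uses standardized central moments: skewness is the third moment divided by the 1.5 power of the second, and kurtosis is the fourth moment divided by the square of the second. Under that convention a normal distribution has kurtosis 3, smaller values suggest a flatter (platykurtic) shape, and larger values a more peaked (leptokurtic) one.

```python
def central_moment(data, k):
    """Average of the k-th powers of deviations from the mean (divides by n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    """Moment-based skewness: negative = skewed left, positive = skewed right."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Moment-based kurtosis: about 3 for a normal (mesokurtic) shape."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

print(skewness(sales) < 0)   # True: the sales data are skewed to the left
print(kurtosis(sales))
```

Statistical packages often apply small-sample corrections or report kurtosis minus 3, so their output may differ from these plain moment ratios.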
1-7 Relations between the Mean and Standard Deviation
Chebyshev's Theorem
Applies to any distribution, regardless of shape
Places lower limits on the percentages of observations within a given number of standard deviations from the mean
Empirical Rule
Applies only to roughly mound-shaped and symmetric distributions
Specifies approximate percentages of observations within a given number of standard deviations from the mean
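Chebyshev's Theorem
At least $1 - \frac{1}{k^2}$ of the elements of any distribution lie within k standard deviations of the mean.
k = 2: at least $1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\%$ lie within 2 standard deviations of the mean
k = 3: at least $1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx 89\%$ lie within 3 standard deviations of the mean
k = 4: at least $1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} \approx 94\%$ lie within 4 standard deviations of the mean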
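
The bound can be sketched in Python and compared with what the sales data of Example 1-2 actually show. The function names are illustrative, and using the sample standard deviation (s = 4.46) for the comparison is a choice made here, not something the slides specify.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum fraction of any distribution within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed fraction of data within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)   # sample standard deviation (n - 1 divisor)
    inside = [x for x in data if abs(x - mean) <= k * sd]
    return len(inside) / len(data)

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

print(chebyshev_lower_bound(2))           # 0.75
print(fraction_within(sales, 2))          # 0.95
print(round(chebyshev_lower_bound(3), 4)) # 0.8889
print(fraction_within(sales, 3))          # 1.0
```

The observed fractions exceed the bounds comfortably, which is typical: Chebyshev's theorem is a guaranteed minimum, not an estimate.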
Empirical Rule
For roughly mound-shaped and symmetric distributions, approximately:
68% lie within 1 standard deviation of the mean
95% lie within 2 standard deviations of the mean
All lie within 3 standard deviations of the mean
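
One way to see where these percentages come from is to take the normal distribution as the model of a mound-shaped, symmetric distribution (an assumption made here for illustration) and compute the probability within k standard deviations of the mean using statistics.NormalDist from the Python standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def prob_within(k):
    """Probability that a normal value lies within k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The value for k = 3 shows that "all" in the rule means nearly all: about 99.7% under the normal model.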
1-8 Methods of Displaying Data
Pie Charts
Categories represented as percentages of total
Bar Graphs
Heights of rectangles represent group frequencies
Frequency Polygons
Height of line represents frequency
Ogives
Height of line represents cumulative frequency
Time Plots
Represents values over time
Pie Chart
[Figure: pie chart]
Bar Chart
[Figure 1-11: Airline Operating Expenses and Revenues. Bars show Average Revenues and Average Expenses by Airline: American, Continental, Delta, Northwest, Southwest, United, USAir]
Frequency Polygon and Ogive
[Figure: Relative Frequency Polygon. Horizontal axis: Sales (0 to 50); vertical axis: Relative Frequency (0.0 to 0.3)]
[Figure: Ogive. Horizontal axis: Sales (0 to 50); vertical axis: Cumulative Relative Frequency (0.0 to 1.0)]
Time Plot
[Figure: Monthly Steel Production (Problem 1-46). Horizontal axis: Month; vertical axis: Millions of Tons (5.5 to 8.5)]
1-9 Exploratory Data Analysis - EDA
Techniques to determine relationships and trends, identify outliers and influential observations, and quickly describe or summarize data sets.
Stem-and-Leaf Displays
Quick-and-dirty listing of all observations
Conveys some of the same information as a histogram
Box Plots
Median
Lower and upper quartiles
Maximum and minimum
Example 1-8: Stem-and-Leaf Display
1 | 122355567
2 | 0111222346777899
3 | 012457
4 | 11257
5 | 0236
6 | 02
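
A stem-and-leaf display is simple to build by hand or in code. The sketch below applies the idea to the sales data of Example 1-2 rather than the data behind Example 1-8, and assumes the tens digit is the stem and the units digit is the leaf; the function name is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines with the tens digit as stem and units digit as leaf."""
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # e.g. 17 -> stem 1, leaf 7
        leaves_by_stem[stem].append(str(leaf))
    # Note: stems with no observations are simply omitted in this sketch.
    return [f"{stem} | {''.join(leaves)}"
            for stem, leaves in sorted(leaves_by_stem.items())]

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

for line in stem_and_leaf(sales):
    print(line)
# 0 | 69
# 1 | 02344566677889
# 2 | 0124
```

Turned on its side, the display reads like a histogram with class width 10 while still listing every observation.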
Box Plot
Elements of a Box Plot
[Figure: box plot with the following elements labeled, from left to right]
Outer Fence: Q1 - 3(IQR)
Inner Fence: Q1 - 1.5(IQR)
Smallest data point not below inner fence
Q1
Median
Q3
Largest data point not exceeding inner fence
Inner Fence: Q3 + 1.5(IQR)
Outer Fence: Q3 + 3(IQR)
Interquartile Range: the span of the box, from Q1 to Q3
Also labeled: Suspected outlier, Outlier
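
The fences can be computed directly from the quartiles. The sketch below uses the sales data of Example 1-2 and the (n + 1)P/100 quartiles. The classification rule in classify follows a common convention and is an assumption here: points beyond an inner fence but within the outer fence are treated as suspected outliers, and points beyond an outer fence as outliers.

```python
import statistics

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

q1, median, q3 = statistics.quantiles(sales, n=4, method="exclusive")
iqr = q3 - q1

inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer_fences = (q1 - 3 * iqr, q3 + 3 * iqr)

# Whiskers end at the most extreme data points that stay inside the inner fences.
whisker_low = min(x for x in sales if x >= inner_fences[0])
whisker_high = max(x for x in sales if x <= inner_fences[1])

def classify(x):
    """Assumed convention: beyond outer fence = outlier, beyond inner = suspected."""
    if x < outer_fences[0] or x > outer_fences[1]:
        return "outlier"
    if x < inner_fences[0] or x > inner_fences[1]:
        return "suspected outlier"
    return "inside fences"

print(inner_fences)                 # (5.0, 27.0)
print(outer_fences)                 # (-3.25, 35.25)
print(whisker_low, whisker_high)    # 6 24
print(classify(30), classify(40))   # suspected outlier outlier
```

For these data every observation lies inside the inner fences, so the whiskers run to the minimum and maximum; the values 30 and 40 passed to classify are hypothetical and serve only to show the two outlier categories.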
Example: Box Plot
[Figure: box plot example]
1-10 Using the Computer - The Template Output
Using the Computer - Template Output for the Histogram
Using the Computer - Template Output for Histograms for Grouped Data
Using the Computer - Template Output for Frequency Polygons & the Ogive for Grouped Data
Using the Computer - Template Output for Two Frequency Polygons for Grouped Data
Using the Computer - Pie Chart Template Output
Using the Computer - Bar Chart Template Output
Using the Computer - Box Plot Template Output
Using the Computer - Box Plot Template to Compare Two Data Sets
Using the Computer - Time Plot Template
Using the Computer - Time Plot Comparison Template
