Descriptive Statistics
2-1 Frequency Distributions
2-2 Displaying Data
2-3 Measures of the Center
2-4 Measures of Dispersion
2-5 Measures of Position
2-6 Bivariate Data
What Is Statistics?
Statistics: the collection, organization, interpretation, and presentation of data
2-1 Frequency Distributions
Frequency Distribution
a table or graph that lists the classes (or
categories) of values, along with the frequencies
(or counts) of the number of values that fall
into each class
Qwerty Keyboard Word Ratings
2 2 5 1 2 6 3 3 4 2
4 0 5 7 7 5 6 6 8 10
7 2 2 10 5 8 2 5 4 2
6 2 6 1 7 2 7 2 3 8
1 5 2 5 2 14 2 2 6 3
1 7
Frequency Table of Qwerty Word Ratings
Class / Category   Frequency   Cumulative Frequency
0-2                20          20
3-5                14          34
6-8                15          49
9-11               2           51
12-14              1           52
Total              52          ---
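The frequency and cumulative frequency columns above can be checked with a short Python sketch. It assumes classes of width 3 starting at 0, matching the table; the names `ratings`, `class_frequencies`, `freqs`, and `cumulative` are illustrative and do not come from the slides.

```python
from itertools import accumulate

# Qwerty keyboard word ratings (52 values, from the data slide)
ratings = [
    2, 2, 5, 1, 2, 6, 3, 3, 4, 2,
    4, 0, 5, 7, 7, 5, 6, 6, 8, 10,
    7, 2, 2, 10, 5, 8, 2, 5, 4, 2,
    6, 2, 6, 1, 7, 2, 7, 2, 3, 8,
    1, 5, 2, 5, 2, 14, 2, 2, 6, 3,
    1, 7,
]

# Classes 0-2, 3-5, ..., 12-14 (assumed: first lower limit 0, class width 3)
classes = [(low, low + 2) for low in range(0, 15, 3)]

def class_frequencies(data, classes):
    """Count how many values fall in each (lower, upper) class, limits inclusive."""
    return [sum(1 for x in data if low <= x <= high) for low, high in classes]

freqs = class_frequencies(ratings, classes)
cumulative = list(accumulate(freqs))  # running totals of the frequencies

for (low, high), f, cf in zip(classes, freqs, cumulative):
    print(f"{low}-{high}\t{f}\t{cf}")
```

The last cumulative frequency always equals the total number of values, which is a quick check that every value landed in exactly one class.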
Lower Class Limits
are the smallest numbers that can actually belong to
the different classes
In the rating table, the lower class limits are 0, 3, 6, 9, and 12.
Upper Class Limits
are the largest numbers that can actually belong to
the different classes
In the rating table, the upper class limits are 2, 5, 8, 11, and 14.
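Class Midpoints
the value midway between the lower and upper class limits of each class
$$\text{class midpoint} = \frac{\text{lower class limit} + \text{upper class limit}}{2}$$
Rating   Class Midpoint   Frequency
0-2      1                20
3-5      4                14
6-8      7                15
9-11     10               2
12-14    13               1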
Class Width
is the difference between two consecutive lower
class limits or two consecutive class boundaries
In the rating table, every class has width 3 (for example, 3 - 0 = 3 and 6 - 3 = 3).
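The class limits, midpoints, and width described above can all be derived from the list of classes. This is a minimal sketch assuming the classes are given as (lower limit, upper limit) pairs; the variable names are hypothetical.

```python
# Classes of the Qwerty rating table, written as (lower limit, upper limit)
classes = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]

lower_limits = [low for low, _ in classes]
upper_limits = [high for _, high in classes]

# Class midpoint = (lower class limit + upper class limit) / 2
midpoints = [(low + high) / 2 for low, high in classes]

# Class width = difference between consecutive lower class limits
widths = [b - a for a, b in zip(lower_limits, lower_limits[1:])]

print("lower limits:", lower_limits)
print("upper limits:", upper_limits)
print("midpoints:", midpoints)
print("widths:", widths)
```

Because every consecutive difference is 3, the table uses equal-width classes.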
Guidelines for Frequency Distributions
1. Be sure that the categories are mutually exclusive: each data value belongs to exactly one class.
Example 2
Twenty students were asked how many hours they worked
per day. Their responses, in hours, are as follows:
5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3
Data Value   Frequency   Cumulative Frequency
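One way to fill in a table with one row per data value is to count each distinct value and then keep a running total. This is a sketch using Python's `collections.Counter` and `itertools.accumulate`; `hours`, `values`, `freqs`, and `cum_freqs` are illustrative names.

```python
from collections import Counter
from itertools import accumulate

# Hours worked per day by twenty students (Example 2)
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

counts = Counter(hours)
values = sorted(counts)                # each distinct data value, ascending
freqs = [counts[v] for v in values]    # frequency of each value
cum_freqs = list(accumulate(freqs))    # cumulative frequency

print("Data Value  Frequency  Cumulative Frequency")
for v, f, cf in zip(values, freqs, cum_freqs):
    print(f"{v:>10}  {f:>9}  {cf:>20}")
```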
Relative Frequency Distribution
$$\text{relative frequency} = \frac{\text{category frequency}}{\text{sum of all frequencies}}$$
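Applying the relative frequency formula to the Qwerty rating classes takes one division per class. A minimal sketch, assuming the class frequencies from the rating table are already known:

```python
# Class frequencies from the Qwerty rating table
classes = ["0-2", "3-5", "6-8", "9-11", "12-14"]
freqs = [20, 14, 15, 2, 1]

total = sum(freqs)  # sum of all frequencies

# relative frequency = category frequency / sum of all frequencies
relative = [f / total for f in freqs]

for c, r in zip(classes, relative):
    print(f"{c}\t{r:.1%}")
```

The relative frequencies always add up to 1 (100%), although the rounded percentages may not add to exactly 100%.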
Relative Frequency Distribution
Rating   Frequency       Rating   Relative Frequency
0-2      20              0-2      38.5%
3-5      14              3-5      26.9%
6-8      15              6-8      28.8%
9-11     2               9-11     3.8%
12-14    1               12-14    1.9%
Total frequency = 52
Cumulative Frequency Distribution
Rating   Frequency       Rating   Cumulative Frequency
0-2      20              0-2      20
3-5      14              0-5      34
6-8      15              0-8      49
9-11     2               0-11     51
12-14    1               0-14     52
Frequency Distributions
Rating   Frequency   Relative Frequency   Cumulative Frequency
0-2      20          38.5%                20
3-5      14          26.9%                34
6-8      15          28.8%                49
9-11     2           3.8%                 51
12-14    1           1.9%                 52
2-2 Visualizing Data
Histogram
a bar graph in which the horizontal scale
represents categories and the vertical scale
represents frequencies
Histogram of Qwerty Word Ratings
Rating   Frequency
0-2      20
3-5      14
6-8      15
9-11     2
12-14    1
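Without a plotting library, the shape of this histogram can still be sketched in text. This example draws horizontal bars (one `#` per value) instead of the vertical bars of a true histogram, an assumption made to keep it text-only; `text_histogram` is a hypothetical helper, not a standard function.

```python
# Class frequencies from the Qwerty rating table
classes = ["0-2", "3-5", "6-8", "9-11", "12-14"]
freqs = [20, 14, 15, 2, 1]

def text_histogram(labels, counts, symbol="#"):
    """Return one line per class: the label, then a bar whose length equals the frequency."""
    width = max(len(label) for label in labels)
    return [f"{label:<{width}} | {symbol * n} {n}" for label, n in zip(labels, counts)]

for line in text_histogram(classes, freqs):
    print(line)
```

Replacing `freqs` with the relative or cumulative frequencies (scaled to whole numbers) gives the other histograms in this section.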
Relative Frequency Histogram
of Qwerty Word Ratings
Rating   Relative Frequency
0-2      38.5%
3-5      26.9%
6-8      28.8%
9-11     3.8%
12-14    1.9%
[Figure: Histogram and Relative Frequency Histogram]
Cumulative Frequency Histogram of
Qwerty Word Ratings
Rating   Cumulative Frequency
0-2      20
0-5      34
0-8      49
0-11     51
0-14     52
Example 2
Twenty students were asked how many hours they worked
per day. Their responses, in hours, are as follows:
5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3
Draw histograms of the frequency, cumulative frequency, and relative frequency.
Data Value   Frequency   Cumulative Frequency   Relative Frequency
Stem-and-Leaf Plot
Stem | Leaves
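The slide gives no data for the stem-and-leaf plot, so this sketch applies it to the Qwerty ratings as an assumption. It also assumes non-negative whole numbers, with the tens digit as the stem and the units digit as the leaf; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted leaves (units digits).

    Stems with no values are omitted rather than shown as empty rows.
    """
    plot = defaultdict(list)
    for x in sorted(data):          # sorting keeps stems and leaves in ascending order
        plot[x // 10].append(x % 10)
    return dict(plot)

# Qwerty keyboard word ratings (52 values)
ratings = [
    2, 2, 5, 1, 2, 6, 3, 3, 4, 2,
    4, 0, 5, 7, 7, 5, 6, 6, 8, 10,
    7, 2, 2, 10, 5, 8, 2, 5, 4, 2,
    6, 2, 6, 1, 7, 2, 7, 2, 3, 8,
    1, 5, 2, 5, 2, 14, 2, 2, 6, 3,
    1, 7,
]

for stem, leaves in stem_and_leaf(ratings).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```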
Dot Plot
Pie Chart
Pie charts and Pareto charts can illustrate the same data.
[Figure: pie chart with slices labeled Motor vehicle (43,500; 57.8%), Poison (6,400; 8.5%), Drowning (4,600; 6.1%), Fire (4,200; 5.6%), Ingestion of food or object (2,900; 3.9%), Firearms (1,400; 1.9%)]
2-3 Measures of Center
Measure of Center
a value at the center or middle
of a data set
Definitions
Mean
(Arithmetic Mean)
AVERAGE
the number obtained by adding the values
and dividing the total by the number of
values
Notation
Σ (capital Greek letter sigma) denotes the addition of a set of values
Notation
$\bar{x}$ is pronounced ‘x-bar’ and denotes the mean of a set
of sample values
$$\bar{x} = \frac{\sum x}{n}$$
$\mu$ is pronounced ‘mu’ and denotes the mean of all values in a
population
$$\mu = \frac{\sum x}{N}$$
Most calculators can compute the mean of a data set directly.
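The sample-mean formula can be applied directly to the hours from Example 2 and compared with Python's standard-library `statistics.mean`, which plays the role of the calculator's mean function here. A minimal sketch:

```python
from statistics import mean

# Hours worked per day by twenty students (Example 2)
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

# x-bar = (sum of x) / n
n = len(hours)
x_bar = sum(hours) / n

print(x_bar)        # computed from the formula
print(mean(hours))  # same value from the standard library
```

The population mean μ uses the same arithmetic; only the symbol changes (N instead of n) when the data are the whole population.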
Definitions
Median
the middle value when the original
data values are arranged in order of
increasing (or decreasing) magnitude
Example (even number of values)
Data: 6.72 3.46 3.60 6.44
Sorted: 3.46 3.60 6.44 6.72
There is no exact middle value; the middle is shared by two numbers, so the median is their mean:
$$\text{median} = \frac{3.60 + 6.44}{2} = 5.02$$
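The median rule (sort, then take the middle value or the mean of the two middle values) can be sketched as a small function and compared with `statistics.median` from Python's standard library. `median_by_hand` is an illustrative name.

```python
from statistics import median

def median_by_hand(data):
    """Sort the values; return the middle one, or the mean of the two middle ones."""
    values = sorted(data)
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]                      # odd count: exact middle value
    return (values[mid - 1] + values[mid]) / 2  # even count: shared by two values

sample = [6.72, 3.46, 3.60, 6.44]
print(median_by_hand(sample))
print(median(sample))
```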
Examples
a. 5 5 5 3 1 5 1 4 3 5          Mode is 5
b. 1 2 2 2 3 4 5 6 6 6 7 9      Bimodal: 2 and 6
c. 1 2 3 6 7 8 9 10             No mode
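The three mode examples can be reproduced with a small counting function. Following example c, this sketch assumes "no mode" means no value occurs more than once. Note that the standard-library `statistics.multimode` would instead return every value for example c. `modes` is a hypothetical helper name.

```python
from collections import Counter

def modes(data):
    """Return every value with the highest frequency, or [] if no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no value occurs more than once: "no mode"
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 5, 5, 3, 1, 5, 1, 4, 3, 5]))        # a. [5]
print(modes([1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]))  # b. [2, 6]  (bimodal)
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))             # c. []      (no mode)
```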
Definitions
Midrange
the value midway between the highest
and lowest values in the original data set
$$\text{midrange} = \frac{\text{highest value} + \text{lowest value}}{2}$$
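The midrange uses only the two extreme values, which is why it is easy to compute but sensitive to outliers. A minimal sketch applied to the Example 2 hours; `midrange` is an illustrative function name.

```python
def midrange(data):
    """Value midway between the highest and lowest values."""
    return (max(data) + min(data)) / 2

# Hours worked per day by twenty students (Example 2)
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]
print(midrange(hours))  # (7 + 2) / 2
```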
Best Measure of Center
Advantages and Disadvantages
Measure    How often used?   Takes every value into account?   Affected by extreme values?
Mean       Most familiar     Yes                               Yes
Median     Commonly          No                                No
Mode       Sometimes         No                                No
Midrange   Rarely            No                                Yes
Definitions
Symmetric
Data is symmetric if the left half of its
histogram is roughly a mirror of its
right half.
Skewed
Data is skewed if it is not symmetric
and if it extends more to one side than
the other.
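Skewness
Sample data: 2 3 3 4 4 4 5 5 6
Median = 4
Mode = 4
Mean = 4
The frequencies of the values 2, 3, 4, 5, 6 are 1, 2, 3, 2, 1. They mirror each other around 4, so the data are symmetric and mean = median = mode.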
Skewness
Summit Bank 6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7 7.7 7.7
National Bank 4.2 5.4 5.8 6.2 6.7 7.7 7.7 8.5 9.3 10.0
[Figure: Dotplots of Waiting Times, Summit Bank and National Bank]
Range
$$\text{range} = \text{highest value} - \text{lowest value}$$
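The two banks' waiting times show why a measure of variation is needed. A quick computation gives both banks the same mean (7.15) but very different ranges. This is a sketch using the values listed above; `data_range` is an illustrative name, chosen to avoid shadowing Python's built-in `range`.

```python
summit = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
national = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

def data_range(data):
    """Range = highest value - lowest value."""
    return max(data) - min(data)

for name, times in [("Summit Bank", summit), ("National Bank", national)]:
    mean = sum(times) / len(times)
    print(f"{name}: mean = {mean:.2f}, range = {data_range(times):.1f}")
```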
Measures of Variation
Standard Deviation
a measure of variation of the scores
about the mean
Sample Standard Deviation
Formula
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
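The two standard deviation formulas differ only in the divisor (n - 1 versus N). This sketch implements both directly and treats each bank's ten waiting times first as a sample and then as a whole population, an assumption made purely for illustration. The function names are hypothetical; the results can be cross-checked against the standard library's `statistics.stdev` and `statistics.pstdev`.

```python
import math
import statistics

def sample_sd(data):
    """s = sqrt( sum((x - x_bar)^2) / (n - 1) )"""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

def population_sd(data):
    """sigma = sqrt( sum((x - mu)^2) / N )"""
    N = len(data)
    mu = sum(data) / N
    return math.sqrt(sum((x - mu) ** 2 for x in data) / N)

summit = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
national = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

for name, times in [("Summit Bank", summit), ("National Bank", national)]:
    print(f"{name}: s = {sample_sd(times):.3f} "
          f"(stdev {statistics.stdev(times):.3f}), "
          f"sigma = {population_sd(times):.3f} "
          f"(pstdev {statistics.pstdev(times):.3f})")
```

National Bank's larger standard deviation matches its wider spread in the dotplots.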
Symbols for Standard Deviation
                                 Sample    Population
Most textbooks                   s         σ
Some graphics calculators        Sx        σx
Some non-graphics calculators    xσn-1     xσn
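The Empirical Rule
(applies to bell-shaped distributions)
About 68% of data are within 1 standard deviation of the mean ($\bar{x} - s$ to $\bar{x} + s$)
About 95% are within 2 standard deviations ($\bar{x} - 2s$ to $\bar{x} + 2s$)
About 99.7% are within 3 standard deviations ($\bar{x} - 3s$ to $\bar{x} + 3s$)
[Figure: bell curve marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$; on each side of the mean the bands hold 34% (between $\bar{x}$ and 1s), 13.5% (between 1s and 2s), 2.4% (between 2s and 3s), and 0.1% (beyond 3s)]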
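The empirical rule's percentages are rounded values of the exact normal-curve proportions. For a normal distribution, the proportion within k standard deviations of the mean is erf(k/√2). This sketch uses Python's `math.erf`, and `within_k_sd` is an illustrative name. It assumes a perfectly normal distribution, which real bell-shaped data only approximate.

```python
import math

def within_k_sd(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.1%}")
```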
Measures of Variation
Variance
standard deviation squared
Notation
$s^2$ (sample variance) and $\sigma^2$ (population variance)
use the square key ($x^2$) on a calculator
Variance
Which is the parameter and which is the statistic?
Sample variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Population variance:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
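Since the variance is the standard deviation squared, squaring either standard deviation should reproduce the matching variance. This sketch checks that with Python's `statistics` module on the Summit Bank waiting times, which are used here only as sample data.

```python
import statistics

summit = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]

s = statistics.stdev(summit)       # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(summit)  # population standard deviation (divides by N)

# Variance is the standard deviation squared
print(statistics.variance(summit), s ** 2)   # sample variance s^2
print(statistics.pvariance(summit), sigma ** 2)  # population variance sigma^2
```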
Measures of Variation (Dispersion) and Measures of the Center
Center       Variation / Dispersion
Mean         Range
Midrange     Variance
Median
Assignment No 1
Twenty students were asked how many hours they worked
per day. Their responses, in hours, are as follows:
5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3