
Chapter 2

Descriptive Statistics

2-1 Frequency Distributions
2-2 Displaying Data
2-3 Measures of the Center
2-4 Measures of Dispersion
2-5 Measures of Position
2-6 Bivariate Data

What is Statistics?
Statistics: the collection, organization, interpretation, and presentation of data

2-1 Frequency Distributions

• Frequency Distribution
a table or graph that lists the classes (or
categories) of values, along with the frequencies
(or counts) of the values that fall
into each class

Qwerty Keyboard Word Ratings
2 2 5 1 2 6 3 3 4 2

4 0 5 7 7 5 6 6 8 10

7 2 2 10 5 8 2 5 4 2

6 2 6 1 7 2 7 2 3 8

1 5 2 5 2 14 2 2 6 3

1 7

Frequency Table of Qwerty Word Ratings

Class / Category   Frequency   Cumulative Frequency
0-2                20          20
3-5                14          34
6-8                15          49
9-11               2           51
12-14              1           52
Total              52          ---

Interpretation: In this sample, twenty word ratings fall within the
interval 0-2, fourteen fall within 3-5, fifteen fall within 6-8, two
fall within 9-11, and only one falls within 12-14.
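
As a rough sketch in Python (not part of the original slides), the frequency table above can be rebuilt from the raw ratings. The names `ratings`, `CLASS_WIDTH`, and `frequencies` are illustrative choices, and the class width of 3 is read off the table.

```python
from collections import Counter

# Qwerty word ratings, copied from the data slide (52 values).
ratings = [
    2, 2, 5, 1, 2, 6, 3, 3, 4, 2,
    4, 0, 5, 7, 7, 5, 6, 6, 8, 10,
    7, 2, 2, 10, 5, 8, 2, 5, 4, 2,
    6, 2, 6, 1, 7, 2, 7, 2, 3, 8,
    1, 5, 2, 5, 2, 14, 2, 2, 6, 3,
    1, 7,
]

CLASS_WIDTH = 3  # assumption: classes are 0-2, 3-5, 6-8, 9-11, 12-14

# Map each rating to the lower limit of its class, then count per class.
counts = Counter((r // CLASS_WIDTH) * CLASS_WIDTH for r in ratings)

lower_limits = list(range(0, 15, CLASS_WIDTH))
# Use .get(..., 0) so that a class with no values would still be listed.
frequencies = [counts.get(lo, 0) for lo in lower_limits]

for lo, f in zip(lower_limits, frequencies):
    print(f"{lo}-{lo + CLASS_WIDTH - 1}: {f}")
```

The frequencies must add up to the number of original data values, which is a quick sanity check on any hand-built table.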
Lower Class Limits
are the smallest numbers that can actually belong to
the different classes

Rating    Frequency   Lower Class Limit
0-2       20          0
3-5       14          3
6-8       15          6
9-11      2           9
12-14     1           12

Upper Class Limits
are the largest numbers that can actually belong to
the different classes

Rating    Frequency   Upper Class Limit
0-2       20          2
3-5       14          5
6-8       15          8
9-11      2           11
12-14     1           14

Class Midpoints
midpoints of the classes

Rating    Class Midpoint   Frequency
0-2       1                20
3-5       4                14
6-8       7                15
9-11      10               2
12-14     13               1

Class Width
is the difference between two consecutive lower
class limits or two consecutive class boundaries

Rating    Frequency   Class Width
0-2       20          3
3-5       14          3
6-8       15          3
9-11      2           3
12-14     1           3
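
The class limits, midpoints, and widths on the last few slides can be derived from one another. The following is a minimal Python sketch; the variable names are illustrative and not from the slides.

```python
# Class limits taken from the Qwerty rating classes 0-2, 3-5, ..., 12-14.
lower_limits = [0, 3, 6, 9, 12]
upper_limits = [2, 5, 8, 11, 14]

# Midpoint of a class: halfway between its lower and upper limit.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class width: difference between consecutive lower class limits.
widths = [b - a for a, b in zip(lower_limits, lower_limits[1:])]

print(midpoints)  # one midpoint per class
print(widths)     # one width per pair of neighbouring classes
```

Note that the width is 3, not upper limit minus lower limit (which would give 2), because each class contains three integer ratings.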

Guidelines For Frequency
Distributions
1. Be sure that the categories are mutually exclusive.

2. Include all categories, even if the frequency is zero.

3. Try to use the same width for all categories.

4. Select convenient numbers for category limits.

5. The sum of the category frequencies must equal the number of original data values.

Example 2
Twenty students were asked how many hours they worked
per day. Their responses, in hours, are as follows:

5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3.

Data Value   Frequency   Cumulative Frequency

Relative Frequency Distribution

$\text{relative frequency} = \dfrac{\text{category frequency}}{\text{sum of all frequencies}}$

Relative Frequency Distribution

Rating    Frequency   Relative Frequency
0-2       20          20/52 = 38.5%
3-5       14          14/52 = 26.9%
6-8       15          15/52 = 28.8%
9-11      2           2/52 = 3.8%
12-14     1           1/52 = 1.9%

Total frequency = 52
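
The relative frequency formula is a single division per class. A small Python sketch, starting from the frequencies in the table (names are illustrative):

```python
classes = ["0-2", "3-5", "6-8", "9-11", "12-14"]
frequencies = [20, 14, 15, 2, 1]

total = sum(frequencies)  # sum of all frequencies (52 here)

# relative frequency = category frequency / sum of all frequencies
relative = [f / total for f in frequencies]

for label, share in zip(classes, relative):
    print(f"{label}: {share:.1%}")
```

The relative frequencies always sum to 1 (100%), up to rounding.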

Cumulative Frequency Distribution

Rating    Frequency   Rating          Cumulative Frequency
0-2       20          Less than 3     20
3-5       14          Less than 6     34
6-8       15          Less than 9     49
9-11      2           Less than 12    51
12-14     1           Less than 15    52
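
Cumulative frequencies are running totals of the class frequencies. One way to sketch this in Python uses `itertools.accumulate` from the standard library; the variable names are illustrative.

```python
from itertools import accumulate

upper_limits = [2, 5, 8, 11, 14]
frequencies = [20, 14, 15, 2, 1]

# Running total: each entry is the sum of all frequencies up to that class.
cumulative = list(accumulate(frequencies))

for hi, c in zip(upper_limits, cumulative):
    print(f"Less than {hi + 1}: {c}")
```

The last cumulative frequency equals the total number of data values.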

Frequency Distributions

Rating    Frequency   Relative Frequency   Rating          Cumulative Frequency
0-2       20          38.5%                Less than 3     20
3-5       14          26.9%                Less than 6     34
6-8       15          28.8%                Less than 9     49
9-11      2           3.8%                 Less than 12    51
12-14     1           1.9%                 Less than 15    52

2-2 Visualizing Data

Histogram
a bar graph in which the horizontal scale
represents categories and the vertical scale
represents frequencies

Histogram of Qwerty Word Ratings

Rating Frequency

0-2 20
3-5 14
6-8 15
9 - 11 2
12 - 14 1
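
Since the plotted histogram itself is not reproduced here, a crude text version can stand in for it. This is only a sketch: it prints bars sideways, one `#` per observation, and the names are illustrative.

```python
classes = ["0-2", "3-5", "6-8", "9-11", "12-14"]
frequencies = [20, 14, 15, 2, 1]

# One '#' per observation; bar length plays the role of bar height.
bars = {label: "#" * f for label, f in zip(classes, frequencies)}

for label, bar in bars.items():
    print(f"{label:>5} | {bar} ({len(bar)})")
```

A plotting library would normally be used for a real histogram; the text version only shows how frequencies map to bar sizes.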

Relative Frequency Histogram
of Qwerty Word Ratings
Rating    Relative Frequency

0-2 38.5%
3-5 26.9%
6-8 28.8%
9 - 11 3.8%
12 - 14 1.9%

Histogram
and
Relative Frequency Histogram

Cumulative Frequency Histogram of
Qwerty Word Ratings
Rating    Cumulative Frequency

0-2 20
0-5 34
0-8 49
0 - 11 51
0 - 14 52

Example 2
Twenty students were asked how many hours they worked
per day. Their responses, in hours, are as follows:
5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3.
Draw histograms of the frequency, cumulative frequency, and relative frequency.

Data Value   Frequency   Cumulative Frequency   Relative Frequency

Stem-and-Leaf Plot

Raw Data (Test Grades)
67 72 85 75 89
89 88 90 99 100

Stem   Leaves
6      7
7      2 5
8      5 8 9 9
9      0 9
10     0

Used to observe the distribution of data
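
The stem-and-leaf plot above splits each grade into a stem (the tens) and a leaf (the ones digit). A minimal Python sketch of that split, with illustrative names:

```python
from collections import defaultdict

# Test grades from the slide.
grades = [67, 72, 85, 75, 89, 89, 88, 90, 99, 100]

stems = defaultdict(list)
for g in sorted(grades):
    # Stem = all digits except the last; leaf = the last digit.
    stems[g // 10].append(g % 10)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem:>2} | {leaves}")
```

Sorting the grades first guarantees that the leaves in each row come out in increasing order.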

Dot Plot

Pie Chart
Pie charts and Pareto charts can illustrate the same data.

Accidental Deaths by Type

Type                           Deaths    Share
Motor vehicle                  43,500    57.8%
Falls                          12,200    16.2%
Poison                         6,400     8.5%
Drowning                       4,600     6.1%
Fire                           4,200     5.6%
Ingestion of food or object    2,900     3.9%
Firearms                       1,400     1.9%

See HW problem #4d
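
The percentages on the pie chart are each category's share of the total, and a Pareto chart shows the same categories sorted from largest to smallest. A short Python sketch of both steps, using the counts from the slide (the names `deaths`, `shares`, and `pareto_order` are illustrative):

```python
# Accidental deaths by type, as labelled on the pie chart.
deaths = {
    "Firearms": 1400,
    "Ingestion of food or object": 2900,
    "Fire": 4200,
    "Motor vehicle": 43500,
    "Drowning": 4600,
    "Poison": 6400,
    "Falls": 12200,
}

total = sum(deaths.values())

# Pie-chart slice size: each category's percentage of the total.
shares = {kind: 100 * n / total for kind, n in deaths.items()}

# Pareto ordering: categories sorted by count, largest first.
pareto_order = sorted(deaths, key=deaths.get, reverse=True)

for kind in pareto_order:
    print(f"{kind:<28} {deaths[kind]:>6} {shares[kind]:5.1f}%")
```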

2-3 Measures of Center
a value at the
center or middle
of a data set

Definitions
Mean
(Arithmetic Mean)
AVERAGE
the number obtained by adding the values
and dividing the total by the number of
values

Notation
Σ denotes the addition of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

Notation
$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set
of sample values

$\bar{x} = \dfrac{\sum x}{n}$

µ is pronounced 'mu' and denotes the mean of all values in a
population

$\mu = \dfrac{\sum x}{N}$

Calculators can calculate the mean of data
Definitions
• Median
the middle value when the original
data values are arranged in order of
increasing (or decreasing) magnitude

• not affected by an extreme value

6.72 3.46 3.60 6.44
3.46 3.60 6.44 6.72 (sorted)
(even number of values)
no exact middle: shared by two numbers

$\dfrac{3.60 + 6.44}{2} = 5.02$
MEDIAN is 5.02

6.72 3.46 3.60 6.44 26.70
3.46 3.60 6.44 6.72 26.70 (sorted)
(odd number of values)
exact middle: MEDIAN is 6.44
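
Both median cases can be checked with Python's standard `statistics` module. The sketch below also computes the mean of the same two samples, to illustrate that the extreme value 26.70 moves the mean far more than the median. The variable names are illustrative.

```python
from statistics import mean, median

even_sample = [6.72, 3.46, 3.60, 6.44]   # even number of values
odd_sample = even_sample + [26.70]       # same data plus one extreme value

# median() sorts internally; for an even count it averages the two middle values.
print(median(even_sample))   # average of 3.60 and 6.44
print(median(odd_sample))    # the exact middle value

# The mean reacts much more strongly to the extreme value.
print(mean(even_sample), mean(odd_sample))
```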


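Definitions
• Mode
the score that occurs most frequently
Bimodal: two scores tie for the greatest frequency
Multimodal: more than two scores tie for the greatest frequency
No Mode: no score is repeated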

Examples
a. 5 5 5 3 1 5 1 4 3 5 Mode is 5
b. 1 2 2 2 3 4 5 6 6 6 7 9 Bimodal - 2 and 6

c. 1 2 3 6 7 8 9 10 No Mode
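
The three examples can be reproduced with a small helper. This is a sketch under one assumption: following the slide, a data set in which no value repeats is reported as having no mode (an empty list). The function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # No value repeats: the slides call this "No Mode".
        return []
    return sorted(v for v, c in counts.items() if c == top)

a = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5]
b = [1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]
c = [1, 2, 3, 6, 7, 8, 9, 10]

print(modes(a), modes(b), modes(c))
```

Returning a list handles the unimodal, bimodal, and multimodal cases uniformly: the length of the list says which case applies.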

Definitions
• Midrange
the value midway between the highest
and lowest values in the original data set

$\text{Midrange} = \dfrac{\text{highest score} + \text{lowest score}}{2}$

Best Measure of Center
Advantages - Disadvantages

Measure     How often used?   Takes every value into account?   Affected by extreme values?
Mean        Most familiar     Yes                               Yes
Median      Commonly          No                                No
Mode        Sometimes         No                                No
Midrange    Rarely            No                                Yes

Definitions
• Symmetric
Data is symmetric if the left half of its
histogram is roughly a mirror of its
right half.
• Skewed
Data is skewed if it is not symmetric
and if it extends more to one side than
the other.

Skewness

Mode = Mean = Median


SYMMETRIC

Sample data: 2 3 3 4 4 4 5 5 6
Median = 4
Mode = 4
Mean = 4
Frequencies are 1 2 3 2 1
Skewness

SYMMETRIC: Mode = Mean = Median

SKEWED LEFT (negatively): from left to right, Mean, Median, Mode

SKEWED RIGHT (positively): from left to right, Mode, Median, Mean
Waiting Times of Bank Customers
at Different Banks
in minutes

Summit Bank 6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7 7.7 7.7

National Bank 4.2 5.4 5.8 6.2 6.7 7.7 7.7 8.5 9.3 10.0

Summit Bank National Bank

Mean 7.15 7.15

Median 7.20 7.20

Mode 7.7 7.7

Midrange 7.10 7.10
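
The four measures in the table above can be recomputed from the raw waiting times. A minimal Python sketch using the standard `statistics` module; the `midrange` helper is hypothetical, since the standard library has no such function.

```python
from statistics import mean, median, mode

summit = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
national = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

def midrange(values):
    """Value midway between the highest and lowest values."""
    return (max(values) + min(values)) / 2

for name, times in (("Summit Bank", summit), ("National Bank", national)):
    print(
        name,
        round(mean(times), 2),
        round(median(times), 2),
        mode(times),
        round(midrange(times), 2),
    )
```

Both banks give identical results for all four measures, even though the individual waiting times are clearly spread out differently.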

Dotplots of Waiting Times
Summit Bank

National Bank

So how do we differentiate these two data sets? We need measures
that describe how the data are dispersed.
2-4 Measures of Variation

Range

$\text{Range} = \text{highest value} - \text{lowest value}$

Measures of Variation

Standard Deviation
a measure of variation of the scores
about the mean

(average deviation from the mean)

Sample Standard Deviation
Formula

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Population Standard Deviation

$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

Symbols
for Standard Deviation

                                 Sample    Population
Most textbooks                   s         σ
Some graphics calculators        Sx        σx
Some non-graphics calculators    xσn-1     xσn

Articles in professional journals and reports often use SD
for standard deviation and VAR for variance.
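
Applied to the bank waiting times from section 2-3, the range and standard deviation separate the two data sets that the measures of center could not. The sketch below uses `stdev` (divides by n - 1, the sample formula) and `pstdev` (divides by N, the population formula) from Python's `statistics` module; treating the ten values per bank as a sample is an assumption.

```python
from statistics import pstdev, stdev

summit = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
national = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

for name, times in (("Summit Bank", summit), ("National Bank", national)):
    data_range = max(times) - min(times)   # highest value - lowest value
    s = stdev(times)                       # sample standard deviation (n - 1)
    sigma = pstdev(times)                  # population standard deviation (N)
    print(f"{name}: range={data_range:.1f}, s={s:.2f}, sigma={sigma:.2f}")
```

The sample value is always slightly larger than the population value for the same data, because it divides by n - 1 rather than N.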

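The Empirical Rule
(applies to bell-shaped distributions)
99.7% of data are within 3 standard deviations of the mean
95% within 2 standard deviations
68% within 1 standard deviation

Approximate share of data in each band:
below $\bar{x} - 3s$                   0.1%
$\bar{x} - 3s$ to $\bar{x} - 2s$       2.4%
$\bar{x} - 2s$ to $\bar{x} - s$        13.5%
$\bar{x} - s$ to $\bar{x}$             34%
$\bar{x}$ to $\bar{x} + s$             34%
$\bar{x} + s$ to $\bar{x} + 2s$        13.5%
$\bar{x} + 2s$ to $\bar{x} + 3s$       2.4%
above $\bar{x} + 3s$                   0.1%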
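
The 68 / 95 / 99.7 figures can be checked numerically if "bell-shaped" is taken to mean an exactly normal distribution, which is an assumption made for this sketch. It uses `statistics.NormalDist` from the Python standard library (Python 3.8+).

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Share of the distribution within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The exact normal values are slightly different from the rounded figures on the slide (for example, about 95.4% rather than 95% within 2 standard deviations), which is why the rule is described as empirical and approximate.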
Measures of Variation
Variance
standard deviation squared

Notation
$s^2$ (sample) and $\sigma^2$ (population); use the square key on a calculator

Variance
Which is the parameter and which is the statistic?

Sample Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Population Variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
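
That the variance is the standard deviation squared can be confirmed numerically. The sketch below reuses the Summit Bank waiting times from section 2-3 and the `statistics` functions `variance` / `pvariance` (sample and population forms); the variable names are illustrative.

```python
from statistics import pstdev, pvariance, stdev, variance

summit = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]

s2 = variance(summit)        # sample variance, divides by n - 1
sigma2 = pvariance(summit)   # population variance, divides by N

# Squaring the standard deviation recovers the variance (up to rounding error).
print(s2, stdev(summit) ** 2)
print(sigma2, pstdev(summit) ** 2)
```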
Measures of Variation (Dispersion) and Measures of the Center

Center      Variation / Dispersion
Mean        Range
Mode        Standard Deviation
Midrange    Variance
Median

Assignment No 1
Twenty students were asked how many hours they worked
per day. Their responses, in hours, are as follows:

5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3.

• (a) Calculate the relative frequency and cumulative frequency.
• (b) Draw a histogram, dot plot, and pie chart.
• (c) Calculate the measures of center (mean, median, mode,
and midrange).
