
Business Statistics

Tenth Edition
Ken Black

Chapter 3

Descriptive Statistics

Copyright ©2020 John Wiley & Sons, Inc.


Learning Objectives
1. Apply various measures of central tendency – including the
mean, median, and mode – to a set of data.
2. Apply various measures of variability – including the range,
interquartile range, variance, and standard deviation (using
the empirical rule and Chebyshev’s theorem) – to a set of
data.
3. Describe a data distribution statistically and graphically
using skewness and box-and-whisker plots.
4. Use descriptive statistics as a business analytics tool to better
understand meanings and relationships in data so as to aid
businesspeople in making better decisions.
3.1 Measures of Central Tendency (1 of 9)
Measures of central tendency yield information about
the center (middle) of a group of numbers

Mode: the most frequently occurring value in a data set


• Applicable to all levels of data measurement
• Sometimes no mode exists or there is more than one
mode (bimodal or multimodal)
• Often used with nominal data (e.g., determining the most
common sizes of footwear)



3.1 Measures of Central Tendency (2 of 9)
Median: the middle value in an ordered array of numbers
• Arrange the values in order
• The median of the array is the center number, or with an even
number of observations, the average of the middle two terms
• Advantage: not affected by extreme values, so often preferable
to the mean when the data includes some unusually large or
small observations (e.g., income in the U.S., house prices in a
given area)
• Disadvantage: it does not include all of the information in the
data
• Data measurement level must at least be ordinal



3.1 Measures of Central Tendency (3 of 9)
Arithmetic Mean: the average of a group of numbers
• Most common measure of central tendency
• Includes all information in the data set

• Population mean: \( \mu = \frac{\sum x_i}{N} = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N} \)

• Sample mean: \( \bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \)



3.1 Measures of Central Tendency (4 of 9)
Demonstration Problem 3.1: Calculate the mean, median, and mode
for the top 13 shopping centers in the UK
Shopping Center Size (1,000 m²)
MetroCentre 190.0
Trafford Centre 180.9
Westfield Stratford City 175.0
Bluewater 155.7
Liverpool One 154.0
Westfield London 149.5
Merry Hill 148.6
Manchester Arndale 139.4
Meadowhall 139.4
Lakeside 133.8
St. David’s 130.1
Bullring 127.1
Eldon Square 125.4

Source: https://www.worldatlas.com/articles/the-largest-shoppingmalls-in-the-united-kingdom.html



3.1 Measures of Central Tendency (5 of 9)
Demonstration Problem 3.1 Solution:
• Mode: The mode is 139.4
• Median: There are 13 shopping centers, so the center observation is the 7th. The data are already ordered, so the median is 148.6 (Merry Hill)
• Mean: \( \mu = \frac{\sum x_i}{N} = \frac{1{,}948.9}{13} = 149.9 \)
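
As a quick check of Demonstration Problem 3.1, the three measures can be computed with Python's standard `statistics` module. This is only a sketch: the variable names are chosen here for illustration, and the sizes are copied from the table in the problem statement.

```python
import statistics

# Sizes in thousands of square metres, copied from the Demonstration Problem 3.1 table.
sizes = [190.0, 180.9, 175.0, 155.7, 154.0, 149.5, 148.6,
         139.4, 139.4, 133.8, 130.1, 127.1, 125.4]

mean_size = statistics.mean(sizes)      # sum of the values divided by N
median_size = statistics.median(sizes)  # middle (7th) value once the data are ordered
mode_size = statistics.mode(sizes)      # most frequently occurring value

print(f"mean   = {mean_size:.1f}")
print(f"median = {median_size}")
print(f"mode   = {mode_size}")
```

The mode is 139.4 because it is the only size that appears twice in the list.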



3.1 Measures of Central Tendency (6 of 9)
Percentiles: measures of central tendency that divide a group of data into 100 parts
• At least n% of the data lie at or below the nth percentile, and at most (100 - n)% of the
data lie above the nth percentile
• Example: the 90th percentile indicates that at least 90% of the data are equal to or less than
it, and at most 10% of the data lie above it

To calculate the Pth percentile,


1. Organize the numbers into an ascending-order array
2. Calculate the percentile location: \( i = \frac{P}{100}(N) \)
3. Determine the percentile by either:
a) If i is a whole number, the Pth percentile is the average of the value at the ith
location and the value at the (i + 1)st location
b) If i is not a whole number, the Pth percentile value is located at the whole-number
part of i + 1
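
The three steps above can be sketched as a small Python function. The function name `percentile` and its argument names are illustrative choices, not part of the text, and the sample values are the eight numbers used in the quartile example later in Section 3.1. Note that this location method can give slightly different answers from the interpolation rules built into many software packages.

```python
def percentile(values, p):
    """Return the p-th percentile using the location method described above."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)            # step 1: ascending-order array
    i = p * len(ordered) / 100          # step 2: percentile location i = (P/100) * N
    if i == int(i):                     # step 3a: i is a whole number
        i = int(i)
        # average the i-th and (i + 1)-st values (1-based positions)
        return (ordered[i - 1] + ordered[i]) / 2
    # step 3b: take the value at the whole-number part of i, plus 1 (1-based),
    # which is index int(i) in Python's 0-based indexing
    return ordered[int(i)]


numbers = [106, 109, 114, 116, 121, 122, 125, 129]
q1 = percentile(numbers, 25)
q2 = percentile(numbers, 50)
q3 = percentile(numbers, 75)
print(q1, q2, q3)
```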



3.1 Measures of Central Tendency (7 of 9)
Percentile Example:
• Suppose that you want to calculate the 80th percentile of 1240
numbers
• First order the numbers. Then \( i = \frac{80}{100}(1240) = 992 \)
• Since this is a whole number, the 80th percentile will be the
average of the 992nd number and the 993rd number
• At least 80% of the data will lie at or below this number
• Percentiles are commonly used for standardized test scores like
the ACT and SAT



3.1 Measures of Central Tendency (8 of 9)
Quartiles: measures of central tendency that divide a group of data into four subgroups

25% of the data set is below the first quartile


50% of the data set is below the second quartile (also called the median)
75% of the data set is below the third quartile
100% of the data set is below the fourth quartile

To calculate quartiles, find the 25th, 50th, and 75th percentiles



3.1 Measures of Central Tendency (9 of 9)
Quartile Example:
• Suppose that you want to calculate quartiles for the following set of numbers: 106, 109,
114, 116, 121, 122, 125, 129
• Q1 = P25: \( i = \frac{25}{100}(8) = 2 \). Since this is a whole number, average the 2nd and 3rd numbers:
\( Q_1 = \frac{109 + 114}{2} = 111.5 \)
• Q2 = P50: \( i = \frac{50}{100}(8) = 4 \). Since this is a whole number, average the 4th and 5th numbers:
\( Q_2 = \frac{116 + 121}{2} = 118.5 \). This is the median
• Q3 = P75: \( i = \frac{75}{100}(8) = 6 \). Since this is a whole number, average the 6th and 7th numbers:
\( Q_3 = \frac{122 + 125}{2} = 123.5 \)
3.2 Measures of Variability (1 of 16)
Measures of variability: describe the spread or dispersion of a set
of data
• Distributions may have the same mean but different variability



3.2 Measures of Variability (2 of 16)
Range: the difference between the largest and the smallest values in a set of data
• Advantage – easy to compute
• Disadvantage – affected by extreme values
TABLE 3.1: Offer Prices for the 20 Largest U.S. Initial Public Offerings in a Recent
Year

$14.25 $19.00 $11.00 $28.00

24.00 23.00 43.25 19.00

27.00 25.00 15.00 7.00

34.22 15.50 15.00 22.00

19.00 19.00 27.00 21.00

Range = $43.25 − $7.00 = $36.25



3.2 Measures of Variability (3 of 16)
Interquartile range: range of values between the first and third quartile
• Range of the “middle half”; middle 50%
• Useful when analysts are interested in the middle 50% and not the extremes
• The following data indicate the top 15 trading partners of the United States
in exports in 2017 according to the U.S. Census Bureau

Country Exports ($ billions)
Canada 282.3
Mexico 243.3
China 129.9
Japan 67.6
United Kingdom 56.3
Germany 53.9
South Korea 48.3
Netherlands 41.5
Hong Kong 39.9
Brazil 37.2
France 33.6
Belgium 29.9
Singapore 29.8
Taiwan 25.7
India 25.7



3.2 Measures of Variability (4 of 16)
• Q1 = P25: \( i = \frac{25}{100}(15) = 3.75 \). Since this is not a whole number, use the 4th term
from the bottom: Q1 = 29.9
• Q3 = P75: \( i = \frac{75}{100}(15) = 11.25 \). Since this is not a whole number, use the 12th term
from the bottom: Q3 = 67.6
• Thus the interquartile range is: Q3 − Q1 = 67.6 − 29.9 = 37.7
• The middle 50% of the exports for the top 15 U.S. trading partners
spans a range of $37.7 billion
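
The interquartile range calculation for the export data can be reproduced with a short Python sketch. The helper `percentile` restates the percentile location rule from Section 3.1; its name and the variable names are assumptions made for this illustration, and the export figures are copied from the table of the top 15 U.S. trading partners.

```python
def percentile(values, p):
    """Percentile by the location method: i = (P/100) * N."""
    ordered = sorted(values)
    i = p * len(ordered) / 100
    if i == int(i):                      # whole number: average i-th and (i + 1)-st values
        return (ordered[int(i) - 1] + ordered[int(i)]) / 2
    return ordered[int(i)]               # otherwise: value at whole-number part of i, plus 1


# Exports in $ billions, copied from the table of the top 15 U.S. trading partners.
exports = [282.3, 243.3, 129.9, 67.6, 56.3, 53.9, 48.3, 41.5,
           39.9, 37.2, 33.6, 29.9, 29.8, 25.7, 25.7]

q1 = percentile(exports, 25)   # location 3.75, so the 4th value from the bottom
q3 = percentile(exports, 75)   # location 11.25, so the 12th value from the bottom
iqr = round(q3 - q1, 1)        # rounded to suppress floating-point noise

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```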
3.2 Measures of Variability (5 of 16)
Introduction to Variance and Standard Deviation
• Suppose that a company has the following data from five weeks of computer
production (machines per week):
5, 9, 16, 17, 18
• The owner could calculate the mean of the data, which is 13
• But there is significant variability from week to week. How can we characterize this
weekly variability?
o Subtracting the mean from each observation gives the deviation from the mean
o But the sum of the deviations will always add up to zero
o One option is to take the absolute value of each deviation, but this is often not ideal
o Thus, we compute the variance

X     X − µ
5     −8
9     −4
16    3
17    4
18    5



3.2 Measures of Variability (6 of 16)
Variance: average of the squared deviations about the
arithmetic mean for a set of numbers
Population variance: \( \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \)
• For this data, \( \sigma^2 = \frac{130}{5} = 26 \)
• Since the variance is computed from squared deviations, the result is
measured in squared units
o In the computer example the variance is 26 machines squared, which is
problematic to interpret



3.2 Measures of Variability (7 of 16)
Standard Deviation: square root of the variance
Population standard deviation: \( \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} \)

X       X − µ   (X − µ)²
5       −8      64
9       −4      16
16      3       9
17      4       16
18      5       25
Total   0       130

• For this data, \( \sigma = \sqrt{\frac{130}{5}} = \sqrt{26} \approx 5.1 \)
• Weekly production typically deviates from the mean by about 5.1
machines
• Closely related to the variance but more easily interpretable
• The standard deviation allows us to apply the empirical rule and
Chebyshev’s Theorem
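
The computer production example can be traced step by step in Python. This is a minimal sketch with variable names chosen for illustration; it follows the population formulas (dividing by N), and the cross-check uses `statistics.pvariance` from the standard library.

```python
import math
import statistics

production = [5, 9, 16, 17, 18]          # machines per week, from the example

N = len(production)
mu = sum(production) / N                  # population mean
deviations = [x - mu for x in production] # X - mu for each week
squared = [d ** 2 for d in deviations]    # (X - mu)^2 for each week

variance = sum(squared) / N               # population variance, in machines squared
std_dev = math.sqrt(variance)             # population standard deviation, in machines

print(f"mean = {mu}, sum of deviations = {sum(deviations)}")
print(f"variance = {variance}, standard deviation = {std_dev:.1f}")

# Cross-check against the standard library's population variance.
assert statistics.pvariance(production) == variance
```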
3.2 Measures of Variability (8 of 16)
The Empirical Rule
Used to state the approximate percentage of values that lie within a given
number of standard deviations from the mean of a set of data if the data
are normally distributed

Distance from the Mean Values within Distance


μ ± 1σ 68%
μ ± 2σ 95%
μ ± 3σ 99.7%

• Data must be normally distributed


• Since many real-world variables are approximately normally distributed, the empirical rule is widely used



3.2 Measures of Variability (9 of 16)
The Empirical Rule: Example
The average price of a gallon of gasoline in California is normally
distributed with a mean of $3.24 and standard deviation $0.08
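
For the gasoline example, the empirical rule translates into three price intervals. The sketch below simply applies μ ± kσ for k = 1, 2, 3 with the percentages from the empirical rule table; the variable names are illustrative.

```python
mean_price = 3.24   # dollars per gallon
std_price = 0.08    # dollars per gallon

# Approximate percentage of values within k standard deviations (empirical rule).
empirical_rule = {1: 68.0, 2: 95.0, 3: 99.7}

intervals = {}
for k, pct in empirical_rule.items():
    low = round(mean_price - k * std_price, 2)
    high = round(mean_price + k * std_price, 2)
    intervals[k] = (low, high)
    print(f"about {pct}% of prices lie between ${low:.2f} and ${high:.2f}")
```

Under the normality assumption stated in the example, about 95% of prices would fall between $3.08 and $3.40.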



3.2 Measures of Variability (10 of 16)
Chebyshev’s Theorem
At least \( 1 - \frac{1}{k^2} \) of the values will fall within ±k standard deviations
of the mean regardless of the distribution, for k > 1
• Unlike the empirical rule, data can have any distribution
• Chebyshev’s theorem tells us at least what percentage of the
data will lie within a certain range; if the distribution is closer to
normal, the actual amount will be greater
• For example, with k = 2, at least 75% of the data will lie within 2 standard deviations
of the mean, no matter how the data are distributed
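
Chebyshev's bound is easy to tabulate. The function below is a small illustrative sketch (the name `chebyshev_lower_bound` is an assumption made here) that returns the minimum proportion 1 − 1/k² for a given k > 1.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (2, 2.5, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} of the values")
```

Comparing with the empirical rule, the bound for k = 2 is 75% for any distribution, versus roughly 95% when the data are normally distributed.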



3.2 Measures of Variability (11 of 16)
Sample Variance and Standard Deviation
The sample variance and standard deviation are used as
estimators of the population values

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]

\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \]

• The denominator is (n − 1) rather than N, which makes the
sample variance an unbiased estimator of the population
variance
3.2 Measures of Variability (12 of 16)
Example: Partners in Accounting Firms
An analyst takes a sample of the number of partners in six of the
largest accounting firms in the U.S. What are the sample variance
and sample standard deviation?
Firm Number of Partners
PricewaterhouseCoopers 3327
Ernst & Young 3200
Deloitte 3135
KPMG 2178
RSM US 799
CliftonLarsonAllen 735



3.2 Measures of Variability (13 of 16)
Example: Partners in Accounting Firms

\[ \bar{x} = \frac{13{,}374}{6} = 2229.00 \]

x_i       (x_i − x̄)²
3327      1,205,604
3200      942,841
3135      820,836
2178      2,601
799       2,044,900
735       2,232,036
TOTAL 13,374   7,248,818

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{7{,}248{,}818}{5} = 1{,}449{,}763.6 \]

\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{1{,}449{,}763.6} = 1{,}204.06 \]
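
The accounting-firm calculation can be verified with a few lines of Python. This sketch uses the sample formulas with an (n − 1) denominator; the variable names are chosen for illustration, and the partner counts are copied from the example.

```python
import math

# Number of partners in the six sampled firms, from the example.
partners = [3327, 3200, 3135, 2178, 799, 735]

n = len(partners)
x_bar = sum(partners) / n                                # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in partners)     # sum of squared deviations

sample_variance = sum_sq_dev / (n - 1)                   # divide by n - 1, not n
sample_std = math.sqrt(sample_variance)

print(f"mean = {x_bar:.2f}")
print(f"sum of squared deviations = {sum_sq_dev:,.0f}")
print(f"s^2 = {sample_variance:,.1f}")
print(f"s   = {sample_std:,.2f}")
```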



3.2 Measures of Variability (14 of 16)
z Scores: represent the number of standard deviations a
value (x) is above or below the mean for normally
distributed data

• For a population: \( z = \frac{x - \mu}{\sigma} \)

• For a sample: \( z = \frac{x - \bar{x}}{s} \)

• Negative z scores indicate that the raw value (x) is below the
mean; positive z scores indicate x values above the mean



3.2 Measures of Variability (15 of 16)
For a normally distributed population with a mean of 50 and a
standard deviation of 10, an x value of 70 would have a z score of 2

\[ z = \frac{70 - 50}{10} = 2 \]

This z score signifies that 70 is 2 standard deviations above the
mean
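
A z score is a one-line computation. The helper below is an illustrative sketch (the name `z_score` is an assumption made here); the second call uses a hypothetical x value of 35 to show a negative z score.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


z_above = z_score(70, mean=50, std_dev=10)   # value from the example
z_below = z_score(35, mean=50, std_dev=10)   # hypothetical value below the mean

print(z_above, z_below)
```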



3.2 Measures of Variability (16 of 16)
The Coefficient of Variation: ratio of the standard deviation to the mean expressed as a
percentage

\[ CV = \frac{\sigma}{\mu}(100) \]

• Example: suppose that five weeks of average prices for Stock A have a mean of 64.40
and a standard deviation of 4.84
\[ CV_A = \frac{4.84}{64.40}(100) = 7.5\% \]
• Further suppose that five weeks of average prices for Stock B have a mean of 13 and a
standard deviation of 3.03
\[ CV_B = \frac{3.03}{13}(100) = 23.3\% \]
The CV can be used as a measure of risk
• Stock A has a higher standard deviation, which is one measure of risk
• However, relative to its mean, Stock B has about three times the variability of Stock A, and thus may be
the riskier stock
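
The comparison of the two stocks can be sketched in Python as follows. The function name `coefficient_of_variation` is an illustrative choice; the means and standard deviations are the ones given in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(4.84, 64.40)   # Stock A
cv_b = coefficient_of_variation(3.03, 13)      # Stock B
ratio = cv_b / cv_a                            # relative variability of B versus A

print(f"CV_A = {cv_a:.1f}%")
print(f"CV_B = {cv_b:.1f}%")
print(f"Stock B is about {ratio:.1f} times as variable relative to its mean")
```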



3.3 Measures of Shape (1 of 6)

Measures of shape: tools that can be used to describe the shape of a distribution
of data
Skewness: the condition in which a distribution is asymmetrical (lacks symmetry)
• Skewed portion is the long, thin part of the curve
• Skewed left, or negatively skewed: the long, thin tail extends to the left

• Skewed right, or positively skewed: the long, thin tail extends to the right



3.3 Measures of Shape (2 of 6)
• The relative positions of the mean, median, and mode indicate the direction of skew
• Symmetric: mean, median, and mode are equal
• Negatively skewed: mean is less than the median, which is less than the mode
• Positively skewed: mode is less than the median, which is less than the mean



3.3 Measures of Shape (3 of 6)
Box-and-whisker plot: a diagram that utilizes the upper and lower quartiles
along with the median and the two most extreme values to depict a distribution
graphically
• Sometimes called the 5-number summary
• A box is drawn around the median with the upper and lower quartiles as the box endpoints (hinges)
• The interquartile range is used to construct the inner fences, at Q1 − 1.5 ∙ IQR and Q3 + 1.5 ∙ IQR
• If data fall outside the inner fences, outer fences are constructed, at Q1 − 3.0 ∙ IQR and Q3 + 3.0 ∙ IQR
• A line segment (whisker) is drawn from the lower hinge of the box outward to the smallest data
value
• A second whisker is drawn from the upper hinge to the largest data value



3.3 Measures of Shape (4 of 6)
One use of box-and-whisker plots is to find outliers
• Data values that fall outside the mainstream of values in a
distribution are called outliers
o Sometimes merely extremes of the data
o Sometimes due to measurement or recording error
o Sometimes so unusual that they should not be considered with
the rest of the data
• Values that are outside the inner fences but inside the outer fences
are mild outliers
• Values that fall outside the outer fences are extreme outliers



3.3 Measures of Shape (5 of 6)
Another use is to determine if the distribution is skewed
• The position of the median in the box gives information
about the skew of the middle 50% of the data
o If the median is to the left, the middle 50% is skewed
right
o If the median is to the right, the middle 50% is skewed
left
• The length of the whiskers shows the skew of the outer
values



3.3 Measures of Shape (6 of 6)
Box-and-Whisker Plot Example:
TABLE 3.5: Data in Ordered Array with Quartiles and Median

62 62 63 64 64 65 65 68 68 69

69 70 71 71 71 72 72 73 73 73

73 74 74 74 75 76 77 79 79 80

81 81 81 82 82 82 84 84 85 87

• From the ordered array, Q1 = 69, the median is 73, and Q3 = 80.5, so IQR = 80.5 − 69 = 11.5
• The median of the data is closer to the lower (or left) hinge, so the data is skewed right
• No numbers are outside the inner fences:
o Q1 − 1.5 ∙ IQR = 69 − 1.5(11.5) = 51.75
o Q3 + 1.5 ∙ IQR = 80.5 + 1.5(11.5) = 97.75
• The lowest value is 62, and the highest value is 87, the endpoints of the whiskers
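
The quantities behind this box-and-whisker plot can be recomputed from Table 3.5 with the sketch below. The helper `percentile` restates the percentile location rule from Section 3.1, and all names are assumptions made for this illustration rather than anything defined in the text.

```python
def percentile(values, p):
    """Percentile by the location method: i = (P/100) * N."""
    ordered = sorted(values)
    i = p * len(ordered) / 100
    if i == int(i):                      # whole number: average i-th and (i + 1)-st values
        return (ordered[int(i) - 1] + ordered[int(i)]) / 2
    return ordered[int(i)]               # otherwise: value at whole-number part of i, plus 1


# The 40 values from Table 3.5.
data = [62, 62, 63, 64, 64, 65, 65, 68, 68, 69,
        69, 70, 71, 71, 71, 72, 72, 73, 73, 73,
        73, 74, 74, 74, 75, 76, 77, 79, 79, 80,
        81, 81, 81, 82, 82, 82, 84, 84, 85, 87]

q1, median, q3 = (percentile(data, p) for p in (25, 50, 75))
iqr = q3 - q1

inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer_fences = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)

# Values beyond the inner fences are outliers (mild or extreme).
outliers = [x for x in data if x < inner_fences[0] or x > inner_fences[1]]

print(f"five-number summary: {min(data)}, {q1}, {median}, {q3}, {max(data)}")
print(f"IQR = {iqr}, inner fences = {inner_fences}, outer fences = {outer_fences}")
print(f"outliers: {outliers}")
```

Because the outlier list is empty, the whiskers extend all the way to the smallest and largest data values.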
3.4 Business Analytics Using
Descriptive Statistics (1 of 4)
• Descriptive statistics are at the foundation of statistical
techniques and numerical measures that can be used to gain
an initial understanding of data in business analytics

• Descriptive statistics allow a business analyst to begin to mine
and understand any meanings and/or relationships that might
exist in data



3.4 Business Analytics Using
Descriptive Statistics (2 of 4)
TABLE 3.6: U.S. Production of Finished Motor Gasoline from 1997 through
2018 (1000 Barrels per Day)
Month 1997 1998 1999 2000 … 2015 2016 2017 2018
January 7315 7545 7792 7836 … 9087 8642 9101 9567
February 7330 7508 7866 7830 … 9522 9335 9456 9391
March 7079 7975 7676 8256 … 9729 9430 9515 10,115
April 7737 8083 8327 8214 … 9374 9811 9783 10,045
May 7998 8216 8401 8434 … 9408 9916 10,430 10,433
June 8008 8368 8165 8325 … 10,046 9959 10,365 10,311
July 7959 8272 8156 8556 … 9984 9992 10,295 10,483
August 8207 8403 8384 8295 … 9799 10,021 10,602 10,215
September 8134 8026 8493 8424 … 9674 9988 9853 9950
October 7881 7947 8154 8249 … 9537 9824 10,187 10,364
November 7627 8181 8254 8183 … 9752 9986 10,222 9666
December 8331 8306 8388 8505 … 9921 9467 9682 9533

Source: U.S. Energy Information Administration. Complete table of data given in WileyPLUS.



3.4 Business Analytics Using
Descriptive Statistics (3 of 4)
First, construct a visual representation of the data (here, a
histogram – covered in Chapter 2)



3.4 Business Analytics Using
Descriptive Statistics (4 of 4)
Second, calculate descriptive statistics to better understand the
data



Copyright
Copyright © 2020 John Wiley & Sons, Inc.
All rights reserved. Reproduction or translation of this work beyond that permitted in
Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the
Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up
copies for his/her own use only and not for distribution or resale. The Publisher assumes
no responsibility for errors, omissions, or damages, caused by the use of these programs or
from the use of the information contained herein.

