
Measures of Variability

• Range
• Interquartile range
• Variance
• Standard deviation
• Coefficient of variation
Consider the sample of
starting salaries of business
grads. We would be
interested in knowing if there
was a low or high degree of
variability or dispersion in
starting salaries received.
Range

• Range is simply the difference between the largest and smallest values in the sample.
• Range is the simplest measure of variability.
• Note that range is highly sensitive to the largest and smallest values.
Example: Apartment Rents

Seventy studio apartments


were randomly sampled in
a small college town. The
monthly rent prices for
these apartments are listed
in ascending order on the next slide.
Range

Range = largest value - smallest value


Range = 615 - 425 = 190

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
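As a minimal sketch, the range calculation above can be reproduced in Python. The names `rents` and `data_range` are illustrative, not part of the original slides.

```python
# Monthly rents for the 70 sampled studio apartments, as listed above.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


rent_range = data_range(rents)
print(rent_range)  # 190
```

Because only the two extreme values enter the calculation, changing either one changes the range directly.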
Interquartile Range

• The interquartile range of a data set is the difference between the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.


Interquartile Range

3rd Quartile (Q3) = 525


1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
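The quartiles above can be checked with a short Python sketch. It assumes the textbook index rule i = (p/100)n: if i is not an integer, round up and take the value in that position; if i is an integer, average the values in positions i and i + 1. The helper name `percentile` is hypothetical, and other software may use a different interpolation rule and give slightly different quartiles.

```python
import math

# Monthly rents for the 70 sampled studio apartments, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(sorted_values, p):
    """pth percentile (0 < p < 100) of data already sorted ascending.

    Uses the index rule i = (p / 100) * n with 1-based positions.
    """
    n = len(sorted_values)
    i = p * n / 100
    if i != int(i):
        # Non-integer index: round up to the next position.
        return sorted_values[math.ceil(i) - 1]
    # Integer index: average the values in positions i and i + 1.
    i = int(i)
    return (sorted_values[i - 1] + sorted_values[i]) / 2


q1 = percentile(rents, 25)  # i = 17.5, so position 18
q3 = percentile(rents, 75)  # i = 52.5, so position 53
iqr = q3 - q1
print(q1, q3, iqr)  # 445 525 80
```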
Variance
• The variance is a measure of variability that uses all the data.
• The variance is based on the difference between each observation ($x_i$) and the mean ($\bar{x}$ for the sample and $\mu$ for the population).
The variance is the average of the squared differences between the observations and the mean value.

For the population:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

For the sample:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
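A minimal sketch of both variance formulas, applied to the apartment rent data. The function names are illustrative; Python's standard `statistics` module is used only as a cross-check.

```python
import statistics

# Monthly rents for the 70 sampled studio apartments.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


s2 = sample_variance(rents)
print(round(s2, 2))  # 2996.16

# Cross-check against the standard library.
assert abs(s2 - statistics.variance(rents)) < 1e-6
assert abs(population_variance(rents) - statistics.pvariance(rents)) < 1e-6
```

The only difference between the two functions is the divisor: N for a population, n - 1 for a sample.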
Standard Deviation
• The Standard Deviation of a data set
is the square root of the variance.
• The standard deviation is measured
in the same units as the data, making
it easy to interpret.
Computing a standard deviation

For the population:

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

For the sample:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
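As a sketch, the standard deviation formulas can be written directly from the definitions and applied to the apartment rent data. The function names are hypothetical; `statistics.stdev` and `statistics.pstdev` from the standard library serve as a cross-check.

```python
import math
import statistics

# Monthly rents for the 70 sampled studio apartments.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_std_dev(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def population_std_dev(values):
    """Square root of the population variance (divisor N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


s = sample_std_dev(rents)
print(round(s, 2))  # 54.74, in the same units as the rents
```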
Coefficient of Variation
Just divide the standard deviation by the mean and multiply by 100.

Computing the coefficient of variation:

For the population:

$$\frac{\sigma}{\mu} \times 100$$

For the sample:

$$\frac{s}{\bar{x}} \times 100$$
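A short sketch of the sample coefficient of variation for the apartment rent data. The function name `coefficient_of_variation` is an illustrative choice.

```python
import statistics

# Monthly rents for the 70 sampled studio apartments.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


cv = coefficient_of_variation(rents)
print(round(cv, 2))  # 11.15
```

Because the result is a percentage of the mean, it can be compared across data sets measured in different units or with very different means.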
The heights (in inches) of 25 individuals were recorded and the following statistics were calculated:
mean = 70
range = 20
mode = 73
variance = 784
median = 74
The coefficient of variation equals
1. 11.2%
2. 1120%
3. 0.4%
4. 40%
If index i (which is used to determine the location of the pth percentile) is not an integer, its value should be
1. squared
2. divided by (n - 1)
3. rounded down
4. rounded up
Which of the following symbols represents the variance of the population?

1. $\sigma^2$
2.
3.
Which of the following symbols represents the size of the sample?
1. $\sigma^2$
2.
3. N
4. n
The symbol s is used to represent

1. the variance of the population
2. the standard deviation of the sample
3. the standard deviation of the population
4. the variance of the sample
The numerical value of the variance
1. is always larger than the numerical value of the standard deviation
2. is always smaller than the numerical value of the standard deviation
3. is negative if the mean is negative
4. can be larger or smaller than the numerical value of the standard deviation
If the coefficient of variation is 40% and the mean is 70, then the variance is

1. 28
2. 2800
3. 1.75
4. 784
Problem 22, page 94

Broker-Assisted 100 Shares at $50 per Share

Range                       45.05
Interquartile Range         23.98
Variance                    190.67
Standard Deviation          13.8
Coefficient of Variation    38.02

25th percentile index i     6
75th percentile index i     18
First quartile (Q1)         24.995
Third quartile (Q3)         48.975
Mean                        36.32

Online 500 Shares at $50 per Share

Range                       57.50
Interquartile Range         11.475
Variance                    140.633
Standard Deviation          11.859
Coefficient of Variation    57.949

25th percentile index i
75th percentile index i
First quartile (Q1)         13.475
Third quartile (Q3)         24.95
Mean                        20.46
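Measured by the interquartile range, variance, and standard deviation, the variability of commissions is greater for broker-assisted trades. The range and the coefficient of variation, however, are larger for online trades (57.50 vs. 45.05 and 57.949 vs. 38.02).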
Using Excel to Compute the Sample Variance, Standard Deviation, and Coefficient of Variation

Formula Worksheet

     A          B                 C   D           E
1    Apartment  Monthly Rent ($)
2    1          525                   Mean        =AVERAGE(B2:B71)
3    2          440                   Median      =MEDIAN(B2:B71)
4    3          450                   Mode        =MODE(B2:B71)
5    4          615                   Variance    =VAR(B2:B71)
6    5          480                   Std. Dev.   =STDEV(B2:B71)
7    6          510                   C.V.        =E6/E2*100
Note: Rows 8-71 are not shown.
Using Excel to Compute the Sample Variance, Standard Deviation, and Coefficient of Variation

Value Worksheet

     A          B                 C   D           E
1    Apartment  Monthly Rent ($)
2    1          525                   Mean        490.80
3    2          440                   Median      475.00
4    3          450                   Mode        450.00
5    4          615                   Variance    2996.16
6    5          480                   Std. Dev.   54.74
7    6          510                   C.V.        11.15
Note: Rows 8-71 are not shown.
Using Excel’s
Descriptive Statistics Tool

Step 4 When the Descriptive Statistics dialog box appears:
    Enter B1:B71 in the Input Range box
    Select Grouped By Columns
    Select Labels in First Row
    Select Output Range
    Enter D1 in the Output Range box
    Select Summary Statistics
    Click OK
Using Excel’s Descriptive Statistics Tool
• Descriptive Statistics Dialog Box
Using Excel’s Descriptive Statistics Tool

Value Worksheet (Partial)

     A          B                 C   D                    E
1    Apartment  Monthly Rent ($)      Monthly Rent ($)
2    1          525
3    2          440                   Mean                 490.8
4    3          450                   Standard Error       6.542348114
5    4          615                   Median               475
6    5          480                   Mode                 450
7    6          510                   Standard Deviation   54.73721146
8    7          575                   Sample Variance      2996.162319
Note: Rows 9-71 are not shown.
Using Excel’s Descriptive Statistics Tool

Value Worksheet (Partial)

     A          B                 C   D                    E
9    8          430                   Kurtosis             -0.334093298
10   9          440                   Skewness             0.924330473
11   10         450                   Range                190
12   11         470                   Minimum              425
13   12         485                   Maximum              615
14   13         515                   Sum                  34356
15   14         575                   Count                70
16   15         430
Note: Rows 1-8 and 17-71 are not shown.
Measures of Relative Location
and Detecting Outliers
• z-scores
• Chebyshev’s Theorem
• Detecting Outliers
By using the mean
and standard
deviation together,
we can learn more
about the relative
location of
observations in a
data set
z-score

Here we compare the deviation from the mean of a single observation to the standard deviation.

The z-score is computed for each $x_i$:

$$z_i = \frac{x_i - \bar{x}}{s}$$

where
$z_i$ is the z-score for $x_i$
$\bar{x}$ is the sample mean
$s$ is the sample standard deviation
The z-score can be
interpreted as the
number of standard
deviations xi is from
the sample mean
Z-scores for the starting salary data

Graduate   Starting Salary   $x_i - \bar{x}$   z-score
1          2850              -90               -0.543
2          2950              10                0.060
3          3050              110               0.664
4          2880              -60               -0.362
5          2755              -185              -1.117
6          2710              -230              -1.388
7          2890              -50               -0.302
8          3130              190               1.147
9          2940              0                 0.000
10         3325              385               2.324
11         2920              -20               -0.121
12         2880              -60               -0.362
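The z-scores in the table can be recomputed with a minimal Python sketch. The names `salaries` and `z_scores` are illustrative; the standard deviation is the sample version (divisor n - 1).

```python
import statistics

# Starting salaries of the 12 business graduates, in table order.
salaries = [2850, 2950, 3050, 2880, 2755, 2710,
            2890, 3130, 2940, 3325, 2920, 2880]


def z_scores(values):
    """z_i = (x_i - sample mean) / sample standard deviation."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - x_bar) / s for x in values]


z = z_scores(salaries)
for graduate, (salary, score) in enumerate(zip(salaries, z), start=1):
    print(graduate, salary, round(score, 3))
```

For graduate 10, for example, the salary of 3325 lies about 2.32 sample standard deviations above the mean of 2940.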
Chebyshev’s Theorem
At least $(1 - 1/z^2)$ of the data values must be
within z standard deviations of the mean,
where z is greater than 1.

This theorem enables us to


make statements about the
proportion of data values
that must be within a
specified number of
standard deviations from
the mean
Implications of Chebyshev’s Theorem

• At least .75, or 75 percent, of the data values must be within 2 (z = 2) standard deviations of the mean.
• At least .89, or 89 percent, of the data values must be within 3 (z = 3) standard deviations of the mean.
• At least .94, or 94 percent, of the data values must be within 4 (z = 4) standard deviations of the mean.
Note: z must be greater than one but need not be an integer.
Chebyshev’s Theorem
For example:

Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74

At least $(1 - 1/(1.5)^2) = 1 - 0.44 = 0.56$, or 56%,
of the rent values must be between
$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$

(Actually, 86% of the rent values are between 409 and 573.)
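The example above can be sketched in Python by computing the Chebyshev lower bound and then counting how many rents actually fall inside the interval. The function name `chebyshev_bound` is hypothetical.

```python
import statistics

# Monthly rents for the 70 sampled studio apartments.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def chebyshev_bound(z):
    """Minimum proportion of values within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


z = 1.5
x_bar = statistics.mean(rents)
s = statistics.stdev(rents)
lower, upper = x_bar - z * s, x_bar + z * s

# Count the rents that actually lie inside the interval.
inside = sum(lower <= x <= upper for x in rents)
actual = inside / len(rents)

print(round(chebyshev_bound(z), 2))  # guaranteed minimum proportion
print(round(actual, 2))              # observed proportion
```

The theorem gives only a guaranteed minimum, so the observed proportion is typically well above the bound.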
Detecting Outliers

You can use z-scores to detect extreme values in the data set, or “outliers.” When a z-score is very large in absolute value, it is a good idea to recheck the data for accuracy.
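A minimal sketch of z-score screening on the starting salary data. The cutoff of 3 standard deviations is an assumption (a common rule of thumb, not stated on the slides), and `find_outliers` is a hypothetical helper name.

```python
import statistics

# Starting salaries of the 12 business graduates.
salaries = [2850, 2950, 3050, 2880, 2755, 2710,
            2890, 3130, 2940, 3325, 2920, 2880]


def find_outliers(values, threshold=3.0):
    """Return (value, z-score) pairs whose |z| exceeds the threshold.

    The default threshold of 3 is an assumed rule of thumb.
    """
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    flagged = []
    for x in values:
        z = (x - x_bar) / s
        if abs(z) > threshold:
            flagged.append((x, z))
    return flagged


print(find_outliers(salaries))                 # [] : nothing beyond 3
print(find_outliers(salaries, threshold=2.0))  # flags the 3325 salary
```

A flagged value is a candidate for rechecking, not automatically an error; the highest salary here has a z-score of about 2.32.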
