
Business Statistics

Session 3
Measures of central tendency and dispersion
Summary statistics
• Central tendency – the central or middle point of a distribution
• Dispersion – the spread of the data around that central point
• Skewness – a curve is skewed when the values in its frequency
distribution are concentrated at either the low end (positively
skewed, with a long tail to the right) or the high end (negatively
skewed, with a long tail to the left)
• Kurtosis – a measure of the peakedness of a distribution
Central tendency
• Arithmetic mean
• Population mean
• Sample mean
• Mean from grouped data
• Weighted mean
• Geometric mean
• Median
• Mode
• Multimodal distributions?
Arithmetic mean

Population mean $\mu = \frac{\sum x}{N}$

Sample mean $\bar{x} = \frac{\sum x}{n}$
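A minimal Python sketch of the arithmetic mean. The same calculation serves for a population (dividing by N) and for a sample (dividing by n); only the interpretation of the result differs. The function name `arithmetic_mean` and the data are hypothetical, chosen only for illustration.

```python
# Arithmetic mean: sum of the values divided by how many values there are.
# Population: mu = sum(x) / N.  Sample: x_bar = sum(x) / n.

def arithmetic_mean(values):
    """Return sum(values) / len(values) for a non-empty sequence."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical sample of five observations (illustrative data only).
sample = [4, 8, 15, 16, 23]
x_bar = arithmetic_mean(sample)
print(x_bar)  # 13.2
```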
Mean from grouped data
Average monthly balances of 600 customers

Class (dollars)    Frequency
0 – 49.99                 78
50.00 – 99.99            123
100.00 – 149.99          187
150.00 – 199.99           82
200.00 – 249.99           51
250.00 – 299.99           47
300.00 – 349.99           13
350.00 – 399.99            9
400.00 – 449.99            6
450.00 – 499.99            4
Total                    600
Mean from grouped data
Sample arithmetic mean

$\bar{x} = \frac{\sum (f \times x)}{n}$

where
$\bar{x}$ = sample mean
$f$ = frequency of each class
$x$ = midpoint of each class in the sample
$n$ = number of observations in the sample
Solution
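The grouped-mean formula can be applied to the 600-balance table with a short Python sketch. It assumes, as an illustrative convention, that each class midpoint lies halfway between consecutive lower class limits (25, 75, ..., 475); using the exact midpoints of the printed limits (for example 24.995) would shift the result slightly. The variable names are hypothetical.

```python
# Sample mean of grouped data: x_bar = sum(f * x) / n,
# where x is the class midpoint and f the class frequency.

# Frequencies of the 600 average monthly balances, by $50 class.
frequencies = [78, 123, 187, 82, 51, 47, 13, 9, 6, 4]

# Assumption: midpoints halfway between consecutive lower class limits
# (0, 50, 100, ...), i.e. 25, 75, ..., 475.
class_width = 50
midpoints = [class_width * i + class_width / 2 for i in range(len(frequencies))]

n = sum(frequencies)  # total number of observations
grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
print(n, round(grouped_mean, 2))  # 600 142.25
```

Here sum(f × x) = 85,350, so the sketch gives a mean balance of about $142.25 under the stated midpoint assumption.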
Weighted Mean

$\bar{x}_w = \frac{\sum (w \times x)}{\sum w}$

where $w$ is the weight assigned to each observation $x$
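A minimal Python sketch of the weighted mean, in which each value counts in proportion to its weight. The wage rates and hours below are hypothetical illustration data, and `weighted_mean` is a hypothetical helper name.

```python
# Weighted mean: x_w = sum(w * x) / sum(w).

def weighted_mean(values, weights):
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical example: hourly wage rates (x) and hours worked at each rate (w).
wages = [5.0, 6.0, 8.0]
hours = [1, 2, 5]
print(weighted_mean(wages, hours))  # 7.125
```

The unweighted mean of the three rates would be about 6.33; weighting by hours pulls the average toward the rate at which most hours were worked.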
Geometric Mean
$\text{G.M.} = \sqrt[n]{\text{product of all } x \text{ values}}$
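A minimal Python sketch of the geometric mean as the n-th root of the product of n positive values, a typical use being average growth over several periods. The growth factors are hypothetical; `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.prod(values) ** (1 / len(values))

# Hypothetical growth factors over four periods (1.10 means +10%).
growth = [1.10, 1.50, 1.20, 1.05]
print(geometric_mean(growth))
```

For long lists the raw product can overflow or underflow; averaging logarithms avoids that, and Python's `statistics.geometric_mean` takes that approach.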
Median
• The median is a single value from the data set that measures the central
item in the data
• This single item is the middlemost, or most central, item in the set of
numbers
• Median = $\left(\frac{n+1}{2}\right)$th item in a data array (the data arranged in order)
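A minimal Python sketch of the ((n + 1) / 2)th-item rule for ungrouped data. When n is even the position falls halfway between two items, and the sketch averages them, which is the usual convention. The function name and data are hypothetical.

```python
def median(values):
    """Median as the ((n + 1) / 2)th item of the sorted data array."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    data = sorted(values)            # build the data array
    n = len(data)
    position = (n + 1) / 2           # 1-based position in the array
    lower = data[int(position) - 1]  # item at the whole-number part
    if position.is_integer():
        return lower                 # odd n: a single middle item
    upper = data[int(position)]      # even n: average the two middle items
    return (lower + upper) / 2

print(median([7, 1, 3, 9, 5]))  # 5
print(median([4, 1, 3, 2]))     # 2.5
```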
Average monthly balances for 600 customers
Calculating the median from grouped data

Sample median $\tilde{m} = \left(\frac{(n+1)/2 - (F+1)}{f_m}\right) w + L_m$

where
$n$ = total number of items in the distribution
$F$ = sum of all the class frequencies up to, but not including, the median class
$f_m$ = frequency of the median class
$w$ = class-interval width
$L_m$ = lower limit of the median class interval
Mode
• The mode is the value that is repeated most often in the data set

For grouped data:

Mode $Mo = L_{Mo} + \left(\frac{d_1}{d_1 + d_2}\right) w$

where
$L_{Mo}$ = lower limit of the modal class
$d_1$ = frequency of the modal class minus the frequency of the class directly below it
$d_2$ = frequency of the modal class minus the frequency of the class directly above it
$w$ = width of the modal class interval
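A minimal Python sketch of the grouped-mode formula on the 600-balance table, again assuming the rounded lower limits 0, 50, ..., 450 and width 50. Two further assumptions are made for illustration: a class beyond either end of the table counts as frequency 0, and if two classes tie for the highest frequency the first is used. The helper name `grouped_mode` is hypothetical.

```python
def grouped_mode(frequencies, lower_limits, width):
    """Mode of grouped data: Mo = L_Mo + d1 / (d1 + d2) * w."""
    # Modal class: the class with the highest frequency (first one on ties).
    k = max(range(len(frequencies)), key=lambda i: frequencies[i])
    # Assumption: a class outside the table counts as frequency 0.
    below = frequencies[k - 1] if k > 0 else 0
    above = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    d1 = frequencies[k] - below  # modal frequency minus the lower-valued neighbour
    d2 = frequencies[k] - above  # modal frequency minus the higher-valued neighbour
    if d1 + d2 == 0:
        raise ValueError("modal class is not higher than its neighbours")
    return lower_limits[k] + d1 / (d1 + d2) * width

frequencies = [78, 123, 187, 82, 51, 47, 13, 9, 6, 4]
lower_limits = [50 * i for i in range(len(frequencies))]  # 0, 50, ..., 450
mode_balance = grouped_mode(frequencies, lower_limits, 50)
print(round(mode_balance, 2))  # 118.93
```

Here the modal class is 100.00 – 149.99, with d1 = 187 − 123 = 64 and d2 = 187 − 82 = 105.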
Comparison of Mean, Median and Mode
• Symmetrical distributions that contain only one mode always have the
same value for the mean, median and mode.
• In a positively skewed (skewed to the right) distribution, the mode is at the
highest point of the distribution, the median is to the right of that, and the
mean is to the right of both.
• In a negatively skewed (skewed to the left) distribution, the mode is still
the highest point of the distribution, the median is to the left of that, and
the mean is to the left of both.
• When the population is skewed, the median is often the best measure of
location. It is not influenced by the frequency of occurrence of a single
value, as the mode is, nor is it pulled by extreme values, as the mean is.
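To see the ordering mode < median < mean that a positively skewed distribution produces, here is a small Python sketch using the standard `statistics` module (`statistics.fmean` needs Python 3.8 or later). The data set is hypothetical: mostly small values with a few large ones in a long right tail.

```python
import statistics

# Hypothetical positively skewed sample: a long tail of large values to the right.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mode_value = statistics.mode(data)      # most frequent value
median_value = statistics.median(data)  # middle of the sorted data
mean_value = statistics.fmean(data)     # arithmetic mean, as a float

print(mode_value, median_value, mean_value)  # 2 3.0 4.6
```

The two extreme values 9 and 15 pull the mean well above the median, while the median stays near the bulk of the data.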
Dispersion
• A measure of dispersion is needed to judge the reliability of a measure of
central tendency
• If the dispersion is wide, the measure of central tendency is less
representative of the data as a whole
Ranges
• Dispersion may be measured as the difference between two values
selected from the data set.
• The measures are
• Range
• Interfractile range
• Interquartile range
Measures of Range
• Range = value of highest observation – value of lowest observation
• In a frequency distribution, a given fraction or proportion of the data lies at
or below a fractile. The median, for example, is the 0.5 fractile.
• The interfractile range is a measure of the spread between two fractiles in a
frequency distribution, that is, the difference between the values of the two
fractiles
• Fractiles that divide the data into 10 equal parts are called deciles.
Quartiles divide the data into 4 equal parts. Percentiles divide the data
into 100 equal parts
• Quartiles: the data are divided into four parts, each of which contains 25% of
the items. The interquartile range is the difference between the values of the
first and third quartiles
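A minimal Python sketch of the range and the interquartile range using `statistics.quantiles` (Python 3.8 or later). With `method="exclusive"` the p-th fractile is placed at position (n + 1)p in the sorted data, the same convention as the ((n + 1) / 2)th-item median rule; other textbooks and software use slightly different quartile conventions, so results can differ a little. The data are hypothetical.

```python
import statistics

data = [3, 5, 7, 8, 12, 13, 14, 18, 21]  # hypothetical observations

data_range = max(data) - min(data)  # highest observation minus lowest

# Quartiles: the 0.25, 0.50 and 0.75 fractiles, interpolated at (n + 1) * p.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1  # interquartile range

print(data_range, q1, q2, q3, iqr)  # 18 6.0 12.0 16.0 10.0
```

Passing `n=10` instead returns the nine decile cut points, and `n=100` the 99 percentile cut points.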
Average Deviation Measures

Population variance $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{\sum x^2}{N} - \mu^2$

Sample variance $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2}{n - 1} - \frac{n\bar{x}^2}{n - 1}$

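Population standard deviation $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Sample standard deviation $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$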
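A minimal Python sketch of the population and sample variance, each in its definitional form and in its shortcut form, plus the corresponding standard deviations. The function names and data are hypothetical, chosen so that the population results come out as round numbers.

```python
import math

# Population variance, definitional form: sum((x - mu)^2) / N
def population_variance(xs):
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

# Population variance, shortcut form: sum(x^2) / N - mu^2
def population_variance_shortcut(xs):
    N = len(xs)
    mu = sum(xs) / N
    return sum(x * x for x in xs) / N - mu ** 2

# Sample variance, definitional form: sum((x - x_bar)^2) / (n - 1)
def sample_variance(xs):
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

# Sample variance, shortcut form: sum(x^2) / (n - 1) - n * x_bar^2 / (n - 1)
def sample_variance_shortcut(xs):
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(xs) / n
    return sum(x * x for x in xs) / (n - 1) - n * x_bar ** 2 / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

sigma = math.sqrt(population_variance(data))  # population standard deviation
s = math.sqrt(sample_variance(data))          # sample standard deviation
print(population_variance(data), sigma)       # 4.0 2.0
```

The shortcut forms are convenient for hand calculation, but in floating point they subtract two large, nearly equal numbers and can lose precision when the values are large relative to their spread.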
Coefficient of variation
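• Standard deviation is an absolute measure of dispersion, expressed in the units of the data
• It cannot be the sole basis for comparing two distributions whose means or units differ
• The coefficient of variation is a relative measure: it expresses the standard deviation as a percentage of the mean, $\text{CV} = \frac{\sigma}{\mu} \times 100\%$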
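A minimal Python sketch of the coefficient of variation, taken here as the population standard deviation divided by the mean, times 100. The two hypothetical data sets have exactly the same standard deviation (2), yet very different spread relative to their means, which is the case where the standard deviation alone misleads. The helper name is hypothetical.

```python
import statistics

def coefficient_of_variation(xs):
    """Population standard deviation as a percentage of the mean."""
    mu = statistics.fmean(xs)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(xs) / mu * 100

a = [2, 4, 4, 4, 5, 5, 7, 9]                  # hypothetical, mean 5
b = [102, 104, 104, 104, 105, 105, 107, 109]  # hypothetical, mean 105

cv_a = coefficient_of_variation(a)
cv_b = coefficient_of_variation(b)
print(round(cv_a, 2), round(cv_b, 2))  # 40.0 1.9
```

Both sets have a standard deviation of 2, but relative to its mean set `a` is about twenty times as dispersed as set `b`.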
Exploratory Data Analysis
• Stem and leaf display
• Box plot
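A minimal Python sketch of both displays in text form. The stem-and-leaf part assumes non-negative integers, with the tens digit as the stem and the units digit as the leaf. The box-plot part only computes the five numbers a box plot draws (minimum, first quartile, median, third quartile, maximum), using the same (n + 1)p quartile convention as `statistics.quantiles(..., method="exclusive")`. The data and function names are hypothetical.

```python
import statistics
from collections import defaultdict

def stem_and_leaf(values):
    """Stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):  # keep empty stems visible
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum: the values a box plot draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, q2, q3, max(values)

data = [12, 15, 21, 23, 23, 28, 34, 37, 41, 45]  # hypothetical observations
for line in stem_and_leaf(data):
    print(line)
print(five_number_summary(data))  # (12, 19.5, 25.5, 38.0, 45)
```

The stem-and-leaf display keeps every original value while showing the shape of the distribution; the box plot condenses it to five numbers, with the box spanning the interquartile range.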
