-- Mandar Gadre
(July 2013)
A Sample Dataset
x_i, i = 1 to N; S
[Figure labels: Discrepancy, Error]
Calculating S
Geometric Mean:
Defined only for a dataset in which all numbers are positive, it is the Nth
root of the product of all N data-points.
$G = \left( \prod_{i=1}^{N} x_i \right)^{1/N}$
It is used to summarize or aggregate data that span different
categories and scales, e.g. rating companies on various
metrics taken together.
It is also used where the data-points show compounding behavior, e.g.
summarizing the performance of a stock over the past N years.
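As a minimal sketch of the compounding case, the snippet below computes the geometric mean of yearly growth factors. The function name `geometric_mean` and the three growth factors are hypothetical, chosen only for illustration; averaging logarithms is one common way to take the Nth root of a product without overflow.

```python
import math

def geometric_mean(values):
    """Nth root of the product of N positive data-points."""
    if not values:
        raise ValueError("geometric mean needs at least one data-point")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean is defined only for positive data-points")
    # exp(mean(log x)) equals (x1 * x2 * ... * xN) ** (1/N),
    # but avoids overflow when the raw product gets very large.
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical yearly growth factors of a stock: +10%, -5%, +20%.
growth_factors = [1.10, 0.95, 1.20]
g = geometric_mean(growth_factors)

# Compounding g for 3 years reproduces the total growth over the period.
print(f"average yearly growth factor: {g:.4f}")
print(f"total growth: {g ** 3:.4f}")
```

The arithmetic mean of the same factors would overstate the average yearly growth, because it ignores that each year's return applies to the previous year's result.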
Harmonic Mean:
Defined only for a dataset in which all numbers are non-zero (in practice, all positive), it is the
reciprocal of the arithmetic mean of the reciprocals of x_i.
$H = \frac{1}{\frac{1}{N} \sum_{i=1}^{N} \frac{1}{x_i}}$
It is used to summarize rates, e.g. the average speed of an
aircraft over numerous Mumbai-London trips (each covering the same distance), or the
average rate (in ml/min) at which a blood donor fills a bag over
multiple visits.
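The average-speed case can be sketched as follows. The function name `harmonic_mean` and the two speeds are hypothetical, and the function restricts itself to positive values, which is an assumption slightly stricter than "non-zero" (it rules out a zero sum of reciprocals).

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if not values:
        raise ValueError("harmonic mean needs at least one data-point")
    # Assumption: only positive data-points are accepted.
    if any(v <= 0 for v in values):
        raise ValueError("this sketch accepts only positive data-points")
    return len(values) / sum(1.0 / v for v in values)

# Hypothetical speeds (km/h) on two trips over the same distance.
speeds = [800, 900]
h = harmonic_mean(speeds)

# Cross-check: total distance divided by total time gives the same value.
distance = 7200.0  # any common distance works; this value is arbitrary
total_time = sum(distance / s for s in speeds)
print(f"harmonic mean: {h:.2f} km/h")
print(f"distance / time: {len(speeds) * distance / total_time:.2f} km/h")
```

The harmonic mean matches total distance over total time because the slower trip takes longer and therefore carries more weight than it would in an arithmetic mean.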
Mid-Range:
Defined as the arithmetic mean of the maximum and minimum
data-points.
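A short sketch of the mid-range, using a hypothetical function name `mid_range` and made-up sample values:

```python
def mid_range(values):
    """Arithmetic mean of the largest and smallest data-points."""
    if not values:
        raise ValueError("mid-range needs at least one data-point")
    return (max(values) + min(values)) / 2

# Made-up sample: only the extremes (1 and 10) affect the result.
sample = [3, 7, 10, 1]
print(mid_range(sample))
```

Because only the two extreme data-points enter the calculation, every value in between is ignored.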
Breakdown Point is the largest proportion of contaminated data-points (e.g. arbitrarily large data-points) a statistic can handle
before yielding an absurd result (e.g. an arbitrarily large statistic).
Since the arithmetic mean depends on all the values and can be swayed arbitrarily far by
changing even one value among N, its Breakdown Point is 0.
The median is the most robust of these statistics, with its Breakdown Point at 50%.
(If more than 50% of the data is contaminated, a statistic cannot be defined
anyway since there is no way to distinguish between the actual underlying
distribution and the contaminated one.)
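The contrast between the mean and the median can be illustrated with a small experiment. This is only a sketch: the dataset `clean`, the helper `contaminate`, and the contaminating value `1e9` are all hypothetical choices made for the demonstration.

```python
import statistics

# Made-up clean dataset of N = 10 data-points.
clean = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11]

def contaminate(data, k, bad_value=1e9):
    """Replace the first k data-points with an arbitrarily large value."""
    return [bad_value] * k + list(data[k:])

for k in range(0, 7):
    data = contaminate(clean, k)
    mean = statistics.mean(data)
    median = statistics.median(data)
    # The median counts as "intact" while it stays within the clean range.
    intact = min(clean) <= median <= max(clean)
    print(f"{k} contaminated: mean={mean:.3g}, median={median:.3g}, "
          f"median intact={intact}")
```

In this run a single contaminated data-point is enough to drag the mean far outside the range of the clean data, while the median stays within that range as long as fewer than half of the data-points are contaminated.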
Robust Statistics