Purpose
This is a technical briefing note on the use of the Gini (G) coefficient for summarising
inequality.
Background
The Pan American Health Organisation (2001) promoted a 'Gini-like' statistic as a
summary measure of inequalities in health. UK Public Health Observatories have a
role in monitoring inequalities in health (Department of Health, 2001), and must
therefore consider the statistical properties of inequality measures such as the Gini
coefficient.
History
The Gini coefficient was developed by the Italian statistician Corrado Gini (1912) as
a summary measure of income inequality in society. It is usually associated with the
plot of wealth concentration introduced a few years earlier by Max Lorenz (1905).
Since these measures were introduced, they have been applied to topics other than
income and wealth, but mostly within Economics (Cowell, 1995, 2000; Jenkins,
1991; Sen, 1973).
The classical definition of G appears in the notation of the theory of relative mean
difference:
$$G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} |x_i - x_j|}{2 n^2 \bar{x}}$$
- where $x$ is an observed value, $n$ is the number of values observed and $\bar{x}$ is the
mean value.
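The definition above can be computed directly, as in the following minimal sketch. The function name `gini_pairwise` and the sample values are illustrative assumptions and do not come from any of the cited sources; the double loop makes all n^2 comparisons, so it is only practical for small n.

```python
def gini_pairwise(values):
    """Gini coefficient from the relative mean difference definition."""
    x = [float(v) for v in values]
    n = len(x)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(x) / n
    if mean == 0:
        raise ValueError("mean is zero, so G is undefined")
    # Sum of |x_i - x_j| over all n^2 ordered pairs (i, j).
    total_abs_diff = sum(abs(xi - xj) for xi in x for xj in x)
    return total_abs_diff / (2 * n * n * mean)


# Illustrative values only (hypothetical, not from any data set in this note).
print(gini_pairwise([1, 2, 3, 4]))  # 0.25
print(gini_pairwise([5, 5, 5]))     # 0.0, perfect equality
```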
If the x values are first placed in ascending order, such that each x has rank i, then
some of the comparisons above can be avoided and computation is quicker:
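$$G = \frac{2}{n^2 \bar{x}} \sum_{i=1}^{n} i (x_i - \bar{x})$$
$$G = \frac{\sum_{i=1}^{n} (2i - n - 1) x_i}{n \sum_{i=1}^{n} x_i}$$
- where $x$ is an observed value, $n$ is the number of values observed, $\bar{x}$ is the mean
value and $i$ is the rank of values in ascending order.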
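The two rank-based forms can be sketched as follows. The function names `gini_rank_mean` and `gini_rank_sum` and the sample values are illustrative assumptions; the sketch simply shows that both forms give the same value once the data are sorted, at the cost of one sort instead of n^2 comparisons.

```python
def gini_rank_mean(values):
    """G = 2 / (n^2 * mean) * sum of i * (x_i - mean), with x sorted ascending."""
    x = sorted(float(v) for v in values)
    n = len(x)
    mean = sum(x) / n
    # enumerate(..., start=1) supplies the rank i of each sorted value.
    weighted = sum(i * (xi - mean) for i, xi in enumerate(x, start=1))
    return 2 * weighted / (n * n * mean)


def gini_rank_sum(values):
    """G = sum of (2i - n - 1) * x_i, divided by n * sum of x_i, with x sorted ascending."""
    x = sorted(float(v) for v in values)
    n = len(x)
    weighted = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return weighted / (n * sum(x))


# Illustrative values only (hypothetical); input order does not matter.
sample = [3, 1, 4, 2]
print(gini_rank_mean(sample))  # 0.25
print(gini_rank_sum(sample))   # 0.25
```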
Two types of bootstrap confidence interval are commonly used: percentile and
bias-corrected (Mills and Zandvakili, 1997; Dixon et al., 1987; Efron and
Tibshirani, 1997). Bias-corrected intervals are the most appropriate for most
applications. Dixon (1987) describes a refinement of the bias-corrected method
known as 'accelerated'; this produces values very close to conventional
bias-corrected intervals.
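$$z_0 = \Phi^{-1}\left(\frac{\#(g^* < G)}{k}\right)$$
$$p_1 = \Phi(2 z_0 - z_{1-\alpha/2}), \qquad p_2 = \Phi(2 z_0 + z_{1-\alpha/2})$$
- where $g^*$ is a Gini coefficient estimated from a bootstrap sample, $G$ is the observed
Gini coefficient, $\#(g^* < G)$ is the number of bootstrap estimates below $G$, $\alpha$ is
(100-confidence level)/100, $\Phi$ is the standard normal distribution function,
$z_{1-\alpha/2}$ is its $1-\alpha/2$ quantile, $k$ is the number of re-samples in the bootstrap,
and $p_1$ and $p_2$ are the quantiles of the bootstrap distribution of $g^*$ taken as the lower
and upper confidence limits.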
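A bias-corrected bootstrap interval of this kind might be sketched as below. This is a minimal illustration under stated assumptions, not the method implemented in any of the packages named in this note: the function names, the data, the strict inequality used when counting bootstrap estimates below G, the clamping of that proportion away from 0 and 1, and the simple index rule for picking quantiles are all choices made for the example.

```python
import random
from statistics import NormalDist


def gini(values):
    """Rank-based Gini coefficient of positive values."""
    x = sorted(float(v) for v in values)
    n = len(x)
    weighted = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return weighted / (n * sum(x))


def bias_corrected_ci(values, k=2000, confidence=95.0, seed=None):
    """Return (observed G, lower limit, upper limit) from k bootstrap re-samples."""
    rng = random.Random(seed)  # a fixed seed makes the run repeatable
    data = list(values)
    n = len(data)
    g_obs = gini(data)
    # Each re-sample draws n values from the data with replacement.
    boot = sorted(gini(rng.choices(data, k=n)) for _ in range(k))

    std_normal = NormalDist()
    alpha = (100.0 - confidence) / 100.0
    # Proportion of bootstrap estimates below the observed G (assumed strict "<"),
    # clamped inside (0, 1) so that the inverse normal is defined.
    below = sum(1 for g in boot if g < g_obs)
    proportion = min(max(below / k, 0.5 / k), 1.0 - 0.5 / k)
    z0 = std_normal.inv_cdf(proportion)
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)

    # Adjusted quantiles of the bootstrap distribution used as the limits.
    p1 = std_normal.cdf(2.0 * z0 - z)
    p2 = std_normal.cdf(2.0 * z0 + z)
    lower = boot[min(k - 1, int(p1 * k))]
    upper = boot[min(k - 1, int(p2 * k))]
    return g_obs, lower, upper


# Hypothetical positive values, for illustration only.
example = [12, 15, 18, 22, 25, 31, 40, 44, 52, 60, 75, 90]
g_obs, lower, upper = bias_corrected_ci(example, k=2000, seed=1)
print(f"G = {g_obs:.4f}, 95% CI {lower:.4f} to {upper:.4f}")
```

Without a seed, repeated runs give slightly different limits because the re-samples differ.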
Gini-like statistics
The original Gini formula is presented in many forms, and there are Gini-like
formulae which approximate the Gini coefficient. In the context of measuring
inequalities in health, Brown (1994) presents a Gini-style index, seemingly calculated
from two variables instead of one. The two variables comprise distinct indicators of
health (y, e.g. infant deaths) and population (x, live births) for n groups sorted by a
composite measure of health and population (e.g. infant mortality rate).
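$$G_b = 1 - \sum_{i=0}^{n-1} (Y_{i+1} + Y_i)(X_{i+1} - X_i)$$
- where $X_i$ and $Y_i$ are the cumulative proportions of x and y up to and including
sorted group i, with $X_0 = Y_0 = 0$.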
Gb based on two variables (e.g. infant deaths and live births) will be very similar to G
calculated from a composite measure (e.g. infant mortality rate). In most situations it
is more natural to think of inequality of the composite measure. Another reason not
to use Gb is that its statistical characteristics are not well studied.
The Pan American Health Organisation (2001) gave the following illustration:
Country     GNP per capita   IMR (per 1,000 live births)   Live births (thousands)   Infant deaths
Bolivia     2860             59                            250                       14750
Peru        4410             43                            621                       26703
Ecuador     4730             39                            308                       12012
Colombia    6720             24                            889                       21336
Venezuela   8130             22                            568                       12496
Brown's Gb = 0.1904
[Figure: plot of proportion of variable (vertical axis, 0 to 1) against proportion of sample (horizontal axis, 0 to 1).]
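The Gb value quoted for the illustration above can be reproduced with a short sketch. It assumes that the groups are sorted by infant mortality rate in ascending order, that X and Y are cumulative proportions of live births and infant deaths starting from zero, and that live births are recorded in thousands (which is what the tabulated rates and deaths imply). The function name `brown_gini` is hypothetical.

```python
def brown_gini(x_values, y_values):
    """Gini-style index from paired population (x) and health (y) counts."""
    # Sort groups by the composite rate y / x, lowest rate first (assumed order).
    pairs = sorted(zip(x_values, y_values), key=lambda p: p[1] / p[0])
    total_x = sum(x for x, _ in pairs)
    total_y = sum(y for _, y in pairs)
    cum_x = cum_y = 0.0  # X_0 = Y_0 = 0
    area_sum = 0.0
    for x, y in pairs:
        next_x = cum_x + x / total_x
        next_y = cum_y + y / total_y
        # One term of the sum: (Y_{i+1} + Y_i) * (X_{i+1} - X_i).
        area_sum += (next_y + cum_y) * (next_x - cum_x)
        cum_x, cum_y = next_x, next_y
    return 1.0 - area_sum


# Bolivia, Peru, Ecuador, Colombia, Venezuela, from the table above.
live_births = [250, 621, 308, 889, 568]           # thousands
infant_deaths = [14750, 26703, 12012, 21336, 12496]

print(round(brown_gini(live_births, infant_deaths), 4))  # 0.1904
```

For comparison, the unweighted G of the five infant mortality rates alone is 186/935, or about 0.199, which is close to the Gb value.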
The example above uses too few observations for reliable inference from G, but the
required definition of groups might force this type of situation. G based on few
observations is unreliable for comparing different groups at any one time, but it can
be reasonable for monitoring changes in inequalities over time.
Be aware that G can amplify biases. If you are comparing grouped data, make sure
that you are comparing like with like; for example, a comparison of the equality of
readmission rates between hospitals can be biased by case-mix differences.
When you are comparing Gini coefficients, particularly over time, note that the Gini
coefficient is unchanged by multiplying all observations by a constant, but it is
sensitive to adding a constant to all observations. This issue could arise if you
were comparing the equality of life expectancy between geographical areas over time:
the secular (baseline) increase in life expectancy of the overall population in
effect adds a constant, so the Gini coefficient can change over time even if the
absolute differences in life expectancy between the areas remain constant.
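The effect of multiplying and of adding a constant can be checked numerically. The life expectancy figures below are hypothetical values chosen for illustration, and `gini` is a minimal rank-based sketch rather than any package's function.

```python
def gini(values):
    """Rank-based Gini coefficient of positive values."""
    x = sorted(float(v) for v in values)
    n = len(x)
    weighted = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return weighted / (n * sum(x))


# Hypothetical life expectancy (years) in five areas.
life_expectancy = [70.0, 72.0, 75.0, 78.0, 80.0]

g_base = gini(life_expectancy)
g_scaled = gini([v * 12 for v in life_expectancy])   # same data in months
g_shifted = gini([v + 5 for v in life_expectancy])   # every area gains 5 years

print(round(g_base, 4))     # 0.0277
print(round(g_scaled, 4))   # 0.0277, unchanged by the change of units
print(round(g_shifted, 4))  # 0.026, lower although the gaps between areas are the same
```

Adding a constant leaves the absolute differences unchanged but raises the mean, so G shrinks in proportion to the old mean divided by the new mean (here 75/80).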
Software
I have written a function in StatsDirect to produce bootstrap confidence intervals for
Gini coefficients and Lorenz plots. See http://www.statsdirect.com. I plan to extract
this Gini function into a freely-distributable Excel add-in in the future. It is not
currently practical to put this function into a web-based calculator as it is computer-
intensive, but this will become practical in the future with the growth of processing
power.
There is a Stata macro called ineqerr that will calculate bootstrap confidence
intervals for three different measures of inequality, including Gini. The results need
to be multiplied by n/(n-1) to get unbiased estimates (Dixon, 1987). See
http://www.stata.com.
Dixon (1987) supplies a SAS macro for bootstrapping Gini coefficients from
http://www.public.iastate.edu/~pdixon/sas/. See also http://www.sas.com.
Note that each run of a bootstrap usually gives slightly different answers because
of the random re-sampling in the method. To get consistent bootstrap estimates,
select at least 2000 replications, whichever software you use.
References
Brown M. Using Gini-style indices to evaluate the spatial patterns of health
practitioners; theoretical considerations and an application based on the Alberta
data. Social Science and Medicine 1994;38(9):1243-1256.
Cowell FA. Measuring Inequality (second edition). Hemel Hempstead: Harvester
Wheatsheaf 1995. Draft third edition (May 2000) at
http://darp.lse.ac.uk/Frankweb/Frank/pdf/measuringinequality2.pdf.
Stuart A, Ord JK. Kendall's Advanced Theory of Statistics (6th edition). London:
Edward Arnold 1994.