To appear as: Koen Frenken, 2004, Entropy and information theory, in Horst Hanusch and Andreas Pyka (eds), The Elgar Companion to Neo-Schumpeterian Economics (Cheltenham: Edward Elgar).
Entropy statistics can be used to describe distributions of events at particular moments in time (e.g., market shares) and to analyse evolutionary processes over time (e.g., technical change). Importantly, entropy statistics are suitable for decomposition analysis, which renders the measure preferable to alternatives like the Herfindahl index. This chapter reviews applications of entropy in the realms of industrial organisation and innovation studies. The chapter
contains two sections, one on statistics and one on applications. In the first section, we introduce the entropy formula, its decomposition, and multidimensional extensions. In the second section, we review applications, including:
1. industrial concentration
2. corporate diversification
3. technological evolution
4. income inequality
5. organisation theory
1. Entropy statistics
The origin of the entropy concept goes back to Ludwig Boltzmann (1877) and has since been given an information-theoretic interpretation as a measure of the expected information content of a probability distribution. Let Ei stand for an event (e.g., the adoption of technology i) and pi for the probability of event Ei occurring. Let there be n events E1, …, En with probabilities p1, …, pn adding up to 1. Since the occurrence of events with smaller probability yields more information (as these are least expected), a measure of the information content of a message stating that event Ei occurred is defined as:
h(p_i) = \log_2 \frac{1}{p_i} \qquad (1)
This reflects the idea that the lower the probability of an event, the higher the amount of information conveyed by a message stating that the event occurred. Information is here expressed in bits, using 2 as the base of the logarithm, while others express information in natural units ('nits') using the natural logarithm. From the n information values h(p_i), the expected information content of a probability distribution, called entropy, is obtained as the probability-weighted sum:
H = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} \qquad (2)
with the convention that:

p_i \log_2 \frac{1}{p_i} = 0 \quad \text{if } p_i = 0 \qquad (3)
which is in accordance with the limit value of the left-hand term as p_i approaches zero. The minimum entropy value is zero, corresponding to the case in which one event has unit probability:
H_{min} = 1 \cdot \log_2 \frac{1}{1} = 0 \qquad (4)
When all states are equally probable (p_i = 1/n), the entropy value is maximum:
H_{max} = \sum_{i=1}^{n} \frac{1}{n} \log_2 (n) = n \cdot \frac{1}{n} \log_2 (n) = \log_2 (n) \qquad (5)
(proof is given by Theil 1972: 8-10). Maximum entropy thus increases with n, but
decreasingly so.1
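To make the formulas concrete, here is a minimal Python sketch (not part of the original chapter) of equations (2)-(5); the function name and the example shares are hypothetical:

```python
import math

def entropy(p, base=2.0):
    """Expected information content H of a probability distribution (eq. 2).
    Terms with p_i = 0 contribute zero, in line with eq. (3)."""
    return sum(pi * math.log(1.0 / pi, base) for pi in p if pi > 0)

shares = [0.5, 0.25, 0.125, 0.125]            # hypothetical probabilities summing to 1
print(entropy(shares))                        # 1.75 bits
print(entropy([1.0, 0.0, 0.0, 0.0]))          # minimum entropy: 0 bits (eq. 4)
print(entropy([0.25] * 4), math.log2(4))      # maximum entropy: log2(n) = 2 bits (eq. 5)
```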
The entropy H can thus be interpreted as a measure of uncertainty: the larger the uncertainty prior to the message that an event occurred, the larger the amount of information conveyed by the message on average. Theil (1972: 7) remarks that the entropy concept in this regard is similar to the variance of a random variable whose values are real numbers. The main difference is that entropy applies to qualitative rather than quantitative variables, as probabilities are attached to discrete events.
1. In physics, maximum entropy characterises distributions of randomly moving particles that all have an equal probability to be present in any state (like a perfect gas). When particles behave in a non-random way, for example, when they move towards already crowded regions, the resulting distribution is skewed and entropy is lower than its maximum value (Prigogine and Stengers 1984). In the biological context, maximum entropy refers to a population of genotypes in which all possible genotypes have an equal frequency. Minimum entropy reflects the total dominance of one genotype in the population (which would result if selection were instantaneous; cf. Fisher 1930: 39-40).
Theil (1972) also defines the expected information content of a message that transforms prior probabilities p_i into posterior probabilities q_i:

I(q \mid p) = \sum_{i=1}^{n} q_i \log_2 \frac{q_i}{p_i} \qquad (6)
which equals zero when posterior probabilities equal prior probabilities (no information gain) and is positive otherwise.
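A similarly minimal Python sketch of (6), with hypothetical prior and posterior distributions:

```python
import math

def information_gain(q, p, base=2.0):
    """Expected information I(q|p) of a message transforming prior
    probabilities p into posterior probabilities q (eq. 6)."""
    return sum(qi * math.log(qi / pi, base) for qi, pi in zip(q, p) if qi > 0)

prior = [0.25, 0.25, 0.25, 0.25]            # hypothetical prior shares
posterior = [0.4, 0.3, 0.2, 0.1]            # hypothetical posterior shares
print(information_gain(posterior, prior))   # positive: the message is informative
print(information_gain(prior, prior))       # 0: no information gain
```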
One of the most powerful and attractive properties of entropy statistics is the way in which an entropy value can be decomposed into groups of events. The decomposition theorem follows directly from the entropy formula.
Let Ei stand again for an event, and let there be n events E1 , …, En with
probabilities p1 ,…, pn . Assume that all events can be aggregated into a smaller number
of sets of events S1 , …, SG in such a way that each event exclusively falls under one set
Sg, where g=1,…,G. The probability that an event falling under Sg occurs is obtained by
summation:
P_g = \sum_{i \in S_g} p_i \qquad (7)
The between-group entropy is then defined as:

H_0 = \sum_{g=1}^{G} P_g \log_2 \frac{1}{P_g} \qquad (8)
The decomposition theorem specifies the relationship between the between-group entropy H0 at the level of sets and the entropies at the level of individual events. Rewriting (2), we obtain:
H = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} = \sum_{g=1}^{G} \sum_{i \in S_g} p_i \log_2 \frac{1}{p_i}

= \sum_{g=1}^{G} P_g \sum_{i \in S_g} \frac{p_i}{P_g} \left( \log_2 \frac{1}{P_g} + \log_2 \frac{P_g}{p_i} \right)

= \sum_{g=1}^{G} P_g \log_2 \frac{1}{P_g} \sum_{i \in S_g} \frac{p_i}{P_g} + \sum_{g=1}^{G} P_g \sum_{i \in S_g} \frac{p_i}{P_g} \log_2 \frac{P_g}{p_i}

= \sum_{g=1}^{G} P_g \log_2 \frac{1}{P_g} + \sum_{g=1}^{G} P_g \sum_{i \in S_g} \frac{p_i}{P_g} \log_2 \frac{1}{p_i / P_g}

H = H_0 + \sum_{g=1}^{G} P_g H_g \qquad (9)
where:
H_g = \sum_{i \in S_g} \frac{p_i}{P_g} \log_2 \frac{1}{p_i / P_g} \qquad g = 1, \dots, G \qquad (10)
The terms p_i / P_g are conditional probabilities: the probability of event E_i given that one of the events falling under Sg is bound to occur. Hg thus stands for the entropy
within the set Sg and the term ∑ Pg Hg in (9) is the average within-group entropy.
Entropy thus equals the between-group entropy plus the average within-group entropy.
Two properties follow from (9):
(i) H ≥ H0, because both Pg and Hg are nonnegative. It means that after grouping there cannot be more entropy (uncertainty) than there was before grouping.
(ii) H = H0 if and only if the grouping is such that, within each set, there is at most one event with nonzero probability.
Consider the first message that one of the sets of events occurred. Its expected
information content is H0 . Consider the subsequent message that one of the events
falling under this set occurred. Its expected information content is Hg. The total expected information content of the two messages is thus H0 plus the average within-group entropy, which equals H as in (9).
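The decomposition theorem (7)-(10) can be illustrated with a small Python sketch; the grouping of events into two sets and the probabilities are hypothetical:

```python
import math

def entropy(p):
    """Entropy in bits (eq. 2), skipping zero probabilities (eq. 3)."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

# hypothetical probabilities of six events, grouped into two sets S1 and S2
groups = {"S1": [0.30, 0.20, 0.10], "S2": [0.25, 0.10, 0.05]}

P = {g: sum(ps) for g, ps in groups.items()}                      # eq. (7)
H0 = entropy(P.values())                                          # eq. (8): between-group entropy
within = sum(P[g] * entropy([pi / P[g] for pi in ps])             # eq. (10): within-group entropies
             for g, ps in groups.items())
H = entropy([pi for ps in groups.values() for pi in ps])          # eq. (2): total entropy

print(H, H0 + within)   # equal, as stated by eq. (9)
```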
Multidimensional extensions
Consider a pair of events (Xi, Yj) and the probability pij of co-occurrence of both events. The marginal probabilities are obtained by summation:
p_{i.} = \sum_{j=1}^{n} p_{ij} \qquad (i = 1, \dots, m) \qquad (11)
p_{.j} = \sum_{i=1}^{m} p_{ij} \qquad (j = 1, \dots, n) \qquad (12)
The marginal entropies are defined as:

H(X) = \sum_{i=1}^{m} p_{i.} \log_2 \frac{1}{p_{i.}} \qquad (13)
H(Y) = \sum_{j=1}^{n} p_{.j} \log_2 \frac{1}{p_{.j}} \qquad (14)
The two-dimensional (joint) entropy is given by:

H(X, Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij} \log_2 \frac{1}{p_{ij}} \qquad (15)
The conditional entropy value measures the uncertainty in one dimension (e.g., X),
which remains when we know event Yj has occurred. It is given by (Theil 1972: 116-
117):
H_{Y_j}(X) = \sum_{i=1}^{m} \frac{p_{ij}}{p_{.j}} \log_2 \frac{p_{.j}}{p_{ij}} \qquad (16)
H_{X_i}(Y) = \sum_{j=1}^{n} \frac{p_{ij}}{p_{i.}} \log_2 \frac{p_{i.}}{p_{ij}} \qquad (17)
Averaging over all events, one obtains the average conditional entropies:
H_Y(X) = \sum_{j=1}^{n} p_{.j} H_{Y_j}(X) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij} \log_2 \frac{p_{.j}}{p_{ij}} \qquad (18)
H_X(Y) = \sum_{i=1}^{m} p_{i.} H_{X_i}(Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij} \log_2 \frac{p_{i.}}{p_{ij}} \qquad (19)
It can be shown that the average conditional entropy never exceeds the unconditional entropy, and that the conditional and unconditional entropies are equal if and only if the two dimensions are independent. The degree of dependence between the two dimensions is measured by the mutual information; in this respect it is comparable with the product-moment correlation coefficient in the way it summarises dependence in a single value. The mutual information is defined as:
J(X, Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij} \log_2 \frac{p_{ij}}{p_{i.} \cdot p_{.j}} \qquad (20)
The mutual information is nonnegative and equals zero if and only if the two dimensions are independent (Theil 1972: 131). It can further be derived that the two-dimensional entropy equals the sum of the marginal entropies minus the mutual information:
H(X, Y) = H(X) + H(Y) - J(X, Y) \qquad (21)
The interpretation is that when mutual information is absent, marginal distributions are
independent and their entropies add up to the total entropy. When mutual information is
positive, marginal distributions are dependent as some combinations occur relatively more
often than other combinations do, and the sum of the marginal entropies exceeds the total entropy by an amount equal to the mutual information.
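A short Python sketch of equations (11)-(21) for a hypothetical two-dimensional joint distribution (the matrix of probabilities is illustrative):

```python
import math

def H(p):
    """Entropy in bits of a list of probabilities (eq. 2)."""
    return sum(x * math.log2(1.0 / x) for x in p if x > 0)

# hypothetical joint probabilities p_ij for m = 2 rows (X) and n = 2 columns (Y)
p = [[0.4, 0.1],
     [0.1, 0.4]]

px = [sum(row) for row in p]                       # eq. (11): marginal distribution of X
py = [sum(col) for col in zip(*p)]                 # eq. (12): marginal distribution of Y
Hxy = H([pij for row in p for pij in row])         # eq. (15): joint entropy
J = H(px) + H(py) - Hxy                            # eq. (21): mutual information
print(Hxy, J)                                      # J > 0: the two dimensions are dependent
```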
2. Applications
Applications of entropy statistics were developed mainly during the late 1960s and the 1970s. Below, we review applications in industrial concentration, corporate diversification, technological evolution, income inequality, and organisation theory.
Industrial concentration
Entropy can be applied to the distribution of market shares pi over n firms as an (inverse) measure of industrial concentration, ranging from zero (monopoly) to log2(n) when all firms have equal market shares (perfect competition). The measure fulfils the seven axioms that are commonly listed as desirable properties of any concentration index (Curry and George 1983: 205):
(i) An increase in the cumulative share of the ith firm, for all i, ranking firms 1, …, n from largest to smallest, should imply an increase in concentration.
(ii) The 'principle of transfers' should hold, i.e. concentration should increase when market share is transferred from a smaller firm to a (larger) firm.
(iii) The entry of new firms below some arbitrary significant size should reduce concentration.
(iv) Mergers should increase concentration.
Horowitz and Horowitz (1968) proposed an index of relative entropy by dividing the
entropy by its maximum value log2 (n). In this way, one obtains a concentration index,
which lies between 0 and 1. An important disadvantage of the relative entropy measure
is that axiom (iv) no longer holds. Mergers reduce the value of H, but also reduce the
value of log2 (n). Since there may be a proportionally greater fall in log2 (n) than in H, relative entropy may actually increase following a merger.
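A minimal Python sketch, with hypothetical market shares, of entropy and relative entropy as (inverse) concentration measures; it also illustrates how a merger can raise relative entropy:

```python
import math

def entropy(shares):
    """Entropy of market shares in bits (eq. 2)."""
    return sum(s * math.log2(1.0 / s) for s in shares if s > 0)

def relative_entropy(shares):
    """Entropy divided by its maximum log2(n), as proposed by Horowitz and Horowitz (1968)."""
    return entropy(shares) / math.log2(len(shares))

shares = [0.4, 0.3, 0.2, 0.1]                               # hypothetical market shares
merged = [0.4, 0.3, 0.3]                                    # the two smallest firms merge
print(entropy(shares), entropy(merged))                     # H falls after the merger
print(relative_entropy(shares), relative_entropy(merged))   # relative entropy rises here
```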
Though the list of axioms is also met by the more popular Herfindahl index, which is equal to the sum of squares of market shares, the entropy formula is sometimes preferred because of its decomposition property. An example is the study by Jacquemin and Kumps (1971), who analysed (changes in) industrial concentration of European firms and sets of European firms (a group of British firms and a group of Continental European firms).
The decomposition property of the entropy formula has also been exploited to analyse
corporate diversification and its effect on corporate growth (Jacquemin and Berry 1979;
Palepu 1985; Hoskisson et al. 1993). Let pi stand for the proportion of a firm's total sales or production in industry i. Entropy is computed again following (2) and now measures the degree of a firm's diversification. Decomposition then allows one to ask whether diversification within or across industry groups is most rewarding for corporate growth, a question that bears on the resource-based view and the evolutionary theory of the firm, which both stress the role of firm-specific competences.
Jacquemin and Berry (1979), for example, considered firms active in n 4-digit
industries, which can be aggregated to G sets of 2-digit industry groups. Pg stands for
the proportion of a firm’s total sales or production in the 2-digit industry group g and pi
stands for the proportion of a firm’s total sales or production in the 4-digit industry i.
Application of (9) means that a firm’s degree of diversification at the 4-digit level H can
be decomposed into between-group diversification at the 2-digit level and the average
within-group diversification at the 4-digit level. In this way, the entropy measure solves
the problem of possible collinearity between 2-digit and 4-digit diversification measures that arises for the Herfindahl and other indices in regression analysis (Jacquemin and Berry 1979: 366). Collinearity is avoided because total diversification is split additively into a between-group component and a within-group component. From the 1970s onwards, evidence seems to
support the thesis that, where diversification generally does not increase profitability, related (within-group) diversification contributes more to corporate growth than unrelated (between-group) diversification.
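As an illustration of this decomposition into related (within-group) and unrelated (between-group) diversification, a hypothetical Python example in which a firm's sales shares over 4-digit industries are nested in 2-digit industry groups (the codes and shares are made up):

```python
import math

def entropy(p):
    """Entropy in bits (eq. 2)."""
    return sum(x * math.log2(1.0 / x) for x in p if x > 0)

# hypothetical sales shares by 4-digit industry, nested in 2-digit groups
sales = {"28": {"2812": 0.30, "2821": 0.20}, "35": {"3523": 0.35, "3531": 0.15}}

P = {g: sum(v.values()) for g, v in sales.items()}
total = entropy([s for v in sales.values() for s in v.values()])    # total diversification H
unrelated = entropy(P.values())                                     # between-group (2-digit) diversification H0
related = sum(P[g] * entropy([s / P[g] for s in v.values()])        # average within-group diversification
              for g, v in sales.items())
print(total, unrelated, related)   # total = unrelated + related, by eq. (9)
```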
The entropy measure of diversification has also been applied to patent data and
bibliometric data to analyse the variety in research and innovative efforts at different
levels of aggregation. In a study of patents in environmentally friendly car technology, Frenken et al. (2003) used the entropy measure to assess whether firms' patent portfolios have become, on average, more varied (18), and, vice versa, whether technologies have become, on
average, patented by a larger variety of firms (19). The first measure indicates the
variety of technologies at the level of each firm, and the second measure indicates the strength of competition between firms at the level of each technology. Earlier studies applied entropy measurements to patents at the level of firms and countries (Grupp 1990) and to bibliometric data (Leydesdorff 1996).
Since existing patent classes only imperfectly capture technological characteristics, future research may benefit from new classifications based on more in-depth analysis of patent documents.
Diversification in industries has been measured by entropy at the regional level in the
same way as is done for the corporate level (Hackbart and Anderson 1975; Attaran
1985). In most cases, industry employment data are used to compute the shares of each industry in total regional employment. Entropy values can be decomposed at several digit levels, for example, in first instance at the 2-digit level and subsequently within each 2-digit group at lower digit levels.
The main interest of this regional indicator is to test whether industrial diversity
reduces unemployment and promotes growth. Diversity is said to protect a region from
unemployment and below average growth rates caused by business cycles operating on
supra-regional levels and by external shocks (e.g., oil prices). Empirical evidence
suggests that diversity indeed reduces unemployment, while evidence on the positive
impact of diversity on per capita income is more often absent (Attaran 1985) (and see a
more recent study by Izraeli and Murphy 2003 using the Herfindhal index). The entropy
measure employed in this way, however, does not capture other aspects commonly thought to affect regional employment and growth, including the stage of the product life cycle of a region's industries.
Technological evolution
Entropy can also be used as a measure of technological variety. In this context, Ei stands for the event that a firm (or consumer) adopts a particular technology i, and pi for its frequency of adoption. When there are n possible technologies, the entropy of the distribution of adoptions over technologies indicates the technological variety. Entropy can also be used to indicate the emergence of a dominant design (Abernathy and Utterback 1978). A fall in entropy towards zero would indicate the emergence of such a dominant design. Alternatively, product technologies can be described in terms of product characteristics analogous to genetic strings. For example, a vehicle design with a
steam engine, spring suspension, and block brakes may be coded as string 000, while a vehicle design with a gasoline engine, spring suspension, and block brakes is coded by string 100. Treating each product characteristic as a dimension, the multidimensional entropy of the distribution of designs over all possible strings provides a comprehensive variety measure. The mutual information indicates the extent to which product characteristics co-occur in designs: its value equals zero when there is no dependence between product characteristics. The
higher the value of the mutual information, the more product characteristics co-occur in
‘design families’.
The joint evolution of variety (entropy) and dependence among product characteristics (mutual information) has been analysed using (21), which can be rewritten, for K dimensions, as:

\sum_{k=1}^{K} H(X_k) = J(X_1, \dots, X_K) + H(X_1, \dots, X_K)
From this formula, it can be readily understood that, given a value for the sum of marginal entropies ΣHk, mutual information J can increase only at the expense of the multidimensional entropy H. When new characteristics are introduced into technologies in consecutive years, the value of ΣHk may increase, allowing entropy and mutual information to increase simultaneously (though not necessarily so). A simultaneous increase of entropy and mutual information indicates the emergence of distinct design families, a pattern akin to 'speciation' in biology. This pattern has been found in data on product characteristics of early British steam engines in the eighteenth century (Frenken and Nuvolari 2003).
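A small Python sketch of the K-dimensional identity above, assuming a hypothetical sample of designs coded as characteristic strings:

```python
import math
from collections import Counter

def entropy(counts):
    """Entropy in bits computed from a frequency count (eq. 2)."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c > 0)

# hypothetical designs coded as strings of K = 3 product characteristics
designs = ["000", "000", "100", "111", "111", "110"]

joint = entropy(Counter(designs))                                            # H(X1,...,XK)
marginals = sum(entropy(Counter(d[k] for d in designs)) for k in range(3))   # sum of H(Xk)
J = marginals - joint                                                        # mutual information J(X1,...,XK)
print(joint, J)
```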
Income inequality
Entropy has also been central to the construction of measures of income inequality (Theil 1967: 91-134 and 1972: 99-109).
Let pi stand for the income share of individual i. When all individuals earn the same income, we have complete equality and maximum entropy log2(n), and when one individual earns all income, we have complete inequality and zero entropy. To obtain a measure that increases with inequality, one subtracts the entropy from its maximum value to obtain:
\log_2 (n) - H = \sum_{i=1}^{n} p_i \log_2 (n \, p_i) \qquad (22)
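A minimal Python sketch of the inequality measure (22), with hypothetical income shares:

```python
import math

def inequality(shares):
    """Entropy-based inequality measure log2(n) - H (eq. 22), in bits."""
    n = len(shares)
    return sum(p * math.log2(n * p) for p in shares if p > 0)

print(inequality([0.25, 0.25, 0.25, 0.25]))   # 0: complete equality
print(inequality([0.7, 0.1, 0.1, 0.1]))       # > 0: unequal income shares
print(inequality([1.0, 0.0, 0.0, 0.0]))       # log2(4) = 2: complete inequality
```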
Organisation theory
An approach to organisation theory based on the entropy concept has been developed by
Saviotti (1988), who proposed to use entropy to indicate the variety of possible organisational structures, given the distribution of knowledge among an organisation's members. When n individuals have the same knowledge, and this knowledge enables each individual to carry out any task in the organisation, there is a maximum variety of possible organisational structures, as any individual can be allocated to any task (full job rotation). By contrast, when all individuals have unique and specialised knowledge, and each task can be carried out by only one individual, only one structure is possible; the entropy over possible organisational structures then equals log2(1) = 0 (no job rotation). The introduction of
departmental boundaries, implying that job rotation is restricted to take place within but not across departments, would imply that, depending on the size of departments, the entropy lies somewhere in between the minimum and maximum value.
References
Attaran, M. (1985) Industrial diversity and economic performance in U.S. areas, The Annals of Regional Science
Boltzmann, L. (1877) Ueber die Beziehung eines allgemeinen mechanischen Satzes zum zweiten Hauptsatze der Wärmetheorie, Sitzungsberichte der Akademie der Wissenschaften, Wien
Curry, B., George, K.D. (1983) Industrial concentration: a survey, Journal of Industrial Economics 31, pp. 203-255
Finkelstein, M.O., Friedberg, R.M. (1967) The application of an entropy theory of concentration to the Clayton Act, The Yale Law Journal 76, pp. 677-717
Fisher, R.A. (1930) The Genetical Theory of Natural Selection (Oxford: Clarendon
Press)
Frenken, K., Nuvolari, A. (2003) The early development of the steam engine: an evolutionary interpretation using complexity theory
Frenken, K., Saviotti, P.P., Trommetter, M. (1999) Variety and niche creation in aircraft, helicopters, motorcycles and microcomputers, Research Policy 28, pp. 469-488
Horowitz, A., Horowitz, I. (1968) Entropy, Markov processes and competition in the brewing industry, Journal of Industrial Economics 16, pp. 196-211
Hoskisson, R.E., Hitt, M.A., Johnson, R.A., Moesel, D.D. (1993) Construct validity of an objective (entropy) categorical measure of diversification strategy, Strategic Management Journal 14, pp. 215-235
Izraeli, O., Murphy, K.J. (2003) The effect of industrial diversity on state
unemployment rate and per capita income, The Annals of Regional Science 37, pp.
1-14
Jacquemin, A.P., Berry, C.H. (1979) Entropy measure of diversification and corporate growth, Journal of Industrial Economics 27, pp. 359-369
Jacquemin, A.P., Kumps, A.-M. (1971) Changes in the size structure of the largest
European firms: an entropy measure, Journal of Industrial Economics 20(1), pp. 59-
70
R&D efforts’, pp. 146-167 in: Sigurdson, J. (ed.) Measuring the Dynamics of
Palepu, K. (1985) Diversification strategy, profit performance and the entropy measure, Strategic Management Journal 6, pp. 239-255
Prigogine, I., Stengers, I. (1984) Order out of Chaos (New York: Bantam)
Theil, H. (1967) Economics and Information Theory (Amsterdam: North-Holland)
Theil, H. (1972) Statistical Decomposition Analysis (Amsterdam: North-Holland)