
Crosswords and Information Theory

Peter Andreasen
December 17, 2000

Abstract

A gentle introduction to the wonders of the information theoretical concept of entropy through an elementary calculation of the number of crosswords.


What is a crossword, really?

Most people have solved a crossword puzzle or played Scrabble. The existence of word-games like those is not to be taken for granted, though. As we are going to see, the existence of crosswords is entirely at the mercy of the underlying language. In fact, there is a connection between the information theoretic concept "entropy" and the possibility of creating crossword puzzles!

Before we proceed, we need to get a few definitions straight. First, what is a crossword? Let us take a look at one:

    [Figure: a small crossword grid whose rows and columns spell words such as 'game', 'era', 'met' and 'he', separated by white squares.]

What we see are rows and columns of words (single letters are accepted as words) separated by white squares.[1] Now, the words in a crossword need not be English as in the example above. We might want to create a Danish crossword, or we might even want to have the columns and rows be quotes from Shakespeare's sonnets. To be able to handle such complex rules for the creation of crosswords we make the following definition.

Definition 1. A language L is a set of sequences of letters from an alphabet A (say, the letters 'a' through 'z' and the white-square symbol '␣'). A crossword of size n is a matrix with the dimensions n × n where all of the rows are sequences (of length n) from L and all of the columns are sequences (of length n) from L.

So if we want to make a really sophisticated crossword, we may let L be all the possible quotes from Shakespeare. In that case we should use an alphabet which includes the letters as well as the space and the various punctuation symbols. If we wanted to make a 'classic' crossword, we would have L be equal to any sequence you can make by taking words from a dictionary and gluing them together with one or more ␣'s in between. In this case the alphabet A would just be the letters and the special symbol ␣.

[1] It is in the white squares you will normally find the hints needed to solve the puzzle, and in most crosswords the topmost row and the leftmost column are filled with these hints. For simplicity we will make no assumptions about the placement of the white squares.

How many are there?




It is obvious that very small crosswords are easily constructed, especially crosswords which have only one row or one column. It is also easy to create a few very big but very dull crosswords: if you keep alternating the rows between 'I␣I␣I␣...' and '␣I␣I␣...', you certainly get a valid (and as big as you like) 'classic English' crossword. So we want to not only consider the existence of big crosswords, but also check whether there are many different ones. We are going to calculate the number of big crosswords now.

Assume we have chosen an alphabet A and a language L over which the crosswords must be made. We use the notation |A| for the number of letters and symbols in the alphabet. Let us introduce the following number as well:

    Γ_L(n) = the number of sequences from L of length n.

So for constructing a square crossword of size n × n over the language L, there are Γ_L(n) possible choices for the first row. We will now use a small trick and for a moment employ a bit of probability theory: if we picked an absolutely random sequence of n letters from A, what is the chance that we got a 'valid' row, that is, a sequence from L? The answer is

    Γ_L(n) / |A|^n,

because there are Γ_L(n) valid sequences and |A|^n possible sequences. An example: suppose we wanted to create a normal English crossword. In my dictionary, there is 1 word (namely 'I') of length 1 and there are 49 words of length 2. Valid sequences of length 2 are '␣␣', '␣I', 'I␣', and then the 49 words of length 2. A total of 52 sequences, that is, Γ_L(2) = 52. The total number of possible sequences of length 2 is |A|^2 = 27 · 27 = 729 (note that even though we only have 26 letters, the size of A is 27, because we need the symbol ␣ as well). And thus the probability of getting a valid sequence of length 2 would be 52/729 ≈ 0.07, in this example.
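
These counts are easy to reproduce by brute force. The following sketch is a minimal illustration, not from the article: WORDS is a hypothetical ten-word dictionary, and '_' stands in for the white-square symbol ␣; a sequence belongs to the 'classic' language L if it splits into dictionary words separated by white squares.

    from itertools import product

    ALPHABET = "abcdefghijklmnopqrstuvwxyz_"   # 26 letters plus '_', so |A| = 27

    # Hypothetical toy dictionary; the article's counts (one 1-letter word,
    # 49 2-letter words) come from the author's own, much larger dictionary.
    WORDS = {"i", "am", "an", "as", "at", "be", "by", "do", "go", "he"}

    def is_valid(seq):
        """True if seq consists of dictionary words separated by '_'s."""
        return all(w in WORDS for w in seq.split("_") if w)

    def gamma(n):
        """Gamma_L(n): count the valid sequences of length n by enumeration."""
        return sum(is_valid("".join(s)) for s in product(ALPHABET, repeat=n))

    n = 2
    print(gamma(n), gamma(n) / len(ALPHABET) ** n)   # Gamma_L(2), P(valid row)

With this toy dictionary the program prints Gamma_L(2) = 12 (the all-blank '__', '_i', 'i_', and the nine two-letter words) rather than the article's 52, but the recipe is exactly the same: count the valid sequences, then divide by |A|^n.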

However, that was the probability of just one valid row. What about the rest? The probability of all n rows being valid equals the above probability multiplied with itself n times[2]:

    (Γ_L(n) / |A|^n)^n = Γ_L(n)^n / |A|^{n^2}.

Now for the columns the situation is identical. And because the columns are as high as the rows are wide, the result is the same: the probability of all n columns being valid (that is, from L) equals

    Γ_L(n)^n / |A|^{n^2}.

Now we may calculate the probability of a randomly selected matrix of n × n letters from A being in fact a crossword: we want both its rows and its columns to be valid, so we multiply:

    (Γ_L(n)^n / |A|^{n^2})^2 = Γ_L(n)^{2n} / |A|^{2n^2}.

We now return to our original question: how many (big) crosswords are there? Well, we know the probability of a randomly selected matrix of n × n letters being a crossword, and there are a total of |A|^{n·n} = |A|^{n^2} possible n × n matrices, so we may write[3]

    N_n = |A|^{n^2} · Γ_L(n)^{2n} / |A|^{2n^2} = Γ_L(n)^{2n} / |A|^{n^2}.

This makes N_n our symbol for the number of crosswords of size n × n.

[2] This is basic probability theory. It is comparable to when we say that the probability of a coin landing heads up equals 1/2 and then proceed to calculate the probability of two heads in a row as 1/2 · 1/2 = 1/4. We multiply the probabilities when we want the probability of both events.
[3] This is another application of basic probability theory: the number of valid crosswords is calculated as the probability of a random matrix being valid times the total number of possible matrices.
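
Strictly speaking, the multiplication in footnote [3] treats rows and columns as if they were independent, so N_n is best read as an estimate of the count. The sketch below (reusing product, ALPHABET, WORDS, is_valid and gamma from the previous sketch) compares the estimate against an exact enumeration of all 2 × 2 crosswords over the toy language.

    def exact_count(n):
        """Count valid n x n crosswords by checking every matrix (tiny n only)."""
        count = 0
        for cells in product(ALPHABET, repeat=n * n):
            rows = ["".join(cells[i * n:(i + 1) * n]) for i in range(n)]
            cols = ["".join(cells[i::n]) for i in range(n)]
            count += all(is_valid(s) for s in rows + cols)
        return count

    n = 2
    estimate = gamma(n) ** (2 * n) / len(ALPHABET) ** (n * n)   # N_n as above
    print(estimate, exact_count(n))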


Explosive numbers

To get to the core of the matter, we need to do a bit of mathematical wizardry, so now is the time to wear your pointed hat! First, we apply the logarithm[4] to N_n:

    log N_n = 2n log Γ_L(n) - n^2 log |A|
            = 2n^2 log |A| · (log_{|A|} Γ_L(n) / n - 1/2).

The special symbol log_{|A|} is simply the logarithm to the base |A|, that is, |A|^{log_{|A|} x} = x. Recall that we are interested in the number N_n when n grows large. In the expression above, the first factor,

    2n^2 log |A|,

just grows towards infinity as n does the same. The fraction in the second factor,

    α_n = log_{|A|} Γ_L(n) / n,

is more interesting (so we name it α_n). The value of Γ_L(n) must be between 0 and |A|^n (that should be clear from the definition of Γ_L(n)). So (assuming Γ_L(n) > 0) we see that log_{|A|} Γ_L(n) is between 0 and n. Thus, when n grows large, the value of α_n stays between 0 and 1. Let us assume that α_n in fact converges[5] to some number α between 0 and 1. We may now conclude that if α < 1/2, the value of log N_n goes towards negative infinity (-∞) as n becomes big. To see this very clearly, consider the formula from above (this time written using the symbol α_n, but otherwise identical):

    log N_n = 2n^2 log |A| · (α_n - 1/2).

Assuming α < 1/2, then α_n will also be less than 1/2 (provided n is big enough), and hence we find that the value of (α_n - 1/2) becomes negative, while the first factor, as mentioned above, grows towards infinity. If, on the other hand, α > 1/2, the value of log N_n approaches positive infinity (∞), provided we make n big enough.

Now take off your pointed hat (the heavy math is over for now!) and consider the implications for N_n. If log N_n grows without limit when n grows, then N_n is very big. If log N_n decreases to very large negative numbers when n grows, then N_n must be very close to zero[6]. Somehow the number α determines whether the number of valid crosswords (N_n) gets very large as n (the size of the crosswords) gets large, or whether the number of valid crosswords becomes almost zero!

It is rather clear that the value α depends on L and only on L. Thus we will write α_L and say that the language L has α-value α_L. For languages L made up from an English dictionary based on 26 letters plus ␣, one finds the value of α_L to be around 0.6; in other words, there is no imminent shortage of crosswords.

[4] Recall that taking the logarithm of a product yields a sum (log ab = log a + log b), the logarithm of a fraction yields a difference (log a/b = log a - log b), and the logarithm of a power turns into a product (log a^b = b log a).
[5] This is by no means a trivial assumption. It is, however, beyond the scope of this article to look into these details.
[6] This is all due to the behaviour of the logarithm function.
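
The convergence of α_n assumed in note [5] can at least be watched numerically. A minimal sketch, again reusing ALPHABET and gamma from the earlier toy language; with only ten words the values will sit far below the 0.6 quoted for a real English dictionary, but the computation is the same.

    import math

    for n in range(1, 5):
        alpha_n = math.log(gamma(n), len(ALPHABET)) / n   # log to base |A|, over n
        print(n, round(alpha_n, 3))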

Curiouser and curiouser!

As if the madness will see no end, we now approach 3-dimensional crosswords, no, let us make it d-dimensional crosswords! The size of the crosswords is now measured by n^d, where n is the generalized notion of height or width and d is the dimension (which was 2 before). That is, we consider cubic (for d = 3) or even hypercubic (for d > 3) crosswords. The probability of one dimension (think: row) of the crosswords being valid equals

    (Γ_L(n) / |A|^n)^{n^{d-1}} = Γ_L(n)^{n^{d-1}} / |A|^{n^d}.

This is almost the same result as before, but note the exponent n^{d-1}. In the case d = 3, where we might imagine the crossword as a cube made up of 'sticks' of sequences from L, the exponent corresponds to the fact that in each dimension there are n^2 sticks. The probability of all dimensions (think: rows and columns) being valid equals

    (Γ_L(n)^{n^{d-1}} / |A|^{n^d})^d = Γ_L(n)^{d n^{d-1}} / |A|^{d n^d}.

Again, this should come as no shock. The total number of possible crosswords (think: any matrix) is multiplied by the probability and we find:

    N_n^{(d)} = |A|^{n^d} · Γ_L(n)^{d n^{d-1}} / |A|^{d n^d} = Γ_L(n)^{d n^{d-1}} / |A|^{n^d (d-1)}.

Applying the logarithm yields

    log N_n^{(d)} = d n^{d-1} log Γ_L(n) - n^d (d-1) log |A|

and reorganizing the terms,

    log N_n^{(d)} = d n^d log |A| · (log_{|A|} Γ_L(n) / n - (d-1)/d).    (1)

The fraction log_{|A|} Γ_L(n) / n in the above expression is recognized from before. We recall that the number α is used to denote its limiting value as n becomes very big. We find that if, say, d = 3, the value of α must be at least 2/3 if we want to have many, big crosswords. As the dimension of the crosswords grows, the language L must have a larger and larger α-value to sustain the notion of many crosswords.

It seems like α_L expresses something fundamental about the language L. So information theorists have a name for that value:

Definition 2. Let L be a language. The entropy of L is defined as

    H̃(L) = lim_{n→∞} log_{|A|} Γ_L(n) / n.
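
Formula (1) and Definition 2 together give a one-line test: d-dimensional crosswords abound exactly when the entropy exceeds (d - 1)/d. A sketch of that test, approximating H̃(L) by α_n for the largest n we can afford (the toy language from before; the cut-off n = 4 is an arbitrary choice):

    import math

    def entropy_estimate(n):
        """Approximate H(L) by alpha_n = log_{|A|} Gamma_L(n) / n."""
        return math.log(gamma(n), len(ALPHABET)) / n

    h = entropy_estimate(4)
    for d in (2, 3, 4):
        print(d, "many crosswords" if h > (d - 1) / d else "almost none")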

We recognize the entropy as the same thing as we know as α_L. The little symbol above the H is there to remind us that this is a special kind of entropy: the theory leading up to this definition is not as concise and rigid as many information theorists would want. But we should not feel that nothing has been accomplished: our entropy captures some very deep aspects of the concept.

We may wonder what happens if, say, H̃(L) = 1/4. What kind of crosswords are possible? Why, crosswords of dimension up to d = 4/3, of course! For d = 4/3 we have (d - 1)/d = 1/4. How to visualize a crossword in 1.333 dimensions is probably better left as an exercise to the reader!

A recapitulation

Let us briefly examine what we have learnt so far: we introduced the concept of a language, which is nothing but a set of sequences of letters. We then made a satisfactory definition of what a crossword over a language is. Using elementary combinatorics and probability theory we calculated the number of valid crosswords of size n × n (or, in the case of other dimensions, size n^d). This number depends on the size of the alphabet, |A|, as well as on the special function Γ_L(n). We then observed that there are essentially two different cases: in the first case (α_L < (d-1)/d), the number of crosswords becomes very small as the size, n, grows. In the second case (α_L > (d-1)/d), the same number becomes infinitely big as the size grows.

This calls for a reformulation of our initial question: while we opened this paper asking about the existence of crosswords, we are now tempted to ask: "Given a language L, what is the greatest dimension d for which there are many (big) crosswords?" This move encourages us to consider non-integer values of d, and thus we have definitely left the realm of ordinary crossword puzzles, and maybe the real world as well! The answer to the new question is related to the entropy, as we have just seen. In fact, combining the definition of entropy and formula (1) shows

    H̃(L) = (d0 - 1) / d0   and   d0 = 1 / (1 - H̃(L)),

where d0 is exactly the largest dimension where it is possible to create (many) crosswords over L.
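
The second identity is just the first one solved for d0, and it is worth plugging numbers into. A two-line sketch, using the entropy of about 0.6 quoted above for dictionary English:

    def d0(h):
        """Largest crossword dimension supported by a language of entropy h < 1."""
        return 1 / (1 - h)

    print(d0(0.25), d0(0.5), d0(0.6))   # about 1.33, 2.0 and 2.5

So a language with entropy 0.6 supports ordinary 2-dimensional crosswords with room to spare, but not 3-dimensional ones, in agreement with the d = 3 threshold of 2/3 found earlier.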

Note how d0 may be arbitrarily big, even ∞ if the entropy equals 1. How is that for a crossword puzzle! Actually, if H̃(L) = 1 it is quite trivial to create crossword puzzles (in any dimension). An example of such a language L is the one made up of every integer. The alphabet A is just the digits, and Γ_L(n) = |A|^n = 10^n (because any sequence of length n which is made up from digits is a valid number), so clearly H̃(L) = 1.

We have arrived at the concept of entropy by a quite unusual method. Aside from (hopefully) some pedagogical advantages there are other reasons for picking this approach: we now have an entropy concept defined on any language or, which is the same, on any set of sequences made up of letters from A. This is not true for the traditional entropy, which is introduced by the concept of information sources (which are also known as stochastic processes, and are based on a quite technical probability theoretic framework). In addition, while our entropy, in its current form, does not handle random languages (e.g. the language of all possible sequences of 0's and 1's created by flipping a coin), it is possible to refine our definitions to cover these important cases as well (and indeed to generalize the probability theoretic entropy).


Entropy

Something should probably be said about why entropy is so central to information theory. But where to start and where to end! We will look at only two aspects, one of a somewhat philosophical nature and the other of a very practical nature.

Entropy is often said to be a measure of how 'complex' or even 'chaotic' things are. This corresponds nicely with the observations given above: a language made up of every possible integer is devoid of any form or structure. Anything goes. It is impossible to distinguish between a sequence from the language and a sequence of completely random digits. This language, as explained above, has entropy 1. On the other hand, a language made up of sequences of only one letter, say 'a', is completely structured. No room for choices. The function Γ_L(n) is constantly 1 regardless of the value of n. This corresponds to the case where H̃(L) = 0.

The other use of entropy which we will touch upon is data compression. We mention the following theorem in sketch form:

Theorem 1 (Shannon). Let L be a language over A. There exists an encoding such that any sequence x ∈ L of length n may be encoded into a sequence of no more than n(H̃(L) + ε) letters. This holds for any positive number ε, however small, provided the length n of x is large enough.

This formulation only captures the essence of Shannon's theorem. What is important is the order in which things happen: first we choose the value ε as small as we want it. This determines how "close" to the entropy we want our encoding to be. Then the theorem tells us that there exists a number N and a code so that any sequence x ∈ L which is at least N letters long can be encoded into just |x|(H̃(L) + ε) letters. So if the entropy of L is 1/2 we can compress long sequences from L by a factor of 2.
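
To make the factor-of-2 claim concrete, here is a toy language where an optimal code is easy to exhibit (our own example, not the article's): over A = {'a', 'b'}, let L consist of the sequences whose every even-indexed symbol is 'a'. Half of the positions carry no information, so H̃(L) = 1/2, and simply dropping the forced symbols is already an encoding of the promised length.

    def encode(x):
        """Keep only the free (odd-index) symbols; |encode(x)| is about |x|/2."""
        assert all(c == "a" for c in x[::2]), "not a sequence from L"
        return x[1::2]

    def decode(y, n):
        """Reinsert the forced 'a's to recover the original length-n sequence."""
        free = iter(y)
        return "".join("a" if i % 2 == 0 else next(free) for i in range(n))

    x = "aaabaaab"                           # a sequence from L of length 8
    assert decode(encode(x), len(x)) == x    # lossless, at half the length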

This concludes our tour. The connection between the complexity of a language and the ability to create crosswords may not come as a surprise. But that this connection leads directly to entropy, the cornerstone of information theory, is, at the very least, rather neat.

Notes

This section contains some notes about the history of the results. It is probably most interesting to readers already familiar with the concepts in this paper. The idea of linking crossword puzzles and entropy is, in fact, as old as [Shannon, 1948], from which we quote the last paragraph of section 7:

    The redundancy of a language is related to the existence of crossword
    puzzles. If the redundancy is zero any sequence of letters is a
    reasonable text in the language and any two-dimensional array of
    letters forms a crossword puzzle. If the redundancy is too high the
    language imposes too many constraints for large crossword puzzles to
    be possible. A more detailed analysis shows that if we assume the
    constraints imposed by the language are of a rather chaotic and random
    nature, large crossword puzzles are just possible when the redundancy
    is 50%. If the redundancy is 33%, three-dimensional crossword puzzles
    should be possible, etc.

For a more detailed discussion as well as a bit of history on the result see the last part of [Immink et al., 1998]. The entropy H̃ introduced in this paper is related to the Hartley entropy and Hausdorff dimensions of 'nice' subsets of A^∞ (considered as subsets of [0, 1]). Or, if one considers arbitrary subsets of A^∞, the entropy H̃ might be interpreted as a form of the box counting dimension, see e.g. [Falconer, 1990]. The connection between entropy and Hausdorff dimension is described in [Billingsley, 1965] and interesting results in this direction can be found in [Ryabko, 1986].


References

Billingsley, P. Ergodic Theory and Information. John Wiley & Sons, 1965.

Falconer, K. Fractal Geometry: Mathematical Foundations and Applications. John Wiley & Sons, 1990.

Immink, K.A.S., P.H. Siegel and J.K. Wolf. Codes for digital recorders. IEEE Trans. Inform. Theory, 44(6):2260-2299, 1998.

Ryabko, B.Y. Noiseless coding of combinatorial sources, Hausdorff dimension, and Kolmogorov complexity. Problems of Inform. Trans., 22(3):170-179, 1986.

Shannon, C.E. A mathematical theory of communication. Technical report, Bell System, 1948.