Claude Shannon, one of the greatest scientists of the 20th century, was a key
figure in the development of information science. He is the creator of modern information
theory and an early and important contributor to the theory of computing.
1. SECRECY SYSTEMS
As a first step in the mathematical analysis of cryptography, it is necessary to
idealize the situation suitably, and to define in a mathematically acceptable way what we
shall mean by a secrecy system. A “schematic” diagram of a general secrecy system is
shown in Fig. 1. At the transmitting end there are two information sources—a message
source and a key source. The key source produces a particular key from among those
which are possible in the system. This key is transmitted by some means, supposedly not
interceptible, for example by messenger, to the receiving end. The message source
produces a message (the “clear”) which is enciphered and the resulting cryptogram sent
to the receiving end by a possibly interceptible means, for example radio. At the
receiving end the cryptogram and key are combined in the decipherer to recover the
message.
E = f(M,K),
that is, E is a function of M and K. It is preferable to think of this, however, not as a
function of two variables but as a (one-parameter) family of operations or
transformations, and to write it
E = TiM.
The transformation Ti applied to message M produces cryptogram E. The index i
corresponds to the particular key being used.
We will assume, in general, that there are only a finite number of possible keys, and that
each has an associated probability pi. Thus the key source is represented by a statistical
process or device which chooses one from the set of transformations T1 ,T2, … , Tm with
the respective probabilities p1, p2, … , pm. Similarly we will generally assume a finite
number of possible messages M1, M2, … , Mn with associated a priori probabilities
q1, q2, … , qn.
The possible messages, for example, might be the possible sequences of English letters
all of length N, and the associated probabilities are then the relative frequencies of
occurrence of these sequences in normal English text.
At the receiving end it must be possible to recover M, knowing E and K. Thus the
transformations Ti in the family must have unique inverses Ti⁻¹ such that TiTi⁻¹ = I, the
identity transformation. Thus:
M = Ti⁻¹E.
At any rate this inverse must exist uniquely for every E which can be obtained
from an M with key i. Hence we arrive at the definition: A secrecy system is a family of
uniquely reversible transformations Ti of a set of possible messages into a set of
cryptograms, the transformation Ti having an associated probability pi. Conversely any
set of entities of this type will be called a “secrecy system”. The set of possible messages
will be called, for convenience, the “message space” and the set of possible cryptograms
the “cryptogram space”.
Two secrecy systems will be the same if they consist of the same set of
transformations Ti, with the same message and cryptogram spaces (domain and range)
and the same probabilities for the keys.
A secrecy system can be visualized mechanically as a machine with one or more
controls on it. A sequence of letters, the message, is fed into the input of the machine and
a second series emerges at the output. The particular setting of the controls corresponds
to the particular key being used. Some statistical method must be prescribed for choosing
the key from all the possible ones.
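This mechanical picture can be sketched in code. The following Python fragment is an illustrative toy, not part of the original text: the shift-based transformations and the key probabilities are invented for the example. It models a secrecy system as a finite family of uniquely reversible transformations Ti, each with an associated probability pi, plus a statistical key source.

```python
import random

# A toy secrecy system: messages are integers 0..25 and each Ti is a
# Caesar-style shift (a hypothetical illustrative choice of transformations).
KEYS = {1: 3, 2: 7, 3: 12}               # key index i -> shift amount
KEY_PROBS = {1: 0.5, 2: 0.25, 3: 0.25}   # pi for the key source

def encipher(i, m):
    """Apply transformation Ti to message m, producing cryptogram E."""
    return (m + KEYS[i]) % 26

def decipher(i, e):
    """Apply the unique inverse Ti^-1, recovering M from E and the key."""
    return (e - KEYS[i]) % 26

# The key source: a statistical device choosing one Ti with probability pi.
rng = random.Random(0)
i = rng.choices(list(KEY_PROBS), weights=KEY_PROBS.values())[0]
m = 13
e = encipher(i, m)
assert decipher(i, e) == m  # TiTi^-1 = I: the message is recovered
```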
REPRESENTATION OF SYSTEMS
A more common way of describing a system is by stating the operation one
performs on the message for an arbitrary key to obtain the cryptogram. Similarly, one
defines implicitly the probabilities for various keys by describing how a key is chosen or
what we know of the enemy’s habits of key choice. The probabilities for messages are
implicitly determined by stating our a priori knowledge of the enemy’s language habits,
the tactical situation (which will influence the probable content of the message) and any
special information we may have regarding the cryptogram.
As a simple example, consider a transposition of fixed period d. The message is divided
into groups of length d and a permutation applied to the first group, the same
permutation to the second group, etc. The permutation is the key and
can be represented by a permutation of the first d integers. Thus for d = 5, we might
have 2 3 1 5 4 as the permutation. This means that:
m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 …
becomes
m2 m3 m1 m5 m4 m7 m8 m6 m10 m9 … .
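The transposition above is easy to reproduce in code. A minimal Python sketch (assuming, for simplicity, that the message length is a multiple of d):

```python
def transpose(msg, perm):
    """Apply the same permutation to each successive group of d symbols.

    perm is a permutation of the first d integers; output position j of each
    group receives the symbol at position perm[j] (1-indexed, as in the text).
    """
    d = len(perm)
    assert len(msg) % d == 0
    out = []
    for g in range(0, len(msg), d):
        group = msg[g:g + d]
        out.extend(group[p - 1] for p in perm)
    return out

m = [f"m{i}" for i in range(1, 11)]
transpose(m, [2, 3, 1, 5, 4])
# → ['m2','m3','m1','m5','m4','m7','m8','m6','m10','m9']
```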
Sequential application of two or more transpositions will be called compound
transposition. If the periods are d1, d2, … ,dn it is clear that the result is a transposition
of period d, where d is the least common multiple of d1, d2, … ,dn.
In the Vigenère cipher the key consists of a series of d letters. These are written
repeatedly below the message and the two added modulo 26 (considering the alphabet
numbered from A = 0 to Z = 25). Thus
ei = mi + ki (mod 26)
where ki is of period d in the index i. For example, with the key G A H, we obtain
message NOWISTHE
repeated key G A H G A H G A
cryptogram TODOSANE
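A minimal Python sketch of the rule ei = mi + ki (mod 26), reproducing the example above (assuming uppercase A–Z input):

```python
A = ord('A')

def vigenere(message, key):
    """Encipher by ei = mi + ki (mod 26), the key repeated below the message."""
    return "".join(
        chr((ord(m) - A + ord(k) - A) % 26 + A)
        for m, k in zip(message, (key * len(message))[:len(message)])
    )

vigenere("NOWISTHE", "GAH")  # → "TODOSANE"
```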
The Vigenère of period 1 is called the Caesar cipher. It is a simple substitution in which
each letter of M is advanced a fixed amount in the alphabet. This amount is the key,
which may be any number from 0 to 25. The so-called Beaufort and Variant Beaufort
are similar to the Vigenère, and encipher by the equations
ei = ki − mi (mod 26)
ei = mi − ki (mod 26)
respectively. The Beaufort of period one is called the reversed Caesar cipher. The
application of two or more Vigenères in sequence will be called the compound Vigenère.
It has the equation
ei = mi + ki + li + … + si (mod 26)
where ki, li, … ,si in general have different periods. The period of their sum,
ki + li + … + si
as in compound transposition, is the least common multiple of the individual periods.
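The reduction of a compound Vigenère to a single key of period lcm(d1, …, dn) can be sketched in Python (math.lcm requires Python 3.9+; the numeric keys below are illustrative):

```python
from math import lcm

def compound_key(*keys):
    """Combine component keys ki, li, ..., si (lists of shifts mod 26) into a
    single Vigenère key whose period is the lcm of the individual periods."""
    d = lcm(*(len(k) for k in keys))
    return [sum(k[i % len(k)] for k in keys) % 26 for i in range(d)]

compound_key([1, 2], [3, 4, 5])  # period lcm(2, 3) = 6 → [4, 6, 6, 5, 5, 7]
```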
Rather than substitute for letters one can substitute for digrams, trigrams, etc.
General digram substitution requires a key consisting of a permutation of the 26² = 676
digrams. It can be represented by a table in which the row corresponds to the first letter of
the digram and the column to the second letter, entries in the table being the substitutions
(usually also digrams).
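A digram substitution table can be sketched in Python; the random permutation below is an illustrative key choice for the example, not a recommended one:

```python
import itertools
import random

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
digrams = ["".join(p) for p in itertools.product(LETTERS, repeat=2)]  # 26^2 = 676

# The key: a permutation of the 676 digrams, viewed as a table whose row is
# the first letter of the digram and whose column is the second.
rng = random.Random(42)
images = digrams[:]
rng.shuffle(images)
table = dict(zip(digrams, images))
inverse = {v: k for k, v in table.items()}

def encipher(msg):
    """Substitute each digram of msg (even length assumed) via the table."""
    return "".join(table[msg[i:i + 2]] for i in range(0, len(msg), 2))

def decipher(ct):
    return "".join(inverse[ct[i:i + 2]] for i in range(0, len(ct), 2))

assert decipher(encipher("NOWISTHE")) == "NOWISTHE"
```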
Amount of Secrecy
There are some systems that are perfect—the enemy is no better off after
intercepting any amount of material than before. Other systems, although giving him
some information, do not yield a unique “solution” to intercepted cryptograms. Among
the uniquely solvable systems, there are wide variations in the amount of labor required
to effect this solution and in the amount of material that must be intercepted to make the
solution unique.
Size of Key
Propagation of Errors
Expansion of Message
In some types of secrecy systems the size of the message is increased by the
enciphering process. This undesirable effect may be seen in systems where one attempts
to swamp out message statistics by the addition of many nulls, or where multiple
substitutes are used. It also occurs in many “concealment” types of systems (which are
not usually secrecy systems in the sense of our definition).
PERFECT SECRECY
Let us suppose the possible messages are finite in number M1, … ,Mn and have
a priori probabilities P(M1), … ,P(Mn), and that these are enciphered into the possible
cryptograms E1, … ,Em by
E = TiM.
The cryptanalyst intercepts a particular E and can then calculate, in principle at
least, the a posteriori probabilities for the various messages, PE(M). It is natural to define
perfect secrecy by the condition that, for all E the a posteriori probabilities are equal to
the a priori probabilities independently of the values of these. In this case, intercepting
the message has given the cryptanalyst no information. Any action of his which depends
on the information contained in the cryptogram cannot be altered, for all of his
probabilities as to what the cryptogram contains remain unchanged. On the other hand, if
the condition is not satisfied there will exist situations in which the enemy has certain a
priori probabilities, and certain key and message choices may occur for which the
enemy’s probabilities do change. This in turn may affect his actions and thus perfect
secrecy has not been obtained. Hence the definition given is necessarily required by our
intuitive ideas of what perfect secrecy should mean.
A necessary and sufficient condition for perfect secrecy can be found as follows: We have
by Bayes' theorem
PE(M) = P(M) PM(E) / P(E)
in which:
P(M) = a priori probability of message M.
PM(E) = conditional probability of cryptogram E if message M is chosen, i.e., the sum
of the probabilities of all keys which produce cryptogram E from message M.
P(E) = probability of obtaining cryptogram E from any cause.
PE(M) = a posteriori probability of message M if cryptogram E is intercepted.
For perfect secrecy PE(M) must equal P(M) for all E and all M. Hence either P(M) = 0,
a solution that must be excluded since we demand the equality independent of the values
of P(M), or
PM(E) = P(E).
Stated another way, the total probability of all keys that transform Mi into a given
cryptogram E is equal to that of all keys transforming Mj into the same E, for all Mi, Mj,
and E.
Now there must be as many E's as there are M's since, for a fixed i, Ti gives a
one-to-one correspondence between all the M's and some of the E's. For perfect secrecy
PM(E) = P(E) ≠ 0 for any of these E's and any M. Hence there is at least one key
transforming any M into any of these E's. But all the keys from a fixed M to different E's
must be different, and therefore the number of different keys is at least as great as the
number of M's. It is possible to obtain perfect secrecy with only this number of keys, as
one shows by the following example: Let the Mi be numbered 1 to n and the Ei the same,
and using n keys let
TiMj = Es
where s = i + j (mod n). In this case we see that PE(M) = 1/n = P(E),
and we have perfect secrecy. An example is shown in Fig. 3 with s = i + j − 1 (mod 5).
Fig. 3. Perfect system
Perfect systems in which the number of cryptograms, the number of messages, and the
number of keys are all equal are characterized by the properties that (1) each M is
connected to each E by exactly one line, (2) all keys are equally likely. Thus the matrix
representation of the system is a “Latin square”.
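The Latin-square property of this perfect system can be checked directly. A small Python sketch with n = 5 and 0-indexed messages, keys, and cryptograms (an indexing convention chosen for the sketch, so s = i + j (mod n)):

```python
n = 5

def E(i, j):
    """Cryptogram index s produced by key i applied to message j (0-indexed)."""
    return (i + j) % n

# Each row (fixed key) and each column (fixed message) contains every
# cryptogram exactly once: the key/message matrix is a Latin square.
assert all({E(i, j) for j in range(n)} == set(range(n)) for i in range(n))
assert all({E(i, j) for i in range(n)} == set(range(n)) for j in range(n))

# With equiprobable keys, each message reaches a given cryptogram under
# exactly one key, so PM(E) = 1/n = P(E): perfect secrecy.
for e in range(n):
    for j in range(n):
        assert len([i for i in range(n) if E(i, j) == e]) == 1
```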
In MTC (Shannon's A Mathematical Theory of Communication) it was shown that information may be conveniently measured by
means of entropy. If we have a set of possibilities with probabilities p1, p2, … ,pn, the
entropy H is given by:
H = −∑ pi log pi.
In a secrecy system there are two statistical choices involved, that of the message and of
the key. We may measure the amount of information produced when a message is chosen
by H(M):
H(M) = −∑ P(M) log P(M),
the summation being over all possible messages. Similarly, there is an uncertainty
associated with the choice of key given by:
H(K) = −∑ P(K) log P(K).
In perfect systems of the type described above, the amount of information in the
message is at most log n (occurring when all messages are equiprobable). This
information can be concealed completely only if the key uncertainty is at least log n. This
is the first example of a general principle which will appear frequently: that there is a
limit to what we can obtain with a given uncertainty in key—the amount of uncertainty
we can introduce into the solution cannot be greater than the key uncertainty.
The situation is somewhat more complicated if the number of messages is infinite.
Suppose, for example, that they are generated as infinite sequences of letters by a suitable
Markoff process. It is clear that no finite key will give perfect secrecy. We suppose, then,
that the key source generates key in the same manner, that is, as an infinite sequence of
symbols. Suppose further that only a certain length of key LK is needed to encipher and
decipher a length LM of message. Let the logarithm of the number of letters in the
message alphabet be RM and that for the key alphabet be RK. Then, from the finite case, it
is evident that perfect secrecy requires
RMLM ≤ RKLK.
2. ENTROPY
The Shannon entropy or information entropy is a measure of the uncertainty
associated with a random variable. It quantifies the information contained in a message,
usually in bits or bits/symbol. It is the minimum message length necessary to
communicate information.
This also represents an absolute limit on the best possible lossless compression of
any communication: treating a message as a series of symbols, the shortest possible
representation to transmit the message is the Shannon entropy in bits/symbol multiplied
by the number of symbols in the original message.
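This compression bound is easy to compute for a concrete message. A minimal Python sketch estimating the per-symbol entropy from symbol frequencies (a zeroth-order model that ignores inter-symbol statistics):

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(message):
    """Empirical Shannon entropy of the symbol frequencies, in bits/symbol."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * log2(c / total) for c in counts.values())

msg = "ABABABAB"                    # two equiprobable symbols
h = entropy_bits_per_symbol(msg)    # 1.0 bit/symbol
bound = h * len(msg)                # lower bound, in bits, for any lossless code
```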
Definition: The information entropy of a discrete random variable X, that can take on
possible values {x1, …, xn}, is
H(X) = −∑ p(xi) log p(xi)
where p(xi) = Pr(X = xi) is the probability that X takes the value xi.
Characterization
Continuity
The measure should be continuous — i.e., changing the value of one of the
probabilities by a very small amount should only change the entropy by a small amount.
Symmetry
The measure should be unchanged if the outcomes are reordered.
Maximum
The measure should be maximal if all the outcomes are equally likely (uncertainty
is highest when all possible events are equiprobable).
For equiprobable events the entropy should increase with the number of
outcomes.
Additivity
The amount of entropy should be independent of how the process is regarded as
being divided into parts.
It can be shown that any definition of entropy satisfying these assumptions has the form
H(X) = −∑ p(xi) logb p(xi)     (1)
where p(xi) is the probability mass function of outcome xi, and b is the base of the
logarithm used. Common values of b are 2, e, and 10. The unit of the information entropy
is the bit for b = 2, the nat for b = e, and the dit (or digit) for b = 10.
To understand the meaning of Eq. (1), let's first consider a set of n possible outcomes
(events) x1, …, xn, each with equal probability p = 1/n. An example
would be a fair die with n = 6 values, from 1 to 6. The uncertainty for such a set of
outcomes is defined by
u = log n.     (2)
If two such independent choices are made together, the numbers of outcomes multiply
and the uncertainties add:
log(n m) = log n + log m.     (3)
Thus the uncertainty of playing with two dice is obtained by adding the uncertainty of the
second die, log m, to the uncertainty of the first die, log n.
Now return to the case of playing with one die only (the first one); since the probability
of each event is 1/n, we can write
u = log n = −log(1/n) = −log p.
In the case of a non-uniform probability mass function (or distribution in the case of a
continuous random variable), we let
ui = −log p(xi),     (4)
which is also called a surprisal; the lower the probability p(xi), i.e. p(xi) → 0, the
higher the uncertainty or the surprise, i.e. ui → ∞, for the outcome xi. The average of
the surprisals, weighted by the probabilities,
H(X) = ∑ p(xi) ui = −∑ p(xi) log p(xi),     (5)
is used as the definition of the information entropy in Eq. (1). The above also
explains why information entropy and information uncertainty can be used
interchangeably.
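The surprisal and its probability-weighted average can be computed directly; a minimal Python sketch using base-2 logarithms (the example probabilities are illustrative):

```python
from math import log2

def surprisal(p):
    """u = -log2(p): the uncertainty of an outcome with probability p."""
    return -log2(p)

probs = [0.5, 0.25, 0.25]
# Expected surprisal = information entropy (in bits).
H = sum(p * surprisal(p) for p in probs)
```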
Example
The entropy of a system is important because it tells us how much we can hope to
compress streams of messages in the system. In principle, a perfect compression
technique would let us encode the stream at the entropy rate. In practice, we will
usually not achieve better than about 99% of this efficiency.
Shannon calculated that English text has an entropy of about 2.3 bits per
character. Modern analysis has suggested that actually it is closer to 1.1-1.6 bits per
character, depending on the kind of text.
Further properties
• Adding or removing an event with probability zero does not contribute to the
entropy:
H(p1, …, pn, 0) = H(p1, …, pn).
Theorem: Suppose X is a random variable having probability distribution p1, p2, …, pn,
where pi > 0, 1 ≤ i ≤ n. Then H(X) ≤ log2 n, with equality if and only if pi = 1/n, 1 ≤ i ≤ n.
PROOF: By Jensen's inequality applied to the concave function log2, we have

H(X) = −∑ pi log2 pi = ∑ pi log2 (1/pi) ≤ log2 ∑ pi (1/pi) = log2 n,

with equality if and only if 1/pi is the same for all i, that is, pi = 1/n.
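The theorem can be checked numerically; a small Python sketch comparing a uniform and a skewed distribution over n = 4 outcomes (example probabilities chosen for illustration):

```python
from math import log2

def H(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

n = 4
uniform = [1 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]
assert abs(H(uniform) - log2(n)) < 1e-12   # equality at pi = 1/n
assert H(skewed) < log2(n)                  # strict inequality otherwise
```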
3. Product Cryptosystems
Another innovation introduced by Shannon in his 1949 paper was the idea of combining
cryptosystems by forming their “product.” This idea has been of fundamental importance
in the design of present-day cryptosystems such as the Data Encryption Standard, which
we study in the next chapter.
For simplicity, we will confine our attention in this section to cryptosystems in which
C = P; cryptosystems of this type are called endomorphic. Suppose
S1 = (P, P, K1, E1, D1) and S2 = (P, P, K2, E2, D2) are two endomorphic cryptosystems
which have the same plaintext (and ciphertext) spaces. Then the product of S1 and S2,
denoted by S1 × S2, is defined to be the cryptosystem
(P, P, K1 × K2, E, D).
A key of the product cryptosystem has the form K = (K1, K2), where K1 ∈ K1 and
K2 ∈ K2. The encryption and decryption rules of the product cryptosystem are defined as
follows: For each K = (K1, K2), we have an encryption rule defined by the formula
e(K1,K2)(x) = eK2(eK1(x)),
and a decryption rule defined by
d(K1,K2)(y) = dK1(dK2(y)).
That is, we first encrypt x with eK1, and then "re-encrypt" the resulting ciphertext with
eK2. Decrypting is similar, but it must be done in the reverse order.
Recall also that cryptosystems have probability distributions associated with their
keyspaces. Thus we need to define the probability distribution for the keyspace K of the
product cryptosystem. We do this in a very natural way:
pK(K1, K2) = pK1(K1) × pK2(K2).
In other words, choose K1 using the distribution pK1, and then independently choose K2
using the distribution pK2.
Suppose M is the Multiplicative Cipher (with keys chosen equiprobably) and S is the
Shift Cipher (with keys chosen equiprobably). Then it is very easy to see that M × S is
nothing more than the Affine Cipher (again, with keys chosen equiprobably). It is
slightly more difficult to show that S × M is also the Affine Cipher with equiprobable
keys.
Let's prove these assertions. A key in the Shift Cipher is an element K ∈ Z26, and the
corresponding encryption rule is eK(x) = x + K mod 26. A key in the Multiplicative
Cipher is an element a ∈ Z26 such that gcd(a, 26) = 1; the corresponding encryption rule
is ea(x) = ax mod 26. Hence, a key in the product cipher M × S has the form (a, K), where
e(a,K)(x) = ax + K mod 26.
But this is precisely the definition of a key in the Affine Cipher. Further, the
probability of a key in the Affine Cipher is 1/312 = 1/12 × 1/26, which is the product of
the probabilities of the keys a and K, respectively. Thus M × S is the Affine Cipher.
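The identity between M × S and the Affine Cipher can be verified exhaustively over Z26; a short Python sketch:

```python
from math import gcd

def e_mult(a, x):      return (a * x) % 26          # Multiplicative Cipher
def e_shift(K, x):     return (x + K) % 26          # Shift Cipher
def e_affine(a, K, x): return (a * x + K) % 26      # Affine Cipher

# Keys a of the Multiplicative Cipher: units modulo 26.
units = [a for a in range(26) if gcd(a, 26) == 1]
assert len(units) == 12   # hence 12 x 26 = 312 affine keys

# M x S: first multiply by a, then shift by K -- exactly the affine rule.
for a in units:
    for K in range(26):
        for x in range(26):
            assert e_shift(K, e_mult(a, x)) == e_affine(a, K, x)
```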
Now let's consider S × M. A key in this cipher has the form (K, a), where
e(K,a)(x) = a(x + K) = ax + aK mod 26.
Thus the key (K, a) of the product cipher S × M is identical to the key (a, aK) of the
Affine Cipher. It remains to show that each key of the Affine Cipher arises with the
same probability 1/312 in the product cipher S × M. Observe that aK = K1 if and only if
K = a⁻¹K1 (recall that gcd(a, 26) = 1, so a has a multiplicative inverse). In other words, the
key (a, K1) of the Affine Cipher is equivalent to the key (a⁻¹K1, a) of the product cipher
S × M. We thus have a bijection between the two key spaces. Since each key is
equiprobable, we conclude that S × M is indeed the Affine Cipher.
We have shown that M × S = S × M. Thus we would say that the two
cryptosystems commute. But not all pairs of cryptosystems commute; it is easy to find
counterexamples. On the other hand, the product operation is always associative: (S1 ×
S2) × S3 = S1 × (S2 × S3).
If we take the product of an (endomorphic) cryptosystem S with itself, we obtain
the cryptosystem S × S, which we denote by S2. If we take the n-fold product, the
resulting cryptosystem is denoted by Sn. We call Sn an iterated cryptosystem.
A cryptosystem S is defined to be idempotent if S2 = S. Many of the
cryptosystems we studied in Chapter 1 are idempotent. For example, the Shift,
Substitution, Affine, Hill, Vigenère and Permutation Ciphers are all idempotent. Of
course, if a cryptosystem S is idempotent, then there is no point in using the product
system S2, as it requires an extra key but provides no more security.
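Idempotence is easy to see for the Shift Cipher: composing two shifts is again a shift, so S² offers the same transformations as S. A short Python check:

```python
def e_shift(K, x):
    """Shift Cipher encryption rule over Z26."""
    return (x + K) % 26

# Composing shifts K1 then K2 equals a single shift with key K1 + K2 (mod 26),
# so the product S x S is just S with a different key: S^2 = S.
for K1 in range(26):
    for K2 in range(26):
        for x in range(26):
            assert e_shift(K2, e_shift(K1, x)) == e_shift((K1 + K2) % 26, x)
```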
If a cryptosystem is not idempotent, then there is a potential increase in security
by iterating several times. This idea is used in the Data Encryption Standard, which
consists of 16 iterations. But, of course, this approach requires a non-idempotent
cryptosystem to start with. One way in which simple non-idempotent cryptosystems can
sometimes be constructed is to take the product of two different (simple) cryptosystems.