1 Introduction

In order to maintain secure communication between remote points, it is necessary to guarantee the integrity and confidentiality of both incoming and outgoing information. The communication cost is related to the volume of exchanged information; hence, information compression is essential. Besides that, one must also guarantee that sniffers are not able to decipher messages in transit.

To protect data against statistical analysis, Shannon [Shan49] suggested that the language redundancy should be reduced before encryption. We use the well-known Huffman codes to achieve this. Huffman codes have the optimal average number of bits per character among prefix-free codes. Besides that, Rivest et al. [Rive96] tried to cryptanalyse a file that had been Huffman coded (but not encrypted) and found it "surprisingly difficult".

First, let us introduce some concepts. Let P be the set of possible plaintexts and S = {s1, s2, …, sn} the plaintext alphabet of X, X ∈ P, such that X = x1x2…, where xi ∈ S. Let n be the number of symbols in S. If pi is the probability that si appears in the plaintext X, then the entropy of X, defined by H(X) = -Σi pi log pi, is the average number of bits needed to represent each symbol si ∈ S. Moreover, we say that H(X) leads to zero redundancy, that is, it is the exact number of bits necessary to represent S.

The encoding produced by Huffman's algorithm is prefix-free and satisfies [Stin95]:

H(X) ≤ l(Huffman) < H(X) + 1,

where l is the weighted average codeword length.

Here, we propose two cryptosystems based on Huffman coding.

Sometimes an alphabet provides multiple substitutions for a letter. Thus, a symbol xi of a plaintext X, instead of always being replaced by a codeword c, will be replaced by any codeword of a set (c1, c2, …). The alternates used in the multiple substitution are called homophones [Simm91].

Günther et al. [Gunt88] introduced a coding technique using homophones such that their encoding tables generate a stream of symbols which all have the same frequency. Then, Massey et al. [Mass89] proposed a scheme, based on Günther's homophonic substitution, that generates homophones by decomposing the probability of each symbol into a sum of negative powers of 2, generating new symbols.

Our first cipher is a multiple substitution procedure. We substitute each Huffman coded symbol by a string of "fake" codes followed by the symbol itself. It is a steganographic technique, that is, we disguise the symbol by mixing it with other fake ones. … (right child) shown in figure 1 is created for these symbols.
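As a sanity check on the entropy bound H(X) ≤ l(Huffman) < H(X) + 1 stated above, here is a small Python sketch (not from the paper; the symbol probabilities are made up for illustration) that builds a Huffman code with the standard greedy merge and compares the weighted average codeword length against the entropy:

```python
import heapq
import math

def huffman_lengths(probs):
    """Build a Huffman code for the given probabilities and
    return the codeword length assigned to each symbol."""
    # Each heap entry: (probability, unique id, symbol indices in this subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    uid = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # each merge adds one bit to every symbol below it
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, uid, s1 + s2))
        uid += 1
    return lengths

# Hypothetical symbol distribution (illustrative, not from the paper)
probs = [0.4, 0.2, 0.2, 0.1, 0.1]
H = -sum(p * math.log2(p) for p in probs)                       # entropy H(X)
l = sum(p * n for p, n in zip(probs, huffman_lengths(probs)))   # weighted avg length
assert H <= l < H + 1                                           # the bound from [Stin95]
```

For this distribution the code lengths come out as (2, 2, 2, 3, 3), giving l = 2.2 against H(X) ≈ 2.12, consistent with the bound.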
Figure 4 – Generation of homophones of η followed by the effective symbol

Finally, let β be the fake symbol generation rate, that is, the probability of generating a fake code at each round before outputting the effective code.

So, the coding procedure is the following pseudocode:

1. Choose a random number p, p ∈ [0, 1]
2. If p < β then
       eRs(xi) = δj
       return to 1
3. Else
       eRs(xi) = xi
       i is increased by 1
4. If i ≤ n then return to 1
5. End.

c. Diffusion

A well-known cipher attack is statistical analysis. In any language some characters are used more than others. Hence, an attack could be made by counting the frequencies of each symbol in a ciphertext and trying to assign each one to the character in the language whose frequency distribution matches the one found in the ciphertext.

With fake code generation, we achieve some diffusion in the frequency distribution of codes. Counting the frequencies of codes can then lead to wrong assignments to symbols.

An obvious disadvantage of this scheme is text expansion. But it has great benefits too: diffusion of frequency distributions and the lack of correlation between symbols in the language. Both features make statistical analysis more difficult.

d. Text Expansion

To estimate the text expansion of multiple substitution, we observe that:

(i) The average number of output ciphers generated is the mean of a geometric distribution, that is, 1/(1-β);

(ii) One additional bit per character is needed for the ID bit. Since we have H(X) ≤ l(Huffman) < H(X)+1, this leads to H(X) + 2.
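The coding pseudocode and the geometric-distribution claim in (i) can be checked with a short simulation. The sketch below is Python rather than the paper's pseudocode; the symbol and fake-code names are illustrative, and fakes are distinguished here by distinct names rather than by the ID bit of the real scheme:

```python
import random

def encode_with_fakes(symbols, fakes, beta, rng):
    """Multiple substitution: at each round, emit a random fake code with
    probability beta, otherwise emit the effective symbol and move on.
    The number of outputs per symbol is geometric with mean 1/(1-beta)."""
    out = []
    for x in symbols:
        while rng.random() < beta:   # step 2: fake code, return to step 1
            out.append(rng.choice(fakes))
        out.append(x)                # step 3: effective symbol
    return out

rng = random.Random(42)
plaintext = ['a', 'b', 'c'] * 10000
fakes = ['d1', 'd2', 'd3']           # the homophones δ1, δ2, δ3 (illustrative)
beta = 0.30                          # the β = 0.30 example used in the text
cipher = encode_with_fakes(plaintext, fakes, beta, rng)

expansion = len(cipher) / len(plaintext)
# By (i), the expansion should be close to 1/(1-0.30) ≈ 1.43
assert abs(expansion - 1 / (1 - beta)) < 0.05
# Dropping the fakes recovers the plaintext exactly
assert [s for s in cipher if s not in fakes] == plaintext
```

Over 30,000 symbols the measured expansion lands within a few percent of the 1/(1-β) prediction, matching the 43% figure derived below for β = 0.30.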
The parameter β sets the number of fake symbols generated between real symbol outputs. It is used to balance diffusion and text expansion.

The text expansion can be estimated as the average number of characters generated times the average number of bits. So, the average number of bits B per character is:

B ≤ [1/(1-β)] · (H(X) + 2)

For example, using β = 0.30 and assuming that we have H(X) = 4.19 for monogram parsing and an average HL = 1.25 for the entropy of the English language, we get:

1/(1-β) = 1/(1-0.30) = 1.43, that is, 43% of text expansion due to fake codes;

B1 ≤ 1.43 · (4.19 + 2) = 8.85
BL ≤ 1.43 · (1.25 + 2) = 4.65

So, we should still achieve compression using word parsing in our encrypted Huffman, compared to the standard ASCII representation (8 bits per character).

3.2 Stream Cipher

Suppose that someone has the encoded text, the Huffman code table and the number of fake codes defined by β. Then, decoding is immediate. Therefore, a secret key is necessary to add confusion to the process. The secret key we use is fully scalable: it can have any length we desire (48 bits, 128 bits, 256 bits, etc.).

So, we introduce a second procedure, a stream cipher.

A stream cipher is a cryptosystem (P, C, K, L, F, E, D) where we additionally define:

1. L as a finite set called the keystream alphabet;
2. F = (f1, f2, …) as the keystream generator. For i ≥ 1, fi: K × P^(i-1) → L.

a. Keystream

Let k ∈ K and X = x1x2…. The stream cipher procedure is defined as follows:

(i) Z = z1z2… is the keystream. We have a function fi that generates zi from k: zi = fi(k, x1, x2, …, xi-1);

(ii) zi is used to cipher xi such that yi = e_zi(xi).

We have a new key zi for each incoming xi, and this key zi is generated from the past z1, y1, z2, y2, …, zi-1, yi-1.

Our method consists of a single constant function such that zi = f(k, i).

With this secret key included in the process, we have an encryption system to use in insecure communication channels.

If |k| < |X|, that is, the size of the key is less than the size of the plaintext, we have two alternatives for generating the keystream:

Alternative 1: Cyclic keystream generation - when the key k runs out, we return to its beginning, and so on. The key bit is then defined by zi = f(k, i) = k_(i mod φ), where φ = |k|;

Alternative 2: Random keystream generator - the function f generates bit-streams using the key k as a seed.

We assume alternative 1, as illustrated in figure 5. This way, we maintain Huffman's coding synchronism properties. Alternative 2 is not used, since it causes dependency on the past.

Figure 5 – XOR between the i-th bit of the key k and the ID bit of the codeword

b. Coding

We define the coding and decoding procedures as XOR (exclusive-or, ⊕) operations between the bit zi and all the bits of the codeword. This is equivalent to exchanging the places of a symbol in the Huffman tree:

yi = ek(xi) = (zi ⊕ xi,1, zi ⊕ xi,2, …, zi ⊕ xi,|xi|)
xi = dk(yi) = (zi ⊕ yi,1, zi ⊕ yi,2, …, zi ⊕ yi,|yi|)

However, we actually use a simpler procedure, defined by a XOR between the bit zi and only the first bit of the codeword. This is equivalent to disguising only the ID bit:

yi = ek(xi) = (zi ⊕ xi,1, xi,2, …, xi,|xi|)
xi = dk(yi) = (zi ⊕ yi,1, yi,2, …, yi,|yi|)

Usually, one builds serial procedures to compress and then encrypt a file. In this work, we proposed simple modifications to Huffman codes that add encryption to their compression feature.
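The XOR-on-the-ID-bit scheme with a cyclic keystream (alternative 1) can be sketched in a few lines of Python. This is an illustration, not the paper's implementation: codewords are represented as bit lists, the key bits echo the figure 5 example, and the codewords themselves are made up.

```python
def cyclic_keystream(key_bits, n):
    """Alternative 1: z_i = k_(i mod |k|), cycling through the key."""
    return [key_bits[i % len(key_bits)] for i in range(n)]

def xor_first_bit(codewords, key_bits):
    """XOR only the first (ID) bit of each codeword with z_i.
    Since XOR is an involution, applying this twice restores the input."""
    z = cyclic_keystream(key_bits, len(codewords))
    return [[c[0] ^ zi] + c[1:] for c, zi in zip(codewords, z)]

# Illustrative codewords (bit lists); the key matches the figure 5 keystream
codewords = [[0, 1, 1, 0, 1], [1, 0], [0, 0, 1]]
key = [0, 0, 1, 0, 0, 1, 1, 0, 1, 0]

ciphertext = xor_first_bit(codewords, key)
# Decoding is the same XOR with the same keystream
assert xor_first_bit(ciphertext, key) == codewords
```

Because only the first bit of each codeword is touched, the ciphertext keeps the length and synchronism of the underlying Huffman stream, which is the property the text attributes to the cyclic alternative.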