MATTHEW HENNEKE
Contents
1. Introduction
2. Basic Terminology
3. The Vigenere Cipher
4. Kasiski's Method for Finding the Length of the Keyword
5. Friedman's Method for Finding the Length of the Keyword
6. Constructing the Keyword
7. Recovery of the Plain Text Message
8. Appendix A: Maple Code
References
1. Introduction
Written text has been a vital part of every civilization for thousands
of years. It has been used to record events, tell stories, and relay
information for both personal and public use. Over the years, though,
the need for text to be understandable to only a select few has been
of the utmost importance. For government and military officials, the
ability to send messages to colleagues without detection by the public
or the enemy was crucial to national security. Thus, as time passed,
individuals began to toy with the idea of hiding the true message
through intentional alterations and transformations that appeared
random to the general public. This is where the art of cryptology
emerged, and it still resides there today. There is evidence that
cryptology has existed since at least the first century B.C., when
Julius Caesar was developing his own cipher systems.
Through the years, cryptology has evolved to include many different
techniques and methods by which to encrypt written text. The process
of encryption starts with a piece of written text known as the plain
text (e.g., the lyrics of the Beatles' "The Long and Winding Road"),
which is composed from the plain alphabet. A rule of shifts,
substitutions, or other transformations is then applied to the plain
text. The alphabet that is used in conjunction with the rule is known
as the cipher alphabet, and the result of applying the rule or rules
is referred to as the cipher text. Applying the reverse of the rule or
rules deciphers the cipher text and reproduces the original plain
text. The rule by which text is transformed can be as simple as a
kid's decoder ring found as a prize in a cereal box or as complex as
modular arithmetic and other numerical techniques.
Under the general category of cryptology, there are several methods
that head more specific subcategories of methods. One such family is
the shift ciphers, in which a copy of the plain alphabet is shifted a
number of places to obtain the cipher alphabet. For instance, if the
cipher alphabet is a shift of five letters from the plain alphabet,
then the letter A is encrypted as the letter F, B as G, and so forth.
For example, the title, or plain text,
THE LONG AND WINDING ROAD
would be encrypted as the cipher text
YMJ QTSL FSI BNSINSL WTFI.
For the purpose of comparison, the plain text and cipher text are
aligned above each other in Table 1.
SOLVING THE VIGENERE CIPHER 3
plain: T H E L O N G A N D W I N D I N G R O A D
cipher: Y M J Q T S L F S I B N S I N S L W T F I

plain: T H E L O N G A N D W I N D I N G R O A D
cipher: U J H M Q Q H C Q E Y L O F L O I U P C G
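
The shift cipher described in the Introduction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names shift_encrypt and shift_decrypt are hypothetical, the code is not part of the Maple appendix, and characters outside A to Z are simply passed through.

```python
import string

ALPHABET = string.ascii_uppercase  # the plain alphabet A..Z


def shift_encrypt(plain: str, shift: int) -> str:
    """Move each letter A..Z forward by `shift` places, wrapping Z back to A."""
    out = []
    for ch in plain.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # spaces and punctuation are left untouched
    return "".join(out)


def shift_decrypt(cipher: str, shift: int) -> str:
    """Undo shift_encrypt by shifting in the opposite direction."""
    return shift_encrypt(cipher, -shift)


print(shift_encrypt("THE LONG AND WINDING ROAD", 5))
# YMJ QTSL FSI BNSINSL WTFI
```

With a shift of five the sketch reproduces the cipher text of Table 1, and shift_decrypt with the same shift recovers the title.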
2. Basic Terminology
In this section, the basic terminology of cryptology that is used
throughout this paper is explained. First, a message that is readable
to all people is known as a plain text. The alphabet that is used to
write the plain text is known simply as the plain alphabet. A cipher
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
key: G O D G O D G O D G O D G O D G
plain: B E S T I L L A N D K N O W T H
cipher: H S V Z W O R O Q J Y Q U K W N
key: O D G O D G O D G O D G O D G O
plain: A T I A M G O D I W I L L B E E
cipher: O W O O P M C G O K L R Z E K S
key: D G O D G O D G O D G O D G O D
plain: X A L T E D A M O N G T H E N A
cipher: A G Z W K R D S C Q M H K K B D
key: G O D G O D G O D G O D G O D G
plain: T I O N S I W I L L B E E X A L
cipher: Z W R T G L C W O R P H K L D R
key: O D G O D G O D G O D G O
plain: T E D I N T H E E A R T H
cipher: H H J W Q Z V H K O U Z V
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
letter in the row of the first keyword letter and look up to the A row
for the plain text letter. Then, repeat the process for each cipher text
letter until the plain text has been recovered. For example, the first
cipher text letter in Psalm 46:10 is H. First, look in the G row and find
the letter H and look up to find the corresponding plain text letter B.
Repeat for the cipher text letter S to find E. Continue the process for
the remaining 75 letters to unveil the plain text message.
Second, the idea of addition modulo 26 as presented above can be
used to decrypt cipher text without the use of a table or square.
Again, the notation mentioned above for modular addition will be
utilized. Equation 1 may be modified to produce the equivalent
numerical value of the plain text:
$$y_i = (x_i - q_{i \bmod k}) \bmod 26, \quad i = 0, 1, 2, \ldots, n-1.$$
As a reminder from above, $k = 3$ with $q_0 = 6$, $q_1 = 14$, and
$q_2 = 3$, and $n = 77$. The first cipher text letter is H, so
$x_0 = 7$. Thus, $y_0 = (7 - 6) \bmod 26 = 1$, and this translates
into the plain text letter B. Also from above, the cipher text letter
A aligns with the keyword letter D in position 33, with
$32 \bmod 3 = 2$. Thus, $x_{32} = 0$ and $q_2 = 3$. Now,
$y_{32} = (0 - 3) \bmod 26 = 23$, that is, $-3$ reduced modulo 26,
which corresponds to a move of three letters left from A to end at X,
the respective plain text letter. Again, this process would be
continued onward until the entire cipher text had been decrypted to
reveal the plain text message.
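
The modular rule above translates directly into code. The following Python sketch assumes the numbering A = 0, ..., Z = 25 used in the worked example; the names to_numbers and vigenere are hypothetical and are not taken from the Maple appendix.

```python
def to_numbers(text: str) -> list[int]:
    """Map letters to 0..25 (A = 0), ignoring anything that is not A..Z."""
    return [ord(c) - ord("A") for c in text.upper() if "A" <= c <= "Z"]


def vigenere(text: str, keyword: str, decrypt: bool = False) -> str:
    """Add (or, when decrypting, subtract) the keyword values modulo 26."""
    x = to_numbers(text)
    q = to_numbers(keyword)
    k = len(q)
    sign = -1 if decrypt else 1
    y = [(xi + sign * q[i % k]) % 26 for i, xi in enumerate(x)]
    return "".join(chr(v + ord("A")) for v in y)


print(vigenere("HSVZWOROQJYQUKWN", "GOD", decrypt=True))
# BESTILLANDKNOWTH
```

Python's % operator already returns a value in 0..25 for negative arguments, so the case $(0 - 3) \bmod 26 = 23$ needs no special handling.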
key: L I V E R P O O L L I V E R P O O L
cipher: E P Z P F C U O Y O E D R U X B U C
key: L I V E R P O O L L I V E R P O O L
cipher: Z I Y X Y P H Z P L L N X F N C I C
key: L I V E R P O O L L I V E R P O O L
cipher: O W J V N X Z Z Y P D Z V U X G O A
...
key: L I V E R P O O L L I V E R P O O L
cipher: E P Z V R X B K L D P Z H R L O M S
key: L I V E R P O O L L I V E R P O O L
cipher: L A G I W I O D Z Z T J J K T O F D
key: L I V E R P O O L L I V E R P O O L
cipher: N Z T M E V T C C E P Z H R N K V J
The lyrics to "The Long and Winding Road" are provided in Figure 3
as reference for the following example of estimating the keyword length
by Kasiski's Method.
To aid the process of analysis, each line of the cipher text is composed
of 8 groups of 5 letters per group. When referring to the cipher text, the
lines are numbered from 1 to 11 from top to bottom and the groups are
numbered 1 to 8 from left to right. Thus, the first letter combination
found is ZKS in line 2, groups 4 and 5. Its respective pairing is found
in line 7, group 4. Below in Table 5, the cipher text is presented with
the respective letter combination pair ZKS underlined.
Table 5. Cipher text for The Long and Winding Road
Now, from the Z in the first letter combination to the W just before
the Z in the second letter combination there are 198 letters (not in-
cluding spaces). For the number 198, the potential divisors that could
The first pair, GN, is found in line 1, group 1 and line 2, group 5.
The number of letters separating the respective pair is 58, and its
divisors include 1, 2, and 29. Found in line 1, group 2 and line 3,
group 2 is LCH, the second pair. The number of letters between this
pair is 80, with divisors that include 1, 2, 4, 5, 8, and 10. The last
pair found in the cipher text for this example is ET, in line 2,
group 1 and line 2, group 7. The number of letters between the two
occurrences of ET is 31, a prime number. Again dismissing the last
pair because its separation is a prime number, the greatest common
divisor of 58 and 80 is 2. This is not a reasonable keyword length
because a keyword of length two provides only a faint level of
security. So, Kasiski's Method did as charged and determined a keyword
length, but most likely one would have to take into consideration the
other divisors of 80, such as 4, 5, 8, or 10.
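
The bookkeeping in this example (locating repeated letter blocks, measuring their separations, and taking a greatest common divisor) can be sketched in Python as follows. The function names are hypothetical, and the sketch is independent of the Maple procedure kasiski discussed next; it measures a separation as the difference between the starting positions of two occurrences.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeated_blocks(cipher: str, length: int) -> dict[str, list[int]]:
    """Map every block of `length` letters that occurs more than once
    to the list of positions at which it starts."""
    positions = defaultdict(list)
    for i in range(len(cipher) - length + 1):
        positions[cipher[i:i + length]].append(i)
    return {block: pos for block, pos in positions.items() if len(pos) > 1}


def separations(positions: list[int]) -> list[int]:
    """Distances between consecutive occurrences of one block."""
    return [b - a for a, b in zip(positions, positions[1:])]


def kasiski_gcd(distances: list[int]) -> int:
    """Greatest common divisor of all separations: a candidate key length."""
    return reduce(gcd, distances)


print(kasiski_gcd([58, 80]))  # 2
```

Applied to the separations 58 and 80 from the example, the sketch returns 2, the value judged too small to be a plausible keyword length.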
For large texts, the procedure just presented would be tedious and
prone to error in counting and in locating letter combinations. This
is where the ability of computers to perform many operations per
second is very beneficial. David Wright has written Maple code that
performs Kasiski's Method on a given text (see Appendix A). In his
code, the procedure for performing Kasiski's Method is aptly named
kasiski(msg,l,n). Its inputs are the message (cipher text), the
length l of the letter blocks, and the minimum number n of repetitions
of a letter block. In essence, kasiski scans the cipher text for
blocks of letters of length l that occur at least n times. For each
such block, kasiski outputs the block with its number of repetitions,
followed by the number of letters separating consecutive occurrences,
given in prime-factored form. The cipher text of the song "The Long
and Winding Road" was analyzed by the Maple code, and the examples of
output for each procedure discussed here and in the remaining sections
are based on this analysis. For example, the following output is based
on a block of length 3 with at least 3 repetitions. In addition, the
second group of output is based on a block of length 4 with at least
2 repetitions. Strong evidence of the keyword length is found in the
second output group, where the factor $3^2 = 9$ occurs in each letter
block.
kasiski(lwr,3,3);
[[[EPZ,4],[(3)^{3}(5),(3)^{2}(5),(2)(29)]]]
kasiski(lwr,4,2);
in the Riverbank Publication No. 22, Friedman wrote in 1920 The Index
of Coincidence and Its Applications in Cryptography. The Index of
Coincidence is the fundamental idea upon which Friedman's Method is
based. It is the probability that two letters picked at random from a
given text are identical. Because of this basic idea, the Index of
Coincidence can be used in general with all polyalphabetic ciphers. It
is not limited to the Vigenere Cipher, but since the Vigenere Cipher
is the focus of this paper, it is presented here with regard to that
cipher.
The Index of Coincidence is developed by counting how many different
ways two letters can be picked from a given number of letters, that
is, how many combinations can be formed. The combinatorial formula
from counting is used to determine the number of combinations. Some
notation that is used frequently in this section is introduced here.
In the cipher text, the frequencies of A, B, ..., Z are denoted by
$n_0, n_1, \ldots, n_{25}$. The total number of letters contained in a
text is denoted by $n$, so that $n = n_0 + n_1 + \cdots + n_{25}$. So,
the number of ways of picking two letters from the entire text is
$$\binom{n}{2} = \frac{n(n-1)}{2}.$$
From the total number of letter pairs that can be formed from the
text, it is necessary to determine how many of them are a pair of the
same letter. For instance, the number of As is $n_0$. Thus, the number
of pairs chosen from As only is
$\binom{n_0}{2} = \frac{n_0(n_0-1)}{2}$. The number of letter pairs
for each letter A to Z can be found in the same way. Thus, by summing
the results from each of these calculations, the total number of
pairs of identical letters is
$$\sum_{i=0}^{25} \frac{n_i(n_i-1)}{2}.$$
Therefore, the Index of Coincidence is formed by the number of letter
pairs of identical letters over the total number of letter pairs found in
the text. Thus,
$$c = \frac{\sum_{i=0}^{25} \frac{n_i(n_i-1)}{2}}{\frac{n(n-1)}{2}},$$
which then simplifies to
$$c = \sum_{i=0}^{25} \frac{n_i(n_i-1)}{n(n-1)}. \tag{2}$$
For example, the Index of Coincidence will be calculated for the text
of Psalm 46:10; for reference, the cipher text for this passage is
found in Figure 5. For each letter, the frequency $n_i$ is reported in
Table 7 along with the respective value $n_i(n_i - 1)$. This value is
the number of ordered ways of picking two identical letters given that
there are $n_i$ of a particular letter in the set, which is twice the
number of combinations $\binom{n_i}{2}$.
Letter          A   B   C   D   E   F   G   H   I   J   K   L   M
n_i             1   1   3   3   1   0   3   6   0   2   8   3   2
n_i(n_i - 1)    0   0   6   6   0   0   6  30   0   2  56   6   2

Letter          N   O   P   Q   R   S   T   U   V   W   X   Y   Z
n_i             1   8   2   4   6   3   1   2   3   7   0   1   6
n_i(n_i - 1)    0  56   2  12  30   6   0   2   6  42   0   0  30
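
As a check on Table 7, Equation 2 can be evaluated with a short Python sketch. The cipher text below is the Psalm 46:10 example enciphered with the keyword GOD; the function name index_of_coincidence mirrors the appendix procedure, but this Python version is only an illustrative assumption and not the author's code.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Equation 2: sum of n_i (n_i - 1) divided by n (n - 1)."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    counts = Counter(letters)  # n_i for every letter that occurs
    return sum(m * (m - 1) for m in counts.values()) / (n * (n - 1))


PSALM_CIPHER = ("HSVZWOROQJYQUKWN" "OWOOPMCGOKLRZEKS" "AGZWKRDSCQMHKKBD"
                "ZWRTGLCWORPHKLDR" "HHJWQZVHKOUZV")

print(round(index_of_coincidence(PSALM_CIPHER), 4))  # 0.0513
```

The entries $n_i(n_i - 1)$ of Table 7 sum to 300 and $n(n-1) = 77 \cdot 76 = 5852$, so the index is $300/5852 \approx 0.0513$.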
Letter A B C D E F G H I
Percent 8.04 1.54 3.06 3.99 12.51 2.30 1.96 5.49 7.26
Letter J K L M N O P Q R
Percent 0.16 0.67 4.14 2.53 7.09 7.60 2.00 0.11 6.12
Letter S T U V W X Y Z
Percent 6.54 9.25 2.71 0.99 1.92 0.19 1.73 0.09
The manner in which these pairs of letters are chosen resembles picking
a pair of identical letters from an ordinary English text. Therefore, the
proportional number of ways to select a pair of identical letters from
the same column is
$$0.065 \cdot \frac{n(n-k)}{2k}. \tag{5}$$
Earlier, the Index of Coincidence was defined as the probability of
picking a pair of identical letters from a given text. So far, the
proportional number of ways of choosing a pair of identical letters
from different columns and from the same column has been determined.
The sum of Equations 4 and 5 accounts for the total number of ways of
picking a pair of identical letters from a given text, and the total
number of ways of picking a pair of letters from a text of $n$ letters
is $\binom{n}{2} = \frac{n(n-1)}{2}$.
Therefore, the Index of Coincidence is approximately
$$c \approx \frac{0.03846 \cdot \frac{n^2(k-1)}{2k} + 0.065 \cdot \frac{n(n-k)}{2k}}{\frac{n(n-1)}{2}}.$$
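
Setting this approximation equal to an observed index $c$ and solving for $k$ gives an estimate of the keyword length. The rearrangement in the Python sketch below is an assumption worked out from the approximation itself, not quoted from the text; it has the same form as the friedman procedure in Appendix A, which uses the rounded constants 0.027 and 0.038. The function names are hypothetical.

```python
P_RANDOM = 0.03846   # chance that two letters from different columns match
P_ENGLISH = 0.065    # chance that two letters from the same column match


def expected_ic(n: int, k: int) -> float:
    """The approximation of the Index of Coincidence for key length k."""
    different = P_RANDOM * n * n * (k - 1) / (2 * k)
    same = P_ENGLISH * n * (n - k) / (2 * k)
    return (different + same) / (n * (n - 1) / 2)


def friedman_estimate(n: int, c: float) -> float:
    """Solve the approximation for k, given the text length n and index c."""
    return (P_ENGLISH - P_RANDOM) * n / ((n - 1) * c - P_RANDOM * n + P_ENGLISH)


# Round trip: an index generated from k = 9 leads back to k = 9.
print(abs(friedman_estimate(432, expected_ic(432, 9)) - 9) < 1e-9)  # True
```

The round trip only confirms that the algebra is consistent; on a real cipher text the estimate is a guide to the keyword length rather than an exact value.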
k1 k2 k3 k4 k5 k6 k7 k8 k9
E P Z P F C U O Y
O E D R U X B U C
Z I Y X Y P H Z P
L L N X F N C I C
O W J V N X Z Z Y
... ... ... ... ... ... ... ... ...
C M G I R S A S E
Z G J Y U D C F O
L L V H R S O
Now, the cipher text has been subdivided into k columns or subse-
quences of letters if the whole cipher text is viewed as a sequence of
letters. Because of this subdivision of the cipher text letters, each col-
umn now represents a particular row of the Vigenere Square or more
specifically a Caesar shift. Therefore, the task at hand is to find the key
for each of the k individual Caesar shifts. For these Caesar shifts, the
key will just be the shift value that will correspond to an equivalent
letter value for each keyword letter. The techniques by which these
individual keys may be discovered are now presented.
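
The subdivision into $k$ subsequences amounts to taking every $k$-th letter of the cipher text. A minimal Python sketch, using only the first five rows of the table above and a hypothetical function name split_columns, is:

```python
def split_columns(cipher: str, k: int) -> list[str]:
    """Return the k subsequences: column i holds letters i, i + k, i + 2k, ..."""
    letters = [c for c in cipher.upper() if "A" <= c <= "Z"]
    return ["".join(letters[i::k]) for i in range(k)]


# First five rows of the cipher text, nine letters per row.
first_rows = "EPZPFCUOY" "OEDRUXBUC" "ZIYXYPHZP" "LLNXFNCIC" "OWJVNXZZY"

columns = split_columns(first_rows, 9)
print(columns[0])  # EOZLO
```

Each resulting string is enciphered with a single Caesar shift, so it can be analyzed by letter frequency on its own.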
Once the cipher text has been arranged into the subsequences $k_i$,
$i = 1, 2, \ldots, k$, an analysis of the letter frequency and
proportion is performed for each individual subsequence. For $k_1$ of
the example, the frequency of each letter, $f_i$,
$i = 0, 1, \ldots, 25$, and the respective proportion out of 48
letters, $p_i$, $i = 0, 1, \ldots, 25$, are given below in Table 10.
Letter A B C D E F G H I
fi 1 0 2 2 4 0 0 2 0
pi 2.08 0.00 4.17 4.17 8.33 0.00 0.00 4.17 0.00
Letter J K L M N O P Q R
fi 1 0 7 0 1 4 8 0 1
pi 2.08 0.00 14.6 0.00 2.08 8.33 16.7 0.00 2.08
Letter S T U V W X Y Z
fi 0 3 0 1 3 0 4 4
pi 0.00 6.25 0.00 2.08 6.25 0.00 8.33 8.33
Letter A B C D E F G H I
Sample Text 8.04 1.54 3.06 3.99 12.51 2.30 1.96 5.49 7.26
Example Text 2.08 0.00 4.17 4.17 8.33 0.00 0.00 4.17 0.00
Letter J K L M N O P Q R
Sample Text 0.16 0.67 4.14 2.53 7.09 7.60 2.00 0.11 6.12
Example Text 2.08 0.00 14.6 0.00 2.08 8.33 16.7 0.00 2.08
Letter S T U V W X Y Z
Sample Text 6.54 9.25 2.71 0.99 1.92 0.19 1.73 0.09
Example Text 0.00 6.25 0.00 2.08 6.25 0.00 8.33 8.33
deed coincide with the actual keyword of Liverpool. This method does
provide a means by which to determine the key for each Caesar shift,
but there are more mathematically based methods left to explore.
This is then repeated 25 times, each time shifting the vector of
calculated proportion values by one position. Once all 26 calculations
are finished, the goal is to find the smallest sum of distances, which
would imply that the calculated proportion values closely match the
known proportion values. The shift that produces this minimal value is
also the key for the Caesar shift of the respective subsequence.
The calculated proportion values from $k_1$ of "The Long and Winding
Road" will be used in examples to find actual values based on the
methods. For the $L^1$ norm, the smallest value calculated was 46.72,
at $m = 11$. Thus, a key of 11 belongs to the Caesar shift of $k_1$.
Then, as before, the overall procedure is repeated for each element
of the keyword.
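
The $L^1$ comparison can be sketched in Python using the tabulated values: ENGLISH holds the known proportions (the "Sample Text" row) and K1 the calculated proportions for $k_1$ (the "Example Text" row), both in percent. The names and the shift convention are assumptions of this sketch: entry $j$ of the known vector is compared with entry $(j + m) \bmod 26$ of the calculated vector, so that plain letter $j$ is taken to be enciphered as letter $j + m$.

```python
ENGLISH = [8.04, 1.54, 3.06, 3.99, 12.51, 2.30, 1.96, 5.49, 7.26,
           0.16, 0.67, 4.14, 2.53, 7.09, 7.60, 2.00, 0.11, 6.12,
           6.54, 9.25, 2.71, 0.99, 1.92, 0.19, 1.73, 0.09]
K1 = [2.08, 0.00, 4.17, 4.17, 8.33, 0.00, 0.00, 4.17, 0.00,
      2.08, 0.00, 14.6, 0.00, 2.08, 8.33, 16.7, 0.00, 2.08,
      0.00, 6.25, 0.00, 2.08, 6.25, 0.00, 8.33, 8.33]


def l1_distance(known: list[float], observed: list[float], m: int) -> float:
    """Sum of absolute differences after undoing a Caesar shift of m."""
    return sum(abs(known[j] - observed[(j + m) % 26]) for j in range(26))


def best_shift_l1(known: list[float], observed: list[float]) -> int:
    """The shift m in 0..25 with the smallest L1 distance."""
    return min(range(26), key=lambda m: l1_distance(known, observed, m))


print(round(l1_distance(ENGLISH, K1, 11), 2))  # 46.72
```

For the shift $m = 11$ the sketch reproduces the value 46.72 quoted above; best_shift_l1 runs the same calculation over all 26 shifts and returns the one with the smallest sum.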
The third method involves an idea similar to the one used in the
previous method, but here the $L^2$ norm is used. The $L^2$ norm is
also well known as Euclidean distance,
$$\|x\|_2 = \left( \sum_{i=1}^{N} x_i^2 \right)^{1/2}.$$
Thus, in this method, the goal is to minimize the distance between
the vectors. Writing $\rho$ for the vector of known proportion values
and $p^{(m)}$ for the vector of calculated proportion values shifted
by $m$, the minimal value of
$$\| p^{(m)} - \rho \| \tag{9}$$
is desired with respect to the shift value $m$. It is helpful to first
simplify this statement so as to determine the manner by which it can
be minimized. Thus,
$$\| p^{(m)} - \rho \|^2 = (p^{(m)} - \rho) \cdot (p^{(m)} - \rho) = \| p^{(m)} \|^2 + \| \rho \|^2 - 2\, p^{(m)} \cdot \rho.$$
result is the dot product of $p^{(m)}$ with the vector of known
proportion values that attains the maximum value. Therefore, the key
to the Caesar shift is the value $m$. For the $L^2$ norm, the maximum
value of the dot product calculated was 117.384 and the minimal value
of the distance in (9) calculated was 12.5697. Both of these values
were calculated when the shift value was 11. Thus, the first letter of
the keyword would be L. For the remaining $k - 1$ positions in the
keyword, the method is repeated to determine the actual keyword.
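
The $L^2$ method can be sketched in the same way. The Python code below is an illustration under the same assumptions as before: ENGLISH and K1 are the tabulated "Sample Text" and "Example Text" proportions in percent, the function names are hypothetical, and the shift convention compares entry $j$ of the known vector with entry $(j + m) \bmod 26$ of the calculated one.

```python
from math import sqrt

ENGLISH = [8.04, 1.54, 3.06, 3.99, 12.51, 2.30, 1.96, 5.49, 7.26,
           0.16, 0.67, 4.14, 2.53, 7.09, 7.60, 2.00, 0.11, 6.12,
           6.54, 9.25, 2.71, 0.99, 1.92, 0.19, 1.73, 0.09]
K1 = [2.08, 0.00, 4.17, 4.17, 8.33, 0.00, 0.00, 4.17, 0.00,
      2.08, 0.00, 14.6, 0.00, 2.08, 8.33, 16.7, 0.00, 2.08,
      0.00, 6.25, 0.00, 2.08, 6.25, 0.00, 8.33, 8.33]


def shifted(observed: list[float], m: int) -> list[float]:
    """Calculated proportions with a Caesar shift of m undone."""
    return [observed[(j + m) % 26] for j in range(26)]


def l2_distance(known: list[float], observed: list[float], m: int) -> float:
    """Euclidean distance between the known and the shifted vector."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(known, shifted(observed, m))))


def dot(known: list[float], observed: list[float], m: int) -> float:
    """Dot product of the known and the shifted vector."""
    return sum(a * b for a, b in zip(known, shifted(observed, m)))


# Shifting only permutes the entries, so both norms are constant in m:
# minimizing the distance and maximizing the dot product pick the same m.
by_distance = min(range(26), key=lambda m: l2_distance(ENGLISH, K1, m))
by_dot = max(range(26), key=lambda m: dot(ENGLISH, K1, m))

print(round(l2_distance(ENGLISH, K1, 11), 4))  # 12.5697
```

For $m = 11$ the distance agrees with the value 12.5697 reported above. The two selections by_distance and by_dot coincide because the squared distance differs from the dot product only by terms that do not depend on $m$.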
The last method to be presented is based on the $L^\infty$ norm. It
can be expressed as $\|x\|_\infty = \max\{|x_i| : 1 \le i \le N\}$. It
merely picks out the element of $x$ with the largest absolute value.
Therefore, the $L^\infty$ norm will determine the largest proportion
value found in the vector of calculated proportions. Thus,
$$\|x\|_\infty = \max\{p_i^{(0)} : 1 \le i \le 26\}. \tag{10}$$
output as demonstrated below for The Long and Winding Road with
a keyword of 9 letters.
guesskey2(lwr,9):
LIVENPOOL
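# Vigenere encryption: add the keyword values to the plain text values
# modulo 26. The index arithmetic treats letter values as 1..26.
vigen_enc:=proc(plaintext,keyword)
local v, k, l, w, i;
v:=numbers(plaintext);
k:=numbers(keyword);
# l is the keyword length; the keyword is reused cyclically
l:=nops(k);
w:=[seq( (v[i]+k[ (i-1 mod l) +1]-2 mod 26) +1,
i=1..nops(v) )];
RETURN(letters(w))
end:
# Vigenere decryption: subtract the keyword values modulo 26.
vigen_dec:=proc(ciphertext,keyword)
local v, k, l, w, i;
v:=numbers(ciphertext);
k:=numbers(keyword);
l:=nops(k);
w:=[seq( (v[i]-k[ (i-1 mod l) +1] mod 26) +1,
i=1..nops(v) )];
RETURN(letters(w))
end: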
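# Index of Coincidence: (sum of n_i^2 - n)/(n(n-1)), which equals the
# sum of n_i(n_i-1) divided by n(n-1).
index_of_coincidence:=proc(ciphertext)
local v,x,n,u,ans;
v:=numbers(ciphertext);
n:=nops(v);
# u[x] is the number of occurrences of the letter with value x
u:=[seq( nops(select(has,v,x)), x=1..26 )];
ans:=(evalm(u &* u)-n)/(n*(n-1));
RETURN( evalf(ans) );
end: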
# Friedman's formula for estimating the length of the keyword
friedman:=proc(ciphertext)
local k,n,v;
v:=numbers(ciphertext);
n:=nops(v);
k:=index_of_coincidence(ciphertext);
RETURN( 0.027*n/( (n-1)*k -0.038*n +0.065) );
end:
friedman2:=proc(ciphertext)
local k;
k:=index_of_coincidence(ciphertext);
RETURN( 0.027/( k -0.038) );
end:
numfreq:=proc(msg)
local v;
v:=numbers(msg);
RETURN([seq( nops(select(has,v,x)), x=1..26 )]);
end:
n:=nops(v);
u:=[seq( v[k[1]*x+k[2]], x=0..floor( (n-k[2])/k[1] ) )];
w:=letters(u);
RETURN( freq(w) );
end:
basefreq:=[[A,B,C,D,E,F,G,H,I,J,K,L,M,
N,O,P,Q,R,S,T,U,V,W,X,Y,Z],
[8.167,1.492,2.782,4.253,12.702,2.228,2.015,6.094,6.966,
0.153,0.772,4.025,2.406,6.749,7.507,1.929,0.095,5.987,
6.327,9.056,2.758,0.978,2.360,0.150,1.974,0.074]]:
basechart:=display( seq(
rectangle( [x-0.25,basefreq[2,x]],[x+0.25,0],color=blue),
x=1..26) ):
blockfreq:=proc(s,m)
local v,n,i,ans,block,x,y;
n:=length(s);
v:=sort([seq(substring(s,i..(i+m-1)),i=1..(n-m+1))]);
ans:=[];
block:=v[1];
y:=1;
for x from 2 to n-m+1 do
if v[x] = block then
y:=y+1;
continue;
else
ans:=[op(ans),[block,y]];
block:=v[x];
y:=1;
fi;
od;
ans:=[op(ans),[block,y]];
RETURN(ans)
end:
blockfreqsort:=proc(s,m)
local A,x,y;
A:=blockfreq(s,m);
sort( A, (x,y)->evalb(x[2]>y[2]))
end:
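# Kasiski's Method: for each block of length l occurring at least n
# times, report the block, its count, and the prime-factored gaps
# between consecutive occurrences.
kasiski:=proc(msg,l,n)
local v,x,y,ans,u,facs,i;
# keep only the blocks whose repetition count is at least n
v:=select((x,y)->evalb(x[2]>=y), blockfreqsort(msg,l),n);
ans:=[];
for x from 1 to nops(v) do
# positions at which the block occurs in the message
u:=findstring(msg,v[x][1]);
# factor the distance between consecutive occurrences
facs:=[seq( ifactor(u[i+1]-u[i]), i=1..nops(u)-1 )];
ans:=[op(ans), [v[x],facs] ];
od;
RETURN(ans)
end: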
guesskey2:=proc(cipher,l)
global freq0;
local nums, i, p, inds, m, j, x, y;
nums:=[];
for i from 1 to l do
p:=convert(row(subfreq(cipher, [l,i]), 2), list);
inds:= [seq( evalf( evalm(p &* cycle(freq0, -j))/norm(p,2) ),
j=0..25)];
m:=max(op(inds));
for j from 1 to 26 do
if inds[j] >= m then break fi;
od;
nums:=[op(nums), j ];
od;
RETURN(letters(nums))
end:
References
[1] Thomas H. Barr, Invitation to Cryptology, Prentice Hall, Upper
Saddle River, NJ, 2002.
[2] F. L. Bauer, Decrypted Secrets: Methods and Maxims of Cryptology,
Springer-Verlag, Berlin, 1997.
[3] Ole Immanuel Franksen, Mr. Babbage's Secret: The Tale of a Cypher
and APL, Prentice-Hall, Englewood Cliffs, NJ, 1984.
[4] David Kahn, The Codebreakers: The Story of Secret Writing,
Scribner, New York, NY, 1996.
[5] Simon Singh, The Code Book: The Science of Secrecy From Ancient
Egypt to Quantum Cryptography, Anchor Books, New York, NY, 1999.
[6] Douglas R. Stinson, Cryptography: Theory and Practice, CRC Press,
Boca Raton, FL, 1995.
[7] David Wright, Project Instructional Meetings, Oklahoma State
University, Stillwater, OK, 2002.