http://ece.gmu.edu/crypto-text.htm
Motivation
Hardware efficiency
Software efficiency
Flexibility
Flexibility
Additional key-sizes and block-sizes
Ability to function efficiently and securely in a wide
variety of platforms and applications
low-end smartcards, wireless: small memory requirements
IPSec, ATM: small key setup time in hardware
B-ISDN, satellite communication: high encryption speed
Round 1: Security, Software efficiency, Flexibility
August 1999: 5 final candidates (Mars, RC6, Rijndael, Serpent, Twofish)
Round 2: Security, Hardware efficiency
October 2000: 1 winner, Rijndael (Belgium)
Europe
NESSIE Project
New European Schemes for Signatures,
Integrity, and Encryption
2000-2002
Japan
CRYPTREC Project
2000-2002
NESSIE, CRYPTREC
Multiple types of transformations:
Symmetric-key block ciphers
Stream ciphers
Hash functions
MACs
Asymmetric encryption schemes
Asymmetric digital signature schemes
Asymmetric identification schemes
[Figure: # votes received by the AES finalists]
[Figure: NSA ASIC versus GMU FPGA results for the AES finalists]
[Figure: Speed [Mbits/s] for 128-bit, 192-bit, and 256-bit keys]
[Figure: security (High / Adequate) versus complexity (Simple / Complex) for Serpent, MARS, Twofish, Rijndael, and RC6]
[Figure: # of rounds in the attack / total # of rounds, in absolute numbers and in percent, for Serpent, Twofish, Mars, Rijndael, and RC6]
Complexity of the best attack
SHA-1: 2^80, the same as Skipjack
SHA-512: 2^256, the same as AES-256
Historical view
Secret-key ciphers
Hash functions
1970
DES optimized for hardware
DES-based hash functions
optimized for hardware
1980
1990
2000
time
MD4-family
optimized primarily
for software
Software or hardware?
HARDWARE
SOFTWARE
security of data
during transmission
low cost
flexibility
(new cryptoalgorithms,
protection against new attacks)
speed
random key
generation
access control
to keys
tamper resistance
(viruses, internal attacks)
Efficiency indicators
Software: Speed, Memory
Hardware: Speed, Area, Power consumption
Efficiency parameters
Latency: time to encrypt/decrypt a single block of data
Throughput = Speed: number of bits encrypted/decrypted in a unit of time
Throughput = Block_size * Number_of_blocks_processed_simultaneously / Latency
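The throughput formula can be checked numerically with a small sketch. The function and parameter names below are illustrative assumptions, not taken from the slides; the latency is given in nanoseconds so that bits per nanosecond equal Gbit/s.

```c
#include <assert.h>

/* Throughput in Mbit/s:
   block size * blocks processed simultaneously / latency.
   All names and units here are assumptions made for the example. */
double throughput_mbit_per_s(unsigned block_size_bits,
                             unsigned blocks_in_flight,
                             double latency_ns)
{
    assert(latency_ns > 0.0);
    /* bits per nanosecond is Gbit/s; scale by 1000 to get Mbit/s */
    return 1000.0 * block_size_bits * blocks_in_flight / latency_ns;
}
```

For example, a 128-bit block with one block in flight and a latency of 20 ns gives 6400 Mbit/s; ten blocks in flight with a latency of 200 ns give the same figure.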
[Figure: counter mode. Counter values IV+1, IV+2, ..., IV+N-1, IV+N feed parallel encryption/decryption units that combine with the message blocks M to produce the ciphertext blocks C]
Ci = Mi ⊕ E(IV+i)
for i=0..N
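A minimal sketch of the counter-mode relation Ci = Mi XOR E(IV+i) follows. The block function toy_E is a hypothetical placeholder on 64-bit words, not a real cipher and not anything from the slides; it only stands in for E so that the structure can be run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the block cipher E: NOT secure, illustration only. */
static uint64_t toy_E(uint64_t x, uint64_t key)
{
    x ^= key;
    x *= 0x9E3779B97F4A7C15ULL;  /* arbitrary odd constant */
    x ^= x >> 29;
    return x;
}

/* out[i] = in[i] XOR E(IV + i). Every block depends only on its own
   counter value, so the blocks can be processed in parallel, and the
   same routine both encrypts and decrypts. */
void ctr_xcrypt(const uint64_t *in, uint64_t *out, size_t n,
                uint64_t iv, uint64_t key)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] ^ toy_E(iv + (uint64_t)i, key);
}
```

Because no block waits for the result of another, a hardware implementation may keep several encryption units, or several pipeline stages, busy at once.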
[Figure: two ciphers unrolled into pipelines of rounds (Cipher 1: round 1 ... round 10; the second cipher: round 1 ... round 16) with a common target clock period, e.g., 20 ns]
Speed = block size / target_clock_period
[Figure: pipeline timing, clock cycle by clock cycle: blocks B1, B2, B3, ... enter one per clock cycle and advance through the pipeline stages]
[Figure: chart for Rijndael, Serpent, RC6, Twofish, and Mars; labeled value: 6.4 Gbit/s]
[Figure: bar chart with labeled values 16.8 (Serpent), 15.2 (Twofish), 13.1 (RC6), and 12.2 (Rijndael)]
[Figure: bar chart for Serpent, Twofish, RC6, and Rijndael with labeled values 46,900, 19,700, 21,000, and 12,600; dedicated memory blocks: 80 RAMs]
[Figure: High / Medium / Low versus Area (Small / Medium / Large) for RC6 and Mars]
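[Figure: cipher block chaining. The IV and the message blocks M1, M2, M3, ..., MN-1, MN are chained into the ciphertext blocks C1, C2, C3, ..., CN-1, CN]
C1 = E(M1 ⊕ IV)
Ci = E(Mi ⊕ Ci-1)
for i=2..N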
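The chaining equations can be sketched as below, again with a hypothetical 64-bit placeholder toy_E standing in for the block cipher E (an assumption for illustration, not a real cipher). Array index 0 corresponds to block 1 of the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the block cipher E: NOT secure, illustration only. */
static uint64_t toy_E(uint64_t x, uint64_t key)
{
    x ^= key;
    x *= 0x9E3779B97F4A7C15ULL;  /* arbitrary odd constant */
    x ^= x >> 29;
    return x;
}

/* C1 = E(M1 XOR IV), Ci = E(Mi XOR Ci-1).
   Each block needs the previous ciphertext block, so encryption of a
   single message is inherently sequential. */
void cbc_encrypt(const uint64_t *m, uint64_t *c, size_t n,
                 uint64_t iv, uint64_t key)
{
    assert(m != NULL && c != NULL);
    uint64_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        c[i] = toy_E(m[i] ^ prev, key);
        prev = c[i];
    }
}
```

The data dependency on the previous ciphertext block is what prevents several blocks of one message from being processed simultaneously in this mode.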
Initial transformation
i:=1
Cipher Round with Round Key[i]
i<#rounds? If yes, i:=i+1 and repeat the Cipher Round (#rounds times in total)
Final transformation with Round Key[#rounds+1]
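The flow chart above maps onto a simple loop. In this sketch the three transformations are toy stand-ins (identity, addition, XOR) chosen only so that the control flow can be executed; they are assumptions and do not correspond to any of the ciphers discussed.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the real transformations (illustration only). */
static uint32_t toy_initial(uint32_t s)            { return s; }
static uint32_t toy_round(uint32_t s, uint32_t k)  { return s + k; }
static uint32_t toy_final(uint32_t s, uint32_t k)  { return s ^ k; }

/* round_key must hold indices 1 .. rounds+1 (index 0 is unused here). */
uint32_t iterative_cipher(uint32_t block, const uint32_t *round_key,
                          unsigned rounds)
{
    assert(rounds >= 1);
    uint32_t state = toy_initial(block);           /* initial transformation */
    for (unsigned i = 1; i <= rounds; i++)         /* #rounds times */
        state = toy_round(state, round_key[i]);    /* Round Key[i] */
    return toy_final(state, round_key[rounds + 1]); /* Round Key[#rounds+1] */
}
```

In hardware, the loop body corresponds to one round of combinational logic whose output is fed back through a register and a multiplexer.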
[Figure: basic architecture (multiplexer, register, one round of combinational logic) versus loop-unrolling with k=2, k=3, k=4, k=5; axis: area]
[Figure: two charts of Throughput [Mbit/s] versus Area [CLB slices] for Serpent I8, Rijndael, Twofish, Serpent I1, RC6, and Mars]
[Figure: basic architecture versus resource sharing (function F, multiplexer, registers D0 and D1); axis: Area]
[Figure: High ... Low versus Area (Small / Medium / Large) for Rijndael, Serpent, Twofish, RC6, and MARS]
[Figure: Speed [Mbits/s] for 128-bit, 192-bit, and 256-bit keys (Rijndael, RC6, Mars)]
Considered ciphers:
Blowfish, CAST, CAST-128, CAST-256, CRYPTON,
CS-Cipher, DEAL, DES, DFC, E2,
FEAL, FROG, GOST, Hasty Pudding, ICE,
IDEA, Khafre, Khufu, LOKI91, LOKI97,
Lucifer, MacGuffin, MAGENTA, MARS, MISTY1,
MISTY2, MMB, RC2, RC5, RC6,
Rijndael, SAFER K, SAFER+, Serpent, SQUARE,
SHARK, Skipjack, TEA, Twofish, WAKE,
WiderWake
[Figure: bar chart over the basic operations: S-box, variable rotation, modular multiplication, GF(2^n) multiplication, modular inversion, Boolean (XOR, AND, OR, etc.), fixed rotation, modular addition & subtraction, permutation]
S-box S
Hardware: ROM with an n-bit address and an m-bit output (2^n x m bits), or direct logic mapping x1, x2, ..., xn to y1, y2, ..., ym
Software: look-up table of 2^n words
C: WORD S[1<<n]= { 0x23, 0x34, 0x56 .............. }
ASM: S DW 23H, 34H, 56H ..
[Figure: rows of 4-bit and 8-bit S-boxes applied in parallel]
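A software S-box is a table look-up, as the C and ASM fragments above suggest. The sketch below applies one 4x4 S-box to each of the eight nibbles of a 32-bit word; the table contents are an illustrative permutation of 0..15 chosen for the example, not a claim about any particular cipher.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 4x4 S-box: a permutation of 0..15 (assumed values). */
static const uint8_t SBOX4[16] = {
    0x3, 0x8, 0xF, 0x1, 0xA, 0x6, 0x5, 0xB,
    0xE, 0xD, 0x4, 0x2, 0x7, 0x0, 0x9, 0xC
};

/* Single look-up: 4-bit input, 4-bit output. */
uint8_t sbox4(unsigned nibble)
{
    assert(nibble < 16);
    return SBOX4[nibble];
}

/* Apply the same S-box to all eight nibbles of a 32-bit word. */
uint32_t sbox4_word(uint32_t x)
{
    uint32_t y = 0;
    for (int i = 0; i < 8; i++)
        y |= (uint32_t)sbox4((x >> (4 * i)) & 0xF) << (4 * i);
    return y;
}
```

The table needs 2^n entries, which is why memory use in software and ROM or logic size in hardware grow quickly with the S-box input width.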
Variable rotation: A <<< B (32 bits), ROL32
Hardware: mux-based shifter with stages controlled by B[4], B[3], B[2], B[1], B[0] (the first stage selects between A<<<0 and A<<<16)
High-speed clock: fast clock CLK
C: C = (A << B) | (A >> (32-B));
ASM: ROL A, B
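The C expression on the slide shifts by 32 when B = 0, which is undefined behavior in C for a 32-bit operand. A hedged sketch of two alternatives follows: a masked one-line rotation, and a stage-by-stage version that mirrors the mux-based shifter (rotate by 16, 8, 4, 2, 1 depending on bits B[4]..B[0]). The function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by b mod 32 without ever shifting by 32. */
uint32_t rotl32(uint32_t a, unsigned b)
{
    b &= 31;
    return (a << b) | (a >> ((32 - b) & 31));
}

/* Same result, built like a mux-based shifter: each stage either
   passes its input through or rotates it by a fixed power of two. */
uint32_t rotl32_mux(uint32_t a, unsigned b)
{
    assert(b < 32);
    for (unsigned stage = 16; stage >= 1; stage >>= 1)
        if (b & stage)                               /* bit B[4] .. B[0] */
            a = (a << stage) | (a >> (32 - stage));  /* fixed rotation */
    return a;
}
```

Each stage of the second version is a fixed rotation, which in hardware costs only wiring plus one multiplexer per bit.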
Modular multiplication: C=AB mod 2^n, n=32, 16
Hardware: n-bit multiplier (MUL), HalfMultiplier
C: unsigned long A, B, C; C = A*B;
ASM: MUL
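A portable sketch of multiplication modulo 2^n is shown below. It uses fixed-width types because unsigned long is 64 bits wide on many current platforms, so the plain product A*B would not be reduced modulo 2^32 there. The function name is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* (a * b) mod 2^n for 1 <= n <= 32.
   Only the low n bits of the product are kept, which is why a
   "half" multiplier suffices in hardware. */
uint32_t mul_mod_2n(uint32_t a, uint32_t b, unsigned n)
{
    assert(n >= 1 && n <= 32);
    uint64_t product = (uint64_t)a * b;        /* full 64-bit product */
    uint64_t mask = (UINT64_C(1) << n) - 1;    /* low n bits */
    return (uint32_t)(product & mask);
}
```

With n = 16 and both operands equal to 0xFFFF the result is 1, since 0xFFFF is congruent to -1 modulo 2^16.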
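Multiplication in GF(2^8) by a constant: MUL GF(2^8), C = const
Hardware: each output bit y0, ..., y7 is an XOR of selected input bits, e.g., x0, x3, x4, x7 or x0, x3, x7
Software (C, ASM): <<, ^, |, & or ... alog[(log[X]+log[C])%255]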
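Both software options named above can be sketched in a few lines. This example assumes the Rijndael field polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) and the generator 0x03 for the log/antilog tables; the function and table names are illustrative. Note the parentheses around the sum of logarithms before the reduction modulo 255, and the separate handling of zero, which has no logarithm.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-XOR multiplication in GF(2^8) modulo 0x11B. */
uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0)); /* a * x */
        b >>= 1;
    }
    return p;
}

static uint8_t log_tab[256], alog_tab[255];

/* Build log/antilog tables from powers of the generator 0x03. */
void gf256_init_tables(void)
{
    uint8_t x = 1;
    for (int i = 0; i < 255; i++) {
        alog_tab[i] = x;
        log_tab[x] = (uint8_t)i;
        x = gf256_mul(x, 0x03);
    }
    assert(x == 1);  /* the generator has order 255 */
}

/* Table-based product: alog[(log[a] + log[b]) % 255]. */
uint8_t gf256_mul_tab(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0) return 0;
    return alog_tab[(log_tab[a] + log_tab[b]) % 255];
}
```

The table version trades 511 bytes of memory for a product in three look-ups and one addition, which is a typical software compromise; in hardware, multiplication by a fixed constant reduces to a small network of XOR gates.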
Permutation P (n bits)
Hardware: order of wires from x1, x2, x3, ..., xn-1, xn to y1, y2, y3, ..., yn-1, yn
C: complex sequence of instructions <<, |, &
ASM: complex sequence of instructions ROL, OR, AND
Fixed rotation: A <<< n (32 bits), ROL32
Hardware: order of wires from x1, x2, x3, ..., xn-1, xn to y1, y2, y3, ..., yn-1, yn
C: C = (A << n) | (A >> (32-n));
ASM: ROL A, n
XOR, AND, OR
Hardware: bitwise gates on a0, ..., an-1 and b0, ..., bn-1 producing y0, ..., yn-1
C: A^B, A&B, A|B
ASM: XOR A, B; AND A, B; OR A, B
Modular addition: C=A+B mod 2^n, n=32, 16
Hardware: n-bit Adder/subtractor (ADD)
C: unsigned long A, B, C; C = A+B;
ASM: ADD
Basic operations
Delay and area in HARDWARE
[Figure: Delay versus Area for modular multiplication, addition (RC), addition (CLA), GF(2^n) multiplication, Boolean, permutation, fixed rotation, variable rotation, modular inverse, and S-boxes 4x4, 8x8, 9x32; marked ciphers: RC6, Mars]
Basic operations
Delay and area in SOFTWARE
[Figure: Delay versus Memory for modular inverse, permutation, GF(2^n) multiplication, variable rotation, fixed rotation, multiplication, addition, Boolean, and S-boxes 4x4, 8x8, 9x32]
[Figure: basic operations grouped as fast & compact, slow or big, or slow & big in software and hardware: permutation, GF(2^n) multiply, S-box, modular inverse, variable rotation, Boolean, fixed rotation, addition, multiplication]
Types of ciphers
Deal
LOKI97
Magenta
Substitution-Linear Transformation
Networks
Rijndael
Serpent
Safer+
Crypton
Modified Feistel
Network
RC6
MARS
CAST-256
Others
Frog
HPC
[Figure: cipher round on data words D[3], D[2], D[1], D[0] with subkeys K2r+8 and K2r+9, an F-function, and the rotations <<< 1 and >>> 1]
[Figure: cipher round on data words D[3], D[2], D[1], D[0] with keys k=K[4+2i], k = K[5+2i], i - round no.; signals in, out1, out2, out3; rotation <<<13]
[Figure: S-boxes, Linear Transformation, and K[i] on a 128-bit data path, with initial permutation, encryption block, decryption block, and final permutation]
[Figure: encryption and decryption round built from Inversion in GF(2^8), affine transformation, ShiftRow / InvShiftRow, MixColumn / InvMixColumn, and subkey addition]
[Figure: Triple DES, Serpent, Mars, RC6, DES, Twofish, and Rijndael compared by complexity of a round]
K. Gaj, P. Chodowiec
April 2000
regular round: S-box 4x4, XOR7, MUX2
Rijndael: S-box 8x8, XOR6, XOR5, XOR4, 2 MUX2
Twofish: 2 ADD32, 6 S-boxes 4x4, 9 XOR2, XOR5, XOR4, 2 MUX2
RC6: SQR32, 2 ADD32, ROT32, 4 MUX2
Mars: ADD32, MUL32, ROT32, ADD32, 2 XOR2, 4 MUX2
[Figure: one implementation round of Serpent = 8 regular cipher rounds: K0, round 0 (32 x S-box 0, linear transformation) through K7, round 7 (32 x S-box 7, linear transformation), then K32 and a 128-bit output]
[Figure: 128-bit data path with key Ki, 32 x S-box 1 ... 32 x S-box 7, linear transformation, K32, and a 128-bit output]
[Figure: Area [CLB slices] for Serpent I8, Twofish, Serpent I1, RC6, and Mars]
Parallelism
Parallelism in SHA-1
[Figure: SHA-1 step on 32-bit words A, B, C with ROTL5, ROTL30, ft, Kt, and Wt]
[Figure: steps n+1 to n+4 with the ROL30, ROL1, and ROL5 operations scheduled in parallel]
[Figure: bar chart comparing SHA-1, MD5, and MD4 on a scale from 0 to 8]
Optimization tricks
[Figure: table look-ups T0, T1, T2, T3 indexed by bytes b0, b1, b2, b3 (state bytes x0,2, x1,2, x2,2, x3,2) and combined with k2]
Speed-up in software: ~ 100 times
Speed-up in hardware: ~ 20%
[Figure: bit-slicing. Bit k of the 32-bit word xj holds input xj(k) of instance k, k = 0..31, so that x1 = x1(31) x1(30) ... x1(1) x1(0), and likewise for x2, x3, x4, and y1]
e.g. y1(k) = f(x1(k), x2(k), x3(k), x4(k)) = (x1(k) AND x2(k)) XOR (x3(k) OR x4(k))
u1 = x1 AND x2
v1 = x3 OR x4
y1 = u1 XOR v1
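The bit-slicing example can be sketched as follows, assuming the function reconstructed from the diagram, y = (x1 AND x2) XOR (x3 OR x4). One version evaluates a single instance on single bits; the other evaluates 32 independent instances with three word-wide instructions. The function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One instance: every argument is a single bit (0 or 1). */
unsigned f_bit(unsigned x1, unsigned x2, unsigned x3, unsigned x4)
{
    assert((x1 | x2 | x3 | x4) <= 1);
    return (x1 & x2) ^ (x3 | x4);
}

/* 32 instances at once: bit k of each word belongs to instance k. */
uint32_t f_bitsliced(uint32_t x1, uint32_t x2, uint32_t x3, uint32_t x4)
{
    uint32_t u1 = x1 & x2;   /* AND of all 32 instances */
    uint32_t v1 = x3 | x4;   /* OR of all 32 instances  */
    return u1 ^ v1;          /* XOR of all 32 instances */
}
```

Bit k of the bit-sliced result equals f_bit applied to bit k of each input word, so a small Boolean function such as a 4x4 S-box can be computed for 32 blocks in parallel without any table look-up.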
Mathematicians
Security
Flexibility
Software
efficiency
Computer
scientists
Hardware
efficiency
Computer
Engineers
$A100 Challenges
For mathematicians:
Prove or disprove that Serpent with
all S-boxes identical
16 rounds
is at least as secure as Rijndael
For computer scientists:
Is there a way of using instruction-level parallelism
to speed up the software implementation of
[modified] Serpent to make it as fast as Rijndael?
$A50 Challenge
For mathematicians:
Is there a way of changing Serpent into
a modified Feistel network cipher
without losing its security properties?