AES Paper

21st International Conference on VLSI Design
Single Chip Encryptor/Decryptor Core Implementation of AES Algorithm
Monjur Alam, Santosh Ghosh, Dipanwita RoyChowdhury, Indranil Sengupta
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
{monjur, santosh, drc, isg}@cse.iitkgp.ernet.in
Abstract—This paper presents a single chip encryptor/decryptor core implementation of the Advanced Encryption Standard (AES-Rijndael) cryptosystem. The suggested architecture is capable of handling all possible combinations of the standard bit lengths (128, 192, 256) of data and key. The fully rolled, inner-pipelined architecture ensures lower hardware complexity. The architecture reutilizes precomputed blocks, in the sense that the same hardware is shared during encryption and decryption as much as possible. The design has been implemented on a Xilinx XCV1000e-8bg560 device. The performance of the architecture has been compared with existing results in the literature and has been found to be the most efficient (throughput/area) implementation of the AES algorithm.

Index Terms—Reconfigurable Architecture, AES, Rijndael, S-box, Composite fields.

I. INTRODUCTION

The Rijndael¹ block cipher [4] was chosen as the Advanced Encryption Standard (AES) by the National Institute of Standards and Technology (NIST) in 2000. Compared to their relatively slower software counterparts [2], VLSI (FPGA and ASIC) implementations of AES-Rijndael have become attractive. Although the popularly used AES-Rijndael is 128-bit, the algorithm was originally proposed for 128, 192 and 256 bits. To meet market demand, an AES architecture should work under any possible (128, 192 or 256-bit) key or data frame size, so that it can be used in diverse fields (e.g., e-banking, identity cards, SIM cards). Reconfigurability has therefore become an issue of immense practical importance in recent years.

Over recent years, many FPGA [9], [10], [14], [16], [17], [21] and ASIC [3], [5], [6], [7], [8], [11], [18] implementations of Rijndael have been reported. Most of them use look-up tables to implement the S-boxes. However, none of them is able to offer the reconfigurability feature. The use of composite field $GF((2^4)^2)$ arithmetic in the S-box operation was first noted in the works of Rijmen [4] and Rudra et al. [6]. Among the designers who tried to produce an area-optimized implementation using composite field arithmetic, the works [3], [7], [8] are of importance. Feldhofer et al. [7] implemented 128-bit AES on a grain of sand. It is an 8-bit architecture which exploits the composite field $GF((2^4)^2)$ for S-box optimization. To our knowledge, it is the most compact AES implementation so far. A single chip implementation of Rijndael with hardware sharing was introduced by Zhang et al. [19], who rearranged the MixColumn operation to share as much hardware as possible. All the above-mentioned approaches handle only a fixed bit length (128-bit) of data and key.

A Rijndael encryptor capable of handling all possible data and key lengths was first introduced by Verbauwhede et al. [18]. Although it supports all possible combinations of data and key lengths (128, 192 and 256-bit), it does not consider the design of the decryption unit. A highly regular approach was presented by Mangard et al. [13]. This 32-bit architecture performs encryption and decryption for various key sizes, but with a fixed block size (128-bit). The most area-optimized reconfigurable AES-Rijndael implementation to date was demonstrated by Alam et al. [3]. This work developed an FSMD-model-based controller, which is well suited to such an iterative implementation of AES. However, it also did not explore the design of the decryption unit. In a follow-up article, the authors presented a latency-optimized version of their architecture [21]. That architecture processes data on both edges of the clock, and in each clock cycle the key scheduler generates a round key, which improves performance. In addition, its fully rolled encryption unit completes a round transformation in a single clock cycle, at the cost of a longer combinational path delay.

The reconfigurable AES-Rijndael architecture proposed in this paper is an enhanced version of the works [3] and [21]. The architecture can process all 9 possible combinations of key and data lengths. All computations are done in the composite field $GF(((2^2)^2)^2)$ rather than $GF(2^8)$. The effective use of the composite field $GF(((2^2)^2)^2)$ helps to reduce the hardware complexity. The design reutilizes precomputed blocks, and the same hardware is shared during encryption and decryption as much as possible. Owing to its rolled structure, the design supports all standard modes of encryption operation, including CBC and OFB.

The remainder of this paper is organized as follows. In Section II, the AES algorithm is briefly described. Section III discusses our proposed AES architecture with several sub-unit optimizations. Section IV presents the implementation results and compares them with existing ones, followed by concluding remarks in Section V.

¹ The terms AES, Rijndael and AES-Rijndael are used with the same meaning throughout the paper.
1063-9667/08 $25.00 © 2008 IEEE 697

DOI 10.1109/VLSI.2008.82
II. AES-RIJNDAEL ALGORITHM

The AES is a symmetric block cipher [1]. It operates on 128-bit blocks of data, and the algorithm can encrypt and decrypt blocks using secret keys of 128, 192 or 256 bits. The Rijndael block cipher [4] was designed by Joan Daemen and Vincent Rijmen. It operates on various block sizes (128, 192 or 256 bits) in combination with different key lengths (128, 192 or 256 bits).

[Fig. 1 diagram: (a) encryption: Plaintext, AddRoundKey, then the round transformation ByteSub, ShiftRow, MixColumn, AddRoundKey applied 10/12/14 times, giving Ciphertext; (b) decryption: Ciphertext, then the round transformation InvShiftRow, InvByteSub, AddRoundKey, InvMixColumn applied 10/12/14 times, giving Plaintext.]
Fig. 1. Architecture of (a) Encryption and (b) Decryption datapath

Figure 1 outlines the basic structure (128-bit) of the algorithm. The round transformation consists of four different steps: ByteSub, ShiftRow, MixColumn and AddRoundKey. They are performed in this order, except in the final round, which is slightly different (it omits MixColumn). All transformations are based on byte-oriented operations; AddRoundKey consists of bitwise XOR operations. The transformations operate on an intermediate result called the State. The initial State is the input plaintext, and the final State, after the round transformations, is the output ciphertext. The State is organized as a 4×4 (128-bit block), 4×6 (192-bit block) or 4×8 (256-bit block) matrix of bytes. The round transformation scrambles the bytes of the State individually, row-wise or column-wise by applying the functions ByteSub, ShiftRow, MixColumn and AddRoundKey sequentially. The ByteSub² transformation is a non-linear byte substitution, also called the S-box; the S-box is invertible.

TABLE I
SHIFT OFFSETS FOR DIFFERENT BLOCKS
Data block (bits)   Row 1    Row 2    Row 3
128                 1 byte   2 bytes  3 bytes
192                 1 byte   2 bytes  3 bytes
256                 1 byte   3 bytes  4 bytes

The MixColumn transformation operates on each column of the State X individually. If $F_m(X)$ denotes the MixColumn transformation applied to the State X, then

$$F_m(X_{i,j}) = \left(m_0 X_{0,j} + m_1 X_{1,j} + m_2 X_{2,j} + m_3 X_{3,j}\right) \bmod (x^4 + 1),$$

where $(m_0, m_1, m_2, m_3)$ is a permutation of $(\{01\}_{16}, \{01\}_{16}, \{02\}_{16}, \{03\}_{16})$ and $X_{i,j}$ is the $(i, j)$th element of the State matrix. In matrix form, the MixColumn transformation can be expressed as

$$
\begin{pmatrix} X'_{0,j} \\ X'_{1,j} \\ X'_{2,j} \\ X'_{3,j} \end{pmatrix}
=
\begin{pmatrix}
\{02\}_{16} & \{03\}_{16} & \{01\}_{16} & \{01\}_{16} \\
\{01\}_{16} & \{02\}_{16} & \{03\}_{16} & \{01\}_{16} \\
\{01\}_{16} & \{01\}_{16} & \{02\}_{16} & \{03\}_{16} \\
\{03\}_{16} & \{01\}_{16} & \{01\}_{16} & \{02\}_{16}
\end{pmatrix}
\begin{pmatrix} X_{0,j} \\ X_{1,j} \\ X_{2,j} \\ X_{3,j} \end{pmatrix}
\tag{1}
$$

In the case of the inverse MixColumn (InvMixColumn), the same polynomial $x^4 + 1$ is used. If $F_m^{-1}(X)$ denotes the InvMixColumn transformation applied to the State X, then

$$F_m^{-1}(X_{i,j}) = \left(m_0^{-1} X_{0,j} + m_1^{-1} X_{1,j} + m_2^{-1} X_{2,j} + m_3^{-1} X_{3,j}\right) \bmod (x^4 + 1),$$

where $(m_0^{-1}, m_1^{-1}, m_2^{-1}, m_3^{-1})$ is a permutation of $(\{0E\}_{16}, \{0B\}_{16}, \{0D\}_{16}, \{09\}_{16})$ and $X_{i,j}$ is the $(i, j)$th element of the State matrix. In matrix form, the InvMixColumn transformation can be expressed as

$$
\begin{pmatrix} X'_{0,j} \\ X'_{1,j} \\ X'_{2,j} \\ X'_{3,j} \end{pmatrix}
=
\begin{pmatrix}
\{0E\}_{16} & \{0B\}_{16} & \{0D\}_{16} & \{09\}_{16} \\
\{09\}_{16} & \{0E\}_{16} & \{0B\}_{16} & \{0D\}_{16} \\
\{0D\}_{16} & \{09\}_{16} & \{0E\}_{16} & \{0B\}_{16} \\
\{0B\}_{16} & \{0D\}_{16} & \{09\}_{16} & \{0E\}_{16}
\end{pmatrix}
\begin{pmatrix} X_{0,j} \\ X_{1,j} \\ X_{2,j} \\ X_{3,j} \end{pmatrix}
\tag{2}
$$
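To make equations (1) and (2) concrete, the following minimal Python sketch multiplies one State column by both matrices in the ordinary polynomial representation of $GF(2^8)$ (reduction modulo $x^8 + x^4 + x^3 + x + 1$, i.e. the constant 0x11B), not in the composite-field representation used by the proposed hardware. It also evaluates the XTime-only forms that Section III-B derives from these matrices (equations (3) and (4)), so the two can be compared. All function and table names (`xtime`, `gf_mul`, `mat_column`, `mix_column_shared`, `inv_mix_column_shared`, `MIX`, `INV_MIX`) are hypothetical, chosen only for illustration.

```python
# Sketch: MixColumn / InvMixColumn on a single State column.
# Assumption: polynomial basis of GF(2^8) reduced by x^8 + x^4 + x^3 + x + 1
# (0x11B); the proposed hardware uses a composite-field basis instead.


def xtime(a: int) -> int:
    """XTime block: multiply a byte by {02} and reduce."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a: int, b: int) -> int:
    """General GF(2^8) product by shift-and-add (addition is XOR)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a, b = xtime(a), b >> 1
    return result


MIX = [[0x02, 0x03, 0x01, 0x01],      # matrix of equation (1)
       [0x01, 0x02, 0x03, 0x01],
       [0x01, 0x01, 0x02, 0x03],
       [0x03, 0x01, 0x01, 0x02]]
INV_MIX = [[0x0E, 0x0B, 0x0D, 0x09],  # matrix of equation (2)
           [0x09, 0x0E, 0x0B, 0x0D],
           [0x0D, 0x09, 0x0E, 0x0B],
           [0x0B, 0x0D, 0x09, 0x0E]]


def mat_column(matrix, col):
    """Matrix times column over GF(2^8)."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, x in zip(row, col):
            acc ^= gf_mul(coeff, x)
        out.append(acc)
    return out


def mix_column_shared(col):
    """Equation (3): only XTime and XOR are needed."""
    x0, x1, x2, x3 = col
    return [xtime(x0 ^ x1) ^ (x2 ^ x3) ^ x1,
            xtime(x1 ^ x2) ^ (x3 ^ x0) ^ x2,
            xtime(x2 ^ x3) ^ (x0 ^ x1) ^ x3,
            xtime(x3 ^ x0) ^ (x1 ^ x2) ^ x0]


def inv_mix_column_shared(col):
    """Equation (4): the MixColumn result plus an X4Time-based correction."""
    x0, x1, x2, x3 = col
    u = xtime(xtime(x0 ^ x2))   # X4Time: {04}(X0 + X2)
    v = xtime(xtime(x1 ^ x3))   # X4Time: {04}(X1 + X3)
    w = xtime(u ^ v)            # {02}({04}(X0 + X2) + {04}(X1 + X3))
    m = mix_column_shared(col)
    return [m[0] ^ w ^ u, m[1] ^ w ^ v, m[2] ^ w ^ u, m[3] ^ w ^ v]


if __name__ == "__main__":
    column = [0xDB, 0x13, 0x53, 0x45]
    print([hex(b) for b in mat_column(MIX, column)])  # ['0x8e', '0x4d', '0xa1', '0xbc']
    print(inv_mix_column_shared(mat_column(MIX, column)) == column)  # True
```

For instance, `mat_column(MIX, [0xDB, 0x13, 0x53, 0x45])` returns `[0x8E, 0x4D, 0xA1, 0xBC]`, and `mix_column_shared` produces the same column using only doubling and XOR, which is the property that lets a datapath replace general multipliers by XTime blocks.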
The S-box consists of the following two operations:
• Inversion in the field $GF(2^8)$, modulo the irreducible polynomial $m(x) = x^8 + x^4 + x^3 + x + 1$ (the zero element is mapped to itself).
• The affine transformation $Y = AX^{-1} + B$, where $A$ is a fixed $8 \times 8$ matrix and $B$ is an $8 \times 1$ vector. The matrix $A$ is defined in [4].

The inverse ByteSub, called InvByteSub (InvS-box), operates upon the bytes in the reverse order, that is, an inverse affine transformation followed by the inversion; the field polynomial is the same $m(x)$.

In ShiftRow, the rows of the State are cyclically shifted to the left by different offsets (Table I [4]); row 0 is not shifted. The inverse ShiftRow performs the circular shift in the opposite direction. ShiftRow implementations do not require FPGA resources, as they can be implemented by rewiring.

² The terms ByteSub(s), SubBytes(s) and S-box(es) carry the same meaning throughout this paper.

III. PROPOSED AES ARCHITECTURE

During encryption, the data are organized conceptually in a 4 × 8 matrix of bytes (Figure 2(a)). This organization is used for data block sizes of 256 bits. For smaller data block sizes (128 or 192 bits), the leftmost columns of the matrix are unused. The encryption data path processes a full 32-byte block in parallel. A complete round transformation executes in four clock cycles.

The decryption structure can be derived by directly inverting the encryption structure. However, the sequence of sub-unit operations then differs from that in encryption (see Fig. 1), which prevents resource sharing between the encryptor and the decryptor. To maximize resource sharing, the sub-unit operations must be rearranged.
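The two S-box steps listed in Section II (inversion modulo $m(x)$, then the affine map), and the reversed order used by the InvS-box, can be sketched in Python as below. This is an illustration under stated assumptions rather than the paper's circuit: it computes the inverse directly in $GF(2^8)$ as $a^{254}$ instead of through the composite field $GF(((2^2)^2)^2)$, and it writes the affine map in an equivalent rotate-and-XOR form of the FIPS-197 [1] definition with $B = \{63\}_{16}$. The function names are hypothetical. Note that `sbox` and `inv_sbox` call the same `gf_inv`, mirroring the shared inversion unit of the encryptor/decryptor core.

```python
# Sketch: ByteSub (S-box) and InvByteSub (InvS-box) from their definitions.
# Assumption: inversion computed directly in GF(2^8) as a^254, not through
# the composite field GF(((2^2)^2)^2) used by the proposed hardware.


def xtime(a: int) -> int:
    """Multiply by {02} modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a, b = xtime(a), b >> 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254 (square-and-multiply); 0 maps to 0."""
    result, exponent = 1, 254
    while exponent:
        if exponent & 1:
            result = gf_mul(result, a)
        a, exponent = gf_mul(a, a), exponent >> 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def affine(b: int) -> int:
    """Forward affine map Y = A*b + B with B = {63}."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def inv_affine(y: int) -> int:
    """Inverse of the affine map above."""
    return rotl8(y, 1) ^ rotl8(y, 3) ^ rotl8(y, 6) ^ 0x05


def sbox(x: int) -> int:
    """ByteSub: inversion first, then the affine transformation."""
    return affine(gf_inv(x))


def inv_sbox(y: int) -> int:
    """InvByteSub: inverse affine first, then the same inversion."""
    return gf_inv(inv_affine(y))


if __name__ == "__main__":
    print(hex(sbox(0x53)), hex(inv_sbox(0xED)))  # 0xed 0x53
```

Here `sbox(0x53)` evaluates to `0xED`, the worked example in FIPS-197, and `inv_sbox(0xED)` returns `0x53`.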
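The rearrangement mentioned above rests on two properties: InvByteSub commutes with InvShiftRow, because one changes byte values while the other only moves byte positions, and InvMixColumn is linear over $GF(2^8)$, so InvMixColumn(S ⊕ K) = InvMixColumn(S) ⊕ InvMixColumn(K). The Python sketch below checks both on random States of 4 × Nb bytes, using the Table I offsets for ShiftRow. It assumes a column-major State (`state[c][r]`), a left rotation for ShiftRow and an S-box built directly in $GF(2^8)$; the helper names, including `equivalences_hold`, are hypothetical.

```python
# Sketch: the two facts behind an equivalent decryption structure.
# Assumptions: column-major State state[c][r] with Nb columns, ShiftRow
# rotates rows to the left, and the S-box is built directly in GF(2^8).
import random


def xtime(a: int) -> int:
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a, b = xtime(a), b >> 1
    return result


def gf_inv(a: int) -> int:
    result, exponent = 1, 254          # a^254 = a^-1, and 0 maps to 0
    while exponent:
        if exponent & 1:
            result = gf_mul(result, a)
        a, exponent = gf_mul(a, a), exponent >> 1
    return result


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


INV_SBOX = [0] * 256                   # InvByteSub table, filled from the S-box
for _x in range(256):
    _y = gf_inv(_x)
    INV_SBOX[_y ^ rotl8(_y, 1) ^ rotl8(_y, 2) ^ rotl8(_y, 3) ^ rotl8(_y, 4) ^ 0x63] = _x

# Table I: left-rotation offsets (bytes) for rows 0..3, keyed by Nb.
SHIFT_OFFSETS = {4: (0, 1, 2, 3), 6: (0, 1, 2, 3), 8: (0, 1, 3, 4)}

INV_MIX = [[0x0E, 0x0B, 0x0D, 0x09],
           [0x09, 0x0E, 0x0B, 0x0D],
           [0x0D, 0x09, 0x0E, 0x0B],
           [0x0B, 0x0D, 0x09, 0x0E]]


def shift_rows(state, inverse=False):
    """ShiftRow (or InvShiftRow when inverse=True): pure byte rewiring."""
    nb = len(state)
    sign = -1 if inverse else 1
    return [[state[(c + sign * SHIFT_OFFSETS[nb][r]) % nb][r] for r in range(4)]
            for c in range(nb)]


def inv_byte_sub(state):
    return [[INV_SBOX[b] for b in col] for col in state]


def inv_mix_columns(state):
    return [[gf_mul(row[0], col[0]) ^ gf_mul(row[1], col[1])
             ^ gf_mul(row[2], col[2]) ^ gf_mul(row[3], col[3])
             for row in INV_MIX] for col in state]


def add_round_key(state, key):
    return [[s ^ k for s, k in zip(sc, kc)] for sc, kc in zip(state, key)]


def equivalences_hold(nb: int, seed: int = 0) -> bool:
    """ShiftRow round trip, InvByteSub/InvShiftRow swap, InvMixColumn linearity."""
    rng = random.Random(seed)
    s = [[rng.randrange(256) for _ in range(4)] for _ in range(nb)]
    k = [[rng.randrange(256) for _ in range(4)] for _ in range(nb)]
    round_trip = shift_rows(shift_rows(s), inverse=True) == s
    swap_ok = (inv_byte_sub(shift_rows(s, inverse=True))
               == shift_rows(inv_byte_sub(s), inverse=True))
    linear_ok = (inv_mix_columns(add_round_key(s, k))
                 == add_round_key(inv_mix_columns(s), inv_mix_columns(k)))
    return round_trip and swap_ok and linear_ok


if __name__ == "__main__":
    print([equivalences_hold(nb) for nb in (4, 6, 8)])  # [True, True, True]
```

In hardware terms, the linearity check is what allows InvMixColumn to be applied to the round key in advance instead of after AddRoundKey, so that the decryption round has the same step order as the encryption round.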
[Fig. 2 diagram, 256-bit datapaths: (a) MUX, REG1, ByteSub, REG2 (inner pipelining), ShiftRow, MixColumn, AddRoundKey with RoundKey, REG3, REG4, output Cipher Text under Last_round; (b) MUX, REG1, InvByteSub, REG2, InvShiftRow, InvMixColumn, AddRoundKey with MixRoundKey, REG3, REG4, output Plain Text under Last_round; control signals Data_enable and Addkey_enable.]
Fig. 2. (a) Architecture of Encryption Datapath, (b) Architecture of Decryption Datapath
It can be observed from the operations involved in the decryption transformations that the inverse ShiftRow (InvShiftRow) and the InvByteSub can be exchanged without affecting the decryption process, because InvByteSub substitutes each byte independently while InvShiftRow only changes byte positions. The InvMixColumn can be moved before the AddRoundKey, provided that InvMixColumn is also applied to the round keys before they are added; this works because InvMixColumn is linear, so InvMixColumn(X ⊕ K) = InvMixColumn(X) ⊕ InvMixColumn(K). Taking these into consideration, an equivalent decryption structure can be used (Figure 2(b)). In this figure, MixRoundKey is the modified round key obtained by applying InvMixColumn to the round key. The equivalent decryption structure has the same sequence of transformations as the encryption structure, and thus resource sharing between the encryptor and the decryptor is possible.

Figure 3 shows the block diagram of the Encryption/Decryption core integrated into a single chip. The operation, either encryption or decryption, is selected by a user-defined control signal called Mode. In Figure 3, REG1 and REG2 are used for the sub-pipelined S-box or InvS-box implementations, which are discussed in Section III-A. The hardware sharing of each sub-unit of the Encryption/Decryption core is shown in the respective sections.

[Fig. 3 diagram: 256-bit Plain Text/Cipher Text and Cipher Key/Round Key inputs; a single datapath in which Mode-controlled Muxes select ByteSub or InvByteSub, ShiftRow or InvShiftRow, MixColumn or InvMixColumn, and RoundKey or MixRoundKey; registers reg 1, reg 2, REG3 and REG4; control signals Data_Enable, Addkey_Enable and Last_Round; output Cipher Text/Plain Text.]
Fig. 3. Single Chip Encryptor/Decryptor Core Design

A. S-box/InvS-box Design

The major computation inside the S-box is finding the multiplicative inverse of an element in the finite field $GF(2^8)$. The Inverse S-box (InvS-box) operates upon the bytes in the reverse order, that is, an inverse affine transformation followed by the inversion. For the encryptor/decryptor core design, the S-box and the InvS-box can therefore share the multiplicative inversion unit. In our design, the Finding Inverse unit is the same as described in [21]. The S-box/InvS-box can be implemented according to the schematic illustrated in Figure 4. The S-box or InvS-box operation is selected by the user-specified control signal Mode.

[Fig. 4 diagram: 8-bit datapath containing the affine blocks (A)^-1 and A, two Mode-controlled Muxes, REG1, the Finding Inverse unit and REG2; output S-box or Inverse S-box.]
Fig. 4. Block diagram of the 2-stage sub-pipelined S-box. The block “Finding Inverse” is similar to the one described in [21].

A feature that can be exploited to gain higher throughput is pipelining, a technique that subdivides the critical path by inserting storage elements (flip-flops). Subdividing the S-box functionality into a number of stages is easy to accomplish, since flip-flops can be inserted nearly anywhere when S-boxes are implemented with combinational logic. Pipelining introduces latency, but the additional clock cycles are made up for by an increased clock frequency. To reduce the critical path delay, we insert pipeline registers (buffers) at appropriate points of the S-box such that the delay is equally distributed. We have implemented a 2-stage pipelined S-box (Figure 4).

B. Implementation of MixColumn/InvMixColumn Transformation

Various architectures have been proposed for the implementation of the MixColumn/InvMixColumn transformation [6],
[19]. The following observations are useful in the implementation [6]:
• If $x \in GF(((2^2)^2)^2)$, then $F_m(\{01\}) \times x = x$, since the identity element is mapped to the identity element by a homomorphism (here $F_m(\cdot)$ maps a constant into the composite field).
• $F_m(\{03\}) = F_m(\{02\}) + F_m(\{01\})$.

By applying these observations, equation (1) can be rewritten as

$$
\begin{aligned}
X'_{0,j} &= \{02\}_{16}(X_{0,j} + X_{1,j}) + (X_{2,j} + X_{3,j}) + X_{1,j} \\
X'_{1,j} &= \{02\}_{16}(X_{1,j} + X_{2,j}) + (X_{3,j} + X_{0,j}) + X_{2,j} \\
X'_{2,j} &= \{02\}_{16}(X_{2,j} + X_{3,j}) + (X_{0,j} + X_{1,j}) + X_{3,j} \\
X'_{3,j} &= \{02\}_{16}(X_{3,j} + X_{0,j}) + (X_{1,j} + X_{2,j}) + X_{0,j}
\end{aligned}
\tag{3}
$$

Similarly, in the InvMixColumn transformation, equation (2) can be rewritten as

$$
\begin{aligned}
X'_{0,j} &= \big(\{02\}_{16}(X_{0,j} + X_{1,j}) + (X_{2,j} + X_{3,j}) + X_{1,j}\big) \\
&\quad + \big(\{02\}_{16}(\{04\}_{16}(X_{0,j} + X_{2,j}) + \{04\}_{16}(X_{1,j} + X_{3,j})) + \{04\}_{16}(X_{0,j} + X_{2,j})\big) \\
X'_{1,j} &= \big(\{02\}_{16}(X_{1,j} + X_{2,j}) + (X_{3,j} + X_{0,j}) + X_{2,j}\big) \\
&\quad + \big(\{02\}_{16}(\{04\}_{16}(X_{0,j} + X_{2,j}) + \{04\}_{16}(X_{1,j} + X_{3,j})) + \{04\}_{16}(X_{1,j} + X_{3,j})\big) \\
X'_{2,j} &= \big(\{02\}_{16}(X_{2,j} + X_{3,j}) + (X_{0,j} + X_{1,j}) + X_{3,j}\big) \\
&\quad + \big(\{02\}_{16}(\{04\}_{16}(X_{0,j} + X_{2,j}) + \{04\}_{16}(X_{1,j} + X_{3,j})) + \{04\}_{16}(X_{0,j} + X_{2,j})\big) \\
X'_{3,j} &= \big(\{02\}_{16}(X_{3,j} + X_{0,j}) + (X_{1,j} + X_{2,j}) + X_{0,j}\big) \\
&\quad + \big(\{02\}_{16}(\{04\}_{16}(X_{0,j} + X_{2,j}) + \{04\}_{16}(X_{1,j} + X_{3,j})) + \{04\}_{16}(X_{1,j} + X_{3,j})\big)
\end{aligned}
\tag{4}
$$

[Fig. 5 diagram: inputs X0,j, X1,j, X2,j, X3,j; Part1 built from four XTime blocks; Part2 built from X4Time blocks; outputs X'0,j, X'1,j, X'2,j, X'3,j.]
Fig. 5. Block diagram of the MixColumn/InvMixColumn

Using substructure sharing, equations (3) and (4) can be implemented by the architecture illustrated in Fig. 5. The XTime block computes constant multiplication by $\{02\}_{16}$. The X4Time block computes constant multiplication by $\{04\}_{16}$ and can be implemented by two serially concatenated XTime blocks. For the encryption operation only Part1 of Fig. 5 is required, while the combination of Part1 and Part2 is used for the decryption operation.

C. Key Scheduler

The RoundKeys are derived from the Cipher Key by means of the key schedule, which consists of two components: the Key Expansion and the Round Key Selection. The basic principle is the following:
• The total number of Round Key bits is equal to the block length multiplied by the number of rounds plus 1. For example, for a block length of 128 bits and 10 rounds, 128 × 11 = 1408 Round Key bits are needed.
• The Cipher Key is expanded into an Expanded Key.
• RoundKeys are taken from this Expanded Key in the following way: the first Round Key consists of the first Nb words, where Nb denotes the length of the data block divided by 32, the second one of the following Nb words, and so on.

Algorithm 1 KeyExpansion(byte Key[4 ∗ Nk], word W[Nb ∗ (Nr + 1)])
  for i = 0 to Nk − 1 do
    W[i] = (Key[4*i], Key[4*i+1], Key[4*i+2], Key[4*i+3])
  end for
  for i = Nk to Nb ∗ (Nr + 1) − 1 do
    temp = W[i − 1]
    if (i mod Nk = 0) then
      temp = SubByte(RotByte(temp)) ⊕ Rcon[i/Nk]
    else if (Nk > 6 and i mod Nk = 4) then
      temp = SubByte(temp)
    end if
    W[i] = W[i − Nk] ⊕ temp
  end for

The Key Expansion algorithm is given by Algorithm 1. The Expanded Key is a linear array of 4-byte words denoted by W[Nb ∗ (Nr + 1)], where Nr is the number of rounds. The values of Nr are given in Table II [4]. The first Nk words contain the Cipher Key, where Nk denotes the length of the key divided by 32. All other words are defined recursively in terms of words with smaller indices. In this description, SubByte(W) is a function that returns a 4-byte word in which each byte is the result of applying the Rijndael S-box to the byte at the corresponding position in the input word. The function RotByte(W) returns a word in which the bytes are a cyclic permutation of those in its input, such that the input word (a, b, c, d) produces the output word (b, c, d, a).

The KeyExpan unit is shown in Figure 7. The W's are 32-bit shift registers, and R is a 256-bit register that stores the initial key. The architecture is almost the same as the KeyExpan unit described in [3]. The P/P^-1 block performs the S-box/InvS-box operation with maximum hardware sharing, as discussed in Section III-A.
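Algorithm 1 can be sketched in Python as follows, assuming 32-bit words with the first key byte in the most significant position and rebuilding the S-box from its $GF(2^8)$ definition so that the block is self-contained. The sketch uses the loop bound Nb ∗ (Nr + 1) − 1 and applies the extra SubByte only when Nk > 6, as in the Rijndael specification [4], and it derives Nr from Table II as max(Nk, Nb) + 6. Because each step W[i] = W[i − Nk] ⊕ temp can be solved for W[i − Nk], the recurrence can also be run backwards from the last Nk words; this is only an illustration of why a decryption-side key scheduler can regenerate earlier round keys, not the paper's T/T^-1 circuit. All function names are hypothetical.

```python
# Sketch of the Rijndael key expansion (Algorithm 1) plus a backward run.
# Assumptions: 32-bit words, first key byte most significant; S-box rebuilt
# from its GF(2^8) definition so this block stands alone.


def xtime(a: int) -> int:
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a, b = xtime(a), b >> 1
    return result


def gf_inv(a: int) -> int:
    result, exponent = 1, 254          # a^254 = a^-1, and 0 maps to 0
    while exponent:
        if exponent & 1:
            result = gf_mul(result, a)
        a, exponent = gf_mul(a, a), exponent >> 1
    return result


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


SBOX = [y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63
        for y in (gf_inv(x) for x in range(256))]


def num_rounds(nk: int, nb: int) -> int:
    """Nr as in Table II."""
    return max(nk, nb) + 6


def sub_word(w: int) -> int:
    """SubByte: apply the S-box to each byte of a word."""
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def rot_word(w: int) -> int:
    """RotByte: (a, b, c, d) -> (b, c, d, a)."""
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def rcon(j: int) -> int:
    """Rcon[j] = ({02}^(j-1), {00}, {00}, {00}) for j >= 1."""
    c = 1
    for _ in range(j - 1):
        c = xtime(c)
    return c << 24


def temp_for(i: int, prev: int, nk: int) -> int:
    """The 'temp' value of Algorithm 1 for index i, given W[i-1]."""
    if i % nk == 0:
        return sub_word(rot_word(prev)) ^ rcon(i // nk)
    if nk > 6 and i % nk == 4:
        return sub_word(prev)
    return prev


def expand_key(key: bytes, nb: int) -> list:
    """Forward expansion: W[0 .. Nb*(Nr+1) - 1]."""
    nk = len(key) // 4
    total = nb * (num_rounds(nk, nb) + 1)
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    for i in range(nk, total):
        w.append(w[i - nk] ^ temp_for(i, w[i - 1], nk))
    return w


def expand_key_backward(last_words: list, total: int) -> list:
    """Rebuild W[0 .. total-1] from the last Nk words: W[i-Nk] = W[i] ^ temp."""
    nk = len(last_words)
    w = [0] * (total - nk) + list(last_words)
    for i in range(total - 1, nk - 1, -1):
        w[i - nk] = w[i] ^ temp_for(i, w[i - 1], nk)
    return w


if __name__ == "__main__":
    w = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"), nb=4)
    print([f"{x:08x}" for x in w[40:44]])  # ['d014f9a8', 'c9ee2589', 'e13f0cc8', 'b6630ca6']
```

For the FIPS-197 example key used above, W[40], ..., W[43] is the round key that decryption needs first, and `expand_key_backward(w[40:44], 44)` regenerates W[39], ..., W[0] from those four words alone.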
TABLE II
NUMBERS OF ROUNDS (Nr) AS A FUNCTION OF THE BLOCK AND KEY LENGTH
Nr        Nb = 4   Nb = 6   Nb = 8
Nk = 4    10       12       14
Nk = 6    12       12       14
Nk = 8    14       14       14

[Fig. 6 diagram: a Mux driven by select chooses between the 256-bit initial key and the stored last round key; KeyExpan Unit; Buffer loaded under Last_Round; 32-bit shift registers W[Nk], W[Nk+1], ..., W[Nk+7].]
Fig. 6. The KeyScheduler

[Fig. 7 diagram: 256-bit register R (Data_enable); 32-bit shift registers W[Nk−1], W[Nk−4], W[Nk−6], W[Nk−8]; multiplexers M1 (2:1), M2 (3:1), M3 (3:1) and M4 (8:1) with a Counter; T/T^-1 blocks; Rcon Gen; Initial_key input; output W[Nk].]
Fig. 7. The KeyExpan Unit

For the decryption operation, the KeyScheduler must generate the last RoundKey first, starting from the initial cipher key. This last RoundKey may be the round key of round 10, 12 or 14. For example, in the case of 128-bit decryption, the first round key used consists of the words W[43], W[42], W[41] and W[40]. These words are stored and used to generate all remaining round keys, i.e., W[39], ..., W[0]. In our design we store the round keys W[43], ..., W[40] (in the case of a 128-bit block) in the Buffer shown in Figure 6. This Buffer updates its value when the Last_round signal is set to 1. The Mux (Figure 6) selects either the 256-bit initial key or the last round keys depending on the control signal select, which is set to 1 when Last_round is set to 1 in decryption mode. Note that the status (set or reset) of the Last_round signal depends on another, user-specified signal, Mode. All the other control signals (such as Last_round, Data_enable and Key_enable) are generated by a control unit. The P and P^-1 blocks are the same as the S-box and the InvS-box, respectively.

The architecture takes Nb clock cycles to generate a single round key. A maximum of 8 × 14 + 8 = 120 clock cycles is required to generate the complete set of round keys for 256-bit data and key. In the case of decryption, an extra Nb × Nr clock cycles are needed for the operation on the initial data input; from the next data input onwards, decryption requires the same number of clock cycles as encryption.

IV. EXPERIMENTAL RESULTS

The proposed design has been implemented on a Xilinx XCV1000e-8bg560 device and simulated with ModelSim 8.1i. The performance (throughput and frequency) is shown in Table III. Throughput (τ) is calculated as

$$\tau = \frac{\beta \times f}{\psi},$$

where β, f and ψ stand for the block length, the clock frequency and the number of clock cycles, respectively. Latency is defined as the time required to encrypt a single block of data; it can be measured in terms of total clock cycles. In our approach Nb × Nr clock cycles are needed to generate a block of ciphertext, so

$$\tau = \frac{N_b \times 32 \times f}{N_b \times N_r} = \frac{32 \times f}{N_r}.$$

For example, on the FPGA with a clock frequency of 135 MHz, τ = 32 × 135/10 = 432 Mb/s when Nr is 10 (128-bit block and key), and τ = 32 × 135/14 ≈ 308.6 Mb/s for a 256-bit block length, where Nr is 14.

TABLE III
THROUGHPUT IN FPGA (XCV1000E-8)
Clock frequency = 135 MHz; results obtained using the 2-stage sub-pipelined S-box
Throughput (Mb/s)   Nb = 4   Nb = 6   Nb = 8
Nk = 4              432      360      308.6
Nk = 6              360      360      308.6
Nk = 8              308.6    308.6    308.6

A. Comparison with Other Designs

The simulated or synthesized results for a particular design may vary if the targeted device or technology changes. Different modes of operation (CBC, OFB, ECB, etc.) or different choices of operation (such as encryption/decryption, or different block or key lengths) may lead to different throughput and hardware overheads for a particular design on a particular device. It is therefore quite difficult to compare the existing designs fairly with our suggested design. Still, to make the comparison as fair as possible, we have explicitly listed the targeted devices, the modes of operation and the choices of operation of the existing designs along with our suggested design.

In Table IV we present a comparative analysis of different existing AES architectures along with our suggested one. The symbol (∗) denotes Mb/s per slice. The implementation by Zhang et al. [19] (Table IV) appears to be the best in terms of throughput/slice. However, it does not support all combinations of key and block lengths. As it is a pipelined architecture, supporting 256-bit AES would require about 40 percent more hardware (there are 14 rounds for a 256-bit block length instead of 10). Its throughput/area ratio would then become approximately 0.68 (0.95 × 100/140), keeping the throughput the same.

TABLE IV
PERFORMANCES OF COMPARED CORES IN FPGA
τ stands for throughput. In the Mode column, E, C, O and Cf stand for electronic code book, cipher block chaining, output feedback and cipher feedback, respectively. Area is reported in slices and RAM; data and key lengths are in bits, where "all" denotes 128, 192 and 256.
Design               Device       Slices  RAM  τ (Mb/s)  τ/Slice (∗)  E/D  Data length  Key length  Mode
[9]                  Spartan II   606     3    166       0.27         E    128          128         C,O
[10]                 Virtex-E     13416   0    3136      0.24         E    128          128         E
[16]                 Spartan II   264     2    20.2      0.1          E    128          128         C,O,Cf
[12]                 Virtex-E     4389    4    1019      0.27         E    all          all         C,O
[19]                 XCV1000e-8   11022   0    21556     0.95         all  128          128         C,O
[15]                 XCV2000e     20300   100  6810      0.79         all  128          128         C,O
[20]                 XCV4000      1780    0    1000      0.56         all  128          128         C,O
[17]                 XC3S50-4     547     0    208       0.38         all  128          128         C,O
[3]                  XCV1000      520     0    120.74    0.23         E    all          all         C,O
[21]                 SXC3S5000    1760    0    600       0.34         E    all          all         C,O
Our (with sub-pipelined S-box):
0-Stage              XCV1000e-8   480     0    320       0.67         E    all          all         C,O
2-Stage              XCV1000e-8   510     0    432       0.84         E    all          all         C,O
Encryptor/Decryptor  XCV1000e-8   622     0    432       0.70         all  all          all         C,O

V. CONCLUSIONS

In this paper we have presented a single chip encryptor/decryptor for the reconfigurable AES algorithm. The design
exploits the theory of composite field arithmetic $GF(((2^2)^2)^2)$ to compute all nonlinear operations of the S-boxes and thus optimizes the hardware complexity. It reutilizes precomputed blocks, and the same hardware is shared in encryption and decryption as much as possible. After an exhaustive survey of the literature, we have found that this is the first single chip encryptor/decryptor core implementation of AES-Rijndael that can work under any possible (128, 192 and 256-bit) key or data bit frames.

Acknowledgement

I would like to give special thanks to Dr. Debdeep Mukhopadhyay, Assistant Professor at IIT Madras, for being a constant source of inspiration and for his suggestions behind this work.

REFERENCES

[1] National Institute of Standards and Technology (NIST), FIPS-197: Advanced Encryption Standard, November 2001. http://www.itl.nist.gov/fipspubs/
[2] G. Bertoni et al., “Efficient Software Implementation of AES on 32-Bit Platforms,” in Cryptographic Hardware and Embedded Systems (CHES 2002), Revised Papers, LNCS Vol. 2523, pp. 159-171.
[3] M. Alam, S. Ray, D. Mukhopadhyay, S. Ghosh, D. RoyChowdhury and I. Sengupta, “An Area Optimized Reconfigurable Encryptor for AES-Rijndael,” in Proc. Design, Automation and Test in Europe (DATE 2007), pp. 1116-1121, April 16-21, Nice, France.
[4] J. Daemen and V. Rijmen, The Design of Rijndael: AES - The Advanced Encryption Standard, Springer-Verlag, Berlin, New York, 2002.
[5] D. Mukhopadhyay and D. RoyChowdhury, “An Efficient End to End Design of Rijndael Cryptosystem in 0.18µ CMOS,” in The 18th International Conference on VLSI Design and The 4th International Conference on Embedded Systems (VLSID 2005), pp. 405-410, Kolkata.
[6] A. Rudra, P. K. Dubey, C. S. Jutla, V. Kumar, J. R. Rao and P. Rohatgi, “Efficient Rijndael Encryption Implementation with Composite Field Arithmetic,” in Cryptographic Hardware and Embedded Systems (CHES 2001), LNCS Vol. 2162, pp. 171-184.
[7] M. Feldhofer, J. Wolkerstorfer and V. Rijmen, “AES Implementation on a Grain of Sand,” IEE Proceedings on Information Security, July 2005.
[8] A. Satoh, S. Morioka, K. Takano and S. Munetoh, “A Compact Rijndael Hardware Architecture with S-Box Optimization,” in Advances in Cryptology - ASIACRYPT 2001, LNCS Vol. 2248, pp. 239-254.
[9] P. Chodowiec and K. Gaj, “Very Compact FPGA Implementation of the AES Algorithm,” in Cryptographic Hardware and Embedded Systems (CHES 2003), LNCS Vol. 2779, pp. 319-333.
[10] N. Saqib, F. Henriquez and A. Perez, “Two Approaches for a Single-Chip FPGA Implementation of an Encryptor/Decryptor AES Core,” in Cryptographic Hardware and Embedded Systems (CHES 2005), LNCS Vol. 2779, pp. 319-333.
[11] R. Sever, A. Neslin, Y. Tekmen and M. Asker, “A High Speed ASIC Implementation of the Rijndael Algorithm,” in International Symposium on Circuits and Systems (ISCAS 2004), IEEE Vol. 2, pp. 541-544.
[12] R. Sever, A. Neslin, Y. Tekmen, M. Asker and B. Okcan, “A High Speed FPGA Implementation of the Rijndael Algorithm,” in Proc. EUROMICRO Systems on Digital System Design (DSD 2004), IEEE Vol. 2, pp. 541-554.
[13] S. Mangard, M. Aigner and S. Dominikus, “A Highly Regular and Scalable AES Hardware Architecture,” IEEE Trans. Comput., Vol. 52, No. 4, pp. 483-491, 2003.
[14] N. Pramstaller, S. Mangard, S. Dominikus and J. Wolkerstorfer, “Efficient AES Implementations on ASICs and FPGAs,” in Fourth Workshop on the Advanced Encryption Standard (AES 2004), LNCS Vol. 3373, pp. 98-112.
[15] G. Saggese, A. Mazzeo, N. Mazzocca and A. Strollo, “An FPGA Based Performance Analysis of the Unrolling, Tiling and Pipelining of the AES Algorithm,” in Field Programmable Logic (FPL 2003), pp. 292-302, Portugal, 2003.
[16] T. Good and M. Benaissa, “AES on FPGA from the Fastest to the Smallest,” in Cryptographic Hardware and Embedded Systems (CHES 2005), LNCS Vol. 3659, pp. 427-440, Springer, 2005.
[17] N. Pramstaller and J. Wolkerstorfer, “A Universal and Efficient AES Co-processor for Field Programmable Logic Arrays,” in Field Programmable Logic (FPL 2004), LNCS Vol. 3203, pp. 565-574, Springer, 2004.
[18] I. Verbauwhede, P. Schaumont and H. Kuo, “Design and Performance Testing of a 2.29-GB/s Rijndael Processor,” IEEE Journal of Solid-State Circuits, Vol. 38, No. 3, pp. 569-572, March 2003.
[19] X. Zhang and K. Parhi, “High-Speed VLSI Architectures for the AES Algorithm,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 3, pp. 957-967, 2004.
[20] J. Zambreno, D. Nguyen and A. Choudhary, “Exploring Area/Delay Tradeoffs in an AES FPGA Implementation,” in Field Programmable Logic (FPL 2004), LNCS Vol. 3203, pp. 575-585.
[21] M. Alam, S. Ghosh, D. Mukhopadhyay, D. RoyChowdhury and I. Sengupta, “Latency Optimized AES-Rijndael with Flexible Mode of Operation,” in 11th IEEE VLSI Design and Test Symposium (VDAT 2007), pp. 413-420, August 8-11, Kolkata, India.