

1.1 Introduction
In this research work, we propose the design and implementation of a real-time FPGA-based
application that demonstrates the creation of real-time process tasks in FPGA systems for
real-time communication between multiple FPGA systems. We have chosen the RSA-based
encryption and decryption algorithm for this implementation, as security is one of the most
important needs in data communication. The recent development of Field-Programmable Gate
Array (FPGA) architectures, with soft-core (MicroBlaze) and hard-core (PowerPC) processors,
embedded memories and IP cores, offers the potential for high computing power. FPGAs are
presently considered a major platform for high-performance embedded applications, as they
provide the opportunity for reconfiguration as well as good clock speed and design resources.
As the complexity of embedded applications increases, the use of an operating system
brings many advantages. Most present-day embedded systems have real-time requirements that
demand the use of a real-time operating system (RTOS), which creates a suitable environment
in which real-time applications can be designed and expanded easily. In an RTOS the design
process is simplified by splitting the application code into separate tasks; the scheduler then
executes them according to a specific schedule, meeting the real-time deadlines. We first
demonstrate the real-time execution of multiple process tasks in a single FPGA system for the
encryption and decryption of data. Next we describe the most challenging part of our work,
where we establish real-time communication between two FPGA systems, one running the
encryption engine and the other the decryption engine, communicating with one another via an
RS232 communication link. The results show that our design is better in terms of execution
speed than existing research works.

The design achieves real-time secured information exchange between systems implemented
in multiple FPGAs by using an RTOS (real-time operating system). This information sharing is
based on the RSA algorithm (encryption and decryption). Very-large-scale integration (VLSI)
is a recent trend in design, and network security plays a vital role in VLSI techniques.
Keywords: FPGA, logic circuits, operating systems (computers), MicroBlaze, FPGA
architectures, embedded memory, multiple FPGA systems, soft-core processors.
The existing system is designed using a microcontroller with an RTOS, so its operating
speed is lower than that of an FPGA, and the information sent between the systems is not
secured. The proposed system is based on the RSA algorithm (encryption and decryption). The
process is communicated between multiple FPGAs with multitasking under an RTOS, with a
higher execution speed than the existing system.
To demonstrate a 128-bit Advanced Encryption Standard (AES) symmetric-key encryption
and decryption algorithm, suitable hardware and software designs were developed on a
Xilinx Spartan-3 EDK (XC3S200) device, and the implementation has been tested successfully.
The system is optimized in terms of execution speed and hardware utilization. Its
applications include security, the medical field, network security and online banking
security. Similar approaches can be developed for other AES implementations; for example,
double AES can be implemented for more security at the cost of lower encryption speed.
In today's world most communication is done using electronic media. Data security
plays a vital role in such communication; hence, there is a need to protect data from
malicious attacks. Cryptography is the science of secret codes, enabling the confidentiality of
communication through an insecure channel. It protects against unauthorized parties by
preventing unauthorized alteration or use. Generally speaking, it uses a cryptographic system
to transform a plaintext into a ciphertext, most of the time using a key.

1.2 Aim of the project

The aim of this project is to communicate data secretly using the AES algorithm. The
data (plaintext), which is 128 bits long, and the key, which can be 128, 192 or 256 bits long,
are first fed into the encryption process; the output of this process is the ciphertext. The
ciphertext is then fed into the decryption process, which returns the original data
(plaintext). Because the key is added and the data are shuffled, it is very hard for an
unknown person to find the original data. Each key produces a different ciphertext, so a
person has to know the key in order to recover the original data.
This project aims to provide security for the data transmitted and received by the
sender and the receiver. A hacker may attack the data at the moment it is encrypted or
decrypted. To avoid these types of problems, security algorithms such as AES, SHA-0, SHA-1,
SHA-2 and RSA are used. To achieve this aim, the following tasks are carried out.
To achieve real-time secured information exchange between the systems implemented in
multiple FPGAs by using an RTOS (real-time operating system).
This information sharing is based on the RSA algorithm (encryption and decryption).
The objective is to develop a design that is low power, secure, fast and less expensive.
SystemC and synthesis results are generated for observation on the Xilinx platform.
Each block of the design is written as a SystemC module.
The code is downloaded to an FPGA kit in order to observe the output. Because the kit
accepts only 8 bits, the output is observed on the red LEDs, which operate only in active-high mode.

1.3 Motivation of the project

Message authentication codes (MACs) are much like cryptographic hash functions,
except that a secret key is used to authenticate the hash value.
The keys are the public key and the private key. Applications of cryptography include ATM
cards, computer applications and electronic commerce.
Cryptography is the study of hiding information. Modern cryptography intersects the
disciplines of mathematics, computer science and engineering.
Security often requires that data be kept safe from unauthorized access, and the best
line of defense is physical security (placing the machine to be protected behind
physical walls).

However, physical security is not always an option (due to cost and/or efficiency).

Instead, most computers are openly interconnected with each other, thereby exposing
them and the communication channels that they use.

1.4 Literature Survey

For real-time applications, several factors (time, cost, power) are moving security
considerations from a function-centric perspective to a system architecture
(hardware/software) design issue. The Advanced Encryption Standard (AES) is nowadays used
extensively in many network and multimedia applications to address security issues. The
AES algorithm specifies three key sizes, 128, 192 and 256 bits, offering different levels of
security. To deal with the number of applications and the intensive computation required by
security mechanisms, we define and develop a QoSS (Quality of Security Service) model for a
reconfigurable AES processor. QoSS has been designed and implemented to achieve a
flexible trade-off between the overheads caused by security services and system performance.
The proposed architecture can provide up to 12 AES block cipher schemes within a
reasonable hardware cost. We envisage a security vector in a fully functional QoSS request
that includes levels of service for the range of security services and mechanisms. Our unified
hardware can run both the original AES algorithm and the extended AES algorithm (QoSS
AES). A novel on-the-fly AES encryption/decryption design is also proposed for 128-, 192-
and 256-bit keys.

1.5. Design and Implementation

Hardware implementation work mainly deals with the implementation of the AES algorithm
on a single-chip FPGA using a pipelined approach, the area-throughput trade-off for an ASIC
implementation in 0.18 µm CMOS technology, crypto-memory and SRAM architectures,
high-speed non-pipelined FPGA designs, a fully sub-pipelined encryptor that achieves a
throughput of 21.56 Gbps on a Xilinx device, and a prototype chip implemented in 0.35 µm CMOS.
Software implementation work deals with fast implementation of the algorithm on smart
cards, secure PDA communication with Java, optimum construction of composite fields for the
AES, evaluation of different implementations for high-end servers, implementation
approaches for the AES algorithm in C, C++ and MATLAB, and a security protocol for
automobile remote keyless systems.

The algorithm is composed of three main parts: Cipher, Inverse Cipher and Key
Expansion. Cipher converts data, commonly known as plaintext, to an unintelligible form
called ciphertext. Key Expansion generates a key schedule that is used in the Cipher and the
Inverse Cipher procedures. Cipher and Inverse Cipher are composed of a specific number of
rounds. For the AES algorithm, the number of rounds to be performed during the execution of
the algorithm depends on the key length. AES operates on a 4x4 array of bytes (referred
to as the state). The algorithm consists of four different simple operations. These operations
are:

Sub Bytes
Shift Rows
Mix Columns
Add Round Key

The encryption and decryption processes consist of a number of different
transformations applied consecutively over the data block bits, in a fixed number of
iterations, called rounds. The number of rounds depends on the length of the key used for the
encryption process. AES is a block cipher with a block size of 128 bits organized as a 4x4
byte matrix (the State), a key size of 128, 192 or 256 bits, and 10, 12 or 14 similar rounds
built from the 4 round operations.

1.6. Application of the project

This section presents the experimental results of the application, carried out to
evaluate the performance of the QoSS AES processor in the case of an MPEG-4 decoder. A
comparative study has been done between the proposed QoSS AES processor and
conventional video encryption schemes (Sub band Shuffle, Block Shuffle). The results
demonstrate that the QoSS AES processor is well suited to provide high security with very
low latency.
The application works in the following way:
1. The user opens the application and authenticates using a pattern lock.
2. The user can either type a new message or reply to an existing message.
3. If a new message is selected, the user enters the message and presses the encrypt button
after inserting the recipient's name. The user has to enter a cipher key before the message is
sent. The cipher key is auto-generated if the user does not enter one.
4. If the user chooses to reply to an existing message, he first decrypts the message by long
pressing it and then types in the reply. The user is asked to enter a cipher key before
the message is sent.
5. Once the cipher key is entered, the message is sent and is shown in encrypted
form in the thread.
6. All messages in the thread are displayed in encrypted form to both sender and receiver.
7. Long pressing the thread will pop up an action box in which the user can delete, view
contact details or call the recipient.
8. Long pressing any message in the thread will pop up an action box in which the user can
delete, forward or decrypt the message.
9. The cipher key is randomly generated if the user does not enter it.
10. Various settings such as notification settings, display settings, encryption settings, tone
settings and personalization settings are available for the user's convenience.
11. This application is developed on the Android platform. Like other operating systems for
mobile devices, Android supports connectivity, messaging, language support, media support,
Bluetooth etc. The main features of Android are its open source technology and Java support.
It also supports multitasking, multi-touch, Wi-Fi, tethering, 3G services and, very
importantly, security and privacy.

1.7 Organization of the thesis

The complete dissertation work is divided into seven chapters.
The second chapter deals with the description of the project and ends with a conclusion.
The third chapter presents the design analysis and ends with a conclusion.
The fourth chapter deals with the hardware implementation of the project and ends with a
conclusion.
The fifth chapter presents the mathematical analysis of the project and ends with a conclusion.
The sixth chapter presents the output verification of the project and ends with a conclusion.
The seventh chapter deals with the conclusion of the project, followed by the future scope.
After the seven chapters, the thesis lists the references required for the design
and implementation of the project.
Finally, the thesis closes with the appendices, which contain the code.

2.1 Introduction

The secret key is assumed to be safe and known only to the two communicating
parties, namely the Sender and the Receiver. If we further assume that
the data communication is duplex, then each side needs its own FPGA-based AES
processor for encryption and decryption. The process of data communication from the Sender
to the Receiver is as follows:
1. The Sender configures his FPGA processor with the Encryption Module, the known
Plain-text and the Cipher-key using the configuration tool (say SANDS Software v1.1)
and encrypts the Plain-text to obtain the Cipher-text.
2. The output buffer then collects the Cipher-text and sends it over the communication
channel. Every client in between can see the Cipher-text, but none other than the Receiver,
who has the Secret-key, can make use of it.
3. The Receiver, having configured his FPGA processor with the Decryption Module,
can then decrypt the Cipher-text with the Inverse Cipher to obtain the original Plain-text.
Alternatively, if the current Receiver wants to send sensitive data to the initial
Sender, the above process repeats with the roles of the Sender and the Receiver
interchanged. Thus the aim of the project, the FPGA implementation of secure data
communication using the AES algorithm, can be realized in practice by employing two FPGA
processors, one at each side of the data transfer, on the condition that both parties know
the Cipher-key used.

Fig 2.1. General Block diagram of AES

2.2. Preface

The following document provides a detailed and easy to understand explanation of the
implementation of the AES (RIJNDAEL) encryption algorithm. The purpose of this paper is
to give developers with little or no knowledge of cryptography the ability to implement AES.

2.3. Terminology
There are terms that are frequently used throughout this paper that need to be defined.
Block: AES is a block cipher. This means that the number of bytes that it encrypts is fixed.
AES can currently encrypt blocks of 16 bytes at a time; no other block sizes are presently a
part of the AES standard. If the data being encrypted is larger than the specified block then
AES is executed repeatedly, once per block. This also means that AES has to encrypt a
minimum of 16 bytes. If the plain text is smaller than 16 bytes then it must be padded. Simply
said, the block is a reference to the bytes that are processed by the algorithm.
State: Defines the current condition (state) of the block, that is, the block of bytes that is
currently being worked on. The state starts off being equal to the block; however, it changes
as each round of the algorithm executes. Plainly said, this is the block in progress.
XOR: Refers to the bitwise operator Exclusive Or. XOR operates on the individual bits in a
byte in the following way:
0 XOR 0 = 0
1 XOR 0 = 1
1 XOR 1 = 0
0 XOR 1 = 1
For example, the Hex digits D4 XOR FF:
    11010100
XOR 11111111
  = 00101011 (Hex 2B)
Another interesting property of the XOR operator is that it is reversible,
so Hex 2B XOR FF = D4.
Table 2.1: Most programming languages have the XOR operator built in.
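
The XOR example above can be checked with a few lines of Python. This is only an illustrative sketch; the function name xor_bytes is an assumption made for this example and does not come from the project code.

```python
def xor_bytes(a: int, b: int) -> int:
    """Bitwise XOR of two byte values (0..255)."""
    return (a ^ b) & 0xFF

result = xor_bytes(0xD4, 0xFF)
print(f"{result:08b} (hex {result:02X})")   # binary and hex form of D4 XOR FF

# XOR is reversible: applying the same value again restores the input.
restored = xor_bytes(result, 0xFF)
print(f"hex {restored:02X}")
```

Because XOR with the same value undoes itself, the Add Round Key step described later can use the same operation for both encryption and decryption.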

HEX: Defines a notation of numbers in base 16. This simply means that the highest number
that can be represented in a single digit is 15, rather than the usual 9 in the decimal
(base 10) system.
Table 2.2: Hex to decimal table

For example, using the above table, HEX D4 = DEC 212. All of the tables and
examples in this paper are written in HEX. The reason for this is that a single digit of HEX
represents exactly 4 bits. This means that a single byte can always be represented by 2 HEX
digits. This also makes it very useful in creating lookup tables where each HEX digit can
represent a table index.
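
As a small illustration of the two points above (hex to decimal conversion, and using the two hex digits of a byte as lookup-table indices), consider the following sketch. The helper names are hypothetical and chosen only for this example.

```python
def hex_to_dec(text: str) -> int:
    """Convert a hex string such as 'D4' to its decimal value."""
    return int(text, 16)

def table_indices(byte: int) -> tuple:
    """Split a byte into its two hex digits (high digit, low digit).

    In a 16x16 lookup table the high digit can select the row
    and the low digit the column.
    """
    return byte >> 4, byte & 0x0F

print(hex_to_dec("D4"))          # decimal value of hex D4
print(table_indices(0xD4))       # row and column indices for D4
```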

2.4. AES Brief History

Effective May 26, 2002, the National Institute of Standards and Technology (NIST)
selected a block cipher called RIJNDAEL (named after its creators Vincent Rijmen and Joan
Daemen) as the symmetric key encryption algorithm to be used to encrypt sensitive but
unclassified American federal information. RIJNDAEL was originally a variable block (16,
24, 32 bytes) and variable key size (16, 24, 32 bytes) encryption algorithm. NIST has,
however, decided to define AES with a block size of 16 bytes while keeping its options
open to future changes.

2.5. AES Algorithm

AES is an iterated symmetric block cipher, which means that:
o AES works by repeating the same defined steps multiple times.
o AES is a secret key encryption algorithm.
o AES operates on a fixed number of bytes.

Fig 2.1: Advanced Encryption Algorithm flow

AES, like most encryption algorithms, is reversible. This means that almost the
same steps are performed to complete both encryption and decryption, in reverse order. The
AES algorithm operates on bytes, which makes it simpler to implement and explain. AES uses
a single secret key, which is expanded into individual sub keys, one sub key for each
operation round. This process is called KEY EXPANSION and is described in Section 2.11.
For both its Cipher and Inverse Cipher, the AES algorithm uses a round function that
is composed of four different byte-oriented transformations: 1) byte substitution using a
substitution table (S-box), 2) shifting rows of the State array by different offsets, 3) mixing
the data within each column of the State array, and 4) adding a Round Key to the State. As
mentioned before, AES is an iterated block cipher. All that means is that the same operations
are performed many times on a fixed number of bytes.

These operations can easily be broken down into the following functions:
Add Round Key
Byte Sub
Shift Row
Mix Column
An iteration of the above steps is called a round. The number of rounds of the
algorithm depends on the key size.

Table 2.3: Key size and number of rounds

Key size (bytes)    Block size (bytes)    Rounds
16                  16                    10
24                  16                    12
32                  16                    14

The only exception is that the Mix Column step is not performed in the last round,
to make the algorithm reversible during decryption.

2.6. Encryption and Decryption

Data that can be read and understood without any special measures is called plaintext
or clear text. The method of disguising plaintext in such a way as to hide its substance is
called encryption. Encrypting plaintext results in unreadable gibberish called cipher text. You
use encryption to ensure that information is hidden from anyone for whom it is not intended,
even those who can see the encrypted data. The process of reverting cipher text to its original
plaintext is called decryption.

2.7. Encryption
Table 2.4 :AES encryption cipher using a 32 byte key.



Add Round Key(State)

Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
Ad Round Key(Mix Column(Shift Row(Byte Sub(State))))
Ad Round Key(Shift Row(Byte Sub(State)))
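
The round structure listed in Table 2.4 can be generated programmatically. The sketch below only produces the order of the operations, not the cipher itself; the function name encryption_steps and the step labels are illustrative assumptions, and the round count is taken as (key size in bytes / 4) + 6, which gives 10, 12 and 14 rounds for 16, 24 and 32 byte keys.

```python
def encryption_steps(key_len_bytes: int) -> list:
    """Return the ordered list of operations for each step of AES encryption.

    The first entry is the initial Add Round Key; every later entry
    is one round. The last round omits Mix Column.
    """
    if key_len_bytes not in (16, 24, 32):
        raise ValueError("AES keys are 16, 24 or 32 bytes long")
    rounds = key_len_bytes // 4 + 6
    steps = [["AddRoundKey"]]                       # initial key addition
    for _ in range(rounds - 1):                     # full rounds
        steps.append(["ByteSub", "ShiftRow", "MixColumn", "AddRoundKey"])
    steps.append(["ByteSub", "ShiftRow", "AddRoundKey"])   # final round
    return steps

for number, step in enumerate(encryption_steps(32)):
    print(number, " -> ".join(step))
```

For a 32 byte key this prints 15 lines, matching the 15 rows of the table: one initial key addition, 13 full rounds and one final round without Mix Column.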

2.8. Decryption
Table 2.5: AES decryption cipher using a 32 byte key.

Add Round Key(State)
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
Add Round Key(Byte Sub(Shift Row(State)))

2.9. AES Cipher Functions

2.9.1. Add Round Key
Each of the 16 bytes of the state is XORed against the corresponding byte of a 16 byte
portion of the expanded key for the current round. The expanded key bytes are never reused.
So once the first 16 bytes of the state are XORed against the first 16 bytes of the expanded
key, expanded key bytes 1-16 are never used again. The next time the Add Round Key function
is called, bytes 17-32 of the expanded key are XORed against the state.
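The first time Add Round Key gets executed:

State      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
XOR
Exp Key    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

The second time Add Round Key is executed:

State      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
XOR
Exp Key   17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32

And so on for each round of execution. During decryption this procedure is reversed.
Therefore the state is first XORed against the last 16 bytes of the expanded key, then the
second last 16 bytes and so on. The method for deriving the expanded key is described in
Section 2.11.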
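
A minimal sketch of the Add Round Key step follows, assuming the state and the expanded key are held as Python bytes objects. The function name and the zero-based round_index parameter are assumptions made for this example (the text above counts bytes from 1, the code counts from 0).

```python
def add_round_key(state: bytes, expanded_key: bytes, round_index: int) -> bytes:
    """XOR the 16-byte state with the 16-byte slice of the expanded key
    that belongs to the given round (round_index 0 is the initial call)."""
    start = 16 * round_index
    round_key = expanded_key[start:start + 16]
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("state and round key must both be 16 bytes")
    return bytes(s ^ k for s, k in zip(state, round_key))

# Toy expanded key whose bytes are simply 0..31, for illustration only.
toy_key = bytes(range(32))
state = bytes(16)
once = add_round_key(state, toy_key, 1)      # uses expanded key bytes 17-32
twice = add_round_key(once, toy_key, 1)      # same slice again undoes the XOR
print(once.hex())
print(twice == state)
```

During decryption the same function can be called with the round indices in descending order, which corresponds to using the last 16 bytes of the expanded key first.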

2.9.2. Sub Byte

During encryption each value of the state is replaced with the corresponding S-box value.
Table 2.6: AES S-box encryption lookup table

For example, HEX 19 would get replaced with HEX D4.

During decryption each value in the state is replaced with the corresponding inverse
S-box value.
Table 2.7: AES S-box decryption lookup table

For example, HEX D4 would get replaced with HEX 19.
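
The lookup tables themselves do not have to be typed in by hand. The sketch below derives them from the standard mathematical definition of the AES S-box (multiplicative inverse in the Galois Field followed by a fixed affine transformation). This derivation is not described in the text above, so treat it as a supplementary illustration; all function and variable names are assumptions made for this example.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) with the AES reduction polynomial."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return product

def build_sboxes():
    """Return (sbox, inverse_sbox) as two 256-entry lists."""
    sbox = [0] * 256
    inverse = [0] * 256
    for x in range(256):
        inv = 0
        if x:                          # x^254 is the inverse of x; 0 maps to 0
            inv = 1
            for _ in range(254):
                inv = gf_mul(inv, x)
        value = inv
        for shift in range(1, 5):      # affine transform: XOR of four rotations
            value ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        value ^= 0x63
        sbox[x] = value
        inverse[value] = x
    return sbox, inverse

SBOX, INV_SBOX = build_sboxes()

def byte_sub(state: bytes) -> bytes:
    """Replace every state byte by its S-box value (encryption direction)."""
    return bytes(SBOX[b] for b in state)

print(f"{SBOX[0x19]:02X} {INV_SBOX[0xD4]:02X}")
```

The two printed values correspond to the examples given with Tables 2.6 and 2.7.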

2.9.3. Shift Row
Arranges the state in a matrix and then performs a circular shift for each row. This is
not a bitwise shift; the circular shift moves each byte a whole number of positions along its
row. A byte that was in the second position may end up in the first position after the shift.
The circular part means that a byte shifted out of the first position re-enters at the last
position in the same row. In detail: the state is arranged in a 4x4 matrix (square).
The confusing part is that the matrix is formed vertically but shifted horizontally. So the first
4 bytes of the state will form the first byte in each row.
So bytes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
will form a matrix:

 1  5  9 13
 2  6 10 14
 3  7 11 15
 4  8 12 16

Each row is then moved over (shifted) 0, 1, 2 or 3 spaces to the left, depending on
the row of the state. The first row is never shifted.
Row1 0
Row2 1
Row3 2
Row4 3

The following table shows how the individual bytes are first arranged in the table and
then moved over (shifted). Blocks 16 bytes long:

Before shift        After shift
 1  5  9 13          1  5  9 13
 2  6 10 14          6 10 14  2
 3  7 11 15         11 15  3  7
 4  8 12 16         16  4  8 12

During decryption the same process is reversed and all rows are shifted to the right:

Before shift        After shift
 1  5  9 13          1  5  9 13
 2  6 10 14         14  2  6 10
 3  7 11 15         11 15  3  7
 4  8 12 16          8 12 16  4
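
The two shift tables can be reproduced with the following sketch, which assumes the state is stored as a flat list of 16 values filled column by column, as described above. The function names are illustrative assumptions.

```python
def shift_rows(state):
    """Rotate row r of the column-major 4x4 state left by r positions."""
    out = list(state)
    for r in range(4):
        row = [state[r + 4 * c] for c in range(4)]   # bytes r, r+4, r+8, r+12
        row = row[r:] + row[:r]                      # circular left shift
        for c in range(4):
            out[r + 4 * c] = row[c]
    return out

def inv_shift_rows(state):
    """Rotate row r right by r positions (used during decryption)."""
    out = list(state)
    for r in range(4):
        row = [state[r + 4 * c] for c in range(4)]
        row = row[4 - r:] + row[:4 - r]              # circular right shift
        for c in range(4):
            out[r + 4 * c] = row[c]
    return out

state = list(range(1, 17))                           # bytes numbered 1..16
shifted = shift_rows(state)
for r in range(4):                                   # print the matrix row by row
    print([shifted[r + 4 * c] for c in range(4)])
```

Applying inv_shift_rows to the result of shift_rows returns the original byte order.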

2.9.4. Mix Column

This is perhaps the hardest step to both understand and explain. There are two parts to
this step. The first will explain which parts of the state are multiplied against which parts of
the matrix. The second will explain how this multiplication is implemented over what is
called a Galois Field.

2.10. Matrix Multiplication

The state is arranged into a 4 row table (as described in the Shift Row function). The
multiplication is performed one column at a time (4 bytes). Each value in the column is
eventually multiplied against every value of the matrix (16 total multiplications). The results
of these multiplications are XORed together to produce only 4 result bytes for the next state.
Therefore there are 4 bytes of input, 16 multiplications, 12 XORs and 4 bytes of output. The
multiplication is performed one matrix row at a time against each value of a state column.

Multiplication Matrix

2 3 1 1
1 2 3 1
1 1 2 3
3 1 1 2

The first result byte is calculated by multiplying 4 values of the state column against 4
values of the first row of the matrix. The result of each multiplication is then XORed to
produce 1 Byte.
b1 = (b1 * 2) XOR (b2*3) XOR (b3*1) XOR (b4*1)
The second result byte is calculated by multiplying the same 4 values of the state
column against 4 values of the second row of the matrix. The result of each multiplication is
then XORed to produce 1 Byte.
b2 = (b1 * 1) XOR (b2*2) XOR (b3*3) XOR (b4*1)
The third result byte is calculated by multiplying the same 4 values of the state
column against 4 values of the third row of the matrix. The result of each multiplication is
XORed to produce 1 Byte.
b3 = (b1 * 1) XOR (b2*1) XOR (b3*2) XOR (b4*3)
The fourth result byte is calculated by multiplying the same 4 values of the state
column against 4 values of the fourth row of the matrix. The result of each multiplication is
XORed to produce 1 Byte.
b4 = (b1 * 3) XOR (b2*1) XOR (b3*1) XOR (b4*2)
This procedure is repeated with the next column of the state, until there are no
more state columns. Putting it all together: the first column will include state bytes 1-4 and
will be multiplied against the matrix in the following manner:
b1 = (b1 * 2) XOR (b2*3) XOR (b3*1) XOR (b4*1)
b2 = (b1 * 1) XOR (b2*2) XOR (b3*3) XOR (b4*1)
b3 = (b1 * 1) XOR (b2*1) XOR (b3*2) XOR (b4*3)
b4 = (b1 * 3) XOR (b2*1) XOR (b3*1) XOR (b4*2)
(b1 specifies the first byte of the state. All four results are computed from the original
values of the column, not from results already calculated.)

The second column will be multiplied against the same matrix in the
following manner:
b5 = (b5 * 2) XOR (b6*3) XOR (b7*1) XOR (b8*1)
b6 = (b5 * 1) XOR (b6*2) XOR (b7*3) XOR (b8*1)
b7 = (b5 * 1) XOR (b6*1) XOR (b7*2) XOR (b8*3)
b8 = (b5 * 3) XOR (b6*1) XOR (b7*1) XOR (b8*2)
And so on until all columns of the state are exhausted.
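
The column multiplication above can be sketched in Python as follows. The Galois Field multiplication is implemented here with a shift-and-add loop rather than lookup tables; this is one possible implementation chosen for brevity, and the names gf_mul, MIX_MATRIX and mix_column are assumptions made for this example.

```python
MIX_MATRIX = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) with the AES reduction polynomial."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return product

def mix_column(column):
    """Multiply one 4-byte state column by the Mix Column matrix."""
    result = []
    for matrix_row in MIX_MATRIX:
        acc = 0
        for coefficient, byte in zip(matrix_row, column):
            acc ^= gf_mul(byte, coefficient)   # XOR replaces addition
        result.append(acc)
    return result

def mix_columns(state):
    """Apply mix_column to each of the four columns of a 16-byte state."""
    out = []
    for c in range(4):
        out.extend(mix_column(state[4 * c:4 * c + 4]))
    return out

print(" ".join(f"{b:02X}" for b in mix_column([0xD4, 0xBF, 0x5D, 0x30])))
```

Note that mix_column reads all four input bytes before producing any output, so the results never overwrite values that are still needed.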

2.10.1 Mix Column Inverse

During decryption the Mix Column multiplication matrix is changed to:

0E 0B 0D 09
09 0E 0B 0D
0D 09 0E 0B
0B 0D 09 0E

Other than the change to the matrix table, the function performs the same steps as
during encryption.
2.10.2 Mix Column Example During Encryption
The following examples are denoted in HEX. L(x) is the Galois Field logarithm table
lookup and E(x) is the corresponding exponentiation (antilog) table lookup.
Input = D4 BF 5D 30

b1 = (D4 * 2) XOR (BF*3) XOR (5D*1) XOR (30*1)
   = E(L(D4) + L(02)) XOR E(L(BF) + L(03)) XOR 5D XOR 30
   = E(41 + 19) XOR E(9D + 01) XOR 5D XOR 30
   = E(5A) XOR E(9E) XOR 5D XOR 30
   = B3 XOR DA XOR 5D XOR 30
   = 04

b2 = (D4 * 1) XOR (BF*2) XOR (5D*3) XOR (30*1)
   = D4 XOR E(L(BF)+L(02)) XOR E(L(5D)+L(03)) XOR 30
   = D4 XOR E(9D+19) XOR E(88+01) XOR 30
   = D4 XOR E(B6) XOR E(89) XOR 30
   = D4 XOR 65 XOR E7 XOR 30
   = 66

b3 = (D4 * 1) XOR (BF*1) XOR (5D*2) XOR (30*3)
   = D4 XOR BF XOR E(L(5D)+L(02)) XOR E(L(30)+L(03))
   = D4 XOR BF XOR E(88+19) XOR E(65+01)
   = D4 XOR BF XOR E(A1) XOR E(66)
   = D4 XOR BF XOR BA XOR 50
   = 81

b4 = (D4 * 3) XOR (BF*1) XOR (5D*1) XOR (30*2)
   = E(L(D4)+L(03)) XOR BF XOR 5D XOR E(L(30)+L(02))
   = E(41+01) XOR BF XOR 5D XOR E(65+19)
   = E(42) XOR BF XOR 5D XOR E(7E)
   = 67 XOR BF XOR 5D XOR 60
   = E5

2.10.3. Mix Column Example during Decryption

Input = 04 66 81 E5

b1 = (04 * 0E) XOR (66*0B) XOR (81*0D) XOR (E5*09)
   = E(L(04)+L(0E)) XOR E(L(66)+L(0B)) XOR E(L(81)+L(0D)) XOR E(L(E5)+L(09))
   = E(32+DF) XOR E(1E+68) XOR E(58+EE) XOR E(20+C7)
   = E(111-FF) XOR E(86) XOR E(146-FF) XOR E(E7)
   = E(12) XOR E(86) XOR E(47) XOR E(E7)
   = 38 XOR B7 XOR D7 XOR 8C
   = D4

b2 = (04 * 09) XOR (66*0E) XOR (81*0B) XOR (E5*0D)
   = E(L(04)+L(09)) XOR E(L(66)+L(0E)) XOR E(L(81)+L(0B)) XOR E(L(E5)+L(0D))
   = E(32+C7) XOR E(1E+DF) XOR E(58+68) XOR E(20+EE)
   = E(F9) XOR E(FD) XOR E(C0) XOR E(10E-FF)
   = E(F9) XOR E(FD) XOR E(C0) XOR E(0F)
   = 24 XOR 52 XOR FC XOR 35
   = BF

b3 = (04 * 0D) XOR (66*09) XOR (81*0E) XOR (E5*0B)
   = E(L(04)+L(0D)) XOR E(L(66)+L(09)) XOR E(L(81)+L(0E)) XOR E(L(E5)+L(0B))
   = E(32+EE) XOR E(1E+C7) XOR E(58+DF) XOR E(20+68)
   = E(120-FF) XOR E(E5) XOR E(137-FF) XOR E(88)
   = E(21) XOR E(E5) XOR E(38) XOR E(88)
   = 34 XOR 7B XOR 4F XOR 5D
   = 5D

b4 = (04 * 0B) XOR (66*0D) XOR (81*09) XOR (E5*0E)
   = E(L(04)+L(0B)) XOR E(L(66)+L(0D)) XOR E(L(81)+L(09)) XOR E(L(E5)+L(0E))
   = E(32+68) XOR E(1E+EE) XOR E(58+C7) XOR E(20+DF)
   = E(9A) XOR E(10C-FF) XOR E(11F-FF) XOR E(FF)
   = E(9A) XOR E(0D) XOR E(20) XOR E(FF)
   = 2C XOR F8 XOR E5 XOR 01
   = 30
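
The worked examples above rely on the L and E lookup tables. The sketch below builds such tables by repeatedly multiplying by 03, which is an assumption about how the tables are generated (it is consistent with L(03) = 01 and L(02) = 19 used in the examples), and then repeats the encryption and decryption examples. All names are illustrative.

```python
def build_tables():
    """Return (E, L): E[i] is 03 raised to the power i, L is its inverse."""
    exp_table = [0] * 256
    log_table = [0] * 256
    value = 1
    for i in range(255):
        exp_table[i] = value
        log_table[value] = i
        doubled = ((value << 1) ^ (0x1B if value & 0x80 else 0)) & 0xFF
        value = doubled ^ value          # value * 3 = (value * 2) XOR value
    exp_table[255] = exp_table[0]        # exponents wrap around at FF
    return exp_table, log_table

E, L = build_tables()

def table_mul(a: int, b: int) -> int:
    """Galois Field multiplication: add the logarithms, wrapping at FF."""
    if a == 0 or b == 0:
        return 0
    return E[(L[a] + L[b]) % 255]

MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
INV_MIX = [[0x0E, 0x0B, 0x0D, 0x09], [0x09, 0x0E, 0x0B, 0x0D],
           [0x0D, 0x09, 0x0E, 0x0B], [0x0B, 0x0D, 0x09, 0x0E]]

def multiply_column(matrix, column):
    """Multiply a 4-byte column by a 4x4 matrix, XORing the products."""
    out = []
    for row in matrix:
        acc = 0
        for coefficient, byte in zip(row, column):
            acc ^= table_mul(byte, coefficient)
        out.append(acc)
    return out

mixed = multiply_column(MIX, [0xD4, 0xBF, 0x5D, 0x30])
restored = multiply_column(INV_MIX, mixed)
print(" ".join(f"{b:02X}" for b in mixed))
print(" ".join(f"{b:02X}" for b in restored))
```

Table lookups replace the bit-level multiplication loop with two reads and one addition, which is why this form is common in software implementations; a hardware design may prefer a different trade-off.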

2.11. AES Key Expansion

Prior to encryption or decryption the key must be expanded. The expanded key is
used in the Add Round Key function defined above. Each time the Add Round Key function
is called a different part of the expanded key is XORed against the state. In order for this to
work the Expanded Key must be large enough so that it can provide key material for every
time the Add Round Key function is executed. The Add Round Key function gets called for
each round as well as one extra time at the beginning of the algorithm. Therefore the size of
the expanded key will always be equal to: 16 * (number of rounds + 1).
The 16 in the above formula is the size of the block in bytes. This provides
key material for every byte in the block during every round, plus one.
Table 2.8: Key size, block size and expanded key size

Key size (bytes)    Block size (bytes)    Expanded key (bytes)
16                  16                    176
24                  16                    208
32                  16                    240
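
The expanded key size formula can be written as a short function. This sketch assumes the number of rounds is (key size in bytes / 4) + 6, which gives the 10, 12 and 14 rounds used for 16, 24 and 32 byte keys; the function name is an assumption made for this example.

```python
BLOCK_SIZE = 16  # AES block size in bytes

def expanded_key_size(key_len_bytes: int) -> int:
    """Size of the expanded key in bytes: 16 * (number of rounds + 1)."""
    if key_len_bytes not in (16, 24, 32):
        raise ValueError("AES keys are 16, 24 or 32 bytes long")
    rounds = key_len_bytes // 4 + 6
    return BLOCK_SIZE * (rounds + 1)

for size in (16, 24, 32):
    print(size, expanded_key_size(size))
```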



Since the key size is much smaller than the size of the sub keys, the key is actually
stretched out to provide enough key space for the algorithm. The key expansion routine
executes a maximum of 4 consecutive functions. These functions are:
Rot Word
Sub Word
Rcon
EK
K
An iteration of the above steps is called a round. The number of rounds of the key
expansion algorithm depends on the key size.
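Table 2.9: Key expansion rounds by key size

Key size   Block size   Expansion   Expanded bytes   Rounds     Rounds          Expanded key
(bytes)    (bytes)      rounds      / round          key copy   key expansion   (bytes)
16         16           44          4                4          40              176
24         16           52          4                6          46              208
32         16           60          4                8          52              240
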

The first bytes of the expanded key are always equal to the key. If the key is 16 bytes
long, the first 16 bytes of the expanded key will be the same as the original key. If the key
size is 32 bytes, then the first 32 bytes of the expanded key will be the same as the original
key. Each round adds 4 bytes to the expanded key. With the exception of the first rounds, each
round also takes the previous round's 4 bytes as input, operates on them and returns 4 bytes.
One more important note is that not all of the 4 functions are called in each round. The
algorithm only calls all 4 of the functions every:
4 rounds for a 16 byte key
6 rounds for a 24 byte key
8 rounds for a 32 byte key
In the rest of the rounds, only a K function result is XORed with the result of the EK
function. There is an exception to this rule: if the key is 32 bytes long, an additional call
to the Sub Word function is made every 8 rounds starting on the 13th round.

2.12. AES Key Expansion Functions

Rot Word (4 bytes)

This does a circular shift on 4 bytes similar to the Shift Row Function.
1,2,3,4 to 2,3,4,1
Sub Word (4 bytes): This step applies the S-box value substitution as described in
Bytes Sub: Function to each of the 4 bytes in the argument.


Rcon((Round/(KeySize/4))-1): This function returns a 4 byte value based on the
following table:

Rcon(0) = 01000000
Rcon(1) = 02000000
Rcon(2) = 04000000
Rcon(3) = 08000000
Rcon(4) = 10000000
Rcon(5) = 20000000
Rcon(6) = 40000000
Rcon(7) = 80000000
Rcon(8) = 1B000000
Rcon(9) = 36000000
Rcon(10) = 6C000000
Rcon(11) = D8000000
Rcon(12) = AB000000
Rcon(13) = 4D000000
Rcon(14) = 9A000000

For example for a 16 byte key Rcon is first called in the 4th round: (4/(16/4))-1=0
In this case Rcon will return : 01000000
For a 24 byte key Rcon is first called in the 6th round: (6/(24/4))-1=0
In this case Rcon will also return : 01000000
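
The Rcon values and the index formula above can be checked with a short Python sketch. It assumes, as in the AES standard, that each Rcon byte is the previous one doubled in GF(2^8) and reduced with the polynomial 0x11B. The helper names `rcon_table` and `rcon_index` are illustrative and do not come from the original text.

```python
def rcon_table(count):
    """Return the first `count` Rcon values as 32-bit integers (value in the top byte)."""
    values = []
    byte = 0x01
    for _ in range(count):
        values.append(byte << 24)
        byte <<= 1              # multiply by 2 in GF(2^8)
        if byte & 0x100:        # overflow: reduce by the AES polynomial
            byte ^= 0x11B
    return values


def rcon_index(round_number, key_size):
    """Index passed to Rcon: (Round / (KeySize / 4)) - 1, using integer division."""
    return round_number // (key_size // 4) - 1


if __name__ == "__main__":
    for i, value in enumerate(rcon_table(15)):
        print("Rcon(%d) = %08X" % (i, value))
    print(rcon_index(4, 16), rcon_index(6, 24))
```

Generating the constants instead of storing them keeps the table and the arithmetic rule in one place; a hardware design would more likely use a small lookup table.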

EK(Offset): The EK function returns 4 bytes of the Expanded Key starting at the specified
offset. For example, if the offset is 0 then EK will return bytes 0,1,2,3 of the Expanded Key.
K(Offset): The K function returns 4 bytes of the Key starting at the specified offset. For
example, if the offset is 0 then K will return bytes 0,1,2,3 of the Key.
2.13. AES Key Expansion Algorithm
Since the expansion algorithm changes depending on the length of the key, it is
difficult to explain in prose. For this reason the Key Expansion
Algorithm is described in table format. There are 3 tables, one for each AES key size (16,
24, and 32 bytes). Each table has 3 fields:
Table 2.10. Three fields of the key expansion tables

Round: A counter representing the current step in the key expansion
algorithm; think of this as a loop counter.
Expanded Key Bytes: The expanded key bytes affected by the result of the
function(s).
Function: The function(s) that return the 4 bytes written to the
affected expanded key bytes.

2.13.1. 32 byte Key Expansion

Each round (except rounds 0, 1, 2, 3, 4, 5, 6 and 7) will take the result of the
previous round and produce a 4 byte result for the current round. Notice the first 8 rounds
simply copy the total of 32 bytes of the key.
Table 2.11: 32 byte key Expansion

Round   Expanded Key Bytes   Function
0       0 1 2 3              K(0)
1       4 5 6 7              K(4)
2       8 9 10 11            K(8)
3       12 13 14 15          K(12)
4       16 17 18 19          K(16)
5       20 21 22 23          K(20)
6       24 25 26 27          K(24)
7       28 29 30 31          K(28)
8       32 33 34 35          Sub Word(Rot Word(EK((8-1)*4))) XOR Rcon((8/8)-1) XOR EK((8-8)*4)
9       36 37 38 39          EK((9-1)*4) XOR EK((9-8)*4)
10      40 41 42 43          EK((10-1)*4) XOR EK((10-8)*4)
11      44 45 46 47          EK((11-1)*4) XOR EK((11-8)*4)
12      48 49 50 51          Sub Word(EK((12-1)*4)) XOR EK((12-8)*4)
13      52 53 54 55          EK((13-1)*4) XOR EK((13-8)*4)
14      56 57 58 59          EK((14-1)*4) XOR EK((14-8)*4)
15      60 61 62 63          EK((15-1)*4) XOR EK((15-8)*4)
16      64 65 66 67          Sub Word(Rot Word(EK((16-1)*4))) XOR Rcon((16/8)-1) XOR EK((16-8)*4)
17      68 69 70 71          EK((17-1)*4) XOR EK((17-8)*4)
18      72 73 74 75          EK((18-1)*4) XOR EK((18-8)*4)
19      76 77 78 79          EK((19-1)*4) XOR EK((19-8)*4)
20      80 81 82 83          Sub Word(EK((20-1)*4)) XOR EK((20-8)*4)
21      84 85 86 87          EK((21-1)*4) XOR EK((21-8)*4)
22      88 89 90 91          EK((22-1)*4) XOR EK((22-8)*4)
23      92 93 94 95          EK((23-1)*4) XOR EK((23-8)*4)
24      96 97 98 99          Sub Word(Rot Word(EK((24-1)*4))) XOR Rcon((24/8)-1) XOR EK((24-8)*4)
25      100 101 102 103      EK((25-1)*4) XOR EK((25-8)*4)
26      104 105 106 107      EK((26-1)*4) XOR EK((26-8)*4)
27      108 109 110 111      EK((27-1)*4) XOR EK((27-8)*4)
28      112 113 114 115      Sub Word(EK((28-1)*4)) XOR EK((28-8)*4)
29      116 117 118 119      EK((29-1)*4) XOR EK((29-8)*4)
30      120 121 122 123      EK((30-1)*4) XOR EK((30-8)*4)
31      124 125 126 127      EK((31-1)*4) XOR EK((31-8)*4)
32      128 129 130 131      Sub Word(Rot Word(EK((32-1)*4))) XOR Rcon((32/8)-1) XOR EK((32-8)*4)
33      132 133 134 135      EK((33-1)*4) XOR EK((33-8)*4)
34      136 137 138 139      EK((34-1)*4) XOR EK((34-8)*4)
35      140 141 142 143      EK((35-1)*4) XOR EK((35-8)*4)
36      144 145 146 147      Sub Word(EK((36-1)*4)) XOR EK((36-8)*4)
37      148 149 150 151      EK((37-1)*4) XOR EK((37-8)*4)
38      152 153 154 155      EK((38-1)*4) XOR EK((38-8)*4)
39      156 157 158 159      EK((39-1)*4) XOR EK((39-8)*4)
40      160 161 162 163      Sub Word(Rot Word(EK((40-1)*4))) XOR Rcon((40/8)-1) XOR EK((40-8)*4)
41      164 165 166 167      EK((41-1)*4) XOR EK((41-8)*4)
42      168 169 170 171      EK((42-1)*4) XOR EK((42-8)*4)
43      172 173 174 175      EK((43-1)*4) XOR EK((43-8)*4)
44      176 177 178 179      Sub Word(EK((44-1)*4)) XOR EK((44-8)*4)
45      180 181 182 183      EK((45-1)*4) XOR EK((45-8)*4)
46      184 185 186 187      EK((46-1)*4) XOR EK((46-8)*4)
47      188 189 190 191      EK((47-1)*4) XOR EK((47-8)*4)
48      192 193 194 195      Sub Word(Rot Word(EK((48-1)*4))) XOR Rcon((48/8)-1) XOR EK((48-8)*4)
49      196 197 198 199      EK((49-1)*4) XOR EK((49-8)*4)
50      200 201 202 203      EK((50-1)*4) XOR EK((50-8)*4)
51      204 205 206 207      EK((51-1)*4) XOR EK((51-8)*4)
52      208 209 210 211      Sub Word(EK((52-1)*4)) XOR EK((52-8)*4)
53      212 213 214 215      EK((53-1)*4) XOR EK((53-8)*4)
54      216 217 218 219      EK((54-1)*4) XOR EK((54-8)*4)
55      220 221 222 223      EK((55-1)*4) XOR EK((55-8)*4)
56      224 225 226 227      Sub Word(Rot Word(EK((56-1)*4))) XOR Rcon((56/8)-1) XOR EK((56-8)*4)
57      228 229 230 231      EK((57-1)*4) XOR EK((57-8)*4)
58      232 233 234 235      EK((58-1)*4) XOR EK((58-8)*4)
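
The round-by-round rules tabulated above can be summarised in a compact Python sketch. This is a minimal illustration under stated assumptions, not the implementation used in this project: the S-box is computed from its mathematical definition in the AES standard (multiplicative inverse in GF(2^8) followed by the affine transform) instead of being stored as a table, and the names `gf_mul`, `build_sbox`, `rot_word`, `sub_word` and `expand_key` are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Build the AES S-box: multiplicative inverse, then the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):  # XOR with the inverse rotated left by 1..4 bits
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def rot_word(word):
    """Circular left shift of a 4-byte word: 1,2,3,4 becomes 2,3,4,1."""
    return word[1:] + word[:1]


def sub_word(word):
    """Apply the S-box to each of the 4 bytes."""
    return [SBOX[b] for b in word]


def expand_key(key):
    """Expand a 16-, 24- or 32-byte key; each loop step is one 'round' of the tables."""
    if len(key) not in (16, 24, 32):
        raise ValueError("AES keys are 16, 24 or 32 bytes long")
    nk = len(key) // 4                  # key length in 4-byte words
    total_rounds = 4 * (nk + 7)         # 44, 52 or 60 expansion rounds
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]  # key copy rounds
    rcon = 0x01
    for i in range(nk, total_rounds):
        temp = words[i - 1]             # EK((i-1)*4)
        if i % nk == 0:                 # all four functions
            temp = sub_word(rot_word(temp))
            temp[0] ^= rcon             # Rcon only affects the first byte
            rcon = gf_mul(rcon, 2)
        elif nk == 8 and i % nk == 4:   # extra Sub Word for 32-byte keys
            temp = sub_word(temp)
        words.append([a ^ b for a, b in zip(temp, words[i - nk])])  # XOR EK((i-nk)*4)
    return bytes(b for w in words for b in w)


if __name__ == "__main__":
    expanded = expand_key(bytes(16))
    print(len(expanded), expanded[16:20].hex())
```

The loop index `i` corresponds to the Round column, and `words[i]` holds expanded key bytes 4i to 4i+3.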

2.14. Project Overview

Our project initially aims at understanding a conventional cryptographic standard
known as the Advanced Encryption Standard (AES), a highly sought-after secret-key
security algorithm that is expected to be widely employed in high-security applications,
in its various forms, namely simple AES (128 bit key and plain-text), APES (512 bit
data using parallel 128 bit AES), ADES (512 bit data using 64 bit DES and 128 bit AES), etc.,
all of which have been opened up by the growth and breakthroughs in
the recent history of conventional cryptography. Next, the Advanced Encryption Standard is
discussed, covering its mathematical preliminaries and a description of the approved
algorithm. After this, the limitations of software implementation are analyzed and the various
hardware approaches are studied.
A highly parallelized and low cost hardware architectural solution is proposed based
on the relative merits of FPGA architecture. The architectural details and the functionality are
described in a top-to-bottom modular hierarchy. The design is simulated using the Mentor Graphics
VHDL simulator, ModelSim XE II v5.8c, synthesized using the Xilinx synthesis tool,
Xilinx ISE 6 (Integrated Software Environment), and finally configured and implemented in an
FPGA using the SANDS FPGA/CPLD development platform.


Thus, in this project we ultimately aim at developing a cost-effective but highly
secure and parallelized solution for implementing the AES algorithm in hardware, by
integrating the advantages, capabilities and compactness of
VLSI into the area of secure data communication over computer
networks, which is nowadays a major concern not only for large government organizations
but also for private individuals.
Project Requirements - Summary
1. Design Entry: SystemC / VHDL
2. Simulator: Xilinx Platform Studio (XPS)
Synthesis and Implementation by Xilinx WebPACK

XC - Xilinx Commercial,
150,000 gate count,
plastic quad package,
Speed Grade: -5.

2.15. Conclusion
The above discussion provides only the basic information needed to
implement the AES encryption algorithm. The mathematics and design reasons behind AES
were purposely left out. For more information on these topics, refer to the Rijndael documentation.


3.1 Introduction
A MicroBlaze soft-core processor system is synthesized using EDK 10.1 on a Spartan-3E FPGA.
The Embedded Development Kit (EDK) from Xilinx allows the designer to build a complete
processor system on Xilinx's FPGAs. The systems that can be produced using EDK range
from a simple single-processor architecture to a complex multi-processor system with multiple
hardware accelerators. The tool mainly supports two types of processors:

MicroBlaze which is a reconfigurable soft-core processor and


Power-PC which is a hardcore processor implemented in some FPGAs from


Depending on the FPGA chip we are using, multiple MicroBlazes and Power-PCs can
be integrated together in a single design. EDK provides C/C++ compilers for both
MicroBlaze and Power-PC along with several tools for debugging/profiling of the
applications running on each processor. Besides, using ISE, you can perform several types of
simulations for the generated architectures which allow the estimation of both the
performance and power consumption of the architecture. This tutorial will demonstrate the
process of creating and testing a MicroBlaze system design using the Embedded
Development Kit (EDK) and Spartan 3E starter board from Xilinx.

3.2 Objectives
The project contains these sections:

System Requirements

MicroBlaze System Description


The following steps are described in this project:

Starting XPS

Using the Base System Builder Wizard

Create or Import IP Peripheral

Design Modification using Platform Studio

Implementing the Design

Defining the Software Design


3.3 System Requirements:

You must have the following software installed on your PC to complete this project:
Windows 2000 SP2/Windows XP

EDK 10.1i.

ISE 10.1i.

Familiarity with Xilinx ISE 10.1 design flow.

Spartan-3E Starter Kit and Xilinx USB download cable.

Update pin assignments in the system.ucf file

Update board JTAG chain specified in the download.cmd

Fig. 3.1. Secure data communication controlled by an FPGA processor

3.4. FPGA Trainer Kit:

This section covers the requirements and specifications needed to get started with the FPGA trainer
kit. The kit has the following deliverables:
FPGA kit in a box containing built-in units:
3.4.1. Key component features:

The key features of the Spartan-3E Starter Kit board are:

Xilinx XC3S500E Spartan-3E FPGA

Up to 232 user-I/O pins

320-pin FBGA package

Over 10,000 logic cells

Xilinx 4 Mbit Platform Flash configuration PROM

Xilinx 64-macrocell XC2C64A CoolRunner CPLD


64 MByte (512 Mbit) of DDR SDRAM, x16 data interface, 100+ MHz

16 MByte (128 Mbit) of parallel NOR Flash (Intel StrataFlash)

FPGA configuration storage
MicroBlaze code storage/shadowing

16 Mbits of SPI serial Flash (STMicro)

FPGA configuration storage
MicroBlaze code shadowing

Fig 3.2: Xilinx Spartan 3E FPGA kit

2-line, 16-character LCD screen

PS/2 mouse or keyboard port

VGA display port

10/100 Ethernet PHY (requires Ethernet MAC in FPGA)

Two 9-pin RS-232 ports (DTE- and DCE-style)

On-board USB-based FPGA/CPLD download/debug interface

50 MHz clock oscillator

SHA-1 1-wire serial EEPROM for bitstream copy protection

Hirose FX2 expansion connector


Three Digilent 6-pin expansion connectors

Four-output, SPI-based Digital-to-Analog Converter (DAC)

Two-input, SPI-based Analog-to-Digital Converter (ADC) with programmable-gain pre-amplifier

ChipScope SoftTouch debugging port

Rotary-encoder with push-button shaft

Eight discrete LEDs

Four slide switches

3.5. Spartan-3E starter kit

The Spartan-3E Starter Kit board, shown in figure 3.2, highlights the unique features of the
Spartan-3E FPGA family and provides a convenient development board for
embedded processing applications.
The board highlights these features:
Spartan-3E specific features
Parallel NOR Flash configuration
Multi Boot FPGA configuration from Parallel NOR Flash PROM
SPI serial Flash configuration
Embedded development
MicroBlaze 32-bit embedded RISC processor
PicoBlaze 8-bit embedded controller
DDR memory interfaces
The main blocks here include the micro-controller, the CPLD, and the Spartan-II FPGA.
The micro-controller and the CPLD can tolerate 5 V, whereas the FPGA operates at 2.5 V.
The micro-controller acts as the booting interface between the kit architecture and the
SANDS software. It converts the serial data it receives into the parallel data needed for
processing in the FPGA. It also takes control of the FPGA by acting as a master over the
slave until configuration is complete. Once it has successfully configured the FPGA, it releases
its hold so that the FPGA functions independently, based on the inputs provided. The
function of the CPLD is to coordinate and provide separate access to the address and data bus
values obtained from a common bus. It also acts as a voltage controller, providing
the FPGA with the necessary 2.5 V from its 5 V input supply. From the programmable
port, the hex file is driven into the micro-controller, from there to the CPLD, and then to
the target device.
Though this is a round-about process compared with programming the chip directly from
the JTAG port, it eliminates the need for costlier cables and high-speed configuration software
at the cost of some configuration speed, which is acceptable in many cases.
Now, with the necessary inputs and the clock, we can run the configured gate-level
circuit to achieve the functionality that we have designed and downloaded, which
may be either encryption or decryption. The generated output goes into the transmitter
state machine of the UART module, where the data is converted from parallel to
serial and made available at the serial communication port. The data can then be
communicated to the other side over an RS-232 serial cable connected directly to the
COM port of the other PC, where the enciphered or deciphered data can be seen. Thus, the
FPGA-based processor implements the desired algorithm effectively.

3.6 Xilinx platform studio (XPS)

The Xilinx Platform Studio (XPS) is the development environment, or GUI, used for
designing the hardware portion of your embedded processor system (figure 6.1). The Xilinx
Embedded Development Kit (EDK) is an integrated software tool
suite for developing embedded systems with Xilinx MicroBlaze and PowerPC CPUs. EDK
includes a variety of tools and applications that assist the designer in developing an embedded
system, from hardware creation to final implementation of the system on an FPGA.
System design consists of creating the hardware and software components of the
embedded processor system; creating a verification component is optional.
A typical embedded system design project involves: hardware platform creation,
hardware platform verification (simulation), software platform creation, software application
creation, and software verification.


Base System Builder is the wizard used to automatically generate a hardware
platform according to the user's specifications, which are captured in the MHS (Microprocessor
Hardware Specification) file.
The MHS file defines the system architecture, peripherals and embedded processors.
The Platform Generation tool creates the hardware platform using the MHS file as input.
The software platform is defined by the MSS (Microprocessor Software Specification)
file, which defines driver and library customization parameters for peripherals, processor
customization parameters, standard I/O devices, interrupt handler routines, and other
software related routines. The MSS file is an input to the Library Generator tool for
customization of drivers, libraries and interrupt handlers.

Figure 3.3.: Xilinx Platform Studio Set up.

XPS includes a graphical user interface (GUI), along with a set of tools that aid in
project design. From the XPS GUI, you can design a complete embedded processor system
for implementation within a Xilinx FPGA device. The XPS main window is shown in the
figure below.


Note that the XPS main window is divided into three areas:

The Project Information Panel


The System Assembly Panel


The Connectivity Panel

Figure 3.4.: Xilinx Platform Studio GUI

XPS Features Include

Base System Builder allows creation of a fully functional processor system in minutes

System Assembly View allows user to quickly customize and configure design details

IP configuration dialogs open automatically when new IP is added to a design

Auto bus connectivity on AXI based designs

Extensive catalog of AXI and PLB based processors, peripherals, and utility IP

Tightly integrated with ISE Project Navigator, ISim, and Chip Scope

Create / Import IP wizard automates creation of custom IP templates, and provides a
mechanism to import user IP into XPS, and Bus Functional Model simulation support
for custom IP.

Debug Wizard automates hardware / software cross triggering and Chip Scope

Hardware project export to the Software Development Kit (SDK)


3.7. Project information panel

The Project Information panel offers control over and information about your
project. It provides the Project, Applications, and IP Catalog tabs
shown in figure 6.3.

The Project Tab lists references to project related files. Information is grouped in the
following general categories:
1. Project Files: All project-specific files such as the Microprocessor Hardware
Specification (MHS) files, Microprocessor Software Specification (MSS) files,
User Constraints File (UCF) files, iMPACT Command files, Implementation Option
files, and Bitgen Option files.
2. Project Options: All project specific options, such as Device, Net,
Implementation, Hardware Description Language (HDL), and Sim Model options.
3. Reference Files: All log and output files produced by the XPS implementation

Figure 3.5. Project Information Area:Project Tab

Application tab:

The Applications tab lists all software application option settings, header files, and
source files associated with each application project. With this tab selected, you can:

Create and add a software application project, build the project, and load it to the
block RAM.

Set compiler options.

Add source and header files to the project.

IP catalog tab:
The IP Catalog tab lists all the EDK IP cores and any custom IP cores you
created as shown in figure 6.4. If a project is open, only the IP cores compatible with
the target Xilinx device architecture are displayed.
The catalog lists information about the IP cores, including release version,
status (active, early access or deprecated), lock (not licensed, locked, or unlocked),
processor support, and a short description. Additional details about the IP core,
including the version change history, data sheet, and Microprocessor Peripheral
Description (MPD) file, are available in the right-click menu. By default, the IP cores
are grouped hierarchically by function.

Figure 3.6. Project Information Area: IP Catalog Tab


The system assembly panel:

The System Assembly Panel is where you view and configure system block elements.
If the System Assembly Panel is not already maximized in the main window, click the
System Assembly tab at the bottom of the pane to open it.
bus interface, ports, and address filters: XPS provides Bus Interface, Ports, and
Addresses radio buttons in the System Assembly Panel (shown in the figure below), which
organize information about your design and allow you to edit your hardware platform more


Fig 3.7. System Assembly Panel Views


The connectivity panel

With the Bus Interface filter selected, you'll see the Connectivity Panel, highlighted
by the dashed line. The Connectivity Panel is a graphical representation of the
hardware platform interconnects.

A vertical line represents a bus, and a horizontal line represents a bus interface to an

IP core.

If a compatible connection can be made, a connector is displayed at the intersection

between the bus and IP core bus interface.

The lines and connectors are color-coded to show the compatibility.

Differently shaped connection symbols indicate mastership of the IP core bus


A hollow connector represents a connection that you can make, and a filled connector
represents a connection made. To create or disable a connection, click the connector

3.8. Integrated software environment (ISE)

ISE is the foundation for Xilinx FPGA logic design. Because FPGA design can be an
involved process, Xilinx has provided software development tools that allow the designer to
circumvent some of this complexity. Various utilities such as constraints entry, timing
analysis, logic placement and routing, and device programming have all been integrated into


3.8.1. Steps for Setup

Spartan3E starter board with a RS-232 terminal connected to the serial port and
configured for 57600 baud, with 8 data bits, no parity and no handshakes.
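
To give a feel for what these serial settings imply, the hedged Python sketch below estimates the character rate and the time needed to send one 16-byte AES block over the link. It assumes one start bit and one stop bit per character, which the text does not state, and the function names are illustrative.

```python
def uart_frame_bits(data_bits=8, parity=False, stop_bits=1):
    """Bits on the wire per character: start bit, data bits, optional parity, stop bit(s)."""
    return 1 + data_bits + (1 if parity else 0) + stop_bits


def transfer_time_ms(num_bytes, baud=57600, **frame):
    """Milliseconds needed to send num_bytes back to back at the given baud rate."""
    return 1000.0 * num_bytes * uart_frame_bits(**frame) / baud


if __name__ == "__main__":
    bits = uart_frame_bits()                      # 8 data bits, no parity, 1 stop bit (assumed)
    print("bits per character:", bits)
    print("characters per second:", 57600 // bits)
    print("16-byte block (ms):", round(transfer_time_ms(16), 3))
```

Under these assumptions the link carries 5760 characters per second, so a single AES block takes just under 3 ms to transmit.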
Creating the Project File in XPS

The first step in this tutorial is using the Xilinx Platform Studio (XPS) to create a project file.
XPS allows you to control the hardware and software development of the MicroBlaze system,
and includes the following:

An editor and a project management interface for creating and editing source code

Software tool flow configuration options

You can use XPS to create the following:

(i) A Project Navigator project file that allows you to control the hardware implementation
(ii) A Microprocessor Hardware Specification (MHS) file
(iii) Microprocessor Software Specification (MSS) file
XPS supports the software tool flow associated with these software specifications.
Additionally, you can use XPS to customize software libraries, drivers, and interrupt
handlers, and to compile your programs.

Starting XPS
(a) To open XPS, select Start → All Programs → Development → Xilinx ISE Design
Suite 10.1 → EDK → Xilinx Platform Studio.
(b) Select Base System Builder Wizard (BSB) to open the "Create New XPS Project
Using BSB Wizard" dialogue box shown in Figure 6.1.

Fig 3.8: starting window of XPS


(c) Click Ok.

(d) Use the Project File Browse button to browse to the folder you want as your
project directory.
(e) Click Open to create the system.xmp file then Save.

Fig 3.9: Create New XPS Project Using Base System Builder Wizard
(f) Click Ok to start the BSB wizard. The wizard window will appear, which will be
used to build the design as will be discussed in following sections.

3.9. Defining the system hardware

3.9.1 MHS and MPD Files
The next step is defining the embedded system hardware with the Microprocessor
Hardware Specification (MHS) and Microprocessor Peripheral Description (MPD) files.
MHS File:
The Microprocessor Hardware Specification (MHS) file describes the following:

Embedded processor: either the soft core MicroBlaze processor or the hard core
PowerPC (only available in Virtex-II Pro and Virtex-4 FX devices)

Peripherals and associated address spaces


Overall connectivity of the system

The MHS file is a readable text file that is an input to the Platform Generator (the
hardware system building tool). Conceptually, the MHS file is a textual schematic of the
embedded system. To instantiate a component in the MHS file, you must include information
specific to the component.
MPD File:
Each system peripheral has a corresponding MPD file. The MPD file is the symbol of
the embedded system peripheral to the MHS schematic of the embedded system. The MPD
file contains all of the available ports and hardware parameters for a peripheral. The MPD file
is located in the following directory:
$XILINX_EDK/hw/XilinxProcessorIPLib/pcores/<peripheral_name>/data
EDK provides two methods for creating the MHS file. The Base System Builder Wizard
and the Add/Edit Cores dialog both assist you in building the processor system, which is defined
in the MHS file. This tutorial illustrates the Base System Builder.

3.9.2 Using the Base System Builder Wizard

Use the following steps to create the processor system:

In the Base System Builder, select "I would like to create a new design", then click Next.

In the Base System Builder - Select Board Dialog select the following, as shown in
Figure 6.8:

Board Vendor: Xilinx

Board Name: Spartan-3E Starter Board

Board Revision: C

Click next. Select the MicroBlaze

Click Next. You will now specify several processor options as shown in Figure 6.8:

The following is an explanation of the settings specified in Figure

System Wide Setting:

Reference clock frequency: This is the on board frequency of the clock.

Processor-Bus clock frequency: This is the frequency of the clock driving the
processor system.

Processor Configuration:

Debug I/F:

On-Chip H/W Debug module: When the H/W debug module is selected, a PLB MDM
module is included in the hardware system. This introduces hardware-intrusive
debugging with no software stub required. This is the recommended way of
debugging a MicroBlaze system.

XMD with S/W Debug stub: Selecting this mode of debugging interface introduces a
software intrusive debugging. There is a 1200-byte stub that is located at 0x00000000.
This stub communicates with the debugger on the host through the JTAG interface of
the PLB MDM module.

No Debug: debugging is disabled.

Fig 3.10: BSB: Select a Board

Users can specify the size of the local instruction and data memory.

Cache setup:

No Cache: No caching will be used

Enable cache link: Caching will be used through the FSL bus

You can also specify the use of the floating point unit (FPU).

Click Next.
Select the peripheral subset (Configure IO Interfaces wizard) as shown in Figure 6.5.

Note that the number of peripherals shown in each dialogue box is dynamic and
depends on your computer's screen resolution.


Fig 3.11: configure processor

In the first page of the "Configure IO Interfaces" wizard, Figure 6.10:

RS232_DTE deselect

RS232_DCE select

XPS UARTLITE baud-rate 57600, data bits 8 and Parity NONE

LEDs 8Bit select

Click Next

In the second page of the "Configure IO Interfaces" wizard, Figure 6.11:

DIP Switch 4Bit select

Buttons 4Bit deselect

FLASH deselect

SPI FLASH deselect

Click Next

Fig 3.12: Configure I/O Interfaces 1

In the third page of the "Configure IO Interfaces" wizard, Figure 6.11:

DDR SDRAM select

Ethernet Mac deselect


Click Next through the Add Internal Peripherals page as we will not add any in this

Click Next

This completes the hardware specification, and we will now configure the software
settings. Using the Software Setup dialogue box as shown in Figure 6.13, specify the
following software settings:

Standard Input (STDIN) RS232

Standard Output (STDOUT) RS232

Boot Memory ilmb_cntlr

Sample Application Selection Memory Test

Click Next.

Fig 3.13: Configure I/O Interfaces 2
Fig 3.14: Configure I/O Interfaces 3
Using the Configure Memory Test Application dialogue box as shown in Figure 6.8, specify
the following software settings:

Instructions ilmb_cntlr

Data dlmb_cntlr

Stack/Heap dlmb_cntlr

Click Next.


Fig 3.15: Software Setup

The simple memory test application will illustrate system aliveness and perform a
basic read/write to your memory devices.
The completed system, including the memory map, will be displayed as shown in Figure 6.9.
Currently the memory map cannot be changed or updated in the BSB. If you want to change
the memory map you can do this in XPS.

Click Generate and then Finish, to complete the design.

Select "Start Using Platform Studio" and click OK.

3.10. Review
The Base System Builder Wizard has created the hardware and software specification
files that define the processor system. When we look at the project directory, shown in Figure
6.10, we see these as system.mhs and system.mss. There are also some directories created:

data - contains the UCF (user constraints file) for the target board.

etc - contains system settings for JTAG configuration on the board that is used when
downloading the bit file and the default parameters that are passed to the ISE tools.

pcores - is empty right now, but is utilized for custom peripherals.

TestApp_Memory - contains a user application, as C source code, for testing the
memory in the system.

3.10.1. Project Options

To see the project options that Base System Builder has configured, select
Project → Project Options; the device information is specified there. Then select Hierarchy and Flow.
This window provides the opportunity to export the processor system into an ISE project as
either the top level system or a sub-module design.

Fig 3.16: Configure Memory Test Application

Fig 3.16: Generated Processor System

Click finish to build project


Fig 3.17: BSB Finish Setup

To continue with XPS Project click start using Platform Studio

Fig. 3.18: start using XPS

3.10.2 Implementing the Design
Now that the hardware has been completely specified in the MHS file, you can run the
Platform Generator. Platform Generator elaborates the MHS file into a hardware system
consisting of NGC files that represent the processor system. Then the Xilinx ISE tools are
called to implement the design for the target board. To generate a netlist and create the bit
file, follow these steps:
3.10.3 Defining the Hardware Design

Start generating the netlist and bitstream from the Microprocessor Hardware Specification file
to read the hardware.

Fig 3.19: Starting MHS Netlist and Bitstream Generation

Select Hardware → Generate Netlist. This will elaborate the MHS file and generate a
netlist for the complete system (this will take a while!).


Select Hardware → Generate Bitstream. This will call the ISE tools to implement the
design and generate a bit file that can be downloaded into the FPGA.
At the end of this step the XPS output screen should look like Figure 6.14. The bit file
that is generated is called system.bit and contains all the information required to configure
the FPGA except the contents of the block RAM (application/data). The bit file will be updated
with the application code after defining the software design.
3.10.4 Defining the Software Design
Now that the hardware design is completed, the next step is defining the software
design. There are two major parts to software design, configuring the Board Support Package
(BSP) and writing the software applications. The configuration of the BSP includes the
selection of device drivers and libraries.

Fig 3.20: after H/W and S/W Specification netlist generated the block diagram

3.11. Generating the linker script file

From the System Assembly view, copy the starting address of DDR_SDRAM.

In the Project Information Area, on the Applications tab, right-click the project and select

Compiler Options.

In the compiler options, paste the starting address.

Generate the linker script by selecting the Generate Linker Script option from the same


3.12. Building the User Application

In EDK 10.1, XPS provides the user with the ability to create multiple software
projects. These projects can include source files, header files, and linker scripts. Unique
software projects allow the designer to specify the following options for each software

Specify compiler options

Specify which projects to compile

Specify which projects to download

Build entire projects

Software application code development can be managed by selecting the Applications

tab as shown. The Base System Builder (BSB) generates a sample application which tests a
subset of the peripherals included in the design.

Compiling the Code

Using the GNU GCC Compiler, compile the application code as follows:

Select Software → Build All User Applications to run mb-gcc. mb-gcc compiles the
source files.

3.13. Downloading the Design

Now that the hardware and software designs are completed, the device can be
configured. Follow these steps to download and configure the FPGA:

Connect the host computer to the target board, including connecting the Xilinx USB
download cable and the serial cable.

Start a HyperTerminal session with the following settings:
- COM1 (this depends on the COM port your serial cable is connected to)
- Bits per second: 57600

Connect the board power


In EDK, select Device Configuration → Update Bitstream. This will update the bit
file with the compiled application code. Repeat this step each time the application

Select Device Configuration → Download Bitstream. This will start the device
configuration software (iMPACT) within EDK and execute the download command
file etc/download.cmd.

iMPACT will download the file download.bit to the FPGA.

Fig 3.21. FPGA Physical diagram

Fig 3.22. FPGA hardware output diagram


.ELF file generation

After the hardware and software have been downloaded through the .bit file, the .elf file is generated by

selecting the Debug option in the menu.

Before debugging, set the options for JTAG activation.

Select the Debug option in the menu and launch XMD.

The ELF file window will be shown as in fig 3.20.


3.15. Conclusion

The implementation requirements, including the primary inputs and primary outputs
of the design and the notation and conventions used, were discussed.

The general implementation flow of the design was presented and explained.

The implementation details were discussed, including the implementation style of
each process.

Finally, the synthesis process was discussed, showing the FPGA family in which
the design has been implemented.


4.1. Introduction
The purpose of the Design is to walk you through a complete hardware and software
processor system design. In this process, you will use the BSB of the XPS system to
automatically create a processor system and then add a custom OPB peripheral (adder circuit)
to that processor system which will consist of the following items:

Fig.4.1: FPGA Internal Diagram

MicroBlaze processor
Local Memory Bus (LMB)
LMB BRAM controllers for BRAM
BRAM block (on-chip memory)
On-chip Peripheral Bus (OPB)
Debug Module (OPB_MDM)
Two General Purpose Input/Output peripherals (OPB_GPIOs)
Push buttons
DIP switches
Custom peripheral (32-bit adder circuit)

4.2 MicroBlaze Processor Design

Field-programmable gate arrays (FPGAs) are flexible, reusable, high-density circuits that
can be easily reconfigured by the designer, enabling the VLSI design, validation and
simulation cycle to be performed more quickly and at lower cost. Increasing device
densities have prompted FPGA manufacturers, such as Xilinx and Altera, to incorporate
larger embedded components, including multipliers, DSP blocks and even embedded
processors. One of the recent architectural enhancements in the Xilinx Spartan and Virtex
families is the introduction of the MicroBlaze soft-core and PowerPC 405 hard-core
embedded processors. The MicroBlaze processor is a 32-bit Harvard Reduced Instruction Set
Computer (RISC) architecture optimized for implementation in Xilinx FPGAs, with separate
32-bit instruction and data buses running at full speed to execute programs and access
data from both on-chip and external memory at the same time.

4.3. MicroBlaze System Description

In general, to design an embedded processor system, you need the following:

Hardware components

Memory map

Software application

4.3.1. Design Hardware

The MicroBlaze (MB) tutorial design includes the following hardware components:


Local Memory Bus (LMB)




Multi-Port Memory Controller (MPMC)




4.3.2. Design Memory Map

The following table shows the memory map for the tutorial design as created by Base
System Builder.









16K bytes
64K bytes
64K bytes
64K bytes
64K bytes

LMB Memory

0X8600_0000 0x87FF_FFFF 32Mbytes

Table 4.1: Design memory map
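
As a rough check on the one address range that survives in Table 4.1, the size of a region
follows from its base and high addresses, and two regions can be tested for overlap. The
sketch below is illustrative only: `region_size` and `overlaps` are hypothetical helper
names, and the 16K-byte region placed at address 0 is an assumed placement, not a row
taken from the table.

```python
def region_size(base: int, high: int) -> int:
    """Number of bytes in the inclusive address range [base, high]."""
    return high - base + 1


def overlaps(a, b) -> bool:
    """True if two inclusive (base, high) ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


# Range listed in Table 4.1.
big_region = (0x8600_0000, 0x87FF_FFFF)
# Assumed placement of a 16K-byte region, for illustration only.
small_region = (0x0000_0000, 0x0000_3FFF)

if __name__ == "__main__":
    print(region_size(*big_region) // (1024 * 1024), "Mbytes")
    print(region_size(*small_region) // 1024, "Kbytes")
    print("overlap:", overlaps(big_region, small_region))
```

The same check can be applied to every pair of rows in a memory map before the hardware
platform is generated.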


4.4. Background
The backbone of the architecture is a single-issue, 3-stage pipeline with 32
general-purpose registers (it does not have any address registers like the Motorola 68000
processor), an Arithmetic Logic Unit (ALU), a shift unit, and two levels of interrupt.
This basic design can then be configured with more advanced features to tailor it to the
exact needs of the target embedded application, such as a barrel shifter, divider,
multiplier, single-precision floating-point unit (FPU), instruction and data caches,
exception handling, debug logic, Fast Simplex Link (FSL) interfaces and others.
This flexibility allows the user to balance the required performance of the target
application against the logic area cost of the soft processor. MicroBlaze also supports
reset, interrupt, user exception, and break hardware exceptions. For interrupts,
MicroBlaze supports only one external interrupt source (connected to the Interrupt input
port). If multiple interrupts are needed, an interrupt controller must be used to handle
multiple interrupt requests to MicroBlaze, as shown in Figure 4.2. An interrupt controller
is available for use with the Xilinx Embedded Development Kit (EDK) software tools. The
processor will only react to interrupts if the Interrupt Enable (IE) bit in the Machine
Status Register (MSR) is set to 1. On an interrupt, the instruction in the execution stage
will complete, while the instruction in the decode stage is replaced by a branch to the
interrupt vector (address 0x10).
The interrupt return address (the PC associated with the instruction in the decode
stage at the time of the interrupt) is automatically loaded into general-purpose register
R14. In addition, the processor also disables future interrupts by clearing the IE bit in
the MSR. The IE bit is automatically set again when executing the RTID instruction.
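
The interrupt sequence described above can be pictured with a small behavioural model. This
is a simplified sketch, not the processor's implementation: the class `Cpu` and its field
names are hypothetical, the pipeline itself is not modelled, and the return instruction is
assumed to resume exactly at the saved address.

```python
from dataclasses import dataclass

INTERRUPT_VECTOR = 0x10  # interrupt vector address given in the text


@dataclass
class Cpu:
    pc_decode: int       # PC of the instruction currently in the decode stage
    ie: bool = True      # Interrupt Enable (IE) bit of the MSR
    return_reg: int = 0  # general-purpose register that receives the return address
    next_pc: int = 0     # address the processor continues from

    def interrupt(self) -> bool:
        """Model an external interrupt request; return True if it is taken."""
        if not self.ie:
            return False                    # IE = 0: the request is ignored
        self.return_reg = self.pc_decode    # save PC of the replaced decode-stage instruction
        self.ie = False                     # block further interrupts
        self.next_pc = INTERRUPT_VECTOR     # branch to the interrupt vector
        return True

    def rtid(self) -> None:
        """Model the return-from-interrupt instruction (simplified)."""
        self.ie = True                      # interrupts are enabled again
        self.next_pc = self.return_reg      # resume the interrupted program


if __name__ == "__main__":
    cpu = Cpu(pc_decode=0x200)
    print("taken:", cpu.interrupt(), "next PC:", hex(cpu.next_pc))
    print("second request taken:", cpu.interrupt())
    cpu.rtid()
    print("after return, IE =", cpu.ie, "next PC:", hex(cpu.next_pc))
```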

Fig 4.2: MicroBlaze architecture block diagram

Owing to advances in fabrication technology and the increasing density of logic blocks
on FPGAs, the use of FPGAs is no longer limited to debugging and prototyping digital
electronic circuits. Because of the enormous parallelism achievable on an FPGA, it is now
being used as a replacement for ASIC solutions in applications where time to market is
critical, and entire embedded processor systems are implemented on these devices with
soft-core processors embedded in the system. Soft cores are technology independent and
require only simulation and timing verification after being synthesized to a target
technology. This reduces the design cycle time by a major factor compared with the
development cycle for a hard-core processor, and has the advantage that the soft-core
design can be customized for a specific application.

4.5. Features
The MicroBlaze soft core processor is highly configurable, allowing you to select a
specific set of features required by your design.
The fixed feature set of the processor includes:
Thirty-two 32-bit general purpose registers
32-bit instruction word with three operands and two addressing modes
32-bit address bus
Single issue pipeline
In addition to these fixed features, the MicroBlaze processor is parameterized to allow
selective enabling of additional functionality. Older (deprecated) versions of MicroBlaze
support a subset of the optional features described here. Only the latest (preferred) version of
MicroBlaze (v7.00) supports all options. Xilinx recommends that all new designs use the
latest preferred version of the MicroBlaze processor.

4.6 Pipeline Architecture

MicroBlaze instruction execution is pipelined. For most instructions, each stage takes
one clock cycle to complete. Consequently, the number of clock cycles necessary for a
specified instruction to complete is equal to the number of pipeline stages, and one
instruction is completed in every cycle. A few instructions require multiple clock cycles
in the execute stage to complete. This is achieved by stalling the pipeline.
When executing from slower memory, instruction fetches may take multiple cycles. This
additional latency directly affects the efficiency of the pipeline. MicroBlaze implements
an instruction prefetch buffer that reduces the impact of such multi-cycle instruction
memory latency. While the pipeline is stalled by a multi-cycle instruction in the
execution stage, the prefetch buffer continues to load sequential instructions. When the
pipeline resumes execution, the fetch stage can load new instructions directly from the
prefetch buffer instead of waiting for the instruction memory access to complete.

4.7 Three Stage Pipeline

When area optimization is enabled, the pipeline is divided into three stages to
minimize hardware cost: Fetch, Decode, and Execute.
Table: Three-stage pipeline timing for Instruction 1 to Instruction 3 over cycle 1 to cycle 7

4.8 Five Stage Pipeline

When area optimization is disabled, the pipeline is divided into five stages to
maximize performance: Fetch (IF), Decode (OF), Execute (EX), Access Memory (MEM),
and Write back (WB).
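
To make the pipeline timing concrete, the following sketch lists which stage each
instruction occupies in each cycle for both pipeline depths. It assumes an ideal pipeline
with no stalls (every stage takes one cycle); `pipeline_schedule` and `total_cycles` are
hypothetical helper names introduced for illustration.

```python
THREE_STAGE = ["Fetch", "Decode", "Execute"]
FIVE_STAGE = ["IF", "OF", "EX", "MEM", "WB"]


def pipeline_schedule(stages, n_instructions):
    """Return {instruction number: {cycle: stage}} for a stall-free pipeline."""
    schedule = {}
    for i in range(n_instructions):
        # Instruction i+1 enters the first stage in cycle i+1,
        # then advances by one stage per clock cycle.
        schedule[i + 1] = {i + 1 + s: stage for s, stage in enumerate(stages)}
    return schedule


def total_cycles(stages, n_instructions):
    """Cycles until the last instruction leaves the last stage (no stalls)."""
    return len(stages) + n_instructions - 1


if __name__ == "__main__":
    for name, stages in (("3-stage", THREE_STAGE), ("5-stage", FIVE_STAGE)):
        print(name, "pipeline,", total_cycles(stages, 3), "cycles for 3 instructions")
        for instr, timeline in pipeline_schedule(stages, 3).items():
            cells = ", ".join(f"cycle {c}: {s}" for c, s in timeline.items())
            print(f"  Instruction {instr}: {cells}")
```

Once the pipeline is full, one instruction completes per cycle in both cases; the deeper
pipeline only adds latency for the first instruction.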

4.8.1. Memory Architecture

MicroBlaze is implemented with a Harvard memory architecture; instruction and data
accesses are done in separate address spaces. Each address space has a 32-bit range (that
is, it handles up to 4 GB of instruction and data memory respectively). The instruction
and data memory ranges can be made to overlap by mapping them both to the same physical
memory. The latter is useful for software debugging. Both instruction and data interfaces
of MicroBlaze are 32 bits wide and use big-endian, bit-reversed format. MicroBlaze supports
word, halfword, and byte accesses to data memory. Data accesses must be aligned (word
accesses must be on word boundaries, halfword accesses on halfword boundaries), unless the
processor is configured to support unaligned exceptions. All instruction accesses must be
word aligned.
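
The alignment rule and the byte order can be illustrated with a few lines of code. This is
a minimal sketch with hypothetical helper names; "bit-reversed" is read here as meaning
that bit 0 is the most significant bit of a word, which is an interpretation of the text
rather than something it states explicitly.

```python
import struct

ACCESS_SIZES = {"byte": 1, "halfword": 2, "word": 4}


def is_aligned(address: int, kind: str) -> bool:
    """A data access is aligned when its address is a multiple of the access size."""
    return address % ACCESS_SIZES[kind] == 0


def word_to_memory(value: int) -> bytes:
    """Big-endian layout: the most significant byte is stored at the lowest address."""
    return struct.pack(">I", value)


def bit(value: int, n: int) -> int:
    """Bit n of a 32-bit word when bit 0 is the most significant bit (assumed numbering)."""
    return (value >> (31 - n)) & 1


if __name__ == "__main__":
    print(is_aligned(0x1002, "halfword"), is_aligned(0x1002, "word"))
    print(word_to_memory(0x12345678).hex())
    print(bit(0x80000000, 0), bit(0x80000000, 31))
```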
MicroBlaze does not separate data accesses to I/O and memory (it uses memory-mapped
I/O). The processor has up to three interfaces for memory accesses. The LMB memory address
range must not overlap with the PLB, OPB or XCL ranges. MicroBlaze has a single-cycle
latency for accesses to local memory (LMB) and for cache read hits, except with area
optimization enabled, when data-side accesses and data cache read hits require two clock
cycles. A data cache write normally has two cycles of latency (more if the posted-write
buffer in the memory controller is full).
The MicroBlaze instruction and data caches can be configured to use 4- or 8-word
cache lines. When using a longer cache line, more bytes are prefetched, which generally
improves performance for software with sequential access patterns.
However, for software with a more random access pattern the performance can
instead decrease for a given cache size. This is caused by a reduced cache hit rate due to
fewer available cache lines.
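
The trade-off between 4-word and 8-word cache lines can be reproduced with a toy cache
model. The sketch below assumes a direct-mapped cache and a deliberately tiny, hypothetical
cache size of 256 bytes; the function name `hit_rate` and both access patterns are chosen
for illustration and are not measurements from the design.

```python
def hit_rate(addresses, cache_bytes, line_words, word_bytes=4):
    """Hit rate of a direct-mapped cache for a sequence of byte addresses."""
    line_bytes = line_words * word_bytes
    n_lines = cache_bytes // line_bytes      # longer lines mean fewer lines
    stored = [None] * n_lines                # memory line currently held by each cache line
    hits = 0
    for addr in addresses:
        line_addr = addr // line_bytes       # memory line containing this address
        index = line_addr % n_lines          # cache line it maps to
        if stored[index] == line_addr:
            hits += 1
        else:
            stored[index] = line_addr        # miss: the whole line is fetched
    return hits / len(addresses)


CACHE_BYTES = 256                            # hypothetical cache size
sequential = list(range(0, 1024, 4))         # consecutive word accesses
scattered = [0x000, 0x110] * 50              # two distant words accessed alternately

if __name__ == "__main__":
    for words in (4, 8):
        print(f"{words}-word lines: sequential {hit_rate(sequential, CACHE_BYTES, words):.3f}, "
              f"scattered {hit_rate(scattered, CACHE_BYTES, words):.3f}")
```

With these inputs the sequential pattern benefits from the longer line, while the two
scattered addresses fit side by side in the 16-line cache but evict each other in the
8-line cache.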

Local Memory Bus (LMB)

Processor Local Bus (PLB)

On-chip Peripheral Bus (OPB)

Xilinx Cache Link (XCL).

4.9 MicroBlaze I/O Overview

The core interfaces shown in Figure 1-1 are defined as follows:
DPLB: Data interface, Processor Local Bus
DOPB: Data interface, On-chip Peripheral Bus
DLMB: Data interface, Local Memory Bus (BRAM only)
IPLB: Instruction interface, Processor Local Bus
IOPB: Instruction interface, On-chip Peripheral Bus
ILMB: Instruction interface, Local Memory Bus (BRAM only)
MFSL 0-15: FSL master interfaces
SFSL 0-15: FSL slave interfaces
IXCL: Instruction side Xilinx Cache Link interface (FSL master/slave pair)
DXCL: Data side Xilinx Cache Link interface (FSL master/slave pair)
Core: Miscellaneous signals for: clock, reset, debug, and trace.

Processor Local Bus (PLB) Interface Description: The MicroBlaze PLB interfaces are
implemented as byte-enable capable 32-bit masters.

On-Chip Peripheral Bus (OPB) Interface Description: The MicroBlaze OPB interfaces are
implemented as byte-enable capable masters.

Local Memory Bus (LMB) Interface Description: The LMB is a synchronous bus used
primarily to access on-chip block RAM. It uses a minimum number of control signals and a
simple protocol to ensure that local block RAM is accessed in a single clock cycle. LMB
signals and definitions are shown in the following table. All LMB signals are active high.

4.10. Experimental setup

4.10.1 Xilinx Platform Studio
The Xilinx Platform Studio (XPS) is the development environment, or GUI, used for
designing the hardware portion of your embedded processor system.
Embedded Development Kit: The Xilinx Embedded Development Kit (EDK) is an integrated
software tool suite for developing embedded systems with Xilinx MicroBlaze and PowerPC
CPUs. EDK includes a variety of tools and applications to assist the designer in
developing an embedded system, from hardware creation to the final implementation of the
system on an FPGA. System design consists of the creation of the hardware and software
components of the embedded processor system; the creation of a verification component is
optional. A typical embedded system design project involves hardware platform creation,
hardware platform verification (simulation), software platform creation, software
application creation, and software verification. Base System Builder is the wizard used to
automatically generate a hardware platform according to the user specifications, which are
defined in the MHS (Microprocessor Hardware Specification) file. The MHS file defines the
system architecture, peripherals and embedded processors. The Platform Generation tool
creates the hardware platform using the MHS file as input.

Fig 4.3. Embedded Development Kit Design Flow


The creation of the verification platform is optional and is based on the hardware
platform. The MHS file is taken as an input by the Sim-gen tool to create simulation files
for a specific simulator. Three types of simulation models can be generated by the Sim-gen
tool: behavioral, structural and timing models.
Some other useful tools are available in EDK. Platform Studio provides the GUI for
creating the MHS and MSS files. The Create / Import IP Wizard allows designers to create
their own peripherals and import them into EDK projects. The Bitstream Initializer tool
initializes the instruction memory of processors on the FPGA. GNU compiler tools are used
for compiling and linking application executables for each processor in the system [8].
There are two options for debugging an application created using EDK: Xilinx
Microprocessor Debug (XMD), which debugs the application software using a Microprocessor
Debug Module (MDM) in the embedded processor system, and Software Debugger, which invokes
the software debugger corresponding to the compiler being used for the processor.
Software Development Kit: The Xilinx Platform Studio Software Development Kit (SDK) is an
integrated development environment, complementary to XPS, that is used for C/C++ embedded
software application creation and verification. The software application can be written in
C or C++. Once it is written, the embedded processor system for the user application is
complete, and the bit file can be debugged and downloaded into the FPGA. The FPGA then
behaves like the processor implemented on it.

Fig 4.4: Hardware and Software flow

4.11. Design Flow

To build an embedded system on Xilinx FPGAs, the Embedded Development Kit (EDK) is
used to complete the reconfigurable design. Fig 4.5 shows the design flow. Unlike the
design flow in traditional software design using the C/C++ language, or hardware design
using hardware description languages, the EDK enables the integration of both the hardware
and software components of an embedded system.
On the hardware side, the design entry in VHDL/Verilog is first synthesized into a
gate-level netlist, then translated into primitives and mapped onto specific device
resources such as look-up tables, flip-flops, and block memories. These device resources
and their interconnections are then placed and routed to meet the timing constraints. A
downloadable .bit file is created for the whole hardware platform. The software side
follows the standard embedded software flow to compile the source code into a file in
Executable and Linkable Format (ELF). Meanwhile, a microprocessor software specification
(MSS) file and a microprocessor hardware specification (MHS) file are used to define the
software structure and hardware connections of the system. The EDK uses these files to
control the design flow and eventually merge the system into a single downloadable file.
The whole design runs on a real-time operating system (RTOS).

Fig 4.5: Design flow

4.12. FPGA Design flow

The FPGA-based design flow is used extensively today because of its advantages, such
as short design time and short time to market. It allows designers to implement a VLSI
design in a very short time, cater to customer needs and make last-minute changes. The
FPGA-based design flow consists of different stages, as shown in Fig. 4.6.

Fig. 4.6. FPGA Design Flow

Design Entry




Configuring or Programming the target device.

4.13. Design Entry and Simulation

The major drawback of traditional design methods is the manual translation of the design
description into a set of logical equations. This step can be entirely eliminated with
hardware description languages (HDLs). For example, most HDL tools, such as VHDL and
Verilog HDL tools, allow the use of finite state machines for sequential systems and truth
tables for combinatorial modules. Such design descriptions can be automatically converted
into HDL code that can be implemented by synthesis tools. Hardware description languages
find their principal application in programmable logic devices (PLDs) of various
complexities, from simple PLDs up to complex CPLDs and FPGAs. There are several HDLs in
use today. The most popular ones are VHDL (Very High Speed Integrated Circuit HDL),
Verilog HDL and Abel.

4.13.1. Hardware implementation ISE/XPS Flow

The ISE/XPS flow provides integration of a processor system as a component in an FPGA
design at two levels:
The processor system is the top-level design
The processor system is a submodule
Once the processor system has been added to the ISE project, XPS can be invoked from ISE
by selecting the .xmp file in the Sources window and double-clicking Manage Processor
System in the Processes window.
Add the user constraint file in ISE.
The software flow is performed in four stages:

Pre-processor: Replaces all macros with their definitions as defined in the .c or .h files
Machine-specific and language-specific compiler: Compiles C/C++ code
Assembler: Converts code to machine language and generates the object file
Linker: Links all the object files using a user-defined or default linker script

4.14 Spartan-3E Starter KIT

The Spartan-3E Starter Kit board, shown below in Fig 4.7, highlights the unique features
of the Spartan-3E FPGA family and provides a convenient development board for embedded
processing applications.
The board highlights these features:
Spartan-3E specific features
Parallel NOR Flash configuration
Multi Boot FPGA configuration from Parallel NOR Flash PROM
SPI serial Flash configuration
Embedded development
MicroBlaze 32-bit embedded RISC processor
PicoBlaze 8-bit embedded controller
DDR memory interfaces


Fig 4.7: Xilinx Spartan 3E FPGA kit

4.14.1. Key component features
The key features of the Spartan-3E Starter Kit board are:
Xilinx XC3S500E Spartan-3E FPGA
  Up to 232 user-I/O pins
  320-pin FBGA package
  Over 10,000 logic cells
Xilinx 4 Mbit Platform Flash configuration PROM
Xilinx 64-macrocell XC2C64A CoolRunner CPLD
64 MByte (512 Mbit) of DDR SDRAM, x16 data interface, 100+ MHz
16 MByte (128 Mbit) of parallel NOR Flash (Intel StrataFlash)
  FPGA configuration storage
  MicroBlaze code storage/shadowing
16 Mbits of SPI serial Flash (STMicro)
  FPGA configuration storage
  MicroBlaze code shadowing
2-line, 16-character LCD screen
PS/2 mouse or keyboard port
VGA display port
10/100 Ethernet PHY (requires Ethernet MAC in FPGA)
Two 9-pin RS-232 ports (DTE- and DCE-style)
On-board USB-based FPGA/CPLD download/debug interface
50 MHz clock oscillator
SHA-1 1-wire serial EEPROM for bitstream copy protection
Hirose FX2 expansion connector
Three Digilent 6-pin expansion connectors
Four-output, SPI-based Digital-to-Analog Converter (DAC)
Two-input, SPI-based Analog-to-Digital Converter (ADC) with programmable-gain preamplifier
ChipScope SoftTouch debugging port
Rotary encoder with push-button shaft
Eight discrete LEDs
Four slide switches

4.15. Configuration Methods

A typical FPGA application uses a single non-volatile memory to store configuration
images. To demonstrate new Spartan-3E capabilities, the starter kit board has three different
configuration memory sources that all need to function well together. The extra configuration
functions make the starter kit board more complex than typical Spartan-3E applications.
The starter kit board also includes an on-board USB-based JTAG programming interface. The
on-chip circuitry simplifies the device programming experience. In typical applications, the
JTAG programming hardware resides off-board or in a separate programming module, such
as the Xilinx Platform USB cable.

4.16. Voltages for all Applications


The Spartan-3E Starter Kit board showcases a triple-output regulator developed by
Texas Instruments, the TPS75003, specifically to power Spartan-3 and Spartan-3E FPGAs.
This regulator is sufficient for most stand-alone FPGA applications. However, the starter
kit board includes DDR SDRAM, which requires its own high-current supply. Similarly, the
USB-based JTAG download solution requires a separate 1.8V supply.

4.17. JTAG
The primary purpose of JTAG is to allow a computer to take control of the state of all
the I/O pins on a board. In turn, this allows each device's connectivity to the other
devices on the board to be tested. Standard JTAG commands can be used for this purpose.
FPGAs are JTAG-aware, so all the FPGA I/O pins can be controlled from the JTAG
interface. FPGAs add the ability to be configured through JTAG (using proprietary JTAG
commands).
JTAG consists of 4 signals: TDI, TDO, TMS and TCK. A fifth pin, TRST, is optional.
A single JTAG port can connect to one or multiple devices (as long as they are all
JTAG-aware parts). With multiple devices, you create what is called a "JTAG chain". TMS
and TCK are tied to all the devices directly, but TDI and TDO form a chain: TDO from one
device goes to TDI of the next one in the chain. The master controlling the chain (usually
a computer) closes the chain.
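
The TDI/TDO daisy chain can be sketched as a shift register spread across the devices. The
model below is an illustration under stated assumptions: every device is assumed to be in
the standard BYPASS mode (a single one-bit register between TDI and TDO), TMS and the TAP
state machine are not modelled, and the names `BypassDevice` and `shift_through_chain` are
hypothetical.

```python
class BypassDevice:
    """A JTAG device in BYPASS mode: one flip-flop between its TDI and TDO pins."""

    def __init__(self):
        self.bit = 0

    def clock(self, tdi: int) -> int:
        """One TCK pulse: drive the stored bit on TDO, then capture TDI."""
        tdo = self.bit
        self.bit = tdi
        return tdo


def shift_through_chain(devices, bits_in):
    """Shift bits into the first TDI and collect what appears on the last TDO."""
    bits_out = []
    for bit in bits_in:
        # TCK is shared by all devices; TDO of each device feeds TDI of the next.
        for dev in devices:
            bit = dev.clock(bit)
        bits_out.append(bit)
    return bits_out


if __name__ == "__main__":
    chain = [BypassDevice() for _ in range(3)]
    pattern = [1, 0, 1, 1, 0, 0, 0]
    print("in :", pattern)
    print("out:", shift_through_chain(chain, pattern))
```

In this model the pattern reappears at the end of the chain delayed by one TCK cycle per
device, which is how the master closing the chain can count the devices in it.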

4.18. RS232
As shown in Figure 4.7, the Spartan-3E Starter Kit board has two RS-232 serial ports:
a female DB9 DCE connector and a male DTE connector. The DCE-style port connects
directly to the serial port connector available on most personal computers and workstations
via a standard straight-through serial cable. Null modem, gender changers, or crossover
cables are not required.
Use the DTE-style connector to control other RS-232 peripherals, such as modems or
printers, or perform simple loop back testing with the DCE connector.

Fig 4.8: RS 232 Serial ports

The FPGA supplies serial output data using LVTTL or LVCMOS levels to the Maxim
device, which in turn converts the logic value to the appropriate RS-232 voltage level.
Likewise, the Maxim device converts the RS-232 serial input data to LVTTL levels for the
FPGA. A series resistor between the Maxim output pin and the FPGA's RXD pin protects
against accidental logic conflicts.
Hardware flow control is not supported on the connector. The port's DCD, DTR, and
DSR signals connect together, as shown in Figure 5.4. Similarly, the port's RTS and CTS
signals connect together.

4.19. Universal Asynchronous Receiver/Transmitter (UART)

4.19.1. Introduction
The Universal Asynchronous Receiver Transmitter (UART) is a popular and widely used
device for data communication in the field of telecommunication. There are different
versions of UARTs in the industry. Some of them contain FIFOs for receiver/transmitter
data buffering and some of them have a 9-data-bit mode (Start bit + 9 Data bits + Parity +
Stop bits). This application note describes a fully configurable UART optimized for and
implemented in a variety of Lattice devices, which have superior performance and
architecture compared to existing semiconductor ASSPs (application-specific standard
products). This UART reference design contains a receiver and a transmitter.
The receiver performs serial-to-parallel conversion on the asynchronous data frame
received from the serial data input SIN. The transmitter performs parallel-to-serial
conversion on the 8-bit data received from the CPU. In order to synchronize the
asynchronous serial data and to ensure data integrity, Start, Parity and Stop bits are
added to the serial data. An example of the UART frame format is shown in Figure 4.9 below.

Figure 4.9. UART Frame Format: (1 Start Bit, 8 Data Bits, 1 Parity Bit, 1 Stop Bit)
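
The frame format in Figure 4.9 can be written out bit by bit. This is a minimal sketch with
a hypothetical function name `uart_frame`; sending the least significant data bit first
and using 0 for the start bit and 1 for the stop bit follow common UART practice and are
assumptions here, since the text does not state them.

```python
def uart_frame(byte: int, parity: str = "even") -> list:
    """Line levels for one frame: 1 start bit, 8 data bits, 1 parity bit, 1 stop bit."""
    data = [(byte >> i) & 1 for i in range(8)]   # data bits, LSB first (assumed order)
    ones = sum(data)
    if parity == "even":
        parity_bit = ones % 2                    # makes the total number of ones even
    else:
        parity_bit = 1 - ones % 2                # makes the total number of ones odd
    return [0] + data + [parity_bit] + [1]       # start bit low, stop bit high


if __name__ == "__main__":
    print(uart_frame(0x41))            # character 'A' with even parity
    print(uart_frame(0x41, "odd"))
```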
This design can also be instantiated many times to get multiple UARTs in the same
device. To make the design easy to embed in a larger implementation, the bi-directional
data bus is separated into two buses, DIN and DOUT, instead of using tri-state buffers.
The transmitter and receiver share a common internal Clk16X clock. This internal clock,
which needs to run at 16 times the desired baud rate, is obtained from the on-board clock
directly through the MCLK input.
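
When the board clock is not already 16 times the baud rate, a divider is usually placed in
front of Clk16X. The following sketch estimates such a divisor and the resulting baud-rate
error for the 50 MHz oscillator and the 57600 baud setting mentioned elsewhere in this
document. The divider itself and the name `clk16x_divisor` are assumptions for
illustration; the reference design described above takes Clk16X directly from MCLK.

```python
def clk16x_divisor(clock_hz: int, baud: int):
    """Return (divisor, actual baud rate, error in percent) for a 16x baud clock."""
    divisor = round(clock_hz / (16 * baud))      # nearest integer divide ratio
    actual_baud = clock_hz / (16 * divisor)      # baud rate actually produced
    error_pct = 100.0 * (actual_baud - baud) / baud
    return divisor, actual_baud, error_pct


if __name__ == "__main__":
    divisor, actual, error = clk16x_divisor(50_000_000, 57_600)
    print(f"divisor {divisor}, actual baud {actual:.1f}, error {error:.2f} %")
```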
4.19.2. Features

Functionally compatible with the NS16450 UART.

Faster performance than industry standard hardwired devices.

Inserts or extracts standard asynchronous communication bits (Start, Stop and Parity)
to or from the serial data.

Holding and shifting registers eliminate the need for precise synchronization between
the CPU and serial data.

Standard CPU Interface.

Fully prioritized interrupt system control.

MODEM interface functions (CTS, RTS, DSR, DTR, RI and DCD)

Fully programmable serial interface characteristics:

a) 5, 6, 7 or 8-bit characters
b) Even, odd, or no-parity bit generation and detection
c) 1, 1.5 or 2-stop bit generation and detection

False Start bit detection

Interactive control signaling and status reporting capabilities

Separate input and output data buses for use as an embedded module in a larger design

Receiver synchronizes off the Start bit

Receiver samples all incoming bits at the center of each bit.
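
Three of the receiver features listed above (synchronizing off the Start bit, false Start
bit detection, and sampling at the center of each bit) can be sketched with a 16x
oversampled line. This is a simplified software model, not the reference design: the frame
is assumed to be 1 start bit, 8 data bits sent LSB first and 1 stop bit with no parity,
and the names `oversample` and `receive_byte` are hypothetical.

```python
def oversample(bits, factor=16):
    """Expand each bit into `factor` consecutive line samples."""
    return [b for b in bits for _ in range(factor)]


def receive_byte(samples, factor=16):
    """Recover one character from oversampled line samples, or None on failure."""
    i, n = 0, len(samples)
    while i < n:
        if samples[i] == 1:              # line idle (high): keep looking for a start bit
            i += 1
            continue
        mid = i + factor // 2            # center of the candidate start bit
        if mid >= n or samples[mid] != 0:
            i += 1                       # false start bit (glitch): resume the search
            continue
        value = 0
        for k in range(8):
            pos = mid + (k + 1) * factor # center of data bit k
            if pos >= n:
                return None
            value |= samples[pos] << k   # data bits arrive LSB first (assumed)
        stop = mid + 9 * factor          # center of the stop bit
        if stop < n and samples[stop] == 1:
            return value
        return None                      # framing error: stop bit not high
    return None


if __name__ == "__main__":
    frame = [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]          # start, 0x41 LSB first, stop
    line = [1] * 16 + [0] * 3 + [1] * 16 + oversample(frame + [1])
    print(hex(receive_byte(line)))
```

The three low samples before the frame model a short glitch; because the line is high
again at the center of the would-be start bit, the receiver rejects it and keeps searching.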

4.19.3. Operations Overview

Fig.4.10. UART General Block Diagram

Thus, from the general overview of the UART, we extract the required functionality,
mainly that of the transmitter, the receiver and the baud-rate generator, to develop the
software implementation of the UART as the serial data communication protocol required for
interfacing the FPGA-based AES processor with the PC.

4.20. Conclusion
This chapter discussed the hardware implementation of the project and described each of
the blocks in the block diagram.


5.1. Introduction
Any discussion of AES must begin with DES, the original Data Encryption Standard.
DES was selected as a Federal Information Processing Standard (FIPS) for the United States
in 1976. In 1977 the National Bureau of Standards (now the National Institute of Standards
and Technology, or NIST) adopted an IBM-designed cipher that encrypted 64-bit blocks
under 56-bit keys as the Data Encryption Standard (DES). It became widely used
internationally in many commercial applications, including financial transactions. The
algorithm remained controversial because of suspicions that the National Security Agency
had introduced deliberate weaknesses. With only 56 bits of key, DES is now obsolete. In
its place many people now use triple-DES, a repeated application of an algorithm that does
not perform particularly well. In 1997, NIST announced its desire to choose a successor
to DES, which could no longer be considered secure because of its small key size and the
increased availability of computing power. NIST therefore announced a competition for an
Advanced Encryption Standard (AES), an algorithm with 128-bit blocks and 128-, 192-, and
256-bit keys, to replace DES. NIST sought a symmetric-key algorithm for sensitive,
unclassified information. The chosen algorithm would have to be available royalty-free
worldwide. Winners would get fame and glory, and probably a lot of consulting, and AES
would undoubtedly become one of the most widely used cryptographic algorithms in the
world. In 1998 twenty-one industry and academic groups offered candidates; fifteen met
NIST's submission criteria.


On October 2, 2000, NIST announced its choice for the Advanced Encryption
Standard: Rijndael (pronounced "Rhine Dahl"), an algorithm developed by two Belgian
cryptographers, Joan Daemen and Vincent Rijmen. Rijndael should appeal to
mathematicians; the cryptosystem is quite algebraic. Rijndael repeats rounds, with the
number of rounds determined by key size. In the 128-bit key version, Rijndael runs for 10
rounds. As specified in the call for algorithms, Rijndael operates on a 128-bit block of
data. It divides the block into sixteen 8-bit bytes and treats these as elements of
GF(2^8), defined by the polynomial x^8 + x^4 + x^3 + x + 1, which is irreducible over
Z/2Z. The data are placed in a 4 x 4 array, and all operations occur on the bytes of the
array. Each round consists of four operations: one transforms the bytes, one transforms the
rows, one transforms the columns, and one adds in the key. First, each of the bytes is
modified by maps easily described in the arithmetic of GF(2^8): inversion (with zero mapped
to itself) and an affine transformation; then the rows of the array are shifted circularly,
with the bytes of row i moving i - 1 locations to the left. Next the bytes in each column
are mixed by multiplication: view the column elements as coefficients of a polynomial of
degree 3, and multiply this polynomial by 3x^3 + x^2 + x + 2 modulo x^4 + 1. The last
operation is an XOR of the key bits with the elements of the array.
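
The field arithmetic behind the column-mixing step can be sketched in a few lines. The
code below multiplies bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (written as 0x11B)
and applies the multiplication by 3x^3 + x^2 + x + 2 modulo x^4 + 1 to one column. The
function names are hypothetical, and this is an illustrative software sketch, not the
implementation used in this project.

```python
def xtime(a: int) -> int:
    """Multiply a field element by x, reducing by x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:          # degree 8 term present: subtract (XOR) the field polynomial
        a ^= 0x11B
    return a


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a    # addition in GF(2^8) is XOR
        a = xtime(a)
        b >>= 1
    return result


def mix_column(col):
    """Multiply one state column by 3x^3 + x^2 + x + 2 modulo x^4 + 1."""
    a0, a1, a2, a3 = col
    return [
        gf_mul(2, a0) ^ gf_mul(3, a1) ^ a2 ^ a3,
        a0 ^ gf_mul(2, a1) ^ gf_mul(3, a2) ^ a3,
        a0 ^ a1 ^ gf_mul(2, a2) ^ gf_mul(3, a3),
        gf_mul(3, a0) ^ a1 ^ a2 ^ gf_mul(2, a3),
    ]


def add_round_key(column, key_column):
    """The last operation of a round: XOR the key bytes into the array."""
    return [s ^ k for s, k in zip(column, key_column)]


if __name__ == "__main__":
    print(hex(gf_mul(0x57, 0x83)))
    print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Because x^4 + 1 is used as the modulus, the polynomial multiplication reduces to the fixed
pattern of coefficients 2, 3, 1, 1 rotated from row to row, which is why only
multiplications by 2 and 3 appear.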
The polynomials used for the field arithmetic were determined by two criteria: (a)
arithmetic efficiency and (b) resistance to cryptanalytic attack. Though DES was first
cracked by a brute-force attack that searched the entire key space, linear and differential
cryptanalysis and weak keys are serious attacks on the security of the algorithm.
Rijndael's multiplicative map and affine transformation were chosen for their ability to
resist these. The polynomial 3x^3 + x^2 + x + 2 was picked for its combination of fast
multiplication and diffusion power. (Diffusion is spreading changes in key or text bits
into the cipher text.) NIST's evaluation used published research from academic and
industry experts and private advice from the National Security Agency (NSA). NIST based
its decision on security, efficiency, and algorithm and implementation characteristics
(including hardware and software suitability and simplicity). Security is difficult to
assess. The breaking of an algorithm is clear, but there are no proofs of security, only
proofs that an algorithm passes the tests we currently know to perform. By contrast,
results of efficiency tests, even though only using current technology, provide more
definitive information. Efficiency tests were conducted in a variety of venues, including
fast implementations in C++, Java, assembler code, FPGAs (Field Programmable Gate Arrays)
and ASICs (Application Specific Integrated Circuits).

All finalists were fine on these measures, but some were finer than others. Why did
NIST pick Rijndael? NIST judged the submission to be the best overall algorithm for the
AES: Rijndael's combination of security, performance, efficiency, implementability, and
flexibility make it an appropriate selection for the AES. Rijndael's cryptographic
complexity rests on several well-studied cryptographic transformations, and the algorithm
is easy to describe. The algorithm performs efficiently on a variety of platforms (NIST
noted that it was a good performer in hardware and software across a wide range of
computing environments), and the algorithm is relatively easy to defend against power and
timing attacks. There were some comments that the polynomials chosen for Rijndael's
primitives might lead to breaks. But GF(2^n) is a field that NSA knows well, and it is fair
to assume that Rijndael passed NSA's tests. Many of the finest minds in the field submitted
candidates, and the candidate algorithms were widely reviewed, criticized, and discussed by
experts around the world. As a result, AES is considered to be a high-quality and
trustworthy solution for data encryption. AES became a government standard in 2002. In
2003, the U.S. Government approved AES for use with classified information. Today, it is
one of the most popular algorithms used in symmetric key cryptography.

5.2. AES FIPS-197-Algorithm

5.2.1. Introduction
This standard specifies the Rijndael algorithm ([3] and [4]), a symmetric block cipher that
can process data blocks of 128 bits, using cipher keys with lengths of 128, 192, and 256
bits. Rijndael was designed to handle additional block sizes and key lengths; however,
they are not adopted in this standard. Throughout the remainder of this standard, the
algorithm specified herein will be referred to as the AES algorithm. The algorithm may be
used with the three different key lengths indicated above, and therefore these different
"flavors" may be referred to as AES-128, AES-192, and AES-256. This specification includes
the following:

Definitions of terms, acronyms, and algorithm parameters, symbols, and functions;

Notation and conventions used in the algorithm specification, including the ordering
and numbering of bits, bytes, and words;

Mathematical properties that are useful in understanding the algorithm;

Algorithm specification, covering the key expansion, encryption, and decryption
routines;

Implementation issues, such as key length support, keying restrictions, and additional
block/key/round sizes.

5.2.2. Definitions
1) Glossary of Terms and Acronyms
The following definitions are used throughout this standard:

AES: Advanced Encryption Standard

Affine Transformation: A transformation consisting of multiplication by a matrix
followed by the addition of a vector.

Array: An enumerated collection of identical entities (e.g., an array of bytes).

Bit: A binary digit having a value of 0 or 1.

Block: Sequence of binary bits that comprise the input, output, State, and Round
Key. The length of a sequence is the number of bits it contains. Blocks are also
interpreted as arrays of bytes.

Byte: A group of eight bits that is treated either as a single entity or as an array of 8
individual bits.

Cipher: Series of transformations that converts plaintext to ciphertext using the
Cipher Key.

Cipher Key: Secret, cryptographic key that is used by the Key Expansion
routine to generate a set of Round Keys; can be pictured as a rectangular array of
bytes, having four rows and Nk columns.

Ciphertext: Data output from the Cipher or input to the Inverse Cipher.

Inverse Cipher: Series of transformations that converts ciphertext to plaintext using
the Cipher Key.

Key Expansion: Routine used to generate a series of Round Keys from the Cipher Key.

Plaintext: Data input to the Cipher or output from the Inverse Cipher.

Rijndael: Cryptographic algorithm specified in this Advanced Encryption Standard.

Round Key: Round keys are values derived from the Cipher Key using the Key
Expansion routine; they are applied to the State in the Cipher and Inverse Cipher.

State: Intermediate Cipher result that can be pictured as a rectangular array of bytes,
having four rows and Nb columns.

S-box: Non-linear substitution table used in several byte substitution transformations
and in the Key Expansion routine to perform a one-for-one substitution of a byte value.

Word: A group of 32 bits that is treated either as a single entity or as an array of 4
bytes.


2) Mathematical Preliminaries: All bytes in the AES algorithm are interpreted as finite field
elements using the notation introduced in Sec. Finite field elements can be added and
multiplied, but these operations are different from those used for numbers. The following
subsections introduce the basic mathematical concepts needed for Sec. 2.2.5.
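3) Addition: The addition of two elements in a finite field is achieved by adding the
coefficients for the corresponding powers in the polynomials for the two elements. The
addition is performed modulo 2, i.e., with the XOR operation (denoted by ⊕).
For example, the following expressions are equivalent to one another:

(x^6 + x^4 + x^2 + x + 1) + (x^7 + x + 1) = x^7 + x^6 + x^4 + x^2    (polynomial notation)
{01010111} ⊕ {10000011} = {11010100}    (binary notation)
{57} ⊕ {83} = {d4}    (hexadecimal notation)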

4) Multiplication: In the polynomial representation, multiplication in GF(2^8) (denoted by •)
corresponds to the multiplication of polynomials modulo an irreducible polynomial of degree 8. A
polynomial is irreducible if its only divisors are one and itself. For the AES algorithm, this
irreducible polynomial is given by

m(x) = x^8 + x^4 + x^3 + x + 1    (Eq. 5.1)

It is also represented by {01}{1b} in hexadecimal notation. For example, {57} • {83} = {c1},
because

(x^6 + x^4 + x^2 + x + 1)(x^7 + x + 1) = x^13 + x^11 + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1

and

x^13 + x^11 + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1 mod (x^8 + x^4 + x^3 + x + 1) = x^7 + x^6 + 1

The modular reduction by m(x) ensures that the result will be a binary polynomial of
degree less than 8, and thus can be represented by a byte. Unlike addition, there is no simple
operation at the byte level that corresponds to this multiplication. The multiplication defined
above is associative, and the element {01} is the multiplicative identity.

For any non-zero binary polynomial b(x) of degree less than 8, the multiplicative
inverse of b(x), denoted b^-1(x), can be found as follows: the extended Euclidean algorithm is
used to compute polynomials a(x) and c(x) such that

b(x)a(x) + m(x)c(x) = 1    (Eq. 5.2)

Hence a(x) • b(x) mod m(x) = 1, which means

b^-1(x) = a(x) mod m(x)    (Eq. 5.3)

It follows that the set of 256 possible byte values, with XOR used as addition and the
multiplication defined as above, has the structure of the finite field GF(2^8).
5) Multiplication by x: Multiplying a binary polynomial
b(x) = b7 x^7 + b6 x^6 + b5 x^5 + b4 x^4 + b3 x^3 + b2 x^2 + b1 x + b0 by the
polynomial x results in

b7 x^8 + b6 x^7 + b5 x^6 + b4 x^5 + b3 x^4 + b2 x^3 + b1 x^2 + b0 x    (Eq. 5.4)

The result x • b(x) is obtained by reducing the above result modulo m(x), as defined in
Eq. 5.1. If b7 = 0, the result is already in reduced form. If b7 = 1, the reduction is
accomplished by subtracting (i.e., XORing) the polynomial m(x). It follows that
multiplication by x (i.e., {00000010} or {02}) can be implemented at the byte level as a left
shift and a subsequent conditional bitwise XOR with {1b}. This operation on bytes is denoted
by xtime(). Multiplication by higher powers of x can be implemented by repeated application
of xtime(). By adding intermediate results, multiplication by any constant can be implemented.
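
The byte-level procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the project; the function name gf_mul is a hypothetical helper introduced here, while xtime() follows the name used in the text.

```python
def xtime(b: int) -> int:
    """Multiply the byte b by x ({02}) in GF(2^8)."""
    b <<= 1
    if b & 0x100:      # b7 was 1, so reduce modulo m(x) = {01}{1b}
        b ^= 0x11B
    return b


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by XORing xtime() multiples of a."""
    product = 0
    while b:
        if b & 1:              # this power of x is present in b
            product ^= a
        a = xtime(a)           # move on to the next power of x
        b >>= 1
    return product


print(hex(xtime(0x57)))          # 0xae
print(hex(gf_mul(0x57, 0x83)))   # 0xc1, the example from the text
```

Each loop iteration handles one bit of the second operand, so a multiplication costs at most eight shifts and conditional XORs, which is why this form maps well to simple hardware.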
6) Polynomials with Coefficients in GF(2^8)
Four-term polynomials can be defined - with coefficients that are finite field elements - as:

a(x) = a3 x^3 + a2 x^2 + a1 x + a0    (Eq. 5.5)

which will be denoted as a word in the form [a0, a1, a2, a3]. Note that the
polynomials in this section behave somewhat differently than the polynomials used in the
definition of finite field elements, even though both types of polynomials use the same
indeterminate, x. The coefficients in this section are themselves finite field elements, i.e.,
bytes, instead of bits; also, the multiplication of four-term polynomials uses a different
reduction polynomial, defined below. The distinction should always be clear from the
context.
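To illustrate the addition and multiplication operations, let

b(x) = b3 x^3 + b2 x^2 + b1 x + b0    (Eq. 5.6)

define a second four-term polynomial. Addition is performed by adding the finite field
coefficients of like powers of x. This addition corresponds to an XOR operation between the
corresponding bytes in each of the words; in other words, the XOR of the complete word
values. Thus, using Eqs. (5.5) and (5.6),

a(x) + b(x) = (a3 ⊕ b3) x^3 + (a2 ⊕ b2) x^2 + (a1 ⊕ b1) x + (a0 ⊕ b0)    (Eq. 5.7)

Multiplication is achieved in two steps. In the first step, the polynomial product c(x) = a(x) •
b(x) is algebraically expanded, and like powers are collected to give

c(x) = c6 x^6 + c5 x^5 + c4 x^4 + c3 x^3 + c2 x^2 + c1 x + c0    (Eq. 5.8)

where

c0 = a0•b0
c1 = a1•b0 ⊕ a0•b1
c2 = a2•b0 ⊕ a1•b1 ⊕ a0•b2
c3 = a3•b0 ⊕ a2•b1 ⊕ a1•b2 ⊕ a0•b3
c4 = a3•b1 ⊕ a2•b2 ⊕ a1•b3
c5 = a3•b2 ⊕ a2•b3
c6 = a3•b3    (Eq. 5.9)

The result, c(x), does not represent a four-byte word. Therefore, the second step of the
multiplication is to reduce c(x) modulo a polynomial of degree 4, so that the result can be reduced to
a polynomial of degree less than 4. For the AES algorithm, this is accomplished with the
polynomial x^4 + 1, so that

x^i mod (x^4 + 1) = x^(i mod 4)    (Eq. 5.10)

The modular product of a(x) and b(x), denoted by a(x) ⊗ b(x), is given by the four-term
polynomial d(x), defined as follows:

d(x) = d3 x^3 + d2 x^2 + d1 x + d0    (Eq. 5.11)

with

d0 = (a0•b0) ⊕ (a3•b1) ⊕ (a2•b2) ⊕ (a1•b3)
d1 = (a1•b0) ⊕ (a0•b1) ⊕ (a3•b2) ⊕ (a2•b3)
d2 = (a2•b0) ⊕ (a1•b1) ⊕ (a0•b2) ⊕ (a3•b3)
d3 = (a3•b0) ⊕ (a2•b1) ⊕ (a1•b2) ⊕ (a0•b3)    (Eq. 5.12)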



When a(x) is a fixed polynomial, the operation defined above can be written in matrix
form as:

[d0]   [a0 a3 a2 a1] [b0]
[d1] = [a1 a0 a3 a2] [b1]
[d2]   [a2 a1 a0 a3] [b2]
[d3]   [a3 a2 a1 a0] [b3]    (Eq. 5.13)

Because x^4 + 1 is not an irreducible polynomial over GF(2^8), multiplication by a
fixed four-term polynomial is not necessarily invertible.
However, the AES algorithm specifies a fixed four-term polynomial that does have an
inverse:

a(x) = {03}x^3 + {01}x^2 + {01}x + {02}    (Eq. 5.14)
a^-1(x) = {0b}x^3 + {0d}x^2 + {09}x + {0e}    (Eq. 5.15)

Another polynomial used in the AES algorithm (see the RotWord() function) has a0 =
a1 = a2 = {00} and a3 = {01}, which is the polynomial x^3. Inspection of the matrix form
above will show that its effect is to form the output word by rotating bytes in the input word.
This means that [b0, b1, b2, b3] is transformed into [b1, b2, b3, b0].
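
As a small illustration of the modular product of four-term polynomials, the sketch below multiplies two words modulo x^4 + 1. It is an illustrative sketch only; poly_mul and gf_mul are hypothetical helper names introduced here, and words are stored as Python lists [a0, a1, a2, a3].

```python
def xtime(b: int) -> int:
    """Multiply the byte b by x ({02}) in GF(2^8)."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def poly_mul(a, b):
    """Modular product of two words modulo x^4 + 1.

    Index i of each list holds the coefficient of x^i.
    """
    d = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            # x^(i+j) mod (x^4 + 1) = x^((i+j) mod 4)
            d[(i + j) % 4] ^= gf_mul(a[i], b[j])
    return d


# Multiplying by x^3 (a3 = {01}) rotates the bytes of the word.
print(poly_mul([0x00, 0x00, 0x00, 0x01], [0x10, 0x20, 0x30, 0x40]))

# The fixed AES polynomial times its inverse gives the identity word.
a = [0x02, 0x01, 0x01, 0x03]
a_inv = [0x0E, 0x09, 0x0D, 0x0B]
print(poly_mul(a, a_inv))   # [1, 0, 0, 0]
```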
7) AES Algorithm-Block Overview and Specification:
For the AES algorithm, the length of the input block, the output block and the State is
128 bits. This is represented by Nb = 4, which reflects the number of 32-bit words (number of
columns) in the State. For the AES algorithm, the length of the Cipher Key, K, is 128, 192, or
256 bits. The key length is represented by Nk = 4, 6, or 8, which reflects the number of 32-bit
words (number of columns) in the Cipher Key. For the AES algorithm, the number of rounds
to be performed during the execution of the algorithm is dependent on the key size. The
number of rounds is represented by Nr, where Nr = 10 when Nk = 4, Nr = 12 when Nk = 6,
and Nr = 14 when Nk = 8. The only Key-Block-Round combinations that conform to this
standard are given in Table 5.1. For implementation issues relating to the key length, block size
and number of rounds, see Sec. 5.3.

Table 5.1: Key-Block-Round combinations

            Key Length    Block Size    Number of Rounds
            (Nk words)    (Nb words)    (Nr)
AES-128         4             4             10
AES-192         6             4             12
AES-256         8             4             14

The various operational blocks required and the state flow in our design
of the AES-128 algorithm are shown here:

Fig: 5.1: Pseudo code for the Cipher

8) SubBytes() Transformation: The SubBytes() transformation is a non-linear byte
substitution that operates independently on each byte of the State using a substitution table
(S-box). This S-box (Fig. 5.2), which is invertible, is constructed by composing two
transformations:
1. Take the multiplicative inverse in the finite field GF(2^8); the element {00} is
mapped to itself.
2. Apply the following affine transformation (over GF(2)):

b'_i = b_i ⊕ b_((i+4) mod 8) ⊕ b_((i+5) mod 8) ⊕ b_((i+6) mod 8) ⊕ b_((i+7) mod 8) ⊕ c_i    (Eq. 5.17)

for 0 <= i < 8, where b_i is the ith bit of the byte, and c_i is the ith bit of a byte c with the
value {63} or {01100011}. Here and elsewhere, a prime on a variable indicates that the
variable is to be updated with the value on the right.
In matrix form, the affine transformation element of the S-box can be expressed as:

[b'0]   [1 0 0 0 1 1 1 1] [b0]   [1]
[b'1]   [1 1 0 0 0 1 1 1] [b1]   [1]
[b'2]   [1 1 1 0 0 0 1 1] [b2]   [0]
[b'3] = [1 1 1 1 0 0 0 1] [b3] + [0]
[b'4]   [1 1 1 1 1 0 0 0] [b4]   [0]
[b'5]   [0 1 1 1 1 1 0 0] [b5]   [1]
[b'6]   [0 0 1 1 1 1 1 0] [b6]   [1]
[b'7]   [0 0 0 1 1 1 1 1] [b7]   [0]

The S-box used in the SubBytes() transformation is presented in hexadecimal form in Fig. 5.2.
For example, if s1,1 = {53}, then the substitution value would be determined by the intersection
of the row with index 5 and the column with index 3 in Fig. 5.2, which gives s'1,1 = {ed}.

Fig 5.2: Substitution Values for the byte xy (in hexadecimal format)
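
The two-step construction of the S-box can be reproduced with a short Python sketch, which may help when checking a hard-coded table. This is an illustrative sketch and not the project's implementation; gf_mul, gf_inv, affine and SBOX are hypothetical names, and the inverse is computed by the simple (slow) method of raising the byte to the power 254.

```python
def xtime(b: int) -> int:
    """Multiply the byte b by x ({02}) in GF(2^8)."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254; {00} maps to itself."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def affine(b: int) -> int:
    """Affine transformation over GF(2) with the constant c = {63}."""
    c = 0x63
    out = 0
    for i in range(8):
        bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
               ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (c >> i)) & 1
        out |= bit << i
    return out


# Step 1 (inverse) followed by step 2 (affine transformation) for every byte.
SBOX = [affine(gf_inv(x)) for x in range(256)]

print(hex(SBOX[0x53]))   # 0xed, row 5 / column 3 of the S-box
```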
9) ShiftRows() Transformation: In the ShiftRows() transformation, the bytes in the last three
rows of the State are cyclically shifted over different numbers of bytes (offsets). The first
row, r = 0, is not shifted. Specifically, the ShiftRows() transformation proceeds as follows:

s'_(r,c) = s_(r, (c + shift(r, Nb)) mod Nb)   for 0 < r < 4 and 0 <= c < Nb    (Eq. 5.18)

where the shift value shift(r, Nb) depends on the row number, r, as follows
(recall that Nb = 4):

shift(1,4) = 1; shift(2,4) = 2; shift(3,4) = 3    (Eq. 5.19)

This has the effect of moving bytes to lower positions in the row (i.e., lower values
of c in a given row), while the lowest bytes wrap around into the top of the row (i.e.,
higher values of c in a given row). Fig. 5.3 illustrates the ShiftRows() transformation.

Fig 5.3: ShiftRows() cyclically shifts the last three rows in the State
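
A minimal sketch of the row shifting, assuming the State is held as a list of four rows with state[r][c] being the byte in row r and column c (this layout and the name shift_rows are assumptions made for illustration):

```python
def shift_rows(state):
    """Cyclically shift row r of the State to the left by r byte positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


state = [
    [0x00, 0x01, 0x02, 0x03],   # row 0: not shifted
    [0x10, 0x11, 0x12, 0x13],   # row 1: shifted by 1
    [0x20, 0x21, 0x22, 0x23],   # row 2: shifted by 2
    [0x30, 0x31, 0x32, 0x33],   # row 3: shifted by 3
]
for row in shift_rows(state):
    print([hex(b) for b in row])
```

Because the transformation only moves bytes, a hardware design can realize it purely as wiring, without any logic resources.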
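10) MixColumns() Transformation: The MixColumns() transformation operates on the
State column-by-column, treating each column as a four-term polynomial as described in
item 6) above. The columns are considered as polynomials over GF(2^8) and multiplied
modulo x^4 + 1 with a fixed polynomial a(x), given by

a(x) = {03}x^3 + {01}x^2 + {01}x + {02}

As a result of this multiplication, the four bytes in a column are replaced by the following:

s'0,c = ({02}•s0,c) ⊕ ({03}•s1,c) ⊕ s2,c ⊕ s3,c
s'1,c = s0,c ⊕ ({02}•s1,c) ⊕ ({03}•s2,c) ⊕ s3,c
s'2,c = s0,c ⊕ s1,c ⊕ ({02}•s2,c) ⊕ ({03}•s3,c)
s'3,c = ({03}•s0,c) ⊕ s1,c ⊕ s2,c ⊕ ({02}•s3,c)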
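
The column multiplication by the fixed polynomial a(x) = {03}x^3 + {01}x^2 + {01}x + {02} can be sketched as follows. This is an illustrative sketch under the assumption that a column is a list [s0, s1, s2, s3]; mix_column and gf_mul are hypothetical names, and the input column is a commonly used check value rather than data from this project.

```python
def xtime(b: int) -> int:
    """Multiply the byte b by x ({02}) in GF(2^8)."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def mix_column(col):
    """Multiply one State column by a(x) modulo x^4 + 1."""
    s0, s1, s2, s3 = col
    return [
        gf_mul(0x02, s0) ^ gf_mul(0x03, s1) ^ s2 ^ s3,
        s0 ^ gf_mul(0x02, s1) ^ gf_mul(0x03, s2) ^ s3,
        s0 ^ s1 ^ gf_mul(0x02, s2) ^ gf_mul(0x03, s3),
        gf_mul(0x03, s0) ^ s1 ^ s2 ^ gf_mul(0x02, s3),
    ]


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```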

11) AddRoundKey() Transformation: In the AddRoundKey() transformation, a Round Key
is added to the State by a simple bitwise XOR operation. Each Round Key consists of
Nb words from the key schedule. Those Nb words are each added into the columns of the
State, such that:

[s'0,c, s'1,c, s'2,c, s'3,c] = [s0,c, s1,c, s2,c, s3,c] ⊕ [w(round*Nb + c)]   for 0 <= c < Nb

where [wi] are the key schedule words described under Key Expansion (item 12), and round is
a value in the range 0 <= round <= Nr. In the Cipher, the initial Round Key addition occurs
when round = 0, prior to the first application of the round function (see Fig. 5.1). The
application of the AddRoundKey() transformation to the Nr rounds of the Cipher occurs when
1 <= round <= Nr. The action of this transformation is illustrated in Fig. 10, where
l = round * Nb. The byte addressing within words of the key schedule was described earlier.
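
Since the Round Key addition is a plain XOR, it can be sketched in one line. The sketch below assumes the State and the Round Key are both held as 16 bytes in column order (an assumption made for illustration, as is the name add_round_key); the input values are the example block and key from the FIPS-197 appendix.

```python
def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """XOR a 16-byte Round Key into a 16-byte State."""
    return bytes(s ^ k for s, k in zip(state, round_key))


state = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
key0 = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")   # round = 0 key

print(add_round_key(state, key0).hex())
# 193de3bea0f4e22b9ac68d2ae9f84808
```

Applying the same call twice with the same Round Key returns the original State, which is why the transformation is its own inverse.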


12) Key Expansion: The AES algorithm takes the Cipher Key, K, and performs a Key
Expansion routine to generate a key schedule. The Key Expansion generates a total of
Nb(Nr + 1) words: the algorithm requires an initial set of Nb words, and each of the Nr rounds
requires Nb words of key data. The resulting key schedule consists of a linear array of 4-byte
words, denoted [wi], with i in the range 0 <= i < Nb(Nr + 1). The expansion of the input key
into the key schedule proceeds according to the pseudo code. SubWord() is a function that
takes a four-byte input word and applies the S-box to each of the four bytes to produce an
output word. The function RotWord() takes a word [a0,a1,a2,a3] as input, performs a cyclic
permutation, and returns the word [a1,a2,a3,a0]. The round constant word array, Rcon[i],
contains the values given by [x^(i-1),{00},{00},{00}], with x^(i-1) being powers of x (x is
denoted as {02}) in the field GF(2^8), as discussed above (note that i starts at 1, not 0). From
the pseudo code, it can be seen that the first Nk words of the expanded key are filled with the
Cipher Key. Every following word, w[i], is equal to the XOR of the previous word, w[i-1], and
the word Nk positions earlier, w[i-Nk]. For words in positions that are a multiple of Nk, a
transformation is applied to w[i-1] prior to the XOR, followed by an XOR with a round
constant, Rcon[i]. This transformation consists of a cyclic shift of the bytes in a word
(RotWord()), followed by the application of a table lookup to all four bytes of the word
(SubWord()). It is important to note that the Key Expansion routine for 256-bit Cipher Keys
(Nk = 8) is slightly different than for 128- and 192-bit Cipher Keys. If Nk = 8 and i-4 is a
multiple of Nk, then SubWord() is applied to w[i-1] prior to the XOR.
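
The Key Expansion routine described above can be sketched in Python as follows. This is an illustrative sketch, not the project's SystemC code: the S-box is rebuilt from its definition so that the block is self-contained, and the names key_expansion, SBOX and the helper functions are hypothetical. The test key is the 128-bit example key from the FIPS-197 appendix.

```python
def xtime(b):
    """Multiply the byte b by x ({02}) in GF(2^8)."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = xtime(a)
        b >>= 1
    return p


def gf_inv(a):
    """Multiplicative inverse as a^254; {00} maps to itself."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r if a else 0


def affine(b):
    """Affine transformation of the S-box with constant {63}."""
    out = 0
    for i in range(8):
        bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
               ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        out |= bit << i
    return out


SBOX = [affine(gf_inv(x)) for x in range(256)]


def key_expansion(key: bytes):
    """Expand a 16-, 24- or 32-byte Cipher Key into Nb*(Nr+1) four-byte words."""
    nk = len(key) // 4            # Nk = 4, 6 or 8
    nr = nk + 6                   # Nr = 10, 12 or 14
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01                   # first byte of Rcon[1]
    for i in range(nk, 4 * (nr + 1)):
        temp = list(w[i - 1])
        if i % nk == 0:
            temp = temp[1:] + temp[:1]           # RotWord()
            temp = [SBOX[b] for b in temp]       # SubWord()
            temp[0] ^= rcon                      # XOR with Rcon
            rcon = xtime(rcon)                   # next power of x
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]       # extra SubWord() for Nk = 8
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return w


words = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(words), bytes(words[4]).hex(), bytes(words[43]).hex())
# 44 a0fafe17 b6630ca6
```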
13) Decryption (Inverse Cipher Generation): The Cipher transformations described above can
be inverted and then implemented in reverse order to produce a straightforward Inverse
Cipher for the AES algorithm. The individual transformations used in the Inverse Cipher,
namely InvShiftRows(), InvSubBytes(), InvMixColumns(), and AddRoundKey(), process the State
and are described in the following subsections. The Inverse Cipher is described in the pseudo
code in Fig. 12. In Fig. 12, the array w contains the key schedule, which was described
previously under Key Expansion (item 12).
14.a) InvShiftRows() Transformation: InvShiftRows() is the inverse of the ShiftRows()
transformation. The bytes in the last three rows of the State are cyclically shifted over
different numbers of bytes (offsets). The first row, r = 0, is not shifted. The bottom three rows
are cyclically shifted by Nb - shift(r, Nb) bytes, where the shift value shift(r, Nb) depends on
the row number, and is given in Eq. (5.19).
Specifically, the InvShiftRows() transformation proceeds as follows:

s'_(r, (c + shift(r, Nb)) mod Nb) = s_(r,c)   for 0 < r < 4 and 0 <= c < Nb

Figure 13 illustrates the InvShiftRows() transformation.
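
A minimal sketch of InvShiftRows(), again assuming the State is a list of four rows (state[r][c]); the names shift_rows and inv_shift_rows are hypothetical. Each row is shifted left by Nb - shift(r, Nb) positions, which undoes the forward shift.

```python
NB = 4  # number of columns in the State


def shift_rows(state):
    """Forward transformation: shift row r left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Inverse transformation: shift row r left by Nb - r positions."""
    return [row[NB - r:] + row[:NB - r] for r, row in enumerate(state)]


state = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(inv_shift_rows(state))
print(inv_shift_rows(shift_rows(state)) == state)   # True
```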

14.b) InvSubBytes() Transformation: InvSubBytes() is the inverse of the byte substitution
transformation, in which the inverse S-box is applied to each byte of the State. This is
obtained by applying the inverse of the affine transformation followed by taking the
multiplicative inverse in GF(2^8).
The inverse S-box used in the InvSubBytes() transformation is presented in Fig. 5.4:

Fig 5.4: Inverse S-box: Substitution values
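
The construction of the inverse S-box can be sketched in the same way as the forward table. This is an illustrative sketch with hypothetical names (inv_affine, INV_SBOX); the inverse affine transformation used here, b'_i = b_((i+2) mod 8) ⊕ b_((i+5) mod 8) ⊕ b_((i+7) mod 8) ⊕ d_i with d = {05}, is the standard inverse of the S-box affine step and is stated here as an assumption because its formula does not appear in the text above.

```python
def xtime(b):
    """Multiply the byte b by x ({02}) in GF(2^8)."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = xtime(a)
        b >>= 1
    return p


def gf_inv(a):
    """Multiplicative inverse as a^254; {00} maps to itself."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r if a else 0


def inv_affine(b):
    """Inverse of the S-box affine transformation (constant d = {05})."""
    out = 0
    for i in range(8):
        bit = ((b >> ((i + 2) % 8)) ^ (b >> ((i + 5) % 8))
               ^ (b >> ((i + 7) % 8)) ^ (0x05 >> i)) & 1
        out |= bit << i
    return out


# Inverse affine transformation first, then the multiplicative inverse.
INV_SBOX = [gf_inv(inv_affine(y)) for y in range(256)]

print(hex(INV_SBOX[0xED]))   # 0x53, undoing the SubBytes() example
```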








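15) InvMixColumns() Transformation: InvMixColumns() is the inverse of the
MixColumns() transformation. InvMixColumns() operates on the State column-by-column,
treating each column as a four-term polynomial as described in item 6) above. The columns are
considered as polynomials over GF(2^8) and multiplied modulo x^4 + 1 with a fixed
polynomial a^-1(x), given by

a^-1(x) = {0b}x^3 + {0d}x^2 + {09}x + {0e}    (Eq. 5.15)

As described in item 6) above, this can be written as a matrix multiplication:

[s'0,c]   [0e 0b 0d 09] [s0,c]
[s'1,c] = [09 0e 0b 0d] [s1,c]
[s'2,c]   [0d 09 0e 0b] [s2,c]
[s'3,c]   [0b 0d 09 0e] [s3,c]    for 0 <= c < Nb

As a result of this multiplication, the four bytes in a column are replaced by the following:

s'0,c = ({0e}•s0,c) ⊕ ({0b}•s1,c) ⊕ ({0d}•s2,c) ⊕ ({09}•s3,c)
s'1,c = ({09}•s0,c) ⊕ ({0e}•s1,c) ⊕ ({0b}•s2,c) ⊕ ({0d}•s3,c)
s'2,c = ({0d}•s0,c) ⊕ ({09}•s1,c) ⊕ ({0e}•s2,c) ⊕ ({0b}•s3,c)
s'3,c = ({0b}•s0,c) ⊕ ({0d}•s1,c) ⊕ ({09}•s2,c) ⊕ ({0e}•s3,c)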
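
The inverse column mixing can be sketched as a small matrix-vector product over GF(2^8). This is an illustrative sketch; INV_MIX, inv_mix_column and gf_mul are hypothetical names, and the input column is the output of a commonly used MixColumns() check value.

```python
def xtime(b):
    """Multiply the byte b by x ({02}) in GF(2^8)."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = xtime(a)
        b >>= 1
    return p


# Rows of the matrix built from a^-1(x) = {0b}x^3 + {0d}x^2 + {09}x + {0e}.
INV_MIX = [
    [0x0E, 0x0B, 0x0D, 0x09],
    [0x09, 0x0E, 0x0B, 0x0D],
    [0x0D, 0x09, 0x0E, 0x0B],
    [0x0B, 0x0D, 0x09, 0x0E],
]


def inv_mix_column(col):
    """Multiply one State column by a^-1(x) modulo x^4 + 1."""
    out = []
    for row in INV_MIX:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out


print([hex(b) for b in inv_mix_column([0x8E, 0x4D, 0xA1, 0xBC])])
# ['0xdb', '0x13', '0x53', '0x45']
```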

16) Inverse of the AddRoundKey() Transformation: AddRoundKey(), which was described in
item 11) above, is its own inverse, since it only involves an application of the
XOR operation.
17) Equivalent Inverse Cipher (the adopted method to improve speed of operation):

In the straightforward Inverse Cipher presented above and in Fig. 12, the sequence of
the transformations differs from that of the Cipher, while the form of the key schedules for
encryption and decryption remains the same. However, several properties of the AES
algorithm allow for an Equivalent Inverse Cipher that has the same sequence of
transformations as the Cipher (with the transformations replaced by their inverses). This is
accomplished with a change in the key schedule.
The two properties that allow for this Equivalent Inverse Cipher are as follows:
1. The SubBytes() and ShiftRows() transformations commute; that is, a SubBytes()
transformation immediately followed by a ShiftRows() transformation is equivalent
to a ShiftRows() transformation immediately followed by a SubBytes()
transformation. The same is true for their inverses, InvSubBytes() and InvShiftRows().
2. The column mixing operations MixColumns() and InvMixColumns() are linear
with respect to the column input, which means
InvMixColumns(state XOR Round Key) = InvMixColumns(state) XOR
InvMixColumns(Round Key).

These properties allow the order of the InvSubBytes() and InvShiftRows()
transformations to be reversed. The order of the AddRoundKey() and InvMixColumns()
transformations can also be reversed, provided that the columns (words) of the decryption
key schedule are modified using the InvMixColumns() transformation.
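
The linearity property that makes the Equivalent Inverse Cipher possible can be checked numerically on a single column. The sketch below is illustrative only; the helper names and the sample column and Round Key word are hypothetical values chosen for the demonstration.

```python
def xtime(b):
    """Multiply the byte b by x ({02}) in GF(2^8)."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = xtime(a)
        b >>= 1
    return p


INV_MIX = [
    [0x0E, 0x0B, 0x0D, 0x09],
    [0x09, 0x0E, 0x0B, 0x0D],
    [0x0D, 0x09, 0x0E, 0x0B],
    [0x0B, 0x0D, 0x09, 0x0E],
]


def inv_mix_column(col):
    """Multiply one column by a^-1(x) modulo x^4 + 1."""
    out = []
    for row in INV_MIX:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out


def xor_words(a, b):
    """Byte-wise XOR of two four-byte words."""
    return [x ^ y for x, y in zip(a, b)]


column = [0x19, 0x3D, 0xE3, 0xBE]       # sample State column
key_word = [0xA0, 0xFA, 0xFE, 0x17]     # sample Round Key word

left = inv_mix_column(xor_words(column, key_word))
right = xor_words(inv_mix_column(column), inv_mix_column(key_word))
print(left == right)   # True: InvMixColumns() is linear
```

In a design that uses this property, InvMixColumns() is applied once to the Round Key words when the decryption key schedule is built, so the per-round data path can keep the same ordering as the Cipher.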

5.3. Implementation Issues

Key Length Requirements: An implementation of the AES algorithm shall support at
least one of the three key lengths specified in Sec. 5: 128, 192, or 256 bits (i.e., Nk = 4, 6,
or 8, respectively). Implementations may optionally support two or three key lengths,
which may promote the interoperability of algorithm implementations.

Keying Restrictions: No weak or semi-weak keys have been identified for the AES
algorithm, and there is no restriction on key selection.

Parameterization of Key Length, Block Size, and Round Number: This standard
explicitly defines the allowed values for the key length (Nk), block size (Nb), and number
of rounds (Nr) (see Table 5.1). However, future reaffirmations of this standard could include
changes or additions to the allowed values for those parameters. Therefore, implementers
may choose to design their AES implementations with future flexibility in mind.

Implementation Suggestions Regarding Various Platforms: Implementation variations
are possible that may, in many cases, offer performance or other advantages. However,
given the same input key and data (plaintext or ciphertext), any implementation that
produces the same output (ciphertext or plaintext) as the algorithm specified in this
standard is an acceptable implementation of the AES. Thus, in this project, unlike the
usual implementation of the Inverse Cipher, we have adopted the Equivalent Inverse Cipher
because of its potential advantage of reuse in a gate-level implementation. We also propose
to merge, at least to some extent, the SubBytes() and ShiftRows() transformations at
encryption, as well as InvSubBytes() and InvShiftRows() at decryption, by effectively
eliminating the two steps otherwise required: first to convert the byte operation into a
word operation and next to apply the shift operation. This is done using a single operation.

Designing the algorithm using a component reuse/calling technique
effectively eliminates the otherwise complex and tedious mathematical
operations that would be required.



5.4 Advantages and Limitations of AES algorithm

In most ciphers, the iterated transform (or round) usually has a Feistel structure.
Typically in this structure, some of the bits of the intermediate state are transposed unchanged
to another position (permutation). The major advantage of the AES algorithm is that it does
not have a Feistel structure but is composed of three distinct invertible transforms based on
the Wide Trail Strategy design method. The Wide Trail Strategy design method provides
resistance against linear and differential cryptanalysis. In the Wide Trail Strategy, every layer
has its own function:

The linear mixing layer: guarantees high diffusion over multiple rounds.

The non-linear layer: parallel application of S-boxes that have the optimum worst-case non-linearity properties.

The key addition layer: a simple XOR of the round key to the intermediate state.

5.4.1. Advantageous Features

Key lengths of 128, 192, and 256 bits are supported. Each step in key size requires
only two additional rounds. The decipher is simply the inverse of the cipher.

Effective and easy implementation in both software and hardware.

Easier design, fewer additional instructions and an efficient utilization rate, because
similar algorithms are adopted for both encryption and decryption, with only some
additional timing for decryption.

No weak or semi-weak keys have been identified for the Advanced Encryption Algorithm.

By using a true low-level bit-serial approach, a minimum cost AES co-processor
architecture can be achieved. This architecture can be used in many military,
industrial, and commercial applications that require compactness and low cost.

For a given key length, it offers much higher security than asymmetric-key
cryptographic methods such as RSA and Elliptic Curve Cryptography (see Fig. 5.6).

It is resistant to theoretical attacks such as linear and differential cryptanalysis
and attacks based on weak keys. It is also relatively easy to defend against attacks on
implementations, such as timing and power attacks.

It occupies little space owing to its inherent properties of modularity, regularity and
availability, which also help greatly in exploiting instruction-level parallelism.



Fig.5.5. Features of AES candidate algorithm

Fig.5.6. Comparison of key sizes in conventional and public key cryptography

5.4.2. Cryptanalytic progress against AES:
No effective breaks of the AES algorithm are known yet, because finding a secret key is
computationally infeasible. This may be largely attributed to the following considerations:

The complexity of the sequence of operations, and of the operations themselves,
performed in the algorithm over a large number of iterations; and

The size of the key space.

If the key length is n bits, the number of operations required to find a secret key by
exhaustive search is O(2^n). One can hardly imagine an exhaustive search succeeding in the
128-, 192-, or 256-bit key spaces. For a 128-bit key space, the effort required would be 2^128,
which is approximately 3.4 x 10^38 operations. Even with approximately a trillion chips,
each operating at 1000 GHz, it would take at least a million years to exhaustively
search a 128-bit key space; the 192-bit and 256-bit key spaces are larger still.
Table 5.2 gives a rough estimate of the effort needed to find an AES secret key.


Table 5.2: Analysis of the effort needed to break the AES algorithm
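
The order of magnitude quoted above can be checked with a few lines of arithmetic. The sketch below assumes, purely for illustration, that each chip tests one key per clock cycle; the function name and constants are hypothetical and are not measurements from this project.

```python
CHIPS = 10**12                        # "a trillion chips"
KEYS_PER_SECOND_PER_CHIP = 10**12     # 1000 GHz, one key per cycle (assumption)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(key_bits: int) -> float:
    """Years needed to try every key of the given length."""
    keys = 2 ** key_bits
    seconds = keys / (CHIPS * KEYS_PER_SECOND_PER_CHIP)
    return seconds / SECONDS_PER_YEAR


for bits in (128, 192, 256):
    print(f"{bits}-bit key space: about {years_to_search(bits):.2e} years")
```

Under these assumptions the 128-bit search takes on the order of ten million years, which is consistent with the "at least a million years" estimate in the text; on average a key would be found after searching half the key space.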

The storage required to hold such a huge number of encryption and
decryption results (to construct the two tables that assist in searching for the
secret key) would be correspondingly large. Thus, breaking the secret key would mean
bearing these enormous hardware costs and electricity bills, under the conditions stated
above, continuously for at least a million years.
5.4.3. Limitations and the possible attacks:
The main limitation of the Advanced Encryption Algorithm, which is a major
development in symmetric-key algorithms, is the same as the major drawback of
conventional cryptography: the distribution of the secret key between the two
communicating parties without third-party intervention is the major weak link.
No matter how strong a cryptosystem is, if an intruder can steal the key while it is
being communicated through a weak channel, the whole system is rendered useless. So,
it has to take advantage of public-key algorithms, at least for the purpose of safe key
distribution through the channel.
Another major drawback is that AES is quite susceptible to a newer type of attack on the
cache behavior, if implemented on a microprocessor or DSP-based processor. If the attacker
can access the machine where AES runs, the secret key can be retrieved in a fraction of a second.
Perhaps this type of attack can be minimized in our present idea of implementation through
devices such as FPGAs, CPLDs and ASICs, which would act as virtual
processors and take the encryption burden off the actual processors.


5.5. The Security of AES and the future trend

Some cryptographers still have concerns about the security of AES. A common attack
on block ciphers is to attack the algorithm with a reduced number of rounds. At the time of
this writing, attacks on AES exist for seven rounds with 128-bit keys, eight rounds with
192-bit keys, and nine rounds with 256-bit keys. Recall that the full implementation of AES uses
10, 12, and 14 rounds with 128-, 192-, and 256-bit keys, respectively. There is concern that
there is not enough distance between the attack for a seven-round encryption and the actual
ten-round implementation and that there is a risk these attacks could be improved to break the
cipher. Another worry results from the mathematical structure of AES. In contrast to most
ciphers, AES has a concise and elegant algebraic structure. There is concern among some
cryptographers that an attack based on new insights into this formulation could be successful.
AES appeared to be secure as of the work done in late 2006, during our project working
period. The largest well-known brute force attack occurred in 2002 against a 64-bit RC5 key.
With a key size of at least 128 bits, AES is well out of reach of brute force attacks by normal
adversaries for years if not decades. AES is efficient, elegant, and secure. It will be a top
choice for data security in the next decade and beyond. With AES expected to remain secure
for many years, there can be tremendous growth in the development of low
cost and highly pipelined processors with minimum size requirements that would suit
everyday consumer applications and the needs of smaller organizations,
ranging from smart cards to internal database locking and regulated data distribution within
organizations. Advanced implementations combining AES and DES, such as the Advanced
Parallel Encryption Standard (APES) and the Advanced Data Encryption Standard (ADES),
would be a practical possibility for the security demands and network applications of secret
key cryptography.

5.6. Applications
Vendors of both hardware and software have enthusiastically adopted AES. Because
AES uses a simple and efficient algorithm, using it as an encryption specification decreases
system complexity, lowers costs, and promotes interoperability. There are many areas where
AES is now in commercial use.

1. Most high-end VPN software contains implementations of AES, including offerings
from Checkpoint, Cisco, and Symantec.
2. AES is now commonly found in network appliances.
3. Voice over IP vendors are using AES for telephone security.
4. Vendors now use AES to provide security for process control (SCADA) systems.
5. AES has even been added to common file compression programs, such as WinZip.
6. Dozens of hardware implementations are available that use both FPGAs and ASICs.
7. There are multiple software implementations in the public domain, for example for smart card security systems.
Thus, it seems that almost any security system or sensitive data transfer can
use and rely upon the efficient and highly structured Advanced
Encryption Standard algorithm.

5.7. Conclusion:
This chapter dealt with the mathematical preliminaries and gave an overview of the AES algorithm used in the project.


6.1. Introduction
Functional verification was carried out for all the test cases, after which the Xilinx
Platform Studio (XPS) design was taken through the synthesis process using the Xilinx tool.

6.2. Synthesis Process

The synthesis process is carried out by giving the XPS model as the input to the
tool. This XPS model requires a Spartan 3 board for the implementation. Hence the Spartan
board is selected, the whole process flow is carried out in the Xilinx tool, and finally
the bitstream file is generated, which is downloaded onto the board.

6.3. Xilinx Platform Studio Outputs


Fig 6.1 : HyperTerminal Encryption Output



Fig 6.2 : HyperTerminal Decryption Output

Fig 6.3 : XPS Synthesis report


7.1 Introduction
The main aim of the project is to provide data security through encryption and
decryption. The algorithm can be used in many applications, as follows.

7.2 Applications
1. This standard may be used by Federal departments and agencies when an agency
determines that sensitive (unclassified) information (as defined in P.L. 100-235)
requires cryptographic protection.
2. Security purposes.
3. Medical field.
4. Network security.
5. Online bank security.
6. Secure video teleconferencing.
7. Routers and remote access servers.
8. High speed ATM/Ethernet/Fibre Channel switches.
9. In addition, this standard may be adopted and used by non-Federal Government
organizations. Such use is encouraged when it provides the desired security for
commercial and private organizations.

7.3. Advantages

Through AES, an input block of 128 bits can be encrypted at a time, which is more than
the 64-bit block of DES and Triple DES.

AES supports secret key lengths of 128, 192 and 256 bits, whereas DES has a fixed
64-bit key (of which 56 bits are effective) and Triple DES builds its key from two or three such DES keys.

The cipher key is expanded into a larger key, which is later used for the actual
encryption and decryption.

The expanded key shall always be derived from the Cipher Key and never be specified
directly.

AES is very hard to attack or crack when compared to DES.
AES is faster when compared to Triple DES.

7.4 Conclusion
The project work aims at implementing secure data communication between any
two users, based on the realization of the advanced symmetric-key cryptographic algorithm
called the Advanced Encryption Standard (AES) on an FPGA based processor. We started
with the selection of the highly structured and secure Advanced Encryption Standard
algorithm and made suitable modifications to the AES algorithm to improve the speed
and the parallelism of instruction execution. The design was written in the
description language SystemC, simulated and debugged using HyperTerminal and the
Spartan 3 EDK kit, and then synthesized in Xilinx Platform Studio with speed as the
optimization goal, aimed at reducing the unrelated logic and improving the maximum clock
rate. It was targeted at a low cost, high speed and highly efficient FPGA
chip, the SPARTAN-III-EDK, using the low cost, graphical user interface (GUI) based
configuration tool from SANDS, FPGA/CPLD Development Platform Software v 1.1. We
have thereby achieved a high-performance and cost-effective
hardware implementation of the Advanced Encryption Algorithm (AES) that suits the
security demands of a wide variety of users and applications.
In future, there is a definite hope of wide utilization of improved versions of
AES processors such as APES and ADES, wherein we may witness much greater security
due to increased key length as well as bit length, and much higher speeds even for bulk
encryption/decryption, achieved by employing sophisticated parallel execution schemes.

7.5. Future scenario and suggestions

The future scope in this domain and the implementation suggestions can
be summarized by the following scope and suggestion statements:
1. New Algorithms and Improved Speed: The existing algorithms AES and DES could be
modified into the Advanced Parallel Encryption Standard (APES) and the Advanced Data
Encryption Standard (ADES) respectively by increasing the key length and bit length. By
this, the difficulty of predicting the data can be increased considerably,
even compared with the AES algorithm. The speed of bulk encryption/decryption
can also be improved because of the parallel schemes employed.
2. Improvement in security: The probability of cracking the key becomes much less and
hence, the transmitted data will be more secure. Improvement in security may further be
possible by eliminating not only the precise timing attacks but also the rest
of the side-channel attacks.
3. Improvements in FPGA and EDA tools: Modified algorithms would increasingly demand
implementation in FPGAs rather than in the DSP domain, owing to the further
possible growth in fast processing, low power consumption and reduced size of VLSI,
and the evolution of powerful EDA tools for implementation.





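#include <stdio.h>
#include <string.h>
/* xil_printf is supplied by the Xilinx standalone BSP; recent toolchains declare it in xil_printf.h */

#define MAXBC (256/32)       /* maximum block length in 32-bit columns */
#define MAXKC (256/32)       /* maximum key length in 32-bit columns */
#define MAXROUNDS 14
#define SC ((BC - 4) >> 1)   /* index into shifts[] for the current block length */

typedef unsigned int uint;
typedef unsigned char word8;
typedef unsigned short word16;
typedef unsigned long word32;

void main_aes(uint first_key[], uint datain[], int key_bits, int block_bits, int enc_dec, uint dataout[]);
int rijndaelKeySched(word8 k[4][MAXKC], int keyBits, int blockBits, word8 W[MAXROUNDS+1][4][MAXBC]);
int rijndaelEncrypt(word8 a[4][MAXBC], int keyBits, int blockBits, word8 rk[MAXROUNDS+1][4][MAXBC]);
int rijndaelDecrypt(word8 a[4][MAXBC], int keyBits, int blockBits, word8 rk[MAXROUNDS+1][4][MAXBC]);
void print_result(uint temp[], int len);
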
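/* Row shift offsets, indexed as shifts[SC][row][0 = encrypt, 1 = decrypt] */
static word8 shifts[3][4][2] = {
    { {0, 0}, {1, 3}, {2, 2}, {3, 1} },   /* BC = 4 */
    { {0, 0}, {1, 5}, {2, 4}, {3, 3} },   /* BC = 6 */
    { {0, 0}, {1, 7}, {3, 5}, {4, 4} }    /* BC = 8 */
};

/* Discrete logarithm table over GF(2^8), base 3 */
word8 Logtable[256] = {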
0, 0, 25, 1, 50, 2, 26, 198, 75, 199, 27, 104, 51, 238, 223, 3,
100, 4, 224, 14, 52, 141, 129, 239, 76, 113, 8, 200, 248, 105, 28, 193,
125, 194, 29, 181, 249, 185, 39, 106, 77, 228, 166, 114, 154, 201, 9, 120,
101, 47, 138, 5, 33, 15, 225, 36, 18, 240, 130, 69, 53, 147, 218, 142,
150, 143, 219, 189, 54, 208, 206, 148, 19, 92, 210, 241, 64, 70, 131, 56,
102, 221, 253, 48, 191, 6, 139, 98, 179, 37, 226, 152, 34, 136, 145, 16,
126, 110, 72, 195, 163, 182, 30, 66, 58, 107, 40, 84, 250, 133, 61, 186,
43, 121, 10, 21, 155, 159, 94, 202, 78, 212, 172, 229, 243, 115, 167, 87,
175, 88, 168, 80, 244, 234, 214, 116, 79, 174, 233, 213, 231, 230, 173, 232,
44, 215, 117, 122, 235, 22, 11, 245, 89, 203, 95, 176, 156, 169, 81, 160,
127, 12, 246, 111, 23, 196, 73, 236, 216, 67, 31, 45, 164, 118, 123, 183,
204, 187, 62, 90, 251, 96, 177, 134, 59, 82, 161, 108, 170, 85, 41, 157,
151, 178, 135, 144, 97, 190, 220, 252, 188, 149, 207, 205, 55, 63, 91, 209,
83, 57, 132, 60, 65, 162, 109, 71, 20, 42, 158, 93, 86, 242, 211, 171,
68, 17, 146, 217, 35, 32, 46, 137, 180, 124, 184, 38, 119, 153, 227, 165,
103, 74, 237, 222, 197, 49, 254, 24, 13, 99, 140, 128, 192, 247, 112, 7
};

/* Antilogarithm table: Alogtable[i] = 3^i in GF(2^8) */
word8 Alogtable[256] = {
1, 3, 5, 15, 17, 51, 85, 255, 26, 46, 114, 150, 161, 248, 19, 53,
95, 225, 56, 72, 216, 115, 149, 164, 247, 2, 6, 10, 30, 34, 102, 170,
229, 52, 92, 228, 55, 89, 235, 38, 106, 190, 217, 112, 144, 171, 230, 49,
83, 245, 4, 12, 20, 60, 68, 204, 79, 209, 104, 184, 211, 110, 178, 205,
76, 212, 103, 169, 224, 59, 77, 215, 98, 166, 241, 8, 24, 40, 120, 136,
131, 158, 185, 208, 107, 189, 220, 127, 129, 152, 179, 206, 73, 219, 118, 154,
181, 196, 87, 249, 16, 48, 80, 240, 11, 29, 39, 105, 187, 214, 97, 163,
254, 25, 43, 125, 135, 146, 173, 236, 47, 113, 147, 174, 233, 32, 96, 160,
251, 22, 58, 78, 210, 109, 183, 194, 93, 231, 50, 86, 250, 21, 63, 65,
195, 94, 226, 61, 71, 201, 64, 192, 91, 237, 44, 116, 156, 191, 218, 117,

159, 186, 213, 100, 172, 239, 42, 126, 130, 157, 188, 223, 122, 142, 137, 128,
155, 182, 193, 88, 232, 35, 101, 175, 234, 37, 111, 177, 200, 67, 197, 84,
252, 31, 33, 99, 165, 244, 7, 9, 27, 45, 119, 153, 176, 203, 70, 202,
69, 207, 74, 222, 121, 139, 134, 145, 168, 227, 62, 66, 198, 81, 243, 14,
18, 54, 90, 238, 41, 123, 141, 140, 143, 138, 133, 148, 167, 242, 13, 23,
57, 75, 221, 124, 132, 151, 162, 253, 28, 36, 108, 180, 199, 82, 246, 1
};

/* Forward S-box, used for encryption and for the key schedule */
word8 S[256] = {
99, 124, 119, 123, 242, 107, 111, 197, 48, 1, 103, 43, 254, 215, 171, 118,
202, 130, 201, 125, 250, 89, 71, 240, 173, 212, 162, 175, 156, 164, 114, 192,
183, 253, 147, 38, 54, 63, 247, 204, 52, 165, 229, 241, 113, 216, 49, 21,
4, 199, 35, 195, 24, 150, 5, 154, 7, 18, 128, 226, 235, 39, 178, 117,
9, 131, 44, 26, 27, 110, 90, 160, 82, 59, 214, 179, 41, 227, 47, 132,
83, 209, 0, 237, 32, 252, 177, 91, 106, 203, 190, 57, 74, 76, 88, 207,
208, 239, 170, 251, 67, 77, 51, 133, 69, 249, 2, 127, 80, 60, 159, 168,
81, 163, 64, 143, 146, 157, 56, 245, 188, 182, 218, 33, 16, 255, 243, 210,
205, 12, 19, 236, 95, 151, 68, 23, 196, 167, 126, 61, 100, 93, 25, 115,
96, 129, 79, 220, 34, 42, 144, 136, 70, 238, 184, 20, 222, 94, 11, 219,
224, 50, 58, 10, 73, 6, 36, 92, 194, 211, 172, 98, 145, 149, 228, 121,
231, 200, 55, 109, 141, 213, 78, 169, 108, 86, 244, 234, 101, 122, 174, 8,
186, 120, 37, 46, 28, 166, 180, 198, 232, 221, 116, 31, 75, 189, 139, 138,
112, 62, 181, 102, 72, 3, 246, 14, 97, 53, 87, 185, 134, 193, 29, 158,
225, 248, 152, 17, 105, 217, 142, 148, 155, 30, 135, 233, 206, 85, 40, 223,
140, 161, 137, 13, 191, 230, 66, 104, 65, 153, 45, 15, 176, 84, 187, 22
};

/* Inverse S-box, used for decryption */
word8 Si[256] = {
82, 9, 106, 213, 48, 54, 165, 56, 191, 64, 163, 158, 129, 243, 215, 251,
124, 227, 57, 130, 155, 47, 255, 135, 52, 142, 67, 68, 196, 222, 233, 203,
84, 123, 148, 50, 166, 194, 35, 61, 238, 76, 149, 11, 66, 250, 195, 78,
8, 46, 161, 102, 40, 217, 36, 178, 118, 91, 162, 73, 109, 139, 209, 37,
114, 248, 246, 100, 134, 104, 152, 22, 212, 164, 92, 204, 93, 101, 182, 146,
108, 112, 72, 80, 253, 237, 185, 218, 94, 21, 70, 87, 167, 141, 157, 132,

144, 216, 171, 0, 140, 188, 211, 10, 247, 228, 88, 5, 184, 179, 69, 6,
208, 44, 30, 143, 202, 63, 15, 2, 193, 175, 189, 3, 1, 19, 138, 107,
58, 145, 17, 65, 79, 103, 220, 234, 151, 242, 207, 206, 240, 180, 230, 115,
150, 172, 116, 34, 231, 173, 53, 133, 226, 249, 55, 232, 28, 117, 223, 110,
71, 241, 26, 113, 29, 41, 197, 137, 111, 183, 98, 14, 170, 24, 190, 27,
252, 86, 62, 75, 198, 210, 121, 32, 154, 219, 192, 254, 120, 205, 90, 244,
31, 221, 168, 51, 136, 7, 199, 49, 177, 18, 16, 89, 39, 128, 236, 95,
96, 81, 127, 169, 25, 181, 74, 13, 45, 229, 122, 159, 147, 201, 156, 239,
160, 224, 59, 77, 174, 42, 245, 176, 200, 235, 187, 60, 131, 83, 153, 97,
23, 43, 4, 126, 186, 119, 214, 38, 225, 105, 20, 99, 85, 33, 12, 125
};

/* Round constants for the key schedule */
word32 rcon[30] = {
0x01,0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, 0x6c, 0xd8, 0xab,
0x4d, 0x9a, 0x2f, 0x5e, 0xbc, 0x63, 0xc6, 0x97, 0x35, 0x6a, 0xd4, 0xb3, 0x7d,
0xfa, 0xef, 0xc5, 0x91, };
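
/* 192-bit cipher key as six 32-bit words */
uint initial_key[] = {0xd5d0d92a, 0xd3a90372, 0x9089018b, 0x9fca4c3b, 0x53198a16, 0x561ce01f};
/* 256-bit input block; the leading words are not legible in the source listing */
uint initial_data[8] = { /* ... */ 0x00000000, 0x00000000};
/* 256-bit buffer that receives the encrypted block */
uint last_data[8] = {0x00000000, 0x00000000};

int main()
{
    int data_num = 256;   /* block length in bits */
    int key_num = 192;    /* key length in bits */

    xil_printf("\n**** Key length is : %d\n", key_num);
    xil_printf("\n**** Data length is : %d\n", data_num);

    xil_printf("This is Encryption");
    main_aes(initial_key, initial_data, key_num, data_num, 1, last_data);
    xil_printf("\nThis is Decryption");
    main_aes(initial_key, last_data, key_num, data_num, 0, initial_data);
    return 0;
}
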
/*
 * Encrypts (enc_dec == 1) or decrypts (any other value) one block.
 * first_key holds key_bits/32 words and datain holds block_bits/32 words;
 * the result is written to dataout.
 */
void main_aes(uint first_key[], uint datain[], int key_bits, int block_bits, int enc_dec, uint dataout[])
{
    int i, j;
    uint temp_byte;
    uint temp_data[8];
    uint temp_key[8];   /* sized for the largest (256-bit) key */
    word8 data[4][MAXBC] = {{0}};
    word8 initial_key[4][MAXKC] = {{0}};
    word8 keys[MAXROUNDS+1][4][MAXBC];

    xil_printf("\nFirst_key is : \n");
    print_result(first_key, key_bits/32);
    xil_printf("\nDatain is : \n");
    print_result(datain, block_bits/32);

    /* Unpack each 32-bit key word into one column, most significant byte in row 0 */
    for (i = 0; i < (key_bits/32); i++)
        temp_key[i] = first_key[i];
    for (i = 0; i < (key_bits/32); i++)
        for (j = 0; j < 4; j++) {
            temp_byte = temp_key[i];
            temp_byte = temp_byte << (j*8);
            initial_key[j][i] = ((temp_byte & 0xff000000) >> 24);
        }

    /* Unpack the data words in the same way */
    for (i = 0; i < (block_bits/32); i++)
        temp_data[i] = datain[i];
    for (i = 0; i < (block_bits/32); i++)
        for (j = 0; j < 4; j++) {
            temp_byte = temp_data[i];
            temp_byte = temp_byte << (j*8);
            data[j][i] = ((temp_byte & 0xff000000) >> 24);
        }

    /* Debug output, disabled:
    xil_printf("key\n");
    for (i = 0; i < 4; i++)
        for (j = 0; j < (key_bits/32); j++)
            xil_printf(" %x ", initial_key[i][j]);

    xil_printf("Data is : \n");
    for (i = 0; i < 4; i++)
        for (j = 0; j < (block_bits/32); j++)
            xil_printf(" %x ", data[i][j]);
    */

    rijndaelKeySched(initial_key, key_bits, block_bits, keys);
    if (enc_dec == 1)
        rijndaelEncrypt(data, key_bits, block_bits, keys);
    else
        rijndaelDecrypt(data, key_bits, block_bits, keys);

    xil_printf("Data after encry_decry is \n");
    for (i = 0; i < 4; i++)
        for (j = 0; j < (block_bits/32); j++)
            xil_printf(" %x ", data[i][j]);

    /* Pack the state columns back into 32-bit words */
    for (i = 0; i < (block_bits/32); i++) {
        temp_data[i] = 0;
        for (j = 0; j < 4; j++) {
            temp_byte = data[j][i];
            temp_byte = temp_byte << (24 - j*8);
            temp_data[i] = temp_data[i] | temp_byte;
        }
    }

    for (i = 0; i < (block_bits/32); i++)
        dataout[i] = temp_data[i];
    xil_printf("\nDataout is : \n");
    print_result(dataout, block_bits/32);
    xil_printf("\n");
}

/* Multiply two elements of GF(2^8) using the log/antilog tables */
word8 mul(word8 a, word8 b) {
    if (a && b) return Alogtable[(Logtable[a] + Logtable[b]) % 255];
    else return 0;
}

/* XOR the round key into the state */
void KeyAddition(word8 a[4][MAXBC], word8 rk[4][MAXBC], word8 BC) {
    int i, j;

    for (i = 0; i < 4; i++)
        for (j = 0; j < BC; j++) a[i][j] ^= rk[i][j];
}

/* Row 0 is unchanged; rows 1 to 3 are rotated (d = 0 encrypt, d = 1 decrypt) */
void ShiftRow(word8 a[4][MAXBC], word8 d, word8 BC) {
    word8 tmp[MAXBC];
    int i, j;

    for (i = 1; i < 4; i++) {
        for (j = 0; j < BC; j++) tmp[j] = a[i][(j + shifts[SC][i][d]) % BC];
        for (j = 0; j < BC; j++) a[i][j] = tmp[j];
    }
}

/* Replace every state byte by its entry in the given substitution box */
void Substitution(word8 a[4][MAXBC], word8 box[256], word8 BC) {
    int i, j;

    for (i = 0; i < 4; i++)
        for (j = 0; j < BC; j++) a[i][j] = box[a[i][j]];
}

/* Mix the four bytes of every column linearly */
void MixColumn(word8 a[4][MAXBC], word8 BC) {
    word8 b[4][MAXBC];
    int i, j;

    for (j = 0; j < BC; j++)
        for (i = 0; i < 4; i++)
            b[i][j] = mul(2, a[i][j])
                    ^ mul(3, a[(i + 1) % 4][j])
                    ^ a[(i + 2) % 4][j]
                    ^ a[(i + 3) % 4][j];
    for (i = 0; i < 4; i++)
        for (j = 0; j < BC; j++) a[i][j] = b[i][j];
}

/* Inverse of MixColumn */
void InvMixColumn(word8 a[4][MAXBC], word8 BC) {
    word8 b[4][MAXBC];
    int i, j;

    for (j = 0; j < BC; j++)
        for (i = 0; i < 4; i++)
            b[i][j] = mul(0xe, a[i][j])
                    ^ mul(0xb, a[(i + 1) % 4][j])
                    ^ mul(0xd, a[(i + 2) % 4][j])
                    ^ mul(0x9, a[(i + 3) % 4][j]);
    for (i = 0; i < 4; i++)
        for (j = 0; j < BC; j++) a[i][j] = b[i][j];
}

/* Expand the cipher key k into the round keys W; returns 0 on success */
int rijndaelKeySched(word8 k[4][MAXKC], int keyBits, int blockBits, word8 W[MAXROUNDS+1][4][MAXBC]) {
    int KC, BC, ROUNDS;
    int i, j, t, rconpointer = 0;
    word8 tk[4][MAXKC];

    switch (keyBits) {
    case 128: KC = 4; break;
    case 192: KC = 6; break;
    case 256: KC = 8; break;
    default : return (-1);
    }
    switch (blockBits) {
    case 128: BC = 4; break;
    case 192: BC = 6; break;
    case 256: BC = 8; break;
    default : return (-2);
    }
    /* The number of rounds follows the larger of key length and block length */
    switch (keyBits >= blockBits ? keyBits : blockBits) {
    case 128: ROUNDS = 10; break;
    case 192: ROUNDS = 12; break;
    case 256: ROUNDS = 14; break;
    default : return (-3);
    }

    for (j = 0; j < KC; j++)
        for (i = 0; i < 4; i++)
            tk[i][j] = k[i][j];
    t = 0;
    /* Copy the cipher key into the round key array */
    for (j = 0; (j < KC) && (t < (ROUNDS+1)*BC); j++, t++)
        for (i = 0; i < 4; i++) W[t / BC][i][t % BC] = tk[i][j];

    /* Generate further key material until all round keys are filled */
    while (t < (ROUNDS+1)*BC) {
        for (i = 0; i < 4; i++)
            tk[i][0] ^= S[tk[(i+1)%4][KC-1]];
        tk[0][0] ^= rcon[rconpointer++];
        if (KC != 8) {
            for (j = 1; j < KC; j++)
                for (i = 0; i < 4; i++) tk[i][j] ^= tk[i][j-1];
        } else {
            for (j = 1; j < KC/2; j++)
                for (i = 0; i < 4; i++) tk[i][j] ^= tk[i][j-1];
            for (i = 0; i < 4; i++) tk[i][KC/2] ^= S[tk[i][KC/2 - 1]];
            for (j = KC/2 + 1; j < KC; j++)
                for (i = 0; i < 4; i++) tk[i][j] ^= tk[i][j-1];
        }
        /* Copy the new values into the round key array */
        for (j = 0; (j < KC) && (t < (ROUNDS+1)*BC); j++, t++)
            for (i = 0; i < 4; i++) W[t / BC][i][t % BC] = tk[i][j];
    }
    return 0;
}

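/* Encrypt one block in place; returns 0 on success */
int rijndaelEncrypt(word8 a[4][MAXBC], int keyBits, int blockBits, word8 rk[MAXROUNDS+1][4][MAXBC]) {
    int r, BC, ROUNDS;

    switch (blockBits) {
    case 128: BC = 4; break;
    case 192: BC = 6; break;
    case 256: BC = 8; break;
    default : return (-2);
    }
    switch (keyBits >= blockBits ? keyBits : blockBits) {
    case 128: ROUNDS = 10; break;
    case 192: ROUNDS = 12; break;
    case 256: ROUNDS = 14; break;
    default : return (-3);
    }

    /* Initial key addition */
    KeyAddition(a, rk[0], BC);

    /* ROUNDS-1 ordinary rounds */
    for (r = 1; r < ROUNDS; r++) {
        Substitution(a, S, BC);
        ShiftRow(a, 0, BC);
        MixColumn(a, BC);
        KeyAddition(a, rk[r], BC);
    }

    /* Final round: no MixColumn */
    Substitution(a, S, BC);
    ShiftRow(a, 0, BC);
    KeyAddition(a, rk[ROUNDS], BC);

    return 0;
}
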
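/* Decrypt one block in place by applying the inverse operations in reverse order */
int rijndaelDecrypt(word8 a[4][MAXBC], int keyBits, int blockBits, word8 rk[MAXROUNDS+1][4][MAXBC]) {
    int r, BC, ROUNDS;

    switch (blockBits) {
    case 128: BC = 4; break;
    case 192: BC = 6; break;
    case 256: BC = 8; break;
    default : return (-2);
    }
    switch (keyBits >= blockBits ? keyBits : blockBits) {
    case 128: ROUNDS = 10; break;
    case 192: ROUNDS = 12; break;
    case 256: ROUNDS = 14; break;
    default : return (-3);
    }

    /* Undo the final encryption round, which has no MixColumn */
    KeyAddition(a, rk[ROUNDS], BC);
    Substitution(a, Si, BC);
    ShiftRow(a, 1, BC);

    /* ROUNDS-1 ordinary rounds */
    for (r = ROUNDS-1; r > 0; r--) {
        KeyAddition(a, rk[r], BC);
        InvMixColumn(a, BC);
        Substitution(a, Si, BC);
        ShiftRow(a, 1, BC);
    }

    /* Undo the initial key addition */
    KeyAddition(a, rk[0], BC);

    return 0;
}
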
/* Print len words in hexadecimal on one line */
void print_result(uint temp[], int len)
{
    int i;

    for (i = 0; i < len; i++)
        xil_printf("%x ", temp[i]);
    xil_printf("\n");
}
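
The word-to-byte conversion performed inside main_aes can be checked on its own. The
following is a minimal host-side sketch, assuming 32-bit words whose most significant byte
goes to row 0 of the state, as in the listing above. The helper names words_to_state and
state_to_words and the constant MAX_COLS are hypothetical and are not part of the original
program.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_COLS 8   /* up to 256 bits, i.e. eight 32-bit words */

/* Split each 32-bit word into one state column, most significant byte in row 0. */
void words_to_state(const uint32_t words[], int ncols, uint8_t state[4][MAX_COLS])
{
    int i, j;

    assert(ncols >= 0 && ncols <= MAX_COLS);
    for (i = 0; i < ncols; i++)
        for (j = 0; j < 4; j++)
            state[j][i] = (uint8_t)(words[i] >> (24 - 8 * j));
}

/* Inverse of words_to_state: rebuild each word from its column. */
void state_to_words(uint8_t state[4][MAX_COLS], int ncols, uint32_t words[])
{
    int i, j;

    assert(ncols >= 0 && ncols <= MAX_COLS);
    for (i = 0; i < ncols; i++) {
        words[i] = 0;
        for (j = 0; j < 4; j++)
            words[i] |= (uint32_t)state[j][i] << (24 - 8 * j);
    }
}
```

Converting a few key words to the state layout and back should return the original words,
which gives a quick check of the byte ordering before the cipher itself is involved.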