Advanced Encryption Standard Hardware

FPGA Implementation of Advanced Encryption Standard
Srihari Sridharan, October 22nd 2007
Efficient Implementation of Rijndael Encryption in Reconfigurable Hardware: Improvements and Design Tradeoffs
Francois-Xavier Standaert, Gael Rouvroy, Jean-Jacques Quisquater, and Jean-Didier Legat
CHES, Springer-Verlag Berlin Heidelberg, 2003
OUTLINE

Performance evaluation of the AES algorithm
Effective FPGA implementation
Heuristics to evaluate hardware efficiency
Arrive at the optimum throughput/area efficiency
Optimum throughput = 18.5 Gbps, area = 542 slices, 10 RAM blocks
Hardware Description

Xilinx Virtex-E
  32,448 slices
  64,896 LUTs and flip-flops
  208 RAM blocks
Synthesis: Synopsys
Circuit modeling: VHDL

2 slices per CLB
Each slice: 2 logic cells (LCs)
Each LC: one 4-input LUT + storage element + additional logic
Storage element: latch or edge-triggered D flip-flop
Additional logic: F5 and F6 multiplexers
Arithmetic logic: carry (CY) logic + XOR + AND
Evaluation Parameters

2 types of evaluation parameters
In terms of performance
  Throughput: bits processed per second
  Area: slices
  The throughput/area ratio is the evaluation parameter
In terms of resources
  Number of LUTs
  Number of registers
  Their ratio is the evaluation parameter
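As a rough illustration of the throughput and throughput/area parameters, the sketch below computes both from a clock frequency, a block size and a cycle count. The function names and the numeric inputs are hypothetical and are not taken from the paper's results.

```python
def throughput_mbps(block_bits, clock_mhz, cycles_per_block):
    """Bits processed per second, in Mbps: one block leaves every
    `cycles_per_block` clock cycles."""
    return block_bits * clock_mhz / cycles_per_block


def efficiency(throughput, slices):
    """Throughput/area ratio, here in Mbps per slice."""
    return throughput / slices


# Hypothetical numbers, chosen only to show the arithmetic.
fully_pipelined = throughput_mbps(128, 100.0, 1)   # one block per cycle
looped = throughput_mbps(128, 100.0, 10)           # one round per cycle
print(fully_pipelined, looped)
print(efficiency(fully_pipelined, 6400))
```

A fully pipelined design and a one-round loop at the same clock differ by the number of cycles spent per block, which is why area has to be considered alongside throughput.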
Encryption Block
Plain Text - Block Ciphers

Input: 128-bit blocks
Input copied into the State, which is then transformed
S[r,c] = in[r + 4c]
out[r + 4c] = S[r,c]
0 <= r < 4, 0 <= c < Nb (= 4)
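The input-to-State mapping above can be sketched as follows. This is a minimal illustration with Nb = 4; the helper names bytes_to_state and state_to_bytes are made up for this example.

```python
def bytes_to_state(block):
    """Load 16 input bytes column by column: S[r][c] = in[r + 4c]."""
    if len(block) != 16:
        raise ValueError("AES operates on 128-bit (16-byte) blocks")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state):
    """Unload the State: out[r + 4c] = S[r][c]."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


block = bytes(range(16))
state = bytes_to_state(block)
print(state[1][2])                       # byte at index 1 + 4*2 of the input
print(state_to_bytes(state) == block)    # the mapping round-trips
```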
Implementation

2 types of optimization
Algorithmic
  S-box
    Multiplexer model
    RAM based
    Composite field
  MixColumns
    MixColumns transform
    MixAdd transform
Architectural
  Loop unrolling
  Pipelining
  Sub-pipelining
SBOX - Mux Model
Sbox Table
Mux Model - Background
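An N-input Boolean function G(x) is represented by: (equation not recovered)
In AES, each S-box output bit is one such function of the input bits, implemented as: (diagram not recovered)
Mux Model
(Three figure-only slides; diagrams not recovered.)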
Realization on FPGA

LUT based
4-input, 4-output lookup: four 4-input, 1-output LUTs
Coupled with a 4:1 mux
4:1 mux realized through three 2:1 muxes
(Figure: 4:1 mux with data inputs a, b, c, d and select lines s0, s1.)
Mux Model - Implementation
Mux Model - Analysis

1-bit output
Repeated 16 times and looped 16 times
Critical path: LUT4 + MUXF5 + MUXF6
2-level pipelining
12 clock pulses
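The multiplexer model can be simulated in software to check that it reproduces the S-box. The sketch below assumes one plausible split: the low four input bits address sixteen 4-input LUTs, and the high four bits drive a 16:1 mux built from 4:1 muxes, each made of three 2:1 muxes. This split and all function names are assumptions for illustration; the S-box itself is generated from its standard definition (field inverse followed by the affine map).

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sbox_byte(a):
    """AES S-box: inverse in GF(2^8) (0 maps to 0), then the affine map."""
    inv = 1
    for _ in range(254):          # a^254 is the inverse of a non-zero a
        inv = gf_mul(inv, a)
    out = 0
    for i in range(8):
        bit = (inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8))
        bit ^= (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8)) ^ (0x63 >> i)
        out |= (bit & 1) << i
    return out


SBOX = [sbox_byte(a) for a in range(256)]


def mux2(d0, d1, s):
    return d1 if s else d0


def mux4(d, s0, s1):
    """4:1 mux realized through three 2:1 muxes."""
    return mux2(mux2(d[0], d[1], s0), mux2(d[2], d[3], s0), s1)


def make_luts(bit):
    """Sixteen 4-input, 1-output LUTs for one S-box output bit."""
    return [[(SBOX[(hi << 4) | lo] >> bit) & 1 for lo in range(16)]
            for hi in range(16)]


def sbox_bit(luts, x):
    lo, hi = x & 0xF, x >> 4
    lut_out = [luts[h][lo] for h in range(16)]       # all LUTs read in parallel
    s = [(hi >> i) & 1 for i in range(4)]
    stage = [mux4(lut_out[4 * g:4 * g + 4], s[0], s[1]) for g in range(4)]
    return mux4(stage, s[2], s[3])


ALL_LUTS = [make_luts(bit) for bit in range(8)]
print(all(sum(sbox_bit(ALL_LUTS[b], x) << b for b in range(8)) == SBOX[x]
          for x in range(256)))
```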
SBOX RAM Based
Lookup type
Each BRAM used as two single-port 256 x 8-bit RAMs
RAM write enable held low and data input held low, so a ROM is implemented
1-clock design

S-box = 16 x 16 x 8 = 2048 bits = 2 Kbits
16 S-boxes for each state
1 BRAM = two 2-Kbit RAMs
Hence 8 BRAMs required
Composite field - Math Basics
Byte representation in the Galois field GF(2^8)
  E.g. 01100011 is x^6 + x^5 + x + 1
Addition: modulo-2 arithmetic, i.e. bitwise XOR (subtraction is the same operation)
Multiplication: polynomial multiplication modulo an irreducible polynomial of degree 8
  m(x) = x^8 + x^4 + x^3 + x + 1
Multiplicative inverse
  b(x)a(x) + m(x)c(x) = 1
  b^-1(x) = a(x) mod m(x), because a(x)b(x) mod m(x) = 1
  E.g. 3m = 1 (mod 11), so 3^-1 = m (mod 11)
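The field operations above can be sketched in a few lines. The code below is an illustrative software model, not the hardware method of the slides: multiplication is done by shift-and-XOR with reduction by m(x), and the inverse is obtained as a^254, one simple alternative to the extended Euclidean relation b(x)a(x) + m(x)c(x) = 1. The function names are made up for this example.

```python
AES_POLY = 0x11B   # m(x) = x^8 + x^4 + x^3 + x + 1


def gf_add(a, b):
    """Addition in GF(2^8) is bitwise XOR."""
    return a ^ b


def gf_mul(a, b):
    """Polynomial multiplication modulo m(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse as a^254 (the group of non-zero elements
    has order 255); 0 is mapped to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


print(hex(gf_mul(0x57, 0x83)))   # 0xc1, the worked example in FIPS-197
print(hex(gf_inv(0x53)))         # 0xca
print(pow(3, -1, 11))            # 4: the integer analogue (Python 3.8+)
```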
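Composite model equations - Multiplicative Inverse
GF(2^8) represented as GF((2^4)^2)
Element: a = a1·x + a0, with a0, a1 in GF(2^4)
x is a root of x^2 + x + λ = 0, with λ in GF(2^4)
Inverse b = b1·x + b0 given by
  b0 = (a0 + a1)·Δ^-1
  b1 = a1·Δ^-1
  Δ = a0·(a0 + a1) + λ·a1^2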
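The composite-field inverse can be checked numerically. With a = a1·x + a0, x^2 = x + λ and Δ = a0·(a0 + a1) + λ·a1^2, the inverse is b1 = a1·Δ^-1, b0 = (a0 + a1)·Δ^-1. The sketch below works directly in GF((2^4)^2) and omits the basis change to and from the AES representation. The GF(2^4) polynomial y^4 + y + 1 and the way λ is picked are assumptions for illustration, not values from the slides.

```python
def mul4(a, b):
    """Multiply in GF(2^4) modulo y^4 + y + 1 (assumed polynomial)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return result


def inv4(a):
    """Inverse in GF(2^4) by exhaustive search; 0 is mapped to 0."""
    return next((b for b in range(1, 16) if mul4(a, b) == 1), 0)


# Assumed choice: the first lambda for which x^2 + x + lambda has no root
# in GF(2^4), so that the quadratic extension is a field.
LAMBDA = next(l for l in range(1, 16)
              if all(mul4(t, t) ^ t ^ l for t in range(16)))


def mul8(a, b):
    """Multiply a1*x + a0 by b1*x + b0, reducing with x^2 = x + lambda."""
    a1, a0 = a >> 4, a & 0xF
    b1, b0 = b >> 4, b & 0xF
    high = mul4(a1, b1)
    x_coeff = high ^ mul4(a1, b0) ^ mul4(a0, b1)
    const = mul4(a0, b0) ^ mul4(LAMBDA, high)
    return (x_coeff << 4) | const


def inv8(a):
    """Inverse using only GF(2^4) operations."""
    a1, a0 = a >> 4, a & 0xF
    delta = mul4(a0, a0 ^ a1) ^ mul4(LAMBDA, mul4(a1, a1))
    d_inv = inv4(delta)
    return (mul4(a1, d_inv) << 4) | mul4(a0 ^ a1, d_inv)


print(all(mul8(a, inv8(a)) == 1 for a in range(1, 256)))
```

The point of the decomposition is that the only inversion left is in GF(2^4), which is small enough for a handful of LUTs.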
Composite field - Affine Transformation
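Affine transformation = linear transformation + translation
Linear transformation: rotation, scaling, shear
Translation: shift
In AES, over GF(2): b'_i = b_i + b_((i+4) mod 8) + b_((i+5) mod 8) + b_((i+6) mod 8) + b_((i+7) mod 8) + c_i, with c = 63 (hex)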
Composite field - implementation
Mixcolumns transform - Background

Columns treated as four-term polynomials
Coefficients are bytes in GF(2^8)
Products are reduced modulo M(x) = x^4 + 1
Product defined as a(x) ⊗ b(x) = d(x)
Mixcolumns transform - Equations
Solution
Multiplication of a GF(2^8) polynomial by x = multiplication by 02 = left shift plus a conditional XOR with 1B (hex), applied when the MSB is 1
Mixcolumns transform - Implementation
To implement 03·a1: 03·a1 = (02 + 01)·a1 = 02·a1 + a1
Hence one output byte, 02·a0 + 03·a1 + a2 + a3, needs
  2 multiplications by x: X(a0) and X(a1)
  A five-term XOR: the two products above + a1 + a2 + a3
2-level pipelined
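The decomposition 03·a = X(a) + a means each output byte of a column needs only two multiplications by x and XORs. A minimal software sketch follows; xtime and mix_column are illustrative names, and the test column is a commonly quoted MixColumns vector.

```python
def xtime(a):
    """Multiply by 02 in GF(2^8): left shift, then XOR with 1B if the
    MSB was set (0x11B also clears the shifted-out bit)."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def mix_column(col):
    """MixColumns on one 4-byte column, using 03*a = xtime(a) ^ a."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3,   # 02*a0 + 03*a1 + a2 + a3
        xtime(a1) ^ xtime(a2) ^ a2 ^ a3 ^ a0,   # a0 + 02*a1 + 03*a2 + a3
        xtime(a2) ^ xtime(a3) ^ a3 ^ a0 ^ a1,   # a0 + a1 + 02*a2 + 03*a3
        xtime(a3) ^ xtime(a0) ^ a0 ^ a1 ^ a2,   # 03*a0 + a1 + a2 + 02*a3
    ]


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```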
Mixadd transform - Principle
Inside X(a0) or X(a1), the operation is mostly a shift
In both bytes the conditional XOR touches only 3 bits (bits 1, 3 and 4; bit 0 simply receives the MSB)
So these three bits are added separately
Now pipelined
Combined with key addition
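The observation that X(a) is mostly wiring can be checked exhaustively. In the sketch below, xtime_wired models the multiplication as a rotate-left (free routing in hardware) plus the MSB XORed into bits 1, 3 and 4, and compares it with the usual shift-and-reduce form. Both function names are made up for this example.

```python
def xtime(a):
    """Reference form: left shift, conditional XOR with 1B."""
    shifted = (a << 1) & 0xFF
    return shifted ^ 0x1B if a & 0x80 else shifted


def xtime_wired(a):
    """Wiring view: rotate left so bit 0 receives the MSB, then XOR the
    MSB into bits 1, 3 and 4 only (mask 0x1A)."""
    msb = a >> 7
    rotated = ((a << 1) & 0xFF) | msb
    return rotated ^ (0x1A if msb else 0x00)


print(all(xtime(a) == xtime_wired(a) for a in range(256)))
```

Only three XOR gates per multiplication remain, which is what makes it practical to fold them into the surrounding XOR network together with the key addition.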
Mixadd transform Implementation
Unrolled Architecture
All 10 AES rounds unrolled
Lots of hardware
Area is increased
Throughput is increased
Pipelined Architecture - I
Only one round implemented at a time
Hardware reduced
Throughput reduced
Area reduced
Pipelined Architecture - II
All 10 rounds taken inside the loop
Loss of the MixAdd combination
Additional mux
Good choice in ASIC
Heuristic optimization
Results
Pipelined - I architecture
Unrolled architecture
Results (contd.)
Comparison
(Table columns: RAM/unrolled, RAM/pipelined, Mux/pipelined, composite/pipelined; values not recovered.)
Summary
http://www.cs.bc.edu/~straubin/cs38105/blockciphers/rijndael_ingles2004.swf
Conclusion
Algorithmic and architectural design tradeoffs were evaluated
Optimum design principle found through heuristics
Throughput = 1563 Mbps
Performance (throughput/area) = 0.69
Phase 2 preview

Implement SBOX: RAM based
Implement MixColumn: MixColumn transform
Implement AddKey: direct XOR
Implement ShiftRow: simple cyclic shift
