A VLSI IMPLEMENTATION OF SECURE HASH ALGORITHM

Abstract. This project presents a set of techniques for hardware implementations of the Secure Hash Algorithm (SHA) hash functions. These techniques mostly involve operation rescheduling and hardware reutilization, and they result in a significant reduction of the critical path and of the required area. Throughputs from 1.3 Gbit/s to 1.8 Gbit/s were obtained for the SHA implementations on a Xilinx FPGA. Compared to commercial cores and previously published research, these figures correspond to an improvement in throughput per slice in the range of 29% to 59% for SHA-1 and 54% to 100% for SHA-2. Experimental results on hybrid software implementations using the SHA cores have shown speedups of up to 150 times for the proposed cores.

1. INTRODUCTION

Cryptographic algorithms can be divided into several classes: public key algorithms, symmetric key algorithms, and hash functions. While the first two are used to encrypt and decrypt data, hash functions are one-way functions that do not allow the processed data to be retrieved. This project focuses on hashing algorithms. Currently, the most commonly used hash functions are MD5 and the Secure Hash Algorithm (SHA), with 128- to 512-bit output Digest Messages (DMs), respectively.

SHA-1 was approved by the NIST in 1995 as an improvement to SHA-0. SHA-1 found its way into all major security applications, such as SSH, PGP, and IPSec. In 2002, SHA-2 was introduced.

The techniques used to improve implementations of SHA are:
• parallel counters and balanced carry save adders (CSA), which improve the partial additions;
• unrolling techniques, which optimize the data dependency and improve the throughput;
• balanced delays and improved addition units, since additions are the most critical operations in this algorithm;
• embedded memories, which store the required constant values;
• pipelining techniques, which allow higher working frequencies.

The improvements to the SHA functions implementation proposed here can be summarized as follows:
• operation rescheduling for a more efficient pipeline usage;
• hardware reuse in the DM addition;
• a shift-based input/output (I/O) interface;
• memory-based block expansion structures.

An alternative data block expansion structure has also been introduced.

The proposed designs achieve a high throughput for the SHA calculation via operation rescheduling. At the same time, the proposed hardware reuse techniques yield an area decrease, resulting in a significant increase of the throughput per slice efficiency metric. Implementation results of the proposed SHA designs on several FPGA technologies show that a throughput of 1.4 Gbit/s is achievable for both the SHA-1 (SHA-128) and SHA-256 hash functions. For SHA-512 this value increases to 1.8 Gbit/s. Moreover, a throughput per slice improvement of up to 100% is achieved with respect to the current state of the art.

2. SHA-1 AND SHA-2 HASH FUNCTIONS

The SHA-1 (or SHA128) hash function produces a single 160-bit output message digest from an input message. Each 512-bit input block is expanded into 80 words of 32 bits, denoted as $W_t$, i.e. one 32-bit word for each computational round. Fig. 1 shows the SHA-1 round calculation.
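
As an illustration of how one 512-bit block yields the 80 round words $W_t$, the following is a minimal software sketch, not the hardware structure of this paper. It assumes the standard SHA-1 recurrence $W_t = \mathrm{RotL}^1(W_{t-3} \oplus W_{t-8} \oplus W_{t-14} \oplus W_{t-16})$ for $t \ge 16$; the names `rotl32` and `sha1_expand` are hypothetical helpers chosen for this example.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sha1_expand(block):
    """Expand one 64-byte (512-bit) block into the 80 words W_0..W_79."""
    if len(block) != 64:
        raise ValueError("a SHA-1 data block is exactly 64 bytes")
    # W_0..W_15 are the sixteen big-endian 32-bit words of the block itself.
    w = list(struct.unpack(">16I", block))
    # W_16..W_79 come from the XOR/rotate recurrence.
    for t in range(16, 80):
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w
```

Only the last 16 words are needed to produce the next one, which is why the expansion maps naturally onto a 16-word shift register or a small memory in hardware.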
Fig. 1: SHA-1 round calculation.

The SHA256 hash function produces a final DM of 256 bits. Each 512-bit input block is expanded and fed to the 64 rounds of the SHA256 function in words of 32 bits each ($W_t$). Fig. 2 shows the SHA-2 round calculation.

The SHA512 hash function differs in the size of the operands, which are 64 bits wide instead of 32 bits. The DM has twice the width, 512 bits, and different logical functions are used. The values $W_t$ and $K_t$ are 64 bits wide, and each data block is composed of 16 words of 64 bits, 1024 bits in total.

Fig. 2: SHA-2 round calculation.

A. DATA BLOCK EXPANSION

The SHA-1 computation involves 80 rounds. Each round uses a 32-bit word obtained from the current input data block. Since each input data block holds only 16 words of 32 bits (512 bits), the remaining 64 words of 32 bits are obtained by data expansion, as follows:

$$W_t = \begin{cases} M_t^{(i)}, & 0 \le t \le 15 \\ \mathrm{RotL}^{1}(W_{t-3} \oplus W_{t-8} \oplus W_{t-14} \oplus W_{t-16}), & 16 \le t \le 79 \end{cases} \qquad (1)$$

where $M_t^{(i)}$ denotes the $t$-th of the 16 words of the $i$-th data block.

For the SHA-2 algorithms the computation takes 64 rounds (80 rounds for SHA512). In each round a 32-bit word (64-bit for SHA512) from the current input data block is used. Since the input block holds only 16 words of 32 bits (64 bits for SHA512), the initial data block has to be expanded to obtain the remaining words. This expansion is performed as follows:

$$W_t = \begin{cases} M_t^{(i)}, & 0 \le t \le 15 \\ \sigma_1(W_{t-2}) + W_{t-7} + \sigma_0(W_{t-15}) + W_{t-16}, & 16 \le t \le 63 \ (79 \text{ for SHA512}) \end{cases} \qquad (2)$$

B. MESSAGE PADDING

To make the input data a multiple of 512 bits, as required by the SHA1 and SHA256 specifications, the original message has to be padded. For SHA512 the input has to be a multiple of 1024 bits.

For 512-bit data blocks, given an original message of $n$ bits, the bit "1" is appended at the end of the message (bit $n+1$), followed by $k$ zero bits, where $k$ is the smallest non-negative solution to the equation $n + 1 + k \equiv 448 \pmod{512}$. The last 64 bits are filled with the binary representation of $n$, the original message size.

For SHA512 message padding, 1024-bit data blocks are used and the last 128 bits are reserved for the binary value of the original message size.

3. PROPOSED DESIGN FOR SHA1

Here a functional rescheduling of the SHA-1 algorithm is done, which allows the high throughput of an unrolled structure to be combined with a low hardware complexity.

A. OPERATION RESCHEDULING

From Fig. 1, it is observed that the bulk of the SHA-1 round computation is oriented towards the calculation of the value A. The remaining values require no computation other than the rotation of B; they are provided by the previous round values of the variables A to D. Given that the value of A depends on its own previous value, no parallelism can be directly exploited:

$$A_{t+1} = \mathrm{RotL}^{5}(A_t) + [f(B_t, C_t, D_t) + E_t + K_t + W_t] \qquad (3)$$

The bracketed term, which does not depend on A, is pre-computed, producing the carry ($\beta_t$) and save ($S_t$) vectors of the partial addition:

$$S_t + \beta_t = f(B_t, C_t, D_t) + E_t + K_t + W_t \qquad (4)$$

The calculation of $A_t$ is then done by

$$A_t = \mathrm{RotL}^{5}(A_{t-1}) + (S_{t-1} + \beta_{t-1})$$
$$S_t + \beta_t = f(B_t, C_t, D_t) + E_t + K_t + W_t \qquad (5)$$

By splitting the computation of the value A and rescheduling it to a different computational round, the critical path of the SHA-1 algorithm can be significantly reduced. The calculation of the function $f(B, C, D)$ and the partial addition are no longer in the critical path, so the critical path of the algorithm is reduced to a three-input full adder and additional selection logic.

After the 80 SHA-1 rounds have been computed, the final values of the internal variables (A to E) are added to the current DM. In turn, the DM remains unchanged until the end of each data block calculation. This final addition is performed by one adder for each 32-bit portion of the 160-bit hash value. The addition of the value DM0 is performed directly by a CSA adder in the round calculation. With this option, an extra full adder is saved and the value calculation that depends on A is performed in one less clock cycle.
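
The rescheduling of (3) to (5) can be checked in software. The sketch below is an assumption-laden model, not the authors' Verilog: it collapses the carry and save vectors $S_t + \beta_t$ into a single integer `p`, uses the standard SHA-1 constants and round functions, and pads the message as described in Section 2.B. All function names are hypothetical. The point it illustrates is that the next partial sum is computed only from the old register values, so in hardware it can be evaluated in parallel with the update of A.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)  # one constant per 20 rounds


def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK


def f(t, b, c, d):
    """Round-dependent logical function f(B, C, D) of SHA-1."""
    if t < 20:
        return (b & c) | (~b & d & MASK)
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d


def pad(msg):
    """Append the '1' bit, k zero bits and the 64-bit message length."""
    zeros = (55 - len(msg)) % 64
    return msg + b"\x80" + b"\x00" * zeros + struct.pack(">Q", 8 * len(msg))


def expand(block):
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def compress_rescheduled(dm, block):
    w = expand(block)
    a, b, c, d, e = dm
    # p models S_0 + beta_0: the part of (3) that does not depend on A.
    p = (f(0, b, c, d) + e + K[0] + w[0]) & MASK
    for t in range(80):
        # Short critical path: only RotL5(A_t) plus the precomputed sum.
        a_next = (rotl(a, 5) + p) & MASK
        if t < 79:
            # Partial sum for round t+1, built from the OLD registers only:
            # B_{t+1} = A_t, C_{t+1} = RotL30(B_t), D_{t+1} = C_t, E_{t+1} = D_t.
            p = (f(t + 1, a, rotl(b, 30), c) + d + K[(t + 1) // 20] + w[t + 1]) & MASK
        a, b, c, d, e = a_next, a, rotl(b, 30), c, d
    return tuple((x + y) & MASK for x, y in zip(dm, (a, b, c, d, e)))


def sha1_rescheduled(msg):
    dm = IV
    data = pad(msg)
    for i in range(0, len(data), 64):
        dm = compress_rescheduled(dm, data[i:i + 64])
    return struct.pack(">5I", *dm).hex()
```

A quick way to use it is to compare `sha1_rescheduled(m)` with `hashlib.sha1(m).hexdigest()` for a few messages `m`; agreement indicates that moving the partial addition one round ahead does not change the result.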
Fig. 3: SHA-1 rescheduling and internal structure.

B. HASH VALUE INITIALIZATION

For the first data block, the internal hash value (DM0) is initialized by adding zero to the initialization vector (IV). This initial value is afterwards loaded into the internal registers (B to E) through a multiplexer. In this case the value of register A is set to zero and DM0 is introduced directly into the calculation of A:

$$S_0 + \beta_0 = f(B_{DM1}, C_{DM2}, D_{DM3}) + E_{DM4} + K_0 + W_0 + \mathrm{RotL}^{5}(DM0)$$
$$A = \mathrm{RotL}^{5}(A) + (S_0 + \beta_0) = \mathrm{RotL}^{5}(0) + (S_0 + \beta_0) \qquad (6)$$

C. IMPROVED HASH VALUE ADDITION

When all rounds are computed for a given data block, the internal variables have to be added to the current DM. This addition can be performed with one adder for each 32 bits of the DM. In that case, the addition of B through E to the current DM requires four additional adders. However, considering that

$$E_t = D_{t-1} = C_{t-2} = \mathrm{RotL}^{30}(B_{t-3}) \qquad (7)$$

the DM for data block $i$ can be calculated from the internal variable B alone, as

$$DM4_i = \mathrm{RotL}^{30}(B_{t-3}) + DM4_{i-1}$$
$$DM3_i = \mathrm{RotL}^{30}(B_{t-2}) + DM3_{i-1}$$
$$DM2_i = \mathrm{RotL}^{30}(B_{t-1}) + DM2_{i-1}$$
$$DM1_i = B_t + DM1_{i-1}$$

Thus the calculation can be performed by a single addition unit, with selection between the value B and its bitwise rotation $\mathrm{RotL}^{30}$. In the general form, rot() represents the optional rotation of the input value:

$$DM[j]_i = \mathrm{rot}(B_{t-j+1}) + DM[j]_{i-1}, \quad 1 \le j \le 4$$
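
The shift relation (7) and the single-adder DM update can be illustrated with a small register model. This sketch is only an assumption-based check of the indexing: it replaces the real computation of A by arbitrary 32-bit values, since the relation between B, C, D and E depends only on how the registers shift. The names `run_rounds` and `dm_add_serial` are hypothetical.

```python
import random

MASK = 0xFFFFFFFF


def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK


def run_rounds(state, new_a_values):
    """Shift the SHA-1 registers once per round; b_hist[t] holds B_t."""
    a, b, c, d, e = state
    b_hist = [b]
    for a_next in new_a_values:
        # B <- A, C <- RotL30(B), D <- C, E <- D (A is supplied externally here).
        a, b, c, d, e = a_next, a, rotl(b, 30), c, d
        b_hist.append(b)
    return (a, b, c, d, e), b_hist


def dm_add_serial(dm, b_hist):
    """Update DM1..DM4 from the history of B with one shared adder.

    Index 0 (DM0) is left untouched: in the text it is added inside the
    round calculation of A.
    """
    t = len(b_hist) - 1
    out = list(dm)
    for j in range(1, 5):
        operand = b_hist[t - j + 1]
        if j > 1:
            # C, D and E hold rotated copies of older B values, see (7).
            operand = rotl(operand, 30)
        out[j] = (operand + dm[j]) & MASK
    return out
```

With 80 arbitrary values for A, the result of `dm_add_serial` for DM1 to DM4 matches adding the final registers B, C, D and E to the DM directly, which is the property the reduced-adder structure relies on.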
4. PROPOSED SHA2 DESIGN

As for SHA-1, functional rescheduling can also be applied to the SHA-2 algorithm. However, as depicted in Fig. 2, the SHA-2 computational path is more complex and has an even higher level of data dependency. In each round of the algorithm, the values A through H have to be calculated, but only the values A and E require computation. In the proposed SHA-2 computational structure, part of the computation of a given round $t$ is computed ahead, in the previous round $t-1$.

A. OPERATION RESCHEDULING

While the variables B, C, D, F, G and H are obtained directly from the values of the previous round, not requiring any computation, the values A and E for round $t$ cannot be computed until the values of the same variables have been computed in the previous round, as shown in (10):

$$E_{t+1} = D_t + \Sigma_1(E_t) + Ch(E_t, F_t, G_t) + H_t + K_t + W_t$$
$$A_{t+1} = \Sigma_0(A_t) + Maj(A_t, B_t, C_t) + \Sigma_1(E_t) + Ch(E_t, F_t, G_t) + H_t + K_t + W_t \qquad (10)$$

The value of $H_{t+1}$ is given directly by $G_t$, which in turn is given by $F_{t-1}$, so H can be pre-calculated as $H_{t+1} = F_{t-1}$. Since the values $K_t$ and $W_t$ can also be pre-calculated and are used directly in each round, (10) can be written as

$$\delta_t = H_t + K_t + W_t = G_{t-1} + K_t + W_t$$
$$E_{t+1} = D_t + \Sigma_1(E_t) + Ch(E_t, F_t, G_t) + \delta_t$$
$$A_{t+1} = \Sigma_0(A_t) + Maj(A_t, B_t, C_t) + \Sigma_1(E_t) + Ch(E_t, F_t, G_t) + \delta_t \qquad (11)$$

where $\delta_t$ is calculated in the previous round. The value of $\delta_{t+1}$ can be either the result of a full addition or the two vectors of a carry save addition.

B. HASH VALUE ADDITION AND INITIALIZATION

Similar to SHA-1, the internal variables of SHA-2 also have to be added to the DM. If this addition is implemented in a straightforward manner, eight adders would be required, one for each internal variable, of 32 or 64 bits for SHA256 or SHA512, respectively. The experimental results for the SHA-1 implementation suggest that the DM addition with the shift is more area efficient. Thus only this approach is studied in the proposed SHA-2 structure for the addition of the internal values to the DM value. Since most of the SHA-2 internal values do not require any computation, they can be obtained directly from the previous values of A and E:

$$H_t = G_{t-1} = F_{t-2} = E_{t-3}$$
$$D_t = C_{t-1} = B_{t-2} = A_{t-3} \qquad (12)$$

The DM for data block $i$ can thus be calculated from the internal variables A and E, as

$$DM7_i = E_{t-3} + DM7_{i-1}$$
$$DM6_i = E_{t-2} + DM6_{i-1}$$
$$DM5_i = E_{t-1} + DM5_{i-1}$$
$$DM3_i = A_{t-3} + DM3_{i-1}$$
$$DM2_i = A_{t-2} + DM2_{i-1}$$
$$DM1_i = A_{t-1} + DM1_{i-1}$$

that is, with only two addition units:

$$DM[j+4]_i = E_{t-j} + DM[j+4]_{i-1}, \quad 1 \le j \le 3$$
$$DM[j]_i = A_{t-j} + DM[j]_{i-1}, \quad 1 \le j \le 3$$

This structure is illustrated in Fig. 4.

Fig. 4: SHA-2 round architecture.
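
The equivalence between the direct SHA-2 round and the form in which $\delta_t = H_t + K_t + W_t$ is prepared one round ahead can be checked with a short model. The sketch below assumes the 32-bit SHA-256 variants of $\Sigma_0$, $\Sigma_1$, Ch and Maj, and uses arbitrary values for $K_t$ and $W_t$ because only the round structure is being compared; the function names are hypothetical. It also records the history of A and E so that the shift relations used for the DM addition can be inspected.

```python
import random

MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def ch(e, f, g):
    return (e & f) ^ (~e & g & MASK)


def maj(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)


def rounds_standard(state, k, w):
    """Direct form: every term is added inside the round that uses it."""
    a, b, c, d, e, f, g, h = state
    for t in range(len(w)):
        t1 = (h + big_sigma1(e) + ch(e, f, g) + k[t] + w[t]) & MASK
        t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return (a, b, c, d, e, f, g, h)


def rounds_rescheduled(state, k, w):
    """Rescheduled form: delta for round t+1 is prepared during round t."""
    a, b, c, d, e, f, g, h = state
    a_hist, e_hist = [a], [e]              # a_hist[t] = A_t, e_hist[t] = E_t
    delta = (h + k[0] + w[0]) & MASK       # delta_0
    for t in range(len(w)):
        t1 = (big_sigma1(e) + ch(e, f, g) + delta) & MASK
        a_next = (big_sigma0(a) + maj(a, b, c) + t1) & MASK
        e_next = (d + t1) & MASK
        if t + 1 < len(w):
            # H_{t+1} = G_t, so delta_{t+1} needs only values already available.
            delta = (g + k[t + 1] + w[t + 1]) & MASK
        a, b, c, d, e, f, g, h = a_next, a, b, c, e_next, e, f, g
        a_hist.append(a)
        e_hist.append(e)
    return (a, b, c, d, e, f, g, h), a_hist, e_hist
```

Running both functions on the same random state, constants and words gives identical final registers, and the final B, C, D and F, G, H equal the three preceding entries of `a_hist` and `e_hist`, which is why two adders fed from A and E are sufficient for the DM update.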
Instead of using a full adder after the calculation of the final values of A and E, the DM is added during the calculation of their final values. Since the value of $DM_{i-1}$ is known, it can be added during the first stage of the pipeline, using a CSA.

After each data block has been computed, the internal values A to H have to be reinitialized with the newly calculated DM. This is performed by a multiplexer that selects either the new value of the variable or the DM, as depicted in the leftmost side of Fig. 4. The values A and E are the exception, since the final value computed for these two variables is already the DM.

In the SHA-2 algorithm standard, the initial value of the DM (loaded in A through H) is a constant value that can be loaded by using set/reset signals in the registers. If the SHA-2 algorithm is to be used in a wider set of applications and in the computation of fragmented messages, the initial DM is no longer a constant value. In these cases, the initial value is given by an Initialization Vector (IV) that has to be loaded. In order to optimize the architecture, the calculation structure for the DM can be used to load the IV, so that it is not directly loaded into all the registers. The value of the A and E registers is set to zero during this loading; thus the existing structure acts as a circular buffer, where the value is loaded into only one of the registers and shifted to the others. This circular buffer can also be used for a more efficient reading of the final DM, providing an interface with smaller output ports.

5. CONCLUSION

The proposed rescheduling techniques resulted in improvements in both speed and area, and the critical path has been reduced. The SHA designs were simulated using Verilog. The simulation results suggest a speedup in the computation compared with a pure software implementation.