Fast Architectures for FPGA-Based Implementation of RSA Encryption Algorithm

Omar Nibouche¹, Mokhtar Nibouche², Ahmed Bouridane³, and Ammar Belatreche¹

¹ Faculty of Engineering, University of Ulster at Magee, Northland Rd, Derry, BT48 7JL, UK, abelatreche@ulster.ac.uk
² Faculty of Computing, Engineering and Mathematical Sciences, University of the West of England, Bristol BS16 1QY, UK, mokhtar.nibouche@uwe.ac.uk
³ School of Computer Science, Queen's University of Belfast, 18 Malone Rd, Belfast BT7 1NN, UK, abouridane@qub.ac.uk

Abstract

In this paper, new structures that implement the RSA cryptographic algorithm are presented. These structures are built upon a modified Montgomery modular multiplier, where the operations of multiplication and modular reduction are carried out in parallel rather than interleaved as in the traditional Montgomery multiplier. The global broadcast of data lines is avoided by interleaving two or more encryption/decryption operations onto the same structure, thus making the implementation systolic and scalable. The digit approach has been adopted in this paper. This methodology is based on varying the digit size and the level of pipelining of the structures. This parameterised approach presents the designer with an efficient way of choosing the architecture that better suits his/her requirements in terms of speed and area usage, an issue of essential importance to the resource-limited FPGA chips. The results of implementation using FPGA have shown that the proposed RSA structures outperformed those structures built around the traditional Montgomery multiplier in terms of speed, thanks to avoiding global lines broadcast.

I. Introduction

In recent years, the use of software tools and hardware devices for security functions has increased dramatically [8, 10-12, 16, 18-22]. Security issues play a crucial role in spreading the use of many computer and communication systems, such as the Internet, which more and more people are using to transmit sensitive information such as credit card numbers. A central tool
for achieving system security is the cryptographic algorithms. Privacy and fraud concerns can be addressed through the use of various security primitives such as data encryption, which are used with the appropriate protocols to construct secure and trusted networks [10-12, 16-22]. Cryptographic algorithms require immense processing power that may cause a bottleneck in high-speed networks. In the past, system developers have had to use software-based techniques in order to achieve the agility necessary to maintain compatibility in the presence of different algorithms and protocols [8, 16-21]. However, software solutions are particularly inefficient for the computationally demanding asymmetric algorithms (also called public key cryptographic algorithms). To achieve hardware agility, the use of reconfigurable logic is an appropriate solution, especially as the recent development of FPGA technology means that these devices are now large enough to implement a reasonable application in a single FPGA device [11, 16, 18-22]. The unlimited re-configurability of an FPGA permits a continuous sequence of custom circuits to be employed, each optimized for the current task. This has led to a relatively new concept for computing where such a device can be used as a co-processor coupled with a host machine in order to speed up the computation of a specific task. Furthermore, in terms of security, another benefit of using dedicated hardware is that security attacks become more difficult, as the secrets can be contained within the co-processor using non-volatile memory, which is externally inaccessible [16, 17].

0-7803-8652-3/04/$20.00 © 2004 IEEE

In 1976, Diffie and Hellman introduced the idea of public key cryptography [6], which is now widely used to provide confidentiality, authentication, data integrity and non-repudiation. Since then, numerous public-key cryptosystems have been proposed. All these systems base their security on some mathematical one-way functions. RSA [7] is the most widely used public key cryptosystem.
An RSA operation is a modular exponentiation operation, which requires repeated modular multiplications. For security reasons, RSA operand sizes need to be 1024 bits or greater [8, 10-12, 16, 18-22]. In 1985, Montgomery introduced a new method for modular multiplication [9]. The Montgomery multiplication algorithm is the most efficient modular multiplication algorithm available. This method has proved to be very efficient and is the basis of many implementations of modular multiplication, both in software and hardware [9]. It replaces trial division by the modulus with a series of additions and divisions by a power of 2, which are very easy to implement. For this reason, this algorithm has formed the basis of many of the RSA hardware architectures reported to date [3-5, 8, 10-12, 14, 16, 18-22].

This paper proposes new and efficient FPGA-based hardware implementations of the RSA algorithm based on a modified Montgomery's modular multiplication presented in [3]. A systolic approach for the implementation strategy has been adopted in this paper in order to achieve a high clock frequency. Many systolic architectures of the RSA algorithm have been proposed in the literature [18-22]. In this paper, the focus is on avoiding global lines broadcast and using only nearest neighbour communication wherever possible. For FPGA implementations, serial and digit architectures that interleave more than one encryption operation and balance area usage and speed requirements are presented.

ICFPT 2004

The paper is organized as follows: Section II reviews Montgomery's modular multiplication algorithm and the authors' modified version of the algorithm. This modified algorithm uses two conventional multipliers (with one multiplier being slightly different) to carry out the operation of modular multiplication. These two multipliers operate in parallel. A brief discussion about the bound of the result of the algorithm is presented.
As a matter of fact [19-22], the bound of the result of Montgomery's algorithm is changed in such a way that, if multiple modular multiplication operations are to be carried out iteratively with the result of one iteration used in the next one, no subtraction operation would be required. Section III reviews Montgomery architectures and discusses the adopted implementation approach. To design fully systolic architectures, more than one instance is interleaved onto the same structure. Furthermore, the digit approach presents the designer with a trade-off to find the best structure that matches the design needs in terms of area usage, speed requirements, and the number of encryption operations interleaved onto the same structure [1, 3-5]. The implementation results of the proposed architectures using FPGA are analyzed in Section IV, from which the conclusions are derived in Section V.

II. Montgomery modular multiplication

The RSA algorithm consists of a modular exponentiation operation, which is, in its turn, broken into modular multiplication operations [7-8, 10-12, 18-22]. There are two classes of modular multiplication algorithms. A clear distinction between these two classes can be based upon the data format. The Most Significant Bit First (MSBF) class of modular algorithms are either comparison-subtraction based algorithms or Look-Up Table (LUT) based algorithms [2, 13, 15]. On the other hand, Montgomery's algorithm [9] makes the Least Significant Bit First (LSBF) approach useful when performing numerous successive modular multiplication operations such as modular exponentiation. This algorithm avoids long chains of carry propagation, and therefore speeds up both the modular multiplication and squaring operations required during the exponentiation process [9]. Rather than using subtraction/division operations, this LSBF method computes the modular product of two numbers multiplied by a scaling factor, which is relatively prime to the modulus. This allows the algorithm to perform divisions by a power of two, which is a shift operation.
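As a rough illustration of the idea described above, the following Python sketch scans one operand least significant bit first, adds the modulus whenever the running sum is odd, and halves the sum with a right shift, so that no division by the modulus is needed. It is a minimal sketch of the standard radix-2 Montgomery recurrence, not the authors' modified parallel algorithm of [3]; the function names `montgomery_multiply` and `montgomery_exp`, the parameter `n` (operand width in bits, so that the scaling factor is assumed to be 2^n), and the small test modulus are all assumptions made for illustration.

```python
def montgomery_multiply(a: int, b: int, m: int, n: int) -> int:
    """Return a * b * 2^(-n) mod m for odd m, with 0 <= a, b < m < 2^n.

    Bit-serial (radix-2), least-significant-bit-first recurrence.
    """
    if m % 2 == 0:
        raise ValueError("the modulus must be odd")
    p = 0
    for i in range(n):
        a_i = (a >> i) & 1        # current bit of A, LSB first
        p += a_i * b              # conditionally add B
        q_i = p & 1               # choose q_i so that p + q_i * m is even
        p = (p + q_i * m) >> 1    # exact division by 2 is a right shift
    if p >= m:                    # p < 2m here, so one subtraction suffices
        p -= m
    return p


def montgomery_exp(x: int, e: int, m: int, n: int) -> int:
    """Return x^e mod m using only Montgomery multiplications.

    Left-to-right square-and-multiply (an assumed, common schedule).
    """
    r2 = pow(1 << n, 2, m)                         # R^2 mod m, precomputed
    x_bar = montgomery_multiply(x % m, r2, m, n)   # map x to x * R mod m
    acc = montgomery_multiply(1, r2, m, n)         # map 1 to R mod m
    for bit in bin(e)[2:]:                         # exponent bits, MSB first
        acc = montgomery_multiply(acc, acc, m, n)  # square
        if bit == "1":
            acc = montgomery_multiply(acc, x_bar, m, n)  # multiply
    return montgomery_multiply(acc, 1, m, n)       # map back from R domain


if __name__ == "__main__":
    m, n = 239, 8                                  # toy odd modulus, 8-bit width
    print(montgomery_multiply(100, 200, m, n))
    print(montgomery_exp(7, 65537, m, n) == pow(7, 65537, m))
```

The conversions into and out of the scaled representation cost extra multiplications, which is why the method pays off mainly when many multiplications are chained, as in exponentiation.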
[…] thus making the design of modulo […] very important. Such global broadcast can lower the clock frequency [3-5]. As shown in [3-5], a way to avoid such global lines is to use a digit implementation and to interleave multiple instances onto the same structure. Furthermore, the digit approach can be the appropriate solution to avoid the large hardware usage of the parallel implementations and the slow speed that characterises the serial architectures [1, 3-5].

Let the modulus $M$ be an $n$-bit integer within the range $2^{n-1} < M < 2^{n}$ and let $R$ be $2^{n}$. Montgomery's multiplication algorithm requires $R$ and $M$ to be relatively prime, i.e. $\gcd(R, M) = \gcd(2^{n}, M) = 1$, which is satisfied if $M$ is odd, as required by the algorithm. By exploiting this property, Montgomery's algorithm introduces an efficient multiplication scheme, which computes the modular product, $P$, of two given integers, $A$ and $B$, as follows [9]:

$P = (A B R^{-1}) \bmod M$, (1)

where $R^{-1}$ is the inverse of $R$ modulo $M$. In order to describe this algorithm, Montgomery introduced the quantity $M'$, which is the inverse of $-M$ modulo $R$:

$M' = -M^{-1} \bmod R$. (2)

The computation of the Montgomery multiplication is carried out using Algorithm 1 as follows:

Algorithm 1
  $T = A B$
  $P = (T + (T M' \bmod R) M) / R$
  if $P \ge M$ then $P = P - M$

The algorithm uses the multiplication modulo $R$ and the division by $R$, which are faster and simpler than the computation of $AB \bmod M$, which involves division by $M$. The algorithm is only efficient when multiple operations are carried out, such as in the modular exponentiation operation when it is broken into modular multiplication operations [12]. For a hardware implementation, a systolic array can be derived from the bitwise version of Montgomery multiplication as shown in Algorithm 2 [9].

Algorithm 2
  […]

[…] the same structure after retiming. Among these Montgomery multipliers, those which interleave two modular multiplication operations are used with the RSA structure of Figure 8. The remaining multipliers, which carry out only one multiplication operation, are used with the RSA structure of Figure 7. As will be explained further down in this section,
all these architectures may be divided into two groups based on whether or not they avoid the global line broadcast in the Montgomery multiplier structures.

For the purpose of comparison and analysis, the following proposed structures have been implemented in a Xilinx XC2VE000 device:

Struct 1: RSA architecture of Figure 7 with a conventional Montgomery multiplier as described in Algorithm 2 [3].
Struct 2: RSA structure of Figure 7 which uses the Montgomery multiplier depicted in Figure 1.
Struct 3: RSA structure of Figure 8 which uses the Montgomery multiplier of Figure 2.
Struct 4: RSA architecture of Figure 7 with the multiplier depicted in Figure 9.
Struct 5: RSA architecture of Figure 7 which uses the 2-bit digit Montgomery multiplier of Figure 10.
Struct 6: RSA structure of Figure 8 with the multiplier of Figure 1 that interleaves 2 multiplication operations onto the same structure.
Struct 7: the RSA structure of Figure 7 which uses the 4-bit digit Montgomery multiplier obtained by unfolding the multiplier of Figure 1 (with an unfolding factor of 4).
Struct 8: the RSA structure of Figure 8 that uses a 4-bit digit Montgomery multiplier in which two multiplication operations are interleaved.

The implementation results of these 8 structures were obtained using the Xilinx ISE 6.2 package. However, these architectures were not verified with an actual implementation on a chip. The obtained results are compared against similar work in the literature [11, 20]. In [20], an algorithm combining Montgomery's technique and the carry-save representation of numbers was proposed. A highly modular bit-slice based architecture has been designed for executing the algorithm in FPGA. The serial RSA architecture in [20] uses carry-save adders to avoid carry propagation during the addition stages of Montgomery's algorithm. The architectures in [11] are based on the use of carry-propagate adders, implemented using fast carry logic resources available in the FPGA chip, with the multiples of A and M pre-computed and stored in a RAM.
A radix-2 and a radix-16 architecture were proposed in [11].

As mentioned previously, the concept of digit-serial arithmetic was proposed as a compromise between bit-serial and bit-parallel arithmetic. By an appropriate selection of the digit size, it provides the designer with the best area and speed that match the needs of the system. Therefore, the aim of the analysis presented in this section is to provide the designer with the necessary knowledge and information for finding the best compromise between the cost of the RSA architecture and its performance. The analysis is based on the effect of changing the digit size and the level of pipelining for a full 1024-bit modular exponentiation (i.e. both the key and the public exponent are 1024 bits long). The evaluation parameters are the frequency, the required time to carry out the modular exponentiation operation, and the area usage in terms of the number of FPGA slices.

The proposed structures carry out a modular exponentiation operation in […] clock cycles, where n, k, l and d are the number of bits in the modulus, the number of bits in the exponent, the level of pipelining, and the digit size, respectively. Therefore, for a full n-bit modular exponentiation operation, […] clock cycles are required. The implementation results of the proposed architectures are shown in Table 1 (the implementation does not include the pre-mapping and post-mapping operations that convert data to and from the residue system).

A clear distinction can be made between two classes of structures: on one side Struct 3, Struct 4 and Struct 6, and on the other side the remaining structures. The distinction is made upon the working frequencies of the first class of structures, which are much higher than the working frequencies of the architectures of the latter class. This underlines the benefits of using nearest neighbour communications only, against the global lines distribution that characterises the architectures of the second class.
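To relate the clock frequencies discussed above to exponentiation times, the arithmetic can be sketched as follows. This is only an illustrative calculation: the helper names are hypothetical, the 278 MHz and 7.55 ms figures are the ones quoted for Struct 4 in this section, and the comparison value 2n² with n = 1024 is an assumed rough reference point for a bit-serial design, not the paper's own cycle-count expression.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1e3 / freq_mhz


def implied_cycles(freq_mhz: float, time_ms: float) -> float:
    """Number of clock cycles implied by a run time at a given frequency."""
    return freq_mhz * 1e6 * time_ms * 1e-3


def exponentiation_time_ms(cycles: float, freq_mhz: float) -> float:
    """Run time in milliseconds for a cycle count at a given frequency."""
    return cycles / (freq_mhz * 1e6) * 1e3


if __name__ == "__main__":
    n = 1024
    freq_mhz, time_ms = 278.0, 7.55   # figures quoted for Struct 4 (rounded)
    print("period (ns):", clock_period_ns(freq_mhz))
    print("implied cycles:", implied_cycles(freq_mhz, time_ms))
    # Assumed reference point only: about two n-cycle multiplications per
    # exponent bit, over n exponent bits.
    print("2*n*n cycles:", 2 * n * n)
    print("time for 2*n*n cycles (ms):",
          exponentiation_time_ms(2 * n * n, freq_mhz))
```

Because the quoted frequency and time are rounded, the implied cycle count is only approximate; the point of the sketch is that run time scales with the cycle count divided by the clock frequency, so a structure with a higher clock can still be slower if it needs more cycles per exponentiation.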
The effect that pipelining has on the critical path of the different structures can clearly be seen if the frequencies of Struct 1, Struct 5, and Struct 7 are compared to those of Struct 2, Struct 6, and Struct 8, respectively. The critical logic path of Struct 1 is 2 FAs and it operates at a frequency of 145 MHz, while the critical logic path of Struct 2 is one FA, and thus it operates at a higher frequency of 151 MHz. The same observation can be made for Struct 5 and Struct 7:

• Struct 5, whose critical logic path is 2 FAs, operates at a frequency of 112 MHz, and Struct 6, which exhibits a critical path of one FA, has a clock frequency of 255 MHz.
• Struct 7, which has a critical path of 4 FAs, operates at a frequency of 78 MHz, and Struct 8, whose critical path consists of 2 FAs, has an operating frequency of 84 MHz.

In these three examples, the improvement in the clock frequency is very clear in the case of Struct 5 and Struct 6. This is due to the fact that Struct 6 uses nearest neighbour communications only. The small improvement in the remaining two cases can be explained by the effect of inherent routing delay, which is more important than the delay of the logic path.

Another point worth mentioning is the effect of using the architectures of Figure 7 and Figure 8. For example, Struct 3 operates at a frequency of 291 MHz while Struct 4 operates at a frequency of 278 MHz. However, Struct 4 carries out a 1024-bit modular exponentiation in 7.55 ms, and is thus faster than Struct 3, which requires 14.46 ms to perform the same operation. This is explained by the fact that Struct 3 interleaves two modular multiplication operations onto the same Montgomery multiplier. The same can be said about Struct 7 and Struct 8.

An important decision that can be made from reviewing Table 1 is the choice of the appropriate architecture for the application at hand. For instance, Struct 4 requires 7.55 ms to carry out the full modular exponentiation operation while Struct 7 (which uses a 4-bit digit Montgomery multiplier) carries out the same operation in 6.78 ms. Nevertheless,
the area usage of Struct 7 is almost twice that of Struct 4. By changing the digit size and the level of pipelining, one may have a database of architectures from which he/she can choose the best architecture that matches the design frequency and area usage needs.

Structure   Frequency (MHz)   Period (ns)   Time (ms)   Area (slices)
Struct 1    145               […]           […]         […]
Struct 2    151               […]           13.92       […]
Struct 3    291               […]           14.46       12400
Struct 4    278               […]           7.55        […]
Struct 5    112               […]           […]         […]
Struct 6    255               3.905         8.25        […]
Struct 7    78                […]           6.78        19900
Struct 8    84                11.845        12.44       30300

Table 1. Performances of the proposed architectures

Table 2 shows a comparison between Struct 4, which is a serial architecture, and two serial architectures described in [11] and [20]. The table also compares Struct 7, which is a 4-bit digit architecture, against a radix-16 architecture proposed in [11]. This comparison is carried out for illustration purposes only, as the work in [11] was implemented in an XC40150XV-8 FPGA while the work in [20] was implemented in a Virtex 2000E-8 chip. Using recent FPGA chips would probably lead to better performances. Clearly, the architectures presented in this paper are superior to those in the literature in terms of the processing time. Struct 4 has an 82% and 73% improvement over the work in [11] and [20], respectively. Struct 7 has a 43% improvement over the radix-16 architecture in [11].

Architecture        Frequency (MHz)   Time (ms)
Struct 4            […]               7.55
Struct 7            […]               6.78
Radix-2 in [11]     […]               […]
Radix-16 in [11]    […]               […]
Work in [20]        […]               […]

Table 2. Comparison with similar work in the literature

V. Conclusion

The modular exponentiation operation is the core of the RSA algorithm, which has become the most widely used public key computer security algorithm. New architectures for the implementation of Montgomery modular exponentiation for RSA have been proposed. The new architectures use a modified Montgomery algorithm in which the operations of multiplication and modular reduction are carried out separately but in a parallel way.
To investigate the best area usage/time trade-off, the digit approach was adopted. The problem of having global lines in the architectures has been circumvented by interleaving more than one instance onto the same digit multiplier, which was achieved by pipelining the feedback loops and retiming the whole structures. The implementation results have shown that by pipelining and increasing the digit size of the structures, the global line broadcast is avoided with more bits processed at each clock cycle, which in turn has led to better performances in terms of the processing time. However, this was achieved at the price of extra area usage. This provides designers with the best trade-off between speed, area usage and throughput rate.

VI. References

[1] O. Nibouche and M. Nibouche, "On Designing Digit Multipliers", Proceedings of the 9th International IEEE Conference on Electronics, Circuits and Systems (ICECS), Dubrovnik, 2002.
[2] O. Nibouche, A. Bouridane and M. Nibouche, "New Iterative Algorithms and Architectures of Modular Multiplication for Cryptography", Proceedings of the 8th International IEEE Conference on Electronics, Circuits, and Systems (ICECS), Malta, 2001.
[3] O. Nibouche, A. Bouridane, and M. Nibouche, "Architectures for Montgomery's multiplication", IEE Proceedings, Computers and Digital Techniques, Vol. 150, Issue 06, November 2003, pp. 361-368.
[4] W. L. Freking and K. K. Parhi, "Performance-scalable array architectures for modular multiplication", Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors, pp. 149-160, IEEE, 2000.
[5] W. L. Freking and K. K. Parhi, "Ring-Planarized Cylindrical Arrays with Application to Modular Multiplication", IEEE Workshop on Signal Processing Systems Design & Implementation (SiPS 2000), Lafayette, Louisiana, USA, pp. 497-506.
[6] W. Diffie and M. E. Hellman, "New directions in cryptography", IEEE Transactions on Information Theory, 22:644-654, 1976.
[7] R. L. Rivest, A. Shamir, and L. Adleman, "A method for obtaining digital signatures and public-key cryptosystems",
Communications of the ACM, 21(2):120-126, 1978.
[8] M. Shand and J. Vuillemin, "Fast implementation of RSA cryptography", Proceedings of the 11th IEEE Symposium on Computer Arithmetic, 1993.
[9] P. L. Montgomery, "Modular multiplication without trial
