
Using System Generator To Design A Reconfigurable Video Encryption System

Daniel Denning¹, Neil Harold², Malachy Devlin², James Irvine³


¹ Institute of System Level Integration, Alba Centre, Alba Campus, Livingston, EH54 7EG, UK
daniel.denning@sli-institute.ac.uk
² Nallatech Ltd, Boolean House, One Napier Park, Cumbernauld, Glasgow, G68 0BH, UK
{n.harold, m.devlin}@nallatech.com
³ EEE Department, University of Strathclyde, 204 George St., Glasgow, G1 1XW, UK
j.m.irvine@strath.ac.uk

Abstract. As FPGAs increase in size there is a need for improved productivity, and this includes new design flows and tools. System Generator, from Xilinx, is a high-level block-based design tool that offers bit- and cycle-accurate simulation. In this paper, we discuss the use of System Generator to design a reconfigurable video encryption system. It includes the design of the AES (Advanced Encryption Standard) and Enigma encryption cores. This paper also demonstrates the use of Nallatech block-sets within System Generator to provide a synthesisable link straight to hardware. As a result of using this design flow, we are able to efficiently implement our system and algorithms with a significant improvement on traditional design times, without compromising performance or area.

1 Introduction
As FPGA capacity moves towards 50 million system gates [1], design productivity and time-to-market become areas of increased concern. This has created unprecedented interest in high-level abstraction of systems, with a focus on languages and design tools that generate VHDL or RTL and overcome the issues of systems design. A range of high-level tools and languages for FPGA design are currently available, including Celoxica's Handel-C [2], a C-based parallel language that generates EDIFs; AccelChip's AccelFPGA [3], a Matlab synthesis tool that generates RTL; and Xilinx's Forge Java [4], a Java-to-Verilog tool.
System Generator [5] from Xilinx is an extension of Simulink and provides a block-based high-level schematic tool that generates VHDL. It allows designers to access major functions within the silicon without removing the designer's functionally abstract view. The nature of System Generator makes it ideal for rapid development of data-path algorithms. Encryption functions are commonly implemented using data-path architectures, so there is a natural compatibility between these functions and the System Generator tool.
One of the encryption algorithms used in this system is based on the Enigma machine, which was an important part of German military and intelligence communications during World War II. The machine is equivalent to an encrypter/decrypter typewriter whereby pressing a letter of the alphabet reveals a different letter of the alphabet. A much more recent product cipher is AES, which is based on the Rijndael algorithm developed by V. Rijmen and J. Daemen [6]. The algorithm has a good parallel structure and can be used in a wide range of devices. AES was chosen by NIST (National Institute of Standards and Technology) to replace the highly popular but less efficient DES (Data Encryption Standard).
Xilinx's Xtreme DSP kit [7], developed in conjunction with Nallatech, offers development solutions for such areas as SDR (software defined radio), 3G wireless, networking, HDTV, and video imaging. The kit contains a BenOne motherboard [8] and a BenAdda module [9], which are part of the scalable DIME-II family. Nallatech has also developed the associated software/firmware, FUSE, to configure the module via a host computer. FUSE provides a number of interfaces, such as a GUI application and APIs for C/C++, Java, and Matlab.
In this paper we investigate the implementation issues of designing a reconfigurable video encryption system using System Generator. We map our system to a Virtex-II FPGA on a BenAdda module, housed on a BenOne motherboard. The system is fully reconfigurable in that we essentially have two identical designs with different encryption cores: a modified Enigma algorithm and the AES algorithm. Video is transmitted over a wireless link and fed back into the same FPGA to be decrypted. These algorithms were chosen to demonstrate the idea of IP obsolescence protection and update.
We also aim to demonstrate the rapid development of a system using a combination of System Generator and Nallatech hardware and software.

2 Modelling with System Generator


System Generator is an extension of Simulink, and consists of a Simulink library called the Xilinx blockset. It maps the Xilinx block elements defined in Simulink into architectures and entities, signals, ports, and attributes. It also produces command files for FPGA synthesis, HDL simulation, and implementation tools. The tool keeps the Simulink hierarchy when the model is converted into VHDL. When designing in System Generator it is possible to access key features in the FPGA such as the high-speed multipliers, and it is also possible to incorporate user-defined VHDL blocks into the model. For verification and testing, System Generator can automatically generate testbenches, whereby the Simulink stimuli applied to the input block are recorded for the VHDL simulation. The outputs of the VHDL simulation can then be compared with the recorded results from the Simulink simulation.

3 AES Algorithm
The AES algorithm is a symmetric block cipher. It processes data blocks of 128 bits and uses a cipher key of 128, 192, or 256 bits to produce an encrypted data block of 128 bits. We have chosen the 128-bit key length. This results in 10 rounds of encryption within the algorithm, with each round except the last having 4 transformations, as shown in Fig. 1. The last round omits the Mix Column transformation.

[Figure: one round consists of Byte Substitution, Shift Row, Mix Column, Add Round Key]
Fig. 1. The AES transformations within one round.

For this implementation we have chosen the traditional looped-feedback design, in which the same round logic is reused for each successive round, without any pipelining. Each round has its own subkey; the subkeys are generated when needed and then stored locally in Block-RAM for the decryption core.
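As a rough illustration of the subkey handling described above, the following Python sketch expands a 128-bit key into 11 subkeys (the initial key plus one per round) and keeps them in a list that stands in for the Block-RAM. It follows the key schedule of FIPS 197 [6]; the function names, the on-the-fly S-box construction, and the list used as storage are assumptions made for this example and do not describe the System Generator model.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B          # reduce by the AES polynomial
        b >>= 1
    return product


def build_sbox() -> list[int]:
    """Build the AES S-box: multiplicative inverse, then the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):   # x^254 is the inverse of x; 0 stays 0
            inv = gf_mul(inv, x)
        y = inv
        for shift in range(1, 5):
            y ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key_128(key: bytes) -> list[bytes]:
    """Return the 11 subkeys (16 bytes each) for a 128-bit cipher key."""
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    words = [key[4 * i:4 * i + 4] for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]                    # rotate the word
            temp = bytes(SBOX[b] for b in temp)           # substitute bytes
            temp = bytes([temp[0] ^ rcon]) + temp[1:]     # add round constant
            rcon = gf_mul(rcon, 2)
        words.append(bytes(p ^ q for p, q in zip(words[i - 4], temp)))
    # One stored entry per round; a decryptor would read them in reverse order.
    return [b"".join(words[4 * r:4 * r + 4]) for r in range(11)]


subkeys = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(subkeys[1].hex())
```

The key used here is the example key from FIPS 197; a decryption core built on the same storage would simply index the stored subkeys from round 10 down to round 0.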

4 The Enigma
The Enigma is an electromechanical cipher machine, which encrypts individual letters of the alphabet. It is composed of 3 out of 5 available rotors, a reflecting rotor, a stecker¹, a set of lights, and a keyboard. Our modified Enigma algorithm is an 8-bit algorithm accepting values between 0 and 255. It uses 3 permanent rotors and omits the stecker. A diagram of the encryption flow can be seen in Fig. 2.

[Figure: letter in, rotor 1, rotor 2, rotor 3, reflecting rotor, then back through rotor 3, rotor 2, rotor 1 to encrypted letter out]

Fig. 2. Diagram of the rotors and reflecting rotor in the Enigma with the direction of flow.

For more information on the Enigma machine see [10].
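The flow in Fig. 2 can be sketched in Python for 8-bit values. This is only an illustration under stated assumptions: the rotor and reflector wirings are generated from arbitrary seeds, the odometer-style rotor stepping is assumed (the stepping rule of the modified algorithm is not given here), and the class and function names are hypothetical.

```python
import random


def make_rotor(seed: int):
    """Return (forward, inverse) lookup tables for an assumed rotor wiring."""
    rng = random.Random(seed)
    forward = list(range(256))
    rng.shuffle(forward)
    inverse = [0] * 256
    for i, w in enumerate(forward):
        inverse[w] = i
    return forward, inverse


def make_reflector(seed: int):
    """Pair up all 256 values so the reflector is its own inverse."""
    rng = random.Random(seed)
    values = list(range(256))
    rng.shuffle(values)
    table = [0] * 256
    for a, b in zip(values[0::2], values[1::2]):
        table[a], table[b] = b, a
    return table


class ByteEnigma:
    """Three rotors and a reflector operating on values 0..255."""

    def __init__(self, positions=(0, 0, 0)):
        self.rotors = [make_rotor(seed) for seed in (1, 2, 3)]  # assumed wirings
        self.reflector = make_reflector(4)
        self.positions = list(positions)

    def _step(self):
        # Assumed stepping: rotor 1 advances every value, carrying into the next.
        for i in range(3):
            self.positions[i] = (self.positions[i] + 1) % 256
            if self.positions[i] != 0:
                break

    def process(self, value: int) -> int:
        x = value
        # Forward path: rotor 1, rotor 2, rotor 3.
        for (fwd, _), pos in zip(self.rotors, self.positions):
            x = (fwd[(x + pos) % 256] - pos) % 256
        x = self.reflector[x]
        # Reverse path: rotor 3, rotor 2, rotor 1.
        for (_, inv), pos in zip(reversed(self.rotors), reversed(self.positions)):
            x = (inv[(x + pos) % 256] - pos) % 256
        self._step()
        return x


plain = list(b"PAL video line")
cipher = [v for v in map(ByteEnigma((5, 17, 200)).process, plain)]
recovered = [v for v in map(ByteEnigma((5, 17, 200)).process, cipher)]
print(recovered == plain)
```

Because the reflector pairs values, running the cipher text through a machine started at the same rotor positions recovers the plain text, which is why the same structure can serve for both encryption and decryption in this sketch.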

¹ The stecker is a plug board that allows the user to swap letters before going into the encryption

5 System Hardware
As stated earlier, the Xtreme DSP Kit is made up of a DIME-II module (the BenAdda) and motherboard (the BenOne) from Nallatech. DIME-II is an evolution of the DIME standard that was defined to provide a platform specifically targeted to extract the power and flexibility of FPGAs to a system level. The standard defines the physical hardware as well as key system issues such as clocking multiple devices and high-bandwidth communication between FPGA-based modules.
5.1 BenOne Motherboard
The BenOne (Fig. 3) is a single-slot DIME-II motherboard with PCI and USB interfaces. The motherboard contains a Spartan-II FPGA with pre-configured PCI/USB and board control firmware.

Fig. 3. BenOne single-slot DIME-II motherboard PCI

5.2 BenAdda Module
The BenAdda DIME-II module (Fig. 4) provides high-speed digital-to-analogue and analogue-to-digital conversion through two 14-bit ADCs and two 14-bit DACs. The module fits on top of the BenOne. The ADC and DAC channels provide maximum sample rates of 105 MSPS and 160 MSPS respectively. There is also an onboard user-programmable FPGA, which is available in a range of packages and is programmed using the FUSE software. For our implementation we are targeting an XC2V3000 device.

Fig. 4. BenAdda DIME-II module

5.3 Block sets for System Generator
In the system we also use Nallatech System Generator block sets (Fig. 5). These blocks provide a link straight to the DACs and ADCs on the BenAdda module. The blocks are easily dropped into the design area and essentially provide the functionality and pin locations for the FPGA on the module. The LED Flasher block provides a link with the module LEDs and is used mainly for debugging and confidence testing.

Fig. 5. Examples of Nallatech block sets for System Generator

6 System Implementation
A PAL video stream is captured by an ADC channel on the BenAdda, fed through the FPGA, encrypted, and presented at the DAC. This is then fed back through the second ADC to the FPGA to be decrypted. A high-level System Generator model can be seen in Fig. 6. For reconfigurability the system is in two parts. The first part is an encryption system using the Enigma algorithm and the second is the same system except that the Enigma core is replaced by the AES core.

Fig. 6. High Level model of Video System using AES

The DDS source and floating scope in Fig. 6 do not form part of the Video Encryption System. They are included in the System Generator design for simulation purposes only, the DDS taking the place of the PAL source and the scope taking the place of the display. It is possible to add multiple scopes to the simulation environment to assess the performance of the system at any point in the data path. The real system is made up of all components in the path between ADC1 and DAC2, together with the LED block, which gives a visual check that the system is running.
In operation, a PAL video stream is captured by ADC1 on the BenAdda. Only 8 of the 14 bits are required for an accurate representation of the video data, so the 6 LSBs are discarded while the 8 MSBs are forwarded to the encryptor (either Enigma or AES). The design of the Enigma is fully pipelined, with 3 forward rotor blocks, a reflector rotor block, and 3 reverse rotor blocks for each encryption and decryption core. A rotor block in System Generator can be seen in Fig. 7.

Fig. 7. A rotor block in the Enigma algorithm
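The sample truncation described above for ADC1 (keep the 8 MSBs of each 14-bit sample, drop the 6 LSBs) amounts to a right shift by six. A minimal Python sketch, assuming the ADC sample is available as an unsigned 14-bit integer (the number format is not stated here) and using a hypothetical function name:

```python
ADC_BITS = 14      # width of one ADC sample
VIDEO_BITS = 8     # width forwarded to the encryptor


def adc_to_video_byte(sample: int) -> int:
    """Keep the 8 most significant bits of an unsigned 14-bit ADC sample."""
    if not 0 <= sample < (1 << ADC_BITS):
        raise ValueError("sample does not fit in 14 bits")
    return sample >> (ADC_BITS - VIDEO_BITS)   # discard the 6 LSBs


print(adc_to_video_byte(0b10110011_010101))
```

If the converter delivered signed samples instead, an offset would have to be applied before the shift; that case is not covered by this sketch.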

The AES algorithm uses the standard looped round design. This takes up the smallest amount of space on the FPGA, but the throughput is reduced by the feedback latency. The S-boxes within the subByte transformation are implemented using System Generator's single-port ROM blocks and can be seen as a one-dimensional array. The GF(2^8) multiplication (Fig. 8) within the mixColumns transformation is a combination of shift, XOR, and multiplexer blocks.

Fig. 8. A partial snapshot of the GF(2^8) multiplication within the mixColumns transformation
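The shift, XOR, and multiplexer structure mentioned above corresponds to the usual "multiply by x" step in GF(2^8). The Python sketch below shows that step and how one mixColumns column can be built from it, following FIPS 197 [6]. The function names are chosen for this example and do not correspond to blocks in the System Generator model.

```python
def xtime(b: int) -> int:
    """Multiply a byte by x (i.e. by 2) in GF(2^8)."""
    shifted = (b << 1) & 0xFF                    # shift
    # Multiplexer: select the reduced value when the top bit was set.
    return shifted ^ 0x1B if b & 0x80 else shifted


def gf_mul(a: int, b: int) -> int:
    """General GF(2^8) multiplication built from xtime and XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def mix_column(col):
    """Apply the mixColumns matrix to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
        a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
        a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
        gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
    ]


print([hex(v) for v in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Only multiplication by 2 and 3 is needed for encryption, so in hardware each product reduces to at most one shift, one conditional XOR with 0x1B, and one further XOR.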

The shiftRows transformation was implemented by separating the 128-bit data into 8-bit values using the slice block, re-ordering the 8-bit values, and concatenating them back together using the concat block to form 128 bits. A block model of this transformation can be seen in Fig. 9.

Fig. 9. ShiftRows transformation using slice and concat blocks
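The slice, re-order, and concatenate approach described above can be mirrored in a few lines of Python. This is a sketch under one assumption: byte 0 of the state (row 0, column 0, with the state stored column by column as in FIPS 197 [6]) sits in the most significant 8 bits of the 128-bit word. The function names are hypothetical.

```python
def slice_bytes(block: int) -> list[int]:
    """Split a 128-bit word into 16 bytes, most significant byte first."""
    return [(block >> (8 * (15 - i))) & 0xFF for i in range(16)]


def concat_bytes(values) -> int:
    """Join 16 bytes back into one 128-bit word."""
    word = 0
    for v in values:
        word = (word << 8) | v
    return word


def shift_rows(block: int) -> int:
    """Rotate row r of the column-major state left by r positions."""
    s = slice_bytes(block)
    # Output byte i lies in row r = i % 4; it comes from the byte r columns
    # to the right, i.e. 4 * r positions further on (wrapping at 16).
    reordered = [s[(i + 4 * (i % 4)) % 16] for i in range(16)]
    return concat_bytes(reordered)


state = int.from_bytes(bytes(range(16)), "big")
print(shift_rows(state).to_bytes(16, "big").hex())
```

Since the re-ordering is fixed, it costs no logic in hardware: it is purely a matter of which slice output is wired to which concat input.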

The decryption core is the inverse of the encryption core; for more information see [6]. The expansion of the key to produce the subkeys is performed locally when needed, and the subkeys are then stored in Block-RAM to be accessed later by the decryption core. A buffering circuit is added before the encryption to collect 8-bit values into 128-bit blocks, and a de-buffering circuit after the decryption splits the 128-bit blocks back into 8-bit values.
Once the data has been encrypted, start and stop bits are added prior to transmission. The data is then serialised and modulated at 100 MHz (via DAC1) onto a laser diode module that transmits to a pin diode receiver connected to ADC2 on the BenAdda. The modulated data received on ADC2 is captured and passed to a synchronisation circuit, which identifies the boundary of each parallel word and removes the start and stop bits before applying the parallel data to the decryption block. The System Generator diagram of the synchronisation block is shown in Fig. 10.

Fig. 10. Synchronisation block model
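The framing and synchronisation steps can be illustrated with a small Python model. The exact layout of the 11-bit word is not given here, so this sketch assumes one start bit (0), eight data bits sent MSB first, and two stop bits (1, 1), with the line idling at 1. The function names and the simple search strategy are also assumptions and are not taken from the synchronisation block in Fig. 10.

```python
WORD_BITS = 11                 # assumed: 1 start + 8 data + 2 stop bits
START, STOP = [0], [1, 1]


def frame_byte(value: int) -> list[int]:
    """Add the assumed start and stop bits around one 8-bit value."""
    data = [(value >> (7 - i)) & 1 for i in range(8)]   # MSB first
    return START + data + STOP


def serialise(values) -> list[int]:
    """Parallel-to-serial conversion of a sequence of bytes."""
    bits = []
    for v in values:
        bits += frame_byte(v)
    return bits


def deserialise(bits) -> list[int]:
    """Find word boundaries, strip start/stop bits, return the bytes."""
    values = []
    i = 0
    while i + WORD_BITS <= len(bits):
        word = bits[i:i + WORD_BITS]
        if word[:1] == START and word[9:] == STOP:
            value = 0
            for bit in word[1:9]:
                value = (value << 1) | bit
            values.append(value)
            i += WORD_BITS     # stay aligned to the next word
        else:
            i += 1             # slide by one bit until a boundary is found
    return values


line = [1, 1, 1] + serialise([0x00, 0xFF, 0xA5, 0x3C])   # idle bits, then data
print(deserialise(line))
```

A search of this kind can lock onto a false boundary if it starts in the middle of a word, which is one reason a real synchroniser needs more care than this model shows.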

Note that the 11-bit parallel-to-serial (P/S) and serial-to-parallel (S/P) conversions performed on the PAL data limit the operating frequency of the design. System Generator automatically updates the system period of the design when the P/S and S/P conversions are added. The serial elements of the system run at the full clock frequency, while the sections processing the data in parallel have clock enable signals that restrict them to running at 1/11 of this full clock frequency. Moving between these clock domains is achieved through the use of up- and down-sampling blocks in System Generator. For example, when the serial data is being converted back to parallel, the start and stop bits of each parallel word are detected and the serial-to-parallel converter is enabled to ensure a valid word. However, the sync circuit has a clock period of 1/11 while the serial-to-parallel converter has a period of 1. A down-sampling block is used to transfer the valid_data flag from one clock domain to the other. Because the clock domains differ, there is a time difference between the valid_data flag going high and the down-sampled version transitioning. As a result, a selectable delay line is required to store the serial data until the down-sampled flag has the same value as the original signal.
Following decryption, the reformed 8-bit PAL signal is fed out through DAC2 on the BenAdda to a suitable display. The entire system is bandwidth limited by the ADCs, which are clocked at 100 MHz.
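To make the effect of the 1/11 clock enable concrete, the arithmetic can be written out in Python. The sketch assumes that the serial side runs at the 100 MHz rate quoted above and that each 11-bit word carries one 8-bit video value; the function names are hypothetical.

```python
def parallel_word_rate(serial_clock_hz: float, word_bits: int = 11) -> float:
    """Rate at which the clock-enabled (parallel) sections are active."""
    return serial_clock_hz / word_bits


def payload_rate(serial_clock_hz: float, word_bits: int = 11,
                 payload_bits: int = 8) -> float:
    """Video bits per second carried by the serial link."""
    return parallel_word_rate(serial_clock_hz, word_bits) * payload_bits


clock = 100e6   # assumed serial clock in Hz
print(f"{parallel_word_rate(clock) / 1e6:.2f} Mwords/s")
print(f"{payload_rate(clock) / 1e6:.2f} Mbit/s of video data")
```

Under these assumptions the link carries roughly 9.09 million words per second, or about 72.7 Mbit/s of video data, which is far below the throughput of either encryption core.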

7 Results
Each system, encryption and decryption, fits onto one XC2V3000. The Enigma system takes up 4588 slices (32%) of the device while the AES system takes up 1719 slices (12%) of the device. The performance of both systems is limited by the wireless laser link and the ADCs. Without these limitations the performance would be as high as that of the encryption cores.
The Enigma encryption core has a throughput of 1.25 Gbit/sec and is fully pipelined, so there is a finite delay before the transmission of the video. The design of the Enigma encryption and decryption cores took one engineer just over one week.
The AES encryption core has a throughput of 1.3 Gbit/sec and requires 466 slices (3%) of the device. However, due to the use of block-RAMs for the S-boxes in the subByte transformation, the core needs 37 of the 96 block-RAMs available. The design of the AES encryption and decryption cores took one engineer just over two weeks. The same statistics apply to both of the decryption cores.

8 Conclusions
The purpose of this paper was to demonstrate the use of System Generator to design our system. As can be seen from the results, especially the implementation of the AES core, it is possible with System Generator to design high-speed cores for FPGAs. The standard AES encryption core has also been fully pipelined; the System Generator model took 2 hours to complete. A synthesisable core has not been produced yet, but it is estimated that a theoretical throughput of 14.3 Gbit/sec could be achieved. This would mean using lookup tables for the S-boxes (a lookup table having a latency of 1). A further increase could be made by implementing combinatorial S-boxes.
Our standard looped-feedback core, which took just over two weeks to design, can be compared with other implementations of AES: [11] achieved 394.3 Mbit/sec with a looped-feedback design and 1.9 Gbit/sec with a design partially pipelined to a degree of five; UltraSonic [12] achieved 8.4 Gbit/sec with a pipelined design; and the fastest published AES design [13], a fully pipelined memoryless encryptor, achieves a throughput of 17.8 Gbit/sec. Although the design times cannot be compared directly with an equivalent VHDL design time, it is the authors' view that these design times are an impressive improvement.

The System Generator tool has proved to be very intuitive for datapath and algorithm design. With System Generator it is possible to use Matlab files to generate the data used to initialise the block-RAMs in the model. On the downside, the tool produces many VHDL files; for example, the AES encryption and decryption core has over 2000 VHDL files in its hierarchy yet takes up only 3% of the slices in an XC2V3000. Simulating the system took around 10-15 minutes on average. When the fully pipelined AES encryption model was simulated, the tool took over 4 hours to simulate 15 clock cycles.
Currently System Generator does not handle matrix operations for image processing. If a two-dimensional image array needs to be processed then each vector must be processed explicitly. In contrast to designing in VHDL, the generate statement is not available in System Generator, and users must copy and paste the required blocks.
In this paper we have also demonstrated a simple way of updating an IP core within an FPGA system. The update was completed with minimal change to the system. The FPGA is then fully reconfigured with the updated encryption core in the system.

Acknowledgements
On behalf of Nallatech Limited the authors would like to thank the Ministry of Defence (UK) for allowing this funded work to be published, and the Engineering and Physical Sciences Research Council for its financial support. Lastly, we thank Dave Shand, Derek Stark, and Eric Lord for their help and support.

References
1. Bolsens, I.: Challenges and Opportunities for FPGA Platforms, FPL 2002, LNCS 2438, pp. 391-392, 2002.
2. Celoxica Ltd, Handel-C Product Brief, August 2002.
3. AccelChip Inc, AccelFPGA datasheet, 2002.
4. Xilinx Inc., Forge, www.xilinx.com/ise/advanced/forge.htm
5. Xilinx Inc., System Generator Reference Guide, www.xilinx.com/ipcenter/dsp/ref_guide.pdf
6. AES, Federal Information Processing Standards Publication 197, Nov 26, 2001.
7. Xilinx Inc., Xtreme DSP Development Kit, www.xilinx.com/ipcenter/dsp/development_kit.htm
8. Nallatech Ltd, BenOne datasheet, 2002.
9. Nallatech Ltd, BenAdda datasheet, 2002.
10. Bletchley Park, Enigma, www.bletchleypark.org.uk
11. Labbé, A., Pérez, A.: AES Implementation on FPGA: Time-Flexibility Tradeoff, FPL 2002, LNCS 2438, pp. 836-844, 2002.
12. Moreira, E., McAlpine, P., Haynes, S.: Rijndael Cryptographic Engine on the UltraSonic Reconfigurable Platform, FPL 2002, LNCS 2438, pp. 770-779, 2002.
13. Järvinen, K., Tommiska, M., Skyttä, J.: A Fully Pipelined Memoryless 17.8 Gbps AES-128 Encryptor, FPGA 2003, ACM Press, pp. 207-215, 2003.
