
Custom Instruction Application in the NIOS II Soft Processor

Susilo Wibowo
Department of Electrical Engineering (Jurusan Teknik Elektro), Universitas Surabaya
Jalan Raya Kalirungkut, Surabaya
+62-31-2981157 ext. 83
E-mail: susilo_w@ubaya.ac.id

Abstract

This paper gives an overview of custom instructions in the NIOS II soft processor. A custom instruction is a special logic block (hardware) that can be added to the NIOS II ALU. This feature of NIOS II processors can be used to speed up algorithms considerably while reducing the size and complexity of the software. It implements part or all of an algorithm in hardware and makes it accessible to software through specially generated C macros or inline assembly macros. This paper uses a custom instruction to execute the basic matrix multiplication algorithm on floating-point data.

Key words: NIOS II, custom instruction, floating point, matrix multiplication.

Advances in reconfigurable hardware technology have made it possible to implement complex microprocessors, together with all the necessary peripheral components, on a single chip. They have also made it possible for designers to build soft processors for use in an FPGA. A soft processor is a processor built from an HDL (Hardware Description Language) such as VHDL or Verilog. NIOS II is a soft processor designed by Altera. A soft processor makes a design more flexible, because the designer can configure the processor to fit the needs of the application.

NIOS II has two features that can be used to add application-specific functions to it. One of them is the custom instruction. This feature is intended to speed up algorithm execution while reducing the size and complexity of the software. It implements part or all of an algorithm in hardware and adds it to the NIOS II ALU. The custom instruction is accessed from software through specially generated C macros or inline assembly macros.

This paper uses a custom instruction to execute the basic matrix multiplication algorithm on floating-point data. First, the algorithm is transformed into a Data Flow Graph (DFG). From the DFG representation, the part of the algorithm considered critical is moved into hardware as a custom instruction. Finally, the performance gain of the custom instruction is measured against a software-only implementation using the special profiler function of the NIOS II system.

NIOS II Custom Instruction

A custom instruction implements a complex sequence of standard instructions in hardware, reducing the sequence to a single instruction that software can access. Custom instructions can be implemented as single-cycle (combinatorial), multi-cycle (sequential), extended, or internal register file operations. Adding custom instructions to the NIOS II embedded processor instruction set accelerates time-critical software algorithms. In addition, these user-added custom instructions can access memory as well as logic outside the NIOS system.

The two essential elements of a custom instruction are the custom logic block and the software macro. The custom logic block is the hardware that performs the complex sequence of standard operations. Once implemented, this block becomes part of the NIOS II processor's ALU. Figure 1 depicts how custom logic is added to the NIOS II ALU. The software macro, generated when the custom logic is implemented, is used to access the custom logic from software code.

The custom logic block is implemented with the aid of the NIOS II CPU configuration wizard. The wizard integrates the custom logic block with the NIOS II processor's ALU and also creates the software macros in C/C++ and assembly. With the appropriate interface provided by the NIOS II processor, a custom logic block can be designed to perform any function. The custom logic block can be created in any of the following formats:
- VHDL
- Verilog HDL
- EDIF netlist file
- Quartus II Block Design File (.bdf)
- Verilog Quartus Mapping File (.vqm)

Because the logic block connects directly to the ALU, as shown in Figure 1, it provides an interface with predefined ports and names. The predefined ports and names used for custom logic in the factory-configured 32-bit NIOS II processor are shown in Figure 2 [1].
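The role of the software macro described above can be illustrated with a minimal sketch. It is not the macro generated by the configuration wizard: the names CI_EXAMPLE, CI_EXAMPLE_N and ci_example_model are hypothetical, the opcode index 0 is an assumption, and the XOR operation is an arbitrary placeholder for whatever the custom logic block computes. The sketch assumes a GCC toolchain for NIOS II, which provides the __builtin_custom_inii built-in and is assumed to define __NIOS2__; on any other compiler it falls back to a plain C model so that the same calling code can be tried on a host PC.

```c
#include <stdint.h>

/* Hypothetical opcode index (the "n" field) assigned to the custom logic block. */
#define CI_EXAMPLE_N 0

#if defined(__NIOS2__)
/* On a NIOS II target, GCC emits one "custom" instruction for this call. */
#define CI_EXAMPLE(a, b) __builtin_custom_inii(CI_EXAMPLE_N, (a), (b))
#else
/* Host fallback: a C function standing in for the hardware block.
   The XOR is only a placeholder operation, chosen for illustration. */
static inline int32_t ci_example_model(int32_t dataa, int32_t datab)
{
    return dataa ^ datab;
}
#define CI_EXAMPLE(a, b) ci_example_model((a), (b))
#endif
```

Application code then calls CI_EXAMPLE(x, y) like an ordinary function, without knowing whether the result comes from hardware or from the model.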

Figure 1. Custom Logic in the NIOS ALU

Figure 2. Custom Logic Block Interface for 32-bit NIOS II Processor

Custom Instruction Implementation

The basic N x N matrix multiplication algorithm with floating-point data is used to explore custom instruction implementation in the NIOS II soft processor. Figures 3 and 4 show the data flow graph for the basic 4 x 4 matrix multiplication algorithm. This algorithm was chosen because its regular structure of multiplier and adder operations makes it well suited to being transformed into hardware as a custom instruction.
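For reference, the basic algorithm can be written in C as follows. This is a minimal sketch of a software-only baseline, not the code that was measured in this paper. The function name matmul_sw and the array name result are assumptions; matrix1 and matrix2 follow the names used in the inline assembly later in this section, and N is fixed at 4 to match the data flow graphs.

```c
#define N 4  /* matrix dimension; 4 matches the data flow graphs */

/* Software-only N x N matrix multiplication: result = matrix1 * matrix2.
   Each result element takes N multiplications and N - 1 additions. */
void matmul_sw(float matrix1[N][N], float matrix2[N][N], float result[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            /* The first product initialises the sum, so no extra addition is needed. */
            float sum = matrix1[i][0] * matrix2[0][j];
            for (int k = 1; k < N; k++) {
                sum += matrix1[i][k] * matrix2[k][j];
            }
            result[i][j] = sum;
        }
    }
}
```

The inner loop over k is the regular multiply-and-add structure that the data flow graphs make visible.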

Figure 3. Data flow graph for 4x4 matrix multiplication part A

Figure 4. Data flow graph for 4x4 matrix multiplication part B

Figure 5 shows the calculation of one element of the result matrix in the 4x4 matrix multiplication algorithm, which needs 4 multiplications and 3 additions. The algorithm was modified by adding one more addition after the first multiplication, so that it consists of 4 identical multiply-adder blocks, with a data value of 0 at the first adder input.
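The restructuring described above can be sketched in C. The function names mul_add and element_mac are hypothetical and the code is only an illustration of the idea: one result element is computed by chaining four identical multiply-adder steps, and the extra addition of 0 in the first step leaves the value of the result unchanged.

```c
/* One multiply-adder block: adds the product x * y to the running sum acc. */
static float mul_add(float acc, float x, float y)
{
    return acc + x * y;
}

/* One element of the 4x4 result, computed as a chain of four identical
   multiply-adder blocks. row holds one row of the first matrix and col
   one column of the second matrix. */
float element_mac(const float row[4], const float col[4])
{
    float acc = 0.0f;  /* the 0 data value at the first adder input */
    for (int k = 0; k < 4; k++) {
        acc = mul_add(acc, row[k], col[k]);
    }
    return acc;
}
```

Making every step identical costs one extra addition per element, but it means a single hardware block can serve all four steps.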

Figure 5. Multiplier-Adder package block

This multiply-adder block is implemented as an extended multi-cycle custom instruction. The custom instruction both outputs its result and stores it in an internal register of the custom instruction. The stored value is used as the second adder input in the next multiply-add operation. To create the 0 data value required at the first adder input in Figure 5, the custom instruction provides a clear operation that zeroes the internal register. Inline assembly is used to access the custom instruction. The inline assembly that clears the internal register is

    asm volatile("custom 6, %0, %1, %2" : "=r" (sumCI1) : "r" (tmp), "r" (tmp1));

For this clear operation, it does not matter which registers are used as input or output, since the custom instruction ignores them. The inline assembly that performs the multiply-adder operation is

    asm volatile("custom 5, %0, %1, %2" : "=r" (sumCI1) : "r" (matrix1[i][0]), "r" (matrix2[0][j]));
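How the two operations might fit into the matrix multiplication loop can be sketched with a software model that runs on a host PC. This is an illustration under stated assumptions, not the implementation measured in this paper: ci_clear and ci_mac are hypothetical C stand-ins for the "custom 6" and "custom 5" instructions, the static variable models the internal register, and the loop structure of matmul_ci is assumed. On the real target, each call to these two functions would be replaced by the corresponding inline assembly statement.

```c
#define N 4  /* matrix dimension; 4 matches the data flow graphs */

/* Software stand-in for the internal register of the custom logic block. */
static float ci_accumulator = 0.0f;

/* Models "custom 6": zeroes the internal register. The real instruction
   ignores its operands, so the model takes none. */
static void ci_clear(void)
{
    ci_accumulator = 0.0f;
}

/* Models "custom 5": adds a * b to the internal register, stores the sum
   back in the register and also returns it. */
static float ci_mac(float a, float b)
{
    ci_accumulator += a * b;
    return ci_accumulator;
}

/* N x N matrix multiplication built on the two modelled operations. */
void matmul_ci(float matrix1[N][N], float matrix2[N][N], float result[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            float sumCI1 = 0.0f;
            ci_clear();  /* 0 at the first adder input */
            for (int k = 0; k < N; k++) {
                sumCI1 = ci_mac(matrix1[i][k], matrix2[k][j]);
            }
            result[i][j] = sumCI1;  /* output of the last multiply-adder step */
        }
    }
}
```

Because the running sum stays inside the custom logic block between steps, software only has to read the output of the last step for each result element.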

Results Analysis
The number of clock cycles is measured using the performance counter peripheral, a built-in peripheral that comes with the installation package from Altera. Table 1 presents the speed increase due to the custom instruction implementation. The table compares the number of clock cycles of the software-only implementation with that of the Multiplier-Adder custom instruction implementation. The speed increase is calculated by dividing the value in the second column by the value in the third column.

Table 1. Comparison between Software only with Multiplier-Adder

Matrix            Number of clock cycles    Number of clock cycles    Speed
multiplication    for software-only         for custom instruction    increase
dimension         implementation            implementation
4x4               57168                     9044                      6.32
5x5               111491                    16766                     6.65
6x6               192837                    28018                     6.88
7x7               306715                    44233                     6.93
8x8               456093                    63776                     7.15
10x10             896373                    122282                    7.33
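The speed increase calculation can be checked with a few lines of C. This is a minimal sketch; the function name speedup is hypothetical, and the cycle counts passed to it are taken from Table 1.

```c
#include <assert.h>

/* Speed increase = software-only clock cycles / custom instruction clock cycles. */
double speedup(unsigned long sw_cycles, unsigned long ci_cycles)
{
    assert(ci_cycles > 0);  /* guard against division by zero */
    return (double)sw_cycles / (double)ci_cycles;
}
```

For the 4x4 case, speedup(57168, 9044) gives approximately 6.32, and for the 10x10 case speedup(896373, 122282) gives approximately 7.33, in agreement with the last column of the table.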

The custom instruction implementation speeds up the N x N matrix multiplication algorithm by a factor of 6.32 to 7.33 for the dimensions measured, and the speed increase grows with the matrix dimension.

References

1. Altera Corporation, NIOS II Custom Instruction User Guide.
2. X. Wang and S.G. Ziavras, Parallel LU Factorization of Sparse Matrices on FPGA-Based Configurable Computing Engines, Concurrency and Computation: Practice and Experience, vol. 16, no. 4, (April 2004), pp. 319-343.
3. Altera Corporation, NIOS II Processor Reference Handbook.
4. Altera Corporation, NIOS Software Development Tutorial.
5. Altera Corporation, NIOS II Software Developer's Handbook.
6. S.S. Bhattacharyya, P.K. Murthy, and E.A. Lee, Software Synthesis from Dataflow Graphs, Kluwer Academic Publishers, 1996.