ISSN: 2231-5381
http://www.ijettjournal.org
Page 3247
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013
justification behind this is that a RISC architecture reduces the complexity of the design and also makes it possible to reduce execution time. A CISC processor can also be made faster, but doing so is considerably more difficult. Moreover, cheap memory devices are now available thanks to VLSI technology, so program code size is no longer as much of a limitation as system performance and throughput are. It has also been observed in mobile computing that RISC processors are better tuned for performance than CISC architectures. With these assumptions in mind, we designed our 16-bit processor. The basic block diagram of the processor is shown in Fig. 1.
The processor has 8 KB of on-chip RAM for faster data access. It also has a 16-bit internal counter clocked at 20 MHz (one tick every 50 ns), which is useful for measuring short intervals with high precision. The processor is also equipped with a serial transmitter and receiver for use in communication systems. We have used vector processing in our design. Vector processors are special-purpose computers suited to a range of (scientific) computing tasks. These tasks usually involve large active data sets, often with poor locality, and long run times. In addition, vector processors provide vector instructions. These instructions operate in a pipeline (sequentially on all elements of vector registers), and in current machines.
All signals have their usual meanings. The data core of the processor is responsible for computation. The memory signals are also brought out of this module so that an optional external SRAM and I/O can be connected. Fig. 1 shows the basic block diagram of the processor; in this work I have modified that design to reduce the area the processor occupies on the FPGA. In this design I have combined the decoder and the ALU into a single execution unit. The idea behind this approach is that vector data can be processed and executed simultaneously if decoding and execution are carried out together in a pipelined way. Fig. 2 shows the design of the resulting processor.
III. PROCESSOR IMPLEMENTATION
The entire processor has been designed in VHDL and implemented on a Virtex FPGA from Xilinx [5] [6]. The Virtex user-programmable gate array comprises two major configurable elements, namely configurable logic blocks (CLBs) and input/output blocks (IOBs). Each CLB is composed of two slices. A slice contains two 4-input, 1-output look-up tables (LUTs) and two registers. Interconnections between these elements are configured by multiplexers controlled by SRAM cells programmed by the user's bitstream. This structure provides a very powerful way of implementing arbitrarily complex digital logic. The hardware was simulated in a Windows XP environment. The C compiler used to test the applications is GCC from MinGW, a minimalist package of GNU tools for Windows [7]. C code written for sample applications was compiled to assembly language and then tested on the processor.
The RTL view of the processor generated by Xilinx ISE Design Suite is shown in Fig. 3. The FPGA device utilization summary is shown below:

Logic utilization              Used    Available    Utilization
Number of slices               418     768          54%
Number of slice flip-flops     229     1536         14%
Number of 4-input LUTs         789     1536         51%
Number of bonded IOBs          48      124          38%
Number of BRAMs                -       -            50%
Number of GCLKs used           -       -            12%

The instruction set of the processor contains about 45 instructions for 16-bit and occasionally 32-bit operations. The entire processor uses a WISHBONE-compliant bus and operates in MASTER mode. The general interface diagram of the WISHBONE specification is shown in Fig. 4. WISHBONE compliance makes it easier to interface the processor with other cores in a multi-core SoC environment.
Fig. 4: Wishbone Interface of Master and Slave
MASTER and SLAVE interfaces are interconnected by a set of signals that allow them to exchange data. For descriptive purposes these signals are collectively known as a bus, and they are contained within a functional module called the INTERCON. Address, data and other information are impressed upon this bus in the form of bus cycles.

IV. TESTING AND DESIGN CRITICISM
Our design has several advantages. The first is the use of vector processing to speed up instruction execution. The second is the generic design of the processor at the HDL level, which makes it possible to reconfigure the processor for specific applications. A further advantage is the use of fewer internal registers and more FPGA memory, which reduces the design complexity of the processor while still keeping FPGA resource usage adequate. Despite these advantages, the design has a few drawbacks. The I/O module must always be written specifically for a particular FPGA board; there is no general I/O module applicable across a broad range of FPGA boards. This problem is bound to exist whenever specific bus or hardware specifications must be maintained, but it can be solved by designing a generic I/O module that works across a large number of development boards. Another apparent drawback is that the HDL code is written specifically for a target FPGA (Virtex in this case) and hence may not be directly usable on other FPGA platforms. More general code would have been more useful here. However, the present design was concerned more with demonstrating the power of
simple vector processing for fast instruction execution, and with demonstrating that FPGA-based processors can be equally useful for real-time operating systems, than with designing just a single processor. The designed processor was tested by solving a scientific computational task: the Lorenz equations, which have been modeled many times before as a classic example of a chaotic system and are also believed to represent certain dynamics of weather forecasting. The Lorenz equations were first written in C and compiled with GCC. The compiled target was executed on several processors (including ours) for benchmarking, with the following results:

Parameters                   Processor A    Processor B    Processor C    Our processor
Execution time               1 min 20 sec   1 min 22 sec   1 min 20 sec   1 min 21 sec
Support for debugging        yes            yes            no             no (to be included later)
Multi-core support           no             no             maybe          yes
Cycle-accurate simulation    yes            yes            yes            yes

In these tests, Processor A was an STMicro 16-bit core, Processor B was a Freescale 16-bit core, and Processor C was a generic 16-bit benchmark core from OpenCores. All of these basic cores were chosen at random and without regard to their on-chip resource usage.

CONCLUSION
In this paper we presented the design of a 16-bit RISC processor. The processor can be used in a large class of general-purpose computing applications as well as for scientific computing tasks. It can be used either as a stand-alone processing element or as part of a multi-processor SoC. The processor is WISHBONE compliant. The implementation of vector processing in the architecture substantially enhanced its processing capability, with most instructions executing within a single machine cycle as a rule. We intend to take this work further by designing a generic I/O module applicable to any standard FPGA. Its performance in the context of a real-time system also needs to be thoroughly evaluated and some of its features extended. Real-time communication systems, intelligent control and biometric systems are some of the applications we want to explore using this processor.

REFERENCES
[1] L. Kaouane et al., "A Methodology to Implement Real-time Applications on Reconfigurable Circuits," available at http://www-rocq.inria.fr/syndex.
[2] P. Kohlig et al., "FPGA Implementation of High Performance FIR Filters," in Proc. International Symposium on Circuits and Systems, 1997.
[3] M. Shand, "Flexible Image Acquisition Using Reconfigurable Hardware," in Proc. IEEE Workshop on Field Programmable Custom Computing Machines, April 1995.
[4] J. Villasenor, "Video Communication Using Rapidly Reconfigurable Hardware," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 5, No. 12, pp. 565-567, Dec. 1995.
[5] Xilinx Corporation, "Xilinx Breaks One Million Gate Barrier with Delivery of New Virtex Series," October 1998.
[6] Xilinx Corporation, Virtex Data Sheet, 2000.
[7] www.mingw.org