Project Report Ashish

***** DESIGN OF 8-BIT RISC PROCESSOR *****
A Project Report submitted in partial fulfilment for the award of the Degree of Bachelor of Technology in Department of electronic & communication Engineering by
***** ASHISH TOMAR*****

***** Enrolment No.-JNU08BTEC013 ***** Under the supervision of
***** ARPAN SHAH *****

***** Designation *****
Department of electronic & communication Engineering
Jagan Nath University Jaipur

May 2012
Candidate Declaration
I, ASHISH TOMAR, hereby declare that the work presented in this report entitled 8 BIT RISC MICROPROCESSOR, in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology, submitted in the Department of ELECTRONIC & COMMUNICATION at Jagan Nath University, Jaipur, is an authentic record of my own work under the supervision of ARPAN SHAH.
I also declare that the work embodied in the present project report is my original work/extension of the existing work and has not been copied from any Journal/thesis/book, and has not been submitted by me for any other Degree/Diploma.
(Name & Signature of Candidate) Enrolment No.: JNU08BTEC013 Date: 29TH MAY 2012
Certificate of the Supervisor(s)
This is to certify that the project report entitled "8 BIT RISC MICROPROCESSOR" submitted by ASHISH TOMAR for the award of Degree of Bachelor of Technology in the Department of ELECTRONIC & COMMUNICATION of Jagan Nath University, Jaipur, is a record of authentic work carried out by him/her under my/our supervision.
The matter embodied in this project report is the original work of the candidate and has not been submitted for the award of any other degree or diploma. It is further certified that he/she has worked with me/us for the required period in the Department of ELECTRONIC & COMMUNICATION, Jagan Nath University, Jaipur.
(Name and Signature of Supervisor) Date:.
Acknowledgements
I would like to express my sincere gratitude to my project guide ARPAN SHAH for giving me the opportunity to work on this topic. It would never have been possible for me to take this project to this level without his innovative ideas and his relentless support and encouragement.
Name of Student(s):ASHISH TOMAR (Roll Number):-0802BTEC016
Abstract
Field Programmable Gate Array (FPGA) devices offer a large set of advantages due to their reconfigurable nature. Although their performance is not comparable to that of ASIC devices, their flexibility is usually more important, especially when fast time-to-market is an issue and production is on a small scale. For that reason they are widely used in electronic applications, both for prototyping and in final production systems. Processors are among the most demanding designs when it comes to flexibility, cost and time to market.
RISC (Reduced Instruction Set Computer) machines have fixed-size instructions that can execute in one clock cycle and that interface with memory via a fixed mechanism. There are only a small number of primitive instructions. RISC is based on using many simpler and faster instructions to do the same work as a single complicated instruction on a CISC (Complex Instruction Set Computer) machine.
The aim of this project is the design of an 8-bit RISC processor for FPGA implementation. The processor can execute 14 instructions, including 2 memory access operations. Verilog is the HDL chosen for design entry. Xilinx ISE WebPACK generates the programming file for the target device, a Spartan-3 FPGA.
INDEX
1. INTRODUCTION
1.1. Reduced Instruction Set Computers
1.2. Field Programmable Gate Array
1.2.1. Look Up Tables
1.2.2. Programmable Logic Array
1.2.3. Programmable Array Logic
1.2.4. FPGA
1.2.5. Spartan-3
1.3. Hardware Description Languages
1.3.1. Importance of HDLs
1.3.2. Verilog HDL
2. FUNCTIONAL DESCRIPTION
2.1. Block Diagram
2.2. Specifications
2.3. Instructions
2.3.1. Move Instructions
2.3.2. Arithmetic Instructions
2.3.3. Jump Instructions
2.3.4. Memory Access Instructions
2.4. Targeted Performance Parameters
3. DESIGN ARCHITECTURE
3.1. Instruction Set Architecture
3.1.1. Instruction Format
3.1.2. Source/Destination Format
3.1.3. Instruction Examples
3.2. Modular Design
3.3. Top Level Entity
3.3.1. Block Diagram
3.3.2. Ports Description
3.3.3. Architecture
3.3.4. Source Register Selection
3.3.5. Memory Access Operations
3.3.6. Data Bus
3.3.7. Destination Decoder
3.3.8. Output Port Xout
3.4. Move Unit
3.5. Shift Unit
3.6. Arithmetic Unit
3.6.1. Block Diagram
3.6.2. Ports Description
3.6.3. Architecture
3.6.4. Functionality
3.6.5. Flags
3.7. Program Counter
3.8. Instruction Register
3.9. Instruction Decoder
3.10. Control Unit
3.11. Data Memory
3.12. Program Memory
4. DESIGN IMPLEMENTATION
4.1. HDL Entry
4.2. Functional Simulation
4.3. Synthesis
4.3.1. Synthesis Constraints
4.3.2. Synthesis Report
4.4. Translate
4.4.1. NGD Build Overview
4.4.2. Conversion of Netlist to NGD
4.5. MAP
4.5.1. MAP Input Files
4.5.2. MAP Output Files
4.5.3. MAP Report
4.5.4. Post MAP Timing Report
4.6. Place & Route
4.6.1. Overview
4.6.2. Placing
4.6.3. Routing
4.6.4. Post PAR Timing Report
4.7. BitGen Overview
5. SIMULATION RESULTS
6. CONCLUSION
6.1. Performance Parameters
6.2. Future Improvements
APPENDIX A RTL CODING
A.1. Move Unit
A.2. Shift Unit
A.3. Arithmetic Unit
A.4. Program Counter
A.5. Instruction Register
A.6. Instruction Decoder
A.7. Control Unit
A.8. Main Processor Unit
APPENDIX B INSTRUCTION SET
8 BIT RISC MICROPROCESSOR ARCHITECTURE
1. INTRODUCTION
1.1 REDUCED INSTRUCTION SET COMPUTER (RISC)
An important factor in computer design prior to 1980 was that all memories, including the memory to store program instructions, were very expensive. So if you were a computer designer, you would want to make each of the instructions you design to be short but powerful. That way, when programmers write programs using your instructions, their code will be dense and will require little memory, but each bit of code would do a lot of work.
This would result in a bunch of instructions of different lengths. Finally, you would also end up with a very rich collection of instructions that can interface with the computer's data memory in many different ways: either dealing directly with the data memory, or demanding that data first be stored into temporary locations (registers), or some mix of the two.
Now because of this rich, powerful, and variable-length group of (compact) instructions you've designed, the computer would have several characteristics. First, each instruction might take several clock cycles to complete. That's because each instruction would be of a different size, so figuring out what each one says is complicated; because each instruction could talk to memory in a different way; and because each instruction could potentially do a lot of work. Second, and for the same reasons just given, the computer speed might be fairly slow.
But as time passed, memory became cheaper, compilers got better, and the motivation for making small but really powerful instructions faded. In 1980, Patterson and Ditzel at Berkeley argued in favor of a different architecture having simple instructions, all of uniform length, that perform simpler operations. Sure, you'd need to specify more of these simpler instructions to equal one of the old-style complicated instructions, and yes, this takes more instruction memory, but memory is cheap, and your computer can run faster and take fewer clocks.
For example, say you had a complicated instruction called MUL that told the computer to take two pieces of data from memory, multiply their sum with a third piece of data, and put the result back somewhere else. This one instruction might take 10 clock cycles to complete. Now suppose we had a simple instruction set. To do the same work as MUL did, we'd need perhaps 8 different instructions (a few loads, an add, a multiply, a store, etc.). But each instruction completes in a single clock cycle because each is so simple. And maybe the computer's clock can run much faster, too. The downside of the simple system, of course, is that it requires you to store 8 times as many instructions.
A Comparison:
Complicated system does MUL: 1 instruction x 10 clocks/instr x 10 ns/clock = 100 ns
Simple system does the same work as MUL: 8 instructions x 1 clock/instr x 9 ns/clock = 72 ns
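The comparison above is the product of instruction count, clocks per instruction and clock period. A minimal Python sketch of that arithmetic follows; the function name execution_time_ns is only an illustrative choice, and the numbers are the ones used in the text.

```python
def execution_time_ns(instructions, clocks_per_instruction, ns_per_clock):
    """Total time = instruction count x clocks per instruction x clock period."""
    return instructions * clocks_per_instruction * ns_per_clock

# Complicated system: one MUL taking 10 clocks at 10 ns per clock.
cisc_ns = execution_time_ns(1, 10, 10)

# Simple system: 8 single-clock instructions at 9 ns per clock.
risc_ns = execution_time_ns(8, 1, 9)

print(cisc_ns, risc_ns)
```

With these figures the simple system finishes first even though it executes eight times as many instructions.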
Three systems based on this idea were built in the early 80s: the Berkeley machines RISC-I and RISC-II, the Stanford MIPS processor [2], and the IBM 801 [3]. Based on comparisons between these machines and what came before, some characteristics commonly associated with RISC and CISC arose.
Reduced Instruction Set Computer (RISC) is based on using many simpler and faster instructions to do the same work as a single complicated instruction on a Complex Instruction Set Computer (CISC).
RISC machines are machines that have:
- Instructions that execute in one clock
- Instructions of a fixed size
- Instructions that interface with memory via a fixed mechanism
- A small number of primitive instructions
- Pipelining, a way to do more than one instruction at a time
1.2. FIELD PROGRAMMABLE GATE ARRAYS (FPGA)

There is a better way to implement a logic function than to hook together discrete 74XX packages. One can use semiconductor memory, integrated circuits known as Programmable Logic Devices or get a Custom made IC to implement logic.
1.2.1 LOOK UP TABLES (MEMORY)

To implement N functions of some K variables, we need a memory with 2^K locations and N bits per location (use one address line for each variable and one data-out line for each function). The number of locations doubles with every additional input variable, so memory is not efficient at implementing functions with many input variables or multiple functions with different inputs.
1.2.2 PROGRAMMABLE LOGIC ARRAY (PLA)

The PLA was the first device designed specifically for implementing logic circuits, introduced in the early 1970s by Philips. The array consists of two levels of logic gates: a programmable wired AND-plane followed by a programmable wired OR-plane. It is designed to implement random logic expressions in sum-of-products (SOP) form. PLAs are difficult to manufacture because of the two levels of configurable logic, which also introduce significant propagation delay.
1.2.3 PROGRAMMABLE ARRAY LOGIC (PAL)

To overcome the problems of the PLA, PAL devices were developed. A PAL has a single level of programmability: a programmable wired AND-plane followed by a fixed OR-plane. In a PAL, logic is represented in SOP form. The number of product terms in an SOP form is limited to a fixed number, the number of variables in each product term is limited by the number of input pins, and the number of independent functions is limited by the number of output pins.
1.2.4 FIELD PROGRAMMABLE GATE ARRAYS

A Field Programmable Gate Array or FPGA is a semiconductor device containing programmable logic components and programmable interconnects. The programmable logic components can be programmed to duplicate the functionality of basic logic gates such as AND, OR, XOR, NOT or more complex combinatorial functions such as decoders or simple math functions. In most FPGAs, these programmable logic components (or logic blocks, in FPGA parlance) also include memory elements, which may be simple flip-flops or more complete blocks of memories.
A hierarchy of programmable interconnects allows the logic blocks of an FPGA to be interconnected as needed by the system designer, somewhat like a one-chip programmable breadboard. These logic blocks and interconnects can be programmed after the manufacturing process by the customer/designer (hence the term "field programmable") so that the FPGA can perform whatever logical function is needed.
FPGAs are generally slower than their application-specific integrated circuit (ASIC) counterparts, can't handle as complex a design, and draw more power. However, they have several advantages such as a shorter time to market, ability to re-program in the field to fix bugs, and lower non-recurring engineering costs.
The historical roots of FPGAs are in complex programmable logic devices (CPLDs). CPLD logic gate densities range from the equivalent of several thousand to tens of thousands of logic gates, while FPGAs typically range from tens of thousands to several million. The primary differences between CPLDs and FPGAs are architectural. A CPLD has a somewhat restrictive structure consisting of one or more programmable SOP logic arrays feeding a relatively small number of clocked registers. The result of this is less
flexibility, with the advantage of more predictable timing delays and a higher logic to interconnect ratio. The FPGA architectures, on the other hand, are dominated by interconnect. This makes them far more flexible, but also far more complex to design for. Another notable difference between CPLDs and FPGAs is the presence in most FPGAs of higher-level embedded functions (such as adders and multipliers) and embedded memories. A related, important difference is that many modern FPGAs support partial in-system reconfiguration, allowing their designs to be changed "on the fly" either for system upgrades or for dynamic reconfiguration.
A recent trend has been to take the architectural approach a step further by combining the logic blocks and interconnects of traditional FPGAs with embedded microprocessors and related peripherals to form complete "systems on a programmable chip". Examples of such hybrid technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices, which include one or more PowerPC processors embedded within the FPGA's logic fabric. An alternate approach is to make use of "soft" processor cores that are implemented within the FPGA logic. These cores include the Xilinx MicroBlaze and PicoBlaze, and the Altera Nios and Nios II processors, as well as third-party processor cores.
Applications of FPGAs include DSP, software-defined radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics, computer hardware emulation and a growing range of other areas. As their size, capabilities and speed increased they began to take over larger and larger functions to the state where they are now marketed as competitors for full systems on chips. They now find applications in any area or algorithm that can make use of the massive parallelism offered by their architecture.
To define the behavior of the FPGA, the user provides a hardware description language (HDL) or a schematic design. Common HDLs are VHDL and Verilog. Then, using an electronic design automation tool, a technology-mapped netlist is generated. The netlist can then be fitted to the actual FPGA architecture using a process called place-and-route, usually performed by the FPGA company's proprietary place-and-route software. The user will validate the map, place and route results via timing analysis, simulation, and other verification methodologies. Once the design and validation process is complete, the binary file generated (also using the FPGA company's proprietary software) is used to (re)configure the FPGA device.
To simplify the design of complex systems in FPGAs, there exist libraries of predefined complex functions and circuits that have been tested and optimized to speed up the design process. These predefined circuits are commonly called IP cores, and are available from FPGA vendors and third-party IP suppliers.
In a typical design flow, an FPGA application developer will simulate the design at multiple stages throughout the design process. Initially the RTL description in VHDL or Verilog is simulated by creating test benches to stimulate the system and observe results. Then, after the synthesis engine has mapped the design to a netlist, the netlist is translated to a gate-level description where simulation is repeated to confirm the synthesis proceeded without errors. Finally the design is laid out in the FPGA, at which point propagation delays can be added and the simulation run again with these values back-annotated onto the netlist.
1.2.5 SPARTAN 3
The Spartan-3 family of FPGAs offers densities ranging from 50,000 to five million system gates. Spartan-3 FPGAs are ideally suited to a wide range of consumer electronics applications, including broadband access, home networking, display/projection and digital television equipment, because of their exceptionally low cost.
Features:
- Up to 784 I/O pins
- 622 Mb/s data transfer rate per I/O
- Signal swing ranging from 1.14V to 3.45V
- Double Data Rate (DDR) support
- DDR, DDR2 SDRAM support up to 333 Mbps
1.3 HARDWARE DESCRIPTION LANGUAGE VERILOG

HDLs allow designers to model the concurrency of processes found in hardware elements. HDLs such as Verilog HDL and VHDL became very popular.
1.3.1. IMPORTANCE OF HDLs
HDLs have many advantages compared to traditional schematic-based design.
- Designs can be described at a very abstract level by use of HDLs.
- Functional verification of the design can be done early in the design cycle.
- A textual description with comments is an easier way to develop and debug circuits.
1.3.2. VERILOG HDL
Verilog HDL has evolved as a standard hardware description language and offers many useful features for hardware design.
- Verilog is easy to learn and use. It is similar in syntax to the C programming language.
- It allows different levels of abstraction to be mixed in the same model.
- Most popular synthesis tools support Verilog HDL.
2. FUNCTIONAL DESCRIPTION
This chapter gives the detailed information about the functionality of the design and the implementation constraints.
2.1. BLOCK DIAGRAM
Fig 2.1 Functional Block Diagram
2.2. SPECIFICATIONS
The following instructions have to be implemented:
1. MOV dst, src -- dst <= src
2. INC dst, src -- dst <= src + 1
3. DEC dst, src -- dst <= src - 1
4. ADD src -- src <= src + A
5. SUB src -- src <= src - A
6. SL dst, src -- dst <= shift left src
7. SR dst, src -- dst <= shift right src
8. CMP src -- set Z flag if src = A
9. MVI A, immediate -- A <= immediate data
10. LOAD dst -- dst <= memory contents at address [CD]
11. STORE src -- memory at [CD] <= src
12. JMP immediate_offset -- jump to PC + imm_offset
13. JZ immediate_offset -- jump to PC + imm_offset if Z=1
14. JMPCD -- jump to address pointed by [CD]
Src, dst can be either A, B, C, D or X. PC is the program counter. [CD] represents the contents of register C and D after concatenation. D is the least significant byte.
A, B, C and D are 8-bit registers.
X is an 8-bit-wide input and output port. X is visible at the periphery as the I/O ports "X In" and "X Out". When anything is assigned to X, it will appear at "X Out". When X is read, the contents at "X In" will be used.
Z flag is set whenever the result of any operation is zero. C flag is set whenever the result of any arithmetic operation results in a carry. S flag is set whenever the result of any arithmetic operation results in a negative number.
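The flag rules can be illustrated with a small Python model of an 8-bit addition. This is a sketch under stated assumptions: the function name alu_add is hypothetical, and the S flag is taken here to be bit 7 of the result (two's complement sign), which the text does not spell out.

```python
def alu_add(src, a):
    """Model src + A on 8 bits and return (result, flags)."""
    total = (src & 0xFF) + (a & 0xFF)
    result = total & 0xFF                  # keep only the 8-bit result
    flags = {
        "Z": int(result == 0),             # result is zero
        "C": int(total > 0xFF),            # carry out of bit 7
        "S": int((result & 0x80) != 0),    # assumption: negative means bit 7 is set
    }
    return result, flags
```

For example, adding 0x01 to 0xFF wraps to zero and sets both Z and C, while adding 0x01 to 0x7F gives 0x80 and sets only S.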
It is assumed that the program memory and the data memory have synchronous writes and asynchronous reads.
Write operation: On a clock edge when WR is asserted, the data on the data bus is written into the location pointed to by the address.
Read operation: When RD is asserted, the contents of the location pointed to by the address are presented on the data bus by the memory. When RD is de-asserted, the memory stops driving the bus.
For the sake of simplicity, it is assumed that both the memories are fast enough to complete the read and write operations in one clock.
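The assumed memory behaviour can be modelled roughly as follows. This Python sketch is not the memory used in the project: the class name DataMemory is hypothetical, a clock edge is represented by a method call, and an undriven bus is represented by None.

```python
class DataMemory:
    """Behavioural model: synchronous write, asynchronous read, 8-bit locations."""

    def __init__(self, size):
        self.cells = [0] * size

    def clock_edge(self, wr, address, data):
        """Call once per clock edge; the write happens only while WR is asserted."""
        if wr:
            self.cells[address] = data & 0xFF

    def read(self, rd, address):
        """Combinational read: drives the stored value while RD is asserted, else None."""
        return self.cells[address] if rd else None

mem = DataMemory(16)
mem.clock_edge(wr=1, address=3, data=0xA7)   # written on this edge
mem.clock_edge(wr=0, address=3, data=0x00)   # WR low: location 3 keeps its value
```

Because reads are asynchronous, read() needs no clock edge and both operations fit in one clock as assumed above.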
2.3. INSTRUCTIONS
2.3.1. MOVE INSTRUCTIONS
There are two move instructions.
2.3.1.1. Move
INSTRUCTION: MOV dst, src
This instruction copies the 8-bit data from the source register to the destination register. Destination and source can be registers A/B/C/D or the input-output port X.
2.3.1.2. Move Immediate Data
INSTRUCTION: MVI A, immediate data
This instruction moves the 8-bit data, which is a part of the instruction itself, to the register A.
2.3.2. ARITHMETIC INSTRUCTIONS

There are five arithmetic instructions and two shift instructions.
2.3.2.1. Increment
INSTRUCTION: INC dst, src
This instruction retrieves the 8-bit data from the source register/port, increments it by 1 and stores the result in the destination register/port. The contents of the source register remain unchanged.
2.3.2.2. Decrement
INSTRUCTION: DEC dst, src
This instruction retrieves the 8-bit data from the source register/port, decrements it by 1 and stores the result in the destination register/port. The contents of the source register remain unchanged.
2.3.2.3. Addition
INSTRUCTION: ADD src
This instruction retrieves the 8-bit data from the source register/port, adds the contents of register A to it, and stores the result back in the source register/port.
2.3.2.4. Subtraction
INSTRUCTION: SUB src
This instruction retrieves the 8-bit data from the source register/port, subtracts the contents of register A from it, and stores the result back in the source register/port.
2.3.2.5. Compare
INSTRUCTION: CMP src
This instruction retrieves the 8-bit data from the source register/port, compares it with the contents of register A, and sets the Z flag high if both are equal. This instruction does not modify the contents of the source register/port.
2.3.2.6. Shift Left
INSTRUCTION: SL dst, src
This instruction retrieves the 8-bit data from the source register/port, shifts it left by 1 bit and stores the result in the destination register/port. This instruction does not modify the contents of the source register/port.
2.3.2.7. Shift Right
INSTRUCTION: SR dst, src
This instruction retrieves the 8-bit data from the source register/port, shifts it right by 1 bit and stores the result in the destination register/port. This instruction does not modify the contents of the source register/port.
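The two shift instructions can be modelled in Python as below. The sketch assumes a zero is shifted into the vacated bit, matching the {src[6:0], 0} and {0, src[7:1]} concatenations shown for the shift unit; the function names are illustrative only.

```python
def shift_left(src):
    """SL: {src[6:0], 0}, the MSB is dropped and a 0 enters at the LSB."""
    return (src << 1) & 0xFF

def shift_right(src):
    """SR: {0, src[7:1]}, the LSB is dropped and a 0 enters at the MSB."""
    return (src & 0xFF) >> 1
```

Both functions return a new value and leave src itself untouched, as the instruction descriptions require.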
2.3.3. JUMP INSTRUCTIONS

The jump instructions are used to modify the sequence of instruction execution, by changing the value of program counter. The processor can execute three kinds of jump instructions.
2.3.3.1. Jump by immediate offset
INSTRUCTION: JMP immediate_offset
The value of the program counter is incremented by the value given as the immediate data. Immediate data is a part of the instruction itself.
2.3.3.2. Jump by immediate offset if Z flag is set
INSTRUCTION: JZ immediate_offset
The value of the program counter is incremented by the value given as the immediate data if the Z flag is high. Immediate data is a part of the instruction itself. If the Z flag is not set, the program counter increments by 1 as in other instructions.
2.3.3.3. Direct Jump
INSTRUCTION: JMPCD
The value of the program counter is changed to the address given by the concatenation of the contents of the registers C and D.
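The three jump behaviours amount to a choice of the next program counter value, sketched here in Python. Several points are assumptions rather than statements from the report: the function name next_pc, the treatment of the 8-bit offset as unsigned, and wrap-around of the 16-bit program counter.

```python
def next_pc(pc, opcode, imm_offset=0, z_flag=0, c=0, d=0):
    """Return the next 16-bit program counter value for one instruction."""
    if opcode == "JMP" or (opcode == "JZ" and z_flag):
        return (pc + (imm_offset & 0xFF)) & 0xFFFF   # assumption: unsigned offset
    if opcode == "JMPCD":
        return ((c & 0xFF) << 8) | (d & 0xFF)        # [CD], D is the least significant byte
    return (pc + 1) & 0xFFFF                         # all other cases, including JZ with Z = 0
```

A JZ with the Z flag clear therefore falls through to PC + 1 exactly like a non-jump instruction.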
2.3.4. MEMORY ACCESS OPERATIONS

The processor can execute 2 memory access instructions.
2.3.4.1. Load Data
INSTRUCTION: LOAD dst
This instruction loads the destination register/port with 8-bit data retrieved from the Data Memory. The 16-bit address of the data memory location from which the data is retrieved is given by the concatenation of the contents of registers C and D.
2.3.4.2. Store Data
INSTRUCTION: STORE src
This instruction stores the 8-bit data of the source register/port in the Data Memory. The 16-bit address of the data memory location where the contents of the source are stored is given by the concatenation of the contents of registers C and D.
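The [CD] addressing used by LOAD and STORE can be sketched in Python. The register file is modelled as a dictionary and the data memory as a sparse dictionary; these structures and the function names are illustrative assumptions, not the project's implementation.

```python
def cd_address(regs):
    """16-bit address [CD]: C is the most significant byte, D the least significant."""
    return ((regs["C"] & 0xFF) << 8) | (regs["D"] & 0xFF)

def load(memory, regs, dst):
    """LOAD dst: dst <= memory contents at address [CD] (unwritten locations read as 0 here)."""
    regs[dst] = memory.get(cd_address(regs), 0)

def store(memory, regs, src):
    """STORE src: memory at [CD] <= src."""
    memory[cd_address(regs)] = regs[src] & 0xFF

regs = {"A": 0x5A, "B": 0x00, "C": 0x12, "D": 0x34}
memory = {}
store(memory, regs, "A")   # writes 0x5A to address 0x1234
load(memory, regs, "B")    # reads it back into B
```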
2.4. TARGETED PERFORMANCE PARAMETERS

There are a few performance parameters that the design needs to reach.
- The design is expected to have a worst-case delay of 5 ns, i.e. the processor is expected to have a maximum frequency of 200 MHz.
- Instruction opcodes are to be designed in such a way that implementation requires minimum hardware delays.
- An optimum instruction size is to be chosen.
- Tristate buffers are allowed inside the processor.
- Each instruction has to be executed in a single clock cycle.
- Modification of the instructions to improve performance is allowed.
- More instructions may also be added.
3. DESIGN ARCHITECTURE
This chapter explains the internal architecture of the top level entity and the sub modules. First the instruction set architecture was finalized, and then the final design was derived from it.
3.1. INSTRUCTION SET ARCHITECTURE

The design is made for a total of 14 instructions. The instruction set is designed to have the same instruction size for every instruction; the instruction size is chosen to be 11 bits. An X in an instruction code denotes a don't-care condition, i.e. the instruction works the same way whether 1 or 0 is entered in that position.
3.1.1 INSTRUCTION FORMAT

Instructions MVI, JMP and JZ have immediate data/offset as part of the instruction.
1. MVI : 01_<8-Bit Immediate Data>_X
2. JMP : 10_<8-Bit Immediate Offset>_X
3. JZ : 11_<8-Bit Immediate Offset>_X
Instructions MOV, INC, DEC, SL and SR have both destination and source as part of the instruction.
4. MOV : 00_001_<3-Bit Destination>_<3-Bit Source>
5. INC : 00_010_<3-Bit Destination>_<3-Bit Source>
6. DEC : 00_011_<3-Bit Destination>_<3-Bit Source>
7. SL : 00_100_<3-Bit Destination>_<3-Bit Source>
8. SR : 00_101_<3-Bit Destination>_<3-Bit Source>
The destination register/port for the instructions ADD, CMP and SUB is the same as the source, so there is no need to mention the destination in the instruction.
9. CMP : 00_000_00X_<3-Bit Source>
10. ADD : 00_000_010_<3-Bit Source>
11. SUB : 00_000_011_<3-Bit Source>
The source in the case of the LOAD instruction is fixed, i.e. the data memory, and in the case of the STORE instruction the destination is fixed, i.e. the data memory.
12. LOAD : 00_110_<3-Bit Destination>_XXX
13. STORE : 00_111_XXX_<3-Bit Source>
The direct jump instruction JMPCD doesn't require any destination, source or immediate data to be part of the instruction.
14. JMPCD : 00_000_1XX_XXX
3.1.2. SOURCE / DESTINATION FORMAT

Source can be one of the registers A, B, C, D or the input port Xin. A total of 3 bits are required to define the source.
A : 000
B : 001
C : 010
D : 011
Xin : 1XX
Destination can be one of the registers A, B, C, D or the output port Xout. A total of 3 bits are required to define the destination.
A : 000
B : 001
C : 010
D : 011
Xout : 1XX
3.1.3. INSTRUCTION EXAMPLES

1. MOV B,A i.e. Move the contents of register A to B
Destination is B : 001
Source is A : 000
Instruction Code : 00_001_001_000
2. ADD D i.e. Add the contents of registers D and A and store the result in D
Destination is D : Not Required, Same as Source
Source is D : 011
Instruction Code : 00_000_010_011
3. MVI A7 i.e. Move immediate data A7 to register A
Destination is A : Not Required, It is fixed
Data : 1010_0111
Instruction Code : 01_1010_0111_1 / 01_1010_0111_0
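The three examples can be reproduced with a small Python encoder. This is an illustrative sketch, not part of the design flow: the helper names are hypothetical, and every don't-care bit (X) is arbitrarily encoded as 0.

```python
# 3-bit register/port codes; X stands for Xin/Xout (1XX), don't-care bits set to 0 here.
REG = {"A": 0b000, "B": 0b001, "C": 0b010, "D": 0b011, "X": 0b100}
TWO_OPERAND = {"MOV": 0b001, "INC": 0b010, "DEC": 0b011, "SL": 0b100, "SR": 0b101}

def encode_two_operand(op, dst, src):
    """00_<op>_<dst>_<src>"""
    return (TWO_OPERAND[op] << 6) | (REG[dst] << 3) | REG[src]

def encode_add(src):
    """00_000_010_<src>"""
    return (0b010 << 3) | REG[src]

def encode_mvi(imm):
    """01_<8-bit immediate>_X, with the trailing don't-care bit set to 0."""
    return (0b01 << 9) | ((imm & 0xFF) << 1)

def as_bits(code):
    """Format an instruction as an 11-character binary string."""
    return format(code, "011b")
```

For instance, as_bits(encode_two_operand("MOV", "B", "A")) yields the 11 bits of 00_001_001_000 without the underscores.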
3.2. MODULAR DESIGN

Selection of the correct design hierarchy is advantageous for the following reasons.
- Improves simulation and synthesis results
- Improves debugging and modifying modular designs
- Allows parallel engineering (a team of engineers can work on different parts of the design at the same time)
- Improves the placement and routing of the design by reducing routing congestion and improving timing
- Allows for easier code reuse in the current design, as well as in future designs
In my design there are modules for arithmetic operations, logical operations, move operations, jump operations, instructions register and control unit. All the units are interconnected inside the Top module. The different modules are:
- Move Unit
- Shift Unit
- Arithmetic Unit
- Program Counter
- Instruction Register
- Instruction Decoder
- Control Unit
- Data Memory
- Program Memory
The selection of the source register, the selection of the destination register, the selection of input data to the destination register, the control signal for the Xout buffer and the control signal for the data bus connected to the data memory are generated inside the top level entity.
3.3. TOP LEVEL ENTITY
3.3.1. BLOCK DIAGRAM
[Figure: block diagram of the MAIN PROCESSOR UNIT showing the ports Xin (8), Xout (8), Clk, Rst, Addr_PC (16), IR_in (11), Data_inout (8), Addr_data (16), wr_data and rd_data]
3.3.2. PORTS DESCRIPTION

1. Xin
Length : 8 Bit
Type : Input
Use : This port can be used by the user for providing immediate data for various instructions
2. Xout
Length : 8 Bit
Type : Output
Use : This port can be used by the user for getting the immediate result of various instructions
3. Clk
Length : 1 Bit
Type : Input
Use : This port provides the global clock signal used to synchronize the internal registers, program memory and the data memory
4. Rst
Length : 1 Bit
Type : Input
Use : This port provides the global reset signal to all the internal registers, program memory, data memory, instruction register etc.
5. Addr_PC
Length : 16 Bit
Type : Output
Use : This port serves as the address lines for the 6K x 11 Bits program memory
6. IR_in
Length : 11 Bit
Type : Input
Use : This port provides the 11-bit instruction fetched from the program memory to the processor
7. Data_inout
Length : 8 Bit
Type : Inout
Use : This port carries the 8-bit data to and from the data memory. Buffers control the direction of data flow
8. Addr_data
Length : 16 Bit
Type : Output
Use : This port serves as the address lines for the 6K x 8 Bits data memory
9. wr_data
Length : 1 Bit
Type : Output
Use : This port provides the write signal to the data memory when data has to be written to the data memory
10. rd_data
Length : 1 Bit
Type : Output
Use : This port provides the read signal to the data memory when data has to be read from the data memory
3.3.4. SOURCE REGISTER SELECTION

There are four registers A, B, C, D and one input port Xin. The source is identified with the help of instruction bits I[3:1]. The instruction bits I[2:1] identify the source register A/B/C/D. The instruction bit I[3] indicates whether the source is the input port Xin or one of the registers.
An 8-bit, 4-to-1 multiplexer with I[2:1] as select lines selects the register. Another 8-bit, 2-to-1 multiplexer with I[3] as select line selects either the input port Xin or the already selected register. For example, if I[3] is 1 then, irrespective of the bits I[2:1], the source is the input port Xin; if I[3] is 0 then the source is selected according to the value of the bits I[2:1].
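The two-stage multiplexing can be modelled in Python as follows. The sketch assumes the 1-based bit numbering used in the text, with I[1] as the least significant bit of the 11-bit instruction; the function names and the dictionary register file are illustrative.

```python
def bit(instr, n):
    """Return I[n] using 1-based numbering (assumption: I[1] is the LSB)."""
    return (instr >> (n - 1)) & 1

def select_source(instr, regs, xin):
    """Model the 4-to-1 register mux followed by the 2-to-1 Xin mux."""
    reg_sel = (bit(instr, 2) << 1) | bit(instr, 1)                      # I[2:1]
    selected = (regs["A"], regs["B"], regs["C"], regs["D"])[reg_sel]    # 4-to-1 mux
    return xin if bit(instr, 3) else selected                           # 2-to-1 mux on I[3]

regs = {"A": 0x11, "B": 0x22, "C": 0x33, "D": 0x44}
```

With these register contents, the code 00_001_001_000 (MOV B,A) selects 0x11, and any code with I[3] = 1 selects the value on Xin.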
3.3.5. MEMORY ACCESS OPERATIONS

There are two memory access operations, load and store. Both use the same bi-directional data bus to read and write data, so the direction of data flow is controlled with the help of two 8-bit tristate buffers. The control lines wr_data and rd_data are generated inside the control unit. The write/store operation is synchronous and the read/load operation is asynchronous. The address for the data memory is given by the concatenation of the registers C and D.
3.3.6. DATA BUS

The contents of the source register/port are processed by three parallel modules: the Move Unit, the Arithmetic Unit and the Shift Unit. The data to be sent on the data bus is selected by an 8-bit, 4-to-1 multiplexer, with three of the inputs being the outputs of the three above-mentioned units and the fourth input being the 8-bit line from data memory (for the LOAD instruction). The select lines for this multiplexer are generated by the control unit.
3.3.7. DESTINATION DECODER

The Data bus is the common input to all the registers. The data from the data bus is stored on a particular destination register by enabling the load signal of that particular register. The load signals are generated using a 2X4 decoder. The four outputs represent the load signals of the four registers. The 2-bit input to the decoder comes from the destination bits of the instruction.
The destination is represented in the bits I[6:4] of the instruction. Only the two least significant of these bits, I[5:4], are required to select one of the four registers; the third bit, I[6], is used to select Xout as the destination. The instructions ADD and SUB have the destination same as the source, so for these two instructions the bits used as input to the destination decoder are I[2:1]. A 2-bit, 2-to-1 multiplexer is used for this purpose. The inputs to this MUX are I[5:4] and I[2:1]. The select line is generated inside the control unit.
One more signal, En_dec, is used to control the enable of the decoder. This signal is also generated inside the control unit. If the control signal for Xout goes high, the destination decoder is also disabled.
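A rough sketch of the destination decoder and its input multiplexer follows. The module and port names are assumptions, as is the choice of ld[0] as the load signal of register A. The input enable stands for the final decoder enable, which in the design is derived from En_dec and the Xout control signal.

```verilog
// Hypothetical destination decoder with its 2-bit, 2-to-1 input mux.
module dest_decoder(
    input  wire [1:0] I_5_4,   // destination bits I[5:4]
    input  wire [1:0] I_2_1,   // source bits I[2:1], used for ADD and SUB
    input  wire       s2,      // 1 for ADD/SUB (destination same as source)
    input  wire       enable,  // 1 allows one register to be loaded
    output wire [3:0] ld       // one-hot load signals of the four registers
);
    // 2-bit, 2-to-1 multiplexer in front of the decoder
    wire [1:0] sel = s2 ? I_2_1 : I_5_4;

    // 2-to-4 decoder with enable
    assign ld = enable ? (4'b0001 << sel) : 4'b0000;
endmodule
```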
3.3.8. OUTPUT PORT Xout

There is a latency of one clock between the loading of the instruction and the storing of the result when the destination is one of the registers, because the registers are loaded with the result only on the positive edge of the clock. But when the destination is the Xout port, there is no latency. So, to make the operations symmetric, I have included one more 8-bit register X. The output of this register is connected to the port Xout, so the value of Xout also changes only on the rising edge of the clock. A 1-bit register is also introduced in the design to store the value of the control signal for the Xout tristate buffer. As for the destination decoder, the control signal for the Xout tristate buffer is generated using a 1-bit, 2-to-1 multiplexer. The inputs to this MUX are I[6] and I[3]. The select line is generated inside the control unit. Another signal, Xout_buf, is ANDed with the output of the MUX. The result is stored in a 1-bit register X_buf, the output of which is connected to the control line of the tristate buffer for Xout. The signal Xout_buf is generated inside the control unit.
3.4. MOVE UNIT

The move unit performs two instructions:
1. MOV dst, src
2. MVI immediate data
3.4.1. ARCHITECTURE
[Figure: 8-bit, 2-to-1 multiplexer. Input 1: I[9:2] (immediate data); input 0: src; select line: I[10]; output: Result_mu.]
Instruction               Instruction Code                                I[10]
1. MVI immediate data     0_1_<8-bit immediate data>_X                    1
2. MOV dst, src           0_0_001_<3-bit destination>_<3-bit source>      0
So depending upon the instruction bit I[10], the multiplexer will select either the instruction bits I[9:2] (i.e. the immediate data) or the source.
3.5. SHIFT UNIT

The shift unit performs two instructions:
1. SL dst, src
2. SR dst, src
3.5.1. ARCHITECTURE
[Figure: 8-bit, 2-to-1 multiplexer. Input 1: {0, src[7:1]}; input 0: {src[6:0], 0}; select line: I[7]; output: Result_su.]
Instruction        Instruction Code                                I[7]
1. SL dst, src     00_10_0_<3-bit destination>_<3-bit source>      0
2. SR dst, src     00_10_1_<3-bit destination>_<3-bit source>      1
So depending upon the instruction bit I[7], the multiplexer selects the source either left-shifted by 1 bit (I[7] = 0) or right-shifted by 1 bit (I[7] = 1).
3.6. ARITHMETIC UNIT

The arithmetic unit performs five instructions:
1. INC dst, src
2. DEC dst, src
3. ADD src
4. SUB src
5. CMP src
3.6.1. BLOCK DIAGRAM
[Figure: arithmetic unit block. Inputs: src (8 bits), A (8 bits), Cin, Sub, I[8], q_S, q_C. Outputs: Result_au (8 bits), S, C, Z.]
3.6.2. PORTS DESCRIPTION

1. src
Length : 8 Bit
Type   : Input
Use    : This port provides the data from the source register/port.
2. A
Length : 8 Bit
Type   : Input
Use    : This port always provides the contents of register A for SUB, ADD and CMP instructions.
3. Cin
Length : 1 Bit
Type   : Input
Use    : This port provides the carry-in signal to the adder inside the arithmetic unit. This signal is generated inside the control unit.
4. Sub
Length : 1 Bit
Type   : Input
Use    : This signal is generated inside the control unit. If Sub goes high, then the 2nd input to the adder is complemented, which together with Cin gives its 2s complement form.
5. I[8]
Length : 1 Bit
Type   : Input
Use    : This is the 8th bit of the instruction. This line is used to select the 2nd input to the adder inside the unit.
6. q_C
Length : 1 Bit
Type   : Input
Use    : This signal is the enable signal for the carry signal given to the Carry flag.
7. q_S
Length : 1 Bit
Type   : Input
Use    : This signal is the enable signal for the sign signal given to the Sign flag.
8. Result_au
Length : 8 Bit
Type   : Output
Use    : This port gives the result of the arithmetic unit.
9. Z
Length : 1 Bit
Type   : Output
Use    : This signal is given to the Zero flag inside the top entity.
10. C
Length : 1 Bit
Type   : Output
Use    : This signal is given to the Carry flag inside the top entity.
11. S
Length : 1 Bit
Type   : Output
Use    : This signal is given to the Sign flag inside the top entity.
3.6.3. ARCHITECTURE
The basic block inside the arithmetic unit is an 8-bit ripple carry adder. One input to the adder is fixed, i.e. the 8-bit source. The second input to the adder depends upon the instruction to execute. The subtraction operations are also performed using the same adder, by taking the 2s complement of the input to be subtracted with the help of 8 XOR gates. One input to the arithmetic unit comes from the source register/port and the second input is fixed to register A. The Sign, Carry and Zero flags are part of the top level entity, but their values are generated inside the arithmetic unit only.
Inst. No.  Inst  Inst. Code            I/P1  I/P2  Cin  Sub  Operation
q5         INC   000_1_0_<dst><src>    Src   0     1    0    Src + 1
q6         DEC   000_1_1_<dst><src>    Src   0     0    1    Src - 1
q10        ADD   000_0_00_10_<src>     Src   A     0    0    Src + A
q11        SUB   000_0_00_11_<src>     Src   A     1    1    Src - A
q9         CMP   000_0_00_0X_<src>     Src   A     1    1    Src - A
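[Figure: 8-bit adder. I/P1 is Src (8 bits). I/P2 is selected by I[8] between 0 and A (8 bits) and then passed through the XOR gates controlled by Sub. Cin is the carry input. Outputs: Result_au (8 bits) and Cout.]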
Depending upon the value of instruction bit I[8], input 2 will be either 0 or register A. The instruction numbers given here are generated by the instruction decoder discussed later. Thus, by controlling the values of Cin, Sub and I/P2, different operations can be performed by the same unit.
o If Sub is 1 and Cin is 0, then the 2nd input is converted to its 1s complement form.
o If Sub is 1 and Cin is 1, then the 2nd input is converted to its 2s complement form, i.e. to its negative value.
3.6.4. FUNCTIONALITY
1. INC: The 2nd input to the adder is 0 and Cin is high, so the result comes out to be source + 1.
2. DEC: The 2nd input is zero, Sub is high and Cin is low, so the result is source + 1s complement of 0, i.e. source + 1111_1111, which is also the 2s complement of 1. So the result comes out to be source - 1.
3. ADD: Cin and Sub both are low, so the 2nd input, i.e. A, is passed as it is. The result comes out to be source + contents of register A.
4. SUB: Cin and Sub both are high, so the 2nd input, i.e. A, is converted to its 2s complement form, i.e. its negative value. The result comes out to be source - contents of register A.
5. CMP: Its functionality is exactly the same as SUB, the only difference being that the result in this case is not stored in any register.
3.6.5. FLAGS
The flags are part of the top level entity, but the values to be loaded into them are generated inside the arithmetic unit.
1. Carry: This is high only if there is a carry out and the instruction being executed is ADD or INC.
2. Sign: This is high only if the carry out is low and the instruction being executed is SUB, CMP or DEC.
3. Zero: This is high if the result of the arithmetic unit is 0.
The signals q_S and q_C controlling the Sign and Carry flags are generated inside the control unit.
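The functionality and flag rules above can be checked with a few worked values. The following stand-alone simulation sketch reproduces only the adder datapath (XOR with Sub, then add Cin); the module name au_example and the chosen operand values are illustrative assumptions.

```verilog
// Hypothetical worked example of the adder datapath.
module au_example;
    reg  [7:0] src, in2;
    reg        cin, sub;
    wire [7:0] result;
    wire       cout;

    // 2nd input is XORed bitwise with Sub, then added to src together with Cin
    assign {cout, result} = src + (in2 ^ {8{sub}}) + cin;

    initial begin
        // DEC: 5 + 1111_1111 + 0 = 1_0000_0100
        src = 8'd5;   in2 = 8'd0;   sub = 1; cin = 0; #1;
        $display("DEC 5   : result=%0d cout=%b", result, cout); // result=4 cout=1
        // SUB: 3 - 5 = 3 + 1111_1010 + 1 = 0_1111_1110
        src = 8'd3;   in2 = 8'd5;   sub = 1; cin = 1; #1;
        $display("SUB 3-5 : result=%0d cout=%b", result, cout); // result=254 cout=0
        // ADD: 200 + 100 = 300 = 1_0010_1100
        src = 8'd200; in2 = 8'd100; sub = 0; cin = 0; #1;
        $display("ADD     : result=%0d cout=%b", result, cout); // result=44 cout=1
        $finish;
    end
endmodule
```

In the SUB case the carry out is low, so the Sign flag is set (254 is -2 in 2s complement). In the ADD case the carry out is high, so the Carry flag is set.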
3.7. PROGRAM COUNTER
This unit performs three instructions:
1. JMP immediate offset
2. JZ immediate offset
3. JMPCD
3.7.1. ARCHITECTURE
If the instruction is JMPCD, i.e. q14 is high, then the program counter is loaded with the value stored in registers C & D. If q14 is low, there can be three cases:
1. The instruction is JMP.
2. The instruction is JZ and the Zero flag is set.
In both these cases the program counter is loaded with a new value equal to the old value plus the 8-bit immediate offset specified in the instruction bits I[9:2].
3. If neither of the above conditions is met, the program counter is simply incremented by 1.
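[Figure: program counter block diagram. A multiplexer controlled by S4 selects either the 8-bit offset I[9:2] or the constant 00000001; a 16-bit adder adds the selected value to the current program counter value; a second multiplexer controlled by q14 selects either the adder output or the 16-bit value CD. The 16-bit program counter register, with inputs rst and clk, drives the address lines for the program memory.]
Signal S4 is generated inside the control unit.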
3.8. INSTRUCTION REGISTER

The instruction register is an 11-bit edge-triggered register. It loads the instruction on the positive edge of the clock. The instruction is fed to the instruction register from the program memory. The address for the program memory is given by the value of the program counter.
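A minimal sketch of such a register is shown below. The input name ir_in and the bit range 11 down to 1 follow the pin names in the timing reports; the module name and the output name I are assumed.

```verilog
// Hypothetical 11-bit instruction register, loaded on the rising clock edge.
module instruction_register(
    input  wire        clk,
    input  wire [11:1] ir_in,  // word read from the program memory
    output reg  [11:1] I       // instruction bits I[11:1]
);
    always @(posedge clk)
        I <= ir_in;
endmodule
```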
3.9. INSTRUCTION DECODER

This unit is used to identify the instruction being executed. The input to this unit is the op-code part of the instruction, which comes from the instruction register. The output of this unit is a 14-bit port where each bit represents one of the 14 instructions. All the instructions have different operation codes, so at any time only one of the 14 bits in the output is high. In the expressions below, ' denotes the complement of a bit and . denotes AND.
1. MVI : 01_<8-bit immediate data>_X
q[1] = I[11]'.I[10]
2. JMP : 10_<8-bit immediate offset>_X
q[2] = I[11].I[10]'
3. JZ : 11_<8-bit immediate offset>_X
q[3] = I[11].I[10]
4. MOV : 00_001_<3-bit destination>_<3-bit source>
q[4] = I[11]'.I[10]'.I[9]'.I[8]'.I[7]
5. INC : 00_010_<3-bit destination>_<3-bit source>
q[5] = I[11]'.I[10]'.I[9]'.I[8].I[7]'
6. DEC : 00_011_<3-bit destination>_<3-bit source>
q[6] = I[11]'.I[10]'.I[9]'.I[8].I[7]
7. SL : 00_100_<3-bit destination>_<3-bit source>
q[7] = I[11]'.I[10]'.I[9].I[8]'.I[7]'
8. SR : 00_101_<3-bit destination>_<3-bit source>
q[8] = I[11]'.I[10]'.I[9].I[8]'.I[7]
9. CMP : 00_000_00X_<3-bit source>
q[9] = I[11]'.I[10]'.I[9]'.I[8]'.I[7]'.I[6]'.I[5]'
10. ADD : 00_000_010_<3-bit source>
q[10] = I[11]'.I[10]'.I[9]'.I[8]'.I[7]'.I[6]'.I[5].I[4]'
11. SUB : 00_000_011_<3-bit source>
q[11] = I[11]'.I[10]'.I[9]'.I[8]'.I[7]'.I[6]'.I[5].I[4]
12. LOAD : 00_110_<3-bit destination>_XXX
q[12] = I[11]'.I[10]'.I[9].I[8].I[7]'
13. STORE : 00_111_XXX_<3-bit source>
q[13] = I[11]'.I[10]'.I[9].I[8].I[7]
14. JMPCD : 00_000_1XX_XXX
q[14] = I[11]'.I[10]'.I[9]'.I[8]'.I[7]'.I[6]
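The decoding can be sketched in Verilog as below. The module and port names are assumptions. Two details are inferred rather than stated in the listing: JZ is taken to use the prefix 11, the only 2-bit prefix not used by another instruction, and SL and SR use the codes 00_100 and 00_101 given in the shift unit section.

```verilog
// Hypothetical one-hot instruction decoder; q[n] goes high for instruction n.
module instruction_decoder(
    input  wire [11:1] I,
    output wire [14:1] q
);
    wire grp  = ~I[11] & ~I[10];           // prefix 00
    wire zero = grp & (I[9:7] == 3'b000);  // sub-group 00_000

    assign q[1]  = ~I[11] &  I[10];                 // MVI   01
    assign q[2]  =  I[11] & ~I[10];                 // JMP   10
    assign q[3]  =  I[11] &  I[10];                 // JZ    11 (assumed)
    assign q[4]  = grp & (I[9:7] == 3'b001);        // MOV   00_001
    assign q[5]  = grp & (I[9:7] == 3'b010);        // INC   00_010
    assign q[6]  = grp & (I[9:7] == 3'b011);        // DEC   00_011
    assign q[7]  = grp & (I[9:7] == 3'b100);        // SL    00_100
    assign q[8]  = grp & (I[9:7] == 3'b101);        // SR    00_101
    assign q[9]  = zero & ~I[6] & ~I[5];            // CMP   00_000_00X
    assign q[10] = zero & ~I[6] &  I[5] & ~I[4];    // ADD   00_000_010
    assign q[11] = zero & ~I[6] &  I[5] &  I[4];    // SUB   00_000_011
    assign q[12] = grp & (I[9:7] == 3'b110);        // LOAD  00_110
    assign q[13] = grp & (I[9:7] == 3'b111);        // STORE 00_111
    assign q[14] = zero &  I[6];                    // JMPCD 00_000_1XX
endmodule
```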
3.10. CONTROL UNIT

The control unit generates the control signals required by the different modules and by the top level entity. The inputs to the control unit are the decoded instructions from the instruction decoder and the values of the flags. The outputs are the control signals listed below.
Signals to arithmetic unit
1. q_C: This is the enabling signal for the carry flag. It is high only if the instruction being executed is ADD (q10) or INC (q5).
q_C = q[5] + q[10]
2. q_S: This is the enabling signal for the sign flag. It is high only if the instruction being executed is SUB (q11), DEC (q6) or CMP (q9).
q_S = q[6] + q[9] + q[11]
3. Sub: As shown in the table in the arithmetic unit section, this signal is high in the case of DEC, CMP and SUB.
Sub = q[6] + q[9] + q[11]
4. Cin: As shown in the table in the arithmetic unit section, this signal is high in the case of INC, CMP and SUB.
Cin = q[5] + q[9] + q[11]
Signals to Program Counter
1. S4: This signal selects the immediate offset to be added to the contents of the program counter. It is high if the instruction being executed is JMP, or if the instruction being executed is JZ and the Zero flag is set at the same time.
S4 = q[2] + q[3].Z
Signals to Data Memory
1. wr_data: This signal goes high if the instruction being executed is STORE.
wr_data = q[13]
2. rd_data: This signal goes high if the instruction being executed is LOAD.
rd_data = q[12]
Signals to Top level Entity
1. ld_flags: This is the load signal for the flags. This signal is high if the instruction being executed is an arithmetic instruction.
ld_flags = q[5] + q[6] + q[9] + q[10] + q[11]
2. S2: This signal selects either the destination or the source bits as the input to the destination decoder. This signal is high only if the instruction being executed is ADD or SUB, which have the destination same as the source.
S2 = q[10] + q[11]
3. Xout_buf: This signal is ANDed with the destination bit to generate the control signal for the Xout tristate buffer. This signal is high only if the instruction being executed involves a destination.
Xout_buf = q[4] + q[5] + q[6] + q[7] + q[8] + q[10] + q[11] + q[12]
4. En_dec: This signal is NORed with the control signal of the Xout tristate buffer to generate the enable signal for the destination decoder. This signal is high only if the instruction being executed does not involve any destination. So if either the control signal for Xout goes high or this En_dec signal goes high, the destination decoder is disabled.
En_dec = q[1] + q[2] + q[3] + q[9] + q[13] + q[14]
5. S1, S0: These are the select lines for the multiplexer which selects the unit whose result is placed on the data bus. Their value is
00 for Move Unit
01 for Arithmetic Unit
10 for Shift Unit
11 for LOAD Instruction
These signals are generated by a 4X2 encoder. The input to the encoder is E[3:0], where:
E[0] = q[1] + q[4]
E[1] = q[5] + q[6] + q[9] + q[10] + q[11]
E[2] = q[7] + q[8]
E[3] = q[12]
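The equations above translate almost directly into continuous assignments. The sketch below is an illustration under assumed module and port names; the encoder is written out as the two OR terms that produce the codes 01, 10 and 11, while E[0] needs no term because it corresponds to the code 00.

```verilog
// Hypothetical control unit built from the equations listed above.
module control_unit(
    input  wire [14:1] q,   // one-hot decoded instruction
    input  wire        Z,   // Zero flag
    output wire q_C, q_S, Sub, Cin,
    output wire S4, wr_data, rd_data,
    output wire ld_flags, S2, Xout_buf, En_dec,
    output wire S1, S0
);
    assign q_C      = q[5] | q[10];
    assign q_S      = q[6] | q[9] | q[11];
    assign Sub      = q[6] | q[9] | q[11];
    assign Cin      = q[5] | q[9] | q[11];
    assign S4       = q[2] | (q[3] & Z);
    assign wr_data  = q[13];
    assign rd_data  = q[12];
    assign ld_flags = q[5] | q[6] | q[9] | q[10] | q[11];
    assign S2       = q[10] | q[11];
    assign Xout_buf = q[4] | q[5] | q[6] | q[7] | q[8] | q[10] | q[11] | q[12];
    assign En_dec   = q[1] | q[2] | q[3] | q[9] | q[13] | q[14];

    // 4X2 encoder: E[1] -> 01, E[2] -> 10, E[3] -> 11, E[0] -> 00
    wire E1 = q[5] | q[6] | q[9] | q[10] | q[11];
    wire E2 = q[7] | q[8];
    wire E3 = q[12];
    assign S1 = E2 | E3;
    assign S0 = E1 | E3;
endmodule
```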
3.11. DATA MEMORY

The data memory is a block RAM of size 64 Kbytes (65,536 locations of 8 bits each). The data memory has a synchronous write and an asynchronous read. Its address lines come from the concatenation of the contents of the registers C & D. The data line for the memory is bidirectional. Write and read operations are controlled by the wr_data and rd_data signals generated by the control unit.
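A behavioural model of such a memory, useful for simulation, might look as follows. The signal names addr_data, data_in_out, wr_data and rd_data are the ones appearing in the timing reports; the module name and the coding style are assumptions and are not meant to describe how the block RAM is actually instantiated.

```verilog
// Hypothetical simulation model: synchronous write, asynchronous read,
// shared bi-directional data line.
module data_memory(
    input  wire        clk,
    input  wire        wr_data,      // high for STORE
    input  wire        rd_data,      // high for LOAD
    input  wire [15:0] addr_data,    // {C, D}
    inout  wire [7:0]  data_in_out
);
    reg [7:0] mem [0:65535];

    // Synchronous write on the rising clock edge
    always @(posedge clk)
        if (wr_data)
            mem[addr_data] <= data_in_out;

    // Asynchronous read: drive the bus only while rd_data is high
    assign data_in_out = rd_data ? mem[addr_data] : 8'bz;
endmodule
```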
3.12. PROGRAM MEMORY

The program memory is a block RAM with 65536 locations and 11 bits per location. This stores the instructions to be executed by the processor. Read operation is asynchronous. The address line for the program memory comes from the 16-bit program counter.
4. DESIGN IMPLEMENTATION
This chapter details the complete design flow for the FPGA implementation of the design. The target device is SPARTAN 3.
Fig 4.1 FPGA Design Flow
4.1. HDL ENTRY

The first step in the implementation of the design is creating the HDL code based on the design criteria. The following recommendations were followed to create an effective design.
Using RTL Code

Usage of register transfer level (RTL) code and avoiding (when possible) instantiating specific components creates designs with the following characteristics.
o Readable code
o Faster and simpler simulation
o Portable code for migration to different device families
o Reusable code for future designs
In my design, Verilog is the HDL used to make the design entry.
4.2. FUNCTIONAL SIMULATION

Functional or RTL simulation is used to verify the syntax and functionality of the design. The following recommendations were used for simulating the design.
Typically with larger hierarchical HDL designs, one should perform separate simulations on each module before testing the entire design. This makes it easier to debug your code.
Once each module functions as expected, a test bench is created to verify that the entire design functions as planned. The same test bench is used again for the final timing simulation to confirm that the design functions as expected under worst-case delay conditions.
My design's functionality was tested successfully.
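As an illustration of the module-level simulation recommended above, a small self-checking test bench for the shift unit is sketched below. The shift_unit module is copied from Appendix A.2 so that the example is self-contained; the test bench name and the test vector are assumptions and this is not the test bench actually used for the design.

```verilog
// Shift unit as listed in Appendix A.2
module shift_unit(src, I_7, result);
    input  [7:0] src;
    input        I_7;
    output [7:0] result;
    assign result = I_7 ? {1'b0, src[7:1]} : {src[6:0], 1'b0};
endmodule

// Hypothetical self-checking test bench
module shift_unit_tb;
    reg  [7:0] src;
    reg        I_7;
    wire [7:0] result;

    shift_unit dut(.src(src), .I_7(I_7), .result(result));

    initial begin
        src = 8'b1001_0110;
        I_7 = 1'b0; #1;   // SL: expect 0010_1100
        if (result !== 8'b0010_1100) $display("FAIL: SL gave %b", result);
        I_7 = 1'b1; #1;   // SR: expect 0100_1011
        if (result !== 8'b0100_1011) $display("FAIL: SR gave %b", result);
        $display("shift_unit test finished");
        $finish;
    end
endmodule
```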
4.3. SYNTHESIS
After creating the HDL design, you must synthesize it. During synthesis, behavioral information in the HDL file is translated into a structural netlist, and the design is optimized for a Xilinx device. Xilinx offers its own synthesis tool, Xilinx Synthesis Technology (XST). XST synthesizes HDL designs to create Xilinx-specific netlist files called NGC files. The NGC file is a netlist that contains both logical design data and constraints, and it takes the place of both EDIF and NCF files.
4.3.1. SYNTHESIS CONSTRAINTS

Constraints are essential to help you meet your design goals or obtain the best implementation of your design. Constraints are available in XST to control various aspects of the synthesis process itself, as well as placement and routing. Synthesis algorithms have been tuned to automatically provide optimal results in most situations. In some cases, however, synthesis may fail to initially achieve optimal results; some of the available constraints allow you to explore different synthesis alternatives to meet your specific needs. Following is a list of some HDL Options that can be set within the HDL Options tab of the Process Properties dialog box for FPGA devices:
o FSM Encoding Algorithm
o Case Implementation Style
o FSM Style
o RAM Extraction
o RAM Style
o Mux Style
o Decoder Extraction
o Priority Encoder Extraction
o Shift Register Extraction
o Logical Shifter Extraction
4.3.2. SYNTHESIS REPORT

While synthesizing the design, Xilinx XST also creates a synthesis report with details such as device utilization, macro statistics and timing. The following shows some parts of the synthesis report generated for the top level entity of my design:
HDL Synthesis Report
====================
Macro Statistics
----------------
# Adders/Subtractors            : 2
  16-bit adder carry out        : 1
  8-bit adder carry in/out      : 1
# Registers                     : 9
  1-bit register                : 2
  11-bit register               : 1
  16-bit register               : 1
  8-bit register                : 5
# Multiplexers                  : 3
  1-bit 4-to-1 multiplexer      : 2
  8-bit 4-to-1 multiplexer      : 1
# Tristates                     : 3
  8-bit tristate buffer         : 3
# Xors                          : 1
  8-bit xor2                    : 1

Device utilization summary:
---------------------------
Selected Device : 3s200pq208-5
Number of Slices:             82 out of 1920    4%
Number of Slice Flip Flops:   77 out of 3840    2%
Number of 4 input LUTs:      146 out of 3840    3%
Number of bonded IOBs:        71 out of  141   50%
Number of GCLKs:               1 out of    8   12%

TIMING REPORT
-------------
Minimum period: 10.599ns (Maximum Frequency: 94.347MHz)
Minimum input arrival time before clock: 7.845ns
Maximum output required time after clock: 10.277ns
Maximum combinational path delay: 7.862ns
4.4. TRANSLATE
4.4.1. NGD Build Overview
NGD Build reads in a netlist file in EDIF or NGC format and creates an NGD file that contains a logical description of the design in terms of logic elements, such as AND gates, OR gates, decoders, flip-flops, and RAMs.
The NGD file contains both a logical description of the design reduced to Xilinx Native Generic Database (NGD) primitives and a description of the original hierarchy expressed in the input netlist. The output NGD file can be mapped to the desired device family.
4.4.2. Conversion of Netlist to NGD File

NGD Build performs the following steps to convert a netlist to an NGD file:
1. Reads the source netlist. NGD Build invokes the Netlist Launcher. The Netlist Launcher determines the input netlist type and starts the appropriate netlist reader program. The netlist reader incorporates NCF files associated with each netlist. NCF files contain timing and layout constraints for each module.
2. Reduces all components in the design to NGD primitives. NGD Build merges components that reference other files. NGD Build also finds the appropriate system library components, physical macros (NMC files), and behavioral models.
3. Checks the design by running a Logical Design Rule Check (DRC) on the converted design. Logical DRC is a series of tests on a logical design.
4. Writes an NGD file as output.
4.5. MAP
The MAP program maps a logical design to a Xilinx FPGA. The input to MAP is an NGD file, which is generated using the NGD Build program. The NGD file contains a logical description of the design that includes both the hierarchical components used to develop the design and the lower level Xilinx primitives. The NGD file also contains any number of NMC (macro library) files, each of which contains the definition of a physical macro. MAP first performs a logical DRC (Design Rule Check) on the design in the NGD file. MAP then maps the design logic to the components (logic cells, I/O cells, and other components) in the target Xilinx FPGA. The output from MAP is an NCD (Native Circuit Description) file, a physical representation of the design mapped to the components in the targeted Xilinx FPGA. The mapped NCD file can then be placed and routed using the PAR program.
4.5.1. MAP Input Files

MAP uses the following files as input:
NGD file: Native Generic Database file. This file contains a logical description of the design expressed both in terms of the hierarchy used when the design was first created and in terms of lower-level Xilinx primitives to which the hierarchy resolves. The file also contains all of the constraints applied to the design during design entry or entered in a UCF (User Constraints File). The NGD file is created by the NGD Build program.
NMC file: Macro library file. An NMC file contains the definition of a physical macro. When there are macro instances in the NGD design file, NMC files are used to define the macro instances. There is one NMC file for each type of macro in the design file.
Guide NCD file: An optional input file generated from a previous MAP run. An NCD file contains a physical description of the design in terms of the components in the target Xilinx device. A guide NCD file is an output NCD file from a previous MAP run that is used as an input to guide a later MAP run.
Guide NGM file: A binary design file containing all of the data in the input NGD file as well as information on the physical design produced by the mapping.
4.5.2. MAP Output Files

Output from MAP consists of the following files:
NCD (Native Circuit Description) file: a physical description of the design in terms of the components in the target Xilinx device.
PCF (Physical Constraints File): an ASCII text file that contains constraints specified during design entry expressed in terms of physical elements. The physical constraints in the PCF are expressed in Xilinx's constraint language. MAP creates a PCF file if one does not exist or rewrites an existing file.
NGM file: a binary design file that contains all of the data in the input NGD file as well as information on the physical design produced by mapping. The NGM file is used to correlate the back-annotated design netlist to the structure and naming of the source design.
MRP (MAP report): a file that contains information about the MAP run. The MRP file lists any errors and warnings found in the design, lists design attributes specified, and details how the design was mapped (for example, the logic that was removed or added and how signals and symbols in the logical design were mapped into signals and components in the physical design). The file also supplies statistics about component usage in the mapped design.
4.5.3. MAP REPORT

The MAP report is generated in the following format:
Table of Contents
-----------------
Section 1 - Errors
Section 2 - Warnings
Section 3 - Informational
Section 4 - Removed Logic Summary
Section 5 - Removed Logic
Section 6 - IOB Properties
Section 7 - RPMs
Section 8 - Guide Report
Section 9 - Area Group Summary
Section 10 - Modular Design Summary
Section 11 - Timing Report
Section 12 - Configuration String Information
Section 13 - Additional Device Resource Counts
4.5.4. POST MAP TIMING REPORT

The timing report generated after the MAP process contains all the component delays, but it does not account for the interconnect delays. So the delays for components of the same type come out to be exactly the same.
The Post MAP Timing Report for my Design is:

Data Sheet report:
-----------------
All values displayed in nanoseconds (ns)

Setup/Hold to clock clk
+---------------+------------+------------+
|               | Setup to   | Hold to    |
| Source        | clk (edge) | clk (edge) |
+---------------+------------+------------+
| data_in_out<0>| 1.356(R)   | 0.134(R)   |
| data_in_out<1>| 1.305(R)   | 0.134(R)   |
| data_in_out<2>| 1.356(R)   | 0.134(R)   |
| data_in_out<3>| 1.305(R)   | 0.134(R)   |
| data_in_out<4>| 1.356(R)   | 0.134(R)   |
| data_in_out<5>| 1.305(R)   | 0.134(R)   |
| data_in_out<6>| 1.356(R)   | 0.134(R)   |
| data_in_out<7>| 1.305(R)   | 0.134(R)   |
| ir_in<10>     | 3.202(R)   | 0.643(R)   |
| ir_in<11>     | 3.202(R)   | -1.117(R)  |
| ir_in<1>      | 3.202(R)   | -1.117(R)  |
| ir_in<2>      | 3.202(R)   | -1.117(R)  |
| ir_in<3>      | 3.202(R)   | -1.117(R)  |
| ir_in<4>      | 3.202(R)   | -1.117(R)  |
| ir_in<5>      | 3.202(R)   | -1.117(R)  |
| ir_in<6>      | 3.202(R)   | -1.117(R)  |
| ir_in<7>      | 3.202(R)   | 0.643(R)   |
| ir_in<8>      | 3.202(R)   | 0.643(R)   |
| ir_in<9>      | 3.202(R)   | -1.117(R)  |
| xin<0>        | 4.237(R)   | -0.832(R)  |
| xin<1>        | 4.247(R)   | -0.349(R)  |
| xin<2>        | 4.026(R)   | -0.832(R)  |
| xin<3>        | 4.036(R)   | -0.832(R)  |
| xin<4>        | 3.815(R)   | -0.832(R)  |
| xin<5>        | 3.825(R)   | -0.832(R)  |
| xin<6>        | 3.380(R)   | -0.349(R)  |
| xin<7>        | 3.004(R)   | -0.832(R)  |
+---------------+------------+------------+

Clock clk to Pad
+---------------+------------+
|               | clk (edge) |
| Destination   | to PAD     |
+---------------+------------+
| addr_data<0>  | 6.407(R)   |
| addr_data<10> | 6.407(R)   |
| addr_data<11> | 6.407(R)   |
| addr_data<12> | 6.407(R)   |
| addr_data<13> | 6.407(R)   |
| addr_data<14> | 6.407(R)   |
| addr_data<15> | 6.407(R)   |
| addr_data<1>  | 6.407(R)   |
| addr_data<2>  | 6.407(R)   |
| addr_data<3>  | 6.407(R)   |
| addr_data<4>  | 6.407(R)   |
| addr_data<5>  | 6.407(R)   |
| addr_data<6>  | 6.407(R)   |
| addr_data<7>  | 6.407(R)   |
| addr_data<8>  | 6.407(R)   |
| addr_data<9>  | 6.407(R)   |
| addr_pc<0>    | 6.407(R)   |
| addr_pc<10>   | 6.407(R)   |
| addr_pc<11>   | 6.407(R)   |
| addr_pc<12>   | 6.407(R)   |
| addr_pc<13>   | 6.407(R)   |
| addr_pc<14>   | 6.407(R)   |
| addr_pc<15>   | 6.407(R)   |
| addr_pc<1>    | 6.407(R)   |
| addr_pc<2>    | 6.407(R)   |
| addr_pc<3>    | 6.407(R)   |
| addr_pc<4>    | 6.407(R)   |
| addr_pc<5>    | 6.407(R)   |
| addr_pc<6>    | 6.407(R)   |
| addr_pc<7>    | 6.407(R)   |
| addr_pc<8>    | 6.407(R)   |
| addr_pc<9>    | 6.407(R)   |
| data_in_out<0>| 7.565(R)   |
| data_in_out<1>| 7.565(R)   |
| data_in_out<2>| 7.565(R)   |
| data_in_out<3>| 7.565(R)   |
| data_in_out<4>| 7.565(R)   |
| data_in_out<5>| 7.565(R)   |
| data_in_out<6>| 7.565(R)   |
| data_in_out<7>| 7.565(R)   |
| rd_data       | 7.164(R)   |
| wr_data       | 7.164(R)   |
| xout<0>       | 6.618(R)   |
| xout<1>       | 6.618(R)   |
| xout<2>       | 6.618(R)   |
| xout<3>       | 6.618(R)   |
| xout<4>       | 6.618(R)   |
| xout<5>       | 6.618(R)   |
| xout<6>       | 6.618(R)   |
| xout<7>       | 6.618(R)   |
+---------------+------------+

Pad to Pad
+------------+-----------------+---------+
| Source Pad | Destination Pad | Delay   |
+------------+-----------------+---------+
| xin<0>     | data_in_out<0>  | 6.159   |
| xin<1>     | data_in_out<1>  | 6.159   |
| xin<2>     | data_in_out<2>  | 6.159   |
| xin<3>     | data_in_out<3>  | 6.159   |
| xin<4>     | data_in_out<4>  | 6.159   |
| xin<5>     | data_in_out<5>  | 6.159   |
| xin<6>     | data_in_out<6>  | 6.159   |
| xin<7>     | data_in_out<7>  | 6.159   |
+------------+-----------------+---------+

Analysis completed Tue May 30 13:11:29 2006
4.6. PLACE AND ROUTE

4.6.1. OVERVIEW
After you create a Native Circuit Description (NCD) file with the MAP program, you can place and route that design file using PAR. PAR accepts a mapped NCD file as input, places and routes the design, and outputs an NCD file to be used by the bit stream generator (BitGen). The NCD file output by PAR can also be used as a guide file for additional runs of PAR that may be done after making minor changes to your design. PAR places and routes a design based on the following considerations:
Timing-driven: The Xilinx timing analysis software enables PAR to place and route a design based upon timing constraints.
Non timing-driven (cost-based): Placement and routing are performed using various cost tables that assign weighted values to relevant factors such as constraints, length of connection, and available routing resources. Non timing-driven placement and routing is used if no timing constraints are present.
4.6.2. PLACING
The PAR placer executes multiple phases of the placer. PAR writes the NCD after all the placer phases are complete. During placement, PAR places components into sites based on factors such as constraints specified in the PCF file, the length of connections, and the available routing resources.
4.6.3. ROUTING
After placing the design, PAR executes multiple phases of the router. The router performs a converging procedure for a solution that routes the design to completion and meets timing constraints. Once the design is fully routed, PAR writes an NCD file, which can be analyzed against timing. PAR writes a new NCD as the routing improves throughout the router phases. Note: Timing-driven placement and timing-driven routing are automatically invoked if PAR finds timing constraints in the physical constraints file.
4.6.4. POST PAR TIMING REPORT

The timing report generated after the MAP process contains only the component delays, whereas the timing report generated after PAR has both the component and the interconnect delays. The interconnect delays come out to be comparable to the component delays. The delays for components of the same type are therefore no longer identical, because of the different routing paths.
The Post PAR Timing Report for my Design is:

Data Sheet report:
-----------------
All values displayed in nanoseconds (ns)

Setup/Hold to clock clk
+---------------+------------+------------+
|               | Setup to   | Hold to    |
| Source        | clk (edge) | clk (edge) |
+---------------+------------+------------+
| data_in_out<0>| 3.411(R)   | 0.111(R)   |
| data_in_out<1>| 3.599(R)   | -0.001(R)  |
| data_in_out<2>| 3.271(R)   | 0.056(R)   |
| data_in_out<3>| 3.497(R)   | -0.016(R)  |
| data_in_out<4>| 3.551(R)   | 0.085(R)   |
| data_in_out<5>| 4.543(R)   | -0.583(R)  |
| data_in_out<6>| 3.925(R)   | -0.144(R)  |
| data_in_out<7>| 3.065(R)   | -0.077(R)  |
| ir_in<10>     | 2.622(R)   | 0.534(R)   |
| ir_in<11>     | 2.623(R)   | -0.401(R)  |
| ir_in<1>      | 2.623(R)   | -0.400(R)  |
| ir_in<2>      | 2.622(R)   | -0.400(R)  |
| ir_in<3>      | 2.623(R)   | -0.400(R)  |
| ir_in<4>      | 2.623(R)   | -0.401(R)  |
| ir_in<5>      | 2.623(R)   | -0.400(R)  |
| ir_in<6>      | 2.622(R)   | -0.400(R)  |
| ir_in<7>      | 2.622(R)   | 0.753(R)   |
| ir_in<8>      | 2.623(R)   | 0.394(R)   |
| ir_in<9>      | 2.623(R)   | -0.401(R)  |
| xin<0>        | 8.194(R)   | -2.109(R)  |
| xin<1>        | 7.703(R)   | -1.613(R)  |
| xin<2>        | 7.778(R)   | -1.795(R)  |
| xin<3>        | 8.995(R)   | -1.671(R)  |
| xin<4>        | 7.548(R)   | -1.576(R)  |
| xin<5>        | 8.228(R)   | -1.709(R)  |
| xin<6>        | 7.885(R)   | -1.298(R)  |
| xin<7>        | 6.916(R)   | -2.378(R)  |
+---------------+------------+------------+

Clock clk to Pad
+---------------+------------+
|               | clk (edge) |
| Destination   | to PAD     |
+---------------+------------+
| addr_data<0>  | 9.442(R)   |
| addr_data<10> | 9.144(R)   |
| addr_data<11> | 9.149(R)   |
| addr_data<12> | 8.825(R)   |
| addr_data<13> | 8.607(R)   |
| addr_data<14> | 8.525(R)   |
| addr_data<15> | 8.784(R)   |
| addr_data<1>  | 8.424(R)   |
| addr_data<2>  | 8.638(R)   |
| addr_data<3>  | 9.100(R)   |
| addr_data<4>  | 8.361(R)   |
| addr_data<5>  | 8.380(R)   |
| addr_data<6>  | 8.640(R)   |
| addr_data<7>  | 9.172(R)   |
| addr_data<8>  | 8.907(R)   |
| addr_data<9>  | 8.852(R)   |
| addr_pc<0>    | 9.099(R)   |
| addr_pc<10>   | 8.084(R)   |
| addr_pc<11>   | 8.290(R)   |
| addr_pc<12>   | 8.528(R)   |
| addr_pc<13>   | 8.178(R)   |
| addr_pc<14>   | 8.735(R)   |
| addr_pc<15>   | 8.824(R)   |
| addr_pc<1>    | 9.076(R)   |
| addr_pc<2>    | 9.333(R)   |
| addr_pc<3>    | 9.067(R)   |
| addr_pc<4>    | 11.008(R)  |
| addr_pc<5>    | 8.909(R)   |
| addr_pc<6>    | 9.681(R)   |
| addr_pc<7>    | 9.785(R)   |
| addr_pc<8>    | 8.384(R)   |
| addr_pc<9>    | 9.393(R)   |
| data_in_out<0>| 12.164(R)  |
| data_in_out<1>| 13.308(R)  |
| data_in_out<2>| 12.626(R)  |
| data_in_out<3>| 12.626(R)  |
| data_in_out<4>| 12.632(R)  |
| data_in_out<5>| 14.403(R)  |
| data_in_out<6>| 12.408(R)  |
| data_in_out<7>| 14.370(R)  |
| rd_data       | 12.080(R)  |
| wr_data       | 12.561(R)  |
| xout<0>       | 9.245(R)   |
| xout<1>       | 9.249(R)   |
| xout<2>       | 9.616(R)   |
| xout<3>       | 8.549(R)   |
| xout<4>       | 9.300(R)   |
| xout<5>       | 9.623(R)   |
| xout<6>       | 9.578(R)   |
| xout<7>       | 9.265(R)   |
+---------------+------------+

Pad to Pad
+------------+-----------------+---------+
| Source Pad | Destination Pad | Delay   |
+------------+-----------------+---------+
| xin<0>     | data_in_out<0>  | 9.217   |
| xin<1>     | data_in_out<1>  | 8.800   |
| xin<2>     | data_in_out<2>  | 9.069   |
| xin<3>     | data_in_out<3>  | 8.827   |
| xin<4>     | data_in_out<4>  | 8.639   |
| xin<5>     | data_in_out<5>  | 9.744   |
| xin<6>     | data_in_out<6>  | 9.310   |
| xin<7>     | data_in_out<7>  | 10.122  |
+------------+-----------------+---------+

Analysis completed Tue May 30 13:15:43 2006
4.7. BITGEN OVERVIEW

BitGen produces a bit stream for Xilinx device configuration. After the design is completely routed, it is necessary to configure the device so that it can execute the desired function. This is done using files generated by BitGen, the Xilinx bit stream generation program. BitGen takes a fully routed NCD (native circuit description) file as input and produces a configuration bit stream, a binary file with a .bit extension. The BIT file contains all of the configuration information from the NCD file that defines the internal logic and interconnections of the FPGA, plus device-specific information from other files associated with the target device. The binary data in the BIT file is then downloaded into the FPGA's memory cells, or it is used to create a PROM file.
The final bit file was downloaded into the FPGA device and real time verification was done.
5. CONCLUSION
The design was successfully implemented on the target device. The design was tested successfully by both Functional and Post PAR Simulation.
5.1. PERFORMANCE PARAMETERS

Here are some of the performance parameters that my design achieved.
1. Throughput                : 1 instruction/cycle
2. Initial Latency           : 1 Clock
3. No. of Pipelining Stages  : 2
4. Max. Operating Freq       : 97 MHz
5.2. FUTURE IMPROVEMENTS

1. More instructions can be included in the design with the same instruction size by using the don't-care bits.
2. The number of pipelining stages can be increased to 4-5 from the current number of 2. The first pipelining stage is Read-Fetch-Execute and the second pipelining stage is Write. By dividing the first stage further into three stages, the maximum operating frequency can also be improved to a great extent.
APPENDIX A RTL CODING

A.1. MOVE UNIT
/* ~~~~MOVE UNIT~~~~ */
module move_unit(I_10, src, I_2_9, result);
  input I_10;
  input [7:0] src;
  input [7:0] I_2_9;
  output [7:0] result;

  assign result = I_10 ? I_2_9 : src;
endmodule
A.2. SHIFT UNIT

/* ~~~~SHIFT UNIT~~~~ */
module shift_unit(src, I_7, result);
  input [7:0] src;
  input I_7;
  output [7:0] result;

  assign result = I_7 ? {1'b0, src[7:1]} : {src[6:0], 1'b0};
endmodule
A.3. ARITHMETIC UNIT

/* ~~~~8-BIT FULL-ADDER~~~~ */
module full_adder_8bit(in1, in2, sum, cout, cin);
  input [7:0] in1, in2;
  input cin;
  output [7:0] sum;
  output cout;

  assign {cout, sum} = in1 + in2 + cin;
endmodule

/* ~~~~ARITHMETIC UNIT~~~~ */
module arithmetic_unit(A, src, I_8, cin, sub, q_c, q_s, result, S, C, Z);
  input [7:0] A;
  input [7:0] src;
  input I_8;
  input cin;
  input sub;
  input q_c, q_s;
  output [7:0] result;
  output S, C, Z;

  wire [7:0] in2, in2_final;
  wire cout;

  assign in2 = I_8 ? 8'b0 : A;           //second operand: 0 when I_8 is set, else A
  assign in2_final = in2 ^ {8{sub}};     //invert second operand when sub is set
  full_adder_8bit a1(.in1(src), .in2(in2_final), .cin(cin), .cout(cout), .sum(result));

  assign C = q_c && cout;    //CARRY FLAG
  assign S = q_s && (!cout); //SIGN FLAG
  assign Z = (!result[7]) & (!result[6]) & (!result[5]) & (!result[4]) &
             (!result[3]) & (!result[2]) & (!result[1]) & (!result[0]); //ZERO FLAG
endmodule
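The way the control inputs select an operation in the arithmetic unit can be checked with a small self-checking testbench. The sketch below is hypothetical (the testbench name, the check task and the operand values are illustrative) and must be compiled together with the full_adder_8bit and arithmetic_unit modules listed above. The expected values follow directly from result = src + in2_final + cin in that RTL.

```verilog
// Hypothetical self-checking testbench for the arithmetic unit of A.3.
`timescale 1ns/1ps
module arithmetic_unit_tb;
  reg  [7:0] A, src;
  reg        I_8, cin, sub, q_c, q_s;
  wire [7:0] result;
  wire       S, C, Z;
  integer    errors;

  // Positional connection, same port order as the module declaration.
  arithmetic_unit dut(A, src, I_8, cin, sub, q_c, q_s, result, S, C, Z);

  // Wait for the combinational logic to settle, then compare.
  task check;
    input [7:0] expected;
    begin
      #1;
      if (result !== expected) begin
        errors = errors + 1;
        $display("FAIL: got %h, expected %h", result, expected);
      end
    end
  endtask

  initial begin
    errors = 0;
    A = 8'h05; src = 8'h0A; q_c = 1'b0; q_s = 1'b0;

    I_8 = 1'b1; sub = 1'b0; cin = 1'b1; check(8'h0B); // INC: src + 0 + 1
    I_8 = 1'b1; sub = 1'b1; cin = 1'b0; check(8'h09); // DEC: src + 8'hFF
    I_8 = 1'b0; sub = 1'b0; cin = 1'b0; check(8'h0F); // ADD: src + A
    I_8 = 1'b0; sub = 1'b1; cin = 1'b1; check(8'h05); // SUB: src + ~A + 1

    if (errors == 0) $display("all checks passed");
    $finish;
  end
endmodule
```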
A.4. PROGRAM COUNTER

/* ~~~~16-BIT ADDER~~~~ */
module adder_16(in1, in2, out, cout, cin);
  input cin;
  input [15:0] in1, in2;
  output cout;
  output [15:0] out;

  assign {cout, out} = in1 + in2 + cin;
endmodule

/* ~~~~PROGRAM COUNTER~~~~ */
module program_counter(ld_pc, rst, clk, c, d, I_2_9, s4, q14, PC);
  input ld_pc;
  input rst;
  input clk;
  input [7:0] c;
  input [7:0] d;
  input [7:0] I_2_9;
  input s4, q14;
  output reg [15:0] PC;

  wire cin = 1'b0;
  wire [15:0] in2, adder_out, pc_in;
  wire [7:0] in2_half;

  assign in2_half = s4 ? I_2_9 : 8'b0000_0001;   //jump offset, or +1 for the next instruction
  assign in2 = {8'b0000_0000, in2_half};
  assign pc_in = q14 ? {c,d} : adder_out;        //JMPCD loads the PC from {C, D}
  adder_16 a16 (.in1(PC), .in2(in2), .out(adder_out), .cin(cin));

  always @(posedge clk, posedge rst)
  begin
    if (rst)
      PC <= 16'b0;
    else if (ld_pc)
      PC <= pc_in;
  end
endmodule
A.5. INSTRUCTION REGISTER

/* ~~~~INSTRUCTION REGISTER~~~~ */
module instruction_register(clk, rst, ld_ir, ir_in, I);
  input clk;
  input rst;
  input ld_ir;
  input [11:1] ir_in;
  output reg [11:1] I;

  always @(posedge clk, posedge rst)
  begin
    if (rst)
      I <= 11'b0100_0000_000;
    else if (ld_ir)
      I <= ir_in;
  end
endmodule
A.6. INSTRUCTION DECODER

/* ~~~~INSTRUCTION DECODER~~~~ */
module instruction_decoder(I_4_11, q);
  input [11:4] I_4_11;
  output [14:1] q;

  assign q[1]  = (!I_4_11[11]) & I_4_11[10];    //MVI
  assign q[2]  = I_4_11[11] & (!I_4_11[10]);    //JMP offset
  assign q[3]  = I_4_11[11] & I_4_11[10];       //JZ
  assign q[4]  = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & I_4_11[7];    //MOV
  assign q[5]  = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & I_4_11[8] & (!I_4_11[7]);    //INC
  assign q[6]  = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & I_4_11[8] & I_4_11[7];       //DEC
  assign q[7]  = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9] & (!I_4_11[8]) & (!I_4_11[7]);    //SL
  assign q[8]  = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9] & (!I_4_11[8]) & I_4_11[7];       //SR
  assign q[9]  = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & (!I_4_11[7]) &
                 (!I_4_11[6]) & (!I_4_11[5]);                  //CMP
  assign q[10] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & (!I_4_11[7]) &
                 (!I_4_11[6]) & I_4_11[5] & (!I_4_11[4]);      //ADD
  assign q[11] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & (!I_4_11[7]) &
                 (!I_4_11[6]) & I_4_11[5] & I_4_11[4];         //SUB
  assign q[12] = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9] & I_4_11[8] & (!I_4_11[7]);       //LOAD
  assign q[13] = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9] & I_4_11[8] & I_4_11[7];          //STORE
  assign q[14] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & (!I_4_11[7]) &
                 I_4_11[6];                                    //JMPCD
endmodule
A.7. CONTROL UNIT

/* ~~~~CONTROL UNIT~~~~ */
module control_unit(q, z, q_c, q_s, s0, s1, s2, wr_data, rd_data, ld_pc, ld_ir, en_dec, sub, cin, s4, xout_buf, ld_flags);
  input [14:1] q;
  input z;
  output reg q_c, q_s;
  output reg s0, s1, s2, s4;
  output wr_data, rd_data;
  output ld_pc, ld_ir;
  output reg ld_flags;
  output reg en_dec;
  output reg sub, cin;
  output reg xout_buf;

  reg [3:0] E;

  assign ld_pc = 1'b1;
  assign ld_ir = 1'b1;
  assign rd_data = q[12];
  assign wr_data = q[13];

  always @ *
  begin
    E[0] = q[1] | q[4];
    E[1] = q[5] | q[6] | q[9] | q[10] | q[11];
    E[2] = q[7] | q[8];
    E[3] = q[12];

    case (E)
      4'b0010: begin s0=1'b1; s1=1'b0; end
      4'b0100: begin s0=1'b0; s1=1'b1; end
      4'b1000: begin s0=1'b1; s1=1'b1; end
      default: begin s0=1'b0; s1=1'b0; end
    endcase

    sub = q[6] | q[9] | q[11];
    cin = q[5] | q[9] | q[11];
    q_c = q[5] | q[10];
    q_s = q[6] | q[11] | q[9];
    ld_flags = E[1];

    s4 = q[2] | (q[3] && z);
    s2 = q[10] | q[11];
    en_dec = q[1] | q[2] | q[3] | q[9] | q[13] | q[14];
    xout_buf = q[4] | q[5] | q[6] | q[7] | q[8] | q[10] | q[11] | q[12];
  end
endmodule
A.8. MAIN PROCESSOR UNIT

/* ~~~~MAIN PROCESSOR UNIT~~~~ */
module main_processor(clk, rst, xin, xout, wr_data, rd_data, addr_data, data_in_out, ir_in, addr_pc);
  input clk;
  input rst;
  input [7:0] xin;
  input [11:1] ir_in;
  inout [7:0] data_in_out;
  output [7:0] xout;
  output wr_data;
  output rd_data;
  output [15:0] addr_data;
  output [15:0] addr_pc;

  reg Sign, Carry, Zero;
  reg [7:0] A_reg, B_reg, C_reg, D_reg, X_reg;
  reg X_buf;
  reg [7:0] data_bus;
  reg ld_a_temp, ld_B, ld_C, ld_D;

  wire cin, sub, q_C, q_S, s, c, z;
  wire s0, s1, s2, s4;
  wire xout_buf, en_dec;
  wire ld_A, ld_ir, ld_pc, ld_flags;
  wire [11:1] I;
  wire [14:1] q;
  wire [7:0] result_au, result_su, result_mu;
  wire [7:0] src;
  wire [7:0] data_in;

  arithmetic_unit au1(A_reg, src, I[8], cin, sub, q_C, q_S, result_au, s, c, z);
  control_unit cu1(q, Zero, q_C, q_S, s0, s1, s2, wr_data, rd_data, ld_pc, ld_ir, en_dec, sub, cin, s4, xout_buf, ld_flags);
  instruction_decoder id1(I[11:4], q);
  instruction_register ir1(clk, rst, ld_ir, ir_in, I);
  move_unit mu1(I[10], src, I[9:2], result_mu);
  program_counter pc1(ld_pc, rst, clk, C_reg, D_reg, I[9:2], s4, q[14], addr_pc);
  shift_unit su1(src, I[7], result_su);

  assign xout = X_buf ? X_reg : 8'bz;
  assign ld_A = ld_a_temp || q[1];
  assign addr_data = {C_reg, D_reg};

  //SRC Multiplexer
  assign src = I[3] ? xin : (I[2] ? (I[1] ? D_reg : C_reg) : (I[1] ? B_reg : A_reg));

  assign data_in = rd_data ? data_in_out : 8'bz;
  assign data_in_out = wr_data ? src : 8'bz;

  always @ (posedge clk, posedge rst)
  begin
    if (rst)
    begin
      A_reg <= 8'b0;
      B_reg <= 8'b0;
      C_reg <= 8'b0;
      D_reg <= 8'b0;
      X_reg <= 8'b0;
      X_buf <= 1'b0;
      Sign <= 1'b0;
      Carry <= 1'b0;
      Zero <= 1'b0;
    end
    else
    begin
      X_reg <= data_bus;
      X_buf <= xout_buf & (s2 ? I[3] : I[6]);
      if (ld_flags)
      begin
        Carry <= c;
        Zero <= z;
        Sign <= s;
      end
      if (ld_A) A_reg <= data_bus;
      if (ld_B) B_reg <= data_bus;
      if (ld_C) C_reg <= data_bus;
      if (ld_D) D_reg <= data_bus;
    end
  end

  always @ *
  begin
    // Destination Decoder
    if (!((xout_buf & (s2 ? I[3] : I[6])) || en_dec))
    begin
      case (s2 ? I[2:1] : I[5:4])
        2'b00: begin ld_a_temp = 1'b1; ld_B = 1'b0; ld_C = 1'b0; ld_D = 1'b0; end
        2'b01: begin ld_a_temp = 1'b0; ld_B = 1'b1; ld_C = 1'b0; ld_D = 1'b0; end
        2'b10: begin ld_a_temp = 1'b0; ld_B = 1'b0; ld_C = 1'b1; ld_D = 1'b0; end
        2'b11: begin ld_a_temp = 1'b0; ld_B = 1'b0; ld_C = 1'b0; ld_D = 1'b1; end
      endcase
    end
    else
    begin
      ld_a_temp = 1'b0; ld_B = 1'b0; ld_C = 1'b0; ld_D = 1'b0;
    end

    // Data bus multiplexer
    case ({s1, s0})
      2'b01: data_bus = result_au;
      2'b10: data_bus = result_su;
      2'b11: data_bus = data_in;
      default: data_bus = result_mu;
    endcase
  end
endmodule
APPENDIX B INSTRUCTION SET

ADD A     : 11'H010
ADD B     : 11'H011
ADD C     : 11'H012
ADD D     : 11'H013
ADD X     : 11'H014
CMP A     : 11'H000
CMP B     : 11'H001
CMP C     : 11'H002
CMP D     : 11'H003
CMP X     : 11'H004
DEC A, A  : 11'H0C0
DEC A, B  : 11'H0C1
DEC A, C  : 11'H0C2
DEC A, D  : 11'H0C3
DEC A, X  : 11'H0C4
DEC B, A  : 11'H0C8
DEC B, B  : 11'H0C9
DEC B, C  : 11'H0CA
DEC B, D  : 11'H0CB
DEC B, X  : 11'H0CC
DEC C, A  : 11'H0D0
DEC C, B  : 11'H0D1
DEC C, C  : 11'H0D2
DEC C, D  : 11'H0D3
DEC C, X  : 11'H0D4
DEC D, A  : 11'H0D8
DEC D, B  : 11'H0D9
DEC D, C  : 11'H0DA
DEC D, D  : 11'H0DB
DEC D, X  : 11'H0DC
DEC X, A  : 11'H0E0
DEC X, B  : 11'H0E1
DEC X, C  : 11'H0E2
DEC X, D  : 11'H0E3
DEC X, X  : 11'H0E4
INC A, A  : 11'H080
INC A, B  : 11'H081
INC A, C  : 11'H082
INC A, D  : 11'H083
INC A, X  : 11'H084
INC B, A  : 11'H088
INC B, B  : 11'H089
INC B, C  : 11'H08A
INC B, D  : 11'H08B
INC B, X  : 11'H08C
INC C, A  : 11'H090
INC C, B  : 11'H091
INC C, C  : 11'H092
INC C, D  : 11'H093
INC C, X  : 11'H094
INC D, A  : 11'H098
INC D, B  : 11'H099
INC D, C  : 11'H09A
INC D, D  : 11'H09B
INC D, X  : 11'H09C
INC X, A  : 11'H0A0
INC X, B  : 11'H0A1
INC X, C  : 11'H0A2
INC X, D  : 11'H0A3
INC X, X  : 11'H0A4
JMP       : [2'B10, <8-Bit Data>, 1'b0]
JMPCD     : 11'H020
JZ        : [2'B11, <8-Bit Data>, 1'b0]
LOAD A    : 11'H180
LOAD B    : 11'H188
LOAD C    : 11'H190
LOAD D    : 11'H198
LOAD X    : 11'H1A0
MOV A, A  : 11'H040
MOV A, B  : 11'H041
MOV A, C  : 11'H042
MOV A, D  : 11'H043
MOV A, X  : 11'H044
MOV B, A  : 11'H048
MOV B, B  : 11'H049
MOV B, C  : 11'H04A
MOV B, D  : 11'H04B
MOV B, X  : 11'H04C
MOV C, A  : 11'H050
MOV C, B  : 11'H051
MOV C, C  : 11'H052
MOV C, D  : 11'H053
MOV C, X  : 11'H054
MOV D, A  : 11'H058
MOV D, B  : 11'H059
MOV D, C  : 11'H05A
MOV D, D  : 11'H05B
MOV D, X  : 11'H05C
MOV X, A  : 11'H060
MOV X, B  : 11'H061
MOV X, C  : 11'H062
MOV X, D  : 11'H063
MOV X, X  : 11'H064
MVI       : [2'B01, <8-Bit Data>, 1'b0]
SL A, A   : 11'H100
SL A, B   : 11'H101
SL A, C   : 11'H102
SL A, D   : 11'H103
SL A, X   : 11'H104
SL B, A   : 11'H108
SL B, B   : 11'H109
SL B, C   : 11'H10A
SL B, D   : 11'H10B
SL B, X   : 11'H10C
SL C, A   : 11'H110
SL C, B   : 11'H111
SL C, C   : 11'H112
SL C, D   : 11'H113
SL C, X   : 11'H114
SL D, A   : 11'H118
SL D, B   : 11'H119
SL D, C   : 11'H11A
SL D, D   : 11'H11B
SL D, X   : 11'H11C
SL X, A   : 11'H120
SL X, B   : 11'H121
SL X, C   : 11'H122
SL X, D   : 11'H123
SL X, X   : 11'H124
SR A, A   : 11'H140
SR A, B   : 11'H141
SR A, C   : 11'H142
SR A, D   : 11'H143
SR A, X   : 11'H144
SR B, A   : 11'H148
SR B, B   : 11'H149
SR B, C   : 11'H14A
SR B, D   : 11'H14B
SR B, X   : 11'H14C
SR C, A   : 11'H150
SR C, B   : 11'H151
SR C, C   : 11'H152
SR C, D   : 11'H153
SR C, X   : 11'H154
SR D, A   : 11'H158
SR D, B   : 11'H159
SR D, C   : 11'H15A
SR D, D   : 11'H15B
SR D, X   : 11'H15C
SR X, A   : 11'H160
SR X, B   : 11'H161
SR X, C   : 11'H162
SR X, D   : 11'H163
SR X, X   : 11'H164
STORE A   : 11'H1C0
STORE B   : 11'H1C1
STORE C   : 11'H1C2
STORE D   : 11'H1C3
STORE X   : 11'H1C4
SUB A     : 11'H018
SUB B     : 11'H019
SUB C     : 11'H01A
SUB D     : 11'H01B
SUB X     : 11'H01C
