A Project Report submitted in partial fulfilment for the award of the Degree of Bachelor of Technology in the Department of Electronic & Communication Engineering by
Candidate Declaration
I, ASHISH TOMAR, hereby declare that the work presented in this report entitled "8 BIT RISC MICROPROCESSOR", in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology, submitted in the Department of ELECTRONIC & COMMUNICATION at Jagan Nath University, Jaipur, is an authentic record of my own work under the supervision of ARPAN SHAH.
I also declare that the work embodied in the present project report is my original work/extension of the existing work and has not been copied from any Journal/thesis/book, and has not been submitted by me for any other Degree/Diploma.
(Name & Signature of Candidate) Enrolment No.: JNU08BTEC013 Date: 29TH MAY 2012
This is to certify that the project report entitled "8 BIT RISC MICROPROCESSOR" submitted by ASHISH TOMAR for the award of Degree of Bachelor of Technology in the Department of ELECTRONIC & COMMUNICATION of Jagan Nath University, Jaipur, is a record of authentic work carried out by him/her under my/our supervision.
The matter embodied in this project report is the original work of the candidate and has not been submitted for the award of any other degree or diploma. It is further certified that he/she has worked with me/us for the required period in the Department of ELECTRONIC & COMMUNICATION, Jagan Nath University, Jaipur.
Acknowledgements
I would like to express my sincere gratitude to my project guide ARPAN SHAH for giving me the opportunity to work on this topic. It would never be possible for us to take this project to this level without his innovative ideas and his relentless support and encouragement.
Abstract
Field Programmable Gate Array (FPGA) devices offer a large set of advantages due to their reconfigurable nature. Although their performance is not comparable to that of ASIC devices, their flexibility is usually more important, especially when fast time-to-market is an issue and production is on a small scale. For that reason they are widely used in electronic applications, both during prototyping and in final-production systems. Processors are the most demanding when it comes to flexibility, cost and time to market.
RISC (Reduced Instruction Set Computer) machines have fixed-size instructions that can execute in one clock, and their instructions interface with memory via a fixed mechanism. There are only a small number of primitive instructions. RISC is based on using many simpler and faster instructions to do the same work as a single complicated instruction on a CISC (Complex Instruction Set Computer) machine.
The aim of this project is the design of an 8-bit RISC processor for FPGA implementation. The processor can execute 14 instructions, including 2 memory access operations. Verilog is the HDL chosen for design entry. Xilinx WebPACK ISE generates the programming file for the target device, Spartan-3.
INDEX
1. INTRODUCTION
1.1. Reduced Instruction Set Computers
1.2. Field Programmable Gate Array
1.2.1. Look Up Tables
1.2.2. Programmable Logic Array
1.2.3. Programmable Array Logic
1.2.4. FPGA
1.2.5. Spartan-3
1.3. Hardware Description Languages
1.3.1. Importance of HDLs
1.3.2. Verilog HDL
2. FUNCTIONAL DESCRIPTION
2.1. Block Diagram
2.2. Specifications
2.3. Instructions
2.3.1. Move Instructions
2.3.2. Arithmetic Instructions
2.3.3. Jump Instructions
2.3.4. Memory Access Instructions
2.4. Targeted Performance Parameters
3. DESIGN ARCHITECTURE
3.1. Instruction Set Architecture
3.1.1. Instruction Format
3.1.2. Source/Destination Format
3.1.3. Instruction Examples
3.2. Modular Design
3.3. Top Level Entity
3.3.1. Block Diagram
3.3.2. Ports Description
3.3.3. Architecture
3.3.4. Source Register Selection
3.3.5. Memory Access Operations
3.3.6. Data Bus
3.3.7. Destination Decoder
3.3.8. Output Port Xout
3.4. Move Unit
3.5. Shift Unit
3.6. Arithmetic Unit
3.6.1. Block Diagram
3.6.2. Ports Description
3.6.3. Architecture
3.6.4. Functionality
3.6.5. Flags
3.7. Program Counter
3.8. Instruction Register
3.9. Instruction Decoder
3.10. Control Unit
3.11. Data Memory
3.12. Program Memory
4. DESIGN IMPLEMENTATION
4.1. HDL Entry
4.2. Functional Simulation
4.3. Synthesis
4.3.1. Synthesis Constraints
4.3.2. Synthesis Report
4.4. Translate
4.4.1. NGD Build Overview
4.4.2. Conversion of Netlist to NGD
4.5. MAP
4.5.1. MAP Input Files
4.5.2. MAP Output Files
4.5.3. MAP Report
4.5.4. Post MAP Timing Report
4.6. Place & Route
4.6.1. Overview
4.6.2. Placing
4.6.3. Routing
4.6.4. Post PAR Timing Report
5. SIMULATION RESULTS
1. INTRODUCTION
1.1 REDUCED INSTRUCTION SET COMPUTER (RISC)
An important factor in computer design prior to 1980 was that all memories, including the memory to store program instructions, were very expensive. So if you were a computer designer, you would want to make each of the instructions you design to be short but powerful. That way, when programmers write programs using your instructions, their code will be dense and will require little memory, but each bit of code would do a lot of work.
This would result in a bunch of instructions of different lengths. Finally, you would also end up with a very rich collection of instructions that can interface with the computer's data memory in many different ways: either dealing directly with the data memory, or demanding that data first be stored in temporary locations (registers), or some mix of the two.
Now because of this rich, powerful, and variable-length group of (compact) instructions you've designed, the computer would have several characteristics. First, each instruction might take several clock cycles to complete. That's because each instruction would be of a different size, so figuring out what each one says is complicated; because each instruction could talk to memory in a different way; and because each instruction could potentially do a lot of work. Second, and for the same reasons just given, the computer's speed might be fairly slow.
But as time passed, memory became cheaper, compilers got better, and the motivation for making small but really powerful instructions faded. In 1980, Patterson and Ditzel at Berkeley argued in favor of a different architecture having simple instructions, all of uniform length, that perform simpler operations. Sure, you'd need to specify more of these simpler instructions to equal one of the old-style complicated instructions, and yes, this takes more instruction memory, but memory is cheap, and your computer can run faster because each instruction takes fewer clocks.
For example, say you had a complicated instruction called MUL that told the computer to take two pieces of data from memory and multiply their sum with a third piece of data and put the result back somewhere else. This one instruction might take 10 clock cycles to complete. Now suppose we had a simple instruction set. To do the same work as MUL did, we'd need perhaps 8 different instructions (a few loads, an add, a multiply, a store, etc.). But each instruction completes in a single clock cycle because each is so simple. And maybe the computer's clock can run much faster, too. The downside of the simple system, of course, is that it requires you to store 8 times as many instructions.
A Comparison:
Complicated system does MUL: 1 instruction x 10 clocks/instr x 10 ns/clock = 100 ns
Simple system does the same work as MUL: 8 instructions x 1 clock/instr x 9 ns/clock = 72 ns
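The comparison above is simply the product of instruction count, clocks per instruction and clock period. A minimal Python sketch of that arithmetic follows; the function name `execution_time_ns` is illustrative and not part of the project.

```python
def execution_time_ns(instructions, clocks_per_instruction, ns_per_clock):
    """Total time = instruction count x clocks per instruction x clock period."""
    return instructions * clocks_per_instruction * ns_per_clock


# Figures taken from the comparison in the text.
complex_ns = execution_time_ns(instructions=1, clocks_per_instruction=10, ns_per_clock=10)
simple_ns = execution_time_ns(instructions=8, clocks_per_instruction=1, ns_per_clock=9)

print(complex_ns, simple_ns)  # 100 72
```

With these numbers the simple system finishes first even though it stores eight times as many instructions.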
Three systems based on this idea were built in the early 80s: the Berkeley machines RISC-I and RISC-II, the Stanford MIPS processor [2], and the IBM 801 [3]. Based on comparisons between these machines and what came before, some characteristics of RISC machines can be identified.
Reduced Instruction Set Computer (RISC) is based on using many simpler and faster instructions to do the same work as a single complicated instruction on a Complex Instruction Set Computer (CISC).
RISC machines are machines that have:
- Instructions that execute in one clock
- Instructions of a fixed size
- Instructions that interface with memory via a fixed mechanism
- A small number of primitive instructions
- Pipelining, a way to do more than one instruction at a time
A hierarchy of programmable interconnects allows the logic blocks of an FPGA to be interconnected as needed by the system designer, somewhat like a one-chip programmable breadboard. These logic blocks and interconnects can be programmed after the manufacturing process by the customer/designer (hence the term "field programmable") so that the FPGA can perform whatever logical function is needed.
FPGAs are generally slower than their application-specific integrated circuit (ASIC) counterparts, can't handle as complex a design, and draw more power. However, they have several advantages such as a shorter time to market, ability to re-program in the field to fix bugs, and lower non-recurring engineering costs.
The historical roots of FPGAs are in complex programmable logic devices (CPLDs). CPLD logic gate densities range from the equivalent of several thousand to tens of thousands of logic gates, while FPGAs typically range from tens of thousands to several million. The primary differences between CPLDs and FPGAs are architectural. A CPLD has a somewhat restrictive structure consisting of one or more programmable SOP logic arrays feeding a relatively small number of clocked registers. The result of this is less flexibility, with the advantage of more predictable timing delays and a higher logic to interconnect ratio. The FPGA architectures, on the other hand, are dominated by interconnect. This makes them far more flexible, but also far more complex to design for. Another notable difference between CPLDs and FPGAs is the presence in most FPGAs of higher-level embedded functions (such as adders and multipliers) and embedded memories. A related, important difference is that many modern FPGAs support partial in-system reconfiguration, allowing their designs to be changed "on the fly" either for system upgrades or for dynamic reconfiguration.
A recent trend has been to take the architectural approach a step further by combining the logic blocks and interconnects of traditional FPGAs with embedded microprocessors and related peripherals to form complete "systems on a programmable chip". Examples of such hybrid technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices, which include one or more PowerPC processors embedded within the FPGA's logic fabric. An alternate approach is to make use of "soft" processor cores that are implemented within the FPGA logic. These cores include the Xilinx MicroBlaze and PicoBlaze, and the Altera Nios and Nios II processors, as well as third-party processor cores.
Applications of FPGAs include DSP, software-defined radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics, computer hardware emulation and a growing range of other areas. As their size, capabilities and speed increased they began to take over larger and larger functions to the state where they are now marketed as competitors for full systems on chips. They now find applications in any area or algorithm that can make use of the massive parallelism offered by their architecture.
To define the behavior of the FPGA, the user provides a hardware description language (HDL) or a schematic design. Common HDLs are VHDL and Verilog. Then, using an electronic design automation tool, a technology-mapped netlist is generated. The netlist can then be fitted to the actual FPGA architecture using a process called place-and-route, usually performed by the FPGA company's proprietary place-and-route software. The user will validate the map, place and route results via timing analysis, simulation, and other verification methodologies. Once the design and validation process is complete, the binary file generated (also using the FPGA company's proprietary software) is used to (re)configure the FPGA device.
To simplify the design of complex systems in FPGAs, there exist libraries of predefined complex functions and circuits that have been tested and optimized to speed up the design process. These predefined circuits are commonly called IP cores, and are available from FPGA vendors and third-party IP suppliers.
In a typical design flow, an FPGA application developer will simulate the design at multiple stages throughout the design process. Initially the RTL description in VHDL or Verilog is simulated by creating test benches to stimulate the system and observe results. Then, after the synthesis engine has mapped the design to a netlist, the netlist is translated to a gate level description where simulation is repeated to confirm the synthesis proceeded without errors. Finally the design is laid out in the FPGA, at which point propagation delays can be added and the simulation run again with these values back-annotated onto the netlist.
1.2.5 SPARTAN 3
The Spartan-3 families of FPGAs offer densities ranging from 50,000 to five million system gates. Spartan-3 FPGAs are ideally suited to a wide range of consumer electronics applications, including broadband access, home networking, display/projection and digital television equipment, because of their exceptionally low cost.
Features:
- Up to 784 I/O pins
- 622 Mb/s data transfer rate per I/O
- Signal swing ranging from 1.14V to 3.45V
- Double Data Rate (DDR) support
- DDR, DDR2 SDRAM support up to 333 Mbps
1.3.2. VERILOG HDL
Verilog HDL has evolved as a standard hardware description language. Verilog HDL offers many useful features for hardware design:
- Verilog is easy to learn and use. It is similar in syntax to the C programming language.
- It allows different levels of abstraction to be mixed in the same model.
- Most popular synthesis tools support Verilog HDL.
2. FUNCTIONAL DESCRIPTION
This chapter gives the detailed information about the functionality of the design and the implementation constraints.
2.2. SPECIFICATIONS
The following instructions have to be implemented:
1. MOV dst, src -- dst <= src
2. INC dst, src -- dst <= src + 1
3. DEC dst, src -- dst <= src - 1
4. ADD src -- src <= src + A
5. SUB src -- src <= src - A
6. SL dst, src -- dst <= shift left src
7. SR dst, src -- dst <= shift right src
8. CMP src -- set Z flag if src = A
9. MVI A, immediate -- A <= immediate data
10. LOAD dst -- dst <= memory contents at address [CD]
11. STORE src -- memory at [CD] <= src
12. JMP imm_offset -- jump to PC + imm_offset
13. JZ imm_offset -- jump to PC + imm_offset if Z=1
14. JMPCD -- jump to address pointed by [CD]
Src, dst can be either A, B, C, D or X. PC is the program counter. [CD] represents the contents of register C and D after concatenation. D is the least significant byte.
X is an 8-bit-wide input and output port. It is visible at the periphery as the I/O ports "X In" and "X Out". When anything is assigned to X, it appears at "X Out". When X is read, the contents at "X In" are used.
Z flag is set whenever the result of any operation is zero. C flag is set whenever the result of any arithmetic operation results in a carry. S flag is set whenever the result of any arithmetic operation results in a negative number.
It is assumed that the program memory and the data memory have synchronous writes and asynchronous reads.
Write operation: On a clock edge when WR is asserted, the data on the data bus is written into the location pointed to by the address.
Read operation: When RD is asserted, the contents of the location pointed to by the address are presented on the data bus by the memory. When RD is de-asserted, the memory stops driving the bus.
For the sake of simplicity, it is assumed that both the memories are fast enough to complete the read and write operations in one clock.
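The memory behaviour assumed above (synchronous write, asynchronous read, single-clock access) can be sketched as a small behavioural model. This is only an illustration in Python, not the project's Verilog; the class name, the method names and the use of `None` for an undriven bus are assumptions made for the example.

```python
class SyncWriteAsyncReadMemory:
    """Behavioural model: writes happen on a clock edge, reads are combinational."""

    def __init__(self, depth, width_bits):
        self.mask = (1 << width_bits) - 1
        self.cells = [0] * depth

    def read(self, address, rd):
        # Asynchronous read: data is driven whenever RD is asserted.
        # None stands for "memory is not driving the bus" (an assumption of this sketch).
        return self.cells[address] if rd else None

    def clock_edge(self, address, data, wr):
        # Synchronous write: the cell changes only on a clock edge with WR asserted.
        if wr:
            self.cells[address] = data & self.mask


mem = SyncWriteAsyncReadMemory(depth=256, width_bits=8)
mem.clock_edge(address=0x10, data=0xA7, wr=True)   # written on this edge
mem.clock_edge(address=0x10, data=0x55, wr=False)  # WR low: contents unchanged
print(mem.read(0x10, rd=True))                     # 167
```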
2.3. INSTRUCTIONS
2.3.1. MOVE INSTRUCTIONS
There are two move instructions.
2.3.1.1. Move
INSTRUCTION: MOV dst, src
This instruction copies the 8-bit data from the source register to the destination register. Destination and source can be registers A/B/C/D or the input-output port X.
2.3.1.2. Move Immediate Data
INSTRUCTION: MVI immediate data
This instruction moves the 8-bit data, which is a part of the instruction itself, to register A.
2.3.2.1. Increment
INSTRUCTION: INC dst, src
This instruction retrieves the 8-bit data from the source register/port, increments it by 1 and stores it in the destination register/port. The contents of the source register remain unchanged.
2.3.2.2. Decrement
INSTRUCTION: DEC dst, src
This instruction retrieves the 8-bit data from the source register/port, decrements it by 1 and stores it in the destination register/port. The contents of the source register remain unchanged.
2.3.2.3. Addition
INSTRUCTION: ADD src
This instruction retrieves the 8-bit data from the source register/port, increments it by the contents of register A, and stores the result back in the source register/port.
2.3.2.4. Subtraction
INSTRUCTION: SUB src
This instruction retrieves the 8-bit data from the source register/port, decrements it by the contents of register A and stores the result back in the source register/port.
2.3.2.5. Compare
INSTRUCTION: CMP src
This instruction retrieves the 8-bit data from the source register/port, compares it with the contents of register A, and sets the Z flag high if both are equal. This instruction does not modify the contents of the source register/port.
2.3.2.6. Shift Left
INSTRUCTION: SL dst, src
This instruction retrieves the 8-bit data from the source register/port, left shifts the data by 1 bit and stores the result in the destination register/port. This instruction does not modify the contents of the source register/port.
2.3.2.7. Shift Right
INSTRUCTION: SR dst, src
This instruction retrieves the 8-bit data from the source register/port, right shifts the data by 1 bit and stores the result in the destination register/port. This instruction does not modify the contents of the source register/port.
2.3.3.1. Jump by immediate offset
INSTRUCTION: JMP immediate_offset
The value of the program counter is incremented by the value given as the immediate data. Immediate data is a part of the instruction itself.
2.3.3.2. Jump by immediate offset if Z flag is set
INSTRUCTION: JZ immediate_offset
The value of the program counter is incremented by the value given as the immediate data, if the Z flag is high. Immediate data is a part of the instruction itself. If the Z flag is not set, then the program counter will increment by 1 as in other instructions.
INSTRUCTION: JMPCD
The value of the program counter is changed to the address pointed to by the concatenation of the contents of registers C and D.
2.3.4.2. Store Data
INSTRUCTION: STORE src
This instruction writes the 8-bit data of the source register to the data memory. The address of the data memory location where the contents of the source are stored is given by the concatenation of the contents of registers C and D.
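The register-level behaviour of the data instructions described in this chapter can be summarised in a short behavioural sketch. It is written in Python for illustration only and is not the project's Verilog. The state dictionary, the name `execute` and the sparse `mem` dictionary (unwritten locations read as 0) are assumptions of the sketch. Only CMP updates Z here, and the jump instructions are left out to keep it short.

```python
MASK8 = 0xFF


def execute(state, op, dst=None, src=None, imm=None):
    """Apply one instruction to `state` (keys: A, B, C, D, X_in, X_out, Z, mem)."""

    def read(name):
        # Reading X uses the input port; registers are read directly.
        return state["X_in"] if name == "X" else state[name]

    def write(name, value):
        # Writing X drives the output port; results are truncated to 8 bits.
        state["X_out" if name == "X" else name] = value & MASK8

    cd = (state["C"] << 8) | state["D"]  # [CD]: C is the high byte, D the low byte

    if op == "MOV":
        write(dst, read(src))
    elif op == "MVI":
        write("A", imm)
    elif op == "INC":
        write(dst, read(src) + 1)
    elif op == "DEC":
        write(dst, read(src) - 1)
    elif op == "ADD":
        write(src, read(src) + state["A"])
    elif op == "SUB":
        write(src, read(src) - state["A"])
    elif op == "CMP":
        state["Z"] = int(read(src) == state["A"])
    elif op == "SL":
        write(dst, read(src) << 1)
    elif op == "SR":
        write(dst, read(src) >> 1)
    elif op == "LOAD":
        write(dst, state["mem"].get(cd, 0))
    elif op == "STORE":
        state["mem"][cd] = read(src)
    else:
        raise ValueError(f"unsupported instruction: {op}")


state = {"A": 0, "B": 0, "C": 0x12, "D": 0x34, "X_in": 0, "X_out": 0, "Z": 0, "mem": {}}
execute(state, "MVI", imm=0xA7)    # A <= A7
execute(state, "STORE", src="A")   # memory[1234] <= A
execute(state, "LOAD", dst="B")    # B <= memory[1234]
execute(state, "CMP", src="B")     # Z <= 1 because B equals A
```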
3. DESIGN ARCHITECTURE
This chapter explains the internal architecture of the top level entity and the sub modules. First the instruction set architecture was finalized and then the final design
1. MVI : 01_<8-Bit Immediate Data>_X
2. JMP : 10_<8-Bit Immediate Offset>_X
3. JZ : 11_<8-Bit Immediate Offset>_X
Instructions MOV, INC, DEC, SL and SR have both destination and source as part of the instruction.
4. MOV : 00_001_<3-Bit Destination>_<3-Bit Source>
5. INC : 00_010_<3-Bit Destination>_<3-Bit Source>
6. DEC : 00_011_<3-Bit Destination>_<3-Bit Source>
7. SL : 00_100_<3-Bit Destination>_<3-Bit Source>
8. SR : 00_101_<3-Bit Destination>_<3-Bit Source>
The destination register/port for the instructions ADD, CMP and SUB is the same as the source, so there is no need to mention the destination in the instruction.
9. CMP : 00_000_00X_<3-Bit Source>
10. ADD : 00_000_010_<3-Bit Source>
11. SUB : 00_000_011_<3-Bit Source>
The source in the case of the LOAD instruction is fixed, i.e. the data memory, and in the case of the STORE instruction the destination is fixed, i.e. the data memory.
The direct jump instruction JMPCD does not require any destination, source or immediate data to be part of the instruction.
C    : 010
D    : 011
Xin  : 1XX
Destination can be one of the registers A, B, C, D or the output port Xout. A total of 3 bits are required to define the destination.
A    : 000
B    : 001
C    : 010
D    : 011
Xout : 1XX
2. ADD D, i.e. add the contents of register D to A and store the result in D
Destination : Not required, same as source
Source : 011
Instruction Code : 00_000_010_011
3. MVI A7, i.e. move immediate data A7 to register A
Destination : Not required, it is fixed
Data : 1010_0111
Instruction Code : 01_1010_0111_1 / 01_1010_0111_0
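The instruction formats can be checked with a small encoder. The sketch below is in Python and is only an illustration of the bit packing, with I[1] taken as the least significant bit of the 11-bit word. Several points are assumptions: the function and table names, writing don't-care (X) bits as 0, using 100 for the X port, using the same 3-bit register codes for source and destination, and giving JZ the prefix 11, the only 2-bit prefix not used by MVI, JMP or the register formats. LOAD, STORE and JMPCD are not covered.

```python
REG_CODE = {"A": 0b000, "B": 0b001, "C": 0b010, "D": 0b011, "X": 0b100}
IMMEDIATE = {"MVI": 0b01, "JMP": 0b10, "JZ": 0b11}   # I[11:10]; JZ = 11 is assumed
TWO_OPERAND = {"MOV": 0b001, "INC": 0b010, "DEC": 0b011, "SL": 0b100, "SR": 0b101}  # I[9:7]
ONE_OPERAND = {"CMP": 0b000, "ADD": 0b010, "SUB": 0b011}  # placed in I[6:4], with I[11:7] = 0


def encode(op, dst=None, src=None, imm=None):
    """Return the 11-bit instruction word as an integer."""
    if op in IMMEDIATE:
        # prefix in I[11:10], immediate in I[9:2], I[1] is a don't-care (0 here)
        return (IMMEDIATE[op] << 9) | ((imm & 0xFF) << 1)
    if op in TWO_OPERAND:
        # 00 prefix, opcode in I[9:7], destination in I[6:4], source in I[3:1]
        return (TWO_OPERAND[op] << 6) | (REG_CODE[dst] << 3) | REG_CODE[src]
    if op in ONE_OPERAND:
        # 00_000 prefix, sub-code in I[6:4], source in I[3:1]
        return (ONE_OPERAND[op] << 3) | REG_CODE[src]
    raise ValueError(f"encoding not covered by this sketch: {op}")


print(format(encode("ADD", src="D"), "011b"))   # 00000010011
print(format(encode("MVI", imm=0xA7), "011b"))  # 01101001110
```

The two printed words match the instruction codes of the ADD D and MVI A7 examples.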
In my design there are modules for arithmetic operations, logical operations, move operations, jump operations, the instruction register and the control unit. All the units are interconnected inside the Top module. The different modules are:
- Move Unit
- Shift Unit
- Arithmetic Unit
- Program Counter
- Instruction Register
- Instruction Decoder
- Control Unit
- Data Memory
- Program Memory
Selection of the source register, selection of the destination register, selection of input data to the destination register, the control signal for the buffer for Xout and the control signal for the data bus connected to the data memory are generated inside the top level entity.
1. Xin
Length : 8 Bit
Type : Input
Use : This port can be used by the user for providing immediate data for various instructions

2. Xout
Length : 8 Bit
Type : Output
Use : This port can be used by the user for getting the immediate result of various instructions

3. Clk
Length : 1 Bit
Type : Input
Use : This port provides the global clock signal used to synchronize the internal registers, program memory and the data memory

4. Rst
Length : 1 Bit
Type : Input
Use : This port provides the global reset signal to all the internal registers, program memory, data memory, instruction register etc.

5. Addr_PC
Length : 16 Bit
Type : Output
Use : This port serves as the address lines for the 6K x 11 Bits program memory

6. IR_in
Length : 11 Bit
Type : Input
Use : This port provides the 11-bit instruction to the processor fetched from the program memory

7. Data_inout
Length : 8 Bit
Type : Inout
Use : This port provides the 8-bit data to and from the data memory. Buffers control the direction of data flow

8. Addr_data
Length : 16 Bit
Type : Output
Use : This port serves as the address lines for the 6K x 8 Bits data memory

9. wr_data
Length : 1 Bit
Type : Output
Use : This port provides the write signal to the data memory when data has to be written to the data memory

10. rd_data
Length : 1 Bit
Type : Output
Use : This port provides the read signal to the data memory when data has to be read from the data memory
An 8-bit, 4-to-1 multiplexer with the select lines I[2:1] is used to identify the register. Another 8-bit, 2-to-1 multiplexer with the select line I[3] is used to select either the input port Xin or the already selected register. For example, if bit I[3] is 1 then, irrespective of the bits I[2:1], the source will be the input port Xin, and if I[3] is 0 then the source will be selected according to the value of the bits I[2:1].
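The two-stage source selection can be sketched behaviourally as follows. This is a Python illustration, not the project's Verilog; the function name and the register ordering A, B, C, D for I[2:1] = 00, 01, 10, 11 are assumptions (the ADD D example elsewhere in the report fixes only D = 11).

```python
def select_source(instruction, regs, x_in):
    """Return the 8-bit source operand chosen by instruction bits I[3:1]."""
    i3 = (instruction >> 2) & 1       # I[3]: 1 selects the input port Xin
    i21 = instruction & 0b11          # I[2:1]: selects one of the four registers
    # First stage: 4-to-1 multiplexer over the registers (ordering is assumed).
    register_value = (regs["A"], regs["B"], regs["C"], regs["D"])[i21]
    # Second stage: 2-to-1 multiplexer between Xin and the selected register.
    return x_in if i3 else register_value


regs = {"A": 0x11, "B": 0x22, "C": 0x33, "D": 0x44}
print(hex(select_source(0b011, regs, x_in=0x99)))  # 0x44, register D
print(hex(select_source(0b111, regs, x_in=0x99)))  # 0x99, Xin overrides I[2:1]
```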
select lines for this multiplexer are generated by the control unit.
The destination is represented in the bits I[6:4] of the instruction. Only two of these bits (the least significant, I[5:4]) are required to select one of the four registers; the third bit is used to select Xout as the destination. The instructions ADD and SUB have the destination same as the source, so for these two instructions the bits used as input to the destination decoder are I[2:1]. A 2-bit, 2-to-1 multiplexer is used for this purpose. The inputs to this MUX are I[5:4] and I[2:1]. The select line is generated inside the control unit.
One more signal, En_dec, is used, which serves as the enable for the decoder. This signal is also generated inside the control unit. If the control signal for Xout goes high, the destination decoder is also disabled.
register is also being introduced in the design to store the value of the control signal for the tristate buffer for Xout. As for the destination decoder, the control signal for the Xout tristate buffer is generated using a 1-bit, 2-to-1 multiplexer. The inputs to this MUX are I[6] and I[3]. The select line is generated inside the control unit. Another signal, Xout_buf, is ANDed with the output of the MUX. The result is stored in a 1-bit register X_buf, the output of which is connected to the control line of the tristate buffer for Xout. The signal Xout_buf is generated inside the control unit.
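A behavioural sketch of the destination decoder and the Xout buffer control may help to see how these signals combine. It is a Python illustration under stated assumptions: the function name is made up, both multiplexers are assumed to share one select line (called `s2` here), the decoder enable is taken as the NOR of En_dec and the Xout control, and the 1-bit register X_buf is left out so that only the combinational part is shown.

```python
def destination_controls(instruction, s2, xout_buf, en_dec):
    """Return (register load enables, Xout buffer control) for one instruction."""

    def bit(n):
        # I[n] with the document's 1-based bit numbering
        return (instruction >> (n - 1)) & 1

    # 2-bit, 2-to-1 multiplexer: I[2:1] for ADD/SUB (s2 = 1), otherwise I[5:4]
    reg_sel = ((bit(2) << 1) | bit(1)) if s2 else ((bit(5) << 1) | bit(4))
    # 1-bit, 2-to-1 multiplexer for the port bit: I[3] or I[6] (select line assumed to be s2)
    x_bit = bit(3) if s2 else bit(6)
    xout_enable = x_bit & xout_buf
    # The decoder is disabled when En_dec is high or when Xout is the destination
    decoder_enable = int(not (en_dec or xout_enable))
    loads = {name: int(bool(decoder_enable) and reg_sel == k) for k, name in enumerate("ABCD")}
    return loads, xout_enable


# MOV B, D (00_001_001_011): register B is loaded
print(destination_controls(0b00001001011, s2=0, xout_buf=1, en_dec=0))
# ADD D (00_000_010_011): destination taken from the source bits, so D is loaded
print(destination_controls(0b00000010011, s2=1, xout_buf=1, en_dec=0))
```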
3.4.1. ARCHITECTURE
[Figure: move unit. An 8-bit 2-to-1 multiplexer with select line I[10]; input 1 is I[9:2], input 0 is src, and the 8-bit output is Result_mu.]

Instruction               Instruction Code
1. MVI immediate data     0_1_<8-bit immediate data>_X
2. MOV dst, src           0_0_001_<3-bit Destination>_<3-bit source>
(The second bit from the left is I[10].)

So depending upon the instruction bit I[10], the multiplexer will select either the instruction bits I[9:2] (i.e. the immediate data) or the source.
3.5.1. ARCHITECTURE
[Figure: shift unit. An 8-bit 2-to-1 multiplexer with select line I[7]; input 1 is {0, src[7:1]}, input 0 is {src[6:0], 0}, and the 8-bit output is Result_su.]

Instruction          Instruction Code
1. SL dst, src       00_10_0_<3-bit Destination>_<3-bit source>
2. SR dst, src       00_10_1_<3-bit Destination>_<3-bit source>
(The fifth bit from the left is I[7].)

So depending upon the instruction bit I[7], the multiplexer will output the source either left shifted by 1 bit or right shifted by 1 bit.
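The shift unit is small enough to model in a few lines. The Python sketch below mirrors the two multiplexer inputs, {src[6:0], 0} and {0, src[7:1]}; the function name is an assumption made for the example.

```python
def shift_unit(src, i7):
    """Return Result_su: src shifted left (I[7] = 0) or right (I[7] = 1) by one bit."""
    left = (src << 1) & 0xFF     # {src[6:0], 0}: bit 7 is dropped, 0 shifted in
    right = (src >> 1) & 0x7F    # {0, src[7:1]}: bit 0 is dropped, 0 shifted in
    return right if i7 else left


print(format(shift_unit(0b10010110, i7=0), "08b"))  # 00101100
print(format(shift_unit(0b10010110, i7=1), "08b"))  # 01001011
```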
[Figure: arithmetic unit block diagram. Labels shown: src (8 bits), Cin, Sub, I[8], q_S, q_C.]

2. A
Length : 8 Bit
Type : Input
Use : This port always provides the contents of register A for SUB, ADD and CMP instructions.

3. Cin
Length : 1 Bit
Type : Input
Use : This port provides the carry-in signal to the adder inside the arithmetic unit. This signal is generated inside the control unit.

4. Sub
Length : 1 Bit
Type : Input
Use : This signal is generated inside the control unit. If Sub goes high then the 2nd input to the adder is converted to its 2's complement form

5. I[8]
Length : 1 Bit
Type : Input
Use : This is the 8th bit of the instruction. This line is used to select the second input to the adder (either 0 or register A)

6. q_C
Length : 1 Bit
Type : Input
Use : This signal is the enable signal for the carry flag.

7. q_S
Length : 1 Bit
Type : Input
Use : This signal is the enable signal for the Sign flag.

8. Result_au
Length : 8 Bit
Type : Output
Use : This port gives the result of the arithmetic unit.

9. Z
Length : 1 Bit
Type : Output
Use : This signal is given to the Zero flag inside the top entity

10. C
Length : 1 Bit
Type : Output
Use : This signal is given to the Carry flag inside the top entity

11. S
Length : 1 Bit
Type : Output
Use : This signal is given to the Sign flag inside the top entity
3.6.3. ARCHITECTURE
The basic block inside the arithmetic unit is an 8-bit ripple carry adder. One input to the adder is fixed, i.e. the 8-bit source. The second input to the adder depends upon the instruction to execute. The subtraction operations are also performed using the same adder, by performing the 2's complement operation on the input to be subtracted using 8 XOR gates. One input to the arithmetic unit comes from the source register/port and the second input is fixed to register A. Sign, Carry and Zero flags are part of the top level entity, but their values are generated inside the arithmetic unit only.
Instruction   Inst. Code              I/P2   Cin   Sub
INC           000_1_0_<dst><src>      0      1     0
DEC           000_1_1_<dst><src>      0      0     1
ADD           000_0_00_10_<src>       A      0     0
SUB           000_0_00_11_<src>       A      1     1
CMP           000_0_00_0X_<src>       A      1     1

[Figure: arithmetic unit architecture. Labels shown: 0 and A (candidates for the second input), I/P1, Cout, Result_au (8 bits).]

Depending upon the value of instruction bit I[8], the input 2 will be either 0 or register A.
Instruction nos. given here are generated by the instruction register discussed later.
Thus by controlling the values Cin, Sub and I/P2, different operations can be performed by the same unit.
3.6.4. FUNCTIONALITY
1. INC: The 2nd input to the adder is 0 and Cin is high, so the result comes out to be source + 1.
2. DEC: The 2nd input is zero, Sub is high and Cin is low, so the result is source + 1's complement of 0, i.e. 1111_1111, which is also the 2's complement of 1. So the result comes out to be source - 1.
3. ADD: Cin and Sub both are low, so the 2nd input, i.e. A, is passed as it is. The result comes out to be source + contents of register A.
4. SUB: Cin and Sub both are high, so the 2nd input, i.e. A, is converted to its 2's complement form, i.e. its negative value. The result comes out to be source - contents of register A.
5. CMP: Its functionality is exactly the same as SUB, the only difference being that the result in this case is not stored in any register.
3.6.5. FLAGS
The flags are part of the top level entity, but the values to be loaded in them are generated inside the arithmetic unit.
1. Carry: This is high only if there is a carry out and the instruction being executed is ADD or INC.
2. Sign: This is high only if carry out is low and the instruction being executed is SUB, CMP or DEC.
3. Zero: This is high if the result of the arithmetic unit is 0.
The signals q_S and q_C controlling the Sign and Carry flags are generated inside the control unit.
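The adder, the XOR-based complement and the flag rules of the arithmetic unit can be combined into one behavioural sketch. It is written in Python for illustration and is not the project's Verilog; the function name and argument order are assumptions, while the signal meanings follow the descriptions of I[8], Cin, Sub, q_C and q_S in this section.

```python
def arithmetic_unit(src, a, i8, cin, sub, q_c, q_s):
    """Return (Result_au, Z, C, S) for one arithmetic instruction."""
    ip2 = 0 if i8 else a          # I[8] = 1 selects 0 (INC/DEC), I[8] = 0 selects register A
    if sub:
        ip2 ^= 0xFF               # eight XOR gates: one's complement of the second input
    total = src + ip2 + cin       # 8-bit adder with carry-in
    result = total & 0xFF
    cout = total >> 8
    z = int(result == 0)                  # Zero: result is 0
    c = int(bool(q_c and cout))           # Carry: carry out during ADD or INC
    s = int(bool(q_s and not cout))       # Sign: no carry out during SUB, CMP or DEC
    return result, z, c, s


print(arithmetic_unit(5, 0, i8=1, cin=1, sub=0, q_c=1, q_s=0))      # INC 5     -> (6, 0, 0, 0)
print(arithmetic_unit(0, 0, i8=1, cin=0, sub=1, q_c=0, q_s=1))      # DEC 0     -> (255, 0, 0, 1)
print(arithmetic_unit(200, 100, i8=0, cin=0, sub=0, q_c=1, q_s=0))  # ADD       -> (44, 0, 1, 0)
print(arithmetic_unit(7, 7, i8=0, cin=1, sub=1, q_c=0, q_s=1))      # CMP equal -> (0, 1, 0, 0)
```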
3.7. PROGRAM COUNTER
This unit performs three instructions:
1. JMP immediate offset
2. JZ immediate offset
3. JMPCD
3.7.1. ARCHITECTURE
If the instruction is JMPCD, i.e. q14 is high, then the program counter will be loaded with the value stored in registers C and D. If q14 is low then there can be three cases:
1. The instruction is JMP.
2. The instruction is JZ and the Zero flag is set.
In both these cases the program counter will be loaded with a new value which is equal to the old value plus the 8-bit immediate offset specified in the instruction bits I[9:2].
3. If none of the above conditions is met, then the program counter will just be incremented by 1.
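The next-value rule for the program counter can be written down as a short behavioural sketch. This is a Python illustration with assumed names. It also assumes that the 8-bit offset is zero-extended before the 16-bit addition and that the counter wraps at 16 bits; the text does not state either point.

```python
def next_pc(pc, jmpcd, jmp, jz, zero, offset, c, d):
    """Return the next 16-bit program counter value."""
    if jmpcd:
        # JMPCD: load the concatenation of registers C (high byte) and D (low byte)
        return ((c & 0xFF) << 8) | (d & 0xFF)
    take_offset = jmp or (jz and zero)
    # Offset is treated as unsigned here (assumption); otherwise step by 1
    step = (offset & 0xFF) if take_offset else 1
    return (pc + step) & 0xFFFF


print(hex(next_pc(0x0010, False, True, False, False, 0x05, 0, 0)))    # JMP +5        -> 0x15
print(hex(next_pc(0x0010, False, False, True, False, 0x05, 0, 0)))    # JZ with Z = 0 -> 0x11
print(hex(next_pc(0x0010, True, False, False, False, 0, 0x12, 0x34))) # JMPCD         -> 0x1234
```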
[Figure: program counter architecture. Labels shown: q14, CD (16 bits), rst, clk, PROGRAM COUNTER, S4, I[9:2] (8 bits), 00000001, 16-BIT ADDER.]
3. JZ
5. INC
q[5] = I[11]'.I[10]'.I[9]'.I[8].I[7]'
6. DEC : 00_011_<3-Bit Destination>_<3-Bit Source>
7. SL
8. SR
9. CMP
10. ADD
11. SUB
12. LOAD
13. STORE
14. JMPCD : 00_000_1XX_XXX
q[14] = I[11]'.I[10]'.I[9]'.I[8]'.I[7]'.I[6]
(Here ' denotes the complement of a bit and . denotes AND.)
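The decoder turns the 11-bit instruction into one active line q[n]. The Python sketch below returns the index n for the instruction codes stated in this report, with I[1] as the least significant bit. The function name is an assumption, and so is the JZ prefix 11 (the only 2-bit prefix left free by MVI, JMP and the register formats). Patterns whose encoding is not shown in this section, such as LOAD and STORE, return None.

```python
def decode(instruction):
    """Return n such that q[n] is the active decoder line, or None if not covered."""
    prefix = (instruction >> 9) & 0b11    # I[11:10]
    opcode = (instruction >> 6) & 0b111   # I[9:7]
    dst = (instruction >> 3) & 0b111      # I[6:4]

    if prefix == 0b01:
        return 1                          # MVI
    if prefix == 0b10:
        return 2                          # JMP
    if prefix == 0b11:
        return 3                          # JZ (prefix assumed)

    two_operand = {0b001: 4, 0b010: 5, 0b011: 6, 0b100: 7, 0b101: 8}  # MOV INC DEC SL SR
    if opcode in two_operand:
        return two_operand[opcode]
    if opcode == 0b000:
        if dst & 0b100:
            return 14                     # JMPCD: 00_000_1XX_XXX
        if dst in (0b000, 0b001):
            return 9                      # CMP:   00_000_00X_<src>
        return 10 if dst == 0b010 else 11 # ADD / SUB
    return None                           # e.g. LOAD/STORE, encoding not shown here


print(decode(0b00000010011))  # 10 (ADD D)
print(decode(0b01101001110))  # 1  (MVI A7)
```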
- 46 -
1. q_C: This is the enabling signal for the carry flag. It is high only if the instruction being executed is ADD(q10) or INC(q5). q_C = q[5] + q[10]
2. q_S: This is the enabling signal for the sign flag. It is high only if the instruction being executed is SUB(q11) or DEC(q6) or CMP(q9). q_C = q[6] + q[9] + q[11]
3. Sub: As shown in the table in arithmetic unit, this signal is high in the case of DEC, CMP and SUB Sub = q[6] + q[9] + q[11]
4. Cin: As shown in the table in arithmetic unit, this signal is high in the case of INC, CMP and SUB Cin = q[5] + q[9] + q[11]
Signals to Program Counter 1. S4: This signal selects the immediate offset to be added to contents of the program counter. It is high if the instruction being executed is JMP or if the instruction begin executed is JZ and Zero flag is set at the same time S4 = q[2] + q[3].Z
Signals to Data Memory 1. wr_data: This signal goes high if the instruction being executed is STORE. wr_data = q[13]
2. rd_data: This signal goes high if the instruction being executed is LOAD. rd_data = q[12]
- 47 -
Signals to Top Level Entity
1. ld_flags: This is the load signal for the flags. It is high if the instruction being executed is an arithmetic instruction.
ld_flags = q[5] + q[6] + q[9] + q[10] + q[11]
2. S2: This signal selects either the destination or the source bits as the input to the destination decoder. It is high only if the instruction being executed is ADD or SUB, for which the destination is the same as the source.
S2 = q[10] + q[11]
3. Xout_buf: This signal is ANDed with the destination bit to generate the control signal for the Xout tristate buffer. It is high only if the instruction being executed involves a destination.
Xout_buf = q[4] + q[5] + q[6] + q[7] + q[8] + q[10] + q[11] + q[12]
4. En_dec: This signal is NORed with the control signal of the Xout tristate buffer to generate the enable signal for the Destination Decoder. It is high only if the instruction being executed doesn't involve any destination. So if either the control signal for Xout or En_dec goes high, the destination decoder is disabled.
En_dec = q[1] + q[2] + q[3] + q[9] + q[13] + q[14]
5. S1, S0: These are the select lines of the multiplexer that chooses which unit's result is placed on the data bus. Their value is
00 for Move Unit
01 for Arithmetic Unit
10 for Shift Unit
11 for LOAD Instruction
These signals are generated by a 4X2 encoder. The input to the encoder is E[3:0], where:
E[0] = q[1] + q[4]
E[1] = q[5] + q[6] + q[9] + q[10] + q[11]
E[2] = q[7] + q[8]
E[3] = q[12]
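The select-line encoding described above can be sketched as a small standalone module. This is only an illustration under stated assumptions: the module name `bus_select_encoder`, its port names and the test bench are hypothetical, and the Move Unit code 00 is also used as the fallback for any other value of E.

```verilog
// Hypothetical standalone version of the 4X2 encoder for S1, S0.
module bus_select_encoder(
    input  wire [3:0] E,   // E[0]=move, E[1]=arithmetic, E[2]=shift, E[3]=LOAD
    output reg        s1,
    output reg        s0
);
    always @* begin
        case (E)
            4'b0010: {s1, s0} = 2'b01; // Arithmetic Unit
            4'b0100: {s1, s0} = 2'b10; // Shift Unit
            4'b1000: {s1, s0} = 2'b11; // LOAD (data from memory)
            default: {s1, s0} = 2'b00; // Move Unit, also the fallback
        endcase
    end
endmodule

// Minimal self-checking test bench (hypothetical).
module bus_select_encoder_tb;
    reg  [3:0] E;
    wire       s1, s0;

    bus_select_encoder dut(.E(E), .s1(s1), .s0(s0));

    initial begin
        E = 4'b0001; #1; if ({s1, s0} !== 2'b00) $display("FAIL: move");
        E = 4'b0010; #1; if ({s1, s0} !== 2'b01) $display("FAIL: arithmetic");
        E = 4'b0100; #1; if ({s1, s0} !== 2'b10) $display("FAIL: shift");
        E = 4'b1000; #1; if ({s1, s0} !== 2'b11) $display("FAIL: load");
        $display("encoder check finished");
        $finish;
    end
endmodule
```

Because only one of the q lines is high for any instruction, at most one bit of E is set at a time, so a plain case statement is enough and no priority logic is needed.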
4. DESIGN IMPLEMENTATION
This chapter details the complete design flow for the FPGA implementation of the design. The target device is SPARTAN 3.
Readable code
Faster and simpler simulation
Portable code for migration to different device families
Typically, with larger hierarchical HDL designs, each module should be simulated separately before the entire design is tested. This makes the code easier to debug.
Once each module functions as expected, a test bench is created to verify that the entire design functions as planned. The same test bench is used again for the final timing simulation to confirm that the design functions as expected under worst-case delay conditions.
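As an illustration of module-level simulation, the following is a minimal self-checking test bench sketch. The unit under test, `adder8`, and every name in the example are hypothetical stand-ins and are not taken from the processor listing; the same pattern (apply stimulus, wait, compare against an expected value) would be applied to each module of the design.

```verilog
`timescale 1ns/1ps

// Hypothetical unit under test: an 8-bit adder with carry in and carry out.
module adder8(
    input  wire [7:0] a,
    input  wire [7:0] b,
    input  wire       cin,
    output wire [7:0] sum,
    output wire       cout
);
    assign {cout, sum} = a + b + cin;
endmodule

// Self-checking test bench: random stimulus compared with a 9-bit reference sum.
module adder8_tb;
    reg  [7:0] a, b;
    reg        cin;
    wire [7:0] sum;
    wire       cout;
    integer    i;
    integer    errors;

    adder8 dut(.a(a), .b(b), .cin(cin), .sum(sum), .cout(cout));

    initial begin
        errors = 0;
        for (i = 0; i < 256; i = i + 1) begin
            a   = $random;
            b   = $random;
            cin = $random;
            #10; // let the combinational outputs settle
            if ({cout, sum} !== ({1'b0, a} + b + cin)) begin
                errors = errors + 1;
                $display("Mismatch: a=%h b=%h cin=%b -> cout=%b sum=%h",
                         a, b, cin, cout, sum);
            end
        end
        if (errors == 0) $display("All vectors passed");
        else             $display("%0d vectors failed", errors);
        $finish;
    end
endmodule
```

Since the checks are written against the expected behaviour rather than against internal signals, the same test bench can be rerun on the post-PAR netlist for timing simulation.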
4.3. SYNTHESIS
After the HDL design has been created, it must be synthesized. During synthesis, behavioral information in the HDL file is translated into a structural netlist, and the design is optimized for a Xilinx device. Xilinx offers its own synthesis tool, Xilinx Synthesis Technology (XST), which synthesizes HDL designs to create Xilinx-specific netlist files called NGC files. The NGC file is a netlist that contains both logical design data and constraints, and it takes the place of both EDIF and NCF files.
FSM Encoding Algorithm
Case Implementation Style
FSM Style
RAM Extraction
RAM Style
Mux Style
Decoder Extraction
Priority Encoder Extraction
Shift Register Extraction
Logical Shifter Extraction
Macro Statistics
----------------
# Adders/Subtractors            : 2
  16-bit adder carry out        : 1
  8-bit adder carry in/out      : 1
# Registers                     : 9
  1-bit register                : 2
  11-bit register               : 1
  16-bit register               : 1
  8-bit register                : 5
# Multiplexers                  : 3
  1-bit 4-to-1 multiplexer      : 2
  8-bit 4-to-1 multiplexer      : 1
# Tristates                     : 3
  8-bit tristate buffer         : 3
# Xors                          : 1
  8-bit xor2                    : 1

Device utilization summary:
---------------------------
Selected Device : 3s200pq208-5
  Number of Slices              : 82   (4%)
  Number of Slice Flip Flops    : 77   (2%)
  Number of 4 input LUTs        : 146  (3%)
  Number of bonded IOBs         : 71   (50%)
  Number of GCLKs               : 1    (12%)

TIMING REPORT
-------------
  Minimum period: 10.599ns (Maximum Frequency: 94.347MHz)
  Minimum input arrival time before clock: 7.845ns
  Maximum output required time after clock: 10.277ns
  Maximum combinational path delay: 7.862ns
4.4. TRANSLATE
4.4.1. NGD Build Overview
NGD Build reads in a netlist file in EDIF or NGC format and creates an NGD file that contains a logical description of the design in terms of logic elements, such as AND gates, OR gates, decoders, flip-flops, and RAMs.
The NGD file contains both a logical description of the design reduced to Xilinx Native Generic Database (NGD) primitives and a description of the original hierarchy expressed in the input netlist. The output NGD file can be mapped to the desired device family.
3. Checks the design by running a Logical Design Rule Check (DRC) on the converted design. Logical DRC is a series of tests on a logical design.
4.5. MAP
The MAP program maps a logical design to a Xilinx FPGA. The input to MAP is an NGD file, which is generated using the NGD Build program. The NGD file contains a logical description of the design that includes both the hierarchical components used to develop the design and the lower level Xilinx primitives. The NGD file also contains any number of NMC (macro library) files, each of which contains the definition of a physical macro. MAP first performs a logical DRC (Design Rule Check) on the design in the NGD file. MAP then maps the design logic to the components (logic cells, I/O cells, and other components) in the target Xilinx FPGA. The output from MAP is an NCD (Native Circuit Description) file, a physical representation of the design mapped to the components in the targeted Xilinx FPGA. The mapped NCD file can then be placed and routed using the PAR program.
NGD file: Native Generic Database file. This file contains a logical description of the design expressed both in terms of the hierarchy used when the design was first created and in terms of lower-level Xilinx primitives to which the hierarchy resolves. The file also contains all of the constraints applied to the design during design entry or entered in a UCF (User Constraints File). The NGD file is created by the NGD Build program.
NMC file: Macro library file. An NMC file contains the definition of a physical macro. When there are macro instances in the NGD design file, NMC files are used to define the macro instances. There is one NMC file for each type of macro in the design file.
Guide NCD file: An optional input file generated from a previous MAP run. An NCD file contains a physical description of the design in terms of the components in the target Xilinx device. A guide NCD file is an output NCD file from a previous MAP run that is used as an input to guide a later MAP run.
Guide NGM file: A binary design file containing all of the data in the input NGD file as well as information on the physical design produced by the mapping.
NCD (Native Circuit Description) file: a physical description of the design in terms of the components in the target Xilinx device.
PCF (Physical Constraints File): an ASCII text file that contains constraints specified during design entry, expressed in terms of physical elements. The physical constraints in the PCF are expressed in Xilinx's constraint language. MAP creates a PCF file if one does not exist or rewrites an existing file.
NGM file: a binary design file that contains all of the data in the input NGD file as well as information on the physical design produced by mapping. The NGM file is used to correlate the back-annotated design netlist to the structure and naming of the source design.
MRP (MAP report): a file that contains information about the MAP run. The MRP file lists any errors and warnings found in the design, lists the design attributes specified, and details how the design was mapped (for example, the logic that was removed or added and how signals and symbols in the logical design were mapped into signals and components in the physical design). The file also supplies statistics about component usage in the mapped design.
Section 3 - Informational
Section 4 - Removed Logic Summary
Section 5 - Removed Logic
Section 6 - IOB Properties
Section 7 - RPMs
Section 8 - Guide Report
Section 9 - Area Group Summary
Section 10 - Modular Design Summary
Section 11 - Timing Report
Section 12 - Configuration String Information
Section 13 - Additional Device Resource Counts
---------------+------------+------------+
               |  Setup to  |  Hold to   |
Source         | clk (edge) | clk (edge) |
---------------+------------+------------+
data_in_out<0> |    1.356(R)|    0.134(R)|
data_in_out<1> |    1.305(R)|    0.134(R)|
data_in_out<2> |    1.356(R)|    0.134(R)|
data_in_out<3> |    1.305(R)|    0.134(R)|
data_in_out<4> |    1.356(R)|    0.134(R)|
data_in_out<5> |    1.305(R)|    0.134(R)|
data_in_out<6> |    1.356(R)|    0.134(R)|
data_in_out<7> |    1.305(R)|    0.134(R)|
ir_in<10>      |    3.202(R)|    0.643(R)|
ir_in<11>      |    3.202(R)|   -1.117(R)|
ir_in<1>       |    3.202(R)|   -1.117(R)|
ir_in<2>       |    3.202(R)|   -1.117(R)|
ir_in<3>       |    3.202(R)|   -1.117(R)|
ir_in<4>       |    3.202(R)|   -1.117(R)|
ir_in<5>       |    3.202(R)|   -1.117(R)|
ir_in<6>       |    3.202(R)|   -1.117(R)|
ir_in<7>       |    3.202(R)|    0.643(R)|
ir_in<8>       |    3.202(R)|    0.643(R)|
ir_in<9>       |    3.202(R)|   -1.117(R)|
xin<0>         |    4.237(R)|   -0.832(R)|
xin<1>         |    4.247(R)|   -0.349(R)|
xin<2>         |    4.026(R)|   -0.832(R)|
xin<3>         |    4.036(R)|   -0.832(R)|
xin<4>         |    3.815(R)|   -0.832(R)|
xin<5>         |    3.825(R)|   -0.832(R)|
xin<6>         |    3.380(R)|   -0.349(R)|
xin<7>         |    3.004(R)|   -0.832(R)|
---------------+------------+------------+

Clock clk to Pad
---------------+------------+
               | clk (edge) |
Destination    |   to PAD   |
---------------+------------+
addr_data<0>   |    6.407(R)|
addr_data<10>  |    6.407(R)|
addr_data<11>  |    6.407(R)|
addr_data<12>  |    6.407(R)|
addr_data<13>  |    6.407(R)|
addr_data<14>  |    6.407(R)|
addr_data<15>  |    6.407(R)|
addr_data<1>   |    6.407(R)|
addr_data<2>   |    6.407(R)|
addr_data<3>   |    6.407(R)|
addr_data<4>   |    6.407(R)|
addr_data<5>   |    6.407(R)|
addr_data<6>   |    6.407(R)|
addr_data<7>   |    6.407(R)|
addr_data<8>   |    6.407(R)|
addr_data<9>   |    6.407(R)|
addr_pc<0>     |    6.407(R)|
addr_pc<10>    |    6.407(R)|
addr_pc<11>    |    6.407(R)|
addr_pc<12>    |    6.407(R)|
addr_pc<13>    |    6.407(R)|
addr_pc<14>    |    6.407(R)|
addr_pc<15>    |    6.407(R)|
addr_pc<1>     |    6.407(R)|
addr_pc<2>     |    6.407(R)|
addr_pc<3>     |    6.407(R)|
addr_pc<4>     |    6.407(R)|
addr_pc<5>     |    6.407(R)|
addr_pc<6>     |    6.407(R)|
addr_pc<7>     |    6.407(R)|
addr_pc<8>     |    6.407(R)|
addr_pc<9>     |    6.407(R)|
data_in_out<0> |    7.565(R)|
data_in_out<1> |    7.565(R)|
data_in_out<2> |    7.565(R)|
data_in_out<3> |    7.565(R)|
data_in_out<4> |    7.565(R)|
data_in_out<5> |    7.565(R)|
data_in_out<6> |    7.565(R)|
data_in_out<7> |    7.565(R)|
rd_data        |    7.164(R)|
wr_data        |    7.164(R)|
xout<0>        |    6.618(R)|
xout<1>        |    6.618(R)|
xout<2>        |    6.618(R)|
xout<3>        |    6.618(R)|
xout<4>        |    6.618(R)|
xout<5>        |    6.618(R)|
xout<6>        |    6.618(R)|
xout<7>        |    6.618(R)|
---------------+------------+

Pad to Pad
---------------+----------------+---------+
Source Pad     |Destination Pad |  Delay  |
---------------+----------------+---------+
xin<0>         |data_in_out<0>  |    6.159|
xin<1>         |data_in_out<1>  |    6.159|
xin<2>         |data_in_out<2>  |    6.159|
xin<3>         |data_in_out<3>  |    6.159|
xin<4>         |data_in_out<4>  |    6.159|
xin<5>         |data_in_out<5>  |    6.159|
xin<6>         |data_in_out<6>  |    6.159|
xin<7>         |data_in_out<7>  |    6.159|
---------------+----------------+---------+

Analysis completed Tue May 30 13:11:29 2006
Timing-driven: The Xilinx timing analysis software enables PAR to place and route a design based upon timing constraints.
Non timing-driven (cost-based): Placement and routing are performed using various cost tables that assign weighted values to relevant factors such as constraints, length of connection, and available routing resources. Non timing-driven placement and routing is used if no timing constraints are present.
4.6.2 PLACING
The PAR placer executes multiple placement phases. PAR writes the NCD file after all the placer phases are complete. During placement, PAR places components into sites based on factors such as constraints specified in the PCF file, the length of connections, and the available routing resources.
4.6.3. ROUTING
After placing the design, PAR executes multiple phases of the router. The router performs a converging procedure for a solution that routes the design to completion and meets timing constraints. Once the design is fully routed, PAR writes an NCD file, which can be analyzed against timing. PAR writes a new NCD file as the routing improves throughout the router phases. Note: Timing-driven placement and timing-driven routing are automatically invoked if PAR finds timing constraints in the physical constraints file.
---------------+------------+------------+
               |  Setup to  |  Hold to   |
Source         | clk (edge) | clk (edge) |
---------------+------------+------------+
data_in_out<0> |    3.411(R)|    0.111(R)|
data_in_out<1> |    3.599(R)|   -0.001(R)|
data_in_out<2> |    3.271(R)|    0.056(R)|
data_in_out<3> |    3.497(R)|   -0.016(R)|
data_in_out<4> |    3.551(R)|    0.085(R)|
data_in_out<5> |    4.543(R)|   -0.583(R)|
data_in_out<6> |    3.925(R)|   -0.144(R)|
data_in_out<7> |    3.065(R)|   -0.077(R)|
ir_in<10>      |    2.622(R)|    0.534(R)|
ir_in<11>      |    2.623(R)|   -0.401(R)|
ir_in<1>       |    2.623(R)|   -0.400(R)|
ir_in<2>       |    2.622(R)|   -0.400(R)|
ir_in<3>       |    2.623(R)|   -0.400(R)|
ir_in<4>       |    2.623(R)|   -0.401(R)|
ir_in<5>       |    2.623(R)|   -0.400(R)|
ir_in<6>       |    2.622(R)|   -0.400(R)|
ir_in<7>       |    2.622(R)|    0.753(R)|
ir_in<8>       |    2.623(R)|    0.394(R)|
ir_in<9>       |    2.623(R)|   -0.401(R)|
xin<0>         |    8.194(R)|   -2.109(R)|
xin<1>         |    7.703(R)|   -1.613(R)|
xin<2>         |    7.778(R)|   -1.795(R)|
xin<3>         |    8.995(R)|   -1.671(R)|
xin<4>         |    7.548(R)|   -1.576(R)|
xin<5>         |    8.228(R)|   -1.709(R)|
xin<6>         |    7.885(R)|   -1.298(R)|
xin<7>         |    6.916(R)|   -2.378(R)|
---------------+------------+------------+

Clock clk to Pad
---------------+------------+
               | clk (edge) |
Destination    |   to PAD   |
---------------+------------+
addr_data<0>   |    9.442(R)|
addr_data<10>  |    9.144(R)|
addr_data<11>  |    9.149(R)|
addr_data<12>  |    8.825(R)|
addr_data<13>  |    8.607(R)|
addr_data<14>  |    8.525(R)|
addr_data<15>  |    8.784(R)|
addr_data<1>   |    8.424(R)|
addr_data<2>   |    8.638(R)|
addr_data<3>   |    9.100(R)|
addr_data<4>   |    8.361(R)|
addr_data<5>   |    8.380(R)|
addr_data<6>   |    8.640(R)|
addr_data<7>   |    9.172(R)|
addr_data<8>   |    8.907(R)|
addr_data<9>   |    8.852(R)|
addr_pc<0>     |    9.099(R)|
addr_pc<10>    |    8.084(R)|
addr_pc<11>    |    8.290(R)|
addr_pc<12>    |    8.528(R)|
addr_pc<13>    |    8.178(R)|
addr_pc<14>    |    8.735(R)|
addr_pc<15>    |    8.824(R)|
addr_pc<1>     |    9.076(R)|
addr_pc<2>     |    9.333(R)|
addr_pc<3>     |    9.067(R)|
addr_pc<4>     |   11.008(R)|
addr_pc<5>     |    8.909(R)|
addr_pc<6>     |    9.681(R)|
addr_pc<7>     |    9.785(R)|
addr_pc<8>     |    8.384(R)|
addr_pc<9>     |    9.393(R)|
data_in_out<0> |   12.164(R)|
data_in_out<1> |   13.308(R)|
data_in_out<2> |   12.626(R)|
data_in_out<3> |   12.626(R)|
data_in_out<4> |   12.632(R)|
data_in_out<5> |   14.403(R)|
data_in_out<6> |   12.408(R)|
data_in_out<7> |   14.370(R)|
rd_data        |   12.080(R)|
wr_data        |   12.561(R)|
xout<0>        |    9.245(R)|
xout<1>        |    9.249(R)|
xout<2>        |    9.616(R)|
xout<3>        |    8.549(R)|
xout<4>        |    9.300(R)|
xout<5>        |    9.623(R)|
xout<6>        |    9.578(R)|
xout<7>        |    9.265(R)|
---------------+------------+

Pad to Pad
---------------+----------------+---------+
Source Pad     |Destination Pad |  Delay  |
---------------+----------------+---------+
xin<0>         |data_in_out<0>  |    9.217|
xin<1>         |data_in_out<1>  |    8.800|
xin<2>         |data_in_out<2>  |    9.069|
xin<3>         |data_in_out<3>  |    8.827|
xin<4>         |data_in_out<4>  |    8.639|
xin<5>         |data_in_out<5>  |    9.744|
xin<6>         |data_in_out<6>  |    9.310|
xin<7>         |data_in_out<7>  |   10.122|
---------------+----------------+---------+

Analysis completed Tue May 30 13:15:43 2006
The final bit file was downloaded into the FPGA device, and real-time verification was carried out.
5. CONCLUSION
The design was successfully implemented on the target device. It was verified by both functional and post-PAR simulation.
2. The number of pipeline stages can be increased from the current 2 to 4 or 5. The first pipeline stage is Read-Fetch-Execute and the second pipeline stage is Write. Dividing the first stage further into three stages would shorten the logic in each stage and so would also improve the maximum operating frequency considerably.
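// The module header and input declarations are missing from this listing.
// The three lines below are inferred from the positional instantiation
// "arithmetic_unit au1(...)" in the top-level module.
module arithmetic_unit(A, src, I_8, cin, sub, q_c, q_s, result, S, C, Z);
input [7:0] A, src;
input I_8, cin, sub, q_c, q_s;
output [7:0] result;
output S, C, Z;
wire [7:0] in2, in2_final;
wire cout;

assign in2 = I_8 ? 8'b0 : A;         // INC/DEC (I[8] = 1) use 0 as the second operand
assign in2_final = in2 ^ {8{sub}};   // one's complement of the operand when sub = 1
full_adder_8bit a1(.in1(src), .in2(in2_final), .cin(cin), .cout(cout), .sum(result));
assign C = q_c && cout;      //CARRY FLAG
assign S = q_s && (!cout);   //SIGN FLAG
assign Z = (!result[7]) & (!result[6]) & (!result[5]) & (!result[4]) &
           (!result[3]) & (!result[2]) & (!result[1]) & (!result[0]);   //ZERO FLAG
endmodule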
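The `full_adder_8bit` module instantiated in the arithmetic unit does not appear in this listing. A minimal sketch of what it could look like is given below, assuming only the port names used in the instantiation (`in1`, `in2`, `cin`, `cout`, `sum`); the behavioural implementation and the small test bench are assumptions made for illustration, not the original code.

```verilog
// Assumed behavioural implementation of the adder instantiated as "a1".
module full_adder_8bit(in1, in2, cin, cout, sum);
    input  [7:0] in1, in2;
    input        cin;
    output       cout;
    output [7:0] sum;

    // The 9-bit left-hand side keeps the carry out of bit 7.
    assign {cout, sum} = in1 + in2 + cin;
endmodule

// Hypothetical test bench showing how sub and cin turn the adder into a subtractor.
module full_adder_8bit_tb;
    reg  [7:0] src, a;
    reg        sub, cin;
    wire [7:0] sum;
    wire       cout;

    // Same operand conditioning as in the arithmetic unit: XOR with {8{sub}}.
    full_adder_8bit dut(.in1(src), .in2(a ^ {8{sub}}), .cin(cin),
                        .cout(cout), .sum(sum));

    initial begin
        // DEC: second operand 0, sub = 1, cin = 0 -> src + 8'hFF = src - 1
        src = 8'd5; a = 8'd0; sub = 1'b1; cin = 1'b0; #1;
        $display("DEC 5 : sum=%0d cout=%b", sum, cout);
        // SUB: sub = 1, cin = 1 -> src + ~a + 1 = src - a
        src = 8'd3; a = 8'd5; sub = 1'b1; cin = 1'b1; #1;
        $display("3 - 5 : sum=%h cout=%b", sum, cout);
        $finish;
    end
endmodule
```

With these inputs, decrementing 5 gives 4 with cout = 1, while 3 - 5 gives 8'hFE with cout = 0. The second case is the one in which the sign flag equation S = q_s && (!cout) goes high.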
/* ~~~~PROGRAM COUNTER~~~~ */
module program_counter(ld_pc, rst, clk, c, d, I_2_9, s4, q14, PC);
input ld_pc;
input rst;
input clk;
input [7:0] c;
input [7:0] d;
input [7:0] I_2_9;
input s4, q14;
output reg [15:0] PC;
wire cin = 1'b0;
wire [15:0] in2, adder_out, pc_in;
wire [7:0] in2_half;

assign in2_half = s4 ? I_2_9 : 8'b0000_0001;   // jump offset, or +1 for the next instruction
assign in2 = {8'b0000_0000, in2_half};         // zero-extend to 16 bits
assign pc_in = q14 ? {c,d} : adder_out;        // JMPCD loads the PC from {C, D}
adder_16 a16 (.in1(PC), .in2(in2), .out(adder_out), .cin(cin));

always @(posedge clk, posedge rst)
begin
    if (rst) PC <= 16'b0;            // 16-bit reset value; nonblocking assignments for clocked logic
    else if (ld_pc) PC <= pc_in;
end
endmodule
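The `adder_16` module used by the program counter is likewise not shown in this listing. The sketch below assumes only the port names from the instantiation (`in1`, `in2`, `cin`, `out`); the implementation and the test bench values are illustrative assumptions.

```verilog
// Assumed behavioural implementation of the 16-bit adder instantiated as "a16".
module adder_16(in1, in2, cin, out);
    input  [15:0] in1, in2;
    input         cin;
    output [15:0] out;

    assign out = in1 + in2 + cin;   // the carry out of bit 15 is discarded
endmodule

// Hypothetical test bench: sequential increment and a jump with an 8-bit offset.
module adder_16_tb;
    reg  [15:0] pc;
    reg  [7:0]  offset;
    wire [15:0] next;

    // The offset is zero-extended, as in the program counter listing.
    adder_16 dut(.in1(pc), .in2({8'b0, offset}), .cin(1'b0), .out(next));

    initial begin
        pc = 16'h00FF; offset = 8'h01; #1;
        $display("next PC = %h", next);   // 00FF + 01 = 0100
        pc = 16'h0010; offset = 8'h20; #1;
        $display("next PC = %h", next);   // 0010 + 20 = 0030
        $finish;
    end
endmodule
```

Because the program counter zero-extends the 8-bit offset before the addition, JMP and JZ in this design can only move forward from the current address; a backward jump would have to use JMPCD.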
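// The module header and port declarations are missing from this listing.
// The lines below are inferred from the instantiation
// "instruction_register ir1(clk, rst, ld_ir, ir_in, I)" in the top-level module.
module instruction_register(clk, rst, ld_ir, ir_in, I);
input clk, rst, ld_ir;
input [11:1] ir_in;
output reg [11:1] I;

always @(posedge clk, posedge rst)
begin
    if (rst) I <= 11'b0100_0000_000;   // reset value decodes as MVI with zero data
    else if (ld_ir) I <= ir_in;
end
endmodule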
/* ~~~~INSTRUCTION DECODER~~~~ */
module instruction_decoder(I_4_11, q);
input [11:4] I_4_11;
output [14:1] q;

assign q[1] = (!I_4_11[11]) & I_4_11[10]; //MVI
assign q[2] = I_4_11[11] & (!I_4_11[10]); //JMP offset
assign q[3] = I_4_11[11] & I_4_11[10]; //JZ
assign q[4] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & I_4_11[7]; //MOV
assign q[5] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & I_4_11[8] & (!I_4_11[7]); //INC
assign q[6] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & I_4_11[8] & I_4_11[7]; //DEC
assign q[7] = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9] & (!I_4_11[8]) & (!I_4_11[7]); //SL
assign q[8] = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9] & (!I_4_11[8]) & I_4_11[7]; //SR
assign q[9] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & (!I_4_11[7]) &
              (!I_4_11[6]) & (!I_4_11[5]); //CMP
assign q[10] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & (!I_4_11[7]) &
               (!I_4_11[6]) & I_4_11[5] & (!I_4_11[4]); //ADD
assign q[11] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & (!I_4_11[7]) &
               (!I_4_11[6]) & I_4_11[5] & I_4_11[4]; //SUB
assign q[12] = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9] & I_4_11[8] & (!I_4_11[7]); //LOAD
assign q[13] = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9] & I_4_11[8] & I_4_11[7]; //STORE
assign q[14] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & (!I_4_11[7]) &
               I_4_11[6]; //JMPCD
endmodule
module control_unit(q, z, q_c, q_s, s0, s1, s2, wr_data, rd_data, ld_pc, ld_ir, en_dec, sub, cin, s4, xout_buf, ld_flags);
input [14:1] q;
input z;
output reg q_c, q_s;
output reg s0, s1, s2, s4;
output wr_data, rd_data;
output ld_pc, ld_ir;
output reg ld_flags;
output reg en_dec;
output reg sub, cin;
output reg xout_buf;
reg [3:0] E;

assign ld_pc=1'b1;
assign ld_ir=1'b1;
assign rd_data=q[12];
assign wr_data=q[13];

always @ *
begin
    // Inputs of the 4X2 encoder that drives the data bus select lines
    E[0]=q[1] | q[4];
    E[1]=q[5] | q[6] | q[9] | q[10] | q[11];
    E[2]=q[7] | q[8];
    E[3]=q[12];
    case (E)
        4'b0010: begin s0=1'b1; s1=1'b0; end   // Arithmetic Unit
        4'b0100: begin s0=1'b0; s1=1'b1; end   // Shift Unit
        4'b1000: begin s0=1'b1; s1=1'b1; end   // LOAD
        default: begin s0=1'b0; s1=1'b0; end   // Move Unit
    endcase
    // Right-hand sides below follow the signal equations given in the control unit description
    sub=q[6] | q[9] | q[11];
    cin=q[5] | q[9] | q[11];
    q_c=q[5] | q[10];
    q_s=q[6] | q[9] | q[11];
    ld_flags = q[5] | q[6] | q[9] | q[10] | q[11];
    s4=q[2] | (q[3] && z);
    s2=q[10] | q[11];
    en_dec=q[1] | q[2] | q[3] | q[9] | q[13] | q[14];
    xout_buf=q[4] | q[5] | q[6] | q[7] | q[8] | q[10] | q[11] | q[12];
end
endmodule
reg Sign, Carry, Zero;
reg [7:0] A_reg, B_reg, C_reg, D_reg, X_reg;
reg X_buf;
reg [7:0] data_bus;
reg ld_a_temp, ld_B, ld_C, ld_D;
wire cin, sub, q_C, q_S, s, c, z;
wire s0, s1, s2, s4;
wire xout_buf, en_dec;
wire ld_A, ld_ir, ld_pc, ld_flags;
wire [11:1] I;
wire [14:1] q;
wire [7:0] result_au, result_su, result_mu;
wire [7:0] src;
wire [7:0] data_in;

arithmetic_unit au1(A_reg, src, I[8], cin, sub, q_C, q_S, result_au, s, c, z);
control_unit cu1(q, Zero, q_C, q_S, s0, s1, s2, wr_data, rd_data, ld_pc, ld_ir, en_dec, sub, cin, s4, xout_buf, ld_flags);
instruction_decoder id1(I[11:4], q);
instruction_register ir1(clk, rst, ld_ir, ir_in, I);
move_unit mu1(I[10], src, I[9:2], result_mu);
program_counter pc1(ld_pc, rst, clk, C_reg, D_reg, I[9:2], s4, q[14], addr_pc);
shift_unit su1(src, I[7], result_su);

assign xout = X_buf ? X_reg : 8'bz;
assign ld_A = ld_a_temp || q[1];
assign addr_data = {C_reg, D_reg};
//SRC Multiplexer
assign src = I[3] ? xin : (I[2] ? (I[1] ? D_reg : C_reg) : (I[1] ? B_reg : A_reg));
assign data_in = rd_data ? data_in_out : 8'bz;
assign data_in_out = wr_data ? src : 8'bz;

always @ (posedge clk, posedge rst)
begin
    if (rst)
    begin
        A_reg<=8'b0; B_reg<=8'b0; C_reg<=8'b0; D_reg<=8'b0; X_reg<=8'b0;
        X_buf<=1'b0;
        Sign<=1'b0; Carry<=1'b0; Zero<=1'b0;
    end
    else
    begin
        X_reg <= data_bus;
        X_buf <= xout_buf & (s2 ? I[3] : I[6]);
        if (ld_flags)
        begin
            Carry<=c; Zero<=z; Sign<=s;
        end
        if (ld_A) A_reg <= data_bus;
        if (ld_B) B_reg <= data_bus;
        // The listing breaks off here at a page boundary. The C and D loads
        // below are assumed by analogy with the A and B loads above.
        if (ld_C) C_reg <= data_bus;
        if (ld_D) D_reg <= data_bus;
    end
end

always @ *
begin
    // Destination Decoder
    if (!((xout_buf & (s2 ? I[3] : I[6])) || en_dec))
    begin
        case (s2 ? I[2:1] : I[5:4])
            2'b00: begin ld_a_temp =1'b1; ld_B = 1'b0; ld_C = 1'b0; ld_D=1'b0; end
            2'b01: begin ld_a_temp =1'b0; ld_B = 1'b1; ld_C = 1'b0; ld_D=1'b0; end
            2'b10: begin ld_a_temp =1'b0; ld_B = 1'b0; ld_C = 1'b1; ld_D=1'b0; end
            2'b11: begin ld_a_temp =1'b0; ld_B = 1'b0; ld_C = 1'b0; ld_D=1'b1; end
        endcase
    end
    else
    begin
        ld_a_temp =1'b0; ld_B = 1'b0; ld_C = 1'b0; ld_D=1'b0;
    end
    // Data bus multiplexer
    case ({s1, s0})
        2'b01: data_bus = result_au;
        2'b10: data_bus = result_su;
        2'b11: data_bus = data_in;
        default: data_bus = result_mu;
    endcase
end
endmodule
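The `move_unit` and `shift_unit` modules instantiated in the top-level listing are not reproduced in this report. The sketch below shows one way they could be written, assuming the positional port order of the instantiations `mu1(I[10], src, I[9:2], result_mu)` and `su1(src, I[7], result_su)`. The port names, the single-bit logical shift and the zero fill are all assumptions made for illustration.

```verilog
// Assumed move unit: I[10] is 1 for MVI (immediate) and 0 for MOV (register source).
module move_unit(sel_imm, src, imm, result);
    input        sel_imm;
    input  [7:0] src, imm;
    output [7:0] result;

    assign result = sel_imm ? imm : src;
endmodule

// Assumed shift unit: I[7] is 0 for SL and 1 for SR; one-bit logical shift, zero fill.
module shift_unit(in, right, result);
    input  [7:0] in;
    input        right;
    output [7:0] result;

    assign result = right ? (in >> 1) : (in << 1);
endmodule

// Hypothetical test bench exercising both sketches.
module move_shift_tb;
    reg        sel_imm, right;
    reg  [7:0] src, imm;
    wire [7:0] moved, shifted;

    move_unit  mu(sel_imm, src, imm, moved);
    shift_unit su(src, right, shifted);

    initial begin
        src = 8'b1000_0001; imm = 8'h3C;
        sel_imm = 1'b0; right = 1'b0; #1;
        $display("MOV=%h SL=%b", moved, shifted);   // MOV=81 SL=00000010
        sel_imm = 1'b1; right = 1'b1; #1;
        $display("MVI=%h SR=%b", moved, shifted);   // MVI=3c SR=01000000
        $finish;
    end
endmodule
```

The select inputs match the decoder equations: q[1] (MVI) requires I[10] = 1 and q[4] (MOV) requires I[10] = 0, while q[7] (SL) and q[8] (SR) differ only in I[7].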
DEC A, A : 11'H0C0
DEC A, B : 11'H0C1
DEC A, C : 11'H0C2
DEC A, D : 11'H0C3
DEC A, X : 11'H0C4
DEC B, A : 11'H0C8
DEC B, B : 11'H0C9
DEC B, C : 11'H0CA
DEC B, D : 11'H0CB
DEC B, X : 11'H0CC
DEC C, A : 11'H0D0
DEC C, B : 11'H0D1
DEC C, C : 11'H0D2
DEC C, D : 11'H0D3
DEC C, X : 11'H0D4
DEC D, A : 11'H0D8
DEC D, B : 11'H0D9
DEC D, C : 11'H0DA
DEC D, D : 11'H0DB
DEC D, X : 11'H0DC
DEC X, A : 11'H0E0
DEC X, B : 11'H0E1
DEC X, C : 11'H0E2
DEC X, D : 11'H0E3
DEC X, X : 11'H0E4
INC A, A : 11'H080
INC A, B : 11'H081
INC A, C : 11'H082
INC A, D : 11'H083
INC A, X : 11'H084
INC B, A : 11'H088
INC B, B : 11'H089
INC B, C : 11'H08A
INC B, D : 11'H08B
INC B, X : 11'H08C
INC C, A : 11'H090
INC C, B : 11'H091
INC C, C : 11'H092
INC C, D : 11'H093
INC C, X : 11'H094
INC D, A : 11'H098
INC D, B : 11'H099
INC D, C : 11'H09A
INC D, D : 11'H09B
INC D, X : 11'H09C
INC X, A : 11'H0A0
INC X, B : 11'H0A1
INC X, C : 11'H0A2
INC X, D : 11'H0A3
INC X, X : 11'H0A4
JMP      : [2'B10, <8-Bit Data>, 1'b0]
JMPCD    : 11'H020
JZ       : [2'B11, <8-Bit Data>, 1'b0]
LOAD A   : 11'H180
LOAD B   : 11'H188
LOAD C   : 11'H190
LOAD D   : 11'H198
LOAD X   : 11'H1A0
MOV A, A : 11'H040
MOV A, B : 11'H041
MOV A, C : 11'H042
MOV A, D : 11'H043
MOV A, X : 11'H044
MOV B, A : 11'H048
MOV B, B : 11'H049
MOV B, C : 11'H04A
MOV B, D : 11'H04B
MOV B, X : 11'H04C
MOV C, A : 11'H050
MOV C, B : 11'H051
MOV C, C : 11'H052
MOV C, D : 11'H053
MOV C, X : 11'H054
MOV D, A : 11'H058
MOV D, B : 11'H059
MOV D, C : 11'H05A
MOV D, D : 11'H05B
MOV D, X : 11'H05C
MOV X, A : 11'H060
MOV X, B : 11'H061
MOV X, C : 11'H062
MOV X, D : 11'H063
MOV X, X : 11'H064
MVI      : [2'B01, <8-Bit Data>, 1'b0]
SL A, A  : 11'H100
SL A, B  : 11'H101
SL A, C  : 11'H102
SL A, D  : 11'H103
SL A, X  : 11'H104
SL B, A  : 11'H108
SL B, B  : 11'H109
SL B, C  : 11'H10A
SL B, D  : 11'H10B
SL B, X  : 11'H10C
SL C, A  : 11'H110
SL C, B  : 11'H111
SL C, C  : 11'H112
SL C, D  : 11'H113
SL C, X  : 11'H114
SL D, A  : 11'H118
SL D, B  : 11'H119
SL D, C  : 11'H11A
SL D, D  : 11'H11B
SL D, X  : 11'H11C
SL X, A  : 11'H120
SL X, B  : 11'H121
SL X, C  : 11'H122
SL X, D  : 11'H123
SL X, X  : 11'H124
SR A, A  : 11'H140
SR A, B  : 11'H141
SR A, C  : 11'H142
SR A, D  : 11'H143
SR A, X  : 11'H144
SR B, A  : 11'H148
SR B, B  : 11'H149
SR B, C  : 11'H14A
SR B, D  : 11'H14B
SR B, X  : 11'H14C
SR C, A  : 11'H150
SR C, B  : 11'H151
SR C, C  : 11'H152
SR C, D  : 11'H153
SR C, X  : 11'H154
SR D, A  : 11'H158
SR D, B  : 11'H159
SR D, C  : 11'H15A
SR D, D  : 11'H15B
SR D, X  : 11'H15C
SR X, A  : 11'H160
SR X, B  : 11'H161
SR X, C  : 11'H162
SR X, D  : 11'H163
SR X, X  : 11'H164
STORE A  : 11'H1C0
STORE B  : 11'H1C1
STORE C  : 11'H1C2
STORE D  : 11'H1C3
STORE X  : 11'H1C4
SUB A    : 11'H018
SUB B    : 11'H019
SUB C    : 11'H01A
SUB D    : 11'H01B
SUB X    : 11'H01C