CS M152B
DMY
Overview
Pipelining
It acts like an assembly line
[Figure: items 1 to 4 moving through Stations 1 to 4 along a time axis, shown without and with overlapping (pipelined) execution]
Pipelined RISC
RISC is an acronym for Reduced Instruction Set Computer
It has a reduced and simple instruction set
It has a large number of general-purpose registers
[Figure: pipelined datapath with PC, adder (+), Control Unit, and Memory, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB; one slide labels the Execution Stage]
16-bit ISA
16-bit fixed-length instructions, 16 registers
no funct field for R-type, only op field
limited number of operations
4-bit opcode field => maximum 16 operations
Instruction formats (field widths in bits):
Suggested R-type   opcode (3) | rs (3) | rt (3) | rd (3) | funct (4)
R-type             opcode (4) | rs (4) | rt (4) | rd (4)
I-type             opcode (4) | rs (4) | rt (4) | address (4)
J-type             opcode (4) | target address (12)
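As a rough illustration of the formats with 4-bit fields, the Python sketch below packs and unpacks 16-bit instruction words. The function names, the sample opcode value, and the choice to place the opcode in the most significant bits are assumptions made for the example; the slides do not give opcode assignments or bit ordering.

```python
def encode_r(opcode, rs, rt, rd):
    """Pack an R-type word: opcode | rs | rt | rd, 4 bits each (assumed order)."""
    for field in (opcode, rs, rt, rd):
        if not 0 <= field <= 0xF:
            raise ValueError("each field must fit in 4 bits")
    return (opcode << 12) | (rs << 8) | (rt << 4) | rd


def encode_j(opcode, target):
    """Pack a J-type word: 4-bit opcode | 12-bit target address."""
    if not 0 <= opcode <= 0xF or not 0 <= target <= 0xFFF:
        raise ValueError("field out of range")
    return (opcode << 12) | target


def decode_fields(word):
    """Split a 16-bit word into four 4-bit fields, most significant first."""
    return ((word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF)


# Hypothetical opcode 0x2 with rs=1, rt=2, rd=3
word = encode_r(0x2, 1, 2, 3)
print(hex(word))            # 0x2123
print(decode_fields(word))  # (2, 1, 2, 3)
```

With only four opcode bits, every new instruction consumes one of the 16 available encodings, which is the limitation the slide points out.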
Multiplier Algorithms
Pencil-and-paper method
       10101
  x     1011
  ----------
       10101
      101010
     0000000
  + 10101000
  ----------
    11100111
requires M cycles for one NxM multiplication
implemented with AND, adder, and shift register
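The pencil-and-paper method can be sketched in Python as a loop that handles one multiplier bit per iteration, standing in for one hardware cycle. The function name and parameters are hypothetical, and the sketch covers unsigned operands only.

```python
def shift_add_multiply(multiplicand, multiplier, m_bits):
    """Multiply two unsigned integers, one multiplier bit per cycle."""
    product = 0
    for cycle in range(m_bits):
        bit = (multiplier >> cycle) & 1
        # AND stage: keep the multiplicand if the bit is 1, otherwise zero
        partial = multiplicand if bit else 0
        # Shift the partial product to the bit's weight, then accumulate
        product += partial << cycle
    return product


print(bin(shift_add_multiply(0b10101, 0b1011, 4)))  # 0b11100111
```

The loop runs once per multiplier bit, which is why an NxM multiplication takes M cycles with this approach.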
Array Multiplier
Wallace Tree
increases speed of summing by increased parallelism
all bits of PP in each column are added independently and simultaneously
x-2 compressor composed of CSAs; x := the number of PPs in column
[Figure: 9-2 compressor for column j built from 3-2 compressors, with partial-product inputs (P3j to P8j shown), carry signals c1 to c6 passed between columns j-1 and j, and output Carry[j]]
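A minimal Python sketch of the idea follows. It is a simplification: instead of wiring a separate x-2 compressor per column as in the figure, it applies a 3-2 compressor (carry-save adder) bitwise to whole partial-product rows, layer by layer, until two rows remain. The function names are hypothetical.

```python
def csa(a, b, c):
    """3-2 compressor applied bitwise: three addends in, (sum, carry) out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def tree_multiply(x, y, n_bits=8):
    """Unsigned multiply: reduce partial products with layers of 3-2 compressors."""
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n_bits)]
    while len(rows) > 2:
        next_rows = []
        # Each group of three rows is compressed independently of the others
        for i in range(0, len(rows) - 2, 3):
            next_rows.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        # Rows left over (fewer than three) pass through to the next layer
        next_rows.extend(rows[len(rows) - len(rows) % 3:])
        rows = next_rows
    return sum(rows)  # final carry-propagate addition


print(tree_multiply(21, 11))  # 231
```

No carry ripples inside a layer, so the groups in one layer could all be evaluated at the same time in hardware; only the final addition of the last two rows needs carry propagation.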
Multiplier Design
limited opcode size
  made NOP instruction ADD $0, $0, $0 => freed one opcode
  ADD instruction doesn't change register $0 (constant zero value)
latency vs. simplicity
  multiplier lies in critical path; must calculate product in one cycle
  algorithms trade simplicity of control and/or wiring for faster speed
  multiplier latency not detrimental if n is small enough => 8x8 multiplier
negative and positive integer multiplication
  8 LSB of 16-bit operand taken as a two's complement number
  sign detection unit detects signs of operands and sets product sign
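The signed 8x8 scheme might look like the following in Python: take the 8 least significant bits of each 16-bit operand as a two's complement value, detect the signs, multiply the magnitudes, and set the sign of the product. The function names are hypothetical, and the 16-bit two's complement result format is an assumption.

```python
def to_signed8(value16):
    """Interpret the 8 LSBs of a 16-bit operand as a two's complement number."""
    low = value16 & 0xFF
    return low - 0x100 if low & 0x80 else low


def multiply_8x8(op_a, op_b):
    """Signed 8x8 multiply: sign detection plus an unsigned magnitude product."""
    a, b = to_signed8(op_a), to_signed8(op_b)
    negative = (a < 0) != (b < 0)   # sign detection: product is negative if signs differ
    magnitude = abs(a) * abs(b)     # unsigned multiplier core
    product = -magnitude if negative else magnitude
    return product & 0xFFFF         # assumed 16-bit two's complement result


print(hex(multiply_8x8(0x00FB, 0x0003)))  # 0xfff1, i.e. -5 * 3 = -15
```

The largest magnitude is 128 x 128 = 16384, so every product fits in 16 bits.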
Pipeline Modifications
EPC register tracks the problematic instruction
EPC_2 register to hold the instruction to return to, if allowed
Expansion of control unit to detect overflow signal and handle exception
[Figure: modified pipeline datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB; labeled elements include Control Unit, PC, Instruction Memory, Registers, ALU, Subrt Addr, Sign Exd, Memory, Data Input, Clk, the Overflow signal, and the EPC and EPC 2 registers]
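A toy Python model of the two exception registers is sketched below. Everything beyond the roles of EPC and EPC_2 is an assumption for illustration: the class and method names, the handler address, and the rule that the return address is the faulting address plus one.

```python
HANDLER_ADDR = 0x0F00  # hypothetical address of the overflow handler


class ExceptionRegisters:
    """Toy model of the EPC and EPC_2 registers."""

    def __init__(self):
        self.epc = None    # address of the instruction that overflowed
        self.epc_2 = None  # address to resume at, if the handler allows it

    def raise_overflow(self, faulting_pc):
        """Record state for an overflow and return the next PC value."""
        self.epc = faulting_pc
        self.epc_2 = faulting_pc + 1  # assumption: next sequential instruction
        return HANDLER_ADDR

    def resume(self):
        """Return the PC value to continue from after the handler finishes."""
        return self.epc_2


regs = ExceptionRegisters()
pc = regs.raise_overflow(103)  # control transfers to the handler
pc = regs.resume()             # 104 under the assumption above
```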
Software Support
Assurance that MEM and WB stages of pipeline continue execution
Interruption of program
Request to involve the operating system
Enhancement of ISA
  MFCO - move from coprocessor
  JR - jump to address stored in reserved register
[Figure: flowchart with a YES/NO decision on whether the instruction continues to the MEM stage]
Overflow Example
Instruction stored at address 103: 32 + 65527 = 65559
[Figure: simulation waveform with signals Clock, Op A (32), Op B (65527), ALU Out (23), Overflow, PC (49183, 104, 105), and PC Jump]
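The numbers in this example can be reproduced with a short Python sketch, assuming the operands are treated as unsigned so that overflow means a carry out of bit 15. (Read as two's complement, 65527 would be -9 and the sum 23 would not overflow.) The function name is hypothetical.

```python
MASK16 = 0xFFFF


def add16(op_a, op_b):
    """Add two 16-bit values; return (16-bit result, overflow flag)."""
    total = op_a + op_b
    return total & MASK16, total > MASK16


result, overflow = add16(32, 65527)
print(result, overflow)  # 23 True
```

The true sum 65559 exceeds the 16-bit maximum of 65535, so only the low 16 bits (23) appear at the ALU output and the overflow signal is raised.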
Conclusion
16-bit processor, enhanced with a multiplier and able to detect arithmetic overflow
Harvard Architecture model for memory management
14 multipurpose, 2 reserved registers
Advantages and disadvantages of designed 16-bit ISA