

Write a detailed comparison between single-cycle, multi-cycle and pipelined computer
architectures on the basis of five instruction types: load, store, R-type, branch and
jump instructions.


Single Cycle CPU Architecture:

A single-cycle CPU executes each instruction in one clock cycle, so the CPI (cycles per
instruction) is one. Every cycle has the same fixed length, so the same amount of time is
spent on every instruction. To ensure correct operation of the processor, the slowest
instruction must be able to complete execution within one clock tick, which forces the clock
period to be as long as the slowest instruction. This is the major disadvantage of a
single-cycle CPU, although it is very easy to implement.

There are five basic types of instruction: R-type, load word, store word, branch and jump.
If we implement these instructions on a single-cycle CPU, all of them take the same amount
of time.
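
To make the cost of the fixed cycle concrete, here is a minimal Python sketch that picks the clock period for a single-cycle design. The per-instruction latencies are hypothetical numbers chosen only for illustration; they do not come from any real processor.

```python
# Hypothetical latencies (in ns) for each instruction class; illustrative only.
LATENCY_NS = {"r-type": 6, "lw": 8, "sw": 7, "beq": 5, "j": 2}

def single_cycle_period(latencies):
    """The clock period must be long enough for the slowest instruction."""
    return max(latencies.values())

def idle_time(latencies):
    """Time each instruction class sits idle when all share one clock period."""
    period = single_cycle_period(latencies)
    return {name: period - t for name, t in latencies.items()}

period = single_cycle_period(LATENCY_NS)
slack = idle_time(LATENCY_NS)
print("clock period:", period, "ns")
print("idle time per instruction:", slack)
```

Under these assumed numbers, the load sets the clock period and the jump spends most of its cycle doing nothing.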
R-Type Instructions:

Register-to-register arithmetic instructions use the R-type format.

op is the instruction opcode, and func specifies a particular arithmetic operation.
rs and rt are the source registers, and rd is the destination register.

Executing an R-type instruction

1. Read an instruction from the instruction memory.

2. The source registers, specified by instruction fields rs and rt, should be read from
the register file.
3. The ALU performs the desired operation.
4. Its result is stored in the destination register, which is specified by field rd of the
instruction word.
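
The four steps above can be sketched in Python as follows. This is only an illustration: the register file is modelled as a plain list, the function names are hypothetical, and the overflow trap of the real add instruction is ignored.

```python
def decode_r_type(word):
    """Split a 32-bit R-type instruction word into its fields."""
    return {
        "op":    (word >> 26) & 0x3F,
        "rs":    (word >> 21) & 0x1F,   # I[25-21]
        "rt":    (word >> 16) & 0x1F,   # I[20-16]
        "rd":    (word >> 11) & 0x1F,   # I[15-11]
        "shamt": (word >> 6) & 0x1F,
        "func":  word & 0x3F,
    }

# func codes of a few MIPS arithmetic/logic operations (32-bit wraparound).
ALU_OPS = {
    0x20: lambda a, b: (a + b) & 0xFFFFFFFF,  # add (overflow trap ignored)
    0x22: lambda a, b: (a - b) & 0xFFFFFFFF,  # sub
    0x24: lambda a, b: a & b,                 # and
    0x25: lambda a, b: a | b,                 # or
}

def execute_r_type(word, regs):
    """Read rs and rt, apply the ALU operation, write the result to rd."""
    f = decode_r_type(word)
    a, b = regs[f["rs"]], regs[f["rt"]]
    regs[f["rd"]] = ALU_OPS[f["func"]](a, b)
    return regs

# add $t0, $s1, $s2  ->  rs = 17, rt = 18, rd = 8, func = 0x20
word = (0 << 26) | (17 << 21) | (18 << 16) | (8 << 11) | 0x20
regs = [0] * 32
regs[17], regs[18] = 5, 7
execute_r_type(word, regs)
print(regs[8])
```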
The data path for the R-type instruction is as follows:
[Figure: R-type data path. The instruction memory supplies I[25-21] and I[20-16] as read
registers 1 and 2 and I[15-11] as the write register of the register file; read data 1 and
read data 2 feed the ALU, which is controlled by ALUOp and produces Zero and Result.]

I-Type Instruction
The lw, sw and beq instructions all use the I-type encoding.
rt is the destination for lw, but a source for beq and sw.
address is a 16-bit signed constant.

Two Instruction examples are:

For an instruction like lw $t0, 4($sp), the base register $sp is added to the
sign-extended constant to get a data memory address.
This means the ALU must accept either a register operand for arithmetic instructions,
or a sign-extended immediate operand for lw and sw.
We'll add a multiplexer, controlled by ALUSrc, to select either a register operand (0)
or a constant operand (1).
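
As a rough illustration of the ALUSrc multiplexer and the address calculation, consider the Python sketch below. The function names and the sample register values are assumptions made for the example.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def alu_second_operand(alu_src, reg_value, imm16):
    """Model of the ALUSrc mux: 0 selects the register, 1 selects the constant."""
    return sign_extend16(imm16) if alu_src == 1 else reg_value

def effective_address(base, imm16):
    """Address used by lw/sw: base register plus sign-extended constant."""
    return (base + alu_second_operand(1, 0, imm16)) & 0xFFFFFFFF

# lw $t0, 4($sp) with a hypothetical $sp value
addr = effective_address(0x7FFFFFF0, 4)
# A negative offset (-4) encoded in 16 bits as 0xFFFC
neg = effective_address(0x1000, 0xFFFC)
print(hex(addr), hex(neg))
```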
The data path for this instruction is as follows:

Store Word Instruction:

An example of a store word instruction is sw $s1, 16($s2).
The ALUOp must be 010 (add) to compute the effective address, $s2 + 16.
Branch Instruction:
Fetch the instruction, like beq $s1, $s2, offset, from memory.
Read the source registers, $s1 and $s2, from the register file.
Compare the values by subtracting them in the ALU.
If the subtraction result is 0, the source operands were equal and the PC should be
loaded with the target address, PC + 4 + (offset x 4).
Otherwise the branch should not be taken, and the PC should just be incremented to PC
+ 4 to fetch the next instruction sequentially.
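
The branch decision described above can be sketched in Python as follows. This is a simplified model: next_pc is a hypothetical helper name, and the sample addresses are arbitrary.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def next_pc(pc, rs_val, rt_val, offset16):
    """Return the next PC for beq: the target if equal, else PC + 4."""
    zero = ((rs_val - rt_val) & 0xFFFFFFFF) == 0   # ALU subtract, Zero flag
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    if zero:
        # offset is counted in words, so multiply by 4 (shift left by 2)
        return (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
    return pc_plus_4

taken = next_pc(0x00400000, 3, 3, 5)        # equal operands: branch taken
not_taken = next_pc(0x00400000, 3, 4, 5)    # different operands: fall through
backward = next_pc(0x00400010, 1, 1, 0xFFFF)  # offset -1 branches to itself
print(hex(taken), hex(not_taken), hex(backward))
```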

Jump Instruction:
The jump instruction uses the following Instruction format:

Instructions always start on an address that is a multiple of four (they are
word-aligned), so a jump target is always word-aligned as well. The low-order two bits of a
32-bit instruction address are therefore always "00". Shifting the 26-bit target left two
places results in a 28-bit word-aligned address (the low-order two bits become "00").

After the shift, we need to fill in the high-order four bits of the address. These four bits
come from the high-order four bits in the PC. These are concatenated to the high-order end
of the 28-bit address to form a 32-bit address. The data path in single cycle is as follows:
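
The jump-address formation described above can be sketched in Python as follows. jump_target is a hypothetical helper name, and pc is assumed to be the PC value whose upper four bits are used (in the multicycle steps this is the already incremented PC).

```python
def jump_target(pc, target26):
    """Form a 32-bit jump address from the PC and the 26-bit target field."""
    low28 = (target26 & 0x3FFFFFF) << 2   # shift left 2: low bits become "00"
    high4 = pc & 0xF0000000               # high-order four bits of the PC
    return high4 | low28                  # concatenate 4 + 28 bits

a = jump_target(0x00400008, 0x0100000)
b = jump_target(0x90000000, 1)
print(hex(a), hex(b))
```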

Multicycle CPU Architecture:

A multicycle data path is typically faster than a single-cycle data path because it employs a
faster clock. Each clock cycle only has to be long enough for the components used in one step
to settle, whereas in the single-cycle data path every instruction runs at the speed of the
slowest instruction. The big advantage of the multicycle design is that we can use more or
fewer cycles to execute each instruction, e.g. we can take five cycles to execute a load
instruction, but just three cycles to execute a branch instruction. The big disadvantage of
the multicycle design is increased complexity. Control is now a finite state machine, whereas
before it was just combinational logic.

The multicycle design has the following features:
Single memory unit (I and D), single ALU
Several temporary registers (IR, MDR, A, B, ALUOut)
Temporaries hold the output value of an element so that it can be used on subsequent cycles
Values needed by subsequent instructions are stored in programmer-visible state (memory, RF)
The Multicycle data path is as follows:

Datapath with additional muxes, temporary registers, and new control signals
Most temporaries (except IR) are updated on every cycle, so no write control is
required (always write)

In the multi-cycle implementation, each instruction takes a few cycles. The number of cycles
is different for different instructions.

The first two steps are the same for all instructions. The last three steps are different for
different instructions.
Multi-cycle Steps

1. Instruction Fetch

Instruction fetch
IR = Memory[PC];
PC = PC + 4;
Send PC to memory as the address
Read instruction from memory
Write instruction into IR for use on next cycle
Increment PC by 4
Uses ALU in this first cycle
Set control signals to send PC and constant 4 to ALU
2. Instruction Decode

Decode the instruction concurrently with RF read

Optimistically read registers
Optimistically compute branch target
We'll select the right answer on the next cycle

Decode and Register File Read

A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);

3. Execution
Operation varies based on instruction decode
Memory reference:
ALUOut = A + sign-extend(IR[15-0]);
Arithmetic-logical instruction: ALUOut = A op B;
Branch: if (A == B) PC = ALUOut;
Jump: PC = PC[31-28] || (IR[25-0] << 2)

4. Memory / Completion

Load/store accesses memory or arithmetic writes result to the register file

Memory reference: MDR = Memory[ALUOut]; (load)
Memory[ALUOut] = B; (store)
Arithmetic-logical instruction: Reg[IR[15-11]] = ALUOut;

5. Read completion
Finish a memory read by writing read value into the register file
Load operation: Reg[IR[20-16]] = MDR;
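
To tie the five steps together, here is a minimal Python sketch that walks a single lw instruction through them. It is an illustration only: memory is modelled as a dictionary from byte address to word, and run_load and the sample values are assumptions made for the example.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def run_load(state):
    """Step one lw instruction through the five multicycle steps."""
    mem, regs = state["mem"], state["regs"]
    # 1. Instruction fetch: IR = Memory[PC]; PC = PC + 4
    ir = mem[state["pc"]]
    state["pc"] += 4
    # 2. Decode and register file read: A = Reg[IR[25-21]]
    a = regs[(ir >> 21) & 0x1F]
    # 3. Execution: ALUOut = A + sign-extend(IR[15-0])
    alu_out = (a + sign_extend16(ir & 0xFFFF)) & 0xFFFFFFFF
    # 4. Memory access: MDR = Memory[ALUOut]
    mdr = mem[alu_out]
    # 5. Read completion: Reg[IR[20-16]] = MDR
    regs[(ir >> 16) & 0x1F] = mdr
    return 5  # number of cycles used

# lw $t0, 4($sp): op = 0x23, rs = 29 ($sp), rt = 8 ($t0), constant = 4
lw_word = (0x23 << 26) | (29 << 21) | (8 << 16) | 4
state = {"pc": 0, "regs": [0] * 32, "mem": {0: lw_word, 0x104: 99}}
state["regs"][29] = 0x100   # hypothetical stack pointer value
cycles = run_load(state)
print(cycles, state["regs"][8], state["pc"])
```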

Multi Cycle Steps
All instructions perform the first two steps
Branch can finish in the third step
Arithmetic-logical can finish in the fourth step
Stores can finish in the fourth step
Loads finish in the fifth step
Instruction          Number of cycles
Branch / Jump        3
Arithmetic-logical   4
Stores               4
Loads                5
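
Given these cycle counts, the average CPI of a multicycle design depends on the instruction mix. The Python sketch below shows the calculation; the mix is a hypothetical example, not measured data.

```python
# Cycle counts per instruction class, taken from the table above.
CYCLES = {"load": 5, "store": 4, "r-type": 4, "branch": 3, "jump": 3}

# Hypothetical instruction mix (fractions sum to 1); illustrative only.
MIX = {"load": 0.25, "store": 0.10, "r-type": 0.50, "branch": 0.10, "jump": 0.05}

def average_cpi(cycles, mix):
    """Weighted average of cycles per instruction over the given mix."""
    return sum(cycles[kind] * mix[kind] for kind in cycles)

cpi = average_cpi(CYCLES, MIX)
print(round(cpi, 2))
```

With this assumed mix the average CPI is about 4.1, compared with a CPI of 1 (but a much longer clock period) for the single-cycle design.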

Pipelining is an implementation technique where multiple instructions are overlapped in
execution. The computer pipeline is divided into stages. Each stage completes a part of an
instruction in parallel. The stages are connected one to the next to form a pipe: instructions
enter at one end, progress through the stages, and exit at the other end.

Pipelining does not decrease the time for individual instruction execution. Instead, it increases
instruction throughput. The throughput of the instruction pipeline is determined by how often
an instruction exits the pipeline.

Because the pipe stages are hooked together, all the stages must be ready to proceed at the same
time. We call the time required to move an instruction one step further in the pipeline a
machine cycle. The length of the machine cycle is determined by the time required for the
slowest pipe stage.

The pipeline designer's goal is to balance the length of each pipeline stage. If the stages are
perfectly balanced, then the time per instruction on the pipelined machine is equal to

    Time per instruction on nonpipelined machine / Number of pipe stages

A simplified pipeline can be divided into four stages (the classic five-stage MIPS pipeline
additionally has a separate memory access stage):

Instruction Fetch: fetch the instruction from memory.
Decode and Operand Fetch: decode it and fetch operands from the register file.
Execute: execute the instruction in the ALU.
Write Back: write the result back into a register.

Population of the pipeline at each clock cycle:
[Figure: pipeline occupancy diagram, where i1, i2, ... are successive instructions in the
instruction stream.]
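
The population of the pipeline at each clock cycle can be generated with a short Python sketch. It assumes an ideal pipeline with no stalls; the stage abbreviations and the occupancy function are names chosen for this example.

```python
def occupancy(num_instructions, stages):
    """For each clock cycle, map stage name -> instruction currently in it."""
    table = []
    total_cycles = len(stages) + num_instructions - 1
    for cycle in range(total_cycles):
        row = {}
        for position, name in enumerate(stages):
            index = cycle - position          # instruction that reached this stage
            if 0 <= index < num_instructions:
                row[name] = f"i{index + 1}"
        table.append(row)
    return table

STAGES = ["IF", "ID", "EX", "WB"]   # assumed abbreviations for the four stages
table = occupancy(3, STAGES)
for cycle, row in enumerate(table, start=1):
    print(cycle, row)
```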
With an n-stage pipeline, after n-1 clock cycles the pipeline becomes full and an instruction
completes on every clock cycle, instead of every n clock cycles. In the ideal case (balanced
stages and no stalls), this speeds up the processor by a factor approaching n.
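
The effect of filling the pipeline on the overall speedup can be checked with a small Python sketch. It assumes an ideal pipeline (perfectly balanced stages, no stalls or hazards); the function names are chosen for illustration.

```python
def pipelined_cycles(num_instructions, num_stages):
    """Cycles to finish k instructions: n-1 to fill the pipe, then one each."""
    return (num_stages - 1) + num_instructions

def unpipelined_cycles(num_instructions, num_stages):
    """Without overlap, every instruction takes all n steps on its own."""
    return num_stages * num_instructions

def speedup(num_instructions, num_stages):
    """Ratio of unpipelined to pipelined cycle counts."""
    return (unpipelined_cycles(num_instructions, num_stages)
            / pipelined_cycles(num_instructions, num_stages))

for k in (1, 10, 100, 1_000_000):
    print(k, round(speedup(k, 4), 3))
```

For a single instruction there is no gain at all, and the speedup only approaches the number of stages as the instruction count grows.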

A Pipelined Architecture:
The clock cycle time of a pipeline is limited by the slowest stage.

The data path for the pipelining stages will be as follows: