QUALIFYING EXAM
SPRING 1997
EXAMINEE____________
Number only
COMPUTER ORGANIZATION AND ARCHITECTURE
B. (6 points) Suppose we wanted to pipeline this CPU, with a goal of starting a new instruction
every clock cycle (with no changes in the length of the clock cycle). Ignore for the moment any
conflicts or problems that could occur in this pipeline. Pipelining is normally accomplished by
inserting registers between stages of combinational logic. Is there anywhere in the datapath of
Figure 5-4 that we would have to insert registers to make pipelining possible?
If we could accomplish our goal of starting a new instruction every clock cycle, what would be the
speedup (for the same one thousand instructions) of this pipelined CPU over the non-pipelined
CPU? Does the fact that different instructions require varying numbers of cycles to execute affect
your answer?
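As a rough illustration of how the speedup asked for in this part could be computed, the Python sketch below compares total cycle counts for the two designs. The instruction classes, their counts and cycle counts, and the six-cycle fill time are hypothetical placeholders, not values from this exam; substitute the frequencies and timings given for the actual CPU.

```python
def nonpipelined_cycles(mix):
    """Total cycles when each instruction runs to completion before the next starts.

    mix maps an instruction class to (instruction count, cycles per instruction).
    """
    return sum(count * cycles for count, cycles in mix.values())


def pipelined_cycles(n_instr, fill_cycles):
    """Total cycles when a new instruction starts every clock cycle.

    The first instruction needs fill_cycles to complete; each later
    instruction completes one cycle after its predecessor.
    """
    return fill_cycles + (n_instr - 1)


# Hypothetical mix of 1000 instructions (assumed values for illustration only).
mix = {
    "memory-reference": (500, 6),
    "register-reference": (300, 4),
    "branch": (200, 5),
}
n_instr = sum(count for count, _ in mix.values())

before = nonpipelined_cycles(mix)
after = pipelined_cycles(n_instr, fill_cycles=6)
speedup = before / after
print(before, after, round(speedup, 2))
```

With a long instruction stream the fill time matters little, so in this model the speedup approaches the average number of cycles per instruction of the non-pipelined CPU.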
C. (6 points) We know that the difficulty of pipelining is overcoming hardware conflicts and data
dependencies. A hardware conflict occurs when two pipelined operations both need to use (i.e.,
write to) the same piece of hardware at the same time. A space-time diagram is a convenient way
to illustrate the use of hardware by a series of pipelined instructions. Below is a space-time
diagram for the CPU, with one row per piece of hardware (ignore, for the time being, the
flip-flops and the control logic and decoders).
Time =
Memory
AR
PC
DR
AC
IR
TR
OUTR
INPR
SC
Common Bus
ALU
Suppose a STA instruction starts execution at time t=1, followed by an ISZ instruction which starts
execution at time t=2. (Assume both instructions use direct addressing mode.) Show the
hardware they use (i.e., write to) during the fetch, decode, and execute phases on the space-time
diagram. For instance, if instruction Q writes to the Instruction Register on cycle 3, place a Q in
column 3 of the row labeled IR. Then identify (by circling) any hardware conflicts that appear on
your diagram.
You may find that there is a hardware conflict for the Common Bus. How could you prevent
conflicts for the Common Bus? Does your solution reduce the pipelined CPU's performance?
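The bookkeeping behind a space-time diagram can be sketched in Python as follows. This is only an illustration of the technique: the instructions X and Y and the hardware they write on each cycle are hypothetical, and are not the STA and ISZ sequences the question asks for.

```python
from collections import defaultdict


def find_conflicts(schedules):
    """Return {(resource, cycle): [instruction names]} for every conflict.

    schedules maps an instruction name to (start_cycle, steps), where steps
    is a list with one set of written resources per cycle of execution.
    """
    usage = defaultdict(list)
    for name, (start, steps) in schedules.items():
        for offset, resources in enumerate(steps):
            for resource in resources:
                usage[(resource, start + offset)].append(name)
    # A conflict is any (resource, cycle) cell holding more than one instruction.
    return {cell: names for cell, names in usage.items() if len(names) > 1}


# Hypothetical three-cycle instructions (assumed write sets, for illustration).
steps = [{"AR", "Common Bus"}, {"IR", "PC", "Common Bus"}, {"AR", "Common Bus"}]
schedules = {"X": (1, steps), "Y": (2, steps)}

conflicts = find_conflicts(schedules)
for (resource, cycle), names in sorted(conflicts.items()):
    print(f"cycle {cycle}: {resource} written by {names}")
```

Each key of the returned dictionary corresponds to one cell of the diagram that would be circled.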
D. (6 points) A control dependency occurs when a branch instruction updates the Program
Counter, altering the address of the next instruction to fetch. One way to deal with branches is to
insert stalls, or No-Op instructions, in the instruction stream until the branch is finished executing.
For this CPU, how many stalls must be inserted for each of the three types of branch instructions?
You were previously given the frequency of each type of instruction, including the branch
instructions. Based upon this instruction frequency, and using stalls to resolve branching
dependencies, how many cycles does it take to execute the same 1000 instructions on this revised
version of the pipelined CPU? For this question, ignore the possibility of hardware conflicts or
other types of dependencies.
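One way to organize this count is sketched below in Python. The branch names, their counts, the stall counts, and the fill time are hypothetical placeholders chosen only to show the arithmetic; the real values come from the instruction frequencies given earlier and from the answer to the first half of this part.

```python
def cycles_with_stalls(n_instr, fill_cycles, branch_counts, stalls_per_branch):
    """Cycles for a pipeline that issues one instruction per cycle,
    plus the stall cycles inserted after each branch.

    branch_counts and stalls_per_branch are keyed by branch type.
    """
    stall_cycles = sum(
        branch_counts[kind] * stalls_per_branch[kind] for kind in branch_counts
    )
    return fill_cycles + (n_instr - 1) + stall_cycles


# Hypothetical numbers (assumptions for illustration, not exam data).
branch_counts = {"branch_a": 50, "branch_b": 30, "branch_c": 20}
stalls_per_branch = {"branch_a": 2, "branch_b": 3, "branch_c": 4}

total = cycles_with_stalls(1000, 6, branch_counts, stalls_per_branch)
print(total)
```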
E. (6 points) The table below shows the consecutive, pipelined execution of an AND instruction
followed by a STA instruction. Assume that:
- the CPU has been modified so that conflicts for the Common Bus will not occur;
- the control unit has been modified to support pipelining. Therefore, operations
  involving the control unit are not shown in this table;
- interrupts are never enabled.
For these two instructions, indicate the remaining unpreserved data dependencies and hardware
conflicts by drawing arrows. (For instance, if the AND and STA instructions both try to write to
the same register at the same time, draw an arrow showing this conflict. Or, if the AND
instruction updates a register after the register contents have already been read by the STA
instruction, that is an unpreserved data dependency.)
cycle   AND Instruction     STA Instruction
t       AR ← PC
t+1     PC ← PC+1           AR ← PC
t+2     AR ← IR(0-11)       PC ← PC+1
t+3     AR ← M[AR]          AR ← IR(0-11)
t+4     DR ← M[AR]          AR ← M[AR]
t+5     AC ← AC ∧ DR        M[AR] ← AC
If there are any remaining conflicts / unpreserved dependencies, how could you fix them? Why do
you pick this solution?
(b) (3 pts) Compute execution time for a triangular sub-array when k=6 and N=16.
(c) (3 pts) What is the efficiency of the pipeline for a triangular sub-array when k=6 and N=16?
B. For this problem we have an omega network with N = 2^n input ports and output ports.
(a) (2 pts) How many permutations, or mappings, of input ports to output ports are provided by
the network?
(b) (2 pts) Suppose one input to output path is established. How many permutations of the
remaining N-1 input ports to N-1 output ports are possible?
(c) (2 pts) Suppose the network failed such that all the switches in the next-to-the-leftmost stage
were stuck in the exchange (crossed) setting. How many permutations are provided by this faulty
network?
(d) (2 pts) Again suppose the network failed such that all the switches in the next-to-the-leftmost
stage were stuck in the exchange (crossed) setting. What do we know about the destination
addresses that can be reached from a given input port? That is, specify or describe the set of output
port addresses that are reachable from input port address (a_{n-1}, a_{n-2}, ..., a_1, a_0).
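For small networks, questions of this kind can be explored by brute force. The Python sketch below models an omega network as n stages, each a perfect shuffle followed by N/2 two-by-two switches, and counts the distinct input-to-output mappings over all switch settings. The switch numbering and the encoding of settings (0 for straight, 1 for exchange) are assumptions of this sketch, not conventions taken from the exam.

```python
from itertools import product


def omega_route(n, settings):
    """Return perm, where perm[i] is the output port reached from input i.

    settings[s][w] is 0 (straight) or 1 (exchange) for switch w of stage s.
    """
    N = 1 << n
    perm = []
    for src in range(N):
        pos = src
        for s in range(n):
            # Perfect shuffle: rotate the n-bit position left by one bit.
            pos = ((pos << 1) | (pos >> (n - 1))) & (N - 1)
            # Switch number pos >> 1 either keeps or flips the low bit.
            if settings[s][pos >> 1]:
                pos ^= 1
        perm.append(pos)
    return perm


def count_permutations(n):
    """Count distinct mappings over every combination of switch settings."""
    half = (1 << n) // 2
    seen = set()
    for bits in product((0, 1), repeat=n * half):
        settings = [bits[s * half:(s + 1) * half] for s in range(n)]
        seen.add(tuple(omega_route(n, settings)))
    return len(seen)


print(count_permutations(2), count_permutations(3))
```

A stuck stage can be modeled in the same way by fixing one row of `settings` to all ones and enumerating only the remaining rows. The enumeration grows as 2^(n·N/2), so it is practical only for very small n.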
C. One performance model described in both the Stone and Shiva texts for a system of two
processors with un-overlapped communication gives the equation for execution time as:
T = R*max(k, M-k) + C*k*(M-k)
where T represents the total execution time,
R represents individual task execution time,
C represents task communication overhead in units of time,
M represents the total number of tasks to process, and
k represents the number of tasks assigned to one of the processors.
(a) (5 pts) Develop an equation for the value of k in the range 0 ≤ k ≤ M/2 where execution time
is the greatest. Note greatest, not least.
(b) (3 pts) Let R=8, C=1, and M=40. What is the worst case value of k and the total execution time
with this task distribution?
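A derived equation for the worst-case k can be checked numerically with a short Python sketch like the one below. It assumes the communication term in the model is the product C·k·(M−k), with C a constant, and it simply evaluates T at every integer k in the stated range; the function names are illustrative.

```python
def execution_time(k, R, C, M):
    """T = R*max(k, M-k) + C*k*(M-k) for a two-processor system."""
    return R * max(k, M - k) + C * k * (M - k)


def worst_case_k(R, C, M):
    """Integer k in 0..M/2 that maximizes the execution time."""
    candidates = range(0, M // 2 + 1)
    return max(candidates, key=lambda k: execution_time(k, R, C, M))


k_worst = worst_case_k(R=8, C=1, M=40)
t_worst = execution_time(k_worst, R=8, C=1, M=40)
print(k_worst, t_worst)
```

Because the search is exhaustive over the integers, it also handles cases where the continuous maximum falls between two integer values of k or outside the range.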
Write-validate
Write-around
Write-invalidate
Write-through
Write-back
(b) (3 points) Are these write-miss strategies equally compatible with associative caches?
(c) (6 points each, 12 total) For each of the following programming fragments, tell which strategy
or strategies would perform best (fewest main-memory references) and which would perform
worst (most main-memory references). Then write two or three sentences explaining your
reasoning.
Assumptions: Memory is word addressable, and blocks contain 32 words. Memory is not
interleaved, so caching or writing an n word block takes n times as long as one memory
reference. The cache is large and empty enough to hold all blocks that are fetched (so you
don't have to worry about write-backs of earlier information in the cache). For the write-hit
policy, assume write-back where feasible, otherwise write-through. Where write-back is
used, include the references needed to write the line back to main memory.
(i)
for i := 0 to N - 1 do
begin
  b[i] := 0;
  for j := 0 to N - 1 do
    b[i] := b[i] + a[i, j];
end;
(ii)
i := 0; b[0] := 7;
for i := 1 to 100000 do
  b[i] := b[i - 1];
(d) (4 points) How would interleaved memory change the answers to part (c) above?
(e) (4 points) Write-validate has been called the worst performing of the policies, because it
fetches a cache line on every cache miss, whereas the other policies fetch a line on only some cache
misses. Comment on the validity of this observation.