Hardw S97

Ph.D.
QUALIFYING EXAM
SPRING 1997
EXAMINEE____________
Number only
COMPUTER ORGANIZATION AND ARCHITECTURE
1. Pipelining Instruction Execution (30 points total)

For all questions, justify your answer.
Chapter 5 of Mano presents a simple instruction set, and a CPU that executes that instruction set.
The complete specification of this instruction set, and the hardware on which it executes, is
attached to this question.
A. (6 points) The CPU presented in Chapter 5 is *not* pipelined. Assume that in the execution of
a typical program, the seven memory reference instructions occur 70% of the time, and other
types of instructions occur 30% of the time. Assume that the different memory instructions all
occur with equal probability. How many clock cycles will it take to execute 1000 instructions?
State any other assumptions you must make.
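As a minimal sketch of the weighted-average calculation that part A asks for, the Python fragment below multiplies each instruction class's frequency by its cycle count. The per-instruction cycle counts are assumptions taken from the usual timing of Mano's basic computer with direct addressing (the attached specification is not reproduced here), and the names `MEMORY_REF_CYCLES`, `OTHER_CYCLES`, and `total_cycles` are hypothetical.

```python
# Cycles per instruction (fetch + decode + execute). ASSUMED values, based on
# the usual timing of Mano's basic computer with direct addressing; check
# them against the specification attached to the exam.
MEMORY_REF_CYCLES = {
    "AND": 6, "ADD": 6, "LDA": 6, "STA": 5, "BUN": 5, "BSA": 6, "ISZ": 7,
}
OTHER_CYCLES = 4  # register-reference and I/O instructions (assumed)


def total_cycles(n_instructions, memory_fraction=0.7):
    """Expected cycle count when the memory-reference instructions are
    equally likely and make up `memory_fraction` of the instruction mix."""
    avg_memory = sum(MEMORY_REF_CYCLES.values()) / len(MEMORY_REF_CYCLES)
    avg = memory_fraction * avg_memory + (1 - memory_fraction) * OTHER_CYCLES
    return n_instructions * avg


if __name__ == "__main__":
    print(round(total_cycles(1000)))
```

Under these assumed cycle counts the average is about 5.3 cycles per instruction; different assumptions (for example, indirect addressing) would change the total.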
B. (6 points) Suppose we wanted to pipeline this CPU, with a goal of starting a new instruction
every clock cycle (with no changes in the length of the clock cycle). Ignore for the moment any
conflicts or problems that could occur in this pipeline. Pipelining is normally accomplished by
inserting registers between stages of combinational logic. Is there anywhere in the datapath of
Figure 5-4 that we would have to insert registers to make pipelining possible?
If we could accomplish our goal of starting a new instruction every clock cycle, what would be the
speedup (for the same one thousand instructions) of this pipelined CPU over the non-pipelined
CPU? Does the fact that different instructions require varying numbers of cycles to execute affect
your answer?
C. (6 points) We know that the difficulty of pipelining is overcoming hardware conflicts and data
dependencies. A hardware conflict occurs when two pipelined operations both need to use (i.e.,
write to) the same piece of hardware at the same time. A space-time diagram is a convenient way
to illustrate the use of hardware by a series of pipelined instructions. Below is a space-time
diagram for the CPU, with one row per piece of hardware (ignore, for the time being, the flip-flops, the control logic, and the decoders).
[Blank space-time diagram: columns are clock cycles (Time = ...), with one row each for Memory, AR, PC, DR, AC, IR, TR, OUTR, INPR, SC, Common Bus, and ALU.]
Suppose a STA instruction starts execution at time t=1, followed by an ISZ instruction which starts
execution at time t=2. (Assume both instructions use direct addressing mode.) Show the
hardware they use (i.e., write to) during the fetch, decode, and execute phases on the space-time
diagram. For instance, if instruction Q writes to the Instruction Register on cycle 3, place a Q in
column 3 of the row labeled IR. Then identify (by circling) any hardware conflicts that appear on
your diagram.
You may find that there is a hardware conflict for the Common Bus. How could you prevent
conflicts for the Common Bus? Does your solution reduce the pipelined CPU's performance?
D. (6 points) A control dependency occurs when a branch instruction updates the Program
Counter, altering the address of the next instruction to fetch. One way to deal with branches is to
insert stalls, or No-Op instructions, in the instruction stream until the branch is finished executing.
For this CPU, how many stalls must be inserted for each of the three types of branch instructions?
You were previously given the frequency of each type of instruction, including the branch
instructions. Based upon this instruction frequency, and using stalls to resolve branching
dependencies, how many cycles does it take to execute the same 1000 instructions on this revised
version of the pipelined CPU? For this question, ignore the possibility of hardware conflicts or
other types of dependencies.
E. (6 points) The table below shows the consecutive, pipelined execution of an AND instruction
followed by a STA instruction. Assume that:
- the CPU has been modified so that conflicts for the Common Bus will not occur;
- the control unit has been modified to support pipelining, so operations involving the control unit are not shown in this table;
- interrupts are never enabled.
For these two instructions, indicate the remaining unpreserved data dependencies and hardware
conflicts by drawing arrows. (For instance, if the AND and STA instructions both try to write to
the same register at the same time, draw an arrow showing this conflict. Or, if the AND
instruction updates a register after the register contents have already been read by the STA
instruction, that is an unpreserved data dependency.)
cycle   AND Instruction     STA Instruction
t       AR ← PC
t+1     PC ← PC+1           AR ← PC
t+2     AR ← IR(0-11)       PC ← PC+1
t+3     AR ← M[AR]          AR ← IR(0-11)
t+4     DR ← M[AR]          AR ← M[AR]
t+5     AC ← AC ∧ DR        M[AR] ← AC
If there are any remaining conflicts / unpreserved dependencies, how could you fix them? Why do
you pick this solution?
2. Various Topics in Computer Architecture (35 points total)

A. Given a linear, k-stage pipeline operating with a clock period of τ, consider processing a
triangular sub-array of data. That is, for processing in the row direction, the vector lengths could
vary from the maximum row dimension down to one element. Assume that data access does not
introduce any delay in vector operations on this data. However, each vector must be completely
processed before the next one can start. For example, if a vector operation completes in clock cycle
t, the next vector operation can begin in cycle t+1.
(a) (4 pts) Develop an execution time formula for processing such a triangular sub-array of data
involving a sequence of vectors that ranges from N elements down to one element.
(b) (3 pts) Compute execution time for a triangular sub-array when k=6 and N=16.
(c) (3 pts) What is the efficiency of the pipeline for a triangular sub-array when k=6 and N=16?
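One way to sanity-check a formula for part A is to sum the per-vector times directly. The sketch below assumes the standard result that a k-stage linear pipeline needs k + n - 1 clock cycles for an n-element vector, and that vectors are processed strictly one after another as the question states. The function names are hypothetical, and times are in clock cycles (multiply by the clock period for seconds).

```python
def vector_cycles(k, n):
    """Cycles for one n-element vector on a k-stage pipeline:
    k cycles to fill the pipe, then one result per cycle (assumed model)."""
    return k + n - 1


def triangular_cycles(k, N):
    """Total cycles for vectors of length N, N-1, ..., 1 processed
    back to back with no overlap between consecutive vectors."""
    return sum(vector_cycles(k, n) for n in range(1, N + 1))


def efficiency(k, N):
    """Fraction of stage-cycles doing useful work: each of the N(N+1)/2
    elements occupies each of the k stages for exactly one cycle."""
    useful = k * N * (N + 1) // 2
    return useful / (k * triangular_cycles(k, N))


if __name__ == "__main__":
    print(triangular_cycles(6, 16), round(efficiency(6, 16), 3))
```

Under this model the sum has the closed form N(k - 1) + N(N + 1)/2 cycles, which the brute-force sum can be compared against for any k and N.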
B. For this problem we have an omega network with N = 2^n input ports and output ports.
(a) (2 pts) How many permutations, or mappings, of input ports to output ports are provided by
the network?
(b) (2 pts) Suppose one input to output path is established. How many permutations of the
remaining N-1 input ports to N-1 output ports are possible?
(c) (2 pts) Suppose the network failed such that all the switches in the next-to-the-leftmost stage
were stuck in the exchange (crossed) setting. How many permutations are provided by this faulty
network?
(d) (2 pts) Again suppose the network failed such that all the switches in the next-to-the-leftmost
stage were stuck in the exchange (crossed) setting. What do we know about the destination
addresses that can be reached from a given input port? That is, specify or describe the set of output
port addresses that are reachable from input port address (a_{n-1}, a_{n-2}, ..., a_1, a_0).
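For small networks, the counting questions in part B can be checked by brute force. The sketch below assumes the usual omega network structure: n stages, each a perfect shuffle followed by N/2 two-by-two switches that are either straight (0) or exchanged (1). The function names and the `stuck_stage` parameter (stage index counted from the left, starting at 0) are hypothetical.

```python
from itertools import product


def shuffle(i, n):
    """Perfect shuffle: rotate the n-bit address i left by one bit."""
    return ((i << 1) | (i >> (n - 1))) & ((1 << n) - 1)


def route(settings, n):
    """Permutation realized by one complete switch setting.
    settings[s][w] is 0 (straight) or 1 (exchange) for switch w of stage s."""
    perm = []
    for src in range(1 << n):
        pos = src
        for s in range(n):
            pos = shuffle(pos, n)
            if settings[s][pos >> 1]:  # switch w serves lines 2w and 2w+1
                pos ^= 1
        perm.append(pos)
    return tuple(perm)


def count_permutations(n, stuck_stage=None):
    """Count distinct permutations over all switch settings. If stuck_stage
    is given, every switch in that stage is forced to exchange."""
    half = (1 << n) // 2
    free_stages = [s for s in range(n) if s != stuck_stage]
    seen = set()
    for bits in product((0, 1), repeat=half * len(free_stages)):
        settings = [[1] * half for _ in range(n)]
        for idx, s in enumerate(free_stages):
            settings[s] = list(bits[idx * half:(idx + 1) * half])
        seen.add(route(settings, n))
    return len(seen)


if __name__ == "__main__":
    print(count_permutations(3), count_permutations(3, stuck_stage=1))
```

Enumeration grows as 2^(nN/2), so this is only practical for N = 4 or N = 8. It is meant as a check on a closed-form answer, not a replacement for one.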
C. One performance model described in both the Stone and Shiva texts for a system of two
processors with un-overlapped communication gives the equation for execution time as:
T = R*max(k, M-k) + C*k*(M-k)
where T represents the total execution time,
R represents individual task execution time,
C represents task communication overhead in units of time,
M represents the total number of tasks to process, and
k represents the number of tasks assigned to one of the processors.
(a) (5 pts) Develop an equation for the value of k in the range 0 ≤ k ≤ M/2 where execution time
is the greatest. Note greatest, not least.
(b) (3 pts) Let R=8, C=1, and M=40. What is the worst case value of k and the total execution time
with this task distribution?
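A derived expression for the worst-case k can be cross-checked numerically. The sketch below evaluates the model by exhaustive search over 0 ≤ k ≤ M/2, reading the communication term as C times k times (M - k), that is, one unit of overhead C for every pair of tasks placed on different processors. The function names are hypothetical.

```python
def exec_time(k, M, R, C):
    """Total time with k tasks on one processor and M - k on the other:
    the busier processor's run time plus non-overlapped communication."""
    return R * max(k, M - k) + C * k * (M - k)


def worst_k(M, R, C):
    """Value of k in 0..M//2 that maximizes the execution time
    (the smallest such k if there is a tie)."""
    return max(range(0, M // 2 + 1), key=lambda k: exec_time(k, M, R, C))


if __name__ == "__main__":
    k = worst_k(40, 8, 1)
    print(k, exec_time(k, 40, 8, 1))
```

Because max(k, M - k) equals M - k on this range, the model reduces to (M - k)(R + C*k), a downward-opening quadratic in k, so the search result should agree with the stationary point obtained by differentiation whenever that point lies inside the range.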
D. Suppose we consider the following architectural types: (1) SISD, or conventional
uniprocessors, (2) SIMD, or array processors, (3) MIMD, or multiprocessors, and (4) pipeline or
vector processors. They each tend to be well-suited for certain applications, but ill-suited for
others. Three applications are listed below. For each application, you are to select the one type of
architecture that you believe would be most appropriate and give reasons for your choice. Assume
each application is of sufficient size and complexity that performance is an issue.
(a) (3 pts) Compiling a large program
(b) (3 pts) Searching a large database
(c) (3 pts) Solving a large system of partial differential equations
3. Cache Memory Policies (35 points total)

When a computer system contains a cache, when should information be written to main memory?
We are familiar with the write-through and write-back policies. These control what happens on a
write hit. We can also consider what happens on a write miss.
At least four policies are possible for handling write misses:
- Fetch-on-write: A cache line is fetched from main memory, and the write that missed is
  directed to the newly allocated cache line.
- Write-validate: A new cache line is allocated, and its previous contents are invalidated.
  The write that missed is directed to the newly allocated cache line. Thus, after the write,
  only one word in the newly allocated line is valid; the other words in the line are treated as
  if they were empty.
- Write-around: The write is directed to main memory. No cache line is allocated, and the
  word is not stored in the cache.
- Write-invalidate: The write is directed to the cache before the tag of the line is checked.
  Thus, if it turns out that the write has missed, the previously allocated cache line is
  corrupted. When this happens, the previous contents are invalidated, and only the newly
  written word is valid in the newly allocated line.
(a) (12 points) Which of write-back or write-through is compatible with fetch-on-write, write-validate, write-around, or write-invalidate? Place an X in any square where two policies are
incompatible, and explain your answer.
                 Fetch-on-write   Write-validate   Write-around   Write-invalidate
Write-through
Write-back
(b) (3 points) Are these write-miss strategies equally compatible with associative caches?
(c) (6 points each, 12 total) For each of the following programming fragments, tell which strategy
or strategies would perform best (fewest main-memory references) and which would perform
worst (most main-memory references). Then write two or three sentences explaining your
reasoning.
Assumptions: Memory is word addressable, and blocks contain 32 words. Memory is not
interleaved, so caching or writing an n word block takes n times as long as one memory
reference. The cache is large and empty enough to hold all blocks that are fetched (so you
don't have to worry about write-backs of earlier information in the cache). For the write-hit
policy, assume write-back where feasible, otherwise write-through. Where write-back is
used, include the references needed to write the line back to main memory.
(i)
for i := 0 to N - 1 do
begin
  b[i] := 0;
  for j := 0 to N - 1 do
    b[i] := b[i] + a[i, j];
end;
(ii)
i := 0; b[0] := 7;
for i := 1 to 100000 do
  b[i] := b[i - 1];
(d) (4 points) How would interleaved memory change the answers to part (c) above?
(e) (4 points) Write-validate has been called the worst performing of the policies, because it
fetches a cache line on every cache miss, whereas the other policies fetch a line on only some cache
misses. Comment on the validity of this observation.
