
C4 = G3 + (P3 G2) + (P3 P2 G1) + (P3 P2 P1 G0) + (P3 P2 P1 P0 C0)

For step 4:
4. Carry-in for individual bits - the formulas are the same as in step 3, but the p's and g's are for individual bits. The circuits are the same as in step 3.

For step 5:
5. s_i = a_i XOR b_i XOR c_i - the 1-bit adders compute the bits of the sum using the carries produced in step 4.
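The carry computation can be sanity-checked with a short Python sketch (not part of the original solution; the example operands are made up):

    def cla_add4(a_bits, b_bits, c0):
        # 4-bit carry-lookahead add; a_bits/b_bits are lists of bits, LSB first.
        p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate: p_i = a_i XOR b_i
        g = [a & b for a, b in zip(a_bits, b_bits)]   # generate:  g_i = a_i AND b_i
        carries = []
        c = c0
        for i in range(4):
            # c_{i+1} = g_i + p_i c_i, which expands to the sum-of-products form
            # C4 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 C0
            c = g[i] | (p[i] & c)
            carries.append(c)
        carry_in = [c0] + carries[:3]
        s = [a_bits[i] ^ b_bits[i] ^ carry_in[i] for i in range(4)]  # step 5
        return s, carries[3]                           # sum bits and carry out

    print(cla_add4([1, 1, 1, 0], [1, 0, 0, 0], 0))     # 7 + 1 = 8 -> ([0, 0, 0, 1], 0)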

c)

Compute time to obtain carry out of the MSB and all bits of the sum.
[10 marks]

1. If a program currently takes 100 seconds to execute, and loads and stores account for 20% of the execution time, how long will the program take if loads and stores are made 30% faster?

For this, you can use Amdahl's law or you can reason it out step by step. Doing it step by step gives:
(1) Before the improvement, loads and stores take 20 seconds.
(2) If loads and stores are made 30 percent faster, they will take 20/1.3 = 15.385 seconds, which is 4.615 seconds less.
(3) Thus, the final program will take 100 - 4.615 = 95.38 seconds.

Note: In step 2, the performance improves by 30 percent, i.e. performance_new = performance_old x 1.3. Since ex_time = 1/performance, this gives 1/ex_time_new = 1.3/ex_time_old, so ex_time_new = ex_time_old/1.3.

Using Amdahl's law gives EX_TIME_NEW = EX_TIME_OLD*(1 - FRAC_EN + FRAC_EN/SPEEDUP_EN). For this problem, EX_TIME_NEW = 100*(1 - 0.2 + 0.2/1.3) = 95.38 seconds.
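As a quick check, the same numbers can be reproduced with a couple of lines of Python (a sketch, not part of the original solution):

    def amdahl_exec_time(ex_time_old, frac_enhanced, speedup_enhanced):
        # New execution time under Amdahl's law
        return ex_time_old * ((1 - frac_enhanced) + frac_enhanced / speedup_enhanced)

    print(amdahl_exec_time(100, 0.20, 1.3))        # ~95.38 s with 30% faster loads/stores
    print(100 / amdahl_exec_time(100, 0.20, 1e9))  # ~1.25, the limiting speedup (see below)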

What is the maximum speedup that can be achieved by improving the performance of loads and stores?

The maximum speedup is achieved if loads and stores take no time at all. In this case, the program runs in 80 seconds and the overall speedup is

SPEEDUP = EX_TIME_OLD/EX_TIME_NEW = 100/80 = 1.25

2. Suppose each stage of the instruction takes the following times: IF = 7 ns, ID = 8 ns, EX = 15 ns, MA = 10 ns, WB = 8 ns, and there is a 2 ns overhead for pipelining. What is the cycle time for a single cycle, multiple cycle, and pipelined processor? How long will each take to implement 4 add instructions, assuming no hazards occur?

The cycle times are:
Single cycle: 7 + 8 + 15 + 10 + 8 = 48 ns (the sum of the stages)
Multiple cycle: 15 ns (the time for the longest stage)
Pipelined: 17 ns (the longest stage + pipeline overhead)

The execution times for the add instructions are (IC = instruction count):
Single cycle: 48 x 4 = 192 ns (cycle time x IC)
Multiple cycle: 15 x 4 x 4 = 240 ns (cycle time x CPI x IC; the add instruction only requires four stages on the multiple cycle datapath)
Pipelined: 17 x 8 = 136 ns (cycle time x (no. of stages + IC - 1))

2. What is the execution time for the pipelined processor shown in Figure 6.19 on page 386 to implement the following three instructions?

add $1, $2, $3
sub $3, $4, $1
add $2, $3, $3

Show the pipeline diagram for these instructions.

Pipeline diagram (without forwarding, each dependent instruction must wait for the previous result to be written back):

         C1   C2   C3   C4   C5   C6   C7   C8   C9   C10  C11  C12  C13
add      IF   ID   EX   MEM  WB
stall         IF   BL   BL   BL   BL
stall              IF   BL   BL   BL   BL
stall                   IF   BL   BL   BL   BL
sub                          IF   ID   EX   MEM  WB
stall                             IF   BL   BL   BL   BL
stall                                  IF   BL   BL   BL   BL
stall                                       IF   BL   BL   BL   BL
add                                              IF   ID   EX   MEM  WB

Note: BL stands for bubble.

Execution time: 17 ns/cycle x 13 cycles = 221 ns.

3. Repeat problem 2, assuming that data forwarding is used.

With forwarding, none of the stalls occur, because data is forwarded from the execution stage of one instruction to the execution stage of the next instruction.

         C1   C2   C3   C4   C5   C6   C7
add      IF   ID   EX   MEM  WB
sub           IF   ID   EX   MEM  WB
add                IF   ID   EX   MEM  WB

Execution time: (17 ns x 7 cycles) = 119 ns. Note: This assumes no overhead in cycle time occurs due to forwarding.
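The timing arithmetic in problems 2 and 3 can be reproduced with a small Python sketch (not part of the original solutions; the stage times and stall counts are the ones given above):

    STAGE_NS = {"IF": 7, "ID": 8, "EX": 15, "MA": 10, "WB": 8}
    PIPE_OVERHEAD_NS = 2

    def single_cycle_time(ic):
        return sum(STAGE_NS.values()) * ic                 # 48 ns cycle, one cycle per instruction

    def multi_cycle_time(ic, cpi):
        return max(STAGE_NS.values()) * cpi * ic           # 15 ns cycle

    def pipelined_time(ic, stalls=0, n_stages=5):
        cycle = max(STAGE_NS.values()) + PIPE_OVERHEAD_NS  # 17 ns cycle
        return cycle * (n_stages + ic - 1 + stalls)

    print(single_cycle_time(4))          # 192 ns for 4 adds
    print(multi_cycle_time(4, cpi=4))    # 240 ns for 4 adds (add uses 4 stages)
    print(pipelined_time(4))             # 136 ns for 4 adds, no hazards
    print(pipelined_time(3, stalls=6))   # 221 ns: 3 instructions, 3 stalls after each of the first two
    print(pipelined_time(3))             # 119 ns with forwarding (no stalls)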

4. How many total bits are required for a direct mapped cache with four byte blocks, if the cache contains 64 Kbytes of data and the virtual address is 32 bits? Assume that byte addressing is used.

With four byte blocks, the 64 Kbyte cache contains a total of 64K/4 = 16K blocks. The cache has the following parameters:
(a) Byte select size: 2 bits (since 2^2 = 4 bytes/block)
(b) Cache index size: 14 bits (since 2^14 = 16K blocks)
(c) Cache tag size: 16 bits (the remaining bits from 32)
(d) Block size: 32 bits (4 bytes)
(e) Number of blocks: 16K blocks (cache size/block size)

Assuming that the cache has a valid bit, a cache tag, and data for each block, the size of the cache is:

cache bits = number of blocks x (block size + tag size + 1) = 2^14 x (32 + 16 + 1) = 802,816 bits

How many bits are required if the block size is 32 bytes?

With 32 byte blocks, the 64 Kbyte cache contains a total of 64K/32 = 2K blocks. The cache has the following parameters:
(a) Byte select size: 5 bits (since 2^5 = 32 bytes/block)
(b) Cache index size: 11 bits (since 2^11 = 2K blocks)
(c) Cache tag size: 16 bits (the remaining bits from 32)
(d) Block size: 256 bits (32 bytes)
(e) Number of blocks: 2K blocks (cache size/block size)

cache bits = number of blocks x (block size + tag size + 1) = 2^11 x (256 + 16 + 1) = 559,104 bits

Note: Increasing the block size tends to decrease the number of bits needed to implement a cache that holds a given number of bytes of data.
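The cache sizing arithmetic can be checked with a small Python helper (a sketch, not part of the original solution):

    def direct_mapped_cache_bits(cache_bytes, block_bytes, addr_bits=32):
        # Total storage for a direct mapped cache: each block holds
        # 1 valid bit + tag + data.
        n_blocks = cache_bytes // block_bytes
        byte_select_bits = block_bytes.bit_length() - 1    # log2(bytes per block)
        index_bits = n_blocks.bit_length() - 1              # log2(number of blocks)
        tag_bits = addr_bits - index_bits - byte_select_bits
        return n_blocks * (block_bytes * 8 + tag_bits + 1)

    print(direct_mapped_cache_bits(64 * 1024, 4))    # 802816 bits with 4-byte blocks
    print(direct_mapped_cache_bits(64 * 1024, 32))   # 559104 bits with 32-byte blocks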

5. Answer true or false for each of the following:
(a) Most computers use direct mapped page tables. (F)
(b) Increasing the block size of a cache is likely to take advantage of temporal locality. (F)
(c) Increasing the page size tends to decrease the size of the page table. (T)
(d) Virtual memory typically uses a write-back strategy, rather than a write-through strategy. (T)

6. The average memory access time (AMAT) is defined as

AMAT = hit_time + miss_rate x miss_penalty

Find the AMAT of a 100 MHz machine with a miss penalty of 20 cycles, a hit time of 2 cycles, and a miss rate of 5%.

Since the clock rate is 100 MHz, the cycle time is 1/(100 MHz) = 10 ns, which gives

AMAT = 10 ns x (2 + 20 x 0.05) = 30 ns

Note: Here we needed to multiply by the cycle time because the hit_time and miss_penalty were given in cycles.

Suppose doubling the size of the cache decreases the miss rate to 3%, but causes the hit time to increase to 3 cycles and the miss penalty to increase to 21 cycles. What is the AMAT of the new machine?

For the new machine: AMAT = 10 ns x (3 + 21 x 0.03) = 36.3 ns

Which cache would you select? The original cache, since its AMAT (30 ns) is lower than that of the larger cache (36.3 ns).

7. Show the design of a 2-way set associative translation look-aside buffer (TLB) with 32 entries. Assume the virtual address is 32 bits, the page offset is 12 bits, and the physical address is 20 bits. How many bits are required to implement the TLB if each table entry has a physical page number, a tag, a valid bit, and three access bits?

Since the page offset is 12 bits, the remaining 20 bits of the 32 bit address make up the virtual page number. The TLB has the following parameters:
(1) Number of sets: 16 sets (since (no. of entries)/(entries per set) = 32/2)
(2) TLB index size: 4 bits (since 2^4 = 16 sets)
(3) TLB tag size: 16 bits (the remaining bits of the virtual address: 32 - 12 - 4)
(4) Block size: 8 bits (the remaining bits of the physical address: 20 - 12)
(5) Number of blocks: 32 blocks (the number of TLB entries)

TLB bits = number of blocks x (block size + tag size + 4) = 32 x (8 + 16 + 4) = 896 bits
(the 4 extra bits per entry are the valid bit plus the three access bits)
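The AMAT comparison in problem 6 and the TLB bit count in problem 7 can be reproduced with a short sketch (not part of the original solutions):

    def amat_ns(hit_cycles, miss_rate, miss_penalty_cycles, clock_mhz):
        cycle_ns = 1000 / clock_mhz          # 100 MHz -> 10 ns per cycle
        return cycle_ns * (hit_cycles + miss_rate * miss_penalty_cycles)

    print(amat_ns(2, 0.05, 20, 100))         # 30.0 ns for the original cache
    print(amat_ns(3, 0.03, 21, 100))         # ~36.3 ns for the doubled cache

    # TLB storage from problem 7: 32 entries, each with an 8-bit physical
    # page number, a 16-bit tag, a valid bit, and three access bits.
    print(32 * (8 + 16 + 1 + 3))             # 896 bits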


8. Modify the single cycle datapath shown in Figure 5.33 on page 307 so that it implements the jal instruction. Add any control signals and datapath elements that are needed.
(a) Change the mux controlled by RegDst to a 3-input mux, with 31 as the third input.
(b) Change the mux controlled by MemtoReg to a 3-input mux, using the PC as the third input.

9. Show how the table in Figure 5.23 on page 296 would need to be modified to implement the jal instruction.

To add this instruction, RegDst and MemtoReg both should be widened to two bits, and an additional line should be added for jal. After these changes, the table looks like:

Instruction control bits:

Signal    | r-type | lw | sw | beq | jump | jal
----------+--------+----+----+-----+------+-----
RegDst    |   01   | 00 | xx | xx  |  xx  | 10
ALUSrc    |   0    | 1  | 1  | 0   |  x   | xx
MemtoReg  |   00   | 01 | xx | xx  |  xx  | 10
RegWrite  |   1    | 1  | 0  | 0   |  0   | 1
MemRead   |   0    | 1  | 0  | 0   |  0   | 0
MemWrite  |   0    | 0  | 1  | 0   |  0   | 0
Branch    |   0    | 0  | 0  | 1   |  0   | 0
ALUOp[1]  |   1    | 0  | 0  | 0   |  x   | x
ALUOp[0]  |   0    | 0  | 0  | 1   |  x   | x
Jump      |   0    | 0  | 0  | 0   |  1   | 1

Note: This table is a sideways view of Figure 5.23, in order to fit the entire figure on the page. Also, the jump instruction has been included.

10. Repeat Problem 8 for the multiple cycle datapath shown in Figure 5.39 on page 323. One solution for this is:
(a) Change the mux controlled by RegDst to a 3-input mux, with 31 as the third input.
(b) Change the mux controlled by MemtoReg to a 3-input mux, using the PC as the third input.

Note: You should also be able to show the RTL, control signals, and finite state machine diagram for a given instruction on the multiple cycle datapath.
***************************

You have only 75 minutes to complete the exam, so budget your time so that you will be able to attempt all sections (8 problem sections on 12 pages).

1 Short Answer Questions [5 points each]

1. What are the main advantages of a single-cycle implementation over a multi-cycle implementation?

Circle all that apply.
(a) Simpler control
(b) Less hardware
(c) Lower CPI (for a given program)
(d) More hardware sharing
(e) Fewer data hazards

The only answer here is (a). 1 point was taken off for incorrectly circling b, c, d, or e, and 1 point was taken off for not circling a.

2. What are the main functions of Cause and EPC, respectively, in exception handling in MIPS?

Cause: a register used to record the cause of the exception. (2 pt)
EPC: a register used to hold the address of the instruction which caused the exception. (3 pt)

3. Suppose a branch is taken on average once out of 10 times. Give the number of mispredictions for both (1) a single-bit dynamic branch prediction scheme and (2) an "assume branch not taken" static branch prediction scheme.

(1) 2 out of 10
(2) 1 out of 10
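The 1-bit dynamic predictor count can be checked with a short simulation (a sketch, not part of the original answer; the pattern of nine not-taken branches followed by one taken branch is an assumed example of "taken once out of 10 times"):

    # One-bit dynamic predictor: predict the same outcome as the previous execution.
    pattern = [False] * 9 + [True]      # nine not-taken, then one taken
    state = False                       # predictor starts out predicting not-taken
    mispredictions = 0
    for outcome in pattern * 100:       # repeat the pattern to reach steady state
        if outcome != state:
            mispredictions += 1
        state = outcome                 # remember only the last outcome
    print(mispredictions / 100)         # ~2 mispredictions per 10 branches

    # A static "assume branch not taken" scheme only misses the taken branch: 1 out of 10.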

4. Suppose we have a 6-stage pipeline. When we run a program of 100 instructions, suppose there are 20 instructions which each incur 1 stall cycle. What would be the CPI in this execution? Be sure to count the exact number of clock cycles used.

Execution time = 100 + (6 - 1) + 20 = 125 cycles
CPI = 125/100 = 1.25

5. What is a structural hazard? Define it using one sentence. Briefly, how can we resolve a structural hazard?

A structural hazard is a resource conflict: more than one instruction tries to use the same hardware resource in the same cycle. Solution: add more resources or serialize the computation.
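The cycle count above follows the usual pipeline formula; a tiny sketch (not from the original answer):

    def pipeline_cpi(instructions, stages, stall_cycles):
        # total cycles = instructions + (stages - 1) fill cycles + stall cycles
        cycles = instructions + (stages - 1) + stall_cycles
        return cycles, cycles / instructions

    print(pipeline_cpi(100, 6, 20))   # (125, 1.25)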

2 Single Cycle Implementation (20 points) [10 points each]

1. Considering the single-cycle datapath shown below, draw paths to show the flow of both data and the PC (thereby determining the control settings) for the sw instruction. Please be careful not to make any extra marks on the diagrams. Also determine all the values of the control signals. Assume that all the signals are active high; in other words, when they are asserted, they have a value of '1'. Use 'x' to indicate a "don't care" condition. When a signal is a don't care, you should say so, otherwise it may be marked wrong.

Signal Name | Effect when deasserted (0)               | Effect when asserted (1)
------------+------------------------------------------+------------------------------------------
RegDst      | The register destination number for the  | The register destination number for the
            | Write register comes from bits [20-16]   | Write register comes from bits [15-11]
RegWrite    | None                                     | The register on the Write register input
            |                                          | is written
ALUSrc      | Select the second input                  | Select the first input
PCSrc       | Select the second input                  | Select the first input
MemRead     | None                                     | Data memory contents designated by the
            |                                          | address are put on the Read data output
MemWrite    | None                                     | Data memory contents designated by the
            |                                          | address are replaced by the value on the
            |                                          | Write data input
MemToReg    | Select the second data input             | Select the first data input
Branch      | The instruction is not a branch          | The instruction is a branch

sw $8, 1234($7)

Pipelining

http://6004.csail.mit.edu/Fall01/

Problem 1. Consider the following combinational encryption device constructed from six modules:

The device takes an integer value, X, and computes an encrypted version C(X). In the diagram above, each combinational component is marked with its propagation delay in seconds; contamination delays are zero for each component.
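Since the figure is not reproduced here, the module names, delays, and wiring below are made-up placeholders; the sketch only illustrates how the propagation delay of a composition of combinational modules is found, as the longest input-to-output path through the module graph:

    from functools import lru_cache

    delays = {"A": 3, "B": 2, "C": 4, "D": 1, "E": 5, "F": 2}    # seconds (hypothetical)
    inputs_of = {"A": [], "B": [], "C": ["A"], "D": ["A", "B"],  # hypothetical wiring
                 "E": ["C", "D"], "F": ["E"]}                    # F drives the output

    @lru_cache(maxsize=None)
    def arrival(module):
        # Worst-case time at which this module's output becomes valid.
        preds = inputs_of[module]
        start = max((arrival(p) for p in preds), default=0)
        return start + delays[module]

    print(arrival("F"))   # propagation delay of the whole device: 3 + 4 + 5 + 2 = 14 s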
