Course Objectives
- To evaluate the issues involved in choosing and designing an instruction set.
- To learn the concepts behind advanced pipelining techniques.
- To understand the "hitting the memory wall" problem and the current state of the art in memory system design.
- To understand the qualitative and quantitative tradeoffs in the design of modern computer systems.
[Figure: the scope of computer architecture, spanning VLSI, the memory hierarchy (L1 cache, L2 cache, DRAM), the instruction set architecture, processor-memory-switch organization, and interconnection networks (network interfaces; topologies, routing, bandwidth, latency, reliability)]
Architecture is an iterative process: searching the space of possible designs, at all levels of computer systems.

[Figure: the design cycle: creativity proposes designs; cost/performance analysis sorts them into good, mediocre, and bad ideas]
OS requirements: address space issues, memory management, protection.
Conformance to standards: languages, OS, networks, I/O, IEEE floating point.
Computer classes (2002): powerful PCs and SMP workstations, networks of SMP workstations, mainframes, supercomputers, and embedded computers.
Time to graduate: BS (4 yrs), MS (2 yrs), PhD (5 yrs).
Cost of Microprocessors
Die_yield = Wafer_yield × (1 + (Defects_per_unit_area × Die_area) / α)^(−α)

where α is a process-dependent parameter (typically 3-4).
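The yield model can be turned into a rough die-cost estimate. The sketch below is a minimal illustration of the standard dies-per-wafer and die-yield formulas; the wafer cost, wafer size, defect density, and α value are made-up example numbers, not data from the course.

```python
import math

# Illustrative sketch of the classic die-yield/cost model.
# All parameter values are assumptions chosen for the example.

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Approximate whole dies per wafer: wafer area / die area,
    minus a correction for partial dies lost at the wafer edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return int(wafer_area / die_area_cm2 - edge_loss)

def die_yield(defects_per_cm2, die_area_cm2, alpha=3.0):
    """Die yield = (1 + (defect density * die area) / alpha) ** -alpha."""
    return (1 + defects_per_cm2 * die_area_cm2 / alpha) ** -alpha

def cost_per_die(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2):
    """Wafer cost spread over the expected number of good dies."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2))
    return wafer_cost / good_dies

# Example: a 30 cm wafer costing $5000, 1 cm^2 dies, 0.4 defects/cm^2
print(round(cost_per_die(5000, 30, 1.0, 0.4), 2))
```

Note how cost grows faster than linearly in die area: a bigger die means both fewer dies per wafer and a lower yield.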
Technology Trends
- Implement next-generation systems
- Simulate new designs and organizations
Workloads
Plane              Flight time   Speed      Passengers   Throughput (passenger-mph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200

Speed (latency): the Concorde wins. Throughput: the Boeing 747 wins. Cost is also an important parameter in the equation, which is why Concordes are being put out to pasture!
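The latency-vs-throughput contrast can be checked with a few lines of arithmetic, using the classic textbook figures for the two aircraft:

```python
# Latency vs. throughput with the aircraft example.
# Throughput here = passengers * speed (passenger-miles per hour).

planes = {
    # name: (speed_mph, passengers)
    "Boeing 747": (610, 470),
    "Concorde":   (1350, 132),
}

for name, (speed, passengers) in planes.items():
    print(name, "throughput =", speed * passengers, "passenger-mph")

# The Concorde wins on latency (raw speed)...
speed_ratio = 1350 / 610            # ~2.2x faster
# ...but the 747 wins on throughput (work done per hour).
throughput_ratio = (610 * 470) / (1350 * 132)   # ~1.6x higher
```

The same distinction applies to processors: a faster clock lowers latency for one task, while more parallel capacity raises throughput across many tasks.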
Measurement Tools
- Benchmarks, traces, mixes
- Hardware: cost, delay, area, power estimation
- Simulation (many levels): ISA, RT, gate, circuit
- Queuing theory
- Rules of thumb
- Fundamental laws/principles
Understanding the limitations of any measurement tool is crucial.
Metrics of Performance
Each level of the system has its own natural metrics:
- Application / programming language / compiler
- ISA: millions of instructions per second (MIPS); millions of FP operations per second (MFLOP/s)
- Below the ISA: megabytes per second; cycles per second (clock rate)
I/O level
SPEC92 spreadsheet program (sp): companies noticed that the output was always written to a file and then expunged at the end (which was not measured), so they stored the results in a memory buffer instead. One company eliminated the I/O altogether.
After putting in a blazing performance on the benchmark test, Sun issued a glowing press release claiming that it had outperformed Windows NT systems on the test. Pendragon president Ivan Phillips cried foul, saying the results weren't representative of real-world Java performance and that Sun had gone so far as to duplicate the test's code within Sun's Just-In-Time compiler. That's cheating, says Phillips, who claims that benchmark tests and real-world applications aren't the same thing. Did Sun issue a denial or a mea culpa? Initially, Sun neither denied optimizing for the benchmark test nor apologized for it. "If the test results are not representative of real-world Java applications, then that's a problem with the benchmark," Sun's Brian Croll said. After taking a beating in the press, though, Sun retreated and issued an apology for the optimization. [Excerpted from PC Online, 1997]
SPEC95: benchmarks useful for about 3 years; a single flag setting for all programs (SPECint_base95, SPECfp_base95). SPEC CPU2000: 11 integer benchmarks (CINT2000) and 14 floating-point benchmarks (CFP2000).
Performance Evaluation
For better or worse, benchmarks shape a field. Good products are created when you have:
- Good benchmarks
- Good ways to summarize performance
Since sales are partly a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary. If the benchmarks or summary are inadequate, a company must choose between improving its product for real programs and improving it to get more sales; sales almost always win! Execution time is the measure of computer performance!
Simulations
When are simulations useful?
What are their limitations, i.e., what real-world phenomena do they not account for?
The larger the simulation trace, the less tractable the post-processing analysis.
Queueing Theory
What are the distributions of arrival rates and values for other parameters?
Are they realistic?
Amdahl's Law
Speedup due to enhancement E:
Speedup(E) = ExTime(without E) / ExTime(with E) = Performance(with E) / Performance(without E)
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected
Amdahl's Law

ExTime_new = ExTime_old × [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]

Speedup_overall = ExTime_old / ExTime_new
                = 1 / [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]
Amdahl's Law

Floating-point instructions are improved to run 2× faster, but only 10% of the instructions executed are FP:

ExTime_new = ExTime_old × (0.9 + 0.1 / 2) = 0.95 × ExTime_old

Speedup_overall = 1 / 0.95 ≈ 1.053
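The worked example generalizes to a one-line function; a minimal sketch of Amdahl's Law:

```python
# Amdahl's Law: overall speedup when a fraction F of execution time
# is accelerated by a factor S; the rest is unaffected.

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    return 1.0 / ((1 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

# The FP example above: 10% of the work, made 2x faster.
print(amdahl_speedup(0.10, 2.0))    # ~1.053

# Even an infinite speedup on 10% of the work caps the overall gain:
print(amdahl_speedup(0.10, 1e12))   # ~1.111
```

The second call illustrates the law's main lesson: the unenhanced fraction bounds the achievable speedup, so optimize the common case.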
CPU time = Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle)

Which design level affects which term:

               Inst Count   CPI    Clock Rate
Program            X
Compiler           X        (X)
Inst. Set.         X         X
Organization                 X         X
Technology                             X
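A quick numeric instance of the CPU-time equation; the instruction count, CPI, and clock rate below are illustrative values, not course data:

```python
# CPU-time "iron law": time = instruction count * CPI * clock period.

def cpu_time(instruction_count, cpi, clock_rate_hz):
    return instruction_count * cpi / clock_rate_hz

# Example: 1 billion instructions, CPI of 1.5, 2 GHz clock.
t = cpu_time(1e9, 1.5, 2e9)
print(t, "seconds")   # 0.75 seconds
```

Any of the three terms can be traded against the others, which is why comparing machines on clock rate alone is misleading.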
CPI

CPU time = Cycle_time × Σ_{i=1..n} (CPI_i × I_i)

CPI = Σ_{i=1..n} (CPI_i × F_i),  where F_i = I_i / Instruction Count  (the instruction frequency)
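The weighted-CPI sum can be computed directly from an instruction mix; the mix below is a made-up example, not data from the course:

```python
# Overall CPI as a frequency-weighted sum over instruction classes.

mix = {
    # class: (frequency F_i, cycles CPI_i)  -- illustrative numbers
    "ALU":    (0.50, 1),
    "Load":   (0.20, 2),
    "Store":  (0.10, 2),
    "Branch": (0.20, 3),
}

cpi = sum(f * c for f, c in mix.values())
print("CPI =", cpi)   # 0.5*1 + 0.2*2 + 0.1*2 + 0.2*3 = 1.7
```

The weighting shows why frequent instruction classes dominate: shaving a cycle off ALU operations here helps more than halving the cost of stores.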
Chapter Summary, #1
Designing to Last through Trends
         Capacity        Speed
Logic    2× in 3 years   2× in 3 years
DRAM     4× in 3 years   2× in 10 years
Disk     4× in 3 years   2× in 10 years

6 yrs to graduate => 16× CPU speed, DRAM/disk size. Key measures: execution time, response time, latency.
Chapter Summary, #2

Amdahl's Law:

Speedup_overall = ExTime_old / ExTime_new
                = 1 / [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]

CPI Law:

CPU time = Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle)

Execution time is the REAL measure of computer performance! Good products are created when you have good benchmarks and good ways to summarize performance.