
Keynote Address:
Multicore, Meganonsense, and the
Future, if We Get It Right
Brought to you by:


Multi-core, Mega-nonsense, and
What could be available before long
If we get it right

Yale Patt
The University of Texas at Austin

EE Times: Making Multicore Work for You


March 24, 2011


Copyright 2011, Yale N. Patt


The Conference: Making Multicore Work for You

At least three questions:

• What is Multicore?
• Who is You?
• What is Work?


Problem

Algorithm

Program

ISA (Instruction Set Arch)

Microarchitecture

Circuits

Electrons


What I want to do today

• Multi-core
– How we got to where we are
– Some of the not unexpected Mega-nonsense

• Who are the users


– Those who want to cure cancer
– Those who just want to be productive

• How do we satisfy both


Multi-core: How we got to where we are


(i.e., Moore’s Law)

• The first microprocessor (Intel 4004), 1971


– 2300 transistors
– 106 KHz

• The Pentium chip, 1992
– 3.1 million transistors
– 66 MHz

• Today
– more than one billion transistors
– Frequencies in excess of 5 GHz

• Tomorrow ?
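
As a quick sanity check of the doubling behind these numbers (my arithmetic, not the slide's), assume the transistor count doubles roughly every two years:

\[
2300 \times 2^{(2011-1971)/2} = 2300 \times 2^{20} \approx 2.4 \times 10^{9},
\]

which is consistent with the "more than one billion transistors" available today.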


How have we used the available transistors?

[Figure: Number of Transistors (y-axis) vs. Time (x-axis), showing how much of the growing transistor budget has gone to Cache versus the Microprocessor core]


Multi-core: How we got to where we are

• In the beginning: a better and better uniprocessor


– improving performance on the hard problems
– …until it just got too hard

• Followed by: a uniprocessor with a bigger L2 cache


– forsaking further improvement on the “hard” problems
– utilizing chip area sub-optimally

• Today: adding more and more cores


– dual core, quad core, octo core, Larrabee (32 cores)
– Why: It was the obvious, most cost-effective next step

• Tomorrow: ???


So, What’s the Point

• Yes, Multi-core is a reality
– We will have 1000 core chips

• No, it wasn't a technological solution to performance improvement

• Ergo, we do not have to accept them as they are

• i.e., we can get it right the second time, and that means:
What goes on the chip
What are the interfaces


First we need to disabuse ourselves of a lot of Mega-nonsense

• Multi-core was a solution to a performance problem


• Hardware works sequentially
• Make the hardware simple – thousands of cores
• ILP is dead
• Abstraction is a pure good
• All programmers need to be protected
• Thinking in parallel is hard


The Asymmetric Chip Multiprocessor (ACMP)


[Figure: three ways to spend the same chip area, compared side by side – the "Tile-Large" approach (a few large cores), the "Niagara" approach (many small, Niagara-like cores), and the ACMP approach (a large core plus many small, Niagara-like cores on one chip)]


Heterogeneous, not Homogeneous

• Since 2002 (The Robert Chien lecture at Illinois)


– Pentium X / Niagara Y

• More recently ACMP


– First published as a UT Tech Report in February, 2007
– Later: PhD dissertation of M. Aater Suleman
– In between:
• ISCA 2010 (Data Marshalling)
• ASPLOS (ACS – Handling critical sections)


Large core vs. Small Core

Large Core:
• Out-of-order
• Wide fetch, e.g. 4-wide
• Deeper pipeline
• Aggressive branch predictor (e.g. hybrid)
• Many functional units
• Trace cache
• Memory dependence speculation

Small Core:
• In-order
• Narrow fetch, e.g. 2-wide
• Shallow pipeline
• Simple branch predictor (e.g. Gshare)
• Few functional units


Throughput vs. Serial Performance

[Figure: Speedup vs. 1 Large Core (y-axis, 0 to 9) as a function of Degree of Parallelism (x-axis, 0 to 1), comparing the Niagara, Tile-Large, and ACMP organizations]
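
To make the shape of this plot concrete, here is a minimal Amdahl-style sketch in C++ (my own illustration, not the model behind the slide's data). Assumptions: a chip area budget of 16 small-core equivalents, and a large core that costs the area of four small cores while running serial code about twice as fast (roughly Pollack's rule).

```cpp
#include <cstdio>

// Execution time under a simple Amdahl-style model: the serial fraction
// (1 - f) runs on the chip's best single core, and the parallel fraction f
// uses the chip's aggregate throughput. Performance is in small-core units.
static double exec_time(double f, double serial_perf, double parallel_perf) {
    return (1.0 - f) / serial_perf + f / parallel_perf;
}

int main() {
    const double area = 16.0;  // assumed budget: 16 small-core equivalents
    for (int i = 0; i <= 10; ++i) {
        double f = i / 10.0;  // degree of parallelism
        double base    = exec_time(f, 2.0, 2.0);                       // 1 large core
        double niagara = base / exec_time(f, 1.0, area);               // 16 small cores
        double tile    = base / exec_time(f, 2.0, 2.0 * area / 4.0);   // 4 large cores
        double acmp    = base / exec_time(f, 2.0, 2.0 + (area - 4.0)); // 1 large + 12 small
        printf("f=%.1f  Niagara=%4.2f  Tile-Large=%4.2f  ACMP=%4.2f\n",
               f, niagara, tile, acmp);
    }
    return 0;
}
```

Even this crude model reproduces the slide's point: the ACMP matches the Tile-Large chip on mostly-serial work and approaches the Niagara chip on highly parallel work, so one heterogeneous chip serves both ends of the curve.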


ILP is dead

• We double the number of transistors on the chip


– Pentium M: 77 Million transistors (50M for the L2 cache)
– 2nd Generation: 140 Million (110M for the L2 cache)
• We see 5% improvement in IPC
• Ergo: ILP is dead!
• Perhaps we have blamed the wrong culprit
– (the doubling went almost entirely to the L2 cache; the non-cache transistors grew only from ~27M to ~30M)

• The EV4,5,6,7,8 data: from EV4 to EV8:


– Performance improvement: 55X
– Performance from frequency: 7X
– Ergo: 55/7 > 7 -- more than half due to microarchitecture
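
Spelling out the arithmetic in that last bullet, with the total speedup modeled as the product of a frequency factor and a microarchitecture factor:

\[
\text{speedup}_{\text{total}} = \text{speedup}_{\text{frequency}} \times \text{speedup}_{\mu\text{arch}}
\;\Longrightarrow\;
\text{speedup}_{\mu\text{arch}} = \frac{55}{7} \approx 7.9 > 7 ,
\]

so the microarchitecture contributed a larger factor than frequency did, i.e., more than half of the EV4-to-EV8 improvement in the multiplicative sense.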


Moore’s Law

• A law of physics
• A law of process technology
• A law of microarchitecture
• A law of psychology


Can Abstractions lead to trouble?

• Taxi to the airport

• The Scheme chip

• Sorting

• Microsoft developers (Deeper understanding)




Thinking in Parallel is Hard?

• Perhaps: Thinking is Hard

• How do we get people to believe:


Thinking in parallel is natural



Who is you? – People with Problems to Solve

• Those who need the utmost in performance


– and are willing to trade ease of use for extra performance
– For them, we need to provide a more powerful engine
(Heavyweight cores for ILP, Accelerators for specific tasks)

• Those who just need to get their job done


– and are content with trading performance for ease of use
– For them, we need a software layer to bridge the multi-core



What is work? – Providing capability for both


(Harnessing the billion+ transistors on each chip)

• In the old days, just add “stuff”


– Everything was transparent to the software
– Performance came because the engine was more powerful
• Branch Prediction, out-of-order Execution
• Larger caches
– But lately, the “stuff” has been more cores,
which is no longer transparent to the software

• Today, Bandwidth and Energy can kill you

• Tomorrow, both will be worse…

• Unless we find a better paradigm


In my opinion, the better paradigm requires exploiting:

-- The transformation hierarchy


-- Parallel programming


Problem

Algorithm

Program

ISA (Instruction Set Arch)

Microarchitecture

Circuits

Electrons


We need to break the Layers

• (We already have in limited cases)

• Pragmas in the Language (see the sketch after this list)

• The Refrigerator

• X + Superscalar

• The algorithm, the language, the compiler,
& the microarchitecture all working together
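
As one concrete illustration of the "Pragmas in the Language" bullet above (a hypothetical snippet, not from the talk; OpenMP is used only as a familiar example of a language-level pragma): a single line of annotation is the programmer telling the compiler and the cores something they cannot easily discover on their own, namely that the iterations are independent and combine via a reduction.

```cpp
#include <omp.h>
#include <cstdio>

int main() {
    const int N = 1 << 20;
    static double a[1 << 20];   // static so the 8 MB array is not on the stack
    for (int i = 0; i < N; ++i) a[i] = 1.0 / (i + 1);

    double sum = 0.0;
    // The pragma carries algorithm-level knowledge down through the layers:
    // "these iterations are independent, and their results add up."
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < N; ++i)
        sum += a[i];

    printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```

(Build with OpenMP enabled, e.g. -fopenmp on gcc/clang.)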


There are plenty of opportunities

• Compiler, Microarchitecture
– Multiple levels of cache
– Block-structured ISA
– Part by compiler, part by uarch
– Fast track, slow track

• Algorithm, Compiler, Microarchitecture


– X + superscalar – the Refrigerator
– Niagara X / Pentium Y

• Microarchitecture, Circuits
– Verification Hooks
– Internal fault tolerance


The problem:

• We train computer people to work within their layer


• Too few understand anything outside their layer

and, as to multiple cores:

• People think sequential


What can we do about it?


The answer is Education

• FIRST, Do not accept the premise:


Parallel programming is hard

• SECOND, Do not accept the premise:


It is okay to know only one layer


Parallel Programming is Hard?

• What if we start teaching parallel thinking in the first course to freshmen

• For example:
– Factorial
– Streaming
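
A minimal sketch of how the factorial example above might look (the decomposition is the standard one; the thread count and helper names are my own illustration, not the course's actual exercise): n! is a product of independent ranges, so each core computes a partial product and the pieces are multiplied together at the end.

```cpp
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

// Each worker multiplies its own contiguous range [lo, hi].
static void partial_product(uint64_t lo, uint64_t hi, uint64_t* out) {
    uint64_t p = 1;
    for (uint64_t i = lo; i <= hi; ++i) p *= i;
    *out = p;
}

// Parallel n!: split 1..n across nthreads independent partial products,
// then combine them. (Like any fixed-width factorial, this overflows past 20!.)
static uint64_t parallel_factorial(uint64_t n, unsigned nthreads) {
    std::vector<uint64_t> partial(nthreads, 1);
    std::vector<std::thread> workers;
    uint64_t chunk = n / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        uint64_t lo = t * chunk + 1;
        uint64_t hi = (t == nthreads - 1) ? n : (t + 1) * chunk;
        workers.emplace_back(partial_product, lo, hi, &partial[t]);
    }
    for (auto& w : workers) w.join();

    uint64_t result = 1;
    for (uint64_t p : partial) result *= p;
    return result;
}

int main() {
    printf("20! = %llu\n", (unsigned long long)parallel_factorial(20, 4));
    return 0;
}
```

The point for a first-semester student is that nothing in the problem forces a sequential order; the familiar sequential loop is just one of many ways to schedule the same multiplications.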



Students can understand more than one layer

• What if we get rid of “Objects” FIRST


– Students do not get it – they have no underpinnings
– Objects are too high a level of abstraction
– So, students end up memorizing
– Memorizing isn’t learning (and certainly not understanding)

• What if we START with "motivated" bottom up
– Students build on what they already know
– Memorizing is replaced with real learning
– Continually raising the level of abstraction

• The student sees the layers from the start


– The student makes the connections
– The student understands what is going on
– The layers get broken naturally


The Bottom Line:

• IF we are willing to continue to pursue ILP


– Because high performance tasks can not do without it

• IF we are willing to break the layers
– Because we don’t have a choice anymore

• IF we are willing to embrace parallel programming

• IF we are willing to accept more than one interface


– One for those who can exploit details of the hardware
– Another for those who can not


Most importantly,

• IF we are willing to understand more than our own layer of the abstraction hierarchy,
so we really can talk to each other

• That means
– Chip designers understanding what algorithms need
– Algorithms folks understanding what hardware provides
– Knowledgeable Software folks bridging the two interfaces


Then the future multicore can work for all of us


Thank you!


Keynote Address:
Multicore, Meganonsense,
and the Future, if We Get It Right

Brought to you by:
