Heterogeneous Parallel Programming

Lecture 1.1: Introduction to Heterogeneous Parallel Computing
Wen-mei Hwu
University of Illinois at Urbana-Champaign

Heterogeneous Parallel Computing

Use the best match for the job (heterogeneity in mobile SoC)

Figure labels: Cloud Services; Latency Cores; Throughput Cores; DSP Cores; Configurable Logic/Cores; HW IPs; On-chip Memories

(c) Wen-mei Hwu, Cool Chips
11/28/2012

Blue Waters Supercomputer

Cray System & Storage cabinets:         >300
Compute nodes:                          >25,000
Usable Storage Bandwidth:               >1 TB/s
System Memory:                          >1.5 Petabytes
Memory per core module:                 4 GB
Gemini Interconnect Topology:           3D Torus
Usable Storage:                         >25 Petabytes
Peak performance:                       >11.5 Petaflops
Number of AMD Interlagos processors:    >49,000
Number of AMD x86 core modules:         >380,000
Number of NVIDIA Kepler GPUs:           >3,000
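As a quick sanity check on the Blue Waters figures, the system memory should be roughly the number of x86 core modules times the memory per core module. The sketch below does that arithmetic in Python using the slide's lower bounds (>380,000 modules, 4 GB each). The function name and the decimal GB-to-PB conversion are assumptions made for illustration, and the result is only a rough estimate.

```python
def estimated_memory_pb(core_modules: int, gb_per_module: int) -> float:
    """Estimate total memory in petabytes (decimal: 1 PB = 1,000,000 GB)."""
    total_gb = core_modules * gb_per_module
    return total_gb / 1_000_000


# Lower bounds quoted on the slide; the real machine has somewhat more.
estimate = estimated_memory_pb(core_modules=380_000, gb_per_module=4)
print(f"Estimated system memory: about {estimate:.2f} PB")
```

The estimate lands just above 1.5 PB, which agrees with the ">1.5 Petabytes" system memory figure.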

CPU and GPU have very different design philosophies

CPU: Latency Oriented Cores
GPU: Throughput Oriented Cores

Figure labels: Chip; Core; Compute Unit; Cache/Local Mem; Local Cache; Registers; SIMD Unit; Control; Threading

CPUs: Latency Oriented Design

Large caches
  Convert long latency memory accesses to short latency cache accesses
Sophisticated control
  Branch prediction for reduced branch latency
  Data forwarding for reduced data latency
Powerful ALU
  Reduced operation latency

Figure labels: CPU; Control; ALU; Cache; DRAM

David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2012
ECE408/CS483, University of Illinois, Urbana-Champaign
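The effect of a large cache on latency can be illustrated with a simple average-access-time model: every access that hits pays the short cache latency, and every miss pays the long DRAM latency. This is a minimal Python sketch. The function name and the cycle counts (4 cycles for cache, 200 for DRAM) are hypothetical values chosen for illustration, not numbers from the lecture.

```python
def average_access_time(hit_rate: float, cache_latency: float,
                        dram_latency: float) -> float:
    """Expected latency of one memory access, in cycles.

    Simplified model: a hit costs cache_latency, a miss costs dram_latency.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_latency + (1.0 - hit_rate) * dram_latency


# Hypothetical latencies: 4 cycles for a cache hit, 200 cycles for DRAM.
for hit_rate in (0.0, 0.5, 0.9, 0.99):
    cycles = average_access_time(hit_rate, cache_latency=4, dram_latency=200)
    print(f"hit rate {hit_rate:.2f}: {cycles:.1f} cycles on average")
```

In this model, a larger cache raises the hit rate and so pulls the average latency toward the cache latency, which is the point of the latency oriented design.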

GPUs: Throughput Oriented Design

Small caches
  To boost memory throughput
Simple control
  No branch prediction
  No data forwarding
Energy efficient ALUs
  Many, long latency but heavily pipelined for high throughput
Require massive number of threads to tolerate latencies

Figure labels: GPU; DRAM
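The need for a massive number of threads can be estimated with Little's law: the work in flight must be at least latency times issue rate, otherwise the pipeline sits idle while waiting. The Python sketch below is a rough illustration only. The function name, the parameter names, and the example numbers are assumptions and do not describe any particular GPU.

```python
import math


def threads_needed(latency_cycles: int, issues_per_cycle: int,
                   ops_in_flight_per_thread: int = 1) -> int:
    """Little's law: concurrency = latency * throughput.

    Returns how many threads are needed so that enough independent
    operations are in flight to hide the given latency.
    """
    ops_in_flight = latency_cycles * issues_per_cycle
    return math.ceil(ops_in_flight / ops_in_flight_per_thread)


# Hypothetical case: 400-cycle memory latency, 1 operation issued per cycle,
# and each thread can have only one outstanding operation.
print(threads_needed(latency_cycles=400, issues_per_cycle=1))
```

A latency oriented core attacks the latency term with caches and prediction, while a throughput oriented core leaves latency high and raises the thread count instead.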

Winning Applications Use Both CPU and GPU

CPUs for sequential parts where latency matters
  CPUs can be 10+X faster than GPUs for sequential code
GPUs for parallel parts where throughput wins
  GPUs can be 10+X faster than CPUs for parallel code
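To see why using both processors wins, consider a toy model in which a program has a sequential fraction and a parallel fraction, and each processor is 10X faster than the other on its preferred kind of code. The Python sketch below is a simplified illustration under stated assumptions: CPU speed is normalized to 1 for all work, data transfer and launch overheads are ignored, and the function name and the 10 percent sequential fraction are hypothetical.

```python
def run_times(seq_fraction: float, advantage: float = 10.0) -> dict:
    """Normalized run time of one program on three configurations.

    Assumptions: the CPU runs all work at speed 1; the GPU runs sequential
    code `advantage` times slower and parallel code `advantage` times faster.
    Data transfer between CPU and GPU is ignored.
    """
    par_fraction = 1.0 - seq_fraction
    return {
        "cpu_only": seq_fraction + par_fraction,
        "gpu_only": seq_fraction * advantage + par_fraction / advantage,
        "hetero": seq_fraction + par_fraction / advantage,
    }


# Hypothetical workload: 10% sequential, 90% parallel.
for name, t in run_times(seq_fraction=0.1).items():
    print(f"{name}: {t:.2f}")
```

In this model the GPU alone does no better than the CPU alone, because the slow sequential part dominates. The heterogeneous configuration is the fastest, since each part runs on the processor that suits it.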

Heterogeneous parallel computing is catching on.

Data Intensive Analytics
Financial Analysis
Scientific Simulation
Engineering Simulation
Digital Audio Processing
Digital Video Processing
Computer Vision
Biomedical Informatics
Statistical Modeling
Ray Tracing Rendering
Interactive Physics
Numerical Methods
Medical Imaging
Electronic Design Automation

280 submissions to GPU Computing Gems
90 articles included in two volumes

GPU computing is catching on.

CANDE 2011

TO LEARN MORE, READ CHAPTER 1
