Lecture 1.1
Introduction to Heterogeneous Parallel Computing
Wen-mei Hwu
University of Illinois at Urbana-Champaign
[Figure: block diagram. Labels: DSP Cores; Throughput Cores; Configurable Logic/Cores; HW IPs; On-chip Memories]
11/28/2012
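[Figure: system specification slide. Most value labels were lost in extraction. Surviving text, in original order: >25,000; >1 TB/s; System Memory:; >300; >1.5 Petabytes; 4 GB; 3D Torus; >25 Petabytes; >11.5 Petaflops; >49,000; >380,000; >3,000]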
[Figure: GPU organization diagram. Labels: GPU; Chip; Compute Unit; Core; Cache/Local Mem; SIMD Unit; Control; Registers; Threading; Local Cache]
CPU
  Powerful ALU
  Sophisticated control
    Branch prediction for reduced branch latency
    Data forwarding for reduced data latency
  [Diagram labels: Control; ALU; Cache; DRAM]

GPU
  Simple control
    No branch prediction
    No data forwarding
  [Diagram labels: DRAM]
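
The CPU/GPU contrast above can be illustrated with a toy timing model. This is only a sketch under stated assumptions: the core counts and per-task latencies are hypothetical numbers chosen for illustration, not figures from the lecture, and the helper name `time_to_finish` is invented for this example. It also assumes all tasks are independent and take the same time.

```python
import math


def time_to_finish(num_tasks, num_cores, latency_per_task):
    """Time for identical cores to finish independent, equal-sized tasks.

    Each core runs one task at a time, so the work proceeds in rounds of
    up to num_cores tasks, and each round costs latency_per_task.
    """
    rounds = math.ceil(num_tasks / num_cores)
    return rounds * latency_per_task


# Hypothetical machines (illustrative numbers only):
# a few fast cores versus many slow cores.
FAST_CORES, FAST_LATENCY = 4, 1      # 4 cores, 1 time unit per task
SLOW_CORES, SLOW_LATENCY = 256, 10   # 256 cores, 10 time units per task

for n in (1, 16, 10_000):
    fast = time_to_finish(n, FAST_CORES, FAST_LATENCY)
    slow = time_to_finish(n, SLOW_CORES, SLOW_LATENCY)
    print(f"{n:>6} tasks: fast cores {fast:>5}, slow cores {slow:>5}")
```

Under these assumed numbers, the few fast cores finish a single task sooner, while the many slow cores finish a large batch of independent tasks sooner. That is the trade-off behind spending chip area on sophisticated control for low latency versus on many simple ALUs for throughput.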
Financial Analysis
Scientific Simulation
Engineering Simulation
Digital Audio Processing
Digital Video Processing
Computer Vision
Biomedical Informatics
Statistical Modeling
Ray Tracing Rendering
Interactive Physics
Numerical Methods
Intensive Analytics
Medical Imaging
Electronic Design Automation
CANDE 2011