Académique Documents
Professionnel Documents
Culture Documents
Qualcomm Hexagon
DSP: An architecture
optimized for mobile
multimedia and
communications
1
Qualcomm Technologies, Inc. All Rights Reserved
Hexagon DSP processors in Snapdragon products
aDSP: Real-time
media & sensor
Snapdragon processing
800
Audio
Camera Krait Krait
Adreno Sensors
CPU CPU
Display GPU
JPEG Krait Krait
Hexagon Misc.
CPU CPU
Video aDSP Connectivity
Other
2MB L2
Hexagon
Fabric & Memory Controller Modem
mDSP
mDSP: Dedicated
LPDDR3 LPDDR3
modem processing
2
Qualcomm Technologies, Inc. All Rights Reserved
Expansion of Hexagon DSP use cases beyond audio
Image Enhancement
Camera, Still, Video
HexagonV4 based products
HexagonV2/V3
Computer Vision &
Augmented Reality
HexagonV4 based products
Video
HexagonV5 based products
Voice Audio
Sensors
HexagonV5 based products
V4L
V2 V3L 28nm
V1 45nm Nov Apr 2011
65nm 65nm
Dec 2007 2009
Oct 2006
Time
4
Qualcomm Technologies, Inc. All Rights Reserved
Key characteristics of
modem & multimedia applications
Requirements Characteristics
Require fixed real-time Mix of signal processing
performance level & control code
(fps, Mbit/sec, etc.) For modem, Qualcomm does not
Extremely aggressive use a split CPU/DSP architecture.
power & area targets All processing is done on Hexagon
DSP
Multimedia apps have significant
control in the RTOS & frameworks
Heavy L2$ misses
Multimedia is data intensive
Modem is code intensive
5
Qualcomm Technologies, Inc. All Rights Reserved
Hexagon DSP blends features targeted to modem &
multimedia
VLIW
Need multi-issue to
meet performance
Low complexity for
Area & Power
6
Qualcomm Technologies, Inc. All Rights Reserved
VLIW: Area & power efficient multi-issue
Dual 64-bit execution units
Variable sized Standard 8/16/32/64bit data
instruction packets Instruction types
(1 to 4 instructions Cache SIMD vectorized MPY / ALU
per Packet) / SHIFT, Permute, BitOps
Instruction Unit Up to 8 16b MAC/cycle
2 SP FMA/cycle
Device L2
DDR Cache
Memory
/ TCM
7
Qualcomm Technologies, Inc. All Rights Reserved
Maximizing the signal processing code work/packet
Example from inner loop of FFT: Executing 29 simple RISC ops in 1 cycle
8
Qualcomm Technologies, Inc. All Rights Reserved
Maximizing the control code work/packet
Hexagon DSP ISA improves control code efficiency Example C code
over traditional VLIW void example(int *ptr, int val) {
if (ptr!=0) {
*ptr = *ptr + val + 2;
}}
r1 = add(r1,r2) {
memw(r0) = r1 memw(r0) = r1.new
{ memw(r0) = r1 jumpr r31 jumpr r31
memw(r0) = r1
jumpr r31 } }
jumpr r31 }
}
Instr/Packet = Instr/Packet =
7 instr/5 packets = 1.4
7 instr/2packets = 3.5
9
Qualcomm Technologies, Inc. All Rights Reserved
High avg. instructions/packet for targeted use cases
Compound instructions count as 2
5
Average Instructions/VLIW Packet
4.5
4
3.5
3
2.5
2
1.5
1
0.5
0
Computer
Vision Video Imaging Control Audio
10
Source: Qualcomm internal measurements Qualcomm Technologies, Inc. All Rights Reserved
Programmers view of Hexagon DSP HW
multi-threading
Hexagon V5 includes three hardware threads
Architected to look like a multi-core with communication
through shared memory
D D X X D D X X D D X X L2
U U U U U U U U U U U U Cache /
TCM
Register File Register File Register File
11
Qualcomm Technologies, Inc. All Rights Reserved
Hexagon DSP V1-V4: Interleaved multi-threading
Simple round-robin thread scheduling
Number of threads match execution pipe depth
(three threads three execute stages)
All instructions complete before next packet dispatch
Compiler schedules for zero-latency which helps to increase
instructions/VLIW packet
12
Qualcomm Technologies, Inc. All Rights Reserved
Hexagon DSP V5: Dynamic HW multi-threading
Recover some performance when threads idle or stalled
Remove a thread from IMT rotation
On L2 cache misses
Coremarks/ Dhrystone
When in wait-for-interrupt or off MHz DMIPS/MHz
mode 8 5
Additional forwarding to support 7 4.5
2-cycle packets 6
4
3.5
5 3
13
Source: Qualcomm internal measurements Qualcomm Technologies, Inc. All Rights Reserved
Hexagon DSP instructions per cycle
5 Multi-Threaded Apps
4.5
Average Instructions / Cycle
4
3.5
3
2.5
Single-Threaded Apps
2 IPC_DMT
1.5 IPC_IMT
1
0.5
0
14
Source: Qualcomm internal measurements Qualcomm Technologies, Inc. All Rights Reserved
Qualcomm Hexagon DSP architecture
Highly efficient mobile application processordesigned for more
performance per MHz
20
18
DSP Performance per MHz
16
BDTImark2000/MHz
14
12
10
8
6
4
2
0
Mobile Competitor Qualcomm HXGN V4 (1 thread) Qualcomm HXGN V4 (3 threads)
16
Qualcomm Technologies, Inc. All Rights Reserved
MP3 playback power for competitive smartphones
Power
Lower is better
CPU Utilization (%) Detection Time (%) Total Device Power (%)
18
Source: Qualcomm internal measurements. * Power measured at the device battery Qualcomm Technologies, Inc. All Rights Reserved
Hexagon DSP power for different thread utilizations
Excellent near-linear power scalability
(as threads go idle, power used by the thread is nearly eliminated)
Achieved through optimized clock tree design & clock gating
Dhrystone Power, FIR Power,
IMT Mode IMT Mode
100% 100%
90% 90%
80% 80%
70% 70%
60% 60%
50% 50%
40% 40%
Actual Actual
30% Ideal 30% Ideal
20% 20%
10% 10%
0% 0%
19
Source: Qualcomm internal measurements Qualcomm Technologies, Inc. All Rights Reserved
Hexagon DSP Software Development
20
Qualcomm Technologies, Inc. All Rights Reserved
Independent Algorithm Developers on Hexagon DSP
21
Qualcomm Technologies, Inc. All Rights Reserved
Announcing the Hexagon DSP SDK
See the Hexagon DSP SDK in action at Uplinq2013 (www.uplinq.com)
23
Qualcomm Technologies, Inc. All Rights Reserved