What is GPGPU?
General-purpose computing on a Graphics Processing Unit
What is CUDA?
Compute Unified Device Architecture: a software architecture for managing data-parallel programming
Introduction
What is a GPU?
It is a processor optimized for 2D/3D graphics, video, visual computing, and display.
It is a highly parallel, highly multithreaded multiprocessor optimized for visual computing.
It provides real-time visual interaction with computed objects via graphics, images, and video.
It serves as both a programmable graphics processor and a scalable parallel computing platform.
Heterogeneous systems combine a GPU with a CPU.
GPU Evolution
1980s: No GPU. PCs used a VGA controller.
1990s: More functions added to the VGA controller.
1997: 3D acceleration functions: hardware for triangle setup and rasterization, texture mapping, shading.
2000: A single-chip graphics processor (the beginning of the term "GPU").
2005: Massively parallel programmable processors.
2007: CUDA (Compute Unified Device Architecture).
GPU Architecture
Processing Element
Memory Architecture
CPU
Fast caches
Branching adaptability
High performance
GPU
Multiple ALUs
Fast onboard memory
High throughput on parallel tasks
Executes a program on each fragment/vertex
CPUs are great for task parallelism.
GPUs are great for data parallelism.
GPUs also provide extensive support for the stream processing paradigm, which is related to SIMD (Single Instruction, Multiple Data) processing. Each processing unit on the GPU contains local memory that improves data manipulation and reduces fetch time.
What is CUDA
CUDA is a set of development tools for creating applications that execute on the GPU (Graphics Processing Unit).
The CUDA compiler uses a variation of C, with future support for C++.
CUDA was developed by NVIDIA and as such runs only on NVIDIA GPUs of the G8x series and up.
CUDA was released for PC on February 15, 2007, and a beta version for Mac OS X followed on August 19, 2008.
Why CUDA
CUDA makes it possible to use high-level languages such as C to develop applications that take advantage of the high performance and scalability that the GPU architecture offers.
GPUs allow the creation of a very large number of concurrently executing threads.
Software Requirements/Tools
CUDA Toolkit
Occupancy calculator
Visual profiler
Allocate the memory that will be used for the computation (variable declaration and allocation).
Read the data that we will compute on (input).
Specify the computation that will be performed.
Write the results to the appropriate device (output).
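The four steps above can be sketched in plain C. This is only a host-side illustration, not the code from the slides: the function name `double_all` and the doubling operation are made up for the example, and `malloc`/`memcpy` stand in for the GPU allocation and transfer calls (`cudaMalloc` and `cudaMemcpy` in the CUDA runtime).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: doubles n floats from `input` into `output`.
   Returns 0 on success, -1 if the working buffer cannot be allocated. */
int double_all(const float *input, float *output, size_t n)
{
    if (n == 0)
        return 0; /* nothing to do */
    assert(input != NULL && output != NULL);

    /* 1. Allocate memory for the computation. With CUDA this buffer would
          live in GPU memory; malloc stands in for that allocation here. */
    float *work = malloc(n * sizeof *work);
    if (work == NULL)
        return -1;

    /* 2. Read the input data into the working buffer
          (the CUDA analogue is a host-to-device copy). */
    memcpy(work, input, n * sizeof *work);

    /* 3. Specify the computation: each element is processed independently. */
    for (size_t i = 0; i < n; i++)
        work[i] *= 2.0f;

    /* 4. Write the results to their destination
          (the CUDA analogue is a device-to-host copy). */
    memcpy(output, work, n * sizeof *work);

    free(work);
    return 0;
}
```

Calling `double_all(in, out, 4)` with `in = {1, 2, 3, 4}` leaves `{2, 4, 6, 8}` in `out`.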
Initially:
Copy the content from the host's memory to the GPU card's memory.
The Kernel
It is necessary to write the code that will be executed in the stream processors of the GPU card.
That code, called the kernel, will be downloaded and executed, simultaneously and in lock-step fashion, in several (all?) stream processors of the GPU card.
How is every instance of the kernel going to know which piece of data it is working on?
In the GPU:
Processing Elements
Array Elements
Block 0 Block 1
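One common answer to the question above is that each kernel instance derives a global array index from its block number, the block size, and its thread number within the block. The following is a host-side sketch in plain C that emulates this mapping with ordinary loops. The names `add_one_instance` and `emulate_launch` are hypothetical, and the add-one operation is only a placeholder. In CUDA C the three index values are available inside the kernel as the built-in variables `blockIdx.x`, `blockDim.x`, and `threadIdx.x` rather than as parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body: one call plays the role of one kernel instance.
   The index arguments mirror blockIdx.x, blockDim.x and threadIdx.x. */
static void add_one_instance(float *data, size_t n, size_t block_idx,
                             size_t block_dim, size_t thread_idx)
{
    size_t i = block_idx * block_dim + thread_idx; /* global element index */
    if (i < n)           /* the last block may extend past the array end */
        data[i] += 1.0f;
}

/* Host-side stand-in for a kernel launch: visits every (block, thread)
   pair in turn. A GPU would run these instances concurrently. */
void emulate_launch(float *data, size_t n, size_t block_dim)
{
    assert(data != NULL && block_dim > 0);
    size_t num_blocks = (n + block_dim - 1) / block_dim; /* round up */
    for (size_t b = 0; b < num_blocks; b++)
        for (size_t t = 0; t < block_dim; t++)
            add_one_instance(data, n, b, block_dim, t);
}
```

The bounds check matters whenever the array length is not a multiple of the block size: with 6 elements and 4 threads per block, two blocks are launched and the last two instances do nothing.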
To compile:
nvcc simple.c simple.cu -o simple
The compiler generates the code for both the host and the GPU.
Demo on cuda.littlefe.net
With a minor modification to the code, we can print the blockIds and threadIds.
We will use two arrays instead of just one: one for the blockIds and one for the threadIds.
In the GPU:
Processing Elements
Thread 0 Thread 1 Thread 2 Thread 3 Thread 0 Thread 1 Thread 2 Thread 3
Array Elements
Block 0 Block 1
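The layout in the diagram can be reproduced with a small host-side sketch in plain C. It is an emulation under assumptions, not the modified kernel from the slides: the function name `record_ids` is hypothetical, and the nested loops stand in for the blocks and threads that a GPU would run concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Fills block_ids[i] and thread_ids[i] with the block and the thread that
   would handle element i when each block holds `block_dim` threads. */
void record_ids(int *block_ids, int *thread_ids, size_t n, size_t block_dim)
{
    assert(block_ids != NULL && thread_ids != NULL && block_dim > 0);
    size_t num_blocks = (n + block_dim - 1) / block_dim; /* round up */
    for (size_t b = 0; b < num_blocks; b++) {
        for (size_t t = 0; t < block_dim; t++) {
            size_t i = b * block_dim + t; /* global element index */
            if (i < n) {
                block_ids[i] = (int)b;
                thread_ids[i] = (int)t;
            }
        }
    }
}
```

For 8 elements and 4 threads per block, `block_ids` becomes 0 0 0 0 1 1 1 1 and `thread_ids` becomes 0 1 2 3 0 1 2 3, matching the two blocks of four threads shown above.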
Testing - Matrices
The test multiplies two matrices filled with random floating-point values.
We ran it with matrices of various dimensions.
Results:
Dimension    CUDA time      CPU time
64x64        0.417465 ms    18.0876 ms
128x128      0.41691 ms     18.3007 ms
256x256      2.146367 ms    145.6302 ms
512x512      8.093004 ms    1494.7275 ms
768x768      25.97624 ms    4866.3246 ms
1024x1024    52.42811 ms    66097.1688 ms
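The slides do not show the code behind these timings. As a point of reference, a straightforward CPU baseline for square matrices might look like the sketch below. The function name `matmul_cpu` and the row-major storage are assumptions made for the example. In a CUDA version, each (row, col) pair of the two outer loops would typically be handled by its own thread.

```c
#include <assert.h>
#include <stddef.h>

/* Computes C = A * B for square n x n matrices stored in row-major order. */
void matmul_cpu(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t row = 0; row < n; row++) {
        for (size_t col = 0; col < n; col++) {
            float sum = 0.0f;
            /* Dot product of one row of A with one column of B. */
            for (size_t k = 0; k < n; k++)
                sum += a[row * n + k] * b[k * n + col];
            c[row * n + col] = sum;
        }
    }
}
```

The triple loop performs n^3 multiply-add steps, so the work grows steeply with the matrix dimension.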
Applications of CUDA
Electrodynamics and Electromagnetics
Nuclear Physics, Molecular Dynamics and Computational Chemistry
Video, Imaging and Vision Applications
Game Industry
Matlab, Labview, Mathematica, R
Weather and Ocean Modeling
Financial Computing and Options Pricing
Medical Imaging, CT, MRI
Government and Defence
Geophysics