
Terms

What is GPGPU?
General-Purpose computing on a Graphics Processing Unit: using graphics hardware for non-graphics computations.

What is CUDA?
Compute Unified Device Architecture: a software architecture for managing data-parallel programming.

Introduction
What is a GPU? It is a processor optimized for 2D/3D graphics, video, visual computing, and display: a highly parallel, highly multithreaded multiprocessor optimized for visual computing. It provides real-time visual interaction with computed objects via graphics, images, and video, and serves as both a programmable graphics processor and a scalable parallel computing platform. Heterogeneous systems combine a GPU with a CPU.

GPU Evolution

1980s: No GPU; PCs used a VGA controller.
1990s: More functions added to the VGA controller.
1997: 3D acceleration functions: hardware for triangle setup and rasterization, texture mapping, and shading.
2000: A single-chip graphics processor (the beginning of the term "GPU").
2005: Massively parallel programmable processors.
2007: CUDA (Compute Unified Device Architecture).

GPU Architecture

Processing Element

Processing element = thread processor = ALU

Memory Architecture

Constant Memory
Texture Memory
Device Memory
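As a minimal sketch of how CUDA C exposes these spaces (the names coeffs and lut are our own, not from the slides; texture setup is omitted because it requires extra binding code):

    /* Constant memory: small, cached, read-only from kernels. */
    __constant__ float coeffs[16];

    /* Device (global) memory: large, off-chip, read/write. */
    __device__ float lut[256];

    /* Host code would fill these with, e.g.:
     *   cudaMemcpyToSymbol(coeffs, host_coeffs, sizeof(coeffs));
     * Texture memory is accessed through texture references bound
     * with cudaBindTexture / cudaBindTextureToArray. */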

Traditional Graphics Pipeline


Vertex processing → Rasterizer → Fragment processing → Renderer (textures)

CPU vs. GPU

CPU:
Fast caches
Branching adaptability
High performance

GPU:
Multiple ALUs
Fast onboard memory
High throughput on parallel tasks
Executes a program on each fragment/vertex

CPUs are great for task parallelism; GPUs are great for data parallelism.

CPU vs. GPU

GPUs contain a much larger number of dedicated ALUs than CPUs.

GPUs also contain extensive support for the stream processing paradigm, which is related to SIMD (Single Instruction, Multiple Data) processing. Each processing unit on the GPU contains local memory that improves data manipulation and reduces fetch time.
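As an illustrative sketch of this SIMD-style model (the kernel name saxpy and its parameters are our own, not from the slides), every thread executes the same instruction stream on a different data element:

    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        /* Each thread handles one element: same instruction, different data. */
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }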

CPU vs. GPU - Hardware

More transistors devoted to data processing




What is CUDA

CUDA is a set of development tools for creating applications that execute on the GPU (Graphics Processing Unit).
The CUDA compiler uses a variation of C, with future support for C++. CUDA was developed by NVIDIA and as such can only run on NVIDIA GPUs of the G8x series and up.

CUDA was released on February 15, 2007 for the PC, with a beta version for Mac OS X following on August 19, 2008.

Why CUDA
CUDA provides the ability to use a high-level language such as C to develop applications that take advantage of the high performance and scalability that the GPU architecture offers.

GPUs allow the creation of a very large number of concurrently executing threads at very low system resource cost.

CUDA also exposes fast shared memory (16 KB) that can be shared between threads (see the sketch after this list).

Full support for integer and bitwise operations.

Compiled code runs directly on the GPU.
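A minimal sketch of how that shared memory is used (assumes 256 threads per block, a power of two; the names blockSum, in, and out are ours): threads in a block cooperate through the fast on-chip buffer to compute a partial sum.

    __global__ void blockSum(const float *in, float *out)
    {
        __shared__ float buf[256];       /* lives in the fast 16 KB shared memory */
        int tid = threadIdx.x;

        buf[tid] = in[blockIdx.x * blockDim.x + tid];
        __syncthreads();                 /* wait until every thread has stored */

        /* Tree reduction: threads combine pairs of partial sums. */
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (tid < s)
                buf[tid] += buf[tid + s];
            __syncthreads();
        }
        if (tid == 0)
            out[blockIdx.x] = buf[0];    /* one partial sum per block */
    }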

Software Requirements/Tools

CUDA device driver
CUDA Software Development Kit
  Emulator
CUDA Toolkit
  Occupancy calculator
  Visual profiler

To compute, we need to:

Allocate memory that will be used for the computation (variable declaration and allocation).
Read the data that we will compute on (input).
Specify the computation that will be performed.
Write the results to the appropriate device (output).


A GPU is a specialized computer


We need to allocate space in the video card's memory for the variables.
The video card does not have I/O devices, so we need to copy the input data from the host computer's memory into the video card's memory, using the variables allocated in the previous step.
We need to specify the code to execute.
Finally, we copy the results back to the memory in the host computer.
These steps are illustrated one by one below.


Initially:

[Diagram: array sits in the host's memory; the GPU card's memory is empty]

Allocate Memory in the GPU card

[Diagram: array in the host's memory; array_d allocated in the GPU card's memory]
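In code this step is a single call; a fragment of a hedged sketch (N, the element count, is assumed to be defined elsewhere, and error checking is omitted):

    float *array_d;                                    /* device-side pointer */
    cudaMalloc((void **)&array_d, N * sizeof(float));  /* allocate on the GPU card */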

Copy content from the host's memory to the GPU card's memory

[Diagram: contents of array copied into array_d]
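Continuing the sketch, the copy is one cudaMemcpy call whose last argument gives the direction:

    /* Copy N floats from the host's array into the card's array_d. */
    cudaMemcpy(array_d, array, N * sizeof(float), cudaMemcpyHostToDevice);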

Execute code on the GPU

[Diagram: the GPU's multiprocessors (MPs) operate on array_d in the GPU card's memory]
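The launch uses CUDA's <<<blocks, threadsPerBlock>>> syntax; the kernel name incr, the BLOCK_SIZE constant, and the geometry below are our own illustration:

    /* Launch incr on the GPU: N / BLOCK_SIZE blocks of BLOCK_SIZE threads each. */
    incr<<<N / BLOCK_SIZE, BLOCK_SIZE>>>(array_d);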

Copy results back to the host memory

[Diagram: contents of array_d copied back into array in the host's memory]
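The final fragment of the sketch reverses the direction flag and releases the card's memory:

    /* Copy the results back into the host's array, then free array_d. */
    cudaMemcpy(array, array_d, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(array_d);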

The Kernel

It is necessary to write the code that will be executed by the stream processors in the GPU card. That code, called the kernel, will be downloaded and executed, simultaneously and in lock-step fashion, on several (all?) stream processors in the GPU card. How is every instance of the kernel going to know which piece of data it is working on?
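One answer, sketched below with our own kernel name incr: CUDA gives every instance the built-in blockIdx and threadIdx variables, which combine into a unique global index (blockDim.x here plays the role of the BLOCK_SIZE constant used later in the deck).

    __global__ void incr(float *array_d)
    {
        /* blockIdx.x and threadIdx.x are built-in; together they give
         * each kernel instance a distinct element to work on. */
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        array_d[x] = array_d[x] + 1.0f;
    }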

In the GPU:

[Diagram: processing elements mapped onto array elements, grouped into Block 0 and Block 1]

To compile:

    nvcc simple.c simple.cu -o simple

The compiler generates the code for both the host and the GPU. Demo on cuda.littlefe.net.
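The code slides themselves did not survive extraction; what follows is only a hedged, single-file reconstruction of what simple.cu might contain, reusing the names that appear elsewhere in the deck (array, array_d, BLOCK_SIZE) and the incr kernel sketched earlier:

    #include <stdio.h>

    #define BLOCK_SIZE 4
    #define N_BLOCKS   2
    #define N          (BLOCK_SIZE * N_BLOCKS)

    __global__ void incr(float *array_d)
    {
        int x = blockIdx.x * BLOCK_SIZE + threadIdx.x;  /* unique global index */
        array_d[x] = array_d[x] + 1.0f;
    }

    int main(void)
    {
        float array[N];
        float *array_d;
        int i;

        for (i = 0; i < N; i++)
            array[i] = (float)i;

        cudaMalloc((void **)&array_d, N * sizeof(float));   /* step 1: allocate */
        cudaMemcpy(array_d, array, N * sizeof(float),
                   cudaMemcpyHostToDevice);                 /* step 2: copy in  */
        incr<<<N_BLOCKS, BLOCK_SIZE>>>(array_d);            /* step 3: execute  */
        cudaMemcpy(array, array_d, N * sizeof(float),
                   cudaMemcpyDeviceToHost);                 /* step 4: copy out */
        cudaFree(array_d);

        for (i = 0; i < N; i++)
            printf("array[%d] = %f\n", i, array[i]);
        return 0;
    }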



What are those blockIds and threadIds?


With a minor modification to the code, we can print the blockIds and threadIds. We will use two arrays instead of just one:
One for the blockIds
One for the threadIds

The code in the kernel:

    x = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    block_d[x] = blockIdx.x;
    thread_d[x] = threadIdx.x;


In the GPU:

[Diagram: Threads 0-3 of Block 0 and Threads 0-3 of Block 1 map onto consecutive array elements]

Testing - Matrices

Test the multiplication of two matrices. The test creates two matrices with random floating-point values. We tested with matrices of various dimensions.
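The slides do not show the multiplication kernel itself; a naive sketch of what such a test might use (one output element per thread; all names are ours) is:

    __global__ void matMul(const float *A, const float *B, float *C, int n)
    {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;

        if (row < n && col < n) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += A[row * n + k] * B[k * n + col];  /* row-by-column dot product */
            C[row * n + col] = sum;
        }
    }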

Results:

    Dim          CUDA           CPU
    64x64        0.417465 ms    18.0876 ms
    128x128      0.41691 ms     18.3007 ms
    256x256      2.146367 ms    145.6302 ms
    512x512      8.093004 ms    1494.7275 ms
    768x768      25.97624 ms    4866.3246 ms
    1024x1024    52.42811 ms    66097.1688 ms

Pixel / Thread Processing

[Figure: pixel-to-thread mapping illustration]

Applications of CUDA
Electrodynamics and Electromagnetics
Nuclear Physics, Molecular Dynamics and Computational Chemistry
Video, Imaging and Vision Applications
Game Industry
Matlab, LabVIEW, Mathematica, R
Weather and Ocean Modeling
Financial Computing and Options Pricing
Medical Imaging, CT, MRI
Government and Defence
Geophysics
