
Terms

What is GPGPU?
General-Purpose computing on a Graphics Processing Unit: using graphics hardware for non-graphics computations.

What is CUDA?
Compute Unified Device Architecture: a software architecture for managing data-parallel programming.

Introduction
What is a GPU? It is a processor optimized for 2D/3D graphics, video, visual computing, and display: a highly parallel, highly multithreaded multiprocessor optimized for visual computing. It provides real-time visual interaction with computed objects via graphics, images, and video, and serves as both a programmable graphics processor and a scalable parallel computing platform. Heterogeneous systems combine a GPU with a CPU.

GPU Evolution

1980s: No GPU; PCs used a VGA controller.
1990s: More functions added to the VGA controller.
1997: 3D acceleration functions: hardware for triangle setup and rasterization, texture mapping, and shading.
2000: A single-chip graphics processor (the beginning of the term "GPU").
2005: Massively parallel programmable processors.
2007: CUDA (Compute Unified Device Architecture).

GPU Architecture

Processing Element

Processing element = thread processor = ALU

Memory Architecture

Constant Memory
Texture Memory
Device Memory
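As a minimal sketch of how CUDA C exposes these spaces (the names coeffs and lut are our own, not from the slides; texture setup is omitted because it requires extra binding code):

    /* Constant memory: small, cached, read-only from kernels. */
    __constant__ float coeffs[16];

    /* Device (global) memory: large, off-chip, read/write. */
    __device__ float lut[256];

    /* Host code would fill these with, e.g.:
     *   cudaMemcpyToSymbol(coeffs, host_coeffs, sizeof(coeffs));
     * Texture memory is accessed through texture references bound
     * with cudaBindTexture / cudaBindTextureToArray. */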

Traditional Graphics Pipeline


Vertex processing → Rasterizer → Fragment processing → Renderer (textures)

CPU vs. GPU

CPU:
Fast caches
Branching adaptability
High performance

GPU:
Multiple ALUs
Fast onboard memory
High throughput on parallel tasks
Executes a program on each fragment/vertex

CPUs are great for task parallelism; GPUs are great for data parallelism.

CPU vs. GPU

GPUs contain a much larger number of dedicated ALUs than CPUs.

GPUs also contain extensive support for the stream processing paradigm, which is related to SIMD (Single Instruction, Multiple Data) processing. Each processing unit on the GPU contains local memory that improves data manipulation and reduces fetch time.
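As an illustrative sketch of this SIMD-style model (the kernel name saxpy and its parameters are our own, not from the slides), every thread executes the same instruction stream on a different data element:

    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        /* Each thread handles one element: same instruction, different data. */
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }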

CPU vs. GPU - Hardware

More transistors devoted to data processing




What is CUDA

CUDA is a set of development tools for creating applications that execute on the GPU (Graphics Processing Unit).
The CUDA compiler uses a variation of C, with future support for C++. CUDA was developed by NVIDIA and as such can only run on NVIDIA GPUs of the G8x series and up.

CUDA was released on February 15, 2007 for the PC, with a beta version for Mac OS X following on August 19, 2008.

Why CUDA
CUDA provides the ability to use a high-level language such as C to develop applications that take advantage of the high performance and scalability that the GPU architecture offers.

GPUs allow the creation of a very large number of concurrently executing threads at very low system resource cost.

CUDA also exposes fast shared memory (16 KB) that can be shared between threads (see the sketch after this list).

Full support for integer and bitwise operations.

Compiled code runs directly on the GPU.
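A minimal sketch of how that shared memory is used (assumes 256 threads per block, a power of two; the names blockSum, in, and out are ours): threads in a block cooperate through the fast on-chip buffer to compute a partial sum.

    __global__ void blockSum(const float *in, float *out)
    {
        __shared__ float buf[256];       /* lives in the fast 16 KB shared memory */
        int tid = threadIdx.x;

        buf[tid] = in[blockIdx.x * blockDim.x + tid];
        __syncthreads();                 /* wait until every thread has stored */

        /* Tree reduction: threads combine pairs of partial sums. */
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (tid < s)
                buf[tid] += buf[tid + s];
            __syncthreads();
        }
        if (tid == 0)
            out[blockIdx.x] = buf[0];    /* one partial sum per block */
    }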

Software Requirements/Tools

CUDA device driver
CUDA Software Development Kit
  Emulator
CUDA Toolkit
  Occupancy calculator
  Visual profiler

To compute, we need to:

Allocate memory that will be used for the computation (variable declaration and allocation).
Read the data that we will compute on (input).
Specify the computation that will be performed.
Write the results to the appropriate device (output).


A GPU is a specialized computer


We need to allocate space in the video card's memory for the variables.
The video card does not have I/O devices, so we need to copy the input data from the host computer's memory into the video card's memory, using the variables allocated in the previous step.
We need to specify the code to execute.
Finally, we copy the results back to the memory in the host computer.
These steps are illustrated one by one below.


Initially:

[Diagram: array sits in the host's memory; the GPU card's memory is empty]

Allocate Memory in the GPU card

[Diagram: array in the host's memory; array_d allocated in the GPU card's memory]
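In code this step is a single call; a fragment of a hedged sketch (N, the element count, is assumed to be defined elsewhere, and error checking is omitted):

    float *array_d;                                    /* device-side pointer */
    cudaMalloc((void **)&array_d, N * sizeof(float));  /* allocate on the GPU card */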

Copy content from the host's memory to the GPU card's memory

[Diagram: contents of array copied into array_d]
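Continuing the sketch, the copy is one cudaMemcpy call whose last argument gives the direction:

    /* Copy N floats from the host's array into the card's array_d. */
    cudaMemcpy(array_d, array, N * sizeof(float), cudaMemcpyHostToDevice);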

Execute code on the GPU

[Diagram: the GPU's multiprocessors (MPs) operate on array_d in the GPU card's memory]
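The launch uses CUDA's <<<blocks, threadsPerBlock>>> syntax; the kernel name incr, the BLOCK_SIZE constant, and the geometry below are our own illustration:

    /* Launch incr on the GPU: N / BLOCK_SIZE blocks of BLOCK_SIZE threads each. */
    incr<<<N / BLOCK_SIZE, BLOCK_SIZE>>>(array_d);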

Copy results back to the host memory

[Diagram: contents of array_d copied back into array in the host's memory]
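The final fragment of the sketch reverses the direction flag and releases the card's memory:

    /* Copy the results back into the host's array, then free array_d. */
    cudaMemcpy(array, array_d, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(array_d);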

The Kernel

It is necessary to write the code that will be executed by the stream processors in the GPU card. That code, called the kernel, will be downloaded and executed, simultaneously and in lock-step fashion, on several (all?) stream processors in the GPU card. How is every instance of the kernel going to know which piece of data it is working on?
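One answer, sketched below with our own kernel name incr: CUDA gives every instance the built-in blockIdx and threadIdx variables, which combine into a unique global index (blockDim.x here plays the role of the BLOCK_SIZE constant used later in the deck).

    __global__ void incr(float *array_d)
    {
        /* blockIdx.x and threadIdx.x are built-in; together they give
         * each kernel instance a distinct element to work on. */
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        array_d[x] = array_d[x] + 1.0f;
    }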

In the GPU:

[Diagram: processing elements mapped onto array elements, grouped into Block 0 and Block 1]

To compile:

    nvcc simple.c simple.cu -o simple

The compiler generates the code for both the host and the GPU. Demo on cuda.littlefe.net.
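The code slides themselves did not survive extraction; what follows is only a hedged, single-file reconstruction of what simple.cu might contain, reusing the names that appear elsewhere in the deck (array, array_d, BLOCK_SIZE) and the incr kernel sketched earlier:

    #include <stdio.h>

    #define BLOCK_SIZE 4
    #define N_BLOCKS   2
    #define N          (BLOCK_SIZE * N_BLOCKS)

    __global__ void incr(float *array_d)
    {
        int x = blockIdx.x * BLOCK_SIZE + threadIdx.x;  /* unique global index */
        array_d[x] = array_d[x] + 1.0f;
    }

    int main(void)
    {
        float array[N];
        float *array_d;
        int i;

        for (i = 0; i < N; i++)
            array[i] = (float)i;

        cudaMalloc((void **)&array_d, N * sizeof(float));   /* step 1: allocate */
        cudaMemcpy(array_d, array, N * sizeof(float),
                   cudaMemcpyHostToDevice);                 /* step 2: copy in  */
        incr<<<N_BLOCKS, BLOCK_SIZE>>>(array_d);            /* step 3: execute  */
        cudaMemcpy(array, array_d, N * sizeof(float),
                   cudaMemcpyDeviceToHost);                 /* step 4: copy out */
        cudaFree(array_d);

        for (i = 0; i < N; i++)
            printf("array[%d] = %f\n", i, array[i]);
        return 0;
    }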



What are those blockIds and threadIds?


With a minor modification to the code, we can print the blockIds and threadIds. We will use two arrays instead of just one:
One for the blockIds
One for the threadIds

The code in the kernel:

    x = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    block_d[x] = blockIdx.x;
    thread_d[x] = threadIdx.x;


In the GPU:

[Diagram: Threads 0-3 of Block 0 and Threads 0-3 of Block 1 map onto consecutive array elements]

Testing - Matrices

Test the multiplication of two matrices. The test creates two matrices with random floating-point values. We tested with matrices of various dimensions.
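The slides do not show the multiplication kernel itself; a naive sketch of what such a test might use (one output element per thread; all names are ours) is:

    __global__ void matMul(const float *A, const float *B, float *C, int n)
    {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;

        if (row < n && col < n) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += A[row * n + k] * B[k * n + col];  /* row-by-column dot product */
            C[row * n + col] = sum;
        }
    }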

Results:

    Dim          CUDA           CPU
    64x64        0.417465 ms    18.0876 ms
    128x128      0.41691 ms     18.3007 ms
    256x256      2.146367 ms    145.6302 ms
    512x512      8.093004 ms    1494.7275 ms
    768x768      25.97624 ms    4866.3246 ms
    1024x1024    52.42811 ms    66097.1688 ms

Pixel / Thread Processing

[Figure: pixel-to-thread mapping illustration]

Applications of CUDA
Electrodynamics and Electromagnetics
Nuclear Physics, Molecular Dynamics and Computational Chemistry
Video, Imaging and Vision Applications
Game Industry
Matlab, LabVIEW, Mathematica, R
Weather and Ocean Modeling
Financial Computing and Options Pricing
Medical Imaging, CT, MRI
Government and Defence
Geophysics
