5/20/11 12:09 AM
Note that malloc is being used instead of the C++ operator new. This is because the program is written in C. Don't worry: it is possible, and easy, to mix CUDA and C++ in the same program, but this will be covered in a later tutorial. To allocate memory on the device, call cudaMalloc(void **devPtr, size_t size).
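As a minimal sketch of the allocation pattern described above (the variable names and sizes here are illustrative, not taken from the original program), allocating a float array on both the host and the device might look like this:

```
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    const int N = 1024;
    const size_t numBytes = N * sizeof(float);

    // Host allocation uses plain C malloc.
    float *h_data = (float *)malloc(numBytes);
    if (h_data == NULL) return 1;

    // Device allocation uses cudaMalloc; note the pointer-to-pointer,
    // since cudaMalloc writes the device address into d_data.
    float *d_data = NULL;
    cudaError_t err = cudaMalloc((void **)&d_data, numBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        free(h_data);
        return 1;
    }

    /* ... use the buffers ... */

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```

The pointer returned through cudaMalloc lives in device memory, so the host must never dereference it directly; it is only meaningful as an argument to kernels and to cudaMemcpy.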
http://supercomputingblog.com/cuda/cuda-tutorial-1-getting-started/
In the code above, two data arrays are copied to the device, the kernel is executed, and the results, which still reside in GPU memory, are copied back to host memory. Notice the unusual syntax for calling the kernel: when the host calls a CUDA kernel function, many threads are spawned, and the launch configuration specifies how many threads are spawned and how those threads are organized. This will be discussed in the CUDA kernel tutorial.

Also notice the last argument of the cudaMemcpy function. This controls whether data is being sent from the host machine to the CUDA device (cudaMemcpyHostToDevice) or vice versa (cudaMemcpyDeviceToHost). It is also possible to use this function to copy data from one location on a CUDA device to another location on the same device (cudaMemcpyDeviceToDevice).

This concludes this tutorial. We covered how to use the CUDA precision timer, how memory must be allocated both on the device and on the host machine, and finally, how to copy data to and from the device. Proceed to the next tutorial to learn how to write a kernel, and how threads are organized when executing a kernel. The next tutorial will also present some results showing just how fast CUDA functions can be compared to doing the same calculations on a CPU.
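The copy-launch-copy pattern described above can be sketched as follows. The kernel name, array sizes, and launch configuration here are illustrative assumptions, not the original program's code:

```
// Illustrative kernel: c[i] = a[i] + b[i] (kernels are covered in the next tutorial).
__global__ void addArrays(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

void runAdd(const float *h_a, const float *h_b, float *h_c, int n)
{
    size_t numBytes = n * sizeof(float);
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, numBytes);
    cudaMalloc((void **)&d_b, numBytes);
    cudaMalloc((void **)&d_c, numBytes);

    // Copy the two input arrays from host memory to device memory.
    cudaMemcpy(d_a, h_a, numBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, numBytes, cudaMemcpyHostToDevice);

    // Launch the kernel: <<<blocks, threadsPerBlock>>> specifies how many
    // threads are spawned and how they are organized.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    addArrays<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);

    // Copy the results, which still reside in GPU memory, back to the host.
    cudaMemcpy(h_c, d_c, numBytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
}
```

Note that the final cudaMemcpyKind argument is what selects the copy direction; swapping it (or the pointer arguments) is a common beginner mistake.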