5/20/11 12:09 AM
Note that malloc is being used instead of the C++ operator new. This is because the program is written in C. Don't worry: it is possible, and easy, to mix CUDA and C++ in the same program, but this will be covered in a later tutorial. To allocate memory on the device, call cudaMalloc(void **devPtr, size_t size).
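As a minimal sketch of the allocation pattern described above (the variable names and sizes here are illustrative, not taken from the original program), allocating a float array on both the host and the device might look like this:

```
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    const int N = 1024;
    const size_t numBytes = N * sizeof(float);

    // Host allocation uses plain C malloc.
    float *h_data = (float *)malloc(numBytes);
    if (h_data == NULL) return 1;

    // Device allocation uses cudaMalloc; note the pointer-to-pointer,
    // since cudaMalloc writes the device address into d_data.
    float *d_data = NULL;
    cudaError_t err = cudaMalloc((void **)&d_data, numBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        free(h_data);
        return 1;
    }

    /* ... use the buffers ... */

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```

The pointer returned through cudaMalloc lives in device memory, so the host must never dereference it directly; it is only meaningful as an argument to kernels and to cudaMemcpy.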
http://supercomputingblog.com/cuda/cuda-tutorial-1-getting-started/
In the code above, two data arrays are copied to the device, the kernel is executed, and the results, which still reside in GPU memory, are copied back to host memory. Notice the unusual syntax for calling the kernel: when the host calls a CUDA kernel function, many threads are spawned, and the launch configuration specifies how many threads are spawned and how those threads are organized. This will be discussed in the CUDA kernel tutorial.

Also notice the last argument of the cudaMemcpy function. This controls whether data is being sent from the host machine to the CUDA device (cudaMemcpyHostToDevice) or vice versa (cudaMemcpyDeviceToHost). It is also possible to use this function to copy data from one location on a CUDA device to another location on the same device (cudaMemcpyDeviceToDevice).

This concludes this tutorial. We covered how to use the CUDA precision timer, how memory must be allocated both on the device and on the host machine, and finally, how to copy data to and from the device. Proceed to the next tutorial to learn how to write a kernel, and how threads are organized when executing a kernel. The next tutorial will also present some results showing just how fast CUDA functions can be compared to doing the same calculations on a CPU.
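The copy-launch-copy pattern described above can be sketched as follows. The kernel name, array sizes, and launch configuration here are illustrative assumptions, not the original program's code:

```
// Illustrative kernel: c[i] = a[i] + b[i] (kernels are covered in the next tutorial).
__global__ void addArrays(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

void runAdd(const float *h_a, const float *h_b, float *h_c, int n)
{
    size_t numBytes = n * sizeof(float);
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, numBytes);
    cudaMalloc((void **)&d_b, numBytes);
    cudaMalloc((void **)&d_c, numBytes);

    // Copy the two input arrays from host memory to device memory.
    cudaMemcpy(d_a, h_a, numBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, numBytes, cudaMemcpyHostToDevice);

    // Launch the kernel: <<<blocks, threadsPerBlock>>> specifies how many
    // threads are spawned and how they are organized.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    addArrays<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);

    // Copy the results, which still reside in GPU memory, back to the host.
    cudaMemcpy(h_c, d_c, numBytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
}
```

Note that the final cudaMemcpyKind argument is what selects the copy direction; swapping it (or the pointer arguments) is a common beginner mistake.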