
CUDA 4.0

The ‘Super’ Computing Company


© NVIDIA Corporation 2011
No-copy pinning of system memory: just register malloc’d memory and go

Extra allocation and extra copy required (before):
  malloc(a)
  cudaMallocHost(b)
  memcpy(b, a)
  cudaMemcpy() to GPU, launch kernels, cudaMemcpy() from GPU
  memcpy(a, b)
  cudaFreeHost(b)

Just register and go! (CUDA 4.0):
  malloc(a)
  cudaHostRegister(a)
  cudaMemcpy() to GPU, launch kernels, cudaMemcpy() from GPU
  cudaHostUnregister(a)
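The "register and go" path above can be sketched as follows. This is a minimal sketch, assuming CUDA 4.0 or later; error checking is elided and the buffer size is arbitrary.

```cuda
// Pin an existing malloc'd buffer in place with cudaHostRegister,
// instead of allocating a second pinned buffer and memcpy'ing into it.
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void)
{
    const size_t bytes = 1 << 20;
    float *a = (float *)malloc(bytes);      // ordinary pageable allocation

    // Pin the existing allocation: no cudaMallocHost, no extra memcpy.
    cudaHostRegister(a, bytes, cudaHostRegisterDefault);

    float *d;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, a, bytes, cudaMemcpyHostToDevice);  // now a pinned transfer
    // ... launch kernels ...
    cudaMemcpy(a, d, bytes, cudaMemcpyDeviceToHost);

    cudaHostUnregister(a);                  // unpin before freeing
    cudaFree(d);
    free(a);
    return 0;
}
```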



Thrust

Data Structures:
• thrust::device_vector
• thrust::host_vector
• thrust::device_ptr
• etc.

Algorithms:
• thrust::sort
• thrust::reduce
• thrust::exclusive_scan
• etc.
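A minimal sketch of the containers and algorithms listed above, combining a `device_vector` with `sort` and `reduce` (compiled with nvcc; the values are arbitrary):

```cuda
// Sort a device_vector on the GPU, then reduce (sum) it.
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>

int main(void)
{
    thrust::host_vector<int> h(4);
    h[0] = 3; h[1] = 1; h[2] = 4; h[3] = 2;

    thrust::device_vector<int> d = h;       // copy host -> device

    thrust::sort(d.begin(), d.end());       // parallel sort on the GPU
    int sum = thrust::reduce(d.begin(), d.end());  // parallel reduction

    return sum == 10 ? 0 : 1;               // sum of 3+1+4+2
}
```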



Easier Application Porting
• Share GPUs across multiple threads
• Single thread access to all GPUs
• No-copy pinning of system memory
• New CUDA C/C++ features
• Thrust templated primitives library
• NPP image/video processing library
• Layered Textures

Faster Multi-GPU Programming
• Unified Virtual Addressing
• Peer-to-Peer Access
• Peer-to-Peer Transfers

New & Improved Developer Tools
• Auto Performance Analysis
• C++ Debugging
• GPU Binary Disassembler
• cuda-gdb for MacOS



GPUDirect™

Version 1.0:
• Direct access to GPU memory for 3rd-party devices
• Eliminates unnecessary system memory copies & CPU overhead
• Supported by Mellanox and QLogic
• Up to 30% improvement in communication performance

Version 2.0:
• Peer-to-Peer memory access, transfers & synchronization
• Less code, higher programmer productivity

Details @ http://www.nvidia.com/object/software-for-tesla-products.html
NVIDIA GPUDirect™ v2.0

Without peer-to-peer, a GPU1-to-GPU2 transfer is staged through system memory in two copies:
1. cudaMemcpy(sysmem, GPU1)
2. cudaMemcpy(GPU2, sysmem)
[Diagram: GPU1 and GPU2, each with its own memory, attached via the chipset to the CPU and system memory; data flows GPU1 memory → system memory → GPU2 memory]



NVIDIA GPUDirect™ v2.0:

With peer-to-peer, the same transfer is a single direct copy between GPU memories:
1. cudaMemcpy(GPU2, GPU1)
[Diagram: data moves directly from GPU1 memory to GPU2 memory across the chipset, bypassing system memory]
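The direct copy above can be sketched as follows. This assumes two peer-capable GPUs (devices 0 and 1) on CUDA 4.0 or later; the buffer names and size are illustrative, and error checking is elided.

```cuda
// Enable peer access, then copy directly between two GPUs' memories.
#include <cuda_runtime.h>

int main(void)
{
    const size_t bytes = 1 << 20;
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can device 0 access device 1?

    float *buf0, *buf1;
    cudaSetDevice(0);
    cudaMalloc(&buf0, bytes);
    if (canAccess)
        cudaDeviceEnablePeerAccess(1, 0);       // second argument (flags) must be 0

    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);

    // One call: data moves device 1 -> device 0 without staging in sysmem.
    cudaMemcpyPeer(buf0, 0, buf1, 1, bytes);

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}
```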



Unified Virtual Addressing
[Diagram: without UVA, the CPU (System Memory), GPU0 (GPU0 Memory), and GPU1 (GPU1 Memory) each occupy a separate address space; with UVA, all three memories share a single virtual address space]



Before UVA (separate options for each permutation):
  cudaMemcpyHostToHost
  cudaMemcpyHostToDevice
  cudaMemcpyDeviceToHost
  cudaMemcpyDeviceToDevice

With UVA (one function handles all cases):
  cudaMemcpyDefault (data location becomes an implementation detail)
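A minimal sketch of `cudaMemcpyDefault`, assuming a UVA-capable setup (64-bit OS, Fermi-class GPU, CUDA 4.0+); the runtime infers each pointer's location, so the same flag works in both directions:

```cuda
// With UVA, cudaMemcpyDefault replaces the four direction-specific flags.
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void)
{
    const size_t bytes = 256 * sizeof(float);
    float *h = (float *)malloc(bytes);
    float *d;
    cudaMalloc(&d, bytes);

    // Same flag both ways: the runtime resolves host vs device from the pointers.
    cudaMemcpy(d, h, bytes, cudaMemcpyDefault);  // host -> device
    cudaMemcpy(h, d, bytes, cudaMemcpyDefault);  // device -> host

    cudaFree(d);
    free(h);
    return 0;
}
```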



Easier Application Porting
• Share GPUs across multiple threads
• Single thread access to all GPUs
• No-copy pinning of system memory
• New CUDA C/C++ features
• Thrust templated primitives library
• NPP image/video processing library
• Layered Textures

Faster Multi-GPU Programming
• Unified Virtual Addressing
• Peer-to-Peer Access
• Peer-to-Peer Transfers

New & Improved Developer Tools
• Auto Performance Analysis
• C++ Debugging
• GPU Binary Disassembler
• cuda-gdb for MacOS



cuda-gdb: now available for both Linux and MacOS

• Automatically updated in DDD
• Breakpoints on all instances of templated functions
• Fermi disassembly (cuobjdump)
• C++ symbols shown in stack trace view

Details @ http://developer.nvidia.com/object/cuda-gdb.html




“Don’t kid yourself. […]” said Frank Chambers, a GTC conference attendee shopping for GPUs for his finite element analysis work. “What we are seeing […] artificial retinas possible, and that wasn’t predicted to happen until 2060.”

GPU Technology Conference 2011
October 11-14 | San Jose, CA
The one event you can’t afford to miss





www.gputechconf.com
