Outline
Introduction
GPU Architecture: Multiprocessing, Vector ISA
GPUs in Industry: Scientific Computing, Image Processing, Databases
Introduction
GPUs have evolved to the point where many real world applications are easily implemented on them and run significantly faster than on multi-core systems. Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs.
- Prof. Jack Dongarra, director of the Innovative Computing Laboratory at the University of Tennessee; author of LINPACK
GPU Architecture
Parallel coprocessor to conventional CPUs
Implements a SIMD structure: multiple threads run the same code
Thread block (runs on a multiprocessor): shared memory and registers; shared control logic
Grid (runs on the device or devices): global memory; can be easily distributed across devices
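The thread hierarchy above can be illustrated with a small sequential emulation. This is only a sketch: the block and thread counts, the `kernel` and `launch` names, and the doubling operation are assumptions made for illustration, and a real GPU would run the threads concurrently rather than in nested loops.

```python
# Hypothetical sizes, chosen only for illustration.
BLOCKS_PER_GRID = 3
THREADS_PER_BLOCK = 4


def kernel(block_idx, thread_idx, data, out):
    """Every thread runs this same code on a different element."""
    i = block_idx * THREADS_PER_BLOCK + thread_idx  # global element index
    if i < len(data):  # threads past the end of the data do nothing
        out[i] = 2 * data[i]


def launch(data):
    """Sequentially emulate launching a grid of thread blocks."""
    out = [0] * len(data)
    for block_idx in range(BLOCKS_PER_GRID):
        for thread_idx in range(THREADS_PER_BLOCK):
            kernel(block_idx, thread_idx, data, out)
    return out


result = launch(list(range(10)))
```

The grid here provides 12 threads for 10 elements, so the bounds check inside `kernel` keeps the last two threads idle.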
GPU Architecture
Processors also implement vector instructions
Vectors of length 2, 3, or 4 of any fundamental type
integer, float, bits, predicate
To encourage uniform execution, rather than branching for conditionals, use predicates
All instructions can be conditionally executed based on predicate registers
.reg .s32 a, b;
.reg .pred p;
@p
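The idea of predication can be sketched in Python. The slide does not show which instruction the predicate guards, so the operation here (increment `a` wherever `a < b`) and the function name are assumptions chosen for illustration. Every lane executes the same instruction sequence, and the predicate only decides whether the result is written back.

```python
def predicated_increment(a, b):
    """Emulate predicated execution over parallel lanes.

    Hypothetical operation: add 1 to a[i] wherever a[i] < b[i].
    """
    # Step 1: every lane sets its predicate register.
    p = [x < y for x, y in zip(a, b)]
    # Step 2: every lane computes the candidate result, with no branching.
    candidate = [x + 1 for x in a]
    # Step 3: the predicate selects whether the result is committed.
    return [c if keep else x for x, c, keep in zip(a, candidate, p)]


updated = predicated_increment([1, 5, 3], [2, 4, 3])
```

Because no lane takes a different control path, all lanes stay in lockstep, which is the uniform execution the slide refers to.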
GPUs in Industry
Many applications have been developed to use GPUs for supercomputing in various fields
Scientific Computing
CFD, Molecular Dynamics, Genome Sequencing, Mechanical Simulation, Quantum Electrodynamics
Image Processing
Registration, interpolation, feature detection, recognition, filtering
Data Analysis
Databases, sorting and searching, data mining
PDE-based simulations parallelized on a GPU using multigrid solvers have reported 10x speedups
Molecular Dynamics
Large sets of particles with forces between them: protein behavior, material simulation
Calculating forces between particles can be done in parallel for each particle
Accumulation of forces can be implemented as multilevel parallel sums
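A multilevel parallel sum can be sketched as a pairwise tree reduction. In this minimal example the 1-D positions and the linear `pair_force` law are hypothetical stand-ins for a real force field. Each particle's net force is independent of the others, and the additions within one level of `tree_sum` are independent of each other.

```python
def tree_sum(values):
    """Sum by repeatedly adding neighbouring pairs, one level at a time."""
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        if len(level) % 2:  # pad odd-length levels with the identity
            level.append(0.0)
        level = [level[k] + level[k + 1] for k in range(0, len(level), 2)]
    return level[0]


def pair_force(xi, xj):
    """Hypothetical linear attraction between two 1-D particles."""
    return xj - xi


def net_forces(positions):
    """Net force on each particle; each entry could be computed in parallel."""
    return [
        tree_sum(pair_force(xi, xj) for j, xj in enumerate(positions) if j != i)
        for i, xi in enumerate(positions)
    ]


forces = net_forces([0.0, 1.0, 3.0])
```

A reduction over n values takes about log2(n) levels, which is why it suits hardware with many threads.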
Genetics
Large strings of genome sequences must be searched through to organize and identify samples
GPUs enable multiple parallel queries to the database to perform string matching
Again, order-of-magnitude speedups have been reported
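Independent queries against one sequence can be sketched with a thread pool. This is an illustration only: exact substring matching stands in for whatever alignment method a real tool would use, and the sequence, the queries, and the function names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def find_all(genome, query):
    """Return every start index where query occurs (overlaps allowed)."""
    hits = []
    start = genome.find(query)
    while start != -1:
        hits.append(start)
        start = genome.find(query, start + 1)
    return hits


def match_queries(genome, queries):
    """Run each query independently; no query depends on another's result."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda q: find_all(genome, q), queries)
        return dict(zip(queries, results))


matches = match_queries("ACGTACGTTACG", ["ACG", "TT", "GGG"])
```

Since the queries share only read-only data, they need no synchronization, which is what makes this workload a natural fit for many parallel threads.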
Electrodynamics
Simulation of electric fields, Coulomb forces
Requires iterative solving of partial differential equations
Cell phone modeling applications have reported 50x speedups using GPUs
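Iterative PDE solving can be illustrated with Jacobi iteration on a 1-D Laplace equation. The slide names neither the equation nor the solver, so both are assumptions here, as are the grid size, boundary values, and tolerance. Each interior point is updated only from the previous iterate, so all points in one sweep can be updated in parallel.

```python
def jacobi_step(u):
    """One Jacobi sweep for u'' = 0 with fixed values at both ends."""
    new = u[:]
    for i in range(1, len(u) - 1):
        # Each update reads only the old array, so updates are independent.
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new


def solve(u, tol=1e-10, max_iters=10_000):
    """Iterate until the largest change between sweeps drops below tol."""
    for _ in range(max_iters):
        new = jacobi_step(u)
        if max(abs(a - b) for a, b in zip(new, u)) < tol:
            return new
        u = new
    return u


# Hypothetical boundary values: potential 0 at one end and 1 at the other.
potential = solve([0.0, 0.0, 0.0, 0.0, 1.0])
```

With these boundary values the iteration approaches a straight line between 0 and 1, the exact solution of the 1-D Laplace equation.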
Image Processing
Medical Imaging was the early adopter
Registration of massive 3D voxel images
Both the cost function for deformable registration and interpolation of results are filtering operations
Generic feature detection, recognition, object extraction are all filters
For object recognition, one can search a database of objects in parallel
Running these algorithms off the CPU can allow real-time interaction
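A filter in this sense computes each output pixel from a small neighbourhood of input pixels. The sketch below uses a 3x3 mean filter on a tiny 2-D image; the choice of filter, the border handling, and the sample image are assumptions for illustration. No output pixel depends on any other output pixel, so every pixel could be assigned to its own thread.

```python
def box_filter(image):
    """3x3 mean filter; border pixels average only the neighbours that exist."""
    height, width = len(image), len(image[0])

    def pixel(r, c):
        window = [
            image[rr][cc]
            for rr in range(max(r - 1, 0), min(r + 2, height))
            for cc in range(max(c - 1, 0), min(c + 2, width))
        ]
        return sum(window) / len(window)

    # Each output pixel is computed independently of the others.
    return [[pixel(r, c) for c in range(width)] for r in range(height)]


smoothed = box_filter([[0, 0, 0], [0, 9, 0], [0, 0, 0]])
```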
Data Analysis
Huge databases for web services require instant results for many simultaneous users
Insufficient room in main memory; disk is too slow and doesn't allow parallel reads
GPUs can split up the data and perform fast searches, each keeping its section in memory
In practice, the overhead of uploading to and downloading from the GPU is far smaller than the time saved in the kernel
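Splitting the data so that each device searches only its own section can be sketched as follows. The record list, the partition count, and the function names are hypothetical, and the partitions are searched in a plain loop here, whereas separate devices would search them at the same time.

```python
def partition(records, n_parts):
    """Split records into up to n_parts contiguous sections."""
    size = max(1, -(-len(records) // n_parts))  # ceiling division, at least 1
    return [records[k:k + size] for k in range(0, len(records), size)]


def search_partition(part, offset, key):
    """Search one section; offset converts local to global indices."""
    return [offset + i for i, rec in enumerate(part) if rec == key]


def search(records, key, n_parts=4):
    """Search every section independently and merge the hits."""
    hits, offset = [], 0
    for part in partition(records, n_parts):
        hits.extend(search_partition(part, offset, key))
        offset += len(part)
    return hits


found = search(["ann", "bob", "cy", "bob", "dee", "bob"], "bob", n_parts=3)
```

Each section is searched without reference to the others, so only the small list of hits has to be transferred and merged at the end.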
Conclusions
Certain classes of problems appear in many different fields and involve highly data-parallel operations such as filtering, sorting, or integration
By taking advantage of the architectural decisions behind graphics processing units, such as their multiprocessing and native vector operations, these problems can be solved quickly and cheaply
References
1. Ziegler, Gernot. Introduction to the CUDA Architecture. [Online] 2009. http://www.cse.scitech.ac.uk/disco/workshops/200907/Day1_01_Intro_CUDA_Architecture.pdf.
2. NVIDIA Corporation. NVIDIA Compute PTX: Parallel Thread Execution ISA Version 1.1. 2007.
3. Göddeke, Dominik. Fast and Accurate Finite-Element Multigrid Solvers for PDE Simulations on GPU Clusters. Berlin : Logos Verlag, 2010. 978-3-8325-2768-6.
4. John E Stone, James C Phillips, Peter L Freddolino, David J Hardy, Leonardo G Trabuco, and Klaus Schulten. Accelerating molecular modeling applications with graphics processors. Journal of Computational Chemistry, 28:2618-2640, 2007.
5. Michael C Schatz, Cole Trapnell, Arthur L Delcher, and Amitabh Varshney. High-throughput sequence alignment using Graphics Processing Units. s.l. : BMC Bioinformatics, 2007.
6. ANSYS, Inc. ANSYS Unveils GPU Computing for Accelerated Engineering Simulations. [Online] 2010. http://investors.ansys.com/releasedetail.cfm?releaseid=509436.
7. Warburton, Tim. Parallel Numerical Methods for Partial Differential Equations. Rocky Mountain Mathematics Consortium. [Online] 2008. http://www.caam.rice.edu/~timwar/RMMC/gpuDG.html.
8. Ansorge, Richard. AIRWC: Accelerated Image Registration With CUDA. BSS Group, Cavendish Laboratory, University of Cambridge UK. 2008.
9. N. Cornelis, L. Van Gool. Fast Scale Invariant Feature Detection and Matching on Programmable Graphics Hardware. s.l. : CVPR 2008 Workshop, 2008.
10. Andrea DiBlas, Tim Kaldewey. Data Monster: Why graphics processors will transform database processing. IEEE Spectrum. [Online] 2009. http://spectrum.ieee.org/computing/software/data-monster/0.
11. Podlozhnyuk, Victor. Image Convolution with CUDA. [Online] 2007. http://developer.download.nvidia.com/compute/DevZone/C/html/C/src/convolutionSeparable/doc/convolutionSeparable.pdf.
12. Goodnight, Nolan. CUDA/OpenGL Fluid Simulation. [Online] 2007. http://new.math.uiuc.edu/MA198-2008/schaber2/fluidsGL.pdf.
Questions?