
CAN GRAPHICS PROCESSING UNITS BE USED TO IMPROVE VIDEO PROCESSING SYSTEMS?

Ben Cope Department of Electrical and Electronic Engineering, Imperial College London, UK, email: benjamin.cope@imperial.ac.uk
1. INTRODUCTION

The combination of higher-definition video and more complex transforms demands continued improvement in the performance of video processing systems. Consumer demand for even better video quality will further increase both computational complexity and video resolution. Even if other factors are kept constant, the increased resolution alone gives a more demanding throughput requirement. Throughput, in this context, is the number of pixels that must be clocked in and out of a system per second. It is also the number of pixels that must, on average, be processed per second.

FPGAs are currently used to implement video processing transforms and can provide a throughput rate that meets the high resolutions dictated by current video standards. An example is Sonic-on-a-Chip [1], which is capable of processing XVGA format video in real time. XVGA requires a throughput rate of 75 million pixels per second (MP/s). The FPGA's high performance is achieved by exploiting temporal and spatial parallelism, bit-width optimisation, efficient data handling and flexible memory access patterns. Architectural improvements such as the addition of embedded memory blocks and multipliers have further improved the processing capability of modern FPGAs. The goal of this work is to analyse graphics processing units and contrast their benefits with the architectural features of the FPGA.

2. RESEARCH AREA

Current video processing systems will need further improvement to meet more demanding throughput requirements in the future. My current research investigates the graphics processing unit (GPU) to see what benefits it may contribute to video processing systems. The GPU is chosen specifically to explore how the impressive computational power it exhibits for computer graphics can be harnessed for video processing. Graphics hardware, of which the GPU is the processing core, is driven by a large consumer demand for more realistic and impressive video games.
Its performance is currently growing at a rate of 2 to 2.5 times per year [2], which exceeds Moore's Law. This has led to a growing interest in using the GPU for general-purpose applications [3].

The architecture of the GPU, shown in Fig. 1, is designed for graphics rendering. It contains vertex and fragment processors, which are efficient at independent processing of vertices and fragments (pixels) respectively. The fragment processor is most relevant to this work, as this is where per-pixel processing is performed. Multiple pipelines in the fragment processor, of which there are sixteen in Fig. 1, exploit spatial parallelism. The processor instruction set is optimised for operations found in graphics applications, many of which are also common to video processing algorithms. The instruction set includes many vector and matrix arithmetic instructions.

The texture cache, which accesses the DRAM through a memory partition, is also relevant. It is through this cache that frame pixel data can be fetched for computations such as convolution. The cache is designed for the regular memory access patterns that occur in texture mapping. Other graphics-specific features of the data path, such as background culling, z-compare and blend, are currently disregarded. For a full explanation of the GPU architecture, refer to [4].

To address the research goals, the architectures of the GPU and FPGA are compared. Video processing covers a wide range of applications; this work focuses on processing broadcast video. It is hoped that the results of this comparison can be used to direct suggestions for improvements to video processing systems.

3. SUPPORTING FINDINGS

Research conducted to date shows exciting potential for applying the processing power of the GPU to broadcast video transforms [5]. The best performance of the GPU is seen in transforms which require few memory accesses and use instructions well supported by the GPU instruction set.
That is, instructions which are also found in graphics applications. Branching is an example of an instruction which is not

1-4244-0312-X/06/$20.00 © 2006 IEEE.

Authorized licensed use limited to: QUAID E AWAM UNIV OF ENGINEERING SCIENCE AND TECH. Downloaded on November 5, 2009 at 04:23 from IEEE Xplore. Restrictions apply.

[Figure 1 block labels: Host (CPU); Vertex Processing; Cull / Clip / Setup; Rasterization; Z-cull; Texture and Fragment Processing; Texture Cache; Fragment Crossbar; Z-Compare and Blend; four Memory Partitions, each connected to DRAM(s).]

Fig. 1. Diagram of the GeForce 6800 Series GPU internal structure [4].

well supported. This is due to the SIMD processing model of the fragment processor [4]. An application which demonstrates the promising performance is primary colour correction. This can be processed at over 100 million pixels per second [5], satisfying the 63 MP/s requirement of a target HDTV example.

A limitation of the GPU has been identified in applications requiring many memory accesses per pixel. The GPU processes all pixels independently and therefore cannot share previously accessed data between pixels. This is exemplified by 2D convolution, which shows an exponential decrease in performance as dimensionality n increases [5]. In an FPGA implementation of 2D convolution, line buffers can be used to exploit data reuse. This saves a large number of memory accesses at a cost of area. This is one scenario where the flexible FPGA architecture is beneficial over the fixed GPU pipeline.

4. CONCLUSION AND FUTURE WORK

Current and ongoing work is on separating the fine- and coarse-grained architectural benefits of the GPU for video processing. The data path is a coarse-grained feature which is very important in any design. The static GPU data path, shown in Fig. 1, is to be compared with a custom-designed FPGA data path. This is expected to produce interesting results on the suitability of the GPU data path for video processing. From carefully chosen implementation examples, the benefit of the fine-grained features, such as floating-point units, will also be obtained. The aim is to discover the extent to which each of the GPU architectural features contributes to its high performance [3, 5].

In conclusion, the GPU has been presented as an architecture where both coarse- and fine-grained features contribute to its promising performance for video processing. Current and future work aims to extract the relative importance of each of these features.
The overall research objective is to identify, by considering the GPU architecture, improvements that can be made to current broadcast video processing systems.
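
The data-reuse argument of Section 3 can be illustrated with a small sketch. The Python below is an illustration only, not the implementation evaluated in [5]: the function names `conv2d_naive` and `conv2d_line_buffered`, the test frame and the 3 x 3 kernel are all hypothetical, and rows are buffered whole rather than pixel by pixel to keep the model short. It counts frame-memory reads for an n x n window in two cases: when every output pixel fetches its own window, as in independent per-pixel processing, and when the n most recent rows are held in line buffers so that each input pixel is read from memory once. The count ignores any caching, so it models requested accesses rather than DRAM traffic.

```python
from collections import deque


def conv2d_naive(frame, kernel):
    """Independent per-pixel processing: every output fetches its own n x n window.

    Computes the 'valid' region only and does not flip the kernel.
    Returns (output, number of frame-memory reads).
    """
    h, w, n = len(frame), len(frame[0]), len(kernel)
    reads = 0
    out = []
    for y in range(h - n + 1):
        row = []
        for x in range(w - n + 1):
            acc = 0
            for j in range(n):
                for i in range(n):
                    acc += frame[y + j][x + i] * kernel[j][i]
                    reads += 1  # one memory read per window element
            row.append(acc)
        out.append(row)
    return out, reads


def conv2d_line_buffered(frame, kernel):
    """Streaming model: the n most recent rows are kept in line buffers.

    Each input pixel is read from frame memory once; windows are then
    assembled from the buffered rows. Returns (output, number of reads).
    """
    h, w, n = len(frame), len(frame[0]), len(kernel)
    reads = 0
    lines = deque(maxlen=n)  # oldest row is dropped automatically
    out = []
    for y in range(h):
        current = []
        for x in range(w):
            current.append(frame[y][x])
            reads += 1  # the only access to frame memory
        lines.append(current)
        if len(lines) == n:  # enough rows buffered to form full windows
            row = []
            for x in range(w - n + 1):
                acc = 0
                for j in range(n):
                    for i in range(n):
                        acc += lines[j][x + i] * kernel[j][i]
                row.append(acc)
            out.append(row)
    return out, reads


# Hypothetical 6 x 8 test frame and 3 x 3 smoothing kernel.
frame = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(6)]
kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

naive_out, naive_reads = conv2d_naive(frame, kernel)
buffered_out, buffered_reads = conv2d_line_buffered(frame, kernel)
```

For this frame and kernel, both functions produce the same output, but the independent version issues 216 reads against 48 for the line-buffered version. Reads per output pixel grow as n squared in the first case and stay at roughly one in the second, at the cost of storing n rows.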
5. REFERENCES

[1] P. Sedcole, P. Cheung, G. Constantinides, and W. Luk, "A reconfigurable platform for real-time embedded video image processing," in Proc. IEEE Field Programmable Logic, vol. 1, Dec. 2003, pp. 606-615.
[2] D. Manocha, "General-purpose computations using graphics processors," in Computer, vol. 38, no. 8, Aug. 2005, pp. 85-87.
[3] GPGPU, "General purpose computation using graphics hardware," [webpage]. http://www.gpgpu.org, 2003.
[4] M. Pharr and R. Fernando, GPU Gems 2. Addison Wesley, 2005.
[5] B. Cope, P. Cheung, W. Luk, and S. Witt, "Have GPUs made FPGAs redundant in the field of video processing?" in Proc. IEEE Field Programmable Technology, vol. 1, Dec. 2005, pp. 111-118.

