High Level Programming For FPGA Based Image and Video Processing Using Hardware Skeletons

High Level Programming for FPGA Based Image and Video Processing using Hardware Skeletons
K Benkrid, D Crookes, J Smith and A Benkrid School of Computer Science, The Queens University of Belfast, Belfast BT7 1NN, UK VisiCom division of Titan corp., 10052 Mesa Ridge Court San Diego, CA, 92121 USA
Abstract
In this paper, we present a new approach to developing a general framework for efficient FPGA based Image Processing algorithms. This approach is based on the new concept of Hardware Skeletons. A hardware skeleton is a parameterised description of a task-specific architecture, to which the user can supply parameters such as values, functions or even other skeletons. A skeleton contains built-in rules that will apply optimisations specific to the target hardware at the implementation phase. The framework contains a library of reusable skeletons for a range of common Image Processing operations. The library also contains high level skeletons for common combinations of basic image operations. Given a complete algorithm description in terms of skeletons, an efficient hardware configuration is generated automatically. We have developed a library of hardware skeletons for common image processing tasks, with optimised implementations specifically for Xilinx XC4000 FPGAs. This paper presents and illustrates our hardware skeleton approach in the context of some common image processing tasks, based on an implementation on VISICOMs VigraVisionTM FPGA based video board.
1. Introduction
Many modern Image Processing (IP) applications (such as processing video and very large images) are so computationally demanding that special purpose hardware solutions need to be considered. Reconfigurable hardware in the form of FPGAs can offer the performance advantages of a custom hardware solution, while their inherent reprogrammability feature makes them multi-purpose and reusable. However, a big disadvantage is the low level, hardware-oriented programming model needed to fully exploit the FPGAs potential performance. Despite the great amount of research done on FPGAs, many FPGA-based applications have been algorithm specific. An environment for developing applications needs more than just a library of static FPGA configurations, perhaps parameterisable (e.g. in
terms of input data wordlength), since it should allow the user to experiment with alternative algorithms and develop his/her own algorithms. There is a need for bridging the gap between high level application-oriented software and low level FPGA hardware. Many behavioural synthesis tools have been developed to satisfy this requirement [1][2][3]. These tools allow the user to program FPGAs at a high level (e.g. in a C-like syntax) without having to deal with low level hardware details (e.g. scheduling, allocation, pipelining etc.). However, although behavioural synthesis tools have developed enormously, structural design techniques often still result in circuits that are substantially smaller and faster than those developed using only behavioural synthesis tools [4][5]. This paper presents a framework for developing efficient hardware solutions specifically for image processing applications. This framework gives the benefits of an application-oriented, high level programming model, but does not sacrifice significantly the performance of the solution. Our approach to this is to use a concept which has proved relatively successful in developing software for parallel machines, namely skeletons [6][7][8]. Skeletons are reusable, parameterised fragments or frameworks to which the user can supply components (e.g. functions). It is common for skeletons to include functions as parameters which are applied by the skeleton to a data set. The implementation of a skeleton is normally optimised for a specific target machine. In this paper we introduce the concept of hardware skeletons. A hardware skeleton is a parameterised description of a task-specific architecture, to which the user can supply parameters such as values, functions (parameterised functional blocks) or even other skeletons. Certain combinations of basic skeletons can form the basis of additional, higher level skeletons. To present the concept, the paper first identifies a useful high level model for describing image processing operations. The common basic tasks, which we identify, will form the basis of a Hardware Skeleton Library. Next, we outline the strategy which the system employs to generate efficient FPGA configurations from a given operation description. A layered implementation of the hardware skeleton library is then presented. Finally, the
Proceedings of the 9 Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM01) 0-7695-2667-5 $20.00 2001 IEEE
th
implementation of our system on a commercial FPGA video board is presented.
Examples include histogramming and cumulative histogramming. The properties of an item of data (represented by an edge in the DAG) are of two kinds: Data type This is defined by two properties: - Structure: could be an image, a vector or a scalar. - Pixel type: which, for the purpose of this work, could be either an integer or a boolean. Data representation A data representation is defined by three properties: - The data could be in bit serial, or in bit parallel with an associated word size or, in digit serial representation, with a particular digit and word sizes. - If data is in bit serial (or digit serial), it can then be processed either MSB (or MSD) First or LSB (or LSD) First. - Number System which, for the purpose of this work, could be one of unsigned integer, 2s complement, or Signed Digit (SD) number representation [10]. Note that Binary representation corresponds to bit parallel with a word size one (denoted as parallel(1)). A node with a particular set of logical Inputs/Outputs could be implemented by a range of different possible implementations as illustrated, for example, for the Absolute value operation in Figure 2. It is normal (but not compulsory) for the input and output representations to be the same.
Int Bit Parallel Bit Serial 2s Bit Serial SD, MSDF complement, MSBF 2s complement
2. A high level model for IP operations

Many image processing algorithm can be described in terms of a Directed Acyclic Graph (DAG), where vertices represent IP tasks, and the directed edges represent the data flow.
Input1 (e.g. image) Input2 (e.g. image) IP tasks
Output (e.g. image, histogram etc.)
Figure 1. A hypothetical image processing algorithm modelled as a DAG Nodes are typically simple tasks such as adding two input images, or an image convolution. Common IP tasks can be classified in terms of the locality of their data access requirements into three categories: Point operations: The same operation is applied to each individual pixel of one or many source images to produce a corresponding result pixel in the new image. These include: relational operations (e.g. , , =), arithmetic operations (e.g. +, -, *, ), logical operations (e.g. AND, OR) and Look-Up tables. The operation could either be between two images or between an image and a scalar value. Neighbourhood operations: In neighbourhood operations, a new pixel value is calculated using only the pixel values in the neighbourhood of the original pixel and the weights in a window (e.g. convolution). This is done for all image pixels, and results in a new image. Neighbourhood operations are completely defined by a local operation between corresponding pixels and window values (e.g. multiplication), a global operation (e.g. accumulation) which reduces the window of intermediate results to a single result pixel, and a window (with given shape and coefficients) [9]. Global operations: These operations operate globally on the whole image. We can distinguish two common types of simple global operations: - Reduction to Scalar (RS): These operate on the whole image to produce a scalar as a result. Examples include count, global maximum, global minimum and global accumulation (). - Reduction to Vector (RV): This operation operates on the whole image to produce a vector as a result.
Absolute value
Absolute value
Absolute value
Absolute value
Int (a)
Bit Serial Bit Serial 2s Bit Parallel SD, MSDF complement, MSBF 2s complement (c) (b) (d)
Figure 2. A DAG node (a) with several possible implementations (b), (c) and (d) The Hardware Skeleton Library contains parameterised descriptions of architectures not only for the full range of basic operations (nodes), but possibly with different versions for different data representation combinations.
3. Implementation strategy
The users first task will be to represent the algorithm in terms of a DAG, without initially being concerned with data type or data representation considerations. Once this is done, an analysis of the properties of the input and output data formats of the nodes will identify a range of possible implementations of each node. For
th
instance, the result of an N-bit integer image comparison operation could be either an N-bit integer image or a (1bit) binary image. The choice will depend on subsequent processing of the result image, and on what skeletons are available. As a first step, the set of all possible implementations should first be considered by the user. The library of Hardware Skeletons (e.g. neighbourhood operations, point operations, etc.), in which each component has a set of different implementations (e.g. bit serial, bit parallel), is the basis of this phase. The implementations of the library components are optimised for specific target architectures (e.g. bit parallel adder units based on dedicated fast carry logic on Xilinx 4000). The range of possible implementations generated for a particular algorithm depends on the extent of this library. To select the optimum skeleton from the set of possible choices, the cost of each choice needs to be found, in terms of speed and area. This involves estimating the expected performance or effectively generating the FPGA configuration for each option, including the application of the optimisations for each skeleton. This cost based analysis enables the user to settle on a final DAG with all attributes (data type and representation) defined. The corresponding FPGA implementation is finally generated, in the form of EDIF netlist, for the chosen solution. This is performed by a Prolog-based [11] Hardware Description Environment, called HIDE4k, developed at Queens University Belfast [5, 12, 13]. This enables highly scaleable and parameterised component descriptions to be written, and generates pre-placed configurations in EDIF format for Xilinx XC4000 series [14]. The resulting EDIF file is finally fed to Xilinx Placement and Routing (PAR) tools to generate the FPGA configuration bitstream (see Figure 3). Note that during the process of implementing a DAG, the following issues arise: Data representation conversion Since many data representations might be used within the DAG, data representation converters may be needed to convert between different representations (e.g. from bit serial to bit parallel, or from Signed Digit to twos complement etc.) Data synchronisation When there are two or more inputs to a DAG node (vertex), any branch that arrives earlier than the others
Hardware Skeleton Library
should be forced to wait for the slowest branches by adding appropriate delays to the fastest branches. This is performed automatically by our system so that the user does not have to deal with low level data synchronisation issues. As a result, the users programming model is essentially the set of hardware skeletons provided by the Hardware Skeleton Library. These skeletons can be accessed either textually or, even more conveniently, via a GUI.
4. Implementing the Skeleton Library

We have implemented our Hardware Skeleton Library as a hierarchy of three levels of hardware block descriptions. At the bottom level is the arithmetic library (see figure 4). This provides basic arithmetic units (e.g. adders, multipliers) parameterised for different number representations (e.g. bit serial, bit parallel, 2s complement, unsigned etc.). Immediately on top of this level, we find the basic image operations library. The latter provides implementations for the basic image operations presented in section 2 above (e.g. basic neighbourhood operations). Finally, the top level provides implementations for high level (compound) skeletons.
To Image Processing Application Developer
High Level (compound) Skeletons library
Basic Image Operations Library (e.g. neighbourhood operations)
Arithmetic Cores Library
Figure 4. Hierarchical implementation of the Hardware Skeleton Library The following section considers each of these three levels in more detail.
HIDE4k System
Xilinx XC4000 FPGA
Bitstream DAG with logical data types Solution generation Cost Based Analysis Code Generator Xilinx PAR tools
Optimisation
EDIF
A DAG set of available implementations
A DAG set of optimised implementations
DAG with specific data representation choices
Figure 3. Implementation strategy
th
4.1 Arithmetic cores library

This library provides the basic building blocks required for image processing operations (and signal processing in general). It includes adders, multipliers, dividers, shifts and delays. The basic functions required for nearly any signal processing operation include addition/subtraction, shifts and delays. These blocks can then be used to construct the more complicated structures such as multipliers, dividers and maximum/minimum selectors. Versions of these cores are provided for different number representations. At the time of writing, the following number representations are supported: Bit parallel (N bits) Bit serial Bit serial Bit serial 2s complement 2s complement 2s complement Signed Digit MSBF LSBF MSDF
The architecture of a generic PxQ neighbourhood operation (with a local operation L and a global one G) requires (Q-1) line buffers, PxQ replicated local operation blocks, and a single PxQ-input global operation block, implemented as a tree of two-input reduction blocks when bit parallel arithmetic is used, as shown in figure 5.
L
Pixel buffers
G
L L
Line BufferQ-1
G
L
G G
L
The implementation of these cores is optimised for a specific target architecture (Xilinx XC4000 FPGAs for our particular case study). This should take advantage of the particular features of the target architecture (e.g. 4 input LUTs, synchronous RAMs, dedicated fast carry logic for XC4000). The core descriptions are held in HIDE4k with rules for core-specific optimisations as part of the core. For instance, a constant coefficient multiplication will apply CSD coding of the multiplier coefficient to reduce the consumed hardware [15][16]. Such optimisations are often not performed by behavioural synthesis tools.
Line Buffer2
G
L
G
L
Line Buffer1
P
G
L
4.2 Basic image operations library

This library provides implementations of the basic image operations presented in section 2. Consider the case of basic neighbourhood operations. As mentioned in section 2, a neighbourhood operation is defined in terms of a local and global operation. Local operations include multiplication and addition. Global operations include accumulation, maximum and minimum. These form the five basic Image Algebra neighbourhood operations as shown in Table 1 [9]. Neighbourhood Operation Local Op. GlobalOp. Convolution * Multiplicative maximum * Max Multiplicative minimum * Min Additive maximum + Max Additive minimum + Min Table 1. Image Algebra core operation set Figure 5. A general 2D PxQ neighbourhood operation This architecture is parameterisable or scaleable in terms of [17]: - The window size (PxQ) - The window coefficients - The image size (line buffer size LB) - The pixel wordlength - The local and global operations (L and G) - The number representation (arithmetic type) A generic description of a neighbourhood operation would then be given by:
neighbourhood_op(Arithmetic_type, Local_op, Global_op, Window, Pixel_wordlength, Image_Size)
Our HIDE4k system is capable of generating pre-placed FPGA architectures in EDIF format from such generic description. A ~30K line EDIF description is generated in 1~2 sec. The resulting architectures are tailored to the particular neighbourhood operation in hand (e.g. specific window coefficients). Their performance (speed
th
and area) rivals those obtained with a careful hand design [5]. A common skeleton used at this level is the reduce skeleton which reduces a set of N inputs into one result using a tree of 2-operands operations (Op) as shown in Figure 6.
Op Op Op
Op_description = pipeline( [Op1_desc, Op2_desc,, OpN_desc] )
(1)
where OpI_desc {I = 1,2,, N} is the high level description of each operation in the pipeline. For instance, an Open operation (see Figure 8) applied to 256x256 images of 8-bits/pixel, using 2s complement bit parallel arithmetic, would be described by:
Open = pipeline( [neighbourhood(tc_par, add, min, [[0,0,0],[0,0,0],[0,0,0]], 8, 256), neighbourhood(tc_par, add, max, [[0,0,0],[0,0,0],[0,0,0]], 8, 256)] )
Op Op Op Op
Additive Minimum
0 0 0 0 0 0 0 0 0
Additive Maximum
0 0 0 0 0 0 0 0 0
Figure 8. Open operation Parallel skeleton A number of common image processing algorithms comprise several concurrent neighbourhood operations (simple or compound) which share the same input image, and whose templates have the same size and shape. The results of these parallel operations then used in a reduce operation. Sobel, Prewitt, Roberts and Kirsch edge detectors [18], are examples of such operations.
Parallel_opN
Op
Op Op Op
Figure 6. Reduce skeleton At the time of writing, the supported 2-input reduction operations (Op) are addition (+), subtraction (-), maximum (max) and minimum (min).
Par_desc2
Parallel_opN-1
4.3 High level (compound) skeletons library

This library contains efficient implementations of a set of compound skeletons. These compound skeletons result from the process of identifying, by experience, common ways of assembling primitive operations and providing optimised implementations of these. To demonstrate this concept, we will present two examples of such compound skeletons. Pipeline skeleton In this type of operations, two or more IP operations are cascaded in a pipeline as shown in Figure 7. The input of each pipeline stage is provided by the output of the previous pipeline stage.
Op
Parallel_op2
Op
Par_desc1
Parallel_op1
Figure 9. Parallel skeleton The high level description (Par_desc) of this operation would be defined as follows:
Par_desc = Op(Par_desc1, Par_desc2)
(2)
Operation 1
Operation 2
Operation N
Figure 7. Pipeline skeleton This structure is described by:
Par_desc1 and Par_desc2 are defined either recursively as compound operations of the form (2) itself, or as (terminal) pipeline skeletons of the form (1). Note that the prefixes + and - can, for readability, be written in infix form. For instance, a Sobel operation (see Figure 10) will be described by:
th
pipeline([neighbourhood(tc_par, mult, accum, [[1,2,1],[0,0,0],[-1,-2,-1]], 8, Buf_size), abs]) + pipeline([neighbourhood(tc_par, mult, accum, [[1,0,-1],[2,0,-2],[1,0,-1]], 8, Buf_size), abs])
convolution
1 ~ -1 2 ~ -2 1 ~ -1
Absolute operation
+
convolution
-1 -2 -1 ~ ~ ~ 1 2 1
Absolute operation
Figure 10. Sobel operation In such operations, only one set of line and pixel buffers is needed to synchronise the supply of pixels for all parallel operations instead of allocating separate line buffers for each neighbourhood operation. This is because all neighbourhood operations are applied to the same image. Note that the skeletons (1) and (2) can be nested to any depth and interchangeably (FPGA area permitting).
need to generate a bit clock from the pixel clock. The bit clock frequency is N times the pixels clock (for an N-bit pixel). For practical real time video processing, the luminance pixel sampling rate is 13.5 MHz. This implies a bit clock frequency of 108 MHz for 8-bit length pixel processing, and 216 MHz for 16-bit length pixel processing. The XC4013E-3 cannot operate at these frequencies. Thus the architectures used will be implemented from bit parallel-based skeletons. Note that a trade-off in the form of digit serial arithmetic is still possible. However, this implies additional hardware for the digit clock frequency generation, and extra care for data synchronisation. A parallel implementation is easier to implement and can be efficiently implemented using dedicated fast carry logic [16]. Due to the limited memory resources on the FPGA chip, the line buffers have been implemented using the off-chip DRAM. Part of the FPGA is configured as an interface to the onboard DRAM (FIFOs), while the other part is configured to perform the required image processing operation as shown below: FPGA chip
5. Hardware implementation on a commercial FPGA based video board

To assess the effectiveness of our skeleton-based approach, we have implemented our system on a commercial FPGA based video processing board. The latter is a single slot PCI card which combines video acquisition, FPGA based real-time processing, and display [19]. A functional block diagram of the VigraVision board is given by figure 11.
VigraVision Board
RGB Monitor
Line buffers
(DRAM)
DRAM interface
Output video stream
Input video stream
Figure 12. Block diagram of the FPGA chip configuration
Camera
Image acquisition block
FPGA based Video Processor
Image display block
DRAM Host PCI bus
Figure 11. Block diagram of the VigraVision video board Bit parallel arithmetic has been chosen to implement the IP operations on the onboard FPGA (XC4013E-3). This choice is motivated by the fact that bit parallel architectures often lead to a better time-hardware product than bit serial ones. This is mainly due to the existence of dedicated fast carry logic on Xilinx FPGAs [5]. However, in the context of processing real time video, the VigraVision board influences the choice of the arithmetic. If bit serial arithmetic is to be used, there is a
If the user wants to generate a complete configuration, including all the low level hardware details, he or she merely has to provide the required high level description. The latter description must conform to the format in (1) and (2) and can be input textually or even graphically. Based on the skeleton library presented above, our HIDE4k system is capable of generating the corresponding efficient FPGA configuration in seconds in the form of an EDIF netlist. Due to the irregularity of the resulting architectures, the generated EDIF netlist is only partially placed. Once the EDIF description is generated, it is then fed to the Xilinx PAR tools to generate the FPGA configuration bitstream. This may take a long time (~1hr on a Pentium 233 running Windows 95 with 32M of RAM). This is partly due to the fact that the EDIF netlist is only partially placed. Another reason is the small area of the target FPGA (24x24 CLBs only). Nonetheless, the whole process is transparent to the user. At the application level, the user interfaces to the VigraVision board through a C-callable library (VigraVision ToolBox- VTB DLL). The ToolBox includes hardware initialisation and register control functions, image acquisition functions and image processing functions. For instance, the application
th
developer downloads a particular FPGA configuration by invoking the following function: u_loadXilinx(Xilinx_Chip_ID, Configuration_ filename) He or she can then copy the processed image from the video buffer to the host processor, for further analysis, by: u_readRect(LPRECT lpImgRect, LPVOID imgData) where lpImgRect specifies the rectangle in the frame buffer where data is to be read from, and imgData is a pointer to the image transferred to the host memory. The Tool Box is supported in Microsoft Visual C/C++ 5.0. Its functions can be accessed by application software which has been linked with the VTB import library and accessed via Windows as a DLL library. The resulting video coprocessor (see Figure 13) is based on a library of bitstream configurations ready to use (download to the FPGA) from a high level language (VC++ in our case). This library is extensible over time using our HIDE4k system. Thanks to our skeleton oriented approach, this task is relatively easy to perform and requires little FPGA hardware knowledge. This has been illustrated in this paper by two high level skeletons. Other skeletons can be designed using a similar approach and added to the library.
6. Conclusion
In this paper, we have presented a framework for FPGA based Image Processing. Central to this framework is the Hardware Skeleton Library which contains a set of high level descriptions of task-specific architectures specifically optimised for Xilinx XC4000 FPGAs. Although extensible, the library is based on a core level containing the operations of Image Algebra. The library also contains high level skeletons for compound operations, whose implementations include task-specific optimisations. Skeletons are parameterisable, and different skeletons for the same operation can be provided, for instance for different arithmetic representations. This gives the user a range of
implementation choices. This in turn supports experimentation with different implementations and choosing the most suitable one for the particular constraints in hand (e.g. speed and area). We are investigating the possibility of doing some of this experimentation automatically, but for now we do it manually. Given a complete algorithm description in terms of skeletons, an efficient hardware configuration is generated automatically by our system. Our approach was assessed successful by a real hardware implementation of a video coprocessor on a commercial FPGA based video board giving real time processing of video data. This video coprocessor allows for rapid generation of FPGA architectures from very high level, algorithmic, descriptions and opens the way to enabling image processing application developers to exploit the high performance capability of a direct hardware solution, while programming in an application-oriented model. Note that the skeleton oriented approach is not tied to a particular FPGA chip. Moreover, it may have some applicability for VLSI design. Furthermore, other application domains where there is an established algebra such as numerical processing can also benefit from the skeleton approach. Full system development will in practice inevitably hit the problem that some particular task is not readily expressed in terms of the skeletons currently in the library. It will always be necessary to have an ongoing process of skeleton development. This will of course require a skilled architecture designer, although less efficient solutions might be possible using existing skeletons; but the advantage of our approach is that system builders themselves do not require detailed hardware description skills. Future directions include upgrading the system to handle other FPGA series (particularly Xilinx Virtex chips). The extension of the hardware skeleton library, both in supporting more arithmetic types and providing other skeletons for more sophisticated image processing operations (wavelet transform in particular), is being investigated.
VTB library VC++ program
Extensible bitstream configurations library Xilinx PAR tools

EDIF Netlist
High Level IP operations descriptions
HIDE4k system
text OR
VigraVision PCI video board
Hardware Skeleton Library
Figure 13. Overall view of the VigraVision based video coprocessor
th
7. References
[1] Synopsys Inc., Behavioural Compiler, Software documentation, 1998. http://www.synopsys.com/products/beh_syn/ C Level Design Inc, C/C++ Synthesis System Compiler, Product overview, 1998. http://www.cleveldesign.com/products/ The Embedded Solutions Limited, Handel C information sheets, 1999 http://www.embeddedsol.com Hutchings B, Bellows P, Hawkins J, Hemmert S, Nelson B and Rytting M, A CAD suite for High-Performance FPGA design, FCCM99, Preliminary Proceedings. Benkrid K, Design and Implementation of a High Level FPGA Based Coprocessor for Image and Video Processing, PhD Thesis, Department of Computer Science, The Queen's University of Belfast, 2000. Cole M, Algorithmic Skeletons: structured management of parallel computation, MIT Press, 1989. Darlington J, Ghanem M, and To H W, 'Structured Parallel Programming', In Programming Models for Massively Parallel Programming Computers, IEEE Computer Society Press, pp. 160-169, Sept 1993. Michaelson G J, Scaife N R, and Wallace A M, 'Prototyping parallel algorithms in Standard ML', Proceedings of British Vision Conference, Sep 1995. ftp://ftp.cee.hw.ac.uk/pub/funcprog/msw.bmvc95.ps.Z Ritter G X, Wilson J N and Davidson J L, Image Algebra: an overview, Computer Vision, Graphics and Image Processing, No 49, pp 297-331, 1990.
[2]
[3] [4]
[5]
[6] [7]
[8]
[9]
[10] Avizienis A, Signed Digit Number Representation for Fast Parallel Arithmetic, IRE Transactions on Electronic Computer, Vol. 10, pp 389-400, 1961. [11] Clocksin W F and Melish C S, Programming in Prolog, Springer-Verlag, 1994 [12] Crookes D, Alotaibi K, Bouridane A, Donachy P and Benkrid A, An Environment for Generating FPGA Architectures for Image Algebra-based Algorithms, ICIP98, Vol.3, pp. 990-994, 1998. [13] Benkrid K, Crookes D, Bouridane A, Corr P and Alotaibi K, A High Level Software Environment for FPGA Based Image Processing, Proc. IPA'99, IEE Seventh International Conference on Image Processing and its Applications, Manchester, July 1999. pp. 112116. [14] Xilinx Ltd, XC4000E and XC4000X Series Field Programmable Gate Arrays -Product Specification, 1999. http://www.xilinx.com/partinfo/4000.pdf [15] Koren I, Computer arithmetic algorithms, PrenticeHall, Inc, pp. 99-126, 1993. [16] Benkrid K, Crookes D, Smith J, Benkrid A, 'High Level Programming for Real Time FPGA Based Video Programming', Proc. ICASSP'2000, IEEE International Conference on Acoustic, Speech and Signal Processing, Istanbul, June 2000. Volume VI, pp. 3227-3231. [17] Crookes D, Benkrid K, Bouridane A, Alotaibi K and Benkrid A, Design and Implementation of a High Level Programming Environment for FPGA Based Image Processing, IEE proceedings Vision, Image and Signal Processing, Vol. 147, No. 7, pp. 377-384. [18] Ross J, The Image Processing Handbook, CRC Press, 1995. [19] Visicom Laboratories, The VigraVision PCI video board: users manual, 1998. http://www.visicom.com
th

High Level Programming For FPGA Based Image and Video Processing Using Hardware Skeletons

Transféré par

Informations du document

Description originale:

Titre original

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

High Level Programming For FPGA Based Image and Video Processing Using Hardware Skeletons

Transféré par

Droits d'auteur :

Formats disponibles

High Level Programming for FPGA Based Image and Video Processing using Hardware Skeletons

implementation of our system on a commercial FPGA video board is presented.

2. A high level model for IP operations

Output (e.g. image, histogram etc.)

4. Implementing the Skeleton Library

High Level (compound) Skeletons library

Basic Image Operations Library (e.g. neighbourhood operations)

Arithmetic Cores Library

Xilinx XC4000 FPGA

A DAG set of available implementations

A DAG set of optimised implementations

DAG with specific data representation choices

Figure 3. Implementation strategy

4.1 Arithmetic cores library

4.2 Basic image operations library

Op_description = pipeline( [Op1_desc, Op2_desc,, OpN_desc] )

4.3 High level (compound) skeletons library

Figure 7. Pipeline skeleton This structure is described by:

5. Hardware implementation on a commercial FPGA based video board

Output video stream

Input video stream

Figure 12. Block diagram of the FPGA chip configuration

Image acquisition block

FPGA based Video Processor

Image display block

DRAM Host PCI bus

VTB library VC++ program

Extensible bitstream configurations library Xilinx PAR tools

High Level IP operations descriptions

VigraVision PCI video board

Hardware Skeleton Library

Figure 13. Overall view of the VigraVision based video coprocessor

Vous aimerez peut-être aussi