
ARCam: an FPGA-based Augmented Reality Framework

João Paulo Lima

João Marcelo Teixeira

Germano Guimarães
Emanoel Xavier

Guilherme Silva

Veronica Teichrieb

Judith Kelner

Centro de Informática - Universidade Federal de Pernambuco (UFPE)

Av. Prof. Moraes Rego S/N, Prédio da Positiva, 1º Andar,
Cidade Universitária, 50670-901, Recife, Pernambuco

{jpsml, gfg, gds, jmxnt, vt, jk} @cin.ufpe.br


ABSTRACT
Several solutions for Augmented Reality that use general purpose devices, such as computers and handhelds, have recently been developed. In such devices, the processing is done by software, which makes it difficult to obtain real time results without compromising resolution and frame rate or resorting to high clock frequencies, consequently leading to higher costs and power consumption. This paper describes the ARCam framework, which implements a platform based on Field Programmable Gate Array technology for Augmented Reality applications in the form of a dedicated hardware system. The ARCam solution is formed entirely by hardware components, including the one responsible for graphics rendering, called Hardwire. In order to demonstrate the framework utilization and evaluate its feasibility, a number of image processing algorithms have been implemented and some Augmented Reality applications have been created using ARCam.

Keywords
FPGA, Image Processing Algorithms, Augmented Reality,
Embedded Systems.

1. INTRODUCTION
With the evolution of embedded systems and applications using Virtual Reality (VR), the creation of systems that amplify in real time the user's perception of the real world became feasible. Augmented Reality (AR) is the research area that studies and creates systems that support the coexistence of virtual and real worlds [1]. Unlike VR, the user is not immersed in an artificial environment, but in the real world, with virtual objects or information superimposed on it. Ideally, in AR, virtual and real objects coexist in a natural way. A case in point is the movie Who Framed Roger Rabbit, where virtual characters are shown in the real world. In an industrial environment, AR can help to identify problems and point to solutions, ranging from simple help in a procedure to a simulation of the future based on current information. Many other areas can use (and are already using) AR tools, including medicine, production and repair, manufacturing lines, robotics, entertainment and military applications.

The design and building of a system following such concepts is the challenge we seek to overcome in this work with the development of the ARCam framework [2]. It has the benefit of making AR technology ubiquitous. The main objective of this work is to construct a framework for the development of embedded AR solutions, creating a flexible system that facilitates the elaboration of new applications using the available hardware infrastructure, together with a library of common functions for this type of application. With this framework, it will be possible to create different types of solutions, for example, smart cameras programmed to perform equipment inspection.
This paper is organized as follows. Section 2 talks about some
work related to ARCam. Section 3 presents the platform utilized
in the project and the developed architecture. Section 4 presents
some results obtained from the implementation of image
processing components and a 3D rendering component in
ARCam. Two case studies were implemented to evaluate the
framework and are described in Section 5. Section 6 enumerates
some of the difficulties faced during the development of the
project. Finally, conclusions and directions for future work are
described in Section 7.

2. RELATED WORK
Although several AR applications have been developed, they are targeted to devices which have a general purpose processor, like computers [3] and PDAs (Personal Digital Assistants) [4]. In these contexts, all the processing is done by software, which usually implies performance loss or image quality reduction when the application operates under stringent real-time constraints. Reconciling the use of general purpose processors with applications that need to maintain good real-time performance invariably results in a high cost to the overall project, due to factors such as the need for higher clock frequency and power. Often users need to carry a notebook in order to run a real-time application, which makes the solution less versatile and more expensive, mainly because of its weight and power consumption [3].
In the preliminary research performed for this project, no solution that is flexible from the point of view of both hardware and software was found for AR applications. Most existing solutions for AR are still not accessible to the general audience, because they are in the research phase and/or are dedicated to specific applications [3], [5], [6]. For instance, ID CAM [6] is an ID recognition system with an optical beacon and a fast image sensor with sufficient spatial resolution and robustness for long-distance recognition. The ID CAM contains an optical lens, a fast CMOS (Complementary Metal-Oxide-Semiconductor) image sensor, a controlling FPGA (Field Programmable Gate Array), and a USB (Universal Serial Bus) interface to output scene images and the IDs to a PC. In [7] the authors use an FPGA to separate pixels with a determined color and apply a center of mass algorithm to the image to find the center of these pixels in the scene. Further, the RS-232 serial port is used to transmit the x-y coordinates to the host machine for rendering to the AR display.
While all these systems implement AR applications using a combined hardware and software approach, the project presented in this paper has been developed to consist of specific purpose processors. All the functionalities were implemented using a hardware description language, resulting in a dedicated hardware system.
The hardware component responsible for graphics rendering, named Hardwire (and explained later), was also developed using a hardware description language, since one of its goals is to offer rendering support for an embedded AR application implemented with ARCam. Hardwire works like an inner module of the embedded application, with the specific function of wireframe rendering, unlike Manticore [8], which is an open source project of a 3D acceleration board.
Manticore is fully written in VHDL (Very-High-Speed Integrated
Circuit Hardware Description Language) and currently is capable
of rendering triangles on a VGA (Video Graphics Array) monitor.
The project includes a module for VGA output (also present in the prototyping board used by ARCam), an open source SDRAM (Synchronous Dynamic Random Access Memory) controller fully developed by the Manticore authors, and a triangle rasterizer (the module responsible for drawing the specified triangles in regions of the user screen). Eventually this open source project
will incorporate standard 2D graphics primitives, multiple
resolutions and color depths, shading support via hardware and a
PCI (Peripheral Component Interconnect) interface, and probably
an AGP (Accelerated Graphics Port), for connecting to a common
computer. The entire project was originally developed on the
Altera APEX20K200E FPGA, available on a NIOS prototyping
board. The operation frequency accomplished with this board was
50 MHz. Manticore authors also intend to construct their own
board and then create a full 3D accelerator.
Hardwire functionality is validated together with ARCam by visualizing 3D objects on a monitor connected to the FPGA, unlike the work done by Daniel Mc Keon [9], where the verification of the implemented synthesizable graphics transformations is made only by simulation. The work developed by Mc Keon consisted of implementing a synthesizable model of 3D graphics transformations. His intention was to implement vectorial transformations and projections using the VHDL language. This supporting layer would allow a later study on cluster-based graphics processing at Trinity College. The implemented functions mainly comprise operations handled using matrices, like rotation, scaling and translation. The development of Mc Keon's work started from the most basic modules, so even the simplest modules (multipliers, for example) were developed by him. Once all the implementation had been finished, the created modules were validated using a simulation tool. This work proved that FPGAs are capable of performing complex calculations, like vectorial transformations and projections. As a consequence, they can be used to share processing with a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), or even integrate a fully embedded solution, like ARCam.

3. FRAMEWORK ARCHITECTURE
Related research on smart camera architectures is normally targeted at image processing problems, like pattern detection used to identify gestures, recognize defects and track object movement. ARCam intends to implement not only image processing modules in hardware, but also to provide the necessary infrastructure to superimpose virtual elements on the captured image of the real world, as a way to improve the interface with the application's user.
The proposed solution in ARCam uses an Altera development board with a Stratix II FPGA, an image sensor and a VGA video monitor.
The flexibility of implementing the firmware in a hardware description language, such as VHDL, makes the design scalable: a component can be duplicated inside the FPGA in order to improve processing performance. Systems implemented in an FPGA that integrate several modules in this way are called SoCs (Systems-on-a-Chip). Figure 1 shows the development environment used in the project, where an image sensor (lower right) was connected to an FPGA-based development board.

Figure 1. ARCam development environment.


The system is divided into modules that are responsible for acquisition, storage, processing and projection of images. The architecture, illustrated in Figure 2, comprises an image sensor, a memory, a processing module, a multiplexer and a VGA bus.

Figure 2. ARCam architecture.


The memory size corresponds to a frame with a resolution of 320x240 pixels and a color depth of 24 bits (8 bits per RGB color). The pixels are stored in a FIFO (First In First Out) queue.
Each pixel stored into memory will be read and handled by the
processing module.
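For reference, a single frame at this resolution occupies 320x240 pixels x 3 bytes per pixel = 230,400 bytes (225 KB), which gives an idea of how much of the FPGA's internal memory one frame buffer alone consumes.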
The current architecture is composed of three interconnected devices: an image sensor, an FPGA and a VGA monitor. The first two are directly connected, pin to pin, with a common ground. Between the FPGA and the VGA monitor there is a DAC (Digital-to-Analog Converter) that converts the digital color values into analog voltage levels on the three color channels of the monitor (red, green and blue). The use of a memory external to the FPGA is being studied, in order to verify whether it is necessary in the hardware design; an external memory will be needed if the FPGA's internal memory is not large enough to store the data required by future applications.
ARCam offers a component based design model. The developer has access to a component library, with each component performing a specific functionality. These components can then be combined in order to give the AR application the desired behavior.

3.1 Image acquisition


First, an AR application needs an image acquisition subsystem to observe the environment. In conventional systems, a camera with digital output is often used to provide the frames for the application that will insert virtual components into the observed environment. In an embedded system, this subsystem can be restricted to the image sensor alone, which is the basic component of a camera. The environment light sensitizes a matrix of light sensors (photodiodes, transistors or capacitors), and each point passes through an ADC (Analog-to-Digital Converter) and is converted to digital values. The output of an image sensor is basically composed of three electrical signals: vertical synchronism, horizontal synchronism and pixels. They inform, respectively, when a frame finishes, when a line finishes and when a pixel is ready on the bus.

3.2 Image processing


For image handling and processing, an FPGA is used because of its main feature: being configurable, it can accommodate different implementations. Each frame captured by the image sensor is stored in the memory implemented on the FPGA. Next, each frame is read and can be processed by the processing module on the FPGA. This module corresponds to the user application and is developed according to the specified functionality. When the processing ends, the frame is exhibited on the VGA monitor.

3.3 Scene exhibition

The architecture also has a subsystem responsible for exhibiting the image of the real world captured by the image sensor, mixed with the virtual information added by the processing module. At a first stage, the output of the system is VGA, so that it can be plugged into an HMD (Head Mounted Display) or a common video monitor. At a second stage, we plan to also include an embedded LCD (Liquid Crystal Display).
The virtual objects presented on the screen can be simple 2D shapes, like filled squares and circles, or 3D wireframe objects, which are rendered by the Hardwire component described in the next section.
4. IMPLEMENTED COMPONENTS
Several image processing components for different purposes were
implemented to compose the ARCam framework. These
components perform typical image functions and are intended to
be used in the design of AR applications. Hardwire, a component
responsible for rendering 3D wireframe virtual elements, was also
implemented.
The overall descriptions of the implemented features along with
the obtained results are given next.

4.1 Image binarization and gray scaling


Most image processing algorithms used in AR applications do not
handle color images. As a result, the original image has to be
converted to a more suitable format, such as gray scale or binary
[10].
A high pass threshold filter transforms the color image into a binary image based on a threshold value. First, the brightness component is isolated from the RGB image, as shown in equation (1).

gray = 0.299R + 0.587G + 0.114B        (1)

In the FPGA this operation is implemented by logic gates. This way, the result of the operation is obtained in only one clock cycle, unlike general purpose processors, which need one clock cycle per arithmetic operation.
After getting the gray value, a parameter is used to threshold the image into black and white pixels, as shown in process (2).

Pout = white,  if gray > threshold
Pout = black,  otherwise        (2)
In order to show this result on the display, it is possible to choose one color to represent the virtual world and another one to represent the real world. In Figure 3, white was chosen to represent the real world and black represents the virtual world. Then, when a white pixel is found in memory, the corresponding real world pixel is printed on the screen, and the black pixels are printed on the screen representing the virtual world.
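As an illustration, the C sketch below reproduces equation (1) and process (2) in software, in the spirit of the C reference models the authors used to check simulation results; it is not the VHDL implementation. The fixed-point weights 77/150/29 (summing to 256) are an assumed approximation of the coefficients in equation (1), and the buffer layout and names (r, g, b, W, H) are illustrative.

#include <stdint.h>

#define W 320
#define H 240

/* Software reference of the binarization step: luminance (equation (1))
 * followed by the threshold of process (2). Integer weights 77/150/29
 * approximate 0.299/0.587/0.114 in 8-bit fixed point (sum = 256). */
void binarize(const uint8_t *r, const uint8_t *g, const uint8_t *b,
              uint8_t *out, uint8_t threshold)
{
    for (int i = 0; i < W * H; i++) {
        uint8_t gray = (uint8_t)((77 * r[i] + 150 * g[i] + 29 * b[i]) >> 8);
        out[i] = (gray > threshold) ? 255 : 0;   /* white = real world, as in Figure 3 */
    }
}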

Figure 3. Binarization result.

4.2 Labeling
Labeling of a binary image refers to the act of assigning a unique
value to pixels belonging to the same connected region [11]. This
process is often used in marker based AR applications [10].
Definition of a connected region can consider only 4 neighbors
(left, right, up and down) or 8 neighbors. Based on this decision
different results can be obtained. Figure 4 shows this difference.
This process reads the binary image from top to bottom, and left
to right.

Figure 4. Labeling algorithm based on 4 neighbors (left) and 8 neighbors (right).
The first problem of this process is the union of regions that do not appear connected during the sequential reading of the image. One solution is to create equivalences stored in a look up table (LUT). Figure 5 shows an example. Another problem is keeping the LUT updated when a label is no longer pointing to its proper value. In Figure 5, if L2 is found to be connected with L1, for example, L4 needs to be updated; L4 now also needs to point to L1. To solve this problem, every newly joined equivalence needs to verify whether some equivalence in the table also needs to be brought up to date. The last problem is showing the final result. All connected pixels need to be printed on the screen with the same color. At this moment the value pointed to in the LUT is printed instead of the raw label value. Figure 6 shows one result of this process on screen.
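For clarity, a minimal software sketch of the labeling scheme described above follows (4-neighbor version; the 8-neighbor variant additionally checks the diagonals). The lut array plays the role of the equivalence table of Figure 5; the names, sizes and the final resolution pass are assumptions of this sketch, and the hardware organization may differ.

#include <stdint.h>

#define W 320
#define H 240
#define MAX_LABELS (W * H / 2 + 1)        /* worst-case number of provisional labels */

static uint16_t lut[MAX_LABELS];           /* equivalence table (Figure 5) */

static uint16_t find_root(uint16_t l) {    /* follow equivalences to the final label */
    while (lut[l] != l) l = lut[l];
    return l;
}

static void merge(uint16_t a, uint16_t b) { /* record that labels a and b are equivalent */
    a = find_root(a); b = find_root(b);
    if (a != b) lut[b] = a;
}

/* img: binary frame (0 = background), labels: output label per pixel. */
void label_image(const uint8_t *img, uint16_t *labels)
{
    uint16_t next = 1;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            int i = y * W + x;
            if (!img[i]) { labels[i] = 0; continue; }
            uint16_t left = (x > 0) ? labels[i - 1] : 0;
            uint16_t up   = (y > 0) ? labels[i - W] : 0;
            if (!left && !up) {            /* new region */
                labels[i] = next; lut[next] = next; next++;
            } else if (left && up) {       /* pixel joins two regions: merge them */
                labels[i] = find_root(left); merge(left, up);
            } else {
                labels[i] = find_root(left ? left : up);
            }
        }
    for (int i = 0; i < W * H; i++)        /* print the value pointed to in the LUT */
        labels[i] = find_root(labels[i]);
}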

Figure 5. LUT with equivalence between label values.

Figure 6. Labeling result.

4.3 Mean filter

This operation replaces each pixel value in an image with the mean value of its neighbors, including itself [11]. It is often used to reduce noise in images. It has the effect of eliminating pixel values which are unrepresentative of their surroundings. Mean filtering is usually thought of as a convolution filter, and a 3x3 square kernel is often used. In a binary image this process can be implemented by counting how many neighbors a pixel has and by eliminating lonely pixels. Figure 7 shows the noise that this filter removes.

Figure 7. Noise removed by the mean filter.
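A software sketch of this binary filter is given below; the neighbor-count threshold min_neighbors is an illustrative parameter (a value of 1 removes only isolated pixels), and border pixels are simply left untouched.

#include <stdint.h>

#define W 320
#define H 240

/* Sketch of the binary "mean filter" described above: a foreground pixel
 * survives only if it has at least min_neighbors foreground pixels in its
 * 3x3 neighbourhood, which removes lonely noise pixels. */
void despeckle(const uint8_t *in, uint8_t *out, int min_neighbors)
{
    for (int y = 1; y < H - 1; y++)
        for (int x = 1; x < W - 1; x++) {
            int i = y * W + x, count = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    if (!(dx == 0 && dy == 0) && in[i + dy * W + dx])
                        count++;
            out[i] = (in[i] && count >= min_neighbors) ? 255 : 0;
        }
}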

4.4 Edge detection


This process marks the points in a digital image at which the luminous intensity changes quickly [11]. Usually these changes reflect points of interest in the real world that can be discontinuities in depth, discontinuities in surface orientation, changes in material properties, and variations in scene illumination. Edge detection is widely used in AR solutions in tasks such as marker tracking and feature matching [10].
The ARCam implementation is based on the zero-crossing method. This method searches for zero-crossings in the second derivative of the image in order to find edges; detecting zero-crossings in the second derivative captures local maxima in the gradient. The second derivative is calculated using a Laplacian filter that highlights regions of rapid intensity change, in other words, it works like an edge detector. Since the input image is represented as a set of discrete pixels, the second derivatives in the definition of the Laplacian filter can be approximated by a discrete convolution mask. Figure 8 shows commonly used masks. The results shown in Figure 9 were obtained using mask (a) from Figure 8.

Figure 8. Laplacian discrete convolution masks.

Figure 9. Edge detection filter result.
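The following C fragment sketches the Laplacian plus zero-crossing procedure in software. The 4-neighbor kernel used here is one of the commonly used masks; whether it matches mask (a) of Figure 8 is an assumption, border pixels are left unprocessed, and the sign-change test is one possible way of detecting zero-crossings.

#include <stdint.h>

#define W 320
#define H 240

/* One of the commonly used 3x3 Laplacian masks (4-neighbour version). */
static const int k[3][3] = { { 0,  1, 0 },
                             { 1, -4, 1 },
                             { 0,  1, 0 } };

void edge_detect(const uint8_t *gray, uint8_t *edges)
{
    static int lap[W * H];                      /* approximated second derivative */
    for (int y = 1; y < H - 1; y++)
        for (int x = 1; x < W - 1; x++) {
            int acc = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    acc += k[dy + 1][dx + 1] * gray[(y + dy) * W + (x + dx)];
            lap[y * W + x] = acc;
        }
    /* a pixel is marked as an edge when the Laplacian changes sign between
     * horizontal or vertical neighbours (a zero-crossing) */
    for (int y = 1; y < H - 1; y++)
        for (int x = 1; x < W - 1; x++) {
            int i = y * W + x;
            int zc = (lap[i] * lap[i + 1] < 0) || (lap[i] * lap[i + W] < 0);
            edges[i] = zc ? 255 : 0;
        }
}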

4.5 Generic convolution


In order to support any convolution kernel, we included a VHDL block where the values of the convolution mask can be easily changed. Its use is not efficient when the kernel has zero values, since the product of any number and zero is zero and adding zero to the convolution result does not change it; time is therefore lost on memory read operations and hardware addition and multiplication blocks are wasted. Applying a convolution mask to an image is the process of changing a pixel value to another value based on the neighbors of the pixel. 3x3 convolution masks are often used. Figure 10 shows how Pout is calculated based on its original value (Pin), its neighbors and the convolution mask.
Pout = (N11 x C11) + (N12 x C12) + (N13 x C13) +
       (N21 x C21) + (Pin x C22) + (N23 x C23) +
       (N31 x C31) + (N32 x C32) + (N33 x C33)

Figure 10. Generic convolution process.
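In software, the behavior of this component corresponds to the sketch below. The clamping of the result to the 8-bit range is an assumption; the hardware block may normalize or truncate the weighted sum differently.

#include <stdint.h>

#define W 320
#define H 240

/* Reference for the generic 3x3 convolution component: Pout is the weighted
 * sum of Pin and its eight neighbours, as in Figure 10. */
void convolve3x3(const uint8_t *in, uint8_t *out, const int c[3][3])
{
    for (int y = 1; y < H - 1; y++)
        for (int x = 1; x < W - 1; x++) {
            int acc = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    acc += c[dy + 1][dx + 1] * in[(y + dy) * W + (x + dx)];
            if (acc < 0)   acc = 0;          /* clamp to the 8-bit pixel range */
            if (acc > 255) acc = 255;
            out[y * W + x] = (uint8_t)acc;
        }
}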

4.6 Centroid estimation


A component responsible for finding the center of a colored object was implemented. The purpose of this centroid component is to find the center (x and y coordinates) of the region of the screen containing pixels of a specific color (blue, for example). The operation is performed by calculating the center of mass of the colored pixels on the screen. The center of mass is obtained by means of equation (3).

xc = (x1 + x2 + ... + xN) / N
yc = (y1 + y2 + ... + yN) / N        (3)

As seen in the equation, the center of mass is calculated by


dividing the sum of the x and y coordinates of all colored pixels
by the total number of colored pixels on the screen.
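A software reference of this computation follows. The is_blue() threshold values are purely illustrative assumptions; the hardware component applies its own RGB threshold before accumulating the coordinates.

#include <stdint.h>

#define W 320
#define H 240

/* Illustrative colour test; the actual threshold used in hardware may differ. */
static int is_blue(uint8_t r, uint8_t g, uint8_t b)
{
    return b > 128 && r < 100 && g < 100;
}

/* Centre of mass of the coloured pixels, as in equation (3). Returns 0 when
 * no coloured pixel is found. */
int centroid(const uint8_t *r, const uint8_t *g, const uint8_t *b,
             int *cx, int *cy)
{
    long sum_x = 0, sum_y = 0, n = 0;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            int i = y * W + x;
            if (is_blue(r[i], g[i], b[i])) { sum_x += x; sum_y += y; n++; }
        }
    if (n == 0) return 0;
    *cx = (int)(sum_x / n);
    *cy = (int)(sum_y / n);
    return 1;
}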

4.7 Quad detection


Marker recognition is widely used by AR application developers. A common marker utilized by these developers is a quad shape [10]. Therefore, this work provides among its functionalities a component that performs quad detection.
The four steps involved in the quad detection process (border
tracing, vertex reduction, polygon approximation and quad
classification) are performed by three modules that, together,
compose the quad detection component, as shown in Figure 11.
This subsection aims to explain this component, presenting the steps involved in this processing and the obtained results.

Figure 11. Quad detection component and its inner modules.

4.7.1 Border tracing

The first step needed in order to perform quad detection is to identify the borders present in the image. Identifying these borders allows, in further steps, separating all the shapes that will be analyzed to verify whether they are quads or not.
The first thing to be considered before choosing an algorithm to perform border tracing is the type of image to be processed. Since we have a module that performs edge detection in the image and generates a binary output, presented previously, the easiest way to trace the borders is to use this binary representation, because it is the simplest format that can be achieved. Therefore, a very simple algorithm may be applied to perform border tracing.
This algorithm consists in tracing the border using a set of 8 possible directions (8-connectivity). In Figure 12 it can be seen that using this approach makes it possible to trace the border in vertical, horizontal and diagonal directions, which is needed to detect quads that can be rotated by any angle. Figure 12 also shows that the directions must be followed in an anticlockwise order.

Figure 12. 8-connectivity search.


Before starting to trace the borders present in the image, the border tracing module needs to store the current frame received from the edge detection module. Once the frame is stored in an inner memory, the tracing algorithm takes place. The algorithm starts searching for a black pixel in the image from the top left. Once this pixel is found, the 8-connectivity search is applied to find a black pixel neighboring it. If a neighboring black pixel is found, then 8-connectivity is applied again to find a neighbor of this new one. The x and y coordinates of the black pixels that are part of the border are stored in two memories. This is done until the algorithm reaches the beginning of the border again. When this happens, the next step of the quad detection, namely vertex reduction, takes place using as its input the memories with the coordinates of the border pixels.
Once quad detection is finished for the last found border, the border tracing algorithm searches for another border in the current frame. This border is then traced and the process is repeated until the quad detection has been executed on each border present in the current frame; only then is a new frame loaded.
Currently, only the border tracing step is implemented and Figure 13 shows its results. Each border was rendered with a different color in order to show that they can be distinguished by the application.

Figure 13. Border tracing results.

4.7.2 Vertex reduction

The goal of this process is to simplify the polyline found during the border tracing step. This is important because most vertices present on the border found during the previous step are useless for performing the quad detection, and a large number of vertices could also make the next step, polygon approximation, too long.
During the vertex reduction, successive vertices that are clustered too closely are reduced to a single one. The distance used to determine the discarded vertices is called the tolerance. Figure 14 illustrates the usage of this algorithm. An initial vertex V0 is fixed, and then successive vertices Vi are tested. If the distance between these vertices and V0 is less than the defined tolerance, then these vertices are rejected. Otherwise the vertex is accepted as part of the new simplified polyline and is used as the new initial vertex for further simplification.

Figure 14. Vertex reduction.

As mentioned earlier, this step remains under development. Once it is finished, it will provide two memories with the x and y coordinates of the simplified border to be fed into the next step.
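Since this step is still under development, the fragment below is only a plausible software sketch of the tolerance test described above; the names and the use of floating point are illustrative (a hardware version would more likely compare squared distances in integer arithmetic).

#include <math.h>

/* Discard successive border points closer than `tol` to the last accepted
 * vertex; bx/by are the coordinates produced by the border tracing step. */
int vertex_reduce(const int *bx, const int *by, int n,
                  int *ox, int *oy, double tol)
{
    if (n == 0) return 0;
    int m = 0;
    ox[m] = bx[0]; oy[m] = by[0]; m++;       /* the initial vertex V0 is always kept */
    for (int i = 1; i < n; i++) {
        double dx = bx[i] - ox[m - 1];
        double dy = by[i] - oy[m - 1];
        if (sqrt(dx * dx + dy * dy) >= tol) {   /* far enough: accept and use as new anchor */
            ox[m] = bx[i]; oy[m] = by[i]; m++;
        }
    }
    return m;                                /* number of vertices in the simplified polyline */
}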

4.7.3 Polygon approximation and quad classification

The first phase of this process aims to provide the best polygon shape that represents the last border found on the current frame. Once the polygon shape is obtained, it is classified as a quad or not. Figure 15 depicts each stage involved in the polygon approximation of a given polyline.

Figure 15. Polygon approximation.


To perform the polygon approximation, the Douglas-Peucker (DP) algorithm [12] was chosen. This algorithm is based on the distance between a vertex and an edge segment and on a tolerance (ε) like the one used by the vertex reduction process. To start the algorithm, two extreme points of the polygon are connected. This connection defines the first edge to be used. Then the distance between each remaining vertex and this edge is tested. If there are distances bigger than ε, then the vertex with the biggest distance from the edge is added to the simplification. This process continues recursively for each edge of the current step until all distances between the vertices of the original polyline and the simplification are within the tolerance distance.
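A compact software version of the Douglas-Peucker recursion is sketched below; the keep[] flag array and the point-to-segment distance helper are implementation choices of this sketch, not details taken from the paper.

#include <math.h>

/* Distance from point (px, py) to segment (ax, ay)-(bx, by). */
static double point_seg_dist(double px, double py,
                             double ax, double ay, double bx, double by)
{
    double vx = bx - ax, vy = by - ay;
    double len2 = vx * vx + vy * vy;
    if (len2 == 0.0) return hypot(px - ax, py - ay);
    double t = ((px - ax) * vx + (py - ay) * vy) / len2;
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return hypot(px - (ax + t * vx), py - (ay + t * vy));
}

/* Mark in keep[] the vertices of the polyline (x, y) that survive the
 * simplification with tolerance eps, between indices first and last. */
void douglas_peucker(const int *x, const int *y, int first, int last,
                     double eps, int *keep)
{
    keep[first] = keep[last] = 1;            /* the two extreme points are connected */
    double dmax = 0.0;
    int idx = -1;
    for (int i = first + 1; i < last; i++) { /* vertex farthest from the current edge */
        double d = point_seg_dist(x[i], y[i], x[first], y[first],
                                  x[last], y[last]);
        if (d > dmax) { dmax = d; idx = i; }
    }
    if (idx >= 0 && dmax > eps) {            /* keep it and recurse on both sub-edges */
        douglas_peucker(x, y, first, idx, eps, keep);
        douglas_peucker(x, y, idx, last, eps, keep);
    }
}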
The second phase of this step consists of analyzing the set of
vertices that are part of a polygon shape to verify if it is a quad or
not. All features related to the quad shape must be verified,
namely, if there are only four vertices, if the polygon is convex, if
its area is relatively wide and if the angles are near 90 degrees.
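The checks enumerated above can be sketched as follows; the thresholds min_area and angle_tol_deg are illustrative parameters not specified by the authors.

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Classify a polygon (vertices in order) as a quad: exactly four vertices,
 * convex, interior angles close to 90 degrees, and a sufficiently wide area. */
int is_quad(const double *x, const double *y, int n,
            double min_area, double angle_tol_deg)
{
    if (n != 4) return 0;
    double area2 = 0.0;
    int sign = 0;
    for (int i = 0; i < 4; i++) {
        int j = (i + 1) % 4, k = (i + 2) % 4;
        /* cross products of consecutive edges must all have the same sign
         * for the polygon to be convex */
        double cross = (x[j] - x[i]) * (y[k] - y[j]) -
                       (y[j] - y[i]) * (x[k] - x[j]);
        if (cross != 0.0) {
            int s = cross > 0 ? 1 : -1;
            if (sign == 0) sign = s;
            else if (s != sign) return 0;    /* not convex */
        }
        area2 += x[i] * y[j] - x[j] * y[i];  /* shoelace formula */

        /* interior angle at vertex j */
        double ux = x[i] - x[j], uy = y[i] - y[j];
        double vx = x[k] - x[j], vy = y[k] - y[j];
        double ang = acos((ux * vx + uy * vy) /
                          (hypot(ux, uy) * hypot(vx, vy))) * 180.0 / M_PI;
        if (fabs(ang - 90.0) > angle_tol_deg) return 0;
    }
    return fabs(area2) / 2.0 >= min_area;    /* area must be relatively wide */
}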


The result of this verification represents the end of the quad


detection processing. When this result is achieved, the border
tracing step must start to search for another border on the current
frame and then the sequence of steps takes place again.

4.8 Hardwire

An important bottleneck often present in AR software is the processing time of the information acquired by the camera, which is commonly handled by specific libraries running over an operating system, along with many other processes. This harms the real-time processing required by this type of application, since the processing result is not immediately available to the user. Another processing bottleneck is the rendering of the 3D objects that will be part of the virtual world. In order to solve this last problem, Hardwire was created. Hardwire is a component responsible for rendering 3D wireframe objects, applying a series of coordinate transformations and showing them to the user on screen [13]. Figure 16 shows a wireframe cube rendered using Hardwire.

Figure 16. 3D rotated cube rendered using Hardwire.

Both the Java and C programming languages were used in the development of support applications responsible for testing and generating the VHDL modules. Java was chosen for the creation of Hardwire's software prototype; direct mappings were performed between some source code segments written in Java and Hardwire modules and hardware connections. The C programming language was used during the verification of simulation results.
The implementation of Hardwire was based on a set of concepts related to the computer graphics area, specifically directed to 3D visualization. These include 3D object representation, visualization transformations, 3D rotations, 3D projections and line drawing algorithms, as shown in Figure 17.

Figure 17. Hardwire concepts.

4.8.1 3D object representation

There are many ways of representing 3D objects, e.g., wireframe, polygonal faces and binary space partitioning. The choice of a 3D representation highly depends on the target application. The wireframe representation was adopted in Hardwire due to its simplicity. It allows each 3D object to be defined on the basis of a coordinate list containing the location of each vertex and a list of edges formed from the vertices previously provided.

4.8.2 Visualization transformations


After the 3D object is represented in the circuit, some geometric transformations are executed, defining particular viewing conditions such as location, size and object orientation. According to each object's position or the particular observer location, there is a different object view. When the object or the observer is moved over the coordinate system, the transformations occur. These transformations are based on translation, scaling and rotation operations. In the implementation of Hardwire, the operations between matrices and coordinate vectors are hidden inside the basic modules created internally.

4.8.3 3D rotations

The math involved in 3D rotation is more complex than its 2D counterpart, since a rotation axis must be specified. In two dimensions, the rotation axis is always perpendicular to the xy plane, whereas in a 3D world a rotation axis may have any orientation. Currently Hardwire supports rotation around the x axis. A circuit capable of providing the angle, sine and cosine values necessary for the rotation was created; later, the rotation will be extended to the two remaining axes.

4.8.4 3D projections
Just as an artist does when representing the image of a 3D object on paper, there is a need to generate a projection of the object to be displayed by the computer. A 3D projection is nothing more than a 2D representation of a 3D object. The simplest projection is the orthogonal one, and the most commonly used is the perspective projection, which simulates the projection performed by human vision when images of an object are captured. In our initial prototype of Hardwire, a very simple scheme of orthogonal projection was implemented, achieved simply by ignoring the vertex z coordinate in the viewer's coordinate system. This way, the 3D object is projected directly onto the projection plane, keeping the original size of objects independent of the viewer's distance.
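In software form, the two operations described in sections 4.8.3 and 4.8.4 reduce to the sketch below; the hardware uses its own sine/cosine circuit rather than the math library, and the screen-centering offsets are assumptions of this sketch.

#include <math.h>

typedef struct { double x, y, z; } vec3;

/* Rotation around the x axis: y' = y cos(a) - z sin(a), z' = y sin(a) + z cos(a). */
vec3 rotate_x(vec3 v, double angle_rad)
{
    double c = cos(angle_rad), s = sin(angle_rad);
    vec3 r = { v.x, c * v.y - s * v.z, s * v.y + c * v.z };
    return r;
}

/* Orthogonal projection: the z coordinate is simply ignored. */
void project_orthogonal(vec3 v, int cx, int cy, int *sx, int *sy)
{
    *sx = cx + (int)v.x;                     /* cx, cy: assumed screen centre offsets */
    *sy = cy - (int)v.y;                     /* screen y grows downwards */
}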

4.8.5 Line drawing algorithms


A line drawing algorithm is responsible for constructing the best possible approximation to an ideal line, while taking into consideration the limitations of the output device. The algorithm must satisfy the following constraints: (1) the line must have a continuous appearance and uniform brightness and thickness; (2) it must use pixels as close to the ideal line as possible; (3) it must generate the line quickly.


Based on these three criteria, the line drawing algorithm adopted in Hardwire was Bresenham's [14]. Its technique is optimized enough to require only low cost hardware operations, such as additions, subtractions and bit shifting. All the other candidate algorithms examined would use more prototyping board resources, and some of them might compromise the final frequency of the designed circuit by adding delay to the pixel processing time, due to the complexity of the operations they perform.
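For reference, the integer-only formulation of Bresenham's algorithm is shown below in C; this is the textbook all-octant version, not necessarily the exact variant implemented in Hardwire, and the frame buffer layout is an assumption.

#include <stdlib.h>
#include <stdint.h>

#define W 320
#define H 240

/* Bresenham line drawing: only additions, subtractions, comparisons and a
 * doubling (bit shift) are required per pixel. */
void draw_line(uint8_t *fb, int x0, int y0, int x1, int y1, uint8_t color)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;                       /* error term */

    for (;;) {
        if (x0 >= 0 && x0 < W && y0 >= 0 && y0 < H)
            fb[y0 * W + x0] = color;         /* plot the current pixel */
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
}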

5. CASE STUDY AND RESULTS


Two AR proofs of concept were implemented using ARCam in order to evaluate its feasibility. The aim of the first case study was to show that it is possible to create an AR application using the ARCam infrastructure. For this reason, a prototype AR application was rapidly created without taking too much modularization into account. The second case study connects some of the existing ARCam framework components to build an AR application, showing that it is possible to have a componentized model for the design of hardware based AR systems.


5.1 Pong
An AR application using the architecture described in Section 3 was implemented as a case study, namely the Pong game, where two players control two bars that can strike an object that moves on the screen. Each time a player does not move the bar to prevent the collision of the ball with the side edges of the scene, he/she loses points and the edges of the scene blink in red, announcing the collision. Figure 18 shows Pong at the moment the ball (red) collides with the center of the left side edge (blue player), because the player did not strike it. The blue and green bars are the players' representations and compose the scene of the game, as well as the upper bars that indicate the players' scores.

Figure 18. Blue player losing a point.



Figure 19. Pong interaction example.


The players' interaction with their bars is given by the movements made in the real world with two markers (red and blue) that are captured and identified by the image sensor. The bars are moved according to the player's movement. Each player must hold a blue or red object, which is used as a marker for the AR application. When a player moves the marker upwards, the bar moves up accordingly; when the marker stays in the center of the screen, the bar remains still. Similarly, when the player moves the marker downwards, the bar moves down too. Figure 19 shows an example of such interaction.
The movement made by the player to move his/her bar is processed by reading pixels from different regions of the captured image. A threshold is applied to the RGB components to determine what is blue and what is red. Next, the processing verifies the amounts of red and blue points at the top and the bottom of the screen. If the top of the screen has the largest amount of points, the bar moves up. In the same way, the bar stops or moves down if the middle or the bottom of the screen, respectively, has the largest number of points. In this manner, the exhibition frame rate is not affected, because point processing is done in parallel with the writing of pixels to the video memory.
Figure 20. Location of game information on the screen.

Processes read from and write to a set of variables that define the state of the game. Table 1 shows the state variables used to implement the game; Figure 20 shows the graphic representation of these variables on the game screen (except for variable A).

Table 1. Game state information

Variable   Description
A          0.5 Hz clock
B          Ball Y position
C          Ball X position
D          Vertical movement
E          Horizontal movement
F          Players Y position
G          Players score

The processes that run in parallel during the execution of the game are listed in Table 2. Table 3 relates each process with the type of access to the state variables.


Pong performs clock division to a frequency that makes the game playable, since at a 100 MHz clock the ball would move 100,000 pixels per second. Consequently, the first process performs the clock division from 100 MHz to 0.5 Hz.
The developed application runs at the same frame rate as the image acquisition from the camera, with no delays related to the processing of frames.
Table 2. Game processes functions

Process   Function
1         Divide clock from 100 MHz to 0.5 Hz
2         Move the ball on the Y axis and verify if it collides with the top and bottom of the screen
3         Move the ball on the X axis and verify if it collides with the sides of the screen or the players' bars, decreasing players' scores if necessary
4         Move the players' bars
5         Draw virtual objects based on state variables

Table 3. Processes and state variables accessed

Process   Read          Write
1         -             A
2         B, D          B, D
3         C, E, F       C, E, G
4         -             F
5         B, C, F, G    -
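The tables above can be read together as the following illustrative software model; the variable letters follow the reconstruction in Table 1, the two functions correspond to processes 2 and 3 of Table 2, and details such as the bar collision test are omitted. In the FPGA these are concurrent VHDL processes rather than sequential function calls.

#include <stdint.h>

/* State variables of Table 1 (letters A-G as reconstructed above). */
typedef struct {
    uint8_t clk_half_hz;     /* A: 0.5 Hz clock        */
    int     ball_y;          /* B: ball Y position     */
    int     ball_x;          /* C: ball X position     */
    int     dir_y;           /* D: vertical movement   */
    int     dir_x;           /* E: horizontal movement */
    int     player_y[2];     /* F: players Y position  */
    int     score[2];        /* G: players score       */
} pong_state;

/* Process 2: move the ball on the Y axis, bouncing at the top and bottom. */
void move_ball_y(pong_state *s, int height)
{
    s->ball_y += s->dir_y;
    if (s->ball_y <= 0 || s->ball_y >= height - 1)
        s->dir_y = -s->dir_y;
}

/* Process 3: move the ball on the X axis; a miss at a side edge decreases
 * that player's score (the bar collision test is omitted here). */
void move_ball_x(pong_state *s, int width)
{
    s->ball_x += s->dir_x;
    if (s->ball_x <= 0)              { s->score[0]--; s->dir_x = -s->dir_x; }
    else if (s->ball_x >= width - 1) { s->score[1]--; s->dir_x = -s->dir_x; }
}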

After the project's main infrastructure was established, the development of Pong was very fast. This demonstrates the simplicity of developing an AR application given the ARCam framework.

5.2 Object recognition


Our next step was more challenging in that it involved the creation of an embedded AR application by joining different ARCam components. The implemented application identifies blue objects in the real environment and draws a rotating cube over them. Basically, two components were used to achieve this result: centroid estimation and Hardwire.
After the conclusion of the first hardware prototype of Hardwire,
it was possible to visualize a cube (simple 3D object used on
initial tests) on the monitor directly connected to the prototyping
board. Based on this version some changes were performed and
the 3D rotation feature was implemented, enabling the rendered
object to rotate continuously (a screenshot of the rotated cube can
be seen in Figure 16).
The application pipeline works as follows: the image captured by
the camera is sent to the centroid estimation component, where
the x and y coordinates relative to the position of the object are
determined. This information is then sent to the Hardwire component, and the wireframe cube is rendered at the supplied position over the real world image. The result of this process, shown in Figure 21, is then presented to the user.

Figure 21. Object centroid estimation and cube rendering.

6. LESSONS LEARNED
Hardware development on FPGAs adds a series of difficulties and challenges different from those encountered in software development. Some of them are the FPGA size (number of logic elements available), the clock frequency of the circuit and its speed. These variables must be carefully studied during the design of an embedded system.
The time to compile and generate the hardware for simulation is much longer than a software compilation process: a simple program can be compiled in a matter of seconds, while a simple hardware design takes several minutes to compile. This is a problem that makes hardware development more complex and must be considered in any hardware development schedule.
Concerning the framework architecture, there were some difficulties in starting the image sensor capture, due to the fact that the camera uses an I2C register bus for image configuration. The image sensor documentation is incomplete and the authors experienced some problems in identifying its features. In fact, even at this moment, one of the problems still faced relates to the image appearing green, even when using an RGB format (8 bits per color).
Another problem was parallel access to memory. In the beginning
of the project, there were some moments where two processes
were accessing the memory concurrently leading to data
inconsistency. Currently, these two processes have been merged
and only one process is responsible for memory access.
In Hardwire, lessons were learned about algebraic manipulation of signed and unsigned vectors and about the differences between the signed vector types and the arithmetic libraries.
In the implementation of Pong, the most important challenge was memory access. The internal memory was not sufficient to store a 640x480 frame together with the game data, so the use of an external memory is important in order to increase the frame resolution, which currently is 320x240 pixels.

7. CONCLUSIONS AND FUTURE WORK


An architecture and two case studies were defined and
implemented, using a hardware infrastructure, to support the
development of AR embedded solutions. This infrastructure
contains an image sensor module responsible for visualizing the
environment, an image processing module, a mixer module of real
and virtual worlds and, finally, an exhibition module of the
combination of the two worlds.
It was verified that having a pre-existent infrastructure makes the
development of hardware based AR applications easier and faster.
The flexibility of the used platform and the library with real-time
image processing components enable a more efficient
development of a wide range of applications. In addition, the
ARCam framework stands as a standalone solution fully
implemented in hardware comprising all the steps needed by an
AR system, including graphics rendering, which is done by
Hardwire.
The performance obtained from the hardware implementation was shown to be satisfactory, as expected, since an FPGA allows real parallelism. More importantly, hardware implementation of computer graphics algorithms has proven beneficial to applications that require complex, CPU-hungry processing [15].
As future work, performance analysis is going to be done,
comparing hardware and software versions of image processing
algorithms. The quad detection component is going to be finished
and additional image processing components are going to be
implemented, giving more options to the AR developer. With
regard to Hardwire, the implementation of perspective projection and the rotation around the other axes are planned. The use of a Z-buffer and textures might also be considered for rendering filled 3D objects. The creation of an authoring tool for hardware
based AR applications might be considered, where the user is able
to choose and link the needed components using a GUI
(Graphical User Interface). More complex AR case studies need
to be done, using some of the many components implemented in
the framework. In addition, different AR approaches can also be
exploited, like, for example, markerless AR. An infrastructure for
accessing external memory from the FPGA is also going to be
defined, hence increasing the amount of system memory, instead
of relying on limited internal memory.

8. ACKNOWLEDGMENTS
The authors want to thank MCT and CNPq, for financially
supporting this research (process 507194/2004-7).

9. REFERENCES
[1] Azuma, R. A survey of augmented reality. Presence, 6, 4
(Aug. 1997), 355-385.
[2] Guimarães, G., Silva, S., Silva, G., Lima, J., Teichrieb, V., and Kelner, J. ARCam: Solução Embarcada para Aplicações em Realidade Aumentada. In Workshop de Realidade Aumentada (WRA 06) (Rio de Janeiro, Brazil, September 27-29, 2006). Brazilian Computer Society, São Paulo, SP, 2006, pp. 23-26.
[3] Umlauf, E., Piringer, H., Reitmayr, G., and Schmalstieg, D.

ARLib: The Augmented Library. In Proceedings of the First


IEEE International Augmented Reality ToolKit Workshop (ART 02) (Darmstadt, Germany, September 29, 2002). IEEE CS, Los Alamitos, CA, 2000, 2 pp.
[4] Pasman, W., and Woodward, C. Implementation of an
Augmented Reality System on a PDA. In Proceedings of the
Second IEEE and ACM International Symposium on Mixed
and Augmented Reality (ISMAR 03) (Tokyo, Japan, October
7-10, 2003). IEEE CS, Los Alamitos, CA, 2003, 276-277.
[5] Wagner, D., Pintaric, T., Ledermann, F., and Schmalstieg, D.
Towards Massively Multi-user Augmented Reality on
Handheld Devices. In Proceedings of the Third International
Conference on Pervasive Computing (PERVASIVE 05)
(Munich, Germany, May 8-13, 2005), Springer
Berlin/Heidelberg, New York, NY, 2005, 208-219.
[6] Matsushita, N., Hihara, D., Ushiro, T., Yoshimura, S.,
Rekimoto, J., and Yamamoto, Y. ID CAM: A Smart Camera
for Scene Capturing and ID Recognition. In Proceedings of
the Second IEEE and ACM International Symposium on
Mixed and Augmented Reality (ISMAR 03) (Tokyo, Japan,
October 7-10, 2003). IEEE CS, Los Alamitos, CA, 2003, 227-236.
[7] Smith, R., Piekarski, W., and Wigley, G. Hand Tracking for
Low Powered Mobile AR User Interfaces, In Proceedings of
the Sixth Australasian User Interface Conference (AUIC
05) (Newcastle, Australia, January 31 - February 3,
2005), Australian CS, Sydney, NSW, 2005, 7-16.
[8] Manticore - open source 3D graphics accelerator. Available:
Manticore site. URL: http://icculus.org/manticore, visited on
October, 2006.
[9] Mc Keon, D. Synthesizable VHDL Model of 3D Graphics
Transformations. Undergraduate Work, Trinity College,
Dublin, Ireland, 2005.
[10] Fiala, M. ARTag Revision 1, a Fiducial Marker System
Using Digital Techniques. Technical Report NRC 47419,
National Research Council Canada, Ottawa, ON, 2004.
[11] Gonzalez, R., and Woods, R. Digital Image Processing,
Addison Wesley, Reading, MA, 1992.
[12] Douglas, D., and Peucker, T. Algorithms for the reduction of
the number of points required to represent a digitized line or
its caricature. The Canadian Cartographer, 10, 2 (1973),
112-122.
[13] Teixeira, J., Teichrieb, V., and Kelner, J. Hardwire: Uma Solução de Renderização para Realidade Aumentada Embarcada. In Workshop de Realidade Aumentada (WRA 06) (Rio de Janeiro, Brazil, September 27-29, 2006). Brazilian Computer Society, São Paulo, SP, 2006, pp. 35-38.
[14] Foley, J., Van Dam, A., Feiner, S., and Hughes, J. Computer
Graphics: Principles and Practice. Addison Wesley,
Reading, MA, 2005.
[15] Teixeira, J., Teichrieb, V., and Kelner, J. Desenvolvimento de Aplicações de Realidade Aumentada Embarcada: Analisando Desempenho de Desenho de Objetos 3D. In Workshop sobre Aplicações de Realidade Virtual (WARV 06) (Recife, Brazil, November 22-25, 2006). Brazilian Computer Society, São Paulo, SP, 2006, pp. 29-32.
