
Projected Inter-Active Display

for Public Spaces

--------------------

A Thesis Proposal
Presented to the Faculty of the
Department of Electronics and Communications Engineering
College of Engineering, De La Salle University

--------------------

In Partial Fulfillment of
The Requirements for the Degree of
Bachelor of Science in Electronics and Communications Engineering

--------------------

by

Arcellana, Anthony A.
Ching, Warren S.
Guevara, Ram Christopher M.
Santos, Marvin S.
So, Jonathan N.

July 2006
1. Introduction

1.1. Background of the Study

Human-computer interaction (HCI) is the study of the interaction between users and
computers. The basic goal of HCI is to improve this interaction by making computers
more user-friendly and accessible. HCI at large is an interdisciplinary area. It is
emerging as a specialty concern within several disciplines, each with a different
emphasis: computer science, psychology, sociology, and industrial design (Hewett et
al., 1996). The ultimate goal of HCI is to design systems that minimize the barrier
between the human’s cognitive model of what they want to accomplish and the
computer’s understanding of the user’s task.

The thesis applies a new way to interact with sources of information using an
interactive projected display. For a long time the ubiquitous mouse and keyboard have
been used to control a graphical display. With the advent of increased processing
power and technology, there has been great interest from the academic and commercial
sectors over the past decades in developing new and innovative human-computer
interfaces (Myers et al., 1996). Recent advances and research in HCI have paved the
way for techniques such as vision, sound, speech recognition, and context-aware
devices that allow for a much richer, multimodal interaction between man and machine
(Turk, 1998; Porta, 2002). This line of research moves away from traditional input
devices, which are essentially blind and unaware, toward so-called Perceptual User
Interfaces (PUI). PUIs are interfaces that emulate the natural human capabilities to
sense, perceive, and reason. They model human-computer interaction after human-human
interaction. Some of the advantages of PUIs are as follows: (1) they reduce the
dependence on proximity that keyboard and mouse systems require, (2) they make use of
communication techniques that come naturally to humans, making the interface easy to
use, (3) they allow interfaces to be built for a wider range of users and tasks, (4)
they create interfaces that are user-centered rather than device-centered, and (5)
they place design emphasis on a transparent and unobtrusive interface (Turk, 1998).

What is interesting in this line of research is the development of natural and
intuitive interface methods that make use of body language. A subset of PUI is
Vision-Based Interfaces (VBI), which focus on the visual awareness computers have of
the people using them. Here computer vision algorithms are used to locate and
identify individuals, track human body motions, model the head and the face, track
facial features, and interpret human motion and actions (Porta, 2002). A certain
class of this research falls under bare-hand human-computer interaction, which is the
subject of this study. Bare-hand interaction uses as its basis of input the actions
and gestures of the human hands alone, without the use of attached devices.

1.2. Statement of the Problem

Information-rich interactive viewing modules are usually implemented as
computer-based kiosks. However, placing computer peripherals such as touch-screens
and mouse- and keyboard-controlled computers in a public area requires significant
space and raises maintenance concerns, since the physical hardware is handled by the
general public. Using a projected display and a camera-based input device would
eliminate the hardware problems associated with space and maintenance. It would also
attract people, since projected displays are novel.

Additionally, certain applications, such as exhibits, require highly innovative and
attractive presentations. Adding an interactive projected display in these
environments will enhance the presentation and the immersion of the exhibit.

1.3. Objectives

1.3.1. General Objectives

The general objective of the thesis is to create an interactive projected display
system using a projector and a camera. The projector displays the interactive
content, and the user uses his or her hand to select objects in the projected
display. Computer vision is used to detect and track the hand and to generate the
proper response.

1.3.2.Specific Objectives

1.3.2.1. To use a DLP or LCD projector for the display

1.3.2.2. To use a PC camera as the basis of user input

1.3.2.3. To use a PC to implement algorithms to detect hand action as seen

from the camera

1.3.2.4. To use the same PC to house the information content

1.3.2.5. To create an interactive DLSU campus map as a demo application


1.4. Scope and Delimitation

1.4.1. Scope of the Study

1.4.1.1. The proponents will create a real time interactive projected display

using a projector and a camera.

1.4.1.2. The proponents will use development tools for image/video

processing and computer vision to program the PC. Algorithms for

hand detection and tracking will be implemented using these tools.

1.4.1.3. A demo application of the system will be implemented as an

interactive campus map of the school.

1.4.1.4. Only the posture of a pointing hand will be recognized as an input.

Other visual cues to the camera will not be recognized.

1.4.2. Delimitation of the Study

1.4.2.1. The display will be projected on a clean white wall.

1.4.2.2. The projector and the camera set-up will be fixed in such a way that

blocking the projector is not a problem.

1.4.2.3. Trapezoidal distortion, which results from projecting at an angle,
will be manually compensated for if present.

1.4.2.4. Lighting conditions will be controlled so as not to overpower the projector.

1.4.2.5. The system will be designed to handle only a single user. In the

presence of multiple users, the system would respond to the first user

triggering an event.
1.5. Significance of the Study

The study applies a new way of presenting information using projected displays and
allows the user to interact with it. A projected display conserves space, as the
system is ceiling-mounted and no hardware directly involves the user. Using only the
hands of the user as input, the system is intuitive and natural, which are key
criteria for effective interfaces for public use. This minimizes the learning time,
as the user only needs to point. The study presents an alternative to computer-based
modules where space can be a problem.

Currently the cost of acquiring and maintaining a projector is high, but it remains
feasible where maintaining an information center is deemed important. The system is
comparable to the large-screen displays used in malls and similar venues. Since the
system is also a novel way of presenting information, it can be used to make
interactive advertisements that are very attractive to consumers. The display can
transform from an inviting advertisement into detailed product information. The cost
of operating the projector could thus possibly be justified by the revenue generated
from effective advertising. Other applications of the system may be in exhibits,
which may have particular requirements for uniqueness and attractiveness. An
interactive projected display can give visitors to the exhibit a higher level of
immersion and have a strong impact on its viewers.

The study is an endeavor toward the development of natural interfaces. The use of a
projector and camera provides a means of producing an augmented reality that is
natural, requiring no special goggles or gloves for the user to wear. In public
spaces where information is very valuable, a system that can provide an added
dimension to reality is very advantageous, and the use of nothing but the hands means
the user can instantly tap the content of the projected interface. Computer vision
provides the implementation of a perceptual user interface, and the projection
provides the means of creating an augmented reality. Further developments in these
areas mean that computers can be present throughout everyday life without the user
necessarily being attached to them. With the development of PUI there is no need for
physical interface hardware; only the natural interaction skills present in every
human are needed.

1.6. Description of the Project

The system comprises three main components: (1) the PC, which houses the information
and control content; (2) the projector, which displays the information; and (3) the
PC camera, which serves as the input of the system.

Block diagram of the system

The system acquires the image from the camera. Preliminary image processing prepares
the video frames for hand detection and recognition. The system then detects the
position and action of the user's hands relative to the screen and generates a
response to the specific action. Image/video processing and machine vision techniques
will be used to implement these functions on the PC.


As a demo application, an interactive map of the school will be developed. The
projector will project the campus directory of De La Salle University Manila. The
user will then pick which building he/she would like to explore, using his/her hand
as the pointing tool. Once the user has chosen a building, a menu will appear that
gives information about the building. Information includes a brief history, floor
plans, facilities, faculty, etc. Once the user has finished exploring the building,
he/she can touch the back button to select another building on the campus. The cycle
continues until the user is satisfied.
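
The navigation cycle described above can be sketched as a small two-screen state machine. This is only an illustration under stated assumptions: the building names, the rectangle coordinates, the back-button region, and the names `CampusMap` and `on_point` are hypothetical placeholders, and the pointing position is assumed to arrive already converted to projector pixel coordinates.

```python
# Hypothetical screen regions in projector pixels: (left, top, right, bottom).
BUILDINGS = {
    "Building A": (150, 100, 350, 300),
    "Building B": (400, 100, 600, 300),
}
BACK_BUTTON = (0, 0, 100, 60)


def contains(rect, x, y):
    """Return True if point (x, y) lies inside the rectangle."""
    left, top, right, bottom = rect
    return left <= x < right and top <= y < bottom


class CampusMap:
    """Two-screen navigation: the campus map, or the menu of one building."""

    def __init__(self):
        self.current = None  # None means the campus map is displayed

    def on_point(self, x, y):
        """Handle one pointing event and return the name of the screen now shown."""
        if self.current is None:
            # On the map screen: open the menu of the building pointed at, if any.
            for name, rect in BUILDINGS.items():
                if contains(rect, x, y):
                    self.current = name
                    break
        elif contains(BACK_BUTTON, x, y):
            # On a building menu: only the back button changes the screen.
            self.current = None
        return self.current or "map"


ui = CampusMap()
print(ui.on_point(200, 200))  # pointing inside Building A opens its menu
print(ui.on_point(10, 10))    # pointing at the back button returns to the map
```

Keeping the hit-testing separate from the vision code means the map layout can be changed without touching the hand detection routines.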

1.7. Methodology

Development of the study will be heavily invested in the software programming of the
PC. On the hardware level, the camera and projector must be acquired and set up in
the initial stages. Hardware requirements will be researched before acquiring the
camera and selecting the development PC. However, cost considerations for the
projector entail a base-model projector with a lower luminance than may be desired.
This has led to the delimitation that lighting conditions should be controlled.

Implementation of the study is primarily based on programming, so the researchers
must acquire the necessary programming skills. Quick familiarization and efficiency
with the libraries and tools for computer vision are necessary for timely progress in
the study. Development of the system will be undertaken in steps. Code will be
developed for acquisition, detection, evaluation, and then event generation. At each
stage, the outputs will be tested to ensure that requirements are met.

Computer vision tools have image acquisition routines that can be used for the study.
A detection routine based on algorithms discussed in the literature will be
implemented. After a working prototype is made, the system will be given to other
users for trial and feedback. Initial adjustments and fixes will be made according to
their input.

Seeking advice from different people may be necessary for speedy progress of the
study. Advice on programming will be very helpful, since the implementation is
PC-based. Advice from the panel, the adviser, and other people about the interface
will help remove biases the proponents may have in developing the system.
1.8.

1.9. Estimated Cost

Projector                              P 50,000
PC Camera                              P 1,500-2,000
Computer Vision Tools/Libraries        Free
Development / Prototype Computer       Available
Miscellaneous                          P 5,000
                                       ----------
Estimated budget                       P 57,000


2. Review of Related Literature

2.1. Bare Hand Human Computer Interaction

Von Hardenberg and Bérard, in their work “Bare Hand Human Computer Interaction”
(2001), published in the Proceedings of the Workshop on Perceptive User Interfaces,
describe techniques for barehanded computer interaction. Techniques for hand
segmentation, finger finding, and hand posture classification are discussed. They
applied their work to the control of an on-screen mouse pointer for applications such
as a browser and a presentation tool. They also developed a multi-user application
intended as a brainstorming tool that allows different users to arrange text across
the screen space.

Figure 1. Application examples of von Hardenberg and Bérard’s system: finger-controlled
a) browser, b) paint, c) presentation, and multi-user object organization

Hand segmentation techniques such as stereo image segmentation, color, contour,
connected-components algorithms, and image differencing are briefly discussed as an
overview of existing algorithms. It is pointed out that the weaknesses of the
individual techniques can be compensated for by combining techniques, at the cost of
additional computation. For their work, they chose a modified image differencing
algorithm. Image differencing tries to segment a moving foreground (i.e., the hand)
from a static background by comparing successive frames. Additionally, when frames
are compared to a reference image, the algorithm can detect resting hands. A further
modification to image differencing was to maximize the contrast between foreground
and background.
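
The difference between comparing successive frames and comparing against a reference image can be illustrated with a minimal sketch. It is not the authors' modified algorithm: frames are assumed to be small grayscale images stored as nested lists, and the function name `difference_mask` and the threshold of 30 gray levels are hypothetical choices.

```python
def difference_mask(frame, other, threshold=30):
    """Mark pixels (1) whose gray value differs from `other` by more than threshold."""
    return [
        [1 if abs(p - q) > threshold else 0 for p, q in zip(row, other_row)]
        for row, other_row in zip(frame, other)
    ]


def count_foreground(mask):
    """Number of pixels marked as foreground."""
    return sum(sum(row) for row in mask)


# A static dark background and a frame with a bright 2x2 "hand" placed on it.
background = [[10] * 6 for _ in range(4)]
hand_frame = [row[:] for row in background]
for y in (1, 2):
    for x in (2, 3):
        hand_frame[y][x] = 200

# The hand has stopped moving: two successive frames are identical,
# so frame-to-frame differencing no longer sees it.
successive = difference_mask(hand_frame, hand_frame)

# Comparing against the stored reference image still finds the resting hand.
versus_reference = difference_mask(hand_frame, background)

print(count_foreground(successive), count_foreground(versus_reference))
```

In this toy case the successive-frame mask is empty while the reference mask contains the four hand pixels, which is the resting-hand situation the reference image is meant to handle.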

After segmentation, the authors discuss the techniques used for detecting the fingers
and hands. They describe a simple and reliable algorithm based on finding fingertips,
from which the fingers and eventually the whole hand can be identified. The algorithm
is based on a simple model of a fingertip as a circle mounted on a long protrusion.
After the fingertips are found, a model of the fingers and eventually the hand can be
generated, and this information can be used for hand posture classification.

Figure 2. Finger model used by von Hardenberg and Bérard

The resulting system ran in real time at around 20-25 Hz. Data from their evaluation
show that about 6 frames out of 25 are misclassified when the foreground is moving
fast. Positional accuracy was off by between 0.5 and 1.9 pixels. They concluded in
their paper that the system developed was simple, effective, and capable of working
in various lighting conditions.

2.2. Using Marking Menus to Develop Command Set for

Computer Vision Based Gesture Interfaces

Lenman, Bretzner, and Thuresson present “Using Marking Menus to Develop Command Set
for Computer Vision” (2002), published in the Proceedings of the Second Nordic
Conference on Human-Computer Interaction. Gesture-based interaction is proposed as a
partial replacement for present interaction tools such as the remote control and the
mouse. Perceptive and Multimodal User Interfaces are the two main scenarios discussed
for gestural interfaces. The Perceptive User Interface aims at automatic recognition
of human gestures integrated with other human expressions such as facial expressions
or body movements, while the Multimodal User Interface focuses more on hand poses and
specific gestures that can be used as commands in a command language.

They describe three dimensions to be considered in designing gestural command sets.
The first is the cognitive aspect, which refers to how easy commands are to learn and
remember; command sets should therefore be practical for the human user. The second,
the articulatory aspect, concerns how easy gestures are to perform and how tiring
they are for the user. The last dimension covers the technical aspects. It refers to
the requirement that command sets be state of the art and meet the expectations of
upcoming technology.


The authors concentrate on the cognitive side. They consider a menu structure to be
of great advantage because commands can then be easily recognized. Pie menus and
marking menus are the two types of menu structure that the authors discuss and
explain. Pie menus are pop-up menus whose alternatives are arranged radially. Marking
menus, specifically hierarchic marking menus, are a development of pie menus that
allows more complex choices by implementing sub-menus.
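
Selection in a radially arranged menu amounts to mapping the direction of a stroke to a sector, and a hierarchic marking menu repeats that mapping once per level. The sketch below illustrates the idea only; it is not the authors' implementation. The menu labels, the function names `pie_sector` and `navigate`, and the convention that sector 0 points along the positive x-axis with the y-axis pointing up are all assumptions (image coordinates usually have y pointing down, so dy would need to be negated).

```python
import math


def pie_sector(dx, dy, n_choices):
    """Index of the pie sector containing stroke direction (dx, dy).

    Sector 0 is centered on the positive x-axis; indices increase
    counter-clockwise. Returns None for a zero-length stroke.
    """
    if dx == 0 and dy == 0:
        return None
    angle = math.atan2(dy, dx) % (2 * math.pi)
    width = 2 * math.pi / n_choices
    return int((angle + width / 2) // width) % n_choices


# Hypothetical hierarchic menu: each node is (label, children or None).
MENU = ("root", [
    ("TV", [("On", None), ("Off", None), ("Channel +", None), ("Channel -", None)]),
    ("CD", [("Play", None), ("Stop", None), ("Next", None), ("Previous", None)]),
    ("Lamp", [("On", None), ("Off", None), ("Brighter", None), ("Dimmer", None)]),
    ("Cancel", None),
])


def navigate(menu, strokes):
    """Follow one stroke per menu level and return the label that is reached."""
    node = menu
    for dx, dy in strokes:
        label, children = node
        if children is None:
            break  # already at a leaf command
        index = pie_sector(dx, dy, len(children))
        if index is None:
            break  # no direction given, stay at this level
        node = children[index]
    return node[0]


print(navigate(MENU, [(1, 0), (0, 1)]))  # right, then up
```

With four choices per level, each sector spans 90 degrees, so a stroke only needs to be roughly in the right direction to select an item.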

As a test, a prototype for hand gesture interaction was built. Lenman, Bretzner, and
Thuresson chose a hierarchic menu system for controlling functions of a TV, CD
players, and a lamp. Their computer vision system is based on a representation of the
hand. The system searches for and then recognizes hand poses based on a combination
of multiscale color detection and particle filtering. Hand poses are represented in
terms of hierarchies of color image features with qualitative interrelations in terms
of position, orientation, and scale. Their menu system has three hierarchical levels
and four choices. Menus are currently shown on a computer screen, which is
inconvenient; in the future an overlay on the TV screen will be presented.

As for future work, they are attempting to increase the speed and tracking stability
of the system in order to achieve more position independence for gesture recognition,
increase the tolerance to varying lighting conditions, and increase recognition
performance.

2.3. Computer Vision-Based Gesture Recognition for an

Augmented Reality Interface


Granum et al. present “Computer Vision-Based Gesture Recognition for an Augmented
Reality Interface” (2004), published in the Proceedings of the 4th International
Conference on Visualization, Imaging and Image Processing. The paper covers areas
such as gesture recognition and segmentation that are needed to complete the
research, along with the techniques used for them. There has already been a lot of
research on vision-based hand gesture recognition and finger tracking applications.
As technology grows, researchers are finding ways for computer interfaces to behave
naturally; capabilities such as sensing the environment through sight and hearing
must be imitated by the computer.

This research is carried out for one application: a computer-vision interface for an
augmented reality system. The computer vision is centered on gesture recognition and
finger tracking used as an interface to the PC. Their setup projects a display onto a
Place Holder Object (PHO), and the user's own hand is used to control the display.
Movements and gestures of the hand are detected by a head-mounted camera, which
serves as the input to the system.

There are two main problem areas, and the presentation of their solutions forms the
bulk of the paper. The first is segmentation, which is used to detect the PHO and the
hands in the 2D images captured by the camera. Hand detection is difficult because
the hand changes form as it moves and varies in size between gestures. To solve this
problem the study uses colour pixel-based segmentation, which provides extra
dimensions compared to gray-tone methods. Colour pixel-based segmentation introduces
a new problem with illumination, since it depends on both intensity changes and
colour changes. This problem is resolved by using normalized RGB, also called
chromaticities, but this method raises several issues, one of which is that cameras
normally have a limited dynamic intensity range.

After the hand pixels are segmented from the image, the next task is to recognize the
gesture. Recognition is subdivided into two approaches: the first detects the number
of outstretched fingers, and the second handles the point-and-click gesture. Counting
is done by a polar transformation around the center of the hand, counting the number
of fingers (which appear rectangular in shape) present at each radius; to speed up
the algorithm, the segmented image is sampled along concentric circles. The second
area of concern is the detection of point-and-click gestures. The gesture recognition
algorithm is used, and when it detects only one finger, this represents a pointing
gesture; the tip of the finger is defined to be the actual pointing position. The
center of the finger is found for each radius and the values are fitted to a straight
line; this line is searched until the final point is reached.
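
The effect of normalized RGB can be shown with a short sketch: dividing each channel by the sum of the three removes the overall intensity, so the same surface colour under brighter or dimmer light yields the same chromaticity. The skin-colour ranges and the minimum-intensity cut-off below are hypothetical values chosen for illustration, not figures from the paper; the cut-off reflects the limited dynamic range issue, since very dark pixels give unreliable chromaticities.

```python
def chromaticity(r, g, b):
    """Normalized RGB: each channel divided by the channel sum."""
    total = r + g + b
    if total == 0:
        return (0.0, 0.0, 0.0)  # black pixel: chromaticity is undefined
    return (r / total, g / total, b / total)


def is_skin(pixel, r_range=(0.38, 0.55), g_range=(0.25, 0.35), min_sum=40):
    """Classify a pixel by its chromaticity; all thresholds are hypothetical."""
    r, g, b = pixel
    if r + g + b < min_sum:
        return False  # too dark for the chromaticity to be trusted
    nr, ng, _ = chromaticity(r, g, b)
    return r_range[0] <= nr <= r_range[1] and g_range[0] <= ng <= g_range[1]


dim = chromaticity(120, 80, 60)
bright = chromaticity(240, 160, 120)  # same colour at twice the intensity
print(dim, bright)
print(is_skin((120, 80, 60)), is_skin((5, 3, 2)))
```

Only two of the three normalized values are independent, because they always sum to one, which is why the classifier checks just the red and green components.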

Figure 3. Polar transformation on a gesture image.

The paper is a research step toward gesture recognition, implemented as part of a
computer-vision system for augmented reality. The research showed qualitatively that
it can be a useful alternative interface for augmented reality, and that it is robust
enough for the augmented reality system.

2.4 Creating Touch-Screens Anywhere with Interactive

Projected Displays

Claudio Pinhanez et al., authors of “Creating Touch-Screens Anywhere with Interactive
Projected Displays” (2003), published in the Proceedings of the Eleventh ACM
International Conference on Multimedia, began working a few years earlier on systems
that could transform an available physical space into an interactive “touch-screen”
style projected display. In this paper, the authors demonstrate a technology named
the Everywhere Display (ED), which can be used for human-computer interaction (HCI).
This technology was implemented using an LCD projector with motorized focus and zoom
and a computer-controlled pan-tilt-zoom camera. They also came up with a low-end
version called ED-lite, which functions the same as the high-end version and differs
only in the devices used: a portable projector and an ordinary camera.

Several groups of professionals have been researching new methods of improving
present HCI. The most common means of HCI are the mouse, keyboard, and touch-screen,
but these require an external device for humans to communicate with computers. The
goal of the researchers was to develop a system that eliminates such external devices
linking human and computer. The most popular method under research is computer
vision, which is widely used since it offers a methodology similar to human-human
interaction. The goal of advancing HCI technology is an interaction that more closely
resembles human-human interaction. IBM researchers used computer vision to implement
ED and ED-lite. With the aid of computer vision, the system was able to steer the
projected display from one surface to another. Touch-screen-like interaction is made
possible by machine vision techniques and algorithms.

Figure 4. Configuration of ED (left), ED-lite (upper right), and a sample interactive
projected display (bottom right).

The application used by IBM for demonstration is a slide presentation using Microsoft
PowerPoint. They were able to create touch-screen-like functionality using the
devices mentioned earlier. The ED unit was installed at ceiling height on a tripod to
cover a greater space. A computer controls the ED unit and performs all other
functions, such as vision processing for interaction and running the application
software. The specific test conducted was a slide presentation in Microsoft
PowerPoint controlled via hand gestures. There is a designated location in the
projected image that the user can use to navigate the slides or to move the projected
display from one surface to another. The user controls the slides by touching buttons
superimposed on the specified projected surface area. With this technology the user
interacts with the computer using a bare hand, without input devices attached
directly to the user or the computer.

2.5. Interactive Projection

Projectors are shrinking and are now on the threshold of being compact enough for
handheld use. That is why Beardsley and his colleagues at Mitsubishi Electric
Research Labs propose “Interactive Projection” (2005), published in IEEE Computer
Graphics and Applications. Their work is an investigation of mobile, opportunistic
projection, which can turn any surface into a display, a vision of making the world
one's desktop.

The prototype has buttons that serve as the I/O of the device. It also has a built-in
camera to detect the user's input. The paper discusses three broad classes of
application for interactive projection. The first class uses a clean display surface
for the projected display. The second class projects onto a physical surface; this is
typically what is called augmented reality. Its first stage is object recognition,
and the next is to project an overlay that gives some information about the object.
The last class projects a physical region of interest, which can be used as input to
computer vision processing. This is similar to using a mouse to draw a box that
selects a region of interest, but the pointing finger is used instead of a mouse.


Figure 5. Handheld Projector Prototype

There are two main issues when using a handheld device to create projections. The
first is keystone correction, to produce an undistorted projection with the correct
aspect ratio. Keystoning occurs when the projector is not perpendicular to the
screen, producing a trapezoidal shape instead of a rectangle. Keystone correction is
used to fix this problem. The second is the removal of the effects of hand motion.
The paper describes a technique for keeping the projection static on a surface even
when the projector is in motion. Distinctive visual markers called fiducials define a
coordinate frame on the display surface. A camera senses the markers and infers the
target area in camera image coordinates; these coordinates are transformed to
projector image coordinates, and the projection data is mapped into these
coordinates, giving the correct placement of the projection.
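
A common way to express the transformation from camera image coordinates to projector image coordinates for a planar surface is a homography estimated from four point correspondences. The sketch below shows that general technique under stated assumptions and is not taken from the paper: the four marker positions, the 1024x768 projector resolution, and the function names are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def find_homography(src, dst):
    """3x3 homography (bottom-right entry fixed to 1) mapping 4 src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Map a point through H, dividing by the projective scale factor."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical marker corners as seen by the camera (a keystoned quadrilateral)
# and the projector pixels they correspond to.
camera_corners = [(100, 80), (540, 60), (560, 420), (90, 400)]
projector_corners = [(0, 0), (1024, 0), (1024, 768), (0, 768)]

H = find_homography(camera_corners, projector_corners)
print(apply_homography(H, (320, 240)))  # a fingertip seen by the camera
```

The same mapping serves both purposes mentioned above: applied to projected content it compensates for trapezoidal distortion, and applied to a detected fingertip it tells the application which projected pixel is being pointed at.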

Examples of applications for each of the classes above are also discussed. An example
of the first class is a projected web browser: a desktop Windows environment modified
so that the display goes to the projector and the input is taken from the buttons of
the device. An example of the second class is projected augmented reality. The third
is a mouse-button hold-and-drag that defines a Region of Interest (ROI), just as on a
desktop but without the use of a mouse.
2.6. Ubiquitous Interactive Displays in a Retail

Environment

Pinhanez et al., in their work “Ubiquitous Interactive Displays in a Retail
Environment” (2003), published in the Proceedings of the ACM Special Interest Group
on Graphics (SIGGRAPH): Sketches, propose an interactive display set in a retail
environment. It uses a pan/tilt/mirror/zoom camera with a projector, using computer
vision methods to detect interaction with the projected image. They call this
technology the Everywhere Display Projector (ED projector). They propose using it in
a retail environment to help customers find a product, give them information about
it, and tell them where it is located. The ED projector is installed on the ceiling
and can project images on boards hung in every aisle of the store. At the entrance of
the store there is a table on which a larger version of the product finder is
projected. Here, a list of products is projected on the table, and the user can move
a red wooden slider to find a product. The camera detects this motion, and the list
scrolls up and down following the motion of the slider.


Figure 6. Setup of the Product Finder

2.7 Real-Time Fingertip Tracking and Gesture

Recognition

Professor Kenji Oka and Yoichi Sato of the University of Tokyo, together with
Professor Hideki Koike of the University of Electro-Communications, Tokyo, worked on
“Real-Time Fingertip Tracking and Gesture Recognition” (2002), published by IEEE,
Volume 22, Issue 6, which introduces a method for determining fingertip locations in
an image frame and measuring fingertip trajectories across image frames. They also
propose a mechanism for combining direct manipulation and symbolic gestures based on
multiple fingertip motion. Several augmented desk interfaces have been developed
recently. DigitalDesk is one of the earliest attempts at an augmented desk interface;
using only a charge-coupled device (CCD) camera and a video projector, users can
operate projected desktop applications with a fingertip. Inspired by DigitalDesk, the
group developed an augmented desk interface called EnhancedDesk that lets users
perform tasks by manipulating both physical and electronically displayed objects
simultaneously with their own hands and fingers. An example application demonstrated
in the paper is EnhancedDesk’s two-handed drawing system. The application uses the
proposed tracking and gesture recognition methods and assigns different roles to each
hand. Gesture recognition lets users draw objects of different shapes and directly
manipulate those objects using the right hand and fingers. Figure 7 shows the set-up
used by the group, which includes an infrared camera, a color camera, an LCD
projector, and a plasma display.

Figure 7. EnhancedDesk’s set-up

The detection of multiple fingertips in an image frame involves extracting hand
regions, finding fingertips, and finding the palm’s center. To extract hand regions,
an infrared camera is used to measure temperature, compensating for complicated
backgrounds and dynamic lighting by raising the pixel values corresponding to human
skin above those of other pixels. To find fingertips, a search window is defined for
the fingertips, since the fingertip search is more computationally expensive than arm
extraction. Based on geometrical features, the fingertip-finding method uses
normalized correlation with a template properly sized to the user’s fingertip.

Figure 8. Fingertip Detection

Measuring fingertip trajectories involves determining trajectories, predicting
fingertip locations, and examining fingertip correspondences between successive
frames. To determine possible trajectories, the locations of the fingertips in the
subsequent frame are predicted and then compared with the detected ones. Finding the
best combination between these two sets of fingertips determines multiple fingertip
trajectories in real time. A Kalman filter is used to predict fingertip locations in
one image frame based on the locations detected in the previous frame.
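
The correspondence step can be illustrated by searching for the pairing of predicted and detected fingertips with the smallest total distance. The sketch below is a simplified stand-in rather than the authors' method: it replaces the Kalman filter with a plain constant-velocity extrapolation, tries every pairing by brute force (reasonable for at most ten fingertips on two hands), and assumes at least as many detections as predictions. The function names and coordinates are hypothetical.

```python
import math
from itertools import permutations


def predict_next(prev2, prev):
    """Constant-velocity extrapolation (a simple stand-in for a Kalman filter)."""
    return (2 * prev[0] - prev2[0], 2 * prev[1] - prev2[1])


def match_fingertips(predicted, detected):
    """Return, for each predicted fingertip, the index of its detected fingertip.

    Chooses the assignment that minimizes the summed Euclidean distance.
    Assumes len(detected) >= len(predicted).
    """
    def total_distance(assignment):
        return sum(
            math.hypot(p[0] - detected[j][0], p[1] - detected[j][1])
            for p, j in zip(predicted, assignment)
        )

    candidates = permutations(range(len(detected)), len(predicted))
    return list(min(candidates, key=total_distance))


# Three fingertip tracks with their last two positions (hypothetical pixels).
tracks = [[(8, 10), (10, 10)], [(50, 8), (50, 10)], [(86, 15), (88, 15)]]
predicted = [predict_next(a, b) for a, b in tracks]

# Detections in the new frame arrive in arbitrary order.
detected = [(52, 14), (91, 13), (11, 9)]
print(match_fingertips(predicted, detected))
```

Each track is then extended with the detection it was matched to, which is how trajectories are built up frame by frame.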


Figure 9. (a) Detecting fingertips. (b) Comparing detected and predicted
fingertips to determine trajectories

In evaluating the tracking method, the group used a Linux-based PC with an Intel
Pentium III 500-MHz processor and a Hitachi IP5000 image processing board, and a
Nikon Laird-S270 infrared camera. The testing involved seven test subjects and
experimentally evaluated the reliability improvement gained by considering fingertip
correspondences between successive image frames. The method reliably tracks multiple
fingertips and could prove useful in real-time human-computer interaction
applications. Gesture recognition works well with the tracking method and enables the
user to interact through symbolic gestures while performing direct manipulation with
hands and fingers. Interaction based on direct manipulation and symbolic gestures
works by first determining from the measured fingertip trajectories whether the
user’s hand motion represents direct manipulation or a symbolic gesture. If direct
manipulation is detected, the system selects an operating mode such as rotate, move,
or resize, along with other control mode parameters. If a symbolic gesture is
detected, the system recognizes the gesture type using a symbolic gesture recognizer,
in addition to recognizing gesture locations and sizes based on trajectories.
The group plans to improve the tracking method’s reliability by incorporating

additional sensors. The reason why additional sensors were needed is because the infrared

camera didn’t work well on cold hands. A solution is by using color camera in addition to

infrared camera. The group is also planning to extend the system to 3D tracking since the

current system is limited to 2D motion on a desktop.


2.8 Occlusion Detection for Front-Projected Interactive Displays
Hilario and Cooperstock presented an “Occlusion Detection System for Front-
Projected Displays” (2004), published by the Austrian Computer Society. Occlusion happens
in interactive display systems when a user interacts with the display or inadvertently
blocks the projection. Occlusion in these systems can distort the projected image and
cause information to be lost in the occluded region. Detecting occlusion is therefore
essential to reduce, if not prevent, these unwanted effects; occlusion detection can also
be used for hand and object tracking.

Hilario and Cooperstock detect occlusion using a camera-projector color calibration
algorithm that estimates the RGB camera response to projected colors. This allows
predicted camera images to be generated for the projected scene. The occlusion detection
algorithm consists of offline camera-projector calibration followed by online occlusion
detection for each video frame. Calibration is used to construct predicted images of the
projected scene, which is needed because occlusion is detected by pixel-wise differencing
of the predicted and observed camera images. The system uses a single camera and
projector, and it assumes a planar Lambertian surface with constant lighting conditions
and negligible intra-projector color variation.

Calibration is done in two steps. The first is offline geometric registration, which
computes the transformation between the projector and camera frames of reference. It
centers the projected image and aligns it to the specified world coordinate frame. For
geometric registration the paper adopts the approach of Sukthankar et al., in which the
projector prewarping transformation is obtained by detecting the corners of a projected
and a printed grid in the camera view.

The second step is offline color calibration. Due to certain dynamics, a projected
display is unlikely to produce an image whose colors exactly match those of the source
image. To determine the predicted camera images correctly, the color transfer function
between the projection and the camera must be determined. This is done by iterating
through the projection of primary colors of varying intensities, measuring the RGB camera
response, and storing it in a color lookup table. Each response is the average RGB color
over the corresponding patch pixels, measured over multiple camera images. The predicted
camera response can then be computed by summing the predicted camera responses to each
of the projected color components.
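
The color lookup table can be illustrated with a short sketch. It assumes the table is
keyed by a projected primary and its intensity, that an entry exists for every intensity
that is later queried (no interpolation), and that ambient light is ignored. The data
layout, function names, and sample values are hypothetical and are not taken from the
reviewed paper.

```python
def average_rgb(frames):
    """Average RGB over all patch pixels across several camera frames.
    frames: list of frames, each a list of (r, g, b) patch pixels."""
    pixels = [px for frame in frames for px in frame]
    n = len(pixels)
    return tuple(sum(px[ch] for px in pixels) / n for ch in range(3))


def build_lut(measurements):
    """Map (primary, intensity) to the averaged camera response."""
    return {key: average_rgb(frames) for key, frames in measurements.items()}


def predict_response(lut, projected_rgb):
    """Sum the stored camera responses to each projected color component."""
    total = [0.0, 0.0, 0.0]
    for primary, intensity in zip("RGB", projected_rgb):
        response = lut[(primary, intensity)]
        total = [t + r for t, r in zip(total, response)]
    return tuple(total)


# Hypothetical measurements: camera patch pixels observed while projecting
# a single primary at one intensity, over one or more frames.
measurements = {
    ("R", 255): [[(200, 10, 5), (202, 12, 7)], [(198, 8, 3), (200, 10, 5)]],
    ("G", 128): [[(4, 90, 6)]],
    ("B", 0): [[(0, 0, 0)]],
}
lut = build_lut(measurements)
predicted = predict_response(lut, (255, 128, 0))
```

Averaging over several frames reduces the effect of camera noise on each table entry,
while the per-component sum reflects the additive prediction described above.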

The camera-projector calibration results are then used in the online occlusion detection.
Their preliminary results state that performing general occlusion detection is critical
for front-projected display systems.
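
The pixel-wise differencing of predicted and observed camera images can be sketched as
follows. This is a minimal illustration and not the authors' implementation: images are
nested lists of (r, g, b) tuples that are assumed to be already geometrically registered,
and the threshold value is a hypothetical choice.

```python
def detect_occlusion(predicted, observed, threshold=30):
    """Return a boolean mask that is True where the observed pixel differs
    from the predicted pixel by more than `threshold` in any color channel."""
    return [
        [
            max(abs(p - o) for p, o in zip(pred_px, obs_px)) > threshold
            for pred_px, obs_px in zip(pred_row, obs_row)
        ]
        for pred_row, obs_row in zip(predicted, observed)
    ]


# Hypothetical 2x2 predicted and observed camera images.
predicted = [[(100, 100, 100), (200, 50, 50)],
             [(0, 0, 0), (10, 20, 30)]]
observed = [[(102, 98, 101), (90, 40, 45)],
            [(5, 3, 1), (10, 20, 90)]]

mask = detect_occlusion(predicted, observed)
```

Pixels flagged in the mask are candidates for the occluded region. A threshold is needed
because even unoccluded pixels rarely match the prediction exactly.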

Figure 10. Projector image (left), predicted image (middle), and
observed camera image (right).

Figure 11. (a) Shadows as the occluded object. (b) Direct contact of occluding object with
the display surface.

References
Beardsley, P. (2005). Interactive projection. IEEE computer graphics and applications.
Retrieved from http://www.merl.com/papers/docs/TR2004-107.pdf

Granum, E., Liu, Y., Moeslund, T., Storring, M. (2004). Computer vision-based gesture
recognition for an augmented reality interface. Proceedings of the 4th
international conference on visualization, imaging and image processing.
Marbella, Spain. Retrieved from
http://www.cs.sfu.ca/~mori/courses/cmpt882/papers/augreality.pdf

Hardenberg, C., Bérard, F. (2001). Bare-hand human-computer interaction. Proceedings
of workshop on perceptive user interfaces, PUI'01. Orlando, Florida. Retrieved from
http://portal.acm.org/affiliated/ft_gateway.cfm?id=971513&type=pdf&coll=GUIDE&dl=ACM

Hewett, et al. (1996). Chapter 2: Human computer interaction. ACM SIGCHI curricula for
human computer interaction. Retrieved June 2, 2006, from
http://sigchi.org/cdg/cdg2.html#2_3

Hilario, M. N., Cooperstock, J. (2004). Occlusion detection for front-projected interactive
display. Austrian Computer Society. Retrieved from
http://www.cim.mcgill.ca/sre/publications/pervasive.pdf

Lenman, S., Bretzner, L. and Thuresson, B. (2002). Using marking menus to develop
command sets for computer vision based hand gesture interfaces. Proceedings of
the second Nordic conference on human-computer interaction. Aarhus, Denmark.
Retrieved from
http://delivery.acm.org/10.1145/580000/572055/p239-lenman.pdf?key1=572055&key2=1405429411&coll=GUIDE&dl=ACM&CFID=77345099&CFTOKEN=54215790

Myers, B., et al. (1996). Strategic directions in human-computer interaction. ACM
Computing Surveys Vol.28 No.4. Retrieved from
http://www.cs.cmu.edu/~bam/nsfworkshop/hcireport.html

Oka, K. et al. (2002). Real-time fingertip tracking and gesture recognition. Computer
graphics and applications, IEEE Volume 22, Issue 6. Retrieved from
http://portal.acm.org/.../citation.cfm?id=618944&dl=ACM&coll=ACM&CFID=15151515&CFTOKEN=6184618

Pinhanez, C. et al. (2003). Creating touch-screens anywhere with interactive projected
display. Proceedings of the eleventh ACM international conference on
Multimedia. Retrieved from
http://delivery.acm.org/10.1145/960000/957112/p460-pinhanez.pdf?key1=957112&key2=2832983511&coll=portal&dl=ACM&CFID=15151515&CFTOKEN=6184618

Pinhanez, C. et al. (2003). Ubiquitous interactive displays in a retail environment.
Proceedings of ACM special interest group on graphics (SIGGRAPH): Sketches.
San Diego, California. Retrieved from
http://www.research.ibm.com/ed/publications/sketches03.pdf

Porta, M. (2002). Vision-based user interfaces: methods and applications. International
journal of human computer studies. Elsevier Science. Retrieved from
http://vision.unipv.it/research/papers/02p-vbui.html

Turk, M. (1998). Moving from GUIs to PUIs. Symposium on intelligent information
media. Microsoft Research Technical Report MSR-TR-98-69. Retrieved from
http://www.cs.ucsb.edu/~mturk/Papers/MSR%20TR%2098-69.pdf