
EFFICIENT FACE AND GESTURE RECOGNITION TECHNIQUES
FOR ROBOT CONTROL

Chao Hu, Xiang Wang, Mrinal K. Mandal, Max Meng, and Dong Li
Department of Electrical and Computer Engineering
University of Alberta, Edmonton, AB, T6G 2V4, Canada
Email: {hcsfds, mandal, max.meng}@ee.ualberta.ca

Abstract

This paper presents a visual recognition system for interactively controlling a mobile robot. First, the robot identifies the operator by human facial recognition, and then determines the actions by analyzing the human hand gestures. For facial recognition, an adaptive region-growing algorithm is proposed to estimate the location of the face region. A genetic algorithm is then applied to search for the accurate facial feature positions. For gesture recognition, we use adaptive color segmentation, hand finding and labeling with blocking, and morphological filtering, and the gesture actions are found by template matching and skeletonizing. The results show a 95% correct recognition ratio, compared to less than 90% reported in other papers.

Keywords: Face and Gesture Recognition, Vision, Robot Control

1. INTRODUCTION

Mobile robots are becoming popular in automated applications, such as service robots in public places. Efficient techniques are needed to control the robots so that the desired tasks are performed. Typically, the robots are controlled by a human operator using input devices, such as a keyboard, mouse, sensor gloves, or a wireless controller, based on the field messages received from the video camera and other sensors of the robot. These methods have the drawback of indirect and unnatural communication between the operator and the robot. Hence it is desirable to develop more friendly and effective interactive tools between the human operators and the robot.

A few face and gesture recognition techniques have been proposed in the literature. Alattar et al. developed a model-based algorithm for locating facial features [1]. The algorithm estimates the parameters of the ellipse which best fits the head view in the image and uses these parameters to calculate the locations of the facial features. It then refines the locations by exploiting projections of the pixels in windows around the estimated locations of the features. Wu et al. presented an automatic feature extraction algorithm [2] that uses a second-chance region growing method to estimate the face region. A genetic search algorithm is then applied to extract the facial features. Fong et al. presented a virtual joystick technique with static gestures to drive a remote vehicle [7]. Here, hand motions are tracked with a color and stereo vision system. Moy proposed a technique for visual interpretation of 2D dynamic hand gestures in complex environments. The technique is based on hand segmentation and feature extraction, and is used for humans to communicate and interact with a pet robot [8]. Iba et al. proposed a Hidden Markov Model (HMM) architecture for gesture-based control of mobile robots [9]. For robot control, the performances of these existing techniques are not satisfactory. Some techniques have very high complexity and others provide poor recognition.

In this paper, we present a novel visual human-machine interface for a mobile robot moving in public places. A two-step procedure is employed to detect the action control input from the operator. As shown in Fig. 1, the face of the operator is first identified from the captured image. The action gesture of the operator is then determined using a novel gesture recognition technique.

Figure 1. Robot control by face and gesture recognition.

The organization of this paper is as follows. In section 2 we introduce the methods and procedures for human facial feature recognition. In section 3, we present the gesture recognition technique for robot control. The results are also shown in sections 2 and 3, respectively.

2. HUMAN FACE RECOGNITION

In order to identify the robot operator, we must locate the facial features. The recognition procedure can be divided into three steps, as shown in Fig. 2. The face region is located first, and then the facial features are estimated by projection analysis. Starting from the estimated locations, a genetic algorithm is applied to extract the accurate feature locations.

Figure 2. Steps for face recognition.

2.1 Preprocessing

In the face estimation step, the location of the face region is estimated by an adaptive region-growing algorithm. Because we make the face region lie roughly in the center of the image, the algorithm can be performed by selecting the central point of the image as an initial seed. After the region is grown from the seed, its size must be checked to make sure that it is a reasonable face region. Assume that the initial seed is represented by S0, and the region grown from S0 is R. The size of R, denoted by |R|, should be restricted to a bounded range, that is, R1 ≤ |R| ≤ R2, where R1 and R2 are predefined constants. In fact, the size of the grown region depends on the predetermined growing threshold. When the contrast between the face region and the background is low and the threshold is set larger than a suitable value, the grown region will cover some background regions, and then |R| > R2 (see Fig. 3a). In this case, the threshold should be decreased to reduce |R|. Sometimes the central point of the facial image is located in a small bright or dark region; then R will be that region instead of the face region, and |R| < R1 (see Fig. 4a). This problem can be solved by selecting a new seed using a spiral function, as shown in Fig. 5. When the face region is extracted, a minimum rectangle can be found to surround it, within which the following feature extraction algorithm will be performed.

Figure 3. (a) Large region; (b) Proper region.

Figure 4. (a) Small region; (b) Proper region.

Figure 5. Spiral function.
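To make the adaptive behaviour concrete, the following sketch summarizes the procedure just described: grow a region from the central seed, lower the threshold while |R| > R2, and move the seed along a spiral when |R| < R1. The function names, the 4-connected growing criterion, and the fixed threshold step are illustrative assumptions rather than the exact implementation used in the paper.

```python
import numpy as np
from collections import deque

def region_grow(gray, seed, thresh):
    """Grow a 4-connected region around `seed` containing pixels whose
    gray level stays within `thresh` of the seed value."""
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = float(gray[seed])
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and not mask[nr, nc]
                    and abs(float(gray[nr, nc]) - seed_val) < thresh):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask

def spiral_seeds(center, step=5, loops=10):
    """Candidate seeds along an outward square spiral around `center`
    (cf. Fig. 5): run lengths 1, 1, 2, 2, 3, 3, ... in alternating directions."""
    r, c = center
    yield (r, c)
    directions = ((0, 1), (1, 0), (0, -1), (-1, 0))   # right, down, left, up
    run, d = 1, 0
    for _ in range(loops):
        for _ in range(2):                            # two runs per run length
            dr, dc = directions[d % 4]
            for _ in range(run):
                r, c = r + dr * step, c + dc * step
                yield (r, c)
            d += 1
        run += 1

def estimate_face_region(gray, r1, r2, thresh=20.0, thresh_step=2.0):
    """Adaptive region growing: lower the threshold while |R| > R2 and move
    the seed along the spiral while |R| < R1."""
    for seed in spiral_seeds((gray.shape[0] // 2, gray.shape[1] // 2)):
        if not (0 <= seed[0] < gray.shape[0] and 0 <= seed[1] < gray.shape[1]):
            continue
        t = thresh
        mask = region_grow(gray, seed, t)
        while mask.sum() > r2 and t > thresh_step:   # region leaked into background
            t -= thresh_step
            mask = region_grow(gray, seed, t)
        if r1 <= mask.sum() <= r2:                   # plausible face region
            return mask
    return None
```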

2.2 Estimating and refining the feature locations

The feature locations are estimated according to an anthropometric model. The eye and mouth regions are estimated based on the face model [1]. Then the nose position is estimated based on the face model and the refined positions of the eyes and mouth.

For the refining procedure, let x, y be the pixel location and f(x, y) the gray level of the pixel. The x-projection h(x) and the y-projection v(y) in rectangular windows are used to find the accurate feature locations. These rectangular windows are defined according to the anthropometry of the face model and the estimated locations of the features. h(x) and v(y) are calculated as follows:

h(x) = Σ_y f(x, y)   (1)

v(y) = Σ_x f(x, y)   (2)

where the sums are taken over the rows and columns of the window. Fig. 6 shows the y-projection and x-projection of the eyes. Note that for the eyes, the valleys of v(y) and h(x) occur at the irises, and therefore indicate the vertical and horizontal coordinates of the two eyes. Hence, we can use these coordinates to refine the estimated positions of the eyes. Similarly, we can get refined positions of the nose and mouth. Fig. 7 illustrates the results of the refined positions.
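A minimal sketch of the projection-based refinement for one eye window, following Eqs. (1)-(2), is given below; taking the arg-min of each projection as the iris coordinate is an assumption consistent with the valley behaviour described above, and the window tuple layout is illustrative.

```python
import numpy as np

def refine_eye_position(gray, window):
    """Refine an estimated eye location inside a rectangular window using the
    projections of Eqs. (1)-(2); `window` is (top, bottom, left, right)."""
    top, bottom, left, right = window
    patch = gray[top:bottom, left:right].astype(float)
    h_proj = patch.sum(axis=0)   # h(x): sum over rows, one value per column
    v_proj = patch.sum(axis=1)   # v(y): sum over columns, one value per row
    # The iris is dark, so the projection valleys give its coordinates.
    x_refined = left + int(np.argmin(h_proj))
    y_refined = top + int(np.argmin(v_proj))
    return x_refined, y_refined
```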
2.3 Template matching by genetic algorithm

First, a generic feature template [2] is defined to extract all the critical facial features. The template is a square with an extended area, which avoids mistaking the eyebrows for the eyes or the nose for the mouth. Every possible feature point within a feature region is evaluated based on the predefined cost functions associated with the feature template and its matching value. The candidate point with the best matching value in each feature region is selected as the associated feature point.


Then, the subregions are defined according to the refined positions of the features. In order to avoid mistaking the nose for the mouth, the mouth region should be defined only after both eye feature points have been found; this region is the shaded part in Fig. 8 (e_l is the central point of the left eye, e_r is the central point of the right eye, and e_c is the midpoint between e_l and e_r).

Figure 6. Projections of the eye window: (a) y-projection; (b) x-projection and thresholded x-projection.

Figure 8. Mouth subregion.


Last, the genetic algorithm (GA) is used here to reduce the computational complexity of template matching. The solution space S in this application is an image subregion, and every pixel in the subregion is an element of S, each with a fitness value. First, the initial elements are formed, and then these elements are evolved into other elements by performing two genetic operations (uniform crossover and mutation). In this paper, we reduce the number of iterations by selecting the initial elements using the spiral function shown in Fig. 5. Elements with larger fitness values have a higher probability of being kept in the next generation and of propagating their offspring. Therefore, the accuracy of the positions improves with each generation.

After a suitable number of generations, we can find reasonably accurate positions of the features. By analyzing these feature positions, the robot can identify the operator. The final results are shown in Fig. 9.
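A compact sketch of such a GA search over a feature subregion is shown below. The population size, survivor fraction, and mutation jitter are illustrative assumptions, and `fitness` stands for the template-matching score of a candidate pixel (not specified here); `seed_points` may hold spiral-selected initial elements.

```python
import random

def ga_feature_search(fitness, region, pop_size=20, generations=30,
                      crossover_rate=0.8, mutation_rate=0.05, seed_points=None):
    """Search an image subregion for the pixel with the best template-matching
    fitness using uniform crossover and mutation on (x, y) coordinates."""
    x0, y0, x1, y1 = region
    def random_point():
        return (random.randint(x0, x1), random.randint(y0, y1))
    # Initial population: spiral-selected seeds if given, random points otherwise.
    pop = list(seed_points or [])[:pop_size]
    pop += [random_point() for _ in range(pop_size - len(pop))]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        survivors = scored[:pop_size // 2]            # fitter elements are kept
        children = []
        while len(survivors) + len(children) < pop_size:
            pa, pb = random.sample(survivors, 2)
            child = list(pa)
            if random.random() < crossover_rate:      # uniform crossover
                child = [pa[i] if random.random() < 0.5 else pb[i] for i in range(2)]
            if random.random() < mutation_rate:       # mutation: small coordinate jitter
                child[0] = min(max(child[0] + random.randint(-3, 3), x0), x1)
                child[1] = min(max(child[1] + random.randint(-3, 3), y0), y1)
            children.append(tuple(child))
        pop = survivors + children
    return max(pop, key=fitness)
```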

Figure 7. The estimated and refined positions.

Figure 9. Feature locations after template matching using the genetic algorithm.


3. GESTURE RECOGNITION

The gestures should be clear, natural, and easy to demonstrate. For vehicle robot movement, we define seven two-hand gestures corresponding to robot actions: "Turn-left", "Turn-right", "Move-forward", "Move-back", "Increase-speed", "Decrease-speed", and "Stop" (as shown in Fig. 10). The fist is a sign, used for activating the gesture and for the action "Stop". The other six actions are determined by the direction formed by the index finger and the middle finger.

Figure 10. Gestures for the 7 actions.

The processing steps for gesture recognition are shown in Fig. 11. After capturing the image, the hands are segmented, and the noise is filtered. The hands are then labeled. Finally, the robot action is identified by proper gesture recognition methods.

Figure 11. Processing steps for gesture recognition: Image Segmentation → Morphological Filtering → Hand Labeling → Gesture Recognition.

3.1 Image Segmentation

Several image segmentation techniques have been reported in the literature. Here we employ color and adaptive techniques to perform the segmentation.

3.1.1 Color Segmentation

For color segmentation, we employ the HLS color space instead of the traditional RGB space. HLS means hue (H), lightness (L), and saturation (S). To change RGB to HLS space, we first have

x = (1/√6)(2R − G − B)
y = (1/√2)(G − B)
z = (1/√3)(R + G + B)

From x, y, z, we can get

H = arctan(y/x)
L = z
S = 1 − (√3/L)·min(R, G, B)

The segmentation is realized with a global threshold by controlling the absolute errors between the HLS values of the pixels and the preset average HLS values (Ha, La, Sa). Let G(x, y) denote the binary value of the segmented image (1: inside the segmented regions, 0: outside). G(x, y) is calculated as follows:

G(x, y) = 1 if |H(x,y) − Ha| < H_T and |L(x,y) − La| < L_T and |S(x,y) − Sa| < S_T; 0 otherwise   (3)

where H_T, L_T and S_T are the thresholds.

It has been found that the HLS method provides better performance than RGB, but it is slower.
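The colour segmentation of Eq. (3), together with the RGB-to-HLS transform above, can be sketched as follows; the use of arctan2 and the small epsilon guard are implementation conveniences, not part of the original formulation.

```python
import numpy as np

def rgb_to_hls(rgb):
    """Chromaticity-based RGB -> (H, L, S) transform described above.
    `rgb` is a float array of shape (..., 3) with channels R, G, B."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    x = (2.0 * R - G - B) / np.sqrt(6.0)
    y = (G - B) / np.sqrt(2.0)
    z = (R + G + B) / np.sqrt(3.0)
    H = np.arctan2(y, x)                                     # hue
    L = z                                                    # lightness
    S = 1.0 - np.sqrt(3.0) * np.minimum(np.minimum(R, G), B) / np.maximum(z, 1e-6)
    return H, L, S

def segment_hls(rgb, avg, thr):
    """Eq. (3): a pixel is inside the segmented region when all three HLS
    values lie within the thresholds of the preset averages (Ha, La, Sa)."""
    H, L, S = rgb_to_hls(rgb.astype(float))
    Ha, La, Sa = avg
    Ht, Lt, St = thr
    return ((np.abs(H - Ha) < Ht) &
            (np.abs(L - La) < Lt) &
            (np.abs(S - Sa) < St))
```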
3.1.2 Adaptive Segmentation

The HLS values of the hands depend on the illumination conditions, and hence we do not achieve good performance if the segmentation parameters are fixed, as shown in Fig. 12(b) (low threshold) and Fig. 12(c) (high threshold). Therefore we use an adaptive segmentation technique. The image is first segmented with a set of preset average HLS values and thresholds. The average values Ha, La, Sa are then calculated again using the regions found in the first step. This procedure is repeated several times, and finally the image is segmented clearly, as shown in Fig. 12(d). Furthermore, the thresholds H_T, L_T, S_T are determined by the 3σ rule (using the HLS deviation values of the hands). Fig. 13 shows the segmentation results (from original images to segmented ones) of different gestures with this adaptive method.

Figure 12. Segmentation with fixed and adaptive thresholds.
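The iterative re-estimation can be sketched as below, reusing `rgb_to_hls` from the previous listing; the fixed iteration count and the in-loop update order are assumptions.

```python
import numpy as np

def segment_adaptive(rgb, init_avg, init_thr, iterations=3):
    """Adaptive segmentation: segment with the preset (Ha, La, Sa) averages,
    then re-estimate the averages from the segmented region and reset the
    thresholds by the 3-sigma rule, repeating a few times."""
    H, L, S = rgb_to_hls(rgb.astype(float))   # helper from the previous sketch
    avg, thr = init_avg, init_thr
    mask = None
    for _ in range(iterations):
        Ha, La, Sa = avg
        Ht, Lt, St = thr
        mask = ((np.abs(H - Ha) < Ht) &
                (np.abs(L - La) < Lt) &
                (np.abs(S - Sa) < St))
        if not mask.any():                    # nothing segmented: keep last result
            break
        avg = (H[mask].mean(), L[mask].mean(), S[mask].mean())
        thr = (3 * H[mask].std(), 3 * L[mask].std(), 3 * S[mask].std())
    return mask
```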
3.1.3 Morphological Filtering

The segmented images generally have noise. Hence we employ morphological operations, opening and closing, to filter out the noise. Fig. 14 shows two original images and the corresponding filtered images. It is observed that the gestures are improved by the filtering operations, and the noise has been eliminated.
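For completeness, one way to apply the opening and closing steps to the binary hand mask (here with SciPy, a library choice not dictated by the paper) is:

```python
import numpy as np
from scipy import ndimage

def clean_mask(mask, size=5):
    """Remove small noise blobs (opening) and fill small holes (closing)
    in the binary hand mask; the 5x5 structuring element is an assumption."""
    structure = np.ones((size, size), dtype=bool)
    opened = ndimage.binary_opening(mask, structure=structure)
    return ndimage.binary_closing(opened, structure=structure)
```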


Figure 13. Adaptive segmentation: (a) original images; (b) segmented images.

Figure 14. Segmented images after morphological filtering.

3.2 Hand Finding and Labeling

The segmented image may have more than two bright regions. Hence we need to determine which regions represent the two hand regions. Here we design an effective and fast method, called rectangle blocking. As shown in Fig. 15, the row and column lines with the maximum number of bright pixels are first found for each bright region. The hand rectangles are then found by the boundaries extended from these maximum row and column lines.

Figure 15. (a) Rectangle block; (b) Hand finding.

For the two hands, one is just a fist, and we denote the other one as the "index hand", which determines the robot action. The fist is smaller, and the pixels inside its rectangle block have a higher bright-to-dark pixel ratio than those of the index hand. Thus the hands are labeled as shown in Fig. 16.

Figure 16. Hand labeling: (a) hand labeling; (b) index hand; (c) index hand.

The real action can be determined by the index hand if two hands are found in the image. In this case, we can simplify the problem to finding the gesture action by analyzing the index hand.
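A sketch of rectangle blocking and hand labeling is given below. Extending the block until an empty row or column is met is one reading of the description above, and the sketch assumes that the two largest bright regions are the two hands; smaller blobs are treated as noise.

```python
import numpy as np
from scipy import ndimage

def hand_rectangle(region):
    """Rectangle blocking for one bright region: start from the row and the
    column with the most bright pixels and extend the boundaries outwards
    until a row/column with no bright pixels is reached."""
    row_counts = region.sum(axis=1)
    col_counts = region.sum(axis=0)
    top = bottom = int(np.argmax(row_counts))
    left = right = int(np.argmax(col_counts))
    while top > 0 and row_counts[top - 1] > 0:
        top -= 1
    while bottom < region.shape[0] - 1 and row_counts[bottom + 1] > 0:
        bottom += 1
    while left > 0 and col_counts[left - 1] > 0:
        left -= 1
    while right < region.shape[1] - 1 and col_counts[right + 1] > 0:
        right += 1
    return top, bottom, left, right

def label_hands(mask):
    """Keep the two largest bright regions and label them: the smaller block
    (higher bright-to-dark pixel ratio) is the fist, the other block is the
    'index hand' that determines the robot action."""
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    keep = np.argsort(sizes)[-2:] + 1                 # labels of the two largest regions
    rects = [hand_rectangle(labels == lab) for lab in keep]
    areas = [(b - t + 1) * (r - l + 1) for t, b, l, r in rects]
    fist = rects[int(np.argmin(areas))]
    index_hand = rects[int(np.argmax(areas))]
    return fist, index_hand
```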

3.3 Recognition Methods

After the fist and the index hand are found, we analyze the index hand to recognize the gesture. Two methods are applied: skeletonizing and template matching, which are presented in the following.

3.3.1 Recognition by skeletonizing

We use the Zhang-Suen transform for skeletonizing, and the performance is shown in Fig. 17. When the skeleton has no branch, the gesture curve is represented by two vectors, relative to the thumb and the fingers, using a linear fitting method. When there are branches, the three longest skeleton curves are changed to three vectors. Physically, these three vectors represent the fingers, the thumb, and the part of the palm towards the arm. Because the gesture action is determined by the fingers, the action rules can be determined by analyzing these vectors.

Figure 17. Vector representation of skeletons: (a) without branches; (b) with branches.

The recognition performance is good. Out of the 18 images, only one is not correct, providing an efficiency of 94%. The error is owing to a noise hole inside the index hand. Hence it is necessary to remove any noise inside the index hand before applying the skeletonizing.
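A simplified sketch of the skeleton analysis is shown below, assuming scikit-image's `skeletonize` (whose 2-D default corresponds to Zhang-Suen thinning). It fits a single dominant direction to the skeleton rather than the three separate vectors described above, which is enough to illustrate how an orientation can be extracted for the action rules.

```python
import numpy as np
from skimage.morphology import skeletonize

def index_hand_direction(hand_mask):
    """Skeletonize the index-hand mask and return the angle (in degrees) of
    the principal axis of the skeleton points, found by SVD (equivalent to a
    least-squares linear fit)."""
    skel = skeletonize(hand_mask)                  # Zhang-Suen thinning in 2-D
    ys, xs = np.nonzero(skel)
    pts = np.column_stack([xs, ys]).astype(float)
    pts -= pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    dx, dy = vt[0]
    return float(np.degrees(np.arctan2(-dy, dx)))  # image y-axis points down
```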
3.3.2 Template Matching

Eighteen training templates are created with the rectangular hand blocks after hand labeling. All training gesture blocks with different widths and heights are changed to square ones of 80×80 pixels using a geometrical transform (as shown in Fig. 18), where x' = x·j2/j1 and y' = y·i2/i1. The resulting images are shown in Fig. 19.

Figure 18. Geometrical transform.


In addition, we use 13 test images T_j (j = 1, 2, ..., 13), also changed to the same square template size. We match each test image with the 18 training templates S_i (i = 1, 2, ..., 18). The matching error R_ij between the test image and the training templates is calculated as follows:

(4)

Figure 19. Square templates.

The action of the training template that provides the minimum matching error is used as the identified action. The results are perfect, with 100% correct decisions. This method provides an excellent recognition ratio if there are enough training templates. In addition, the computational complexity of this method is not high. Using a C program, gesture recognition is done in about 0.8 second for the whole recognition procedure.
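A sketch of the template-matching classifier is given below. The matching error of Eq. (4) is assumed here to be a pixel-wise sum of absolute differences, and the geometrical transform is implemented as a nearest-neighbour rescale to 80×80; both are illustrative choices rather than the exact formulation of the paper.

```python
import numpy as np

def to_square(block, size=80):
    """Geometrical transform of Fig. 18: rescale a hand block of arbitrary
    height/width to a size x size square by nearest-neighbour sampling
    (the inverse mapping of x' = x*j2/j1, y' = y*i2/i1)."""
    i1, j1 = block.shape
    rows = np.arange(size) * i1 // size
    cols = np.arange(size) * j1 // size
    return block[np.ix_(rows, cols)]

def classify_gesture(test_block, templates, actions):
    """Match a test block against the training templates and return the action
    of the template with the minimum matching error (assumed to be the
    pixel-wise absolute difference)."""
    test = to_square(test_block).astype(float)
    errors = [np.abs(test - to_square(t).astype(float)).sum() for t in templates]
    return actions[int(np.argmin(errors))]
```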
4. CONCLUSIONS

In this paper, we have presented efficient techniques for face and gesture recognition for robot control. The face recognition algorithm is robust to variations in subject head shape, eye shape, age, and motion such as tilting and nodding of the head. Because we use projection analysis to estimate the feature locations before applying the genetic algorithm, the computational complexity is reduced significantly, while the accuracy is better than that of other algorithms. For gesture recognition, the optimal procedure is HLS segmentation, morphological filtering, hand block labeling, geometrical transform and template matching. It provides a good correct recognition ratio, robustness and speed. The simulation results demonstrate that the proposed techniques provide very good performance.

References:

[1] A. M. Alattar and S. A. Rajala, "Facial features localization in front view head and shoulders images," Proc. of ICASSP, Vol. 6, pp. 3557-3560, Phoenix, USA, 1999.
[2] C. H. Lin and J. L. Wu, "Automatic facial feature extraction by genetic algorithms," IEEE Trans. Image Processing, vol. 8, no. 6, pp. 834-845, June 1999.
[3] K. M. Lam and H. Yan, "Locating and extracting the eye in human face images," Pattern Recognition, vol. 29, no. 6, pp. 771-779, 1996.
[4] Olivetti Research Laboratory face database, http://www.uk.research.att.com/facedatabase.html
[5] K. S. Tang, K. F. Man, S. Kwong and Q. He, "Genetic algorithms and their applications," IEEE Signal Processing Magazine, vol. 13, pp. 22-37, Nov. 1996.
[6] M. Srinivas and L. M. Patnaik, "Genetic algorithms: A survey," IEEE Computer, Vol. 27, No. 6, pp. 17-26, June 1994.
[7] T. Fong, F. Conti, S. Grange and C. Baur, "Novel interfaces for remote driving: Gesture, haptic and PDA," Proc. of SPIE, Vol. 4195, pp. 300-311, Boston, 2001.
[8] M. C. Moy, "Gesture-based Interaction with a Pet Robot," Proc. of the 16th National Conference on Artificial Intelligence, pp. 628-633, Orlando, USA, July 18-22, 1999.
[9] S. Iba, J. M. Vande Weghe, C. J. J. Paredis, and P. K. Khosla, "An Architecture for Gesture-based Control of Mobile Robots," Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vol. 2, pp. 851-857, Oct. 1999.


