Abstract

The inability to read visual text has a huge impact on the quality of life of visually disabled people. One of the most anticipated devices is a wearable camera capable of finding text regions in natural scenes and translating the text into another representation such as sound or braille. In order to develop such a device, text tracking in video sequences is required as well as text detection. Homogeneous text regions must be grouped to avoid multiple, redundant speech syntheses or braille conversions.

We have developed a prototype system equipped with a head-mounted video camera. Text regions are extracted from the video frames using a revised DCT feature. Particle filtering is employed for fast and robust text tracking. We have tested the performance of our system on 1,000 video frames of a hallway with eight signboards. The number of text candidate images is reduced to 0.98%.

1. Introduction

We human beings make the most of the text information in surrounding scenes in our daily lives. The inability to read visual text has a huge impact on the quality of life of visually disabled people. Although several devices have been designed to help visually impaired people "see" objects using an alternative sense such as sound or touch, the development of text reading devices is still at an early stage. Character recognition for the visually disabled is one of the most difficult tasks, since characters have complex shapes and are very small compared with physical obstacles. One of the most anticipated devices is probably a wearable camera capable of finding text regions in natural scenes and translating the text into another representation such as sound or braille.

An alternative device is a helper robot that can recognize characters in human living spaces and read out the text information for the user. Several robots with character recognition capability have been proposed so far [2, 3, 4, 6]. Iwatsuka et al. proposed a guide dog system for blind people [2]. We presented a text capturing robot equipped with an active camera [8]. Our robot can find and track multiple text regions in the surrounding scene. In these robot applications, camera movement is constrained to simple, steady robot motions. A wearable camera, on the other hand, can be moved freely, so a robust text tracking method is needed to develop a wearable text capturing device.

Duplicate text strings may appear in consecutive video frames, and recognizing all the text strings in every image is a waste of time. More importantly, the camera user would not want to repeatedly hear a synthesized voice originating from the same text. Merino and Mirmehdi presented a framework for real-time text detection and tracking and demonstrated a system [5]; its text tracking performance, however, was not fully satisfactory, and substantial improvements are still needed.

In this paper, we present a wearable camera system that can automatically find and track text regions in the surrounding scene. The system is equipped with a head-mounted video camera. Text strings are extracted using the revised DCT-based method [1]. The text regions are then grouped into image chains by a text tracking method based on particle filtering.

In Section 2, we present an overview of the wearable camera system and the algorithms used. In Section 3, the text tracking algorithm is given. Section 4 describes experimental results and performance evaluations.

2. Wearable Camera System

2.1. Overview of the system

Figure 1 shows the prototype of the wearable camera system which we have constructed. The system consists of a head-mounted camera (380k-pixel color CCD),