
Walking guide system for the visually impaired by using three-dimensional sound

Makoto Kobayashi* & Michio Ohta**


*Department of Computer Science, Tsukuba College of Technology
4-12-7 Kasuga, Tsukuba City, Ibaraki 305-0821 JAPAN
koba@cs.k.tsukuba-tech.ac.jp

**Institute of Engineering Mechanics, Tsukuba University
1-1-1 Tennoudai, Tsukuba City, Ibaraki 305-8573 JAPAN
ohta@paradise.kz.tsukuba.ac.jp
ABSTRACT
In this paper, we describe a walking guide system for the visually impaired. The system is constructed with an omni-vision image sensor and a three-dimensional sound processor. The image sensor consists of a spherical mirror and a CCD camera. The three-dimensional sound processor presents position information acquired by the omni-vision sensor to a subject as a sound source of pink noise, and the subject walks in the direction that the system indicates. In the first experiment, we asked the subjects to move to a sound source in a virtual space by operating a joystick; the omni-vision sensor was not used in this experiment. The results made it clear that the subjects were able to home in on the sound source, so we conducted a second experiment with the omni-vision sensor: we set an infrared LED marker on a table and asked disoriented subjects to walk to the table. The results show that three-dimensional sound is usable for walking guidance.

1. INTRODUCTION
Most visually impaired people face many difficulties when they want to go outside or walk in a place they are not familiar with. Even if they know a location well, they can still have problems if there are obstacles on the ground or if unexpected events occur. The purpose of our study is to develop a visual aid system that provides spatial information about the surroundings, and we hope the system helps the visually impaired avoid the troubles mentioned above. We selected the auditory sense for presenting information because the processing capacity of hearing is second only to that of vision, and we opted for a CCD camera as the input device to detect information about the surroundings. A person wearing the system can acquire information actively, since the camera is mounted on the head.
There are many computer-based visual aid systems that give the visually impaired information about their surroundings. However, most of them offer only verbal expressions or verbal interfaces through a voice generator. Verbal information is very important, but it is also important for visually impaired people to get non-verbal information, such as the spatial information that enables them to recognize their environment. Moreover, most existing systems require infrastructure equipment, which means the visually impaired cannot move freely unless that infrastructure exists. We therefore tried to develop a portable non-verbal sound display system. As a research project of that kind for visually impaired people, we can cite Peter B. L. Meijer [2]. His portable system converts images to sound by certain formulas; however, the method is not intuitive. Furthermore, the navigation system for the blind by Jack M. Loomis et al. [3][4] is useful for trips to unfamiliar places. That system navigates the blind with three-dimensional verbal information, but a large amount of mapping data is necessary for it to run.

0-7803-5731-0/99/$10.00 ©1999 IEEE
To evaluate the approach described above, we previously conducted experiments that measured subjects' ability to recognize the shapes of simple bodies and letters, such as ABC [1]. That system detects positions on the central scanning line of images acquired by a head-mounted camera and presents sound sources using stereo effects. Through these experiments we found that subjects can distinguish simple objects with the system, but it took too much time and the sound expression of the system was not accurate enough. To make the sound information more accurate and intuitive, we considered it better to use a three-dimensional sound effect. Thanks to recent remarkable advances in computer technology, we can readily produce three-dimensional sound with various sound processor devices based on a Digital Signal Processor (DSP); they usually implement Head Related Transfer Functions (HRTFs). In addition, looking at the daily life-style of visually impaired people, we realized the importance of actions such as searching for something, moving somewhere, and following somebody, rather than recognizing static objects. Therefore, we decided to develop a system that guides the visually impaired by three-dimensional sound while they are walking.
In this paper, we describe two experiments and their results. In the first experiment, to confirm that a subject can move to the sound source generated by the processor, we asked subjects to move to a sound source in a virtual space. The subjects operate a joystick, and the sound source comes nearer if they approach its position correctly. In the other experiment, subjects mounted an omni-vision sensor on their heads and actually walked. From an image from the sensor, the system detects the direction and distance of an LED marker and generates a three-dimensional sound source from that position information. The subjects can walk to the marker relying on the sound source generated by the system.

2. PRINCIPLE OF 3D SOUND
In this chapter, the principle of three-dimensional sound processors is briefly described. Most three-dimensional sound processors use Head Related Transfer Functions (HRTFs) to produce three-dimensional sound. The basic principle is shown in Fig.1. In the figure, H(j\omega) denotes the transfer functions from the sound source image produced by the processor to the subject's ears, G(j\omega) denotes the transfer functions from the loudspeakers to the subject's ears, and S(j\omega) denotes the output signals from the loudspeakers or the sound source image. The subscripts R and L indicate the ear that the sound reaches, and the superscripts 1 and 2 indicate the loudspeaker number. The signal reaching each ear can then be written as convolutions between the output signals and their transfer functions:

S(j\omega) H_R(j\omega) = S^1(j\omega) G_R^1(j\omega) + S^2(j\omega) G_R^2(j\omega)
S(j\omega) H_L(j\omega) = S^1(j\omega) G_L^1(j\omega) + S^2(j\omega) G_L^2(j\omega)

Thus, the output signals from the loudspeakers are described by the following:

S^1(j\omega) = ( G_L^2(j\omega) H_R(j\omega) - G_R^2(j\omega) H_L(j\omega) ) S(j\omega) / X(j\omega)
S^2(j\omega) = ( -G_L^1(j\omega) H_R(j\omega) + G_R^1(j\omega) H_L(j\omega) ) S(j\omega) / X(j\omega)
X(j\omega) = G_R^1(j\omega) G_L^2(j\omega) - G_R^2(j\omega) G_L^1(j\omega)

Needless to say, with a set of headphones the output signals are described simply by:

S_R(j\omega) = H_R(j\omega) S(j\omega)
S_L(j\omega) = H_L(j\omega) S(j\omega)

In usual cases, each transfer function is measured in advance by the impulse response method with a probe microphone [5]. We can then give a sound image to the subject by digital filtering of an arbitrary sound signal, with the digital filter designed from the HRTFs. However, each person's HRTFs differ individually, and it is said that subjects sometimes misjudge front and rear, since most three-dimensional sound processors use HRTFs with averaged characteristics.

Fig.1: Principle of three-dimensional sound ((a) loudspeaker mode; (b) headphone mode)

3. EXPERIMENT ON HOMING TO SOUNDS IN VIRTUAL SPACE

3.1 Experimental system
Fig.2 shows a schematic of our experimental system for the virtual space experiment. The system consists of a computer, a joystick, a three-dimensional sound processor, an amplifier, and a set of headphones. During the experiment, a subject operates the joystick and receives sound information through the headphones. We selected the RSS-10 (made by Roland) as the sound processor since it is easy to control from a computer, and we chose headphone mode to make the experimental environment equal for every subject: if we used loudspeakers, it would be difficult for a blind subject to keep his/her head fixed at the center of the loudspeaker array.
The functional architecture of the system is as follows. First, a monophonic sound source is played by the computer and input to the three-dimensional sound processor.

Fig.2: Schematic of experimental system (joystick position information, computer, 3D sound processor, headphones)
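The crosstalk-cancellation relations of Section 2 can be checked numerically at a single frequency. The sketch below uses random complex numbers as hypothetical stand-ins for the measured transfer functions G and H; it only verifies that the solved loudspeaker outputs reproduce the intended ear signals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-frequency transfer-function values (complex),
# standing in for G_R^1, G_R^2, G_L^1, G_L^2, H_R, H_L at one j*omega.
g1r, g2r, g1l, g2l, hr, hl = rng.standard_normal(6) + 1j * rng.standard_normal(6)
s = 1.0 + 0.5j  # source signal S(j*omega) at this frequency

# Loudspeaker outputs from the closed-form solution in Section 2.
x = g1r * g2l - g2r * g1l            # determinant X(j*omega)
s1 = (g2l * hr - g2r * hl) * s / x   # S^1(j*omega)
s2 = (-g1l * hr + g1r * hl) * s / x  # S^2(j*omega)

# The signal reaching each ear must equal the intended image H * S.
assert np.isclose(g1r * s1 + g2r * s2, hr * s)
assert np.isclose(g1l * s1 + g2l * s2, hl * s)
```

Running the same check over many frequencies (and many random draws) confirms the formulas hold wherever X(j\omega) is non-zero.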


As a sound source we selected pink noise sampled at 44 kHz with 8-bit range, whose central frequency was 10 kHz, because it is said that sound with a wide frequency bandwidth is suitable for sound localization and that sounds above 8 kHz are distinguishable with respect to localization [6]. When the subject moves the joystick, the computer takes its position information and sends a control signal to the RSS-10 synchronized with the motion. Following the control signal, the RSS-10 modifies the monophonic sound into three-dimensional stereophonic sounds using HRTFs and outputs them to the subject through the headphones. Finally, the subject recognizes the sound source position. The subjects can change their orientation and go forward or backward by operating the joystick; therefore, they can move anywhere they want on a horizontal plane in the virtual space.
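The joystick locomotion described above can be sketched as a simple state update on the horizontal plane. The update rule, the units, and the way position information would be turned into direction/distance values for the sound processor are our illustrative assumptions, not the actual RSS-10 control protocol:

```python
import math

def step(x, y, heading_deg, turn_deg, forward):
    """One joystick update in the virtual horizontal plane: lateral
    deflection turns the subject, longitudinal deflection moves them
    along the current heading (units are arbitrary)."""
    heading_deg = (heading_deg + turn_deg) % 360.0
    rad = math.radians(heading_deg)
    return (x + forward * math.cos(rad),
            y + forward * math.sin(rad),
            heading_deg)

def relative_source(x, y, heading_deg, sx, sy):
    """Direction (degrees, 0 = straight ahead) and distance of the
    sound source relative to the subject -- the kind of values a
    controller would send to the 3D sound processor each update."""
    bearing = math.degrees(math.atan2(sy - y, sx - x))
    rel = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return rel, math.hypot(sx - x, sy - y)
```

For example, a subject at the origin facing a source 3 m straight ahead gets a relative direction of 0 degrees and a distance of 3; as the subject turns or advances, both values change and the rendered source appears to move accordingly.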

3.2 Experimental procedure


Using the experimental system explained above, we instructed subjects to move to the sound source produced by the sound processor in the virtual space. We ordered them to go to the sound position as fast as possible by operating the joystick. In this procedure, they could decide for themselves when a trial started, since pushing a button on the joystick presented the sound source. Then, when they reached the sound position, i.e., when they felt the sound source was inside their head, they pulled the joystick trigger to stop the sound. We recorded the reaching time from start to end and plotted trajectories from the original position to the final position where subjects thought the sound source was.
There were 24 trials in this experiment after proper practice. We prepared 12 directions of the sound position on a circle three meters in diameter on the horizontal plane. Adjacent positions were separated by 30 degrees, and the sequence of positions was randomized. The subjects were 4 sighted persons, 1 visually impaired person, and 3 congenitally blind persons who use braille every day. They ranged in age from 20 to 47 years.
Additionally, we conducted the same style of experiment with visual feedback for the sighted subjects. The aim was to compare the results with the sound-cue-only condition. In this style, the sighted subjects could continuously see where the sound source was and where they were in the virtual space on the computer monitor, like a video game.

3.3 Results of virtual space experiments


The results of the virtual space experiments were as follows. The mean reaching time with only the sound cue was 16 sec, while in the trials with visual feedback by sighted subjects the mean time was 8.2 sec. Each reaching time and the means are shown in Fig.3. Though we cannot apply statistical methods because of the small number of subjects, the results show that it takes more time to reach the sound source positions without visual feedback. However, the trajectory data show that subjects homed to the sound source in any direction without hesitation.

Fig.3: Reaching time in a virtual space experiment (reaching times of subjects LS, KK, XY, HT, TM, EN, MA, AM and their mean, with and without visual feedback)

A few examples are shown in Fig.4. In the figure, (a) and (b) are examples by congenitally blind subjects; (c) is data of a sighted subject in the sound-only condition, and (d) is with visual feedback. It is clear that the sound source generated by the RSS-10 can lead a subject to arbitrary positions. Even if the subjects deviated from the direction of the sound at first, they corrected their course as they closed in on the sound.
Moreover, we can say that there was not much difference between the sighted and the visually impaired subjects in the sound-only condition, with regard to both reaching time and trajectories. This means that a sighted person can be readily guided by a sound source despite not having the special hearing sense or skills of congenitally blind people.

4. EXPERIMENT ON HOMING TO SOUNDS IN REAL WORLD
4.1 Experimental system
Because the results of the virtual space experiment suggested that a generated sound source can be used for walking guidance, as a next step we built an experimental system, shown in Fig.5, for an experiment in the real world. A subject mounts on his/her head an omni-vision image sensor constructed from a spherical mirror and a CCD camera. The image sensor transmits its image data by radio. The lens of the camera is covered with an infrared filter, and a marker on the table flashes infrared light from an LED. In the experiments, the computer receives the image data from the head-mounted sensor by radio and detects the position of the LED by image processing. After calculating the direction and distance of the LED, the computer sends a control signal to the three-dimensional sound processor, the RSS-10, and the processor makes the three-dimensional sound. Since the sound data is transmitted through the air in the same way as the image data, the subjects can walk freely.


Fig.4: Examples of trajectory data in a virtual space experiment ((a) trajectories of subject EN; (b) trajectories of subject MA; (c) trajectories of subject LS, sound only; (d) trajectories of subject LS, with visual feedback)


For the image processing, we applied a combination of edge detection and threshold comparison. We also set each subject's height in the computer before beginning the experiment, because the distance to the LED was calculated from the difference in height between the spherical mirror and the LED.
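The LED localization step can be sketched as follows. This is a simplified stand-in for the actual pipeline: a plain brightness threshold replaces the edge detection described above, and `pixels_per_degree` is a hypothetical calibration constant mapping radial pixel offset in the omni-vision image to the angle below the horizon. The marker is assumed to be visible in the frame.

```python
import numpy as np

def locate_led(ir_image, center, delta_h, pixels_per_degree):
    """Find the brightest blob in the IR-filtered omni-vision image
    and convert its image position to a bearing and a ground distance.
    delta_h is the height difference between the spherical mirror and
    the LED (the quantity set per subject before the experiment)."""
    ys, xs = np.nonzero(ir_image > ir_image.max() * 0.8)  # bright pixels
    cy, cx = ys.mean(), xs.mean()                         # blob centroid
    dx, dy = cx - center[0], cy - center[1]
    bearing = np.degrees(np.arctan2(dy, dx))              # marker azimuth
    radius = np.hypot(dx, dy)                             # radial offset
    depression = radius / pixels_per_degree               # angle below horizon
    distance = delta_h / np.tan(np.radians(depression))   # ground distance
    return bearing, distance
```

With a synthetic 9x9 frame containing one bright pixel four pixels to the right of center, and a calibration that maps that offset to a 45-degree depression angle, the recovered distance equals the mirror-to-LED height difference, as expected from the geometry.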

4.2 Experimental procedure
Three visually impaired subjects, including one congenitally blind, took part in the experiment. Two of the visually impaired subjects put on an eye mask for the experiment. We fitted the head-mounted sensor on their heads and disoriented them before the experiment started. They were requested to search for the sound source and reach it. The walking trajectory and the reaching time were recorded by a digital video camera fixed on the field with a tripod. In advance, we explained to them that a table was the goal and that the sound source represented it. The distance between the original position where the subject stood and the table was 1.5 m. In the experiment, we tried seven directions for each subject, separated by 45 degrees. We did not try directly behind, because the prop that supports the spherical mirror masks that direction.

4.3 Results of real world experiments


Fig.5: Schematic of experimental system (omni-vision sensor, computer, 3D sound processor)

Consequently, all subjects were able to touch the table in all trials; nobody lost his/her way to the table. Examples of walking trajectory data are shown in Fig.6, and the mean reaching time of each subject is shown in Table 1. In the figure, white footprints represent left feet and black ones right feet. The bottom of each vertical line is the starting point and the top is the position of the table; the horizontal lines are spaced 50 cm apart. The left trajectory of subject TM and that of subject MA show trials with the table set to the right, and the right trajectory of subject TM shows a case with the table to the left.
Those results show that subjects RK and TM are fast and subject MA is a slow stepper. At the beginning of each trial, all of them rotated their head or body to sense the direction of the sound source. Then subjects RK and TM walked straight in the direction they had determined, without hesitation. On the other hand, subject MA continued to rotate his head to sense and confirm the sound position during locomotion. We think that tendency is mainly caused by the individual character of each subject: subject MA is a cautious man, so he walked slowly and deliberately, and he was afraid of hitting the table. Another reason may be that MA is a congenitally blind subject. We can't discuss this further since the number of subjects was so small; however, the results may have been influenced by the fact that the eye-masked visually impaired subjects had some information about the surroundings prior to the experiment, while the congenitally blind subject had less information than the others. Since the original purpose of developing the system is to help the visually impaired avoid trouble while walking, we have to solve the problem of subjects walking overly deliberately when using the system.
After the experiment, we collected the subjects' comments. They said that they felt less sure of the distance between themselves and the sound source in this experiment than in the virtual space experiment. We think the reason is that in the real world experiment they could touch the table before they felt they were close enough to the sound source. In the virtual space experiment, they felt the sound source inside their head when they reached its position; besides, they felt the sound source behind them when they went beyond it. Those factors might have given them a sense of distance to the sound. Expressing the absolute distance of a sound source is very important, yet in general auditory perception of absolute distance is neither linear nor accurate [7]. Therefore, developing an effective and intuitive method to express the accurate distance from the subject to a sound source is a crucial point for our future research.

Fig.6: Example of walking trajectories (left: subject TM; right: subject MA; white footprints: left steps, black footprints: right steps)

Table 1: Mean time to reach the table

Subject          RK    TM    MA
Mean time [sec]  5.1   6.0   69.1

5. CONCLUSION
Toward developing an acoustic walking guide system using three-dimensional sound, we conducted experiments. From the results of those experiments, we conclude that, although some problems remain to be solved, three-dimensional sound technology can be used to guide the visually impaired. A virtual three-dimensional sound source produced by a processor can lead people who don't have visual information to an arbitrary place.
Our future work will consider how to make an effective and intuitive method of presenting sound information. When we succeed in solving that problem, the acoustic aid system can serve not only as a walking guide but also as a support for visually impaired people playing sports.

6. REFERENCES
[1] M. Kobayashi, H. Suda, M. Ohta, "On an acoustic visual aid system which enables active acquisition of images," Proc. of Asian Control Conference, 1994, pp. 957-960
[2] Peter B. L. Meijer, "An Experimental System for Auditory Image Representations," IEEE Trans. on Biomedical Eng., Vol. 39, No. 2, 1992, pp. 112-121
[3] Jack M. Loomis, C. Hebert, Jack G. Cicinelli, "Active localization of virtual sounds," J. Acoust. Soc. Am., Vol. 88, No. 4, 1990, pp. 1757-1764
[4] Jack M. Loomis, et al., "Personal Guidance System for the Visually Impaired," Proc. of The First Annual ACM Conference on Assistive Technologies (ASSETS '94), 1994, pp. 85-91
[5] Jens Blauert, translated by John S. Allen, Spatial Hearing, The MIT Press, 1983
[6] Hisao Sakai, Takeshi Nakayama, Choukaku to Onkyou Shinri (Hearing and Psychoacoustics), The Acoustical Society of Japan: Corona Press, 1992
[7] Jack M. Loomis, Roberta L. Klatzky, et al., "Assessing auditory distance perception using perceptually directed action," Perception & Psychophysics, Vol. 60, No. 6, 1998, pp. 966-980
