DOI 10.1007/s10209-008-0135-y
LONG PAPER
1 Introduction
There is a clear need for finding new dimensions and
applications for voice in order to impart a sense of performance and expressivity to existing multimedia
applications and to augment the awareness of vocal technologies, which have hitherto been superseded by visual
technologies. Anne Karpf explains this in terms of people's daily visually-biased expressions:
S. Al Hashimi (&)
Department of Arts, University of Bahrain,
P.O. Box 32038, Sakhir, Kingdom of Bahrain
e-mail: samaa.alhashimi@gmail.com
123
2 Related work
In many of the few audio-physical applications which exist
today, physical movement is what controls sound, and not
the opposite. Some of the developers of these applications
map various aspects of sound either to the physical
movement of the user or to the physical movement of real
objects. Reversing this relationship and employing sound, or even voice, as a controller of real inanimate objects (referred to here as a voice-physical application) is not as common, and whenever it is implemented, it is often linked with and restricted to android (humanoid robot) behavior in reaction to spoken commands. This
section first explores the mappings that some developers
established between real physical parameters as input and
sound as an output, whereby changing certain parameters
of real objects (position, size, etc.) controlled certain
characteristics of the generated sound before investigating
how reversing the role of sound, and especially voice, can add a more ludic and magical dimension to the interactive experience. It may hence inspire users' creative interpretations, improvisations, and interactions.
2.1 Audio-physical work
Among the projects that map sound parameters to the
position of real objects is the performance instrument Audiopad, which allows for live electronic music
composition. Developed by Patten and his colleagues at MIT [14], the interface is projected onto a tabletop surface
and is controlled by pucks.
The spatial position of the pucks on the table determines
certain properties of the sound. Each puck is associated
with a set of samples that the performer intends to control. Audiopad detects the position and orientation of
the pucks and maps the volume and other parameters of the
music to these parameters [14].
The performer can associate each puck with a different
track. Each puck resonates at a different frequency on the
table, and radio sensors are used to determine the position
of each puck. Rotating a puck controls the volume of the
associated track. Changing the position of a puck changes
certain acoustic properties of the associated track.
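The kind of mapping Audiopad establishes can be sketched in a few lines. The Python sketch below is purely illustrative: the parameter names, value ranges, and the rotation-to-volume and position-to-pan assignments are assumptions for demonstration, not Audiopad's actual implementation.

```python
import math

def puck_to_params(x, y, angle_deg, table_w=1.0, table_h=1.0):
    """Map a puck's pose on the table to illustrative sound parameters.

    Hypothetical mapping: rotation sweeps the track volume (0..1),
    horizontal position sets stereo pan (-1..1), and distance from
    the table centre sets a generic 'intensity' parameter.
    """
    volume = (angle_deg % 360) / 360.0           # one full turn sweeps the volume range
    pan = 2.0 * (x / table_w) - 1.0              # left edge -> -1, right edge -> +1
    cx, cy = table_w / 2.0, table_h / 2.0
    dist = math.hypot(x - cx, y - cy)
    max_dist = math.hypot(cx, cy)
    intensity = 1.0 - min(dist / max_dist, 1.0)  # strongest near the centre
    return {"volume": volume, "pan": pan, "intensity": intensity}
```

A puck at the table centre rotated half a turn would thus yield centred pan, half volume, and full intensity.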
One of the remarkable audio-physical projects that map
sound to the physical movement of the user is SoundSlam
in which the user punches a bag at certain locations to
trigger and control pre-recorded audio files and create song
compositions [11]. The bag contains pressure sensors, a
global acceleration sensor, and a sound processor unit [11].
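SoundSlam's triggering idea can be sketched as follows, assuming normalized pressure readings and a hypothetical sensor-to-sample assignment; the real unit's sensor layout and thresholds are not described here.

```python
def triggered_samples(sensor_values, sample_map, threshold=0.5):
    """Return the names of samples whose bag region was punched.

    sensor_values: mapping of sensor id -> normalized pressure (0..1).
    sample_map: sensor id -> sample name (hypothetical assignment).
    A punch is any reading above the threshold.
    """
    return [sample_map[sid]
            for sid, v in sensor_values.items()
            if v > threshold and sid in sample_map]
```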
Over the years, the perception of voice mainly as a communication tool has led to the development of many
techniques and devices such as record players, telephones,
and radios. The main aim was to preserve, reproduce, or
transfer voice as it is: an audible means by which an expression is communicated. The author's installations, sssSnake, Blowtter, and Expressmas Tree, however, are attempts to considerably convert voice and transform it into a visible means by which a real object is physically altered. These efforts have not yet managed to "talk a hole through a board" [6]. The next section, however, explains how sssSnake is able, in a sense, to shout a coin out of the screen.
3.1 sssSnake
Table 1 The steps involved in determining the microphone and utterance used in sssSnake. In the example given, both players are vocalizing. Steps: (1) high-frequency component detection; (2) low-frequency component detection; (3) snake movement; (4) coin movement.
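The first two steps in Table 1 suggest that each frame of input is split into high- and low-frequency energy before deciding which utterance is present. A hedged Python sketch of that split, assuming a precomputed magnitude spectrum (such as the asFFT Xtra provides) and hypothetical cutoff frequencies:

```python
def band_energy(spectrum, bin_hz, low_cutoff_hz, high_cutoff_hz):
    """Sum spectral magnitudes below and above the cutoffs.

    spectrum: list of FFT magnitudes; bin i covers frequency i * bin_hz.
    Returns (low_energy, high_energy); bins between the cutoffs are ignored.
    """
    low = sum(m for i, m in enumerate(spectrum) if i * bin_hz <= low_cutoff_hz)
    high = sum(m for i, m in enumerate(spectrum) if i * bin_hz >= high_cutoff_hz)
    return low, high

def classify_utterance(spectrum, bin_hz=43.0, low_cutoff_hz=300.0,
                       high_cutoff_hz=2000.0):
    """Label a frame as 'low' or 'high' by which band dominates.

    The cutoffs and the band-dominance rule are assumptions; Table 1
    only states that high- and low-frequency components are detected
    separately.
    """
    low, high = band_energy(spectrum, bin_hz, low_cutoff_hz, high_cutoff_hz)
    return "high" if high > low else "low"
```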
and the plotter was hidden below the table. Hence, the virtual coin, which was initially used to test the screen-based version, has been replaced by a real coin moving away from a snake projected on the surface of the table.
One aim of incorporating speech recognition is to investigate the implications of its integration with non-speech voice input.
3.3.1 Implementation of Blowtter
The main hardware components of Blowtter include a
DXY Roland plotter, four USB audio adapters, four
microphones, and a fast personal computer. The application was again programmed in Macromedia Director/
Lingo. Three Xtras were used: asFFT, Chant, and Direct
Communication. Some of the techniques used in this project are similar to those in sssSnake.
The vocal signal is analyzed using the asFFT Xtra. By using the Xtra to measure and compare the volume levels at each microphone, the software recognizes which of the four microphones the user is blowing into. The microphone at which the maximum amplitude is detected determines the direction in which the pen moves. To prevent the plotting head from reacting to any voice
input other than blowing, an amplitude threshold is specified. When a volume above the threshold is detected, the
software assumes that the user is speaking rather than
blowing and the speech recognition Xtra, Chant, is activated. Furthermore, to avoid the plotter head reacting to
soft ambient noises, another lower threshold is specified.
Any noises below it are completely ignored. The Hewlett-Packard Graphics Language (HPGL) is again used to control the plotter. To ensure accuracy, a pitch
threshold is also specified. When a pitch (here identified as
the most powerful part of the frequency spectrum) above
the pitch threshold and a volume below the volume
threshold are detected, the software assumes that the user is
blowing because blowing is high-pitched in comparison to
most speech sounds.
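The decision logic described above can be summarized as a small classifier: pick the loudest microphone for direction, ignore anything below the noise floor, treat loud input as speech, and treat quiet but high-pitched input as blowing. The threshold values below are illustrative placeholders, not the ones used in Blowtter:

```python
def select_microphone(levels):
    """Pick the mic with the highest amplitude; its index sets the pen direction."""
    return max(range(len(levels)), key=lambda i: levels[i])

def classify_input(volume, pitch_hz, noise_floor=0.05,
                   speech_volume=0.6, blow_pitch_hz=1500.0):
    """Classify a frame as 'ignore', 'blow', or 'speech'.

    - Below the noise floor: ambient noise, ignored.
    - Loud input: assumed to be speech; would hand off to the recognizer.
    - Quiet but high-pitched: assumed to be blowing (blowing is
      high-pitched relative to most speech sounds).
    """
    if volume < noise_floor:
        return "ignore"
    if volume >= speech_volume:
        return "speech"
    if pitch_hz > blow_pitch_hz:
        return "blow"
    return "ignore"
```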
Chant Speech Kit, a speech-recognition Xtra for
Director, is used to recognize the commands spoken by the
user. The Xtra supports a number of speech recognition
applications, in this case Microsoft SAPI. Chant Speech Kit
consists of a command, grammar, and dictation vocabulary
list. For Blowtter, however, only the command list is
enabled and customized to include only the commands that
control the plotter. The dictation vocabulary is disabled,
hence detection errors are minimized by programming the
Xtra to capture only the commands in the customized
command list.
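The effect of the customized command list can be sketched as a simple filter that discards any recognizer output not in the allowed vocabulary; the command words below are hypothetical, not Blowtter's actual command set:

```python
# Hypothetical plotter command vocabulary, standing in for the
# customized Chant command list described above.
PLOTTER_COMMANDS = {"pen up", "pen down", "left", "right", "up", "down", "stop"}

def accept_recognition(recognized_phrase, commands=PLOTTER_COMMANDS):
    """Accept a recognizer result only if it is in the command list.

    Mirrors the idea of disabling dictation and constraining the
    engine to a fixed command vocabulary to minimize detection errors.
    """
    phrase = recognized_phrase.strip().lower()
    return phrase if phrase in commands else None
```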
Direct Communication Xtra allows Director to communicate with an external device through either the serial or the parallel port.
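The HPGL stream sent over that port can be illustrated with the standard pen-control instructions (PU pen up, PD pen down, PA plot absolute). The helper below only builds the command strings; how they reach the serial port is left to the host environment:

```python
def hpgl_pen_up():
    return "PU;"

def hpgl_pen_down():
    return "PD;"

def hpgl_move_to(x, y):
    """Absolute move in plotter units (PA = Plot Absolute)."""
    return "PA{},{};".format(int(x), int(y))

def hpgl_program(points):
    """Build an HPGL string that draws a polyline through the points."""
    if not points:
        return hpgl_pen_up()
    first, rest = points[0], points[1:]
    cmds = [hpgl_pen_up(), hpgl_move_to(*first), hpgl_pen_down()]
    cmds += [hpgl_move_to(x, y) for x, y in rest]
    cmds.append(hpgl_pen_up())
    return "".join(cmds)
```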
References
1. Al Hashimi, S., Davies, G.: Vocal telekinesis: physical control of inanimate objects with minimal paralinguistic voice input. In: Proceedings of ACM Multimedia 2006, Santa Barbara, CA, USA (2006)
2. Armand, A., Dauw, B.: Commotion. http://www.francobelgedesign.com/ (2004)
3. Borland, R., Findley, J., Jacobs, M.: Milliefiore effect. http://www.siggraph.org/artdesign/gallery/S02/onfloor/borland/1technicalstatement.html (2002)
4. Dobson, K.: Blendie 2000. http://web.media.mit.edu/%7Emonster/blendie/ (2003)
5. Dozza, M., Chiari, L., Horak, F.B.: Audio-biofeedback improves balance in patients with bilateral vestibular loss. Arch. Phys. Med. Rehabil. 86(7), 1401-1403 (2005)
6. Dyer, F.L.: Edison, his life and inventions. http://www.worldwideschool.org/library/books/hst/biography/Edison/chap10.html (1929)
7. Gaver, W.: The sonic finder: an interface that uses auditory icons. Hum. Comput. Interact. 4, 67-94 (1989)
8. Gaye, L., Holmquist, L.E.: In duet with everyday urban settings: a user study of Sonic City. In: Proceedings of the New Interfaces for Musical Expression (2004)
9. Harada, S., Wobbrock, J.O., Landay, J.A.: VoiceDraw: a hands-free voice-driven drawing application for people with motor impairments. In: Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility, Tempe, Arizona (2007)
10. Karpf, A.: The human voice. Bloomsbury, London (2006)
11. Kirschner, R., Morawe, V., Reiff, T.: His Master's Voice. http://www.fursr.com/details.php?id=3&pid=3 (2001)
12. Liljedahl, M., Lindberg, S., Berg, J.: Digiwall: an interactive climbing wall. In: Proceedings of ACE 2005, Valencia, Spain, 15-17 (2005)
13. Ng, K., Popat, S., Stefani, E., Ong, B., Cooper, D.: Music via motion: interactions between choreography and music. In: Proceedings of the Inter-Society for the Electronic Arts (ISEA) Symposium, France. http://www.isea2000.com/actes_doc/48_kia_ng.rtf (2000)
14. Patten, J., Recht, B., Ishii, H.: Audiopad: a tag-based interface for
musical performance. In: Proceedings of the New Interfaces for
Musical Expression (NIME 02) (2002)
15. Roosegaarde, D.: 4D-Pixel. http://www.daanroosegaarde.nl/
(2005)
16. Schmitt, A.: asFFT Xtra. http://www.as-ci.net/asFFTXtra (2003)
17. Snibbe, S.: Blow up. http://www.snibbe.com/scott/breath/blowup/index.html (2005)
18. Sørensen, K., Petersen, K.Y.: Smiles in motion. http://www.boxiganga.dk/english/enindex.html (2000)
19. Wakefield, J.: Body movement to create music. http://news.bbc.co.uk/2/hi/technology/3873481.stm (2004)
20. Zivanovic, A.: http://www.senster.com/ihnatowicz/ (2000)