When Is Noise Speech? A Survey in Sonic Ambiguity

Freya Bailes and Roger T. Dean
MARCS Auditory Laboratories
University of Western Sydney, Australia
Locked Bag 1797
Penrith South DC NSW 1797, Australia
{f.bailes, roger.dean}@uws.edu.au
NoiseSpeech is a compositional device in which sound is digitally manipulated with the intention of evoking the sound qualities of unintelligible speech (Dean 2005). Speech is characterized by rapidly changing broadband sounds (Zatorre, Belin, and Penhune 2002), whereas music, particularly tonal music, changes more slowly and narrowly in frequency content. As Zatorre and colleagues argue, this distinction may be reflected in better temporal resolution in the left auditory cortex and better spectral resolution in the right, so that perception is adapted to both ranges and extremes of sonic stimuli. NoiseSpeech is constructed either by applying the formant structure (that is, spectral peak content) of speech to noise or other sounds, or by distorting speech sounds such that they no longer form identifiable phonemes or words. The resultant hybrid is an artistic device that, we argue, may owe its force to an encapsulation of the affective qualities of human speech, while intentionally stripping the sounds of any semantic content. In this article, we present an empirical investigation of listener perceptions of NoiseSpeech, demonstrating that non-specialist listeners hear such sounds as similar to each other and to unaltered speech.

NoiseSpeech is ambiguous in evoking the identification of an everyday source of sound (human speech) within the musical context of sound art. Arguably, it could be said to blur the distinction described by Gaver (1993a, 1993b) between two types of listening: the everyday and the musical. When NoiseSpeech occurs in the context of a composition or performance, what form of listening does a listener employ? The context is one of musical listening, yet the identification of the sounds with the human generation of speech is an everyday listening concern. The propensity to identify the source of a sound is a question of interest to both cognitive and ecological approaches to perception. Handel (1989) posits that cognizing sound in terms of sound-causing events may override a more bottom-up sensory perception. Concordantly, Ballas (1993) explores how associations are formed between environmental sound and sound source, listing exposure to particular sounds in everyday life, being able to visualize the sound-producing event, and the similarity of the sound to a mental stereotype.

Dean (2005) proposed that a hybrid of noise and speech may not only invent a new language but more importantly may present a new message. NoiseSpeech seems to escape commodification, and in this respect fits Attali's (1985) concept of composition, yet it is not devoid of connotation or expression. We argue that NoiseSpeech is likely to evoke affective responses from a listener through its association with the affective expression of human speech (Dean and Bailes 2006). Traditionally, affect has been conceptualized in terms of valence (positive and negative) and arousal (active and passive) dimensions (see, for example, Leman et al. 2005). Here, we distinguish affect from emotional connotations that are more concerned with top-down cognitive associations, such as comfort or annoyance. According to this distinction, a certain familiarity with a sound is necessary for emotion to be evoked, but not for the perception of affect. Where a sound is identified as familiarly speech-like, higher-order cognitive processes may be involved, associated with the perception of an emotion. However, when sounds are either strongly distorted through processing or are of ambiguous origin, listeners may perceive this on an affective level as valence and arousal.

Vocal affect expression has been studied throughout history (Banse and Scherer 1996). Links have been made in speech between emotion and altered articulation, respiration, and phonation. In particular, these effects are believed to be quantifiable in terms of acoustic variables such as spectral energy distribution, fundamental frequency, and speech rate. For example, Banse and Scherer (1996) examined the portrayal of different emotions by professional actors, comparing human listener recognition rates with the

Computer Music Journal, 33:1, pp. 57–67, Spring 2009. © 2009 Massachusetts Institute of Technology.
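The first construction method described above, superimposing the formant structure of speech on noise, can be illustrated with a minimal sketch: white noise passed through a few two-pole resonators tuned to speech-like spectral peaks. This is a generic illustration, not the authors' actual synthesis procedure; the formant frequencies and bandwidths used here are textbook approximations for the vowel /a/.

```python
import numpy as np
from scipy.signal import lfilter

def formant_noise(formants, bandwidths, dur=1.0, fs=44100, seed=0):
    """Impose static speech-like formant peaks on white noise.

    Each (frequency, bandwidth) pair defines a two-pole resonator;
    the resonator outputs are summed into a single signal.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(int(dur * fs))
    out = np.zeros_like(noise)
    for f, bw in zip(formants, bandwidths):
        r = np.exp(-np.pi * bw / fs)  # pole radius set by bandwidth
        a = [1.0, -2.0 * r * np.cos(2 * np.pi * f / fs), r * r]
        out += lfilter([1.0 - r], a, noise)  # resonator excited by noise
    return out / np.max(np.abs(out))  # normalize to [-1, 1]

# Approximate formants of the vowel /a/ (illustrative values only)
sig = formant_noise([700, 1220, 2600], [130, 70, 160])
```

Because the resonators here are fixed, the result corresponds to the stable, unchanging formants discussed later in the article; a closer approach to real speech would require the formant trajectories to vary over time.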
voice speaking "everything is at least double," and NsSp01–NsSp14 constitute a female voice asking, "Is that you?" As noted previously, two of the items (NsSp01 and AtHs01) are unaltered speech; this data shows that putative NoiseSpeech items indeed cluster with genuine speech. One remarkable feature that deserves future investigation is that NsSp15–NsSp18 cluster with the other speech and NoiseSpeech sounds (though with generally lower speech ratings), even though they involve stable (unchanging) formants superimposed on noise, whereas speech itself comprises rapidly changing formant spectra.

Taken together, the clusters seem to describe a continuum, with the most clearly identifiable sound sources, drums (cluster 1) and speech (cluster 3), at