
Solum 1

Renata Solum
Jayanthi Sasisekaran
SLHS 3305W
March 3, 2009

The Influence of Emotion on Acoustic Parameters of Voice

Introduction

Physiological correlates of human emotion are as evident in speech as they are in

facial expression and body language. This project aims to identify some of the effects of

emotion on measurable, acoustic parameters of voice, attempting to replicate some

findings by Banse and Scherer (1996) for the emotions hot anger, happiness, boredom,

and contempt. This research will also expand upon Banse and Scherer to include

measurement of jitter for each of the four emotions.

Research Questions

How do four emotions (hot anger, happiness, boredom, and

contempt) affect acoustic parameters of voice? How do exemplary

utterances of the four emotions differ in terms of pitch and intensity

contours? Do pitch and intensity contours co-vary across the four

emotions? What do measurements of mean F0, range F0, mean intensity,

dynamic range, sentence duration, LTAS, and jitter absolute reveal about

production of and perceptual distinction between the four emotions?

Predictions

Figure 1 shows my predictions for acoustic measurements of the

four recorded exemplars.



Hot Anger
  F0/Pitch contour: Highest mean F0 (Banse and Scherer), most variable pitch contour
  Dynamics: Highest mean intensity and largest dynamic range
  Sentence Duration: Shortest duration
  LTAS: Perhaps slightly more energy in high frequencies.
  Jitter: Perhaps second lowest value

Happiness
  F0/Pitch contour: High mean F0 but flatter affect, due to association with relaxed state.
  Dynamics: Smaller dynamic range, medium intensity
  Sentence Duration: Second longest
  LTAS: Unsure.
  Jitter: Bright, relaxed, but clear vocal quality; low jitter

Boredom
  F0/Pitch contour: Flattest affect; predict smallest F0 range and downward slope.
  Dynamics: Smallest dynamic range, lowest intensity
  Sentence Duration: Longest
  LTAS: Perhaps slightly more energy in low frequencies
  Jitter: Perhaps second highest value.

Contempt
  F0/Pitch contour: Lowest mean F0 (Banse and Scherer); range smaller than hot anger.
  Dynamics: Mean intensity similar to hot anger, and perhaps similar dynamic range, as well.
  Sentence Duration: Second shortest
  LTAS: In comparison to other three emotions, more energy centered in high frequencies. (I predict contempt similar to Banse and Scherer’s predictions of disgust: pharyngeal constriction)
  Jitter: Highest value: creaky, growly vocal quality towards the bottom of my modal range

Figure 1: Predictions

Methods

After Banse and Scherer (1996), this experiment made use of a standardized

utterance to mitigate the effects of varying phonological content on acoustic parameters,

and thus attempt to isolate the effects of emotion. Their phrase, “Hat sundig pron you

venzy,” resembles normal speech. In the interest of further standardizing the four

emotional exemplars, I placed stress uniformly on the words pron and venzy during

recording. All recordings were of my own voice. I recorded three potential exemplars of

each emotion in the Praat software using a sampling rate of 44,100 Hz. I then randomized

the list of 12 recordings and played them to two listeners, who identified each as one of

the four emotions and rated it on a 1-5 scale in terms of how well it exemplified the

identified emotion. I averaged the ratings and chose the highest-rated exemplar for each

of the four emotions, throwing out one exemplar that the two listeners identified

differently. Having identified the single best exemplar of each emotion (hot anger,

happiness, boredom, and contempt), I used Praat to obtain pitch and intensity contours

and measurements of mean F0, range F0, mean intensity, dynamic range, sentence

duration, LTAS, and jitter (absolute) for each.
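
The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recording names, listener labels, and ratings below are hypothetical, not my actual data, and the function name `pick_best_exemplars` is made up for the example. It drops any recording the listeners labeled differently, averages the ratings of the rest, and keeps the highest-rated recording for each emotion.

```python
from collections import defaultdict
from statistics import mean


def pick_best_exemplars(judgments):
    """Return {emotion: (recording, mean rating)} for the top-rated recordings.

    judgments: list of dicts with keys 'recording', 'listener',
    'label' (the emotion the listener identified) and 'rating' (1 to 5).
    """
    by_recording = defaultdict(list)
    for j in judgments:
        by_recording[j["recording"]].append(j)

    best = {}
    for recording, js in by_recording.items():
        labels = {j["label"] for j in js}
        if len(labels) != 1:
            continue  # listeners disagreed on the emotion: discard
        label = labels.pop()
        score = mean(j["rating"] for j in js)
        # Keep the recording with the highest mean rating (first one wins ties).
        if label not in best or score > best[label][1]:
            best[label] = (recording, score)
    return best


# Hypothetical judgments from two listeners, A and B.
judgments = [
    {"recording": "anger_1", "listener": "A", "label": "hot anger", "rating": 3},
    {"recording": "anger_1", "listener": "B", "label": "contempt", "rating": 4},
    {"recording": "anger_2", "listener": "A", "label": "hot anger", "rating": 5},
    {"recording": "anger_2", "listener": "B", "label": "hot anger", "rating": 4},
    {"recording": "bored_1", "listener": "A", "label": "boredom", "rating": 4},
    {"recording": "bored_1", "listener": "B", "label": "boredom", "rating": 3},
]
best = pick_best_exemplars(judgments)
print(best)
```

In this made-up data, `anger_1` is discarded because the two listeners identified it differently, which mirrors the one exemplar I threw out.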

Results

Figure 2 shows the pitch and intensity contours for each utterance, overlaid in their

respective pairs so that co-variance can be examined. Pitch is drawn in red.

[Figure 2: Pitch and Intensity Contours (Pitch in red). Panels: Hot Anger, Happiness, Boredom, Contempt]

To start with hot anger, it is immediately evident that this exemplar has the highest F0

variability, as I predicted. It also, incidentally, has the highest mean F0; this seems to

reflect the high physiological excitement associated with hot anger, as confirmed by

Banse and Scherer, who found that mean F0 is highest for what they call the most

“intense” emotions. Boredom carries the flattest affect, as predicted, and the little change

there is in the pitch contour does reveal a slight downward slope. Banse and Scherer

found that boredom was associated with lower mean F0, which I did not anticipate, and

which was not evident in my measurements—in fact, I found that my contempt exemplar

had a slightly lower mean F0 than did boredom (see Figure 3, below). Incidentally,

boredom also showed what appears to be a delay in voice-onset—the pitch contour is

unavailable until about 1.2 sec into the utterance, which might reflect the aperiodic,

breathy sigh that can be heard at the beginning of this exemplar.

Pitch and intensity appear to co-vary for the emotions hot anger, happiness, and

contempt especially, with contempt showing much greater peaks for intensity than for

pitch; this seems intuitive considering the way in which I associate contempt with a

“spitting” of words.
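
My judgment of co-variation here is visual. One way to put a number on it, which I did not do in this project, would be a Pearson correlation between the two contours. The sketch below assumes the pitch and intensity values have been sampled at the same time points and that unvoiced frames (where no pitch is available) are marked `None`; the function name and the sample values are hypothetical.

```python
import math


def contour_correlation(pitch_hz, intensity_db):
    """Pearson r between pitch and intensity, using voiced frames only."""
    # Skip frames where the pitch tracker found no F0 (unvoiced or silent).
    pairs = [(p, i) for p, i in zip(pitch_hz, intensity_db) if p is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two voiced frames")

    n = len(pairs)
    mean_p = sum(p for p, _ in pairs) / n
    mean_i = sum(i for _, i in pairs) / n
    cov = sum((p - mean_p) * (i - mean_i) for p, i in pairs)
    var_p = sum((p - mean_p) ** 2 for p, _ in pairs)
    var_i = sum((i - mean_i) ** 2 for _, i in pairs)
    if var_p == 0 or var_i == 0:
        raise ValueError("a perfectly flat contour has no defined correlation")
    return cov / math.sqrt(var_p * var_i)


# Hypothetical contours: the third frame is unvoiced and is ignored.
pitch = [200.0, 210.0, None, 220.0, 230.0]
intensity = [60.0, 62.0, 55.0, 64.0, 66.0]
r = contour_correlation(pitch, intensity)
print(round(r, 3))
```

A value near 1 would indicate that pitch and intensity rise and fall together, and a value near 0 would indicate that they move independently.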

Figure 3 shows results for measurements of other acoustic parameters: mean F0,

range F0, intensity, dynamic range, sentence duration, and jitter.

             Mean F0   Range F0   Intensity   Dynamic      Sentence         Jitter
             (Hz)      (Hz)       (dB)        Range (dB)   Duration (sec)   Absolute
Hot Anger    222.47     78.25     64.07       45.89        2.05             2.65%
Happiness    204.403   127.42     62.74       52.13        1.76             1.69%
Boredom      152.53     81.53     81.53       57.05        2.04             1.52%
Contempt     148.46     93.56     57.24       48.76        2.01             2.46%
Figure 3: Further Results

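
For readers unfamiliar with these measures, the sketch below shows one plausible way to compute four of them from a pitch contour and an intensity contour. It rests on assumptions: range is taken as maximum minus minimum, mean intensity is a plain average of the dB values, and unvoiced frames are marked `None`. Praat offers other options (for example, energy-based averaging of intensity), so this is not necessarily how the values in Figure 3 were produced. The function name and sample values are hypothetical.

```python
from statistics import mean


def summarize(pitch_hz, intensity_db):
    """Summary measures from a pitch contour (Hz) and an intensity contour (dB)."""
    voiced = [f for f in pitch_hz if f is not None]  # drop unvoiced frames
    return {
        "mean_f0": mean(voiced),
        "range_f0": max(voiced) - min(voiced),
        "mean_intensity": mean(intensity_db),  # simple average of dB values
        "dynamic_range": max(intensity_db) - min(intensity_db),
    }


# Hypothetical contours for one short utterance.
summary = summarize([200.0, None, 220.0, 240.0], [50.0, 60.0, 70.0, 60.0])
print(summary)
```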
Whereas I had predicted that boredom and hot anger would have the smallest and

the largest dynamic ranges, respectively, Figure 3 shows the reverse: boredom had the

largest dynamic range (57.05 dB) and hot anger the smallest (45.89 dB). I am curious as to whether the

measurement of dynamic range is skewed due to non-uniform microphone placement

during recording or simply an idiosyncratic performance of hot anger on my part. One

finding for hot anger that recalls Banse and Scherer is its having the highest mean F0 of

the four exemplars. Hot anger had the slowest speech rate as measured by sentence

duration, which is at odds with my prediction that it would have the fastest speech rate. I

do not know why this is.

Jitter values may be the most informative and expected data to come from this set

of parameters. The highest jitter values were associated with hot anger and contempt,

which perhaps reveals the physiological stress injected into speech during the

circumstances that warrant these emotions. Specifically, pharyngeal constriction like that

theorized by Banse and Scherer for production of disgust speech may affect the mass or

tension of the vocal folds in ways that interfere with normal periodicity.
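
To make the idea of jitter concrete: it measures how much consecutive glottal periods differ from one another. The sketch below uses the standard definitions, in which absolute jitter is the mean absolute difference between consecutive periods (in seconds) and relative, or local, jitter is that value divided by the mean period (usually given as a percentage). It is a simplified illustration with hypothetical period values, not Praat's full procedure, which also screens out implausible periods. The percentages in Figure 3 have the form of the relative measure.

```python
def jitter_absolute(periods):
    """Mean absolute difference between consecutive periods, in seconds."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return sum(diffs) / len(diffs)


def jitter_local(periods):
    """Absolute jitter divided by the mean period (a unitless ratio)."""
    mean_period = sum(periods) / len(periods)
    return jitter_absolute(periods) / mean_period


# Hypothetical glottal periods in seconds (roughly a 200 Hz voice).
periods = [0.0050, 0.0051, 0.0049, 0.0050]
jitter_abs = jitter_absolute(periods)
jitter_pct = 100 * jitter_local(periods)
print(f"{jitter_abs:.6f} s, {jitter_pct:.2f}%")
```

A perfectly periodic voice would give zero on both measures; irregular vocal fold vibration, as in a creaky or strained voice, raises them.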

Figure 4: Overlaid LTAS Curves for All Four Emotions


Figure 4, above, shows LTAS curves for all four emotions, overlaid for

comparison. It is hard to draw solid conclusions from the LTAS curves themselves—

perhaps one thing that can be said is that contempt appears to be associated with the most

dramatic decrease in energy from the low to the high frequencies, contrary to my

prediction that pharyngeal constriction (again, something I associate with both contempt

and Banse and Scherer’s disgust) would lead to more high-frequency energy.

Conclusions

Replications of Banse and Scherer’s measurements of acoustic parameters had

varying degrees of success. Pitch and intensity contours certainly gave insight into the

production and perception of the four emotions: for hot anger and happiness, pitch and

intensity co-vary closely, whereas boredom and contempt had similarly dramatic

variations in intensity while maintaining relatively flat pitch. It is interesting that pitch

contours for hot anger and contempt are so distinct, given that these were the two

emotions most confused by listeners in the initial selection of exemplars. Perhaps the

confusion reflects the difficulty in defining hot anger and contempt informatively, and

reveals that perceiving them distinctly from a set containing both may be contingent on

another factor, such as context, body language, or linguistic content. My one expansion

on Banse and Scherer, that is, jitter values for each emotion, seemed to reveal the most

about production of the more “intense” emotions in particular.

Limitations

I am aware of the limited pool of exemplars (12—three for each of four emotions)

from which judges chose the best four, in comparison with 1,344 voice samples in the

Banse and Scherer study. Furthermore, because all 12 came from my own voice,

there was little variation within each emotion set, and so judges were forced to rate

near 5 what they may have thought were poor exemplars, in the interest of identifying a

“best” for each emotion. These issues may underlie the scarcity of

identifiable correlations with the findings of Banse and Scherer, and the apparent

uselessness of a few acoustic parameters.
