Assignment 2

Solum 1
Renata Solum
Jayanthi Sasisekaran
SLHS 3305W
March 3, 2009
The Influence of Emotion on Acoustic Parameters of Voice
Introduction
Physiological correlates of human emotion are as evident in speech as they are in
facial expression and body language. This project aims to identify some of the effects of
emotion on measurable, acoustic parameters of voice, attempting to replicate some
findings by Banse and Scherer (1996) for the emotions hot anger, happiness, boredom,
and contempt. This research will also expand upon Banse and Scherer to include
measurement of jitter for each of the four emotions.
Research Questions
How do four emotions (hot anger, happiness, boredom, and
contempt) affect acoustic parameters of voice? How do exemplary
utterances of the four emotions differ in terms of pitch and intensity
contours? Do pitch and intensity contours co-vary across the four
emotions? What do measurements of mean F0, range F0, mean intensity,
dynamic range, sentence duration, LTAS, and jitter absolute reveal about
production of and perceptual distinction between the four emotions?
Predictions
Figure 1 shows my predictions for acoustic measurements of the
four recorded exemplars.

Hot Anger
  F0/Pitch contour: Highest mean F0 (Banse and Scherer), most variable pitch contour
  Dynamics: Highest mean intensity and largest dynamic range
  Sentence Duration: Shortest duration
  LTAS: Perhaps slightly more energy in high frequencies.
  Jitter: Perhaps second lowest value

Happiness
  F0/Pitch contour: High mean F0 but flatter affect, due to association with relaxed state.
  Dynamics: Smaller dynamic range, medium intensity
  Sentence Duration: Second longest
  LTAS: Unsure.
  Jitter: Bright, relaxed, but clear vocal quality; low jitter

Boredom
  F0/Pitch contour: Flattest affect; predict smallest F0 range and downward slope.
  Dynamics: Smallest dynamic range, lowest intensity
  Sentence Duration: Longest
  LTAS: Perhaps slightly more energy in low frequencies
  Jitter: Perhaps second highest value.

Contempt
  F0/Pitch contour: Lowest mean F0 (Banse and Scherer); range smaller than hot anger.
  Dynamics: Mean intensity similar to hot anger, and perhaps similar dynamic range, as well.
  Sentence Duration: Second shortest
  LTAS: In comparison to other three emotions, more energy centered in high frequencies. (I predict contempt similar to Banse and Scherer’s predictions of disgust: pharyngeal constriction)
  Jitter: Highest value; creaky, growly vocal quality towards the bottom of my modal range
Figure 1: Predictions
Methods
After Banse and Scherer (1996), this experiment made use of a standardized
utterance to mitigate the effects of varying phonological content on acoustic parameters,
and thus attempt to isolate the effects of emotion. Their phrase, “Hat sundig pron you
venzy,” resembles normal speech. In the interest of further standardizing the four
emotional exemplars, I placed stress uniformly on the words pron and venzy during
recording. All recordings were of my own voice. I recorded three potential exemplars of
each emotion in the Praat software using a sampling rate of 44,100 Hz. I then randomized
the list of 12 recordings and played them to two listeners, who identified each as one of
the four emotions and rated it on a 1-5 scale in terms of how well it exemplified the
identified emotion. I averaged the ratings and chose the highest-rated exemplar for each
of the four emotions, throwing out one exemplar that the two listeners identified
differently. Having identified the single best exemplar of each emotion (hot anger,
happiness, boredom, and contempt), I used Praat to obtain pitch and intensity contours
and measurements of mean F0, range F0, mean intensity, dynamic range, sentence
duration, LTAS, and jitter (absolute) for each.
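
The selection procedure described above (shuffle the recordings, average the two listeners' ratings, discard any recording the listeners labeled differently, keep the top-rated recording per emotion) can be sketched as follows. This is only an illustration: the recording ids, the ratings, and the function names are hypothetical placeholders, not the data or tools used in this project.

```python
import random

# Hypothetical listener responses (placeholders, not the study's data):
# recording id -> one (identified emotion, rating on the 1-5 scale) per listener.
responses = {
    "rec01": [("hot anger", 4), ("hot anger", 5)],
    "rec02": [("hot anger", 3), ("contempt", 4)],  # listeners disagree
    "rec03": [("happiness", 5), ("happiness", 4)],
    "rec04": [("happiness", 3), ("happiness", 3)],
}


def presentation_order(recording_ids, seed=None):
    """Return the recordings in a shuffled order for playback."""
    order = list(recording_ids)
    random.Random(seed).shuffle(order)
    return order


def best_exemplars(responses):
    """Pick the highest mean-rated recording per emotion.

    Recordings that the listeners identified as different emotions
    are thrown out before the comparison.
    """
    best = {}
    for rec_id, judgments in responses.items():
        labels = {label for label, _ in judgments}
        if len(labels) != 1:
            continue  # listeners identified this recording differently
        emotion = labels.pop()
        mean_rating = sum(rating for _, rating in judgments) / len(judgments)
        if emotion not in best or mean_rating > best[emotion][1]:
            best[emotion] = (rec_id, mean_rating)
    return best


if __name__ == "__main__":
    print(presentation_order(responses, seed=1))
    print(best_exemplars(responses))
```

With the placeholder data, rec02 is excluded because the two listeners disagree, and rec01 and rec03 are kept as the best hot anger and happiness exemplars.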
Results
Pitch and intensity contours are shown below in Figure 2, with pitch and intensity
contours for each utterance overlaid in their respective pairs for examination of co-
variance. Pitch is highlighted in red.
[Panels: Hot Anger, Happiness, Boredom, Contempt]
Figure 2: Pitch and Intensity Contours (Pitch in red)
To start with hot anger, it is immediately evident that this exemplar has the highest F0
variability, as I predicted. It also has the highest mean F0, which seems to
reflect the high physiological excitement associated with hot anger, as confirmed by
Banse and Scherer, who found that mean F0 is highest for what they call the most
“intense” emotions. Boredom carries the flattest affect, as predicted, and the little change
there is in the pitch contour does reveal a slight downward slope. Banse and Scherer
found that boredom was associated with lower mean F0, which I did not anticipate, and
which was not evident in my measurements—in fact, I found that my contempt exemplar
had a slightly lower mean F0 than did boredom (see Figure 3, below). Incidentally,
boredom also showed what appears to be a delay in voice-onset—the pitch contour is
unavailable until about 1.2 sec into the utterance, which might reflect the aperiodic,
breathy sigh that can be heard at the beginning of this exemplar.
Pitch and intensity appear to co-vary for the emotions hot anger, happiness, and
contempt especially, with contempt showing much greater peaks for intensity than for
pitch; this seems intuitive considering the way in which I associate contempt with a
“spitting” of words.
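
The co-variation described above was judged by eye from the overlaid contours. One possible way to put a number on it, offered here only as a sketch, is a Pearson correlation between the pitch and intensity values sampled at the same time points. The contour values below are invented for illustration, and the assumption that unvoiced frames are marked with None is mine rather than a description of any Praat export format.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant contour has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def contour_covariation(pitch_hz, intensity_db):
    """Correlate pitch and intensity over voiced frames only.

    Frames where pitch is None (assumed here to mark unvoiced or
    undefined pitch) are dropped from both contours.
    """
    voiced = [(p, i) for p, i in zip(pitch_hz, intensity_db) if p is not None]
    pitch = [p for p, _ in voiced]
    intensity = [i for _, i in voiced]
    return pearson_r(pitch, intensity)


if __name__ == "__main__":
    # Invented example contours, one value per analysis frame.
    pitch_hz = [None, 200.0, 210.0, 220.0, 205.0]
    intensity_db = [50.0, 60.0, 62.0, 64.0, 61.0]
    print(round(contour_covariation(pitch_hz, intensity_db), 3))
```

A value near 1 would indicate that pitch and intensity rise and fall together; a value near 0 would indicate little linear relationship between the two contours.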
Figure 3 shows results for measurements of other acoustic parameters: mean F0,
range F0, intensity, dynamic range, sentence duration, and jitter.
            Mean F0   Range F0   Intensity   Dynamic      Sentence         Jitter
            (Hz)      (Hz)       (dB)        Range (dB)   Duration (sec)   Absolute
Hot Anger   222.47    78.25      64.07       45.89        2.05             2.65%
Happiness   204.403   127.42     62.74       52.13        1.76             1.69%
Boredom     152.53    81.53      81.53       57.05        2.04             1.52%
Contempt    148.46    93.56      57.24       48.76        2.01             2.46%
Figure 3: Further Results
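
To compare the four exemplars parameter by parameter, the Figure 3 values can be ranked with a few lines of code. The numbers below are copied from Figure 3 as printed; the dictionary keys and the function name are my own labels, chosen only for this sketch.

```python
# Values copied from Figure 3; key names are illustrative labels only.
FIGURE_3 = {
    "Hot Anger": {"mean_f0_hz": 222.47, "range_f0_hz": 78.25,
                  "intensity_db": 64.07, "dynamic_range_db": 45.89,
                  "duration_s": 2.05, "jitter_pct": 2.65},
    "Happiness": {"mean_f0_hz": 204.403, "range_f0_hz": 127.42,
                  "intensity_db": 62.74, "dynamic_range_db": 52.13,
                  "duration_s": 1.76, "jitter_pct": 1.69},
    "Boredom": {"mean_f0_hz": 152.53, "range_f0_hz": 81.53,
                "intensity_db": 81.53, "dynamic_range_db": 57.05,
                "duration_s": 2.04, "jitter_pct": 1.52},
    "Contempt": {"mean_f0_hz": 148.46, "range_f0_hz": 93.56,
                 "intensity_db": 57.24, "dynamic_range_db": 48.76,
                 "duration_s": 2.01, "jitter_pct": 2.46},
}


def rank_emotions(parameter):
    """Return the emotions ordered from highest to lowest value of a parameter."""
    return sorted(FIGURE_3, key=lambda emotion: FIGURE_3[emotion][parameter],
                  reverse=True)


if __name__ == "__main__":
    for parameter in ("mean_f0_hz", "dynamic_range_db", "duration_s", "jitter_pct"):
        print(parameter, rank_emotions(parameter))
```

For example, rank_emotions("mean_f0_hz") returns Hot Anger, Happiness, Boredom, Contempt, and rank_emotions("jitter_pct") returns Hot Anger, Contempt, Happiness, Boredom.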
Whereas I had predicted that boredom and hot anger would have the smallest and
the largest dynamic ranges, respectively, the measurements in Figure 3 show the
reverse: boredom had the largest dynamic range (57.05 dB) and hot anger the
smallest (45.89 dB). I am curious as to whether the
measurement of dynamic range is skewed due to non-uniform microphone placement
during recording or simply an idiosyncratic performance of hot anger on my part. One
finding for hot anger that recalls Banse and Scherer is its having the highest mean F0 of
the four exemplars. Hot anger had the slowest speech rate as measured by sentence
duration, which is at odds with my prediction that it would have the fastest speech rate. I
do not know why this is.
Jitter values may be the most informative and expected data to come from this set
of parameters. The highest jitter values were associated with hot anger and contempt,
which perhaps reveals the physiological stress injected into speech during the
circumstances that warrant these emotions. Specifically, pharyngeal constriction like that
theorized by Banse and Scherer for production of disgust speech may affect the mass or
tension of the vocal folds in ways that interfere with normal periodicity.
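
For readers unfamiliar with the measure, jitter quantifies cycle-to-cycle irregularity in the glottal period. As commonly defined, absolute jitter is the mean absolute difference between consecutive periods (in seconds), and relative (local) jitter divides that by the mean period and is expressed as a percentage. The sketch below assumes a list of period durations has already been extracted; the example periods are invented, and the function names are mine rather than Praat commands.

```python
def jitter_absolute(periods):
    """Mean absolute difference between consecutive periods (same unit as input)."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return sum(diffs) / len(diffs)


def jitter_local_percent(periods):
    """Absolute jitter divided by the mean period, as a percentage."""
    mean_period = sum(periods) / len(periods)
    return 100.0 * jitter_absolute(periods) / mean_period


if __name__ == "__main__":
    # Invented glottal periods in seconds (roughly a 200 Hz voice).
    periods = [0.0050, 0.0052, 0.0049, 0.0051]
    print(jitter_absolute(periods))       # in seconds
    print(jitter_local_percent(periods))  # in percent
```

A perfectly periodic voice would give zero on both measures; greater irregularity in vocal fold vibration raises both.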
Figure 4: Overlaid LTAS Curves for All Four Emotions

Figure 4, above, shows LTAS curves for all four emotions, overlaid for
comparison. It is hard to draw solid conclusions from the LTAS curves themselves—
perhaps one thing that can be said is that contempt appears to be associated with the most
dramatic decrease in energy from the low to the high frequencies, contrary to my
prediction that pharyngeal constriction (again, something I associate with both contempt
and Banse and Scherer’s disgust) would lead to more high-frequency energy.
Conclusions
Replications of Banse and Scherer’s measurements of acoustic parameters had
varying degrees of success. Pitch and intensity contours certainly gave insight into the
production and perception of the four emotions: for hot anger and happiness, pitch and
intensity co-vary closely, whereas boredom and contempt had similarly dramatic
variations in intensity while maintaining relatively flat pitch. It is interesting that pitch
contours for hot anger and contempt are so distinct, given that these were the two
emotions most confused by listeners in the initial selection of exemplars. Perhaps the
confusion reflects the difficulty in defining hot anger and contempt informatively, and
reveals that perceiving them distinctly from a set containing both may be contingent on
another factor, such as context, body language, or linguistic content. My one expansion
on Banse and Scherer, that is, jitter values for each emotion, seemed to reveal the most
about production of the more “intense” emotions in particular.
Limitations
I am aware of the limited pool of exemplars (12—three for each of four emotions)
from which judges chose the best four, in comparison with 1,344 voice samples in the
Banse and Scherer study. Furthermore, because all 12 were recordings of my own voice,
there was less variation within each emotion set, and so judges may have been forced to
rate near 5 what they thought were poor exemplars, in the interest of identifying a
“best” for each emotion. These issues may underlie the scarcity of identifiable
correlations with the findings of Banse and Scherer, and the apparent lack of
usefulness of a few acoustic parameters.
