Académique Documents
Professionnel Documents
Culture Documents
1739
I. INTRODUCTION
Identification of emotions in speech depends on the
chosen set of features extracted from the speech signal.
These features are systematically divided into segmental and
supra-segmental ones [1]. Short-term segmental features
derived from speech frames with short duration are usually
in relation with the speech spectrum. These include
traditional features like linear predictive coefficients, line
spectral frequencies, mel-frequency cepstral coefficients
(MFCC), linear prediction cepstral coefficients [2], or
unconventional ones like perceptual linear predictive
coefficients, log frequency power coefficients, etc. [3].
Supra-segmental features comprise statistical values of
parameters describing prosody by duration, fundamental
83
( )
S e j =
1
w x e j k
n k =1 k k
n
2
1
w
n k =1 k
(1)
N0
S e j = c0 + 2 cn cos ( n ),
(4)
( )
fs
f
n , Bn = s ln zn ,
2
Fn =
(2)
n =1
Input sentence
Selected ROI
Segmentation
Windowing
Calculation of mean
Welchs
periodogram
( )
S e j
( )
S e j =
1+
NA
{F, T }
(Voiced sound)
PW [dB]
Smoothed envelope
Smoothed
by cepstral
envelope by LPC
limitation
(high order)
{F, T }
PC [dB]
{F, T }
PLPC [dB]
(3)
Formant positions and bandwidths determination
ak exp jn
n =1
{PSDdB }
{ak }, G
Calculation from
roots of LPC
polynomial
Formant trends
calculation
{B31,2,3}
{F1,2,3}
{k1,2,3}
{1,2,3}
{F1,2,3}
{B31,2,3}
84
y y1 = k ( x x1 ) , k =
y2 y1
, k = tg( ) ,
x2 x1
(5)
-5
Angle
F1
Rel
[dB]
-10
Angle
13
= -53
= -42
F =506, B =181
1
-15
'12
-20
F 2 =1859, B 3=132
F =2748, B =312
F2
-25
'23
F3
-30
-35
-40
23
'13
0
500
1000
1500
2000
F [Hz]
2500
3000
3500
4000
Fig. 3. Basic statistical parameters of the first three format frequencies for
male and female voices: speech in neutral (a), sad (b), joyful (c), and
angry emotional styles (d).
85
Neutral
Joy
Sadness
Anger
PRel [dB]
0
-20
-40
-60
Neutral
Joy
Sadness
Anger
0
PRel [dB]
given by formant tilts from the male and the female voices
are shown in Fig. 7. Comparison between obtained mutual
formant mean frequencies F1 / F2, F1 / F3, and F2 / F3 for
different emotional states of male and female voices is
presented in Fig. 8. Diagrams of detailed analysis of formant
mutual frequency positions for different emotional states of
voiced sounds corresponding to vowels a, e, o, and
consonant n selected from the speech material of male and
female voices are shown in Fig. 9 and Fig. 10.
-20
-40
-60
1000
2000
F [Hz]
3000
1000
2000
F [Hz]
3000
4000
a)
b)
Fig. 7. Summary diagrams of bisectors with directions given by formant
tilts of male (a); and female (b) voices.
3000
F [Hz]
2500
2000
Neutral
Joy
Sadness
Anger
1500
1000
500
0
500
B3 [Hz]
400
F1 male
F2 male
F3 male
F1 female
F2 female
F3 female
Neutral
Joy
Sadness
Anger
300
200
100
0
B3_1 male
B3_2 male
Fig. 4. Common bar graphs of mean values of the first three formant
frequencies (upper), and their 3 -dB bandwidths (lower) for different
emotional states of male and female voices.
Fig. 8. Summary diagrams of mean F1 /F2 (a), F1 /F3 (b), F2 /F3 (c) mutual
frequency positions for different emotional states of male voices (blue
dashed lines and marks) and female ones (red dotted lines and marks).
86
Sound
Male voice
Female voice
12
13
23*)
12
13
23*)
-42
-27
-49
-31
-22
-36
-45
-51
-34
-32
-36
-28
-43
-59
33
-46
-52
-51
-56
-48
-44
-57
-34
-53
-64
-44
-48
-59
-37
-44
-65
-3
-52
-70
-10
-48
-70
-2
-47
-72
-12
Sound
NF [-]
F1 / B31 [Hz]
F2 / B32 [Hz]
F3 / B33 [Hz]
758
631 / 261
1363 / 193
2529 / 366
720
451 / 140
1590 / 240
2490 / 402
684
297 / 94
1879 / 412
2615 / 276
608
514 / 80
1178 / 320
2594 / 486
735
393 / 191
1113 / 173
2531 / 327
784
209 / 143
1186 / 309
2414 / 126
934
271 / 99
1191 / 427
2557 / 345
Sound
NF [-]
F1 / B31 [Hz]
F2 / B32 [Hz]
F3 / B33 [Hz]
782
806 / 321
1545 / 220
2725 / 292
V. DISCUSSION AND
802
621 / 152
1754 / 402
2744 / 473
576
311 / 80
1922 / 397
2814 / 361
618
451 / 263
1671 / 466
2606 / 356
684
438 / 120
1428 / 167
2943 / 292
696
314 / 156
1257 / 433
2613 / 370
945
357 / 130
1272 / 373
2690 / 313
87
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
VI. CONCLUSIONS
[18]
88