International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org  Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 1, Issue 2, July-August 2012  ISSN 2278-6856

Effect of CMN and Blind Equalization on Additive and Multiplicative Noises for Mel-LPC based Noisy Speech Recognition Using HMM

Md. Firoz Ahmed (1), Md. Babul Islam (2), Md. Rasadul Islam (3), Md. Ashraful Islam (4), Mustari Zaman (5)

(1), (4), (5) Department of Information & Communication Engineering, Rajshahi University, Rajshahi-6205, Bangladesh
(2) Department of Applied Physics and Electronic Engineering, Rajshahi University, Rajshahi-6205, Bangladesh
(3) Department of Applied Physics, Electronic and Communication Engineering, Pabna Science and Technology University, Bangladesh

Abstract: This study deals with a noise-robust distributed speech recognizer for real-world applications built around feature parameter compensation techniques. To realize this objective, Mel-LP based speech analysis has been used, in which the all-pole model is estimated on the linear frequency scale by applying a first-order all-pass filter instead of a unit delay. To minimize the mismatch between training and test phases, Cepstral Mean Normalization (CMN) and Blind Equalization (BEQ) have been applied to the Mel-LP cepstral coefficients in an effort to reduce the effects of additive and multiplicative noises. The performance of the proposed system has been evaluated on the Aurora-2 database, which is a subset of the TIDigits database contaminated by additive and multiplicative noises. The recognition performance of the developed system was evaluated on test sets A and C; it should be noted that the system was trained on the clean condition only. The average word accuracies for test sets A and C are found to be 59.05% and 65.74% without applying the feature parameter compensation techniques, i.e., CMN and BEQ. By employing CMN on the Mel-LP cepstrum, the overall recognition rate improves from 59.05% to 68.02% for set A and from 65.74% to 71.64% for set C. With blind equalization (BEQ), by contrast, the word accuracies are found to be 60.81% and 65.29% for sets A and C, respectively.
Keywords: Aurora-2 database, CMN, BEQ, bilinear transformation, Mel-LPC.
1. INTRODUCTION
A speech recognition system takes a continuous speech waveform as input and produces a transcription of the words being uttered. Speech recognition systems have evolved from laboratory demonstrations to a wide variety of real-life applications, for instance in telecommunications, wireless multimedia communications, question and answer (QA) systems, and robotics. Distributed Speech Recognition (DSR) systems are being developed for portable terminals. These applications require Automatic Speech Recognizers (ASRs) that can maintain acceptable performance in a wide variety of environmental conditions. The performance of ASRs has reached a satisfactory level under controlled and matched training and recognition conditions. However, their performance degrades severely when there is a mismatch between the training and test phases, caused by additive noise and channel effects. Environmental noises as well as channel effects contaminate the speech signal and change the vectors representing the speech, for instance by reducing the dynamic range or the variance of feature parameters within the frame [1], [2]. Consequently, a serious mismatch occurs between training and recognition conditions, resulting in degraded recognition accuracy. Noise robustness can be achieved in many ways, such as enhancement of the input signal either in the time domain [3] or in the frequency domain [4], [34], [5], [6], enhancement in the cepstral domain [7], [8], [9], that is, feature parameter compensation, and acoustic model compensation or adaptation [10], [11], [12]. In HMM-based recognizers, model adaptation approaches have been shown to be very effective at removing the mismatch between training and test environments. However, for a distributed speech recognition system, speech enhancement and parameter compensation approaches are more suitable than model adaptation. Because the acoustic models reside at a server, adaptation or compensation of the models from the
front-end is not feasible. Therefore, this paper deals with the design of a front-end with parameter compensation, such as CMN and BEQ. Since the human ear resolves frequencies non-linearly across the speech spectrum, designing a front-end that incorporates auditory-like frequency resolution improves recognition accuracy [13], [14], [15]. In non-parametric spectral analysis, the Mel-Frequency Cepstral Coefficient (MFCC) [13] is one of the most popular spectral features in ASR; this parameter takes account of the nonlinear frequency resolution of the human ear.
In this study, Mel-LP analysis along with CMN and BEQ has been used to reduce the mismatch between training and test sessions in designing a noise-robust DSR front-end.

Figure 1: The frequency mapping function by bilinear transformation.

Figure 2: Generalized autocorrelation function
In parametric spectral analysis, Linear Prediction Coding (LPC) analysis [16], [17] based on the all-pole model is widely used because of its computational simplicity and efficiency. While the all-pole model enhances the formant peaks, in line with auditory perception, other perceptually relevant characteristics are not incorporated in the model, unlike MFCC. To alleviate this inconsistency between LPC and auditory analysis, several auditory spectra have been simulated before the all-pole modeling [14], [18], [19], [20].
In contrast to these spectral modifications, Strube (1980) [23] proposed all-pole modeling of a frequency-warped signal, mapped onto a warped frequency scale by means of the bilinear transformation, and investigated several computational procedures. However, the methods proposed by Oppenheim and Johnson (1972) [21] for estimating warped all-pole models have rarely been used in automatic speech recognition. More recently, a simple and efficient time-domain technique to estimate the all-pole model, referred to as Mel-LPC analysis, was proposed as an LP-based front-end by Matsumoto et al. (1998) [24]. In this method, the all-pole model is estimated directly from the input signal without explicitly applying the bilinear transformation. Hence, the prediction coefficients can be estimated without any approximation by minimizing the prediction error power, at only about twice the computational cost of the standard LPC analysis.
2. MEL-LP ANALYSIS
The frequency-warped signal $\tilde{x}[n]$ $(n = 0, 1, \ldots)$ obtained by the bilinear transformation [21] of a finite-length windowed signal $x[n]$ $(n = 0, 1, \ldots, N-1)$ is defined by:

$$\tilde{X}(\tilde{z}) = \sum_{n=0}^{\infty} \tilde{x}[n]\,\tilde{z}^{-n} = X(z) = \sum_{n=0}^{N-1} x[n]\,z^{-n} \qquad (1)$$

where $\tilde{z}^{-1}$ is the first-order all-pass filter:

$$\tilde{z}^{-1} = \frac{z^{-1} - \alpha}{1 - \alpha z^{-1}} \qquad (2)$$

and $0 < \alpha < 1$ is treated as the frequency warping factor. The phase response of $\tilde{z}^{-1}$ is given by:

$$\tilde{\omega} = \omega + 2\tan^{-1}\!\left(\frac{\alpha\sin\omega}{1 - \alpha\cos\omega}\right) \qquad (3)$$

This phase function determines a frequency mapping. As shown in Fig. 1, $\alpha = 0.35$ and $\alpha = 0.40$ approximate the mel scale and the Bark scale [22], [32], respectively, at a sampling frequency of 8 kHz.
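To make the mapping concrete, the short Python sketch below (an illustrative aid, not part of the original system; the helper name warped_frequency and the mel-scale comparison are our own) evaluates Eq. 3 and compares the $\alpha = 0.35$ warping at 8 kHz against the mel scale, both normalized to [0, 1]:

```python
import numpy as np

def warped_frequency(omega, alpha=0.35):
    # Phase response of the first-order all-pass filter (Eq. 3).
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))

fs = 8000.0                                   # sampling frequency [Hz]
f = np.linspace(0.0, fs / 2.0, 9)             # linear frequencies up to Nyquist
omega = 2.0 * np.pi * f / fs                  # normalized angular frequencies
wt = warped_frequency(omega)                  # warped angular frequencies (Eq. 3)
mel = 2595.0 * np.log10(1.0 + f / 700.0)      # reference mel scale

for fi, a, b in zip(f, wt / wt[-1], mel / mel[-1]):
    print(f"{fi:6.0f} Hz  all-pass: {a:.3f}  mel: {b:.3f}")
```

The two normalized curves stay close across the band, which is the basis for choosing $\alpha = 0.35$ at 8 kHz sampling.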
In Mel-LP analysis, the spectral envelope of $\tilde{X}(\tilde{z})\,\tilde{W}(\tilde{z})$ is approximated by the following all-pole model on the linear frequency domain:

$$\tilde{H}(\tilde{z}) = \frac{\tilde{\sigma}_e}{1 + \sum_{k=1}^{p} \tilde{a}_k\,\tilde{z}^{-k}} \qquad (4)$$

where $\tilde{a}_k$ is the $k$-th mel-prediction coefficient and $\tilde{\sigma}_e^2$ is the residual energy (Strube, 1980).


Figure 3: Mel-LP analysis on the linear frequency scale.

The model $\tilde{H}(\tilde{z})$ is estimated on the basis of minimization of the mean square error (MMSE), as shown in Fig. 3. Since $\tilde{x}[n]$ is an infinite sequence, the prediction error signal is also an infinite sequence. Thus, the total error energy $\tilde{E}$ over the infinite sequence is given by:

$$\tilde{E} = \sum_{n=0}^{\infty}\left(\sum_{k=0}^{p} \tilde{a}_k\,x_k[n]\right)^{2} \qquad (5)$$
where $x_k[n]$ is the output signal of the $k$-th order all-pass filter $\tilde{z}(z)^{-k}$ excited by $x_0[n] = x[n]$, with $\tilde{a}_0 = 1$. As a result of minimizing $\tilde{E}$, the mel-prediction coefficients $\{\tilde{a}_k\}$ are obtained by solving the following normal equations:

$$\sum_{k=1}^{p} \tilde{a}_k\,\tilde{\phi}(m,k) = -\tilde{\phi}(m,0), \qquad m = 1, \ldots, p \qquad (6)$$

where

$$\tilde{\phi}(m,k) = \sum_{n=0}^{\infty} x_m[n]\,x_k[n] \qquad (7)$$
In the warped frequency domain, Eq. 7 can be rewritten as:

$$\tilde{\phi}(m,k) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left|\tilde{X}(e^{j\tilde{\omega}})\,\tilde{W}(e^{j\tilde{\omega}})\right|^{2} e^{\,j(m-k)\tilde{\omega}}\,d\tilde{\omega} \qquad (8)$$

where the frequency weighting function $\tilde{W}(e^{j\tilde{\omega}})$ is defined by:

$$\tilde{W}(\tilde{z}) = \frac{\sqrt{1-\alpha^{2}}}{1 + \alpha\tilde{z}^{-1}} \qquad (9)$$

which is derived from

$$\frac{d\omega}{d\tilde{\omega}} = \left|\tilde{W}(e^{j\tilde{\omega}})\right|^{2} \qquad (10)$$
Eq. 8 indicates that $\tilde{\phi}(m,k)$ reduces to the autocorrelation function of the signal whose Fourier transform equals the frequency-warped and frequency-weighted spectrum $\tilde{X}(e^{j\tilde{\omega}})\,\tilde{W}(e^{j\tilde{\omega}})$. This autocorrelation function is called the generalized autocorrelation function; Fig. 2 illustrates its calculation procedure. It should also be noted from Eq. 8 that $\tilde{\phi}(m,k)$ is a function of the difference $(k-m)$ only. Thus, $\tilde{\phi}(m,k)$ can be calculated as a sum of finitely many terms without any approximation:

$$\tilde{\phi}(m,k) = \tilde{r}[k-m] = \sum_{n=0}^{N-1} x_{k-m}[n]\,x[n] \qquad (11)$$
Therefore, to solve for $\tilde{a}_k$ and $\tilde{\sigma}_e$, the generalized autocorrelation coefficients of the input signal $x[n]$ are required instead of the autocorrelation coefficients used in traditional LP analysis [25], [26]. Since the mel-prediction coefficients $\{\tilde{a}_k\}$ are obtained from the generalized autocorrelation function of the input signal $x[n]$, the proposed system enhances the speech signal in the generalized autocorrelation domain.
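As an illustration of Eqs. 5-11, the following Python sketch computes the generalized autocorrelation by repeatedly passing the windowed frame through the first-order all-pass filter of Eq. 2 and then solves the Toeplitz normal equations of Eq. 6. It is a minimal reference implementation under our own naming, not the authors' code; SciPy's solve_toeplitz is used here purely for convenience:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def allpass(x, alpha):
    # One pass of z~^-1 (Eq. 2): y[n] = x[n-1] - alpha*x[n] + alpha*y[n-1]
    y = np.zeros_like(x)
    prev_x = prev_y = 0.0
    for n in range(len(x)):
        y[n] = prev_x - alpha * x[n] + alpha * prev_y
        prev_x, prev_y = x[n], y[n]
    return y

def mel_lp(x, p=12, alpha=0.35):
    """Mel-prediction coefficients from the generalized autocorrelation."""
    r = np.zeros(p + 1)
    xm = x.astype(float)
    r[0] = np.dot(xm, x)                 # r~[0]
    for m in range(1, p + 1):
        xm = allpass(xm, alpha)          # x_m[n]: m passes through the all-pass filter
        r[m] = np.dot(xm, x)             # r~[m] = sum_n x_m[n] x[n]   (Eq. 11)
    # Normal equations (Eq. 6): symmetric Toeplitz system with entries r~[|m-k|]
    a = solve_toeplitz(r[:p], -r[1:p + 1])
    return a, r
```

Because $x[n]$ has finite support $N$, each inner product involves only $N$ samples, so every $\tilde{r}[m]$ is an exact finite sum, as Eq. 11 states.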
Although the estimated model given by Eq. 4 includes the frequency weighting $\tilde{W}(e^{j\tilde{\omega}})$, this can easily be removed by inverse filtering in the generalized autocorrelation domain using $\{\tilde{W}(\tilde{z})\,\tilde{W}(\tilde{z}^{-1})\}^{-1}$, which leads to the mel-autocorrelation function $\bar{r}[m]$:

$$\bar{r}[m] = \beta_0\,\tilde{r}[m] + \beta_1\left\{\tilde{r}[m-1] + \tilde{r}[m+1]\right\} \qquad (12)$$

where

$$\beta_0 = (1+\alpha^{2})(1-\alpha^{2})^{-1} \qquad (13)$$

and

$$\beta_1 = \alpha\,(1-\alpha^{2})^{-1} \qquad (14)$$
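Continuing the sketch above (same hypothetical naming), the weighting is removed in the lag domain by the three-tap inverse filter of Eqs. 12-14:

```python
def mel_autocorrelation(r, alpha=0.35):
    # r: generalized autocorrelation r~[0..p]; returns r-bar[0..p-1] (Eq. 12)
    beta0 = (1.0 + alpha**2) / (1.0 - alpha**2)   # Eq. 13
    beta1 = alpha / (1.0 - alpha**2)              # Eq. 14
    r_bar = np.empty(len(r) - 1)
    for m in range(len(r_bar)):
        # r~[-1] = r~[1] by symmetry of the autocorrelation
        r_bar[m] = beta0 * r[m] + beta1 * (r[abs(m - 1)] + r[m + 1])
    return r_bar
```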
As feature parameters for recognition, the Mel-LP cepstral coefficients can be expressed as:

$$\log \tilde{H}(\tilde{z}) = \sum_{n=0}^{\infty} \tilde{c}_n\,\tilde{z}^{-n} \qquad (15)$$

where $\{\tilde{c}_n\}$ are the mel-cepstral coefficients. The mel-cepstral coefficients can also be calculated directly from the mel-prediction coefficients $\{\tilde{a}_k\}$ (J. Markel and A. Gray, 1976) [27] using the following recursion:

$$\tilde{c}_k = -\tilde{a}_k - \frac{1}{k}\sum_{j=1}^{k-1}(k-j)\,\tilde{a}_j\,\tilde{c}_{k-j} \qquad (16)$$

It should be noted that the number of cepstral coefficients need not be the same as the number of prediction coefficients.
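A direct transcription of the recursion in Eq. 16 might look as follows (illustrative; the function name and the convention of treating $\tilde{a}_k = 0$ beyond the LP order are our own):

```python
def lp_to_cepstrum(a, n_ceps=14):
    # a: mel-prediction coefficients a~[1..p] stored as a[0..p-1];
    # returns c~[1..n_ceps] computed by the recursion of Eq. 16.
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for k in range(1, n_ceps + 1):
        acc = a[k - 1] if k <= p else 0.0
        for j in range(1, k):
            a_j = a[j - 1] if j <= p else 0.0
            acc += (k - j) * a_j * c[k - j] / k
        c[k] = -acc
    return c[1:]
```

Setting n_ceps different from p in this sketch makes the closing remark explicit: the recursion simply continues (or stops early) regardless of the prediction order.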

3. MEL-LP PARAMETER COMPENSATION
3.1 Cepstral Mean Normalization
A robust speech recognition system must adapt to its acoustical environment and channel. To this end, a number of normalization methods have been developed in the cepstral domain. The simplest yet effective cepstral normalization method is the Cepstral Mean Normalization (CMN) technique. In CMN, the mean of the cepstral vectors over an utterance is subtracted from the cepstral coefficients in each frame:

$$c_m[n] = c[n] - \frac{1}{N}\sum_{n'=1}^{N} c[n'] \qquad (17)$$
where $c[n]$ and $c_m[n]$ are the time-varying cepstral vectors of the utterance before and after CMN, respectively, and $N$ is the total number of frames in the utterance. The average of the cepstral vectors over the speech interval represents the channel distortion, estimated without any explicit knowledge of the channel [8]. As the channel distortion is suppressed by CMN, it can be viewed as a parameter filtering operation; consequently, CMN has been treated as a high-pass or band-pass filter [9]. The effectiveness of CMN against the combined effect of additive noise and channel distortion is limited. Acero and Stern (1990) [28] have developed more complex cepstral normalization techniques to compensate for the joint effect of additive noise and channel distortion.
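In code, Eq. 17 amounts to one line per utterance; a minimal sketch, assuming the cepstra are stacked as a (frames, coefficients) array:

```python
def cmn(C):
    # C: (frames, coeffs) cepstral matrix; subtract the utterance mean (Eq. 17)
    return C - C.mean(axis=0, keepdims=True)
```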
3.2 Blind Equalization
Blind equalization is an effective technique for minimizing channel distortion caused by differences in the frequency characteristics of input devices. It uses an adaptive filtering technique to reduce these effects. It can be applied in the spectral domain as well as in the cepstral domain [29], [30], but it is easier to implement in the cepstral domain and requires fewer operations than in the spectral domain. The technique is based on the least mean squares (LMS) algorithm, which minimizes the mean square error computed as the difference between the current and a reference cepstrum. In this study, the same algorithm is used as that implemented in Islam et al. (2007) [3], with the same parameter values.
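The exact settings follow Islam et al. (2007) [3] and are not reproduced here; the sketch below only illustrates the general LMS scheme described above, with a hypothetical step size mu and reference cepstrum c_ref (e.g., a long-term clean-speech mean):

```python
def blind_equalization(C, c_ref, mu=1e-3):
    # LMS update of an additive cepstral bias, frame by frame
    bias = np.zeros(C.shape[1])
    out = np.empty_like(C, dtype=float)
    for t in range(C.shape[0]):
        out[t] = C[t] - bias          # equalized frame
        err = out[t] - c_ref          # error against the reference cepstrum
        bias += mu * err              # minimize the mean square error
    return out
```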
4. EVALUATION ON AURORA-2 DATABASE
4.1 Experimental Setup
The proposed system was evaluated on the Aurora-2 database [31], which is a subset of the TIDigits database contaminated by additive noises and channel effects. This database contains recordings of male and female American adults speaking isolated digits and digit sequences of up to 7 digits. The original 20 kHz data have been down-sampled to 8 kHz with an ideal low-pass filter extracting the spectrum between 0 and 4 kHz; these data are considered the clean data. Noises are artificially added at SNRs ranging from 20 dB to -5 dB in steps of 5 dB.

To reflect the realistic frequency characteristics of terminals and equipment in the telecommunications area, an additional filtering is applied to the database. Two standard frequency characteristics, G.712 and MIRS, are used, which have been defined by the ITU (1996) [33]. Their frequency responses are shown in Fig. 5.

It should be noted that the whole Aurora-2 database was not used in this experiment; rather, a subset was used, as shown in Table 1.


Figure 5: Frequency responses of G.712 and MIRS filters

Table 1: Definition of training data.

Training Model   Filter   Noise                              SNR [dB]
Clean            G.712    -                                  -
Multi            G.712    Subway, car, babble, exhibition    20, 15, 10, 5, 0, -5 and clean
The recognition experiments were conducted with a 12th-order Mel-LP analysis. The speech signal, pre-emphasized with a factor of 0.95, was windowed using a 20 ms Hamming window with a 10 ms frame period. The frequency warping factor was set to 0.35. As the front-end, 14 cepstral coefficients and their delta coefficients, including the 0th terms, were used; thus, each feature vector has 28 components. The reference recognizer was based on the HTK (Hidden Markov Model Toolkit, Version 3.4) software package. The HMMs were trained on the clean condition. The digits are modeled as whole-word HMMs with 16 states per word and a mixture of 3 Gaussians per state, using left-to-right models. In addition, two pause models, "sil" and "sp", are defined. The "sil" model consists of 3 states, as illustrated in Fig. 6; this HMM models the pauses before and after the utterance, with a mixture of 6 Gaussians per state. The second pause model, "sp", is used to model pauses between words. It consists of a single state, which is tied with the middle state of the "sil" model. The recognition accuracy (Acc) is evaluated as follows:
$$\text{Acc} = \frac{N - D - S - I}{N} \times 100\% \qquad (18)$$

where $N$ is the total number of words and $D$, $S$ and $I$ are the numbers of deletion, substitution and insertion errors, respectively.
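For concreteness, Eq. 18 in code, with made-up error counts:

```python
def word_accuracy(n, d, s, i):
    # Eq. 18: word accuracy in percent
    return 100.0 * (n - d - s - i) / n

print(word_accuracy(1000, 40, 55, 12))   # 89.3 (illustrative numbers only)
```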

Figure 6: Possible transitions in the 3-state pause model "sil".
4.2 Recognition Results
The recognition accuracies of the ASR without parameter compensation for test sets A and C are tabulated in Tables 2 and 3, respectively; the average word accuracies are 59.05% and 65.74%. CMN improved the recognition accuracy for both test sets, to 68.02% for set A and 71.64% for set C, as listed in Tables 4 and 5, respectively. In the case of blind equalization (BEQ), a slight improvement is observed for test set A, which reaches 60.81% as shown in Table 6, whereas for test set C no improvement is observed and the accuracy is found to be 65.29%, as shown in Table 7. The effect of CMN and BEQ on each noise category has also been examined for test sets A and C: for car noise, BEQ gives better accuracy than CMN, while for the other noise categories CMN outperforms BEQ on average. The behaviour at low SNR conditions has been examined as well. For test set A at low SNRs, that is, at 5 and 0 dB, BEQ gives better accuracy than CMN in most cases, except for babble noise. In the case of test set C, however, CMN outperforms BEQ at all SNR conditions.

Table 2: Word accuracy (%) of MLPC without parameter compensation for set A

Noise        Clean    20     15     10      5      0     -5   Avg (20-0 dB)
Subway       98.71  96.93  93.43  78.78  49.55  22.81  11.08      68.30
Babble       98.61  89.96  73.76  47.82  21.95   6.80   4.45      48.06
Car          98.54  95.26  83.03  54.25  24.04  12.23   8.77      53.77
Exhibition   98.89  96.39  92.72  76.58  44.65  19.90  11.94      66.05
Average      98.69  94.64  85.74  64.36  35.05  15.44   9.06      59.05

Table 3: Word accuracy (%) of MLPC without parameter compensation for set C

Noise        Clean    20     15     10      5      0     -5   Avg (20-0 dB)
Subway       99.11  94.75  89.41  77.53  46.58  16.92   8.72      65.04
Street       98.73  94.53  89.09  74.79  49.97  23.76  12.12      66.43
Average      98.92  94.64  89.25  76.16  48.28  20.34  10.42      65.74

Table 4: Word accuracy (%) of MLPC with CMN for set A

Noise        Clean    20     15     10      5      0     -5   Avg (20-0 dB)
Subway       99.02  96.41  92.05  78.66  50.23  24.78  16.09      68.43
Babble       98.82  97.37  93.80  82.22  55.32  25.76  13.30      70.89
Car          98.87  96.96  92.72  77.42  42.77  22.55  13.18      66.48
Exhibition   99.07  96.08  91.67  76.70  45.97  20.98  11.60      66.28
Average      98.95  96.71  92.56  78.75  48.57  23.52  13.54      68.02
Table 5: Word accuracy (%) of MLPC with CMN for set C

Noise        Clean    20     15      10      5      0     -5   Avg (20-0 dB)
Subway       97.02  86.77  79.92   73.72  64.63  44.43  21.46      69.89
Street       95.86  89.45  85.01   78.14  67.11  47.19  23.88      73.38
Average      96.44  88.11  82.465  75.93  65.87  45.81  22.67      71.64

Table 6: Word accuracy (%) of MLPC with blind equalization for set A

Noise        Clean    20     15     10      5      0     -5   Avg (20-0 dB)
Subway       96.99  77.83  71.66  64.32  56.89  45.35  27.79      63.21
Babble       96.43  79.75  68.65  59.16  44.20  22.28   2.96      54.81
Car          96.00  87.65  78.32  70.27  59.92  43.72  20.82      67.98
Exhibition   95.96  77.63  70.90  63.13  46.62  27.95  12.19      57.25
Average      96.35  80.72  72.38  64.22  51.91  34.83  15.94      60.81


Table 7: Word accuracy (%) of MLPC with blind equalization for set C

Noise        Clean    20     15     10      5      0     -5   Avg (20-0 dB)
Subway       97.08  80.20  74.85  68.01  58.98  43.72  22.87      65.16
Street       95.86  83.89  77.30  67.99  58.04  39.81  19.68      65.41
Average      96.47  82.05  76.08  68.00  58.51  41.77  21.28      65.29

References
[1] D. C. Bateman, D. K. Bye and M. J. Hunt, "Spectral contrast normalization and other techniques for speech recognition in noise," Proc. ICASSP '92, vol. 1, pp. 241-244, 1992.
[2] S. V. Vaseghi and B. P. Milner, "Noise-adaptive hidden Markov models based on Wiener filters," Proc. Eurospeech '93, vol. 2, pp. 1023-1026, 1993.
[3] M. B. Islam, K. Yamamoto and H. Matsumoto, "Wiener filter for Mel-LPC based speech recognition," IEICE Trans. Inf. Syst., vol. E90-D, no. 6, 2007.
[4] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust. Speech Signal Process., vol. 27, no. 2, pp. 113-120, 1979.
[5] P. Lockwood and J. Boudy, "Experiments with a nonlinear spectral subtractor (NSS), hidden Markov models and the projection, for robust speech recognition in cars," Speech Commun., vol. 11, no. 2-3, pp. 215-228, 1992.
[6] Q. Zhu and A. Alwan, "The effect of additive noise on speech amplitude spectra: a quantitative analysis," IEEE Signal Processing Letters, vol. 9, no. 9, pp. 275-277, 2002.
[7] B. Atal, "Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification," J. Acoust. Soc. Am., vol. 55, no. 6, pp. 1304-1312, 1974.
[8] S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust. Speech Signal Process., vol. ASSP-29, pp. 254-272, 1981.
[9] C. Mokbel, D. Jouvet, J. Monne and R. De Mori, "Compensation of telephone line effects for robust speech recognition," Proc. ICSLP '94, pp. 987-990, 1994.
[10] M. J. F. Gales and S. J. Young, "HMM recognition in noise using parallel model combination," Proc. Eurospeech '93, vol. II, pp. 837-840, 1993.
[11] M. J. F. Gales and S. J. Young, "Cepstral parameter compensation for HMM recognition in noise," Speech Commun., vol. 12, no. 3, pp. 231-239, 1993.
[12] A. P. Varga and R. K. Moore, "Hidden Markov model decomposition of speech and noise," Proc. ICASSP '90, vol. 2, pp. 845-848, 1990.
[13] S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust. Speech Signal Process., vol. ASSP-28, no. 4, pp. 357-366, 1980.
[14] H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Am., vol. 87, no. 4, pp. 1738-1752, 1990.
[15] N. Virag, "Speech enhancement based on masking properties of the auditory system," Proc. ICASSP '95, pp. 796-799, 1995.
[16] F. Itakura and S. Saito, "Analysis synthesis telephony based upon the maximum likelihood method," Proc. 6th International Congress on Acoustics, Tokyo, C-5-5, C17-20, 1968.
[17] B. Atal and M. Schroeder, "Predictive coding of speech signals," Proc. 6th International Congress on Acoustics, Tokyo, pp. 21-28, 1968.
[18] J. Makhoul and L. Cosell, "LPCW: An LPC vocoder with linear predictive warping," Proc. ICASSP '76, pp. 446-469, 1976.
[19] J. Itahashi and S. Yokoyama, "A formant extraction method utilizing mel scale and equal loudness contour," Speech Transmission Lab. Quarterly Progress and Status Report, Stockholm, no. 4, pp. 17-29, 1987.
[20] M. G. Rahim and B. H. Juang, "Signal bias removal by maximum likelihood estimation for robust telephone speech recognition," IEEE Trans. Speech Audio Process., vol. 4, no. 1, pp. 19-30, 1996.
[21] A. V. Oppenheim and D. H. Johnson, "Discrete representation of signals," Proc. IEEE, vol. 60, no. 6, pp. 681-691, 1972.
[22] E. Zwicker and E. Terhardt, "Analytical expressions for critical-band rate and critical bandwidth as a function of frequency," J. Acoust. Soc. Am., vol. 68, pp. 1523-1525, 1980.
[23] H. W. Strube, "Linear prediction on a warped frequency scale," J. Acoust. Soc. Am., vol. 68, no. 4, pp. 1071-1076, 1980.
[24] H. Matsumoto, T. Nakatoh and Y. Furuhata, "An efficient Mel-LPC analysis method for speech recognition," Proc. ICSLP '98, pp. 1051-1054, 1998.
[25] P. J. Moreno, B. Raj, E. Gouvea and R. M. Stern, "Multivariate-Gaussian-based cepstral normalization for robust speech recognition," Proc. ICASSP '95, 1995.
[26] L. Neumeyer and M. Weintraub, "Robust speech recognition in noise using adaptation and mapping techniques," Proc. ICASSP '95, 1995.
[27] J. Markel and A. Gray, Linear Prediction of Speech, Springer-Verlag, 1976.
[28] A. Acero and R. Stern, "Environmental robustness in automatic speech recognition," Proc. ICASSP '90, pp. 849-852, 1990.
[29] L. Mauuary, "Blind equalization for robust telephone based speech recognition," Proc. EUSIPCO '96, pp. 125-128, 1996.
[30] L. Mauuary, "Blind equalization in the cepstral domain for robust telephone speech recognition," Proc. EUSIPCO '98, vol. 1, pp. 359-363, 1998.
[31] H. G. Hirsch and D. Pearce, "The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions," Proc. ISCA ITRW ASR, pp. 181-188, 2000.
[32] P. H. Lindsay and D. A. Norman, Human Information Processing: An Introduction to Psychology, 2nd ed., p. 163, Academic Press.
[33] ITU-T Recommendation G.712, "Transmission performance characteristics of pulse code modulation channels," 1996.
[34] J. S. Lim and A. V. Oppenheim, "Enhancement and bandwidth compression of noisy speech," Proc. IEEE, vol. 67, no. 12, pp. 1586-1604, 1979.
