752 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-28, NO. 6, DECEMBER 1980

Correspondence

Suppression of Acoustic Noise in Speech Using Two Microphone Adaptive Noise Cancellation

STEVEN F. BOLL AND DENNIS C. PULSIPHER

Manuscript received September 5, 1979; revised February 8, 1980. This work was supported by the Information Processing Techniques Branch of the Defense Advanced Research Projects Agency, monitored by the Naval Research Laboratory under Contract N00173-79-C-0045.
S. F. Boll is with the Department of Computer Science, University of Utah, Salt Lake City, UT 84112.
D. C. Pulsipher is with Sandia Laboratories, Livermore, CA 94550.

Abstract-Acoustic noise with energy greater than or equal to the speech can be suppressed by adaptively filtering a separately recorded correlated version of the noise signal and subtracting it from the speech waveform. It is shown that for this application of adaptive noise cancellation, large filter lengths are required to account for a highly reverberant recording environment and that there is a direct relation between filter misadjustment and induced echo in the output speech. The second reference noise signal is adaptively filtered using the least mean squares (LMS) and the lattice gradient algorithms. These two approaches are compared in terms of degree of noise power reduction, algorithm convergence time, and degree of speech enhancement. Both methods were shown to reduce ambient noise power by at least 20 dB with minimal speech distortion and thus to be potentially powerful as noise suppression preprocessors for voice communication in severe noise environments.

I. INTRODUCTION

It has been shown that there is a significant reduction in measured speech intelligibility and quality due to the ambient background noise generated in many operating environments [1]. A number of single microphone approaches for reducing the background noise added to speech have been developed [2]. However, these methods become ineffective when the noise power is equal to or greater than the signal power or when the noise spectral characteristics change rapidly in time. This correspondence describes an alternative approach to noise suppression in which a second correlated noise source is adaptively filtered to minimize the output power between the two microphone signals. Two adaptive algorithm implementations were investigated: the LMS approach [3] and the lattice gradient approach [4]. Each approach was compared in terms of degree of noise power reduction, algorithm settling time, and degree of speech enhancement.

II. IMPLEMENTATION CONSIDERATIONS

The estimated adaptive filter in the absence of uncorrelated noise represents a transfer function equal to the product of the transfer function from the noise source to the speaker and the inverse of the transfer function from the noise source to the reference microphone. Based on simulation studies [5], approximating this inverse transfer function adequately requires using an all-zero filter having 1500 tap weights. Such a large filter in turn increases misadjustment (the ratio of excess to minimum mean-square error [3]). As is discussed below, the amount of misadjustment is an important design factor in noise suppression since large misadjustment manifests itself as a pronounced echo in the speech waveform. Echo can be present in the output speech since the output is continually fed back when estimating the tap weights. The echo is removed by reducing the adaptation step size, and thus the misadjustment. This reduction, of course, conflicts with the requirement of quick settling time. The tradeoff between misadjustment and settling time is discussed below.

Another issue is filter causality. In general, a noncausal filter is required if the noise reaches the speaker before reaching the reference microphone. Noncausal adaptive filters are easily generated by placing a delay into the primary channel. However, more tap weights are then required with the accompanying misadjustment problems described above. For the experiments described here, the reference microphone was placed next to the noise source, eliminating the need for delay.

III. EXPERIMENTATION AND RESULTS

An analog white noise generator was played out through a loudspeaker into a hard-walled room. The reference signal microphone was placed next to the loudspeaker, while the primary microphone was placed 12 ft away next to the control terminal. The author (D.P.) spoke into the primary microphone while controlling the stereo recording program. The noise power was amplified to such a level that the recorded speech was completely masked. The signals were filtered at 3.2 kHz, sampled at 6.67 kHz, and quantized to 15 bits. Recordings were made with and without speech present, each lasting 23.4 s.

Each algorithm's performance is measured in terms of the degree of steady-state noise power reduction during nonspeech activity, the time it takes to reach this steady-state value (algorithm settling time), and the amount of echo induced when speech is present. Three experiments were conducted to measure algorithm settling time and induced echo as a function of specified misadjustment. Step sizes were used corresponding to misadjustments of 1, 5, and 10 percent. The results showed that both algorithms converge to a steady-state noise power reduction of -20 dB in approximately 15 s for 10 percent misadjustment and 21 s for 5 percent misadjustment. At 1 percent misadjustment the step size for the LMS algorithm was so small that the noise power was reduced by only -10 dB before the data ran out. For the lattice algorithm, at 1 percent misadjustment, essentially no convergence was measured. In listening to the output during speech activity, it was judged that at 10 percent misadjustment an unacceptable amount of echo was present and that at 5 percent misadjustment the echo was just noticeable.

To illustrate this noise suppression capability, isometric plots of the short-time magnitude spectra with and without noise suppression are shown in Figs. 1 and 2. The plot construction is described in [2]. Fig. 1 corresponds to the short-time spectrum of the unprocessed speech signal "the pipe began to." Fig. 2 corresponds to the processed speech signal using the 5 percent misadjustment after the filter has converged. Since the noise was acoustically added, no underlying clean speech spectrum was available for comparison. However, it was judged that the intelligibility of the processed speech had clearly improved. This was based

0096-3518/80/1200-0752$00.75 © 1980 IEEE



upon the fact that before processing it was difficult to even detect that there was speech present in the noise, while after processing the speech was understandable.

Fig. 1. Short-time spectrum of the unprocessed speech: "the pipe began to."

Fig. 2. Short-time spectrum of the noise cancelled output.

IV. CONCLUSIONS

In terms of noise power reduction and amount of echo present, both approaches can be adjusted to give equivalent results. Using step sizes corresponding to approximately 5 percent misadjustment, each algorithm converges (noise power down 20 dB) after 20 s of input, with a just noticeable amount of echo. For this white noise environment, the orthogonalization and energy normalization provided in the gradient lattice approach offered no advantage.

In summary, although this two microphone approach to noise suppression requires a second signal and possibly excessive computation due to long filter lengths, it offers a potentially powerful approach for speech enhancement in severe noise environments.

ACKNOWLEDGMENT

The authors wish to thank J. Makhoul, L. J. Griffiths, and E. Satorius for their helpful discussions. Also they are grateful to R. Power and G. Randall for their assistance in implementing the algorithms on the FPS-120B array processor, to M. Milochik for preparation of the photographs, and to E. Collins for preparation of the manuscript.

REFERENCES

[1] C. F. Teacher and D. Coulter, "Performance of LPC vocoders in a noisy environment," in Proc. IEEE Conf. Acoust., Speech, Signal Processing, Washington, DC, Apr. 1979, pp. 216-219.
[2] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 113-120, Apr. 1979.
[3] B. Widrow, J. McCool, M. Larimore, and C. R. Johnson, Jr., "Stationary and nonstationary learning characteristics of the LMS adaptive filter," Proc. IEEE, vol. 64, pp. 1151-1162, Aug. 1976.
[4] L. Griffiths, "An adaptive lattice structure used for noise-cancelling applications," in Proc. IEEE Conf. Acoust., Speech, Signal Processing, Tulsa, OK, Apr. 1978, pp. 87-90.
[5] D. Pulsipher, S. F. Boll, C. K. Rushforth, and L. K. Timothy, "Reduction of nonstationary acoustic noise in speech using LMS adaptive noise cancelling," in Proc. IEEE Conf. Acoust., Speech, Signal Processing, Washington, DC, Apr. 1979, pp. 204-208.

On Some Suboptimum ARMA Spectral Estimators

S. BRUZZONE AND M. KAVEH

Manuscript received March 14, 1980; revised June 26, 1980. This work was supported by the Air Force Office of Scientific Research under Grant AFOSR-78-3628.
The authors are with the Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455.

Abstract-This correspondence describes some suboptimum schemes for ARMA spectral estimation. A least squares method is presented and compared to the method based on the modified Yule-Walker equations. A modification of the latter method is also given that improves its behavior in estimating spectra with narrow peaks. Examples are then shown that compare the suboptimum methods to the maximum likelihood one.

I. INTRODUCTION

In this correspondence a new suboptimum scheme is reported for estimating the power spectral density (PSD) of an autoregressive moving-average (ARMA) process of known orders. After a preliminary data reduction, this scheme, called the least squares (LS) estimator, minimizes a sum of squared quadratic functions of the autoregressive (AR) coefficients using a nonlinear least squares algorithm. The poles of the estimated PSD are found from the minimizing AR coefficients, and zeros are found from quadratic functions of these coefficients.

The general idea of least squares fitting the ARMA parameters is not new, and various other approaches have been suggested (see, e.g., [3] and [7]). The scheme discussed here is, however, analogous to a minimum mean-squared error estimation of the parameters appearing in the estimator discussed in [2] and [5]. A modification of the latter estimator that is based on the modified Yule-Walker equations (the MYW estimator) is also presented, in which the problem of negative excursions of the estimated PSD is corrected by tapering the estimated moving-average autocorrelation function. Examples are shown that compare the performance of these ad hoc techniques to an approximate maximum likelihood one, the unconditional least squares method of Box and Jenkins [3, pp. 231-235].

II. THE SPECTRUM OF AN ARMA PROCESS

Assume that we observe $x_t$, $t = 1, \ldots, N$, where $x_t$ is stationary and Gaussian of mean zero, and that $x_t$ fits an ARMA($L$, $M$) model. Then we can write

$$x_t - \sum_{i=1}^{L} a_i x_{t-i} = u_t \tag{1}$$

where $a_i$ are the AR parameters for the ARMA model and $u_t$ is the MA residual sequence given by

$$u_t = \epsilon_t - \sum_{i=1}^{M} b_i \epsilon_{t-i} \tag{2}$$

where $b_i$ are the MA parameters and $\epsilon_t$ is a zero-mean uncorrelated normal sequence of variance $\sigma_\epsilon^2$. Define

$$A^T = [1, -a_1, \ldots, -a_L].$$

Then the PSD of $x_t$ is given by

$$S_x(z) = S_u(z) \, |A^T z_L|^{-2} \tag{3}$$

where $z_L^T = [1, z^{-1}, \ldots, z^{-L}]$, $z$ being the z-transfer operator

0096-3518/80/1200-0753$00.75 © 1980 IEEE
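As a rough numerical illustration of the ARMA spectrum in (3), the following Python sketch evaluates the PSD on the unit circle $z = e^{j\omega}$. It is only a sketch under stated assumptions, not the authors' procedure: the function name `arma_psd` and its parameters are hypothetical, the MA residual spectrum is taken as $S_u(z) = \sigma_\epsilon^2 |B^T z_M|^2$ with $B^T = [1, -b_1, \ldots, -b_M]$ (which follows from (2) for white $\epsilon_t$), and any $2\pi$ normalization of the PSD is ignored.

```python
import numpy as np

def arma_psd(a, b, sigma2, n_freq=512):
    """Evaluate S_x = S_u * |A^T z_L|^-2 at z = exp(j*w), w in [0, pi].

    a, b   : AR and MA parameters with the sign convention of (1) and (2)
    sigma2 : variance of the white driving sequence (assumed known)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    w = np.linspace(0.0, np.pi, n_freq)

    # A^T = [1, -a_1, ..., -a_L]; B^T is the assumed MA analogue.
    A = np.concatenate(([1.0], -a))
    B = np.concatenate(([1.0], -b))

    # Rows hold z_L^T = [1, z^-1, ..., z^-L] for each frequency.
    z_L = np.exp(-1j * np.outer(w, np.arange(A.size)))
    z_M = np.exp(-1j * np.outer(w, np.arange(B.size)))

    S_u = sigma2 * np.abs(z_M @ B) ** 2   # spectrum of the MA residual u_t
    S_x = S_u / np.abs(z_L @ A) ** 2      # equation (3)
    return w, S_x
```

For example, `arma_psd([0.5], [], 1.0)` treats a pure AR(1) process, whose spectrum peaks at zero frequency; AR roots close to the unit circle produce the narrow spectral peaks mentioned in the abstract.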
