Authors:
Aleksandar Jovanovic
Kalle Nilvér
Patrik Söderberg
Magnus Broberg
Abstract
This paper presents some implementations of acoustic echo cancellation algorithms in Matlab, together with analyses of the broader systems involved. It is the result of a project in the course Adaptive Signal Processing at Uppsala University. It focuses on the Normalized Least Mean Square (NLMS) algorithm and the Variable Impulse Response Double Talk Detector (VIRE DTD). Stereophonic Acoustic Echo Cancellation (SAEC) is discussed, and we recommend some topics for further work on this project.
Acknowledgements
Professor Andreas Jakobsson at Karlstad University has developed an assignment1 for a course in adaptive signal processing which clearly illustrates the effects of acoustic echo cancellation. It includes implementations of the NLMS algorithm and of the Geigel DTD. We have used its Matlab code as a starting point for our work and have also used the sounds included, since our own recorded sounds produced strange results, perhaps due to some downsampling that we performed. Andreas was also kind enough to mail us an article about the Fast Normalized Cross Correlation (FNCC) DTD and to answer a question on the behaviour of the VIRE DTD.
Mikael Sternad has been our supervisor and we have received a lot of support from him.
Daniel Aronsson showed us a useful way of plotting data as a timed sequence, a technique which we then used to analyze how the filter adapted. The technique also clearly showed the effects of adaptation while double talk was present. He also suggested a loop method to extract a correct threshold for double-talk detection.
Lars Johan Brännmark helped out when we had problems measuring the room impulse response. After a short period of work on his part, the measurements worked correctly.
Simon Mika, Simon Moritz and Carl-Johan Larsson made some progress with this project last year, and some of our work is based on their results.
1. Åhgren, Per and Jacobsson, Andreas (2006) Course material for a course in adaptive signal processing at Karlstad University
Table of contents
1 Conclusion ............................................................................................................................... 5
2 Introduction.............................................................................................................................. 7
3 Background .............................................................................................................................. 7
4 System overview...................................................................................................................... 9
5 Filter adaptation ....................................................................................................................... 9
5.1 LMS ............................................................................................................................... 10
5.2 NLMS ............................................................................................................................ 11
6 Talk detection......................................................................................................................... 12
6.1 Far-end talk detection .................................................................................................... 12
6.2 Double talk detection ..................................................................................................... 13
6.2.1 Geigel..................................................................................................................... 14
6.2.2 VIRE DTD ............................................................................................................. 15
6.2.3 Other ...................................................................................................................... 17
7 Comfort noise generator ........................................................................................................ 17
8 Measuring room impulse response ........................................................................................ 20
9 Stereophonic Acoustic Echo Cancellation (SAEC)............................................................... 22
10 Real time implementation .................................................................................................. 24
11 Views on further development........................................................................................... 25
12 Figure index ....................................................................................................................... 26
13 Matlab code segment index ............................................................................................... 26
14 Mathematical formula index .............................................................................................. 26
15 Subject index...................................................................................................................... 26
16 Bibliography ...................................................................................................................... 27
17 Appendix............................................................................................................................ 28
17.1 aec.m - For acoustic echo cancellation .......................................................................... 28
17.2 ir.m - For calculating impulse response of a room ........................................................ 31
1 Conclusion
There are many ways of solving the acoustic echo cancellation problem, and the “market” is flooded with algorithms for both adaptation and double-talk detection. We opted for the VIRE DTD algorithm proposed by Per Åhgren2, but a stable implementation was very difficult and its behaviour was somewhat unpredictable.
This year we improved substantially on last year’s work, making our solutions better in all previously implemented areas and implementing all remaining parts of a complete AEC solution. The results were very good, but there are of course things that could be improved in a complicated system like this. Perhaps the next step would be to cut down the computation time and take the step to a full real-time system.
Figure 1: SPCLAB result window showing, from top to bottom, far-end talk, near-end talk, microphone
pickup, filtered signal, double-talk detection, far-end talk detection and finally an indication on when
adaptation is taking place.
2. Åhgren, Per (2004) On System Identification and Acoustic Echo Cancellation
spclab( xF(indV), v(indV), y(indV), e(indV), isDT(indV)*max(e)/2,
isFET(indV)*max(e)/2, isAdapt(indV)*max(e)/2 );
Matlab code segment 1: Using spclab to plot the results.
Figure 3: ERLE plot with NLP, produced by the Matlab command plot( smooth(-10*log10((((e(1:100000).*not(isNLP(1:100000)))).^2+eps) ./ ((y(1:100000)).^2+eps)+eps), 1000) );
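The ERLE (Echo Return Loss Enhancement) measure plotted above compares the power of the microphone signal with the power of the residual error. The same computation can be sketched in Python/NumPy (the variable names here are illustrative, not from the project code):

```python
import numpy as np

def erle_db(y, e, eps=1e-12):
    """Echo Return Loss Enhancement in dB: power of the microphone
    signal y relative to the power of the residual error e."""
    return 10 * np.log10((y**2 + eps) / (e**2 + eps))

# Toy example: the filter removes 90% of the echo amplitude.
y = np.ones(1000)          # microphone pickup (pure echo)
e = 0.1 * np.ones(1000)    # residual after cancellation
print(round(float(np.mean(erle_db(y, e))), 1))  # → 20.0
```

A higher ERLE means more of the echo has been cancelled; the NLP samples are excluded in the plot above because the residual there is replaced by comfort noise.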
2 Introduction
This report is the result of a project course in adaptive signal processing at Uppsala University.
The aim of the project is to improve on last year’s research on acoustic echo cancellation. More specifically, the tasks were to implement adaptation algorithms in such a way that the results exceeded those of the previous group using one microphone and one loudspeaker, and to further examine the possibility of using two microphones and two loudspeakers. The tool used is primarily Matlab.
3 Background
The acoustic echo problem arises in hands-free telephony and teleconferencing systems. In early telephony the microphone and loudspeaker were separated, and no sound could propagate from the loudspeaker to the microphone; therefore no echo was transmitted back. With a hands-free loudspeaker telephone, however, the sound from the loudspeaker is picked up by the microphone and transmitted back to the sender, who perceives it as an echo. This severely reduces conversation quality, even at very small echo delays.
In a room with no propagation delay and no room impulse response (i.e. a studio with dampening walls and the microphone placed at no distance from the loudspeaker), the solution would simply be to subtract the input (far-end talk), which is readily available, from the signal picked up by the microphone, which consists of both near-end talk and far-end talk. After the subtraction, the signal would consist of near-end talk only. This is not possible, however, since the room both alters the sound and spreads it over time. With IP telephony, as illustrated in Figure 4, this spread over time also varies with the delay in the network, so IP telephony introduces even more problems. Due to these problems, the input must be modified accordingly before we subtract it. The problem is that the parameters by which it should be modified are unknown. This is where adaptive filtering technology comes in: the adaptive filter adjusts according to inputs and outputs to estimate the parameters by which the input must be modified if the subtraction is to be useful.
Acoustic echo cancellation also introduces a second problem: detecting when nothing but far-end talk is entering the microphone, and when other signals are entering as well. The adaptation algorithm uses only one measurement, the difference between the modified input and the real input. If this difference is zero and no near-end talk is present, the filter is an exact copy of the room impulse response and hence works as intended. If, at this time, there is near-end talk, the difference will be equal to the near-end talk, and the filter algorithm will interpret this as an error in the filter. The filter will therefore adapt to cancel out the near-end talk as well, and as a result it will cease to work.
The same techniques are used in data networking, where there are also problems with echoes.
4 System overview
To solve the acoustic echo problem the setup in Figure 5 was used.
Figure 5: System overview. The following notation is used: v(t) = white noise, ĝ(t) = adaptive coloring filter, n̂(t) = comfort noise, NLP = Non-Linear Processor, e(t) = error signal, d(t) = estimated echoic signal, ĥ(t) = adaptive filter, y(t) = uncorrected output, s(t) = near-end talk, n(t) = ambient near-end noise.
The goal is to mimic h(t), the room impulse response, with the adaptive filter ĥ(t). The Comfort Noise Generator and the NLP are used to further improve the output, but are not an essential part of the adaptive filtering problem.
5 Filter adaptation
A filter is something that transforms data to extract useful information from a noisy environment. In digital filtering there are two primary types: infinite impulse response (IIR) and finite impulse response (FIR). An IIR filter can normally achieve the same characteristics as a FIR filter using less memory and fewer calculations, at the cost of possibly becoming unstable. As the filters become more complex, though, IIR filters need more parameters and the advantages are reduced.
Because of the high complexity of the many strong and sharp peaks in a room impulse response, the filters used in acoustic echo cancellation are usually of the FIR type.3
For the filter in an acoustic echo canceller to work in the real case, with changing parameters such as different room acoustics and people moving around in the room, a filter with adaptive parameters (taps) is necessary.
There are numerous adaptive algorithms applicable to acoustic echo cancellation, such as least mean squares (LMS), recursive least squares (RLS), the affine projection algorithm (APA) and various derivatives thereof. LMS is an old, simple and proven algorithm which has turned out to work well in comparison with newer, more advanced algorithms. In this project we use normalized LMS (NLMS) for the main filter and LMS for the noise generation.
5.1 LMS
In 1959 Widrow and Hoff introduced the LMS algorithm. Over the years it has been by far the most used adaptive filtering algorithm, for several reasons: it was the first, it requires relatively little computation, and it works well, at least for slow changes in the filter.
The LMS filter uses a gradient method of steepest descent to adapt its weights to minimize a function c(n) defined in Mathematical formula 1,

c(n) = E( e(n)² )

Mathematical formula 1: The function c(n) to be minimized.
3. Liavas, Athanasios P. and Regalia, Phillip A. (1998) Acoustic Echo Cancellation: Do IIR Models Offer Better Modeling Capabilities than Their FIR Counterparts?
where e(n) is defined in Figure 5. In comparison to other algorithms, LMS is relatively simple, as it requires neither correlation function calculations nor matrix inversions; for each sample in the signal, LMS uses only two multiplications and two additions per tap.
h_{n+1}(i) = h_n(i) + (µ/2) * e(n) * x(n − i)

Mathematical formula 2: Adjustment of the taps using the LMS algorithm.
The taps are adapted as shown in Mathematical formula 2 where h, e and x are defined in Figure
5 and µ is the step length between zero and one over the largest eigenvalue of the correlation
matrix.
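The update above can be sketched as follows (a Python/NumPy illustration rather than the project's Matlab; the signals and the single-tap toy system are invented for the example):

```python
import numpy as np

def lms_step(h, x_block, e_n, mu):
    """One LMS update: h_{n+1}(i) = h_n(i) + (mu/2) * e(n) * x(n-i).
    x_block holds the samples x(n), x(n-1), ..., x(n-L+1)."""
    return h + (mu / 2) * e_n * x_block

# Identify a single-tap toy system h_true = 0.8 from a noise input.
rng = np.random.default_rng(0)
h = np.zeros(1)
for _ in range(2000):
    x = rng.standard_normal(1)
    d = 0.8 * x[0]               # desired signal from the true system
    e = d - h @ x                # error e(n)
    h = lms_step(h, x, e, mu=0.2)
print(abs(h[0] - 0.8) < 1e-3)    # → True
```

With a fixed µ the convergence speed depends on the input power, which is exactly the weakness NLMS addresses in the next section.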
5.2 NLMS
The primary disadvantage of the LMS algorithm is its slow convergence rate, which is the result of the static step length µ. In NLMS, µ is normalized by the energy of the signal vector, as in Mathematical formula 3, and therefore achieves a much faster convergence rate than LMS at a low cost. To avoid division by zero, a small number σ is often added to the energy.
µ_NLMS(n) = µ0 / (σ + x(n)ᵀ * x(n))

h_{n+1}(i) = h_n(i) + µ0 / (σ + x(n)ᵀ * x(n)) * e(n) * x(n − i)

Mathematical formula 3: The NLMS step length and the corresponding tap update.
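A corresponding NLMS sketch (again a Python/NumPy illustration with invented signal names and a toy four-tap "room") shows the normalized step in use:

```python
import numpy as np

def nlms_step(h, x_block, e_n, mu0, sigma=1e-8):
    """NLMS: the step length is normalized by the block energy
    x(n)^T x(n), plus a small sigma to avoid division by zero."""
    mu = mu0 / (sigma + x_block @ x_block)
    return h + mu * e_n * x_block

# Identify a short 4-tap toy impulse response from white noise.
rng = np.random.default_rng(1)
h_true = np.array([0.5, 0.3, -0.2, 0.1])    # toy "room"
h_hat = np.zeros(4)
x_buf = np.zeros(4)                          # newest sample first
for _ in range(5000):
    x_buf = np.roll(x_buf, 1)
    x_buf[0] = rng.standard_normal()
    d = h_true @ x_buf                       # echo picked up by the mic
    e = d - h_hat @ x_buf                    # error signal
    h_hat = nlms_step(h_hat, x_buf, e, mu0=0.5)
print(np.allclose(h_hat, h_true, atol=1e-4))  # → True
```

Because the step is divided by the input energy, loud and quiet far-end passages are adapted at a comparable rate, which is what makes NLMS attractive for speech.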
Figure 7: The error declines with time as the adaptive filter converges towards the room impulse response.
6 Talk detection
Talk detection is used to decide, among other things, when to activate the NLP and when to adapt the filter ĥ(t). There are two types of talk detection: far-end talk detection and double talk detection.
6.1 Far-end talk detection
energyOfSignalBlock = signalBlock' * signalBlock;
powerOfSignalBlock = energyOfSignalBlock / filterLength;
isFET(k) = (powerOfSignalBlock > farendThres);
Matlab code segment 3: Measurement of far-end talk power and comparison to calculated threshold.
Figure 8: Far-end voice detection, showing the signal amplitude over time.
6.2 Double talk detection
However, there is a problem in the real case, where near-end talk is not available by itself but only in combination with far-end talk in the microphone signal. The difficulty is to distinguish the different sub-signals and to know which is which.
There are several solutions to this problem and we have chosen to implement two of these and
study them in terms of performance and computational complexity.
Figure 9: Double talk detection. When both far-end talk and near-end talk are present, a detection variable (marked with a blue line) is set.
6.2.1 Geigel
The Geigel algorithm is a very simple DTD with low computational complexity. It is based on the assumption that the far-end talk has lower power than the near-end talk when the signal is received at the microphone. The room will most likely have dampened the far-end signal, and the volume of the speaker is with any luck not turned up too much. In practice, we form a decision variable as shown in Mathematical formula 5.
d(t) = |y(t)| / max{ |x(t)|, …, |x(t − n)| }

Mathematical formula 5: The Geigel decision variable.
We implemented this and got it to work very well when the power of the far-end signal was significantly lower than that of the near-end signal. This was an acceptable solution to the double-talk problem, but the application areas we aimed at, with unknown speaker and microphone positions, demanded a more flexible solution.
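The Geigel decision of Mathematical formula 5 can be sketched as follows (Python, with an illustrative threshold and window; the names are ours, not from the project code):

```python
import numpy as np

def geigel_dtd(y_n, x_recent, thresh=0.5):
    """Declare double talk when |y(t)| exceeds thresh times the largest
    recent far-end magnitude max{|x(t)|, ..., |x(t-n)|}."""
    d = abs(y_n) / max(np.max(np.abs(x_recent)), 1e-12)
    return d > thresh

x_recent = np.array([0.2, -0.4, 0.3])   # recent far-end samples
print(geigel_dtd(0.1, x_recent))        # echo only -> False
print(geigel_dtd(0.5, x_recent))        # near-end talk added -> True
```

The fixed threshold is exactly the weakness noted above: it implicitly assumes a known echo-path attenuation between loudspeaker and microphone.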
6.2.2 VIRE DTD
Figure 10: The development of the taps during filter adaptation. When double talk is present, the taps diverge from the average.
If the variance exceeds some threshold, which could be varied over time, we have double-talk. In
other words, if the estimated room impulse response changes a lot, we assume that it is not the
room that has changed, but that some other source of sound has appeared. The formula is
somewhat complicated,
σ_γ²(t) = λ * σ_γ²(t − 1) + (1 − λ) * (γ − γ̄(t))²
γ̄(t) = λ * γ̄(t − 1) + (1 − λ) * γ
γ = max{ |ĥ1|, …, |ĥn| }

Mathematical formula 6: The VIRE algorithm.
with γ̄(t) being the mean of γ and λ a forgetting factor4, though the calculation is very lightweight, as it needs only five multiplications.
We got this algorithm to work very well, but it demanded certain tweaking that seemed to be
input specific. Especially λ, the forgetting factor, was a challenge to understand and optimize. To
get good results we needed to change it a lot, and we couldn’t find a good way to estimate it over
time.
% VIRE DTD
if k > 10,
    tap(k) = max(abs(tempFilter));
    tapmean(k) = forgettingFactor*tapmean(k-1) + (1-forgettingFactor)*tap(k);
    variance(k) = forgettingFactor*variance(k-1) + ...
        (1-forgettingFactor)*(tap(k)-tapmean(k))^2;
end
if (k > 10*filterLength && variance(k) > vireThres && k+DTMemory < length(xF))
    isDT(k : k+DTMemory) = 1;
end
Matlab code segment 5: VIRE DTD algorithm.
Because the calculation is very lightweight, however, it suited our purpose best of the algorithms we read about, so we decided to go with it. Nonetheless, there are some other algorithms that are worth looking at.
4. Åhgren, Per (2004) On System Identification and Acoustic Echo Cancellation
Figure 11: The VIRE variance. When it exceeds the threshold (marked with a red line) it triggers the detector
(marked with a blue line).
6.2.3 Other
There are several other ways of detecting double talk. The Cheap Normalized Cross Correlation
(CNCR) algorithm for example is based on the comparison of the variances of the estimated
signal and the measured signal.
It might be a good idea to choose another DTD algorithm if the real-time implementation goal is dropped; it would have saved us a lot of time had we chosen an easier algorithm, and that might have given us better results as well.
7 Comfort noise generator
isNLP(k) = ( not(isDT(k)) && isFET(k) );
Matlab code segment 6: Setting status of NLP.
However, if nothing at all were transmitted, the user on the other end might suspect that the line has gone down. To avoid this, comfort noise is sent instead. This noise is colored according to the background noise in the room. This is done through an adaptive filter ĝ(t), which is calculated by LMS with the error signal used by the main filter ĥ(t) as an adaptive parameter.
Figure 12: This figure shows the activation of the NLP according to far end talk and near end talk. The NLP
should be activated when there is far end talk but no near end talk.
White noise is created with the WGN (White Gaussian Noise) command in Matlab. We set the strength of this noise statically (assigning -27 to a parameter specifying power in decibels relative to a watt), as shown in Matlab code segment 7, but we would rather set it dynamically according to the intensity of the ambient noise in the room. This has, however, proven very difficult, and therefore the noise level is adjusted to suit our equipment. If other equipment is to be used, this parameter may have to be altered.
whiteNoise(k) = wgn(1,1,-27);
Matlab code segment 7: The creation of white noise.
The coloring filter ĝ(t) is updated if there is near-end talk, as shown in Matlab code segment 8.
if(isNT(k))
% Adapt using LMS
CNGFilter = CNGFilter + mu0/2 * e(k) * whiteNoiseBlock;
end
Matlab code segment 8: Adaptation of comfort noise coloring filter.
Over time the filter will adapt to model the noise that is present in the near end room as
illustrated in Figure 13.
Figure 13: The comfort noise filter at 1700, 35000, 79000 and 158000 samples, respectively. Notice the slight
divergence at 158000.
Finally, a block of white noise samples generated as in Matlab code segment 7, whiteNoiseBlock, is filtered through the coloring filter CNGFilter if the NLP is activated,
if(isNLP(k))
comfortNoise(k) = whiteNoiseBlock' * CNGFilter;
e(k) = comfortNoise(k);
end
Matlab code segment 9: Coloring of comfort noise.
and the comfort noise is set as the output. Figure 14 shows what the colored noise looks like at the times the NLP is active, that is, comfortNoise(k). This result is added to the output signal.
Figure 14: The generated comfort noise as the NLP turns off the microphone output. The noise filter diverges somewhat between 150000 and 165000 samples, when we have undetected near-end talk, which creates a louder noise level, but it then starts to converge again.
8 Measuring room impulse response
To measure the impulse response of a filter (in our case, a room) there are several methods that can be used: one can record the echo of an impulse such as a loud bang; one can use sine waves of all the relevant frequencies as input and see what the system does; or one can record what comes out of the system when white noise is used as input. In the latter two cases, the response of the system is deconvolved with the input, and the resulting signal is the impulse response.
Using sine waves produces the best result, but going through all the relevant frequencies is a time-consuming task. For the impulse method to work optimally, an infinitely short pulse of infinite height would be needed. A loud bang such as a clap or a popping balloon would work, but in most cases would not give a very good result. The final method, recording white noise, has the potential to give a good result while being easy to realize in Matlab, requiring nothing but a computer equipped with a microphone and a speaker. This is the method we have chosen.
To be able to use division in the frequency domain instead of deconvolution in the time domain, white noise with a constant-magnitude spectrum is needed. This is accomplished by generating the noise in the frequency domain with random phase.
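This construction can be sketched as follows (Python/NumPy rather than the project's Matlab; fftLen and the seed are illustrative): give every frequency bin unit magnitude and a random phase, then transform back to the time domain.

```python
import numpy as np

rng = np.random.default_rng(0)
fftLen = 1024

# Unit magnitude in every bin, random phase; irfft enforces the
# conjugate symmetry needed for a real time-domain signal.
phase = rng.uniform(0, 2 * np.pi, fftLen // 2 + 1)
X = np.exp(1j * phase)
X[0] = 1.0            # DC bin must be real
X[-1] = 1.0           # Nyquist bin must be real
x = np.fft.irfft(X, n=fftLen)

# The magnitude spectrum is exactly flat, so dividing a recording
# by X in the frequency domain is well conditioned in every bin.
print(np.allclose(np.abs(np.fft.rfft(x)), 1.0))  # → True
```

Ordinary time-domain white noise only has a flat spectrum on average; this construction makes it flat exactly, so no bin of the deconvolution division blows up.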
Further improvement of the result can be achieved by using the mean of multiple recordings: the noise signal is repeated to form a periodic signal, which is played back and recorded at the same time, after which the recording is divided into periods again.
periods = 32;
for j=1:periods
xx(((j-1)*fftLen+1):(j*fftLen)) = x; % make signal periodic
end
r = audiorecorder(fs,16,1);
sound(xx, fs); % play noise
recordblocking(r,length(xx)/fs); % record
noiserec = getaudiodata(r);
for j=1:periods
x_rec(j,:) = noiserec(((j-1)*fftLen+1):(j*fftLen)).'; % split into blocks
end
Matlab code segment 11: Recording of noise.
Finally, the impulse response is calculated by dividing the recording by the noise in the frequency domain, as presented in Matlab code segment 12.
for j=1:periods
X_REC = fft(x_rec(j,:)); % transform recorded noise
ir(j,:) = ifft(X_REC./X); % determine one impulse response
end
impulse_response = mean(ir);
Matlab code segment 12: Calculate impulse response.
9 Stereophonic Acoustic Echo Cancellation (SAEC)
In teleconferencing systems, stereo sound offers a better user experience than a mono system. It gives the users the possibility to distinguish between different voices by determining which loudspeaker delivers the sound. But cancelling the acoustic echo from two loudspeakers into two microphones, which is required for stereo sound, turns out to be a very complex problem.
Figure 16: A typical Loudspeaker Enclosure Microphone (LEM) setup in the stereophonic case
One microphone signal, y1(t), can be modeled with the following equation:

y1(t) = h11(t) * x1(t) + h12(t) * x2(t) + n(t)

where * denotes convolution, h11 and h12 are the room impulse responses from the two loudspeakers to the microphone and n(t) is the noise of the room. The other microphone signal can be modeled similarly, since the system is symmetrical. This makes it at least four times as computationally heavy as the mono case.
One big problem with SAEC is what is commonly referred to as the non-uniqueness problem5. It arises from the fact that the signals x1(t) and x2(t) are highly correlated, since they originate from the same source, and from the fact that in a typical scenario where stereo sound is wanted, different people speak alternately6. The algorithms used must track both near-end and far-end changes in the echo paths, which is not easy since the paths can change drastically when another person starts talking. It is therefore important to keep the room impulse response estimate very close to the real room impulse response before the paths change, which demands a fast adaptive filter, and that is of course very hard to accomplish7.
5. Sundar G. Sankaran (1999) On ways to improve adaptive filter performance
6. Masahiro Yukawa, Noriaki Murakoshi and Isao Yamada (2005) Efficient Fast Stereo Acoustic Echo Cancellation Based on Pairwise Optimal Weight Realization Technique
7. Åhgren, Per (2004) Stereophonic Acoustic Echo Cancellation
There are different ways to approach SAEC and the non-uniqueness problem. One is to reduce the correlation of the input signals without adding audible distortion. One way to do this is to introduce a nonlinearity in one of the input signals. One could also add random noise to the input channels; a possible way to do this without destroying the sound would be to add noise that humans cannot hear.
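As an illustration of the nonlinearity approach (this particular half-wave form is our example, not taken from the project), one can add a scaled half-wave-rectified copy of the signal to each channel, using opposite halves on the two channels:

```python
import numpy as np

def decorrelate(x, alpha=0.5, positive=True):
    """Add a scaled half-wave-rectified copy of the signal to itself.
    Using the positive half on one channel and the negative half on
    the other reduces the inter-channel correlation."""
    half = (x + np.abs(x)) / 2 if positive else (x - np.abs(x)) / 2
    return x + alpha * half

rng = np.random.default_rng(2)
x = rng.standard_normal(10000)       # common far-end source signal
x1 = decorrelate(x, positive=True)   # left channel
x2 = decorrelate(x, positive=False)  # right channel
corr = np.corrcoef(x1, x2)[0, 1]
print(corr < 1.0)   # the channels are no longer perfectly correlated
```

In practice alpha is kept small, since the added distortion is a trade-off against how much decorrelation the adaptive filters need.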
It seems very hard to solve the SAEC problem, and extremely hard to implement it in a practically usable way. It would be very computationally heavy, and the different user scenarios that could come up make SAEC a big, if not impossible, challenge for next year's group.
10 Real time implementation
Real-time implementation is possible through the use of custom Very Large Scale Integration (VLSI) processors or Digital Signal Processors (DSPs). These processors are specially designed for signal processing tasks and their computational power is very high10. They provide parallel processing of commands. DSP programs are driven by hardware interrupts: sampling at 8 kHz, the program is interrupted every 125 µs by the next sample. For a 40 MHz DSP, each instruction takes 25 ns to complete. Thus there are 125 µs / 25 ns = 5000 machine cycles available for echo-cancelling calculations before the next sample arrives11.
8. Berkeman, Anders and Öwall, Viktor, Architectural tradeoffs for a custom implementation of an acoustic echo canceller
9. Raghavendran, Srinivasaprasath (2003) Implementation of an Acoustic Echo Canceller Using Matlab
10. Åhgren, Per (2004) On System Identification and Acoustic Echo Cancellation
11. Chong Chew, Wee and Boroujeny, Farhang (1997) Software Simulation and Real-time Implementation of Acoustic Echo Cancelling
Figure 17: A process diagram for handling incoming data continuously.
A multiplication has several times the complexity of an addition, so only multiplications are considered when choosing a proper DSP. The division operation used by NLMS has a high complexity, but it is not used as frequently as multiplications in this algorithm, and its cost is therefore considered negligible in the analysis.
Unlike Matlab, where we have floating-point data representation, DSP algorithms store data with finite precision. Unnecessarily large word lengths on signals result in larger arithmetic blocks and larger memories; such extra hardware consumes power without any performance gain. Therefore, all signals should have a minimal word length. On the other hand, since the signal word length also determines resolution and dynamic range, there is a trade-off between performance and power consumption. It is important to keep signals wide enough to avoid overflow and rounding errors, or at least to keep the probability of such events to a minimum.
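The word-length trade-off can be simulated by quantizing a signal to a fixed number of fractional bits (an illustrative Python sketch, not the project's DSP code):

```python
import numpy as np

def quantize(x, frac_bits):
    """Round x to a fixed-point grid with the given number of
    fractional bits; fewer bits -> coarser grid -> more noise."""
    scale = 2 ** frac_bits
    return np.round(x * scale) / scale

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 100000)       # full-scale test signal
for bits in (8, 12, 16):
    err = x - quantize(x, bits)
    snr_db = 10 * np.log10(np.mean(x**2) / np.mean(err**2))
    print(bits, round(snr_db))       # roughly 6 dB gained per extra bit
```

Such a simulation makes it possible to find the smallest word length that still keeps the quantization noise below the residual echo level of the canceller.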
12 Figure index
Figure 1: SPCLAB result window. .................................................................................................. 5
Figure 2: ERLE plot......................................................................................................................... 6
Figure 3: ERLE plot with NLP ........................................................................................................ 7
Figure 4: A telephone conference using an IP-telephony system.................................................... 8
Figure 5: System overview.. ............................................................................................................ 9
Figure 6: Filter adaptation.............................................................................................................. 10
Figure 7: The error decline. ........................................................................................................... 12
Figure 8: Far-end talk detection..................................................................................................... 13
Figure 9: Double talk detection...................................................................................................... 14
Figure 10: The development of the taps during filter adaptation................................................... 15
Figure 11: The VIRE variance....................................................................................................... 17
Figure 12: Activation of NLP according to far end talk and near end talk.................................... 18
Figure 13: The comfort noise filter at 1700, 35000, 79000 and 158000 samples. ........................ 19
Figure 14: The generated comfort noise as the NLP turns off the microphone output.. ............... 20
Figure 15: Impulse response in room 1116 at Magistern. ............................................................. 21
Figure 16: A typical LEM setup in the stereophonic case ............................................................. 23
Figure 17: A process diagram for handling incoming data continuously...................... 25
15 Subject index
CNG, 17
DSP, 24, 25
DTD, 2, 3, 17
ERLE, 6, 7, 26
Geigel, 3, 15
LEM, 23
NLMS, 2, 3, 11, 18, 25
NLP, 17, 18, 20
Nyquist theory, 24
Real time implementation, 24
room impulse response, 3, 20, 23, 24
SAEC, 2, 22, 23, 24
VIRE DTD, 3
WGN, 18
16 Bibliography
1. Åhgren, Per and Jacobsson, Andreas (2006) Course material for course in adaptive signal
processing at Karlstad University
http://www.it.kau.se/ee/utbildning/kurser/tel614/Download.html
2, 4, 10. Åhgren, Per (2004) On System Identification and Acoustic Echo Cancellation
http://www.ahgren.com/publications/phdthesis.pdf
3. Liavas, Athanasios P. and Regalia, Phillip A. (1998) Acoustic Echo Cancellation: Do IIR Models
Offer Better Modeling Capabilities than Their FIR Counterparts?
http://www.telecom.tuc.gr/Greek/Liavas/publications/Acoustic%20Echo%20Cancellation%20Do
%20IIR%20Models%20Offer%20Better%20Modeling%20Capabilities%20than%20Their%20FI
R%20Counterparts.pdf
6. Masahiro Yukawa, Noriaki Murakoshi, and Isao Yamada (2005), Efficient Fast Stereo Acoustic
Echo Cancellation Based on Pairwise Optimal Weight Realization Technique
http://www.hindawi.com/GetPDF.aspx?doi=10.1155/ASP/2006/84797
8. Berkeman, Anders and Öwall, Viktor, Architectural tradeoffs for a custom implementation of
an acoustic echo canceller
http://www.norsig.no/norsig2002/Proceedings/papers/cr1125.pdf
11. Chong Chew, Wee and Boroujeny, Farhang (1997) Software Simulation and Real-time
Implementation of Acoustic Echo Cancelling
http://www.ece.mtu.edu/ee/faculty/rezaz/index_files/Seminapapers2004/Kashulpatel.pdf
17 Appendix
% calculate the variance of the near-end voice; in the real case this is not
% available, so a statically assigned value is used instead
varianceOfVoice = var(v);
% Rough estimate of a suitable threshold for the VIRE DTD; this should be
% estimated inside the loop, since xF is not available in advance
vireThres = (mu0*varianceOfVoice / (2 - mu0))*mean(1/norm(xF)^2);
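For reference, the threshold expression above can be mirrored in a short pure-Python sketch. The function and variable names are ours, not from the report, and `inv_pow_blocks` stands in for precomputed values of 1/||x||^2 per far-end block:

```python
def vire_threshold(mu0, var_v, inv_pow_blocks):
    # VIRE DTD threshold: (mu0 * var(v) / (2 - mu0)) * mean(1 / ||x||^2).
    # In a real-time system, var(v) and the far-end block powers would have
    # to be running estimates, since the full signals are not known in advance.
    return (mu0 * var_v / (2 - mu0)) * (sum(inv_pow_blocks) / len(inv_pow_blocks))
```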
signalBlock = eps*ones(filterLength,1); % Adaptive filter input block at time t.
if (exist('filterTaps') == 0 || not(length(filterTaps) == filterLength))
filterTaps = eps*ones(filterLength,1); % Adaptive filter weights.
end
tempFilter = eps*ones(filterLength,1); % Temporary weights, copied into filterTaps when adaptation is allowed.
saveFilter = eps*ones(filterLength,numberOfSaves);
saveTempFilter = eps*ones(filterLength,numberOfSaves);
tap = zeros( size(xE) ); % the maximum tap of the filter over time
tapmean = zeros( size(xE) ); % the running mean of the max taps
variance = zeros( size(xE) ); % the running variance of the max taps
% Calculate the far-end detection threshold; not possible in the real case,
% where it should instead be updated in the loop, e.g.
% farendThres = max(farendThres, power of the last fs samples),
% or statically assigned
temppow = zeros(floor(length(xF)/fs)-1, 1);
for k = 1:length(temppow),
temppow(k) = xF((k-1)*fs+1:k*fs)' * xF((k-1)*fs+1:k*fs) / fs;
end
farendThres = max(temppow) * 1/10; % FE threshold is 1/10 of the maximum block power
% VIRE DTD
if k > 1,
tap(k) = max(tempFilter);
tapmean(k) = forgettingFactor*tapmean(k-1) + (1-forgettingFactor)*tap(k);
variance(k) = forgettingFactor*variance(k-1) + (1-forgettingFactor)*(tap(k)-tapmean(k))^2;
end
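The recursion above is an exponentially weighted mean and variance of the largest filter tap. A minimal Python sketch of the same update (function and variable names are ours, not from the report):

```python
def ewm_tap_stats(taps, forgetting):
    # Exponentially weighted mean/variance of the largest-tap sequence,
    # mirroring the tapmean/variance recursion of the VIRE DTD above.
    mean, var = 0.0, 0.0
    variances = []
    for t in taps:
        mean = forgetting * mean + (1 - forgetting) * t
        var = forgetting * var + (1 - forgetting) * (t - mean) ** 2
        variances.append(var)
    return variances
```

When the echo path is stationary and only the far end talks, the largest tap settles and the variance decays toward zero; double talk perturbs the tap estimates, so the variance rises and can be compared against `vireThres`.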
if (isNT(k))
% Adapt comfort noise filter using LMS
CNGFilter = CNGFilter + mu0/2 * e(k) * whiteNoiseBlock;
end
if (isNLP(k))
comfortNoise(k) = whiteNoiseBlock' * CNGFilter;
e(k) = comfortNoise(k);
end
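The comfort-noise filter is adapted with a plain LMS step, w &lt;- w + mu*e*x. As a sanity check of that rule, here is a toy Python sketch (not from the report) that uses the same update to identify an unknown scalar gain:

```python
import math

def lms_gain(samples, target_gain, mu):
    # Identify an unknown scalar gain with the LMS update used by the CNG:
    # w <- w + mu * e * x, where e is the error against the desired output.
    w = 0.0
    for x in samples:
        d = target_gain * x   # desired response of the unknown system
        e = d - w * x         # a-priori error
        w = w + mu * e * x    # LMS weight update
    return w

# A persistently exciting deterministic input drives w toward the true gain.
excitation = [math.sin(0.1 * k) for k in range(2000)]
w_hat = lms_gain(excitation, 0.5, 0.1)
```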
if ( isAdapt(k) )
% use the temp filter on the signal (in the next iteration)
filterTaps = tempFilter;
end
figure;
plot (y, 'DisplayName' , 'Recorded signal');
title('Resulting signal');
hold all;
plot (e, 'DisplayName' , 'Echo cancelled signal');
hold off;
figure;
semilogy (isDT*mean(variance)*5, 'DisplayName', 'Double talk detector');
title('Double talk variance/threshold');
hold all;
semilogy (smooth(variance,1000), 'DisplayName', 'VIRE variance');
semilogy (vireThres*ones(length(xF),1), 'DisplayName', 'Double talk threshold');
hold off;
figure;
plot (isNLP*max(xF), 'DisplayName', 'Non linear processor' );
title('NLP');
hold all;
plot (abs(xF), 'DisplayName', 'Far end signal');
plot (-abs(v), 'DisplayName', 'Near end signal');
hold off;
figure;
plot (comfortNoise, 'DisplayName' , 'CNG' );
title('CNG');
hold all;
plot (isNLP*max(comfortNoise), 'DisplayName', 'Non linear processor');
hold off;
figure;
plot (tap, 'DisplayName' , 'Largest tap' );
title('Tap development');
hold all;
plot (variance, 'DisplayName', 'Variance of largest tap');
hold off;
figure;
hold off;
for k=1:length(saveFilter(1,:))
plot(saveFilter(:,k));
axis([1 filterLength -1 1]);
title(k);
drawnow;
pause(0.1);
end
figure;
hold off;
for k=1:length(saveCNGFilter(1,:))
plot(saveCNGFilter(:,k));
title(k);
drawnow;
pause(0.1);
end
% Hi!
%
% A trick I usually use is to generate a periodic noise signal, where the
% period must be longer than the room impulse response (an ordinary living
% room typically has an impulse response of about 0.5 s). Then compute the FFT
% block by block, with block length = the period of the noise. To avoid losing
% accuracy, it is important that the magnitude of X is approximately 1 for all
% frequencies, which is achieved by generating the noise in the frequency
% domain, with constant magnitude and random phase, e.g. like this:
pause(30)
fs=44100; % Sampling frequency
fftLen=fs*1; % One-second blocks: covers an impulse response up to 0.5 s with margin
% Generate the spectrum with constant magnitude and random phase, as described
% above (this step is implied but missing from the original listing):
X = exp(1i*2*pi*rand(1,fftLen));
X(1) = 1; X(fftLen/2+1) = 1; % DC and Nyquist bins must be real
X(fftLen/2+2:end) = conj(fliplr(X(2:fftLen/2))); % conjugate symmetry => real x
x=real(ifft(X)); % Compute the time sequence (real if the symmetry is correct)
x=0.99*x/max(abs(x)); % Normalize to avoid clipping
periods = 32;
xx = zeros(1, periods*fftLen); % Repeat the noise block `periods` times
for j=1:periods
xx(((j-1)*fftLen+1):(j*fftLen)) = x;
end
% For the identification, use fs/2 points in the FFT. Proceed as you did
% before, but now for each block, and average the results. The noisiness can
% be reduced further by cutting away any initial silence at the beginning of
% the recorded noise, or by averaging over more periods.
r = audiorecorder(fs,16,1); % 16-bit mono recorder
sound(xx, fs); % Play the periodic noise through the loudspeaker
recordblocking(r,length(xx)/fs); % Record for the full playback duration
noiserec = getaudiodata(r);
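The key point of the recipe above, that a constant-magnitude, conjugate-symmetric spectrum yields a real periodic noise sequence, can be checked with a small pure-Python inverse DFT (naive O(n^2), standing in for Matlab's ifft; all names here are ours):

```python
import cmath
import math
import random

def unit_spectrum(n, rng):
    # Constant magnitude, random phase; conjugate symmetry (and real bins
    # at DC and Nyquist) guarantees a real time-domain sequence. n is even.
    X = [0j] * n
    X[0] = 1 + 0j
    X[n // 2] = 1 + 0j
    for k in range(1, n // 2):
        X[k] = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        X[n - k] = X[k].conjugate()
    return X

def idft(X):
    # Naive inverse DFT: what ifft computes, without the speed.
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

spectrum = unit_spectrum(16, random.Random(1))
noise = idft(spectrum)  # imaginary parts should vanish to rounding error
```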