
Politechnika Warszawska

Wydział Elektroniki i Technik Informacyjnych


Zakład Automatyki i Robotyki

Tutorial
Signal processing
Prof. dr hab. inż. Włodzimierz Kasprzak

Report on
Spectrogram clustering

by
Shanmugaraj V S Kalyanasundaram
Vignesh Dhanabal

Task
An audio signal is given that contains two types of noise: a constant factory noise and an
occasional useful signal. Apply a windowed FFT to the signal, analyze its spectrogram and
cluster the Fourier-based features into a sufficient number of clusters to differentiate
between the different signals. Visualize the time-domain signal, its spectrogram, and the
intermediate and final processing results. Prepare a report containing a description of the
proposed algorithm, the computer program design and the test results.

Introduction
Background noise is the most common factor degrading the quality and intelligibility of
speech in recordings. A noise reduction module aims to lower the noise level without
affecting the quality of the speech signal. This module is based on spectral subtraction performed
independently in the frequency bands corresponding to the auditory critical bands.
In communication systems, noise handling is an important process for improving the quality of the
signal. Two types of deliberately generated noise are in common use: 'white noise',
which has a uniform power spectral density at all frequencies, and 'pink noise', whose
power spectral density falls by 3 dB/octave with rising frequency. The parameter used
to judge whether a signal is good or bad is the signal-to-noise ratio (SNR): the higher the SNR,
the higher the signal quality, and vice versa. There are several ways to increase the SNR of a signal;
spectral subtraction is one of them.
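Since the report judges results by SNR, a minimal sketch of the usual decibel definition, 10·log10(P_signal / P_noise), may help. It assumes the clean signal and the noise are available separately, which is only true in a test setup; the function names and test signals are illustrative. Note that the appendix function pwr returns a plain power ratio rather than a value in dB.

```python
import math

def mean_power(samples):
    """Mean power: sum of squared samples divided by the number of samples."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(signal, noise):
    """SNR in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))

# Test signal: 5 full periods of a unit sinusoid (mean power 0.5)
signal = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
# Test noise: alternating +/-0.1 (mean power 0.01)
noise = [0.1 if n % 2 == 0 else -0.1 for n in range(100)]

print(round(snr_db(signal, noise), 2))  # 16.99, i.e. 10*log10(50)
```

Doubling the noise amplitude quadruples its power, which lowers the SNR by about 6 dB.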
Task and algorithm
Task description:
In this program we minimize the noise present in the provided audio file. The noisy signal is
de-noised using time-frequency spectral subtraction, and the result can then be processed
further with the block thresholding algorithm.
Algorithm description:
Step 1: The time-frequency spectral subtraction algorithm computes the Short-Time Fourier Transform
(STFT) of the noisy signal using a Hamming window of length 1367 samples (round(0.031·Fe) in the appendix code).
Step 2: It estimates the noise spectrum from a time interval of the recording that contains only noise.
Step 3: The attenuation map is calculated from the estimated signal-to-noise ratio.
Step 4: The noise-reduced signal is obtained by taking the Inverse Short-Time Fourier Transform
(ISTFT).
Step 5: The SNR of the input, output and optimal output signals is calculated.

Signal analysis
The given signal is plotted in the time domain for different noise-listening intervals. Changing
the listening interval changes the noise spectrum estimate, because it changes the time
indices of the frames used to estimate the noise spectrum. The figures below show the various
listening intervals and their respective spectrograms.
Figure: Signal with noise, time from 0 to 10 s

Figure: Time 0 to 1 s
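Changing the listening interval changes which STFT frames are averaged into the noise estimate. The following is a minimal sketch of that selection, mirroring the condition T>mn & T<mx in the appendix and assuming that each frame is timestamped at its centre; the function names and toy sizes are illustrative only.

```python
def frame_times(num_samples, win_len, hop, fs):
    """Centre time (s) of each full frame; frames start every `hop` samples."""
    num_frames = 1 + (num_samples - win_len) // hop
    return [(k * hop + win_len / 2) / fs for k in range(num_frames)]

def noise_frame_indices(times, t_min, t_max):
    """Indices of frames whose centre lies strictly inside (t_min, t_max)."""
    return [k for k, t in enumerate(times) if t_min < t < t_max]

# 1 s of audio at 1 kHz, 100-sample frames with a hop of 50 samples
times = frame_times(1000, 100, 50, 1000)
print(noise_frame_indices(times, 0.5, 1.0))  # frames 10 to 18
print(noise_frame_indices(times, 0.0, 0.3))  # [0, 1, 2, 3, 4]
```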

Signal de-noising
Signal de-noising consists of three processes:

1. estimating the SNR,
2. computing the attenuation map, and
3. the inverse short-time Fourier transform.

1. Estimating SNR:
To improve the SNR, we use the time-frequency spectral subtraction technique, which is
explained in the following paragraphs.
Time-frequency spectral subtraction:
Spectral subtraction is a simple and effective method of noise reduction. In this
method, an average signal spectrum and an average noise spectrum are estimated and subtracted
from each other, so that the average signal-to-noise ratio (SNR) is improved. It is assumed that the
signal is distorted by wide-band, stationary, additive noise, that the noise estimate is the same
during analysis and restoration, and that the phase is the same in the original and restored
signals.
The noisy signal y(m) is a sum of the desired signal x(m) and the noise n(m):
y(m) = x(m) + n(m)
In the frequency domain, this may be denoted as:
Y(jω) = X(jω) + N(jω) => X(jω) = Y(jω) - N(jω)
where Y(jω), X(jω) and N(jω) are the Fourier transforms of y(m), x(m) and n(m), respectively.
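The frequency-domain relation follows from the linearity of the Fourier transform. Here is a small numerical check using a direct DFT on an eight-sample toy signal; the signal values and the helper name dft are illustrative assumptions.

```python
import cmath

def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    size = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / size) for m in range(size))
            for k in range(size)]

clean = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]     # x(m)
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, 0.0, 0.05]   # n(m)
noisy = [c + v for c, v in zip(clean, noise)]           # y(m) = x(m) + n(m)

Y, X, N = dft(noisy), dft(clean), dft(noise)
# Linearity: every bin satisfies Y = X + N, so X = Y - N when N is known exactly
max_error = max(abs(Xk - (Yk - Nk)) for Xk, Yk, Nk in zip(X, Y, N))
print(max_error < 1e-9)  # True
```

In practice N(jω) is not known exactly, so it has to be replaced by an estimate.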
The statistical parameters of the noise are not known, so the noise and the speech signal are
replaced by their estimates:

$$|\hat{X}(j\omega)| = |Y(j\omega)| - |\hat{N}(j\omega)|$$

The noise spectrum estimate $|\hat{N}(j\omega)|$ is related to the expected noise spectrum $E\{|N(j\omega)|\}$,
which is usually calculated using the time-averaged noise spectrum $\overline{|N(j\omega)|}$ taken from parts of
the recording where only noise is present. The noise estimate is given by:

$$|\hat{N}(j\omega)| = \overline{|N(j\omega)|} = \frac{1}{K}\sum_{i=0}^{K-1} |N_i(j\omega)|$$

where $|N_i(j\omega)|$ is the amplitude spectrum of the i-th of the K frames of noise. The noise estimate in the
k-th frame may be obtained by filtering the noise with a first-order low-pass filter:

$$|\hat{N}_k(j\omega)| = \beta_n\,|\hat{N}_{k-1}(j\omega)| + (1-\beta_n)\,|N_k(j\omega)|$$

where $|\hat{N}_k(j\omega)|$ is the smoothed noise estimate in the k-th frame and $\beta_n$ is the filtering coefficient
($0.5 \le \beta_n \le 0.9$; some authors use $0.8 \le \beta_n \le 0.95$). To obtain the noise estimate, only the
noise that precedes the part containing the speech signal should be analysed (the analysed
fragment should be at least 300 ms long). To achieve this, an additional speech detector has to
be used.
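Both noise estimators described above, the plain average over K noise-only frames and the first-order low-pass update, can be sketched per frequency bin as follows. The spectra are plain lists here, the function names are hypothetical, and the default β_n = 0.8 is just a value inside the quoted range.

```python
def average_noise_spectrum(noise_frames):
    """Time-averaged amplitude spectrum over K noise-only frames (one list per frame)."""
    num_frames = len(noise_frames)
    num_bins = len(noise_frames[0])
    return [sum(frame[b] for frame in noise_frames) / num_frames for b in range(num_bins)]

def smooth_noise_estimate(prev_estimate, current_frame, beta_n=0.8):
    """First-order low-pass update: beta_n * previous + (1 - beta_n) * current."""
    return [beta_n * p + (1.0 - beta_n) * c for p, c in zip(prev_estimate, current_frame)]

avg = average_noise_spectrum([[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]])
print(avg)  # [2.0, 3.0]
print([round(v, 6) for v in smooth_noise_estimate(avg, [4.0, 3.0])])  # [2.4, 3.0]
```

A larger β_n gives a smoother but slower-reacting estimate.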
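The spectral subtraction error may be defined as (approximating $|Y(j\omega)|$ by $|X(j\omega)| + |N(j\omega)|$):

$$\varepsilon(j\omega) = |\hat{X}(j\omega)| - |X(j\omega)| = |N(j\omega)| - |\hat{N}(j\omega)|$$

This error degrades the signal quality, introducing the distortion known as residual noise or
musical noise. The error is a function of the expected $E\{|N(j\omega)|\}$ or average $\overline{|N(j\omega)|}$
noise spectrum estimate:

$$\varepsilon(j\omega) = |N(j\omega)| - \overline{|N(j\omega)|}$$

Therefore, the longer the noise section used in the analysis, the more accurate the noise estimate.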
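Here is a small sketch of why the subtraction error becomes musical noise. When the average noise estimate is subtracted from a noise-only frame, bins where the noise happens to exceed its average survive as isolated peaks. Setting negative differences to zero (half-wave rectification) is a common choice assumed here, not something the text prescribes. The frame values are invented for illustration.

```python
def spectral_subtract(noisy_mag, noise_mag_est):
    """Magnitude subtraction |X^| = |Y| - |N^|, negative results set to zero."""
    return [max(y - n, 0.0) for y, n in zip(noisy_mag, noise_mag_est)]

# A noise-only frame that fluctuates around the average noise estimate of 1.0
noise_estimate = [1.0, 1.0, 1.0, 1.0, 1.0]
noise_only_frame = [0.7, 1.4, 0.9, 1.2, 0.8]
residual = spectral_subtract(noise_only_frame, noise_estimate)
print([round(v, 6) for v in residual])  # [0.0, 0.4, 0.0, 0.2, 0.0]
```

The surviving values at bins 1 and 3 change randomly from frame to frame, which is heard as short tonal "musical" artefacts.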
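The signal-to-noise ratio may be defined in the frequency domain as the a priori SNR (for the clean signal)
or the a posteriori SNR (for the noisy signal). In the k-th frame they are given by:

$$\xi_k(\omega) = \frac{|X_k(j\omega)|^2}{\lambda_{N,k}(\omega)}, \qquad \gamma_k(\omega) = \frac{|Y_k(j\omega)|^2}{\lambda_{N,k}(\omega)}$$

During the restoration process the clean signal is not known, so the a priori SNR has to
be estimated. Using the Gaussian model, the optimal SNR in the k-th frame may be defined as:

$$\hat{\xi}_k(\omega) = \alpha\,\frac{|\hat{X}_{k-1}(j\omega)|^2}{\lambda_{N,k-1}(\omega)} + (1-\alpha)\,P[\gamma_k(\omega) - 1]$$

where P[x] is:

$$P[x] = \begin{cases} x, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

$\lambda_{N,k-1}(\omega)$ is the variance of the noise spectrum in the previous frame, $|\hat{X}_{k-1}(j\omega)|$ is the estimate of
the restored signal and $\alpha$ is a constant ($0.9 < \alpha < 0.98$). The variance is usually replaced by the spectral
power of the noise estimate:

$$\lambda_{N,k}(\omega) \approx |\hat{N}_k(j\omega)|^2$$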
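Below is a per-bin sketch of the a posteriori SNR and the decision-directed a priori estimate described in the text. It assumes power spectra are given as plain lists. The function names are hypothetical, and the default α = 0.95 is simply a value inside the quoted 0.9 to 0.98 range. For comparison, the appendix function est_snr computes the term P[γ_k − 1] and then optionally applies its own first-order recursive smoothing.

```python
def posterior_snr(noisy_power, noise_power):
    """A posteriori SNR gamma_k = |Y_k|^2 / lambda_N, bin by bin."""
    return [y / n for y, n in zip(noisy_power, noise_power)]

def decision_directed_prior_snr(prev_clean_power, prev_noise_power, gamma, alpha=0.95):
    """xi_k = alpha*|X_{k-1}|^2/lambda_{N,k-1} + (1-alpha)*P[gamma_k - 1], P[x] = max(x, 0)."""
    return [alpha * xc / nv + (1.0 - alpha) * max(g - 1.0, 0.0)
            for xc, nv, g in zip(prev_clean_power, prev_noise_power, gamma)]

gamma = posterior_snr([4.0, 0.5], [1.0, 1.0])  # [4.0, 0.5]
xi = decision_directed_prior_snr([2.0, 0.0], [1.0, 1.0], gamma, alpha=0.9)
print([round(v, 6) for v in xi])  # [2.1, 0.0]
```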

3. Inverse short-time Fourier transform:

The inverse short-time Fourier transform is computed from the short-time Fourier
transform, so the STFT is explained first.
The short-time Fourier transform (STFT) of a signal consists of the Fourier transforms of
overlapping windowed blocks of the signal. Here we assume 50% overlap (the appendix code
actually uses a hop of floor(0.3·N) samples) and derive the perfect reconstruction condition
for the window function w(n). A Hamming window of length N = 1367 samples is used:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad n = 0, \dots, N-1$$
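The Hamming window formula can be evaluated directly. This sketch uses the symmetric form, which as far as we know is also the default of MATLAB's hamming(L) used in the appendix. For N = 1367 the end samples equal 0.08 and the peak of 1.0 lies at the centre sample n = 683.

```python
import math

def hamming(length):
    """Symmetric Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), n = 0..N-1."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]

w = hamming(1367)
print(round(w[0], 6), round(w[683], 6), round(w[-1], 6))  # 0.08 1.0 0.08
```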


The m-th windowed block of the signal x(n) is:

$$s(m, n) := x(n)\, w(n - mN/2), \quad m = 0, \dots, 4$$

The short-time Fourier transform is obtained by taking the DTFT of each windowed block:

$$S(m, \omega) := \mathrm{DTFT}\{x(n)\, w(n - mN/2)\}$$

The short-time Fourier transform of a discrete-time signal x(n) is denoted by
$S(m, \omega) = \mathrm{STFT}\{x(n)\}$.
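The STFT definition can be sketched with a direct DFT on a sampled frequency grid of n_fft points in place of the continuous DTFT. The function name and the toy inputs are illustrative. This sketch handles only n_fft greater than or equal to the window length. The appendix code instead uses NFFT = 1024 with a longer window and leaves that case to MATLAB's spectrogram.

```python
import cmath

def stft(x, window, hop, n_fft):
    """DFT of each overlapping windowed block x(n) * w(n - m*hop); one list per frame."""
    win_len = len(window)
    if n_fft < win_len:
        raise ValueError("this sketch assumes n_fft >= window length")
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        block = [x[start + n] * window[n] for n in range(win_len)]
        block += [0.0] * (n_fft - win_len)  # zero-pad to n_fft points
        frames.append([sum(block[n] * cmath.exp(-2j * cmath.pi * k * n / n_fft)
                           for n in range(n_fft))
                       for k in range(n_fft)])
    return frames

# A constant signal with a rectangular window: all energy lands in bin 0
S = stft([1.0] * 16, [1.0] * 8, hop=4, n_fft=8)
print(len(S), round(abs(S[0][0]), 6))  # 3 8.0
```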
The inverse STFT begins with the inverse DTFT of S(m, ω) to recover s(m, n):

$$s(m, n) = \mathrm{DTFT}^{-1}\{S(m, \omega)\}$$

From s(m, n) we then recover x(n) by multiplying each s(m, n) by the shifted window
w(n - mN/2), using the same window as in the forward STFT, and adding the results.
Multiplying the m-th windowed block by the shifted window gives:

$$s(m, n)\, w(n - mN/2) = x(n)\, w^2(n - mN/2), \quad m = 0, \dots, 4$$

The next step of the inverse STFT adds these overlapping blocks to obtain the final signal y(n):

$$y(n) = \sum_{m} s(m, n)\, w(n - mN/2) = x(n) \sum_{m} w^2(n - mN/2)$$

We call this the inverse STFT; however, it is only an inverse if y(n) = x(n), which in
turn depends on the window w(n). Perfect reconstruction requires $\sum_m w^2(n - mN/2) = 1$ for all n;
if the window does not satisfy this condition, the reconstructed signal y(n) will not be equal
to the original signal x(n).
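Since y(n) = x(n)·Σ_m w²(n − mN/2) when the same window is used for analysis and synthesis, reconstruction is perfect wherever that sum equals 1. A constant other than 1 could still be divided out. The sketch below checks the sum in the fully overlapped region for a toy window length of 8 with 50% overlap. It compares a square-root periodic Hann window with a periodic Hamming window. Both window choices are illustrative assumptions, not the report's settings.

```python
import math

def squared_window_sum(window, hop, n_frames):
    """sum_m w^2(n - m*hop) for every n covered by n_frames overlapping frames."""
    total = [0.0] * ((n_frames - 1) * hop + len(window))
    for m in range(n_frames):
        for n, w in enumerate(window):
            total[m * hop + n] += w * w
    return total

def is_constant(values, tol=1e-9):
    return max(values) - min(values) < tol

N = 8
hop = N // 2  # 50% overlap
sqrt_hann = [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * n / N)) for n in range(N)]
hamming = [0.54 - 0.46 * math.cos(2 * math.pi * n / N) for n in range(N)]

# Only the fully overlapped middle region counts (each edge has a single frame)
mid = slice(hop, -hop)
print(is_constant(squared_window_sum(sqrt_hann, hop, 4)[mid]))  # True
print(is_constant(squared_window_sum(hamming, hop, 4)[mid]))    # False
```

For the square-root Hann window the squared samples of two frames half a window apart always add up to 1. For the Hamming window the sum varies between about 0.58 and 1.01, so reconstruction is only exact after normalisation.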

Result analysis

Case  Optimal output SNR  Calculated output SNR  Overlap  Lambda
1     0.51                0.12                   0.41     -
2     0.51                0.46                   0.3      -
3     0.51                0.50                   0.3      1.75
4     0.51                0.43                   0.3      2.25

Four parameters control the SNR estimation and the attenuation map: alpha, beta1, beta2 and
lambda. Changing these parameters (keeping lambda low) increases the SNR of the calculated
output. In the table above, the 3rd case gives an SNR (0.50) closest to that of the optimal output (0.51),
but unfortunately its output is not clear. The final case gives an SNR of 0.43, lower than the
previous case, yet it gives the clearest form of the signal.
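The role of lambda can be seen from the attenuation rule used in the appendix, gain = (1 − λ·(1/(SNR+1))^β1)^β2 limited at zero. Below is a minimal per-bin sketch. The parameter names lam, beta1 and beta2 mirror l, b1 and b2 in the appendix, and the defaults are taken from there. The sketch clamps the bracket at zero before applying β2, so a fractional β2 never acts on a negative number.

```python
def attenuation(snr_est, lam=2.25, beta1=0.5, beta2=0.5):
    """Gain max(1 - lam * (1/(snr_est + 1))**beta1, 0) ** beta2 for one bin."""
    base = 1.0 - lam * (1.0 / (snr_est + 1.0)) ** beta1
    return max(base, 0.0) ** beta2  # clamp first so beta2 < 1 never acts on a negative number

for snr in (0.0, 8.0, 99.0):
    print(snr, round(attenuation(snr, lam=1.75), 3), round(attenuation(snr, lam=2.25), 3))
```

At every SNR the gain with λ = 1.75 is at least as large as with λ = 2.25, so a lower λ suppresses less of each bin. This is consistent with the observation that lowering lambda raised the calculated SNR.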
Conclusion:
To conclude, the white noise has been removed using the time-frequency spectral subtraction
technique. Compared to the previous technique, this technique is more efficient, and the output
speech signal is clearer than the output of the previously used methods. This technique is
well suited to removing stationary, additive white noise from other signals as well.
Additional Thoughts
This project has some disadvantages. The technique creates musical noise; in future work, the
block thresholding algorithm will be used to remove it. In addition, the quality of the signal
is not high even when the SNR is high, and the technique causes some data loss. This confirms
that a high SNR does not necessarily mean a clear signal.
We think that time-frequency spectral subtraction is better than the wavelet algorithm. The wavelet
algorithm uses thresholding to remove white noise, whereas spectral subtraction needs no
thresholding to remove it. The wavelet algorithm also adds noise to the signal in order to remove
the noise, whereas spectral subtraction does not need any noise to be added to the signal.


APPENDIX:
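clc
close all
clear all
fprintf('NOISE MINIMIZATION\n');
% load the noisy sound (audioread replaces wavread, which has been removed from recent MATLAB releases)
[y,Fe]=audioread('noi_b.wav');
% optimal (clean) output used to compare the result
X=audioread('ori.wav');
% parameters of the SNR estimation and the attenuation map
tmp_SNR=1;  % 1 = smooth the SNR estimate recursively
a=0.2;      % alpha: smoothing coefficient of the SNR estimate
b1=0.5;     % beta1
b2=0.5;     % beta2
l=2.25;     % lambda: scales the subtracted noise term
% window and window length used for the STFT
NFFT=1024;              % number of FFT points
wl=round(0.031*Fe);     % window length (31 ms)
window=hamming(wl);     % Hamming window
window = window(:);
overlap=floor(0.3*wl);  % hop size: number of new samples between successive frames
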
% time interval (s) containing only noise ("time to listen noise")
mn=0.5;
mx=1.00;
% spectrogram computation; adding 1i*eps makes the input complex,
% so spectrogram returns the full two-sided spectrum (NFFT rows)
[S,F,T] = spectrogram(y+1i*eps,window,wl-overlap,NFFT,Fe);
[Nf,Nw]=size(S);
% noise spectrum estimation from the frames inside (mn, mx)
t_index=find(T>mn & T<mx);
absS_noisy=abs(S(:,t_index)).^2;
ns=mean(absS_noisy,2); % average noise power spectrum
nsm=repmat(ns,1,Nw);
% estimate the SNR used to compute the attenuation map
[absS,SNR_est]=est_snr(S,nsm,tmp_SNR,a);
% attenuation map: clamp at zero before applying the fractional power b2,
% otherwise negative values would produce complex gains
att=max(1-l*((1./(SNR_est+1)).^b1),0).^b2;
% apply the attenuation map to the STFT
STFT=att.*S;
% inverse STFT by overlap-add; ind repeats the NFFT-point IFFT
% periodically over the wl samples of each frame
ind=mod((1:wl)-1,Nf)+1;
denoisedsignal=zeros((Nw-1)*overlap+wl,1);
for indice=1:Nw
    left_index=(indice-1)*overlap;
    index=left_index+(1:wl);
    temp_ifft=real(ifft(STFT(:,indice),NFFT));
    denoisedsignal(index)=denoisedsignal(index)+temp_ifft(ind).*window;
end
% Display Figure
%show temporal signals
figure
subplot(2,1,1)
t_index=find(T>mn & T<mx);
plot([1:length(y)]/Fe,y);
xlabel('Time (s)');
ylabel('Amplitude');
hold on;
noise_interval=floor([T(t_index(1))*Fe:T(t_index(end))*Fe]);
plot(noise_interval/Fe,y(noise_interval),'r');
hold off;
legend('Original signal','time to listen noise');
title('Original Sound');
%show denoised signal
subplot(2,1,2)

plot([1:length(denoisedsignal)]/Fe,denoisedsignal );
xlabel('Time (s)');
ylabel('Amplitude');
title('De-noised sound');
%spectrogram of noisy signal
t_epsilon=0.001;
figure
PF=max(S(1:length(F)/2,:),t_epsilon);
pcolor(T,F(1:end/2),10*log10(abs(PF)));
shading flat;
colormap('hot');
title('Spectrogram: noisy signal');
xlabel('T (s)');
ylabel('F (Hz)');
%spectrogram of the denoised signal i.e output_signal
figure
PF=max(STFT(1:length(F)/2,:),t_epsilon);
pcolor(T,F(1:end/2),10*log10(abs(PF)));
shading interp;
colormap('hot');
title('Spectrogram: De-noised signal');

xlabel('T (s)');
ylabel('F (Hz)');
% write the denoised signal (audiowrite replaces wavwrite)
fprintf('\n\nThe denoised signal is written as denoisedsignal.wav\n');
audiowrite('denoisedsignal.wav',denoisedsignal,Fe);
% power ratios reported as "SNR" (see function pwr below)
% "SNR" of the optimal output
tmp = sprintf('\nThe SNR of the optimal output: %.2f', pwr(y,X));
disp(tmp)
% "SNR" of the calculated output
tmp = sprintf('\nThe SNR of the calculated output: %.2f', pwr(y,denoisedsignal));
disp(tmp)
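% est_snr function: a posteriori SNR term P[|Y|^2/noise - 1], optionally smoothed
function [A,B]=est_snr(S,nsm,tmp,a)
A=abs(S).^2;
B=max((A./nsm)-1,0);
if tmp==1
    % first-order recursive smoothing B(k) = (1-a)*B(k) + a*B(k-1);
    % filter() acts along the first dimension of B, i.e. across frequency bins
    B=filter((1-a),[1 -a],B);
end
end

% pwr function: ratio of the mean power of y to the mean power of x
function snr=pwr(x,y)
power = (norm(x)^2)/length(x);
power1=(norm(y)^2)/length(y);
snr=power1/power;
end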
