Spring 2004-2005
Homework #5
Sinusoidal Modelling, Additive Synthesis, and Noise Reduction
75 points
Due in one week (5/10/2005)
(a) Download and run the Matlab program make_sdif_file.m, which shows how to
make an SDIF file in Matlab. It uses IRCAM's SDIF Extensions for Matlab
(which should already be installed at CCRMA) to write a Matlab cell array to
an SDIF file. (Later in this homework you will write SDIF files containing the
results of a sinusoidal analysis of an input sound. For now, the parameters in the
SDIF file are just some made-up numbers so that you can see how SDIF works.)
(b) Use the Unix command-line utility spew-sdif to print the contents of the SDIF
file. This should be installed in /usr/ccrma/bin, which should be part of your
Unix path by default.
(c) Download the Matlab program additive_synth.m and use it to synthesize the
SDIF file you just created. Listen to the resulting sound.
(d) Changing only the index numbers (i.e., the integers in the first columns of the
matrices frame1, frame2, etc.), modify make_sdif_file.m so that it produces an
SDIF file that sounds noticeably different when you synthesize it.
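For orientation, the sketch below (Python, with made-up numbers; it does not use IRCAM's actual SDIF API, and the assumed column order index/frequency/amplitude/phase follows the usual sinusoidal-track layout) mimics the frame data that make_sdif_file.m writes, one matrix per frame with one row per partial, and shows why changing only the indices in part (d) changes the sound: the synthesizer interpolates between partials that share an index.

```python
import numpy as np

# Hypothetical stand-in for two sinusoidal-track frames: each frame is
# a matrix with one row per partial; columns are assumed to be
# (index, frequency, amplitude, phase). The index column is what the
# additive synthesizer uses to match partials between adjacent frames.
frame1 = np.array([[1, 440.0, 0.5, 0.0],
                   [2, 880.0, 0.3, 0.0]])
frame2 = np.array([[1, 460.0, 0.5, 0.0],
                   [2, 900.0, 0.3, 0.0]])

def matched_tracks(fa, fb):
    """Pair rows of two frames by index number, returning
    (index, freq_start, freq_end) for each continued track."""
    pairs = []
    for row in fa:
        hit = fb[fb[:, 0] == row[0]]
        if len(hit):
            pairs.append((int(row[0]), row[1], hit[0][1]))
    return pairs

# With matching indices, partial 1 glides 440 -> 460 Hz and
# partial 2 glides 880 -> 900 Hz.
print(matched_tracks(frame1, frame2))

# Swapping the index column in frame2 reroutes the tracks: partial 1
# now glides 440 -> 900 Hz, an audibly different result, which is why
# part (d) asks you to change only the indices.
frame2_swapped = frame2.copy()
frame2_swapped[:, 0] = [2, 1]
print(matched_tracks(frame1, frame2_swapped))
```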
2. (5 pts) Look at the function ignore_phase_synth in additive_synth.m, especially
at the use of the variables dp and ddp, which stand for "difference in phase" and
"difference in difference in phase," respectively.
One might think that this is a needlessly low-level style of programming. Based on the
equations for a sum of sinusoids, it seems cleaner to treat the frequency interpolation
just like the amplitude interpolation, like this:
    t = 0;
    for i = fromSample+1:toSample
      output(i) = output(i) + (a * sin(f * t + phases(index)));
      f = f + df;   % linear frequency ramp across the frame
      a = a + da;   % linear amplitude ramp across the frame
      t = t + T;    % T = sampling period
    end
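A concern with this per-frame style can be shown numerically. Restarting t at zero in each frame and using the stored analysis phase verbatim generally leaves the waveform discontinuous at frame boundaries, whereas the dp/ddp phase-accumulation style carries the running phase across boundaries. A sketch (in Python, with made-up frame parameters, since the surrounding code is Matlab):

```python
import numpy as np

sr = 8000.0
T = 1.0 / sr
hop = 100                  # samples per frame (made up)
freqs = [50.0, 60.0]       # analysis frequency of each frame (made up)
phases = [0.0, 0.0]        # stored analysis phases (made up)

# Naive per-frame synthesis, as in the loop above: t restarts at 0 and
# the stored phase is used verbatim at each frame boundary.
naive = np.empty(2 * hop)
for k, (f, ph) in enumerate(zip(freqs, phases)):
    t = np.arange(hop) * T
    naive[k*hop:(k+1)*hop] = np.sin(2*np.pi*f*t + ph)

# Phase-accumulation synthesis (the dp/ddp idea): the running phase is
# carried across the boundary, so the waveform stays continuous even
# when the frequency changes.
phase = 0.0
accum = np.empty(2 * hop)
i = 0
for f in freqs:
    for _ in range(hop):
        accum[i] = np.sin(phase)
        phase += 2*np.pi*f*T
        i += 1

# Jump in sample value at the frame boundary:
print(abs(naive[hop] - naive[hop-1]))   # large jump: phase discontinuity
print(abs(accum[hop] - accum[hop-1]))   # one smooth sample step
```

The naive version clicks at every frame boundary unless the stored phase happens to match where the previous frame left off; this is why the synthesizer tracks phase differences instead.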
3. (20 pts) Write a Matlab program to perform sinusoidal analysis on an input sound and
write the result to an SDIF file. You should use the findpeaks function from HW#3
or elsewhere. The parameters of interest are the amplitudes and frequencies of the
sinusoidal components. (Don't worry about phase.) When matching up spectral peaks
in adjacent frames, choose the solution that minimizes frequency deviation from one
frame to the next.
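One simple way to approximate the minimum-frequency-deviation matching is a greedy pairing, sketched below in Python (the function name and frame format are made up; a globally optimal assignment would need something like the Hungarian algorithm):

```python
import numpy as np

def match_peaks(prev_freqs, new_freqs):
    """Greedy matcher: pair peaks across adjacent frames so that the
    total frequency deviation stays small. Candidate pairs are claimed
    in order of increasing |delta f|; each peak is used at most once.
    Returns a list of (prev_index, new_index) pairs. A sketch only --
    a full tracker would also handle track births and deaths."""
    prev_freqs = np.asarray(prev_freqs, dtype=float)
    new_freqs = np.asarray(new_freqs, dtype=float)
    # All candidate pairings, sorted by frequency deviation.
    cands = sorted(
        ((abs(pf - nf), i, j)
         for i, pf in enumerate(prev_freqs)
         for j, nf in enumerate(new_freqs)),
        key=lambda c: c[0])
    used_prev, used_new, pairs = set(), set(), []
    for _, i, j in cands:
        if i not in used_prev and j not in used_new:
            pairs.append((i, j))
            used_prev.add(i)
            used_new.add(j)
    return sorted(pairs)

# Example: the 2040 Hz peak continues the 2000 Hz track, and the
# 4100 Hz peak continues the 4000 Hz track, even though the new
# frame's peaks arrive in a different order.
print(match_peaks([2000.0, 4000.0], [4100.0, 2040.0]))
```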
Your program should take the following arguments:
Download the sound file wrenpn1.wav, which contains the sound of a bird chirping
with embedded noise. Test your analysis program on this sound file using the following
parameters:
[Figure: two spectrogram panels of wrenpn1.wav; frequency in Hz (0-10000) vs. time in s (0-2)]
Figure 1: Spectrogram of the original and resynthesized birdsong using one peak (high SNR)
Turn in your analysis code and your final denoised soundfile. You can email them to
the TA, or create a temporary webpage for the TA to view.
Solution: (20 pts) The code is in additivesynth.m [8] and hw5chirp.m [9], which
use the function findpeaks2.m [10]. The sinusoidally modeled birdsong output is in
wrenout1.wav [11].
5. (5 pts) At the time halfway through, plot the following spectral slices overlaid on a
dB scale (i.e., just plot the spectral magnitude at that time):
Solution: (10 pts) At approximately halfway through (time t = 1.1428 s), the
overlaid spectral slices (the magnitude of the DFT of the frame nearest this time
instant) are shown in Figure 2. The frames used in both cases are time-aligned (see
code). Note that there is only one prominent peak, so the number of peaks in the
bird-chirp sinusoidal model can be set to one. To the ear, the result of using just one
peak is arguably the best: although using more peaks generally represents the signal
more faithfully, for this particular bird it is more than necessary, and if more than
two peaks are used we actually start to hear artifacts.

[8] http://ccrma.stanford.edu/~jos/hw421/hw5sol/additivesynth.m
[9] http://ccrma.stanford.edu/~jos/hw421/hw5sol/hw5chirp.m
[10] http://ccrma.stanford.edu/~jos/hw421/hw5sol/findpeaks2.m
[11] http://ccrma.stanford.edu/~jos/hw421/hw5sol/wrenout1.wav

[Figure: spectral slice (magnitude DFT) midway through (frame 230) of wrenpn1.wav; dB vs. Hz, with curves for the original and resynthesized signals]
Figure 2: Spectral slices of the original and resynthesized birdsong overlaid at time midway through using one peak (high SNR)
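The spectral-slice plot itself is straightforward to reproduce. Here is a sketch in Python, with an assumed sample rate and frame length, and a synthetic 5 kHz sinusoidal frame standing in for the actual birdsong data:

```python
import numpy as np

sr = 22050.0          # assumed sample rate; wrenpn1.wav's actual rate may differ
N = 512               # assumed frame length
n = np.arange(N)
# Stand-in frame: a 5 kHz sinusoid (roughly where a wren chirp sits).
frame = np.cos(2 * np.pi * 5000.0 * n / sr)

# Spectral slice in dB: window the frame, take the magnitude DFT,
# convert to dB, and keep only the nonnegative-frequency half.
win = np.hanning(N)
spec = np.fft.rfft(frame * win)
mag_db = 20 * np.log10(np.abs(spec) + 1e-12)   # small floor avoids log(0)
freqs = np.fft.rfftfreq(N, d=1.0/sr)

# The bin nearest 5 kHz carries the peak of the slice.
peak_bin = int(np.argmax(mag_db))
print(freqs[peak_bin])
```

Plotting mag_db against freqs for the original and resynthesized frames on the same axes gives an overlay like Figure 2.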
6. (10 pts) Repeat the previous two problems for the sound file wrenpn2.wav [12], in which
the signal-to-noise ratio is only 0 dB.
Solution: (10 pts) Now the SNR is too low for reliable sinusoidal peak tracking, and
the result is far worse than in the low-noise case. The spectrogram and the spectral
slices are shown in Figures 3 and 4, respectively. In particular, the spectral slice of
the original shows spurious noise peaks as high as the sinusoid itself; although the
peak detection still finds the correct peak in this frame, we might not be as lucky in
other frames.
7. (5 pts) What is the limitation of this noise reduction technique? Explain in relation
to your results obtained earlier.
Solution: (5 pts) The limitation of this noise reduction technique is that it does not
work when the signal-to-noise ratio (SNR) is too low. This is a thresholding effect
associated with any nonlinear estimator, such as the peak finding used here: at low
SNR, findpeaks returns peak(s) corresponding to the noise rather than to the sinusoid.
This is evident from wrenpn2.wav, where the SNR is only 0 dB, even when only one
peak is used to model the signal. The technique also does not work well for sound
sources with many more partials than the birdsong, for example when some of the
sinusoidal peaks fall below the noise floor.
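The thresholding effect is easy to see in a toy simulation (a Python sketch; the frame length, bin number, and trial count are made up, and because the DFT's processing gain grows with frame length, the knee in this toy sits well below the 0 dB that already breaks the birdsong tracking):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256                    # frame length (made up)
n = np.arange(N)
bin_true = 40              # sinusoid sits exactly on DFT bin 40
sig = np.cos(2 * np.pi * bin_true * n / N)

def detection_rate(snr_db, trials=200):
    """Fraction of frames in which the largest |DFT| bin is the
    sinusoid's bin, at the given per-sample SNR in dB."""
    sig_pow = np.mean(sig**2)
    noise_std = np.sqrt(sig_pow / 10**(snr_db / 10))
    hits = 0
    for _ in range(trials):
        x = sig + noise_std * rng.standard_normal(N)
        if np.argmax(np.abs(np.fft.rfft(x))) == bin_true:
            hits += 1
    return hits / trials

# Above the threshold the nonlinear estimator is nearly perfect;
# well below it, it fails almost every time -- there is no graceful
# degradation in between.
print(detection_rate(0.0))
print(detection_rate(-40.0))
```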
[12] http://ccrma.stanford.edu/~jos/hw421/hw5/wrenpn2.wav

[Figure: two spectrogram panels, Original (wrenpn2) and Resynthesized (wrenpn2); frequency in Hz (0-10000) vs. time in s (0-2)]
Figure 3: Spectrograms of the original and resynthesized birdsong using one peak (low SNR)

[Figure: spectral slices, dB vs. Hz, with curves for the original and resynthesized signals]
Figure 4: Spectral slices of the original and resynthesized birdsong overlaid at time midway through using one peak (low SNR)

[Figure: spectrograms of the transformed birdsong; panels: Pitch scaled by 1/4 (0-2 s) and Time stretched by 2 (0-4.5 s)]

8. (10 pts) For this problem you will write two general-purpose programs that transform
sinusoidal models stored in SDIF files. Each program should read in an input SDIF
file and write the result to an output SDIF file. You should then run the output SDIF
file through the additive synthesizer to hear the result.
Extra credit: Instead of having the frequency and time scale factors be constants, allow
them to be functions of time. For example, you should be able to decrease the pitch of
the birdsong by a factor of 2 initially, then have it keep getting lower over the course
of the SDIF file, to an ending pitch factor of 6.
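Both transformations act only on the model data, not on audio samples. A sketch (in Python; the (time, frequency, amplitude) array is a made-up stand-in for the SDIF frame contents), including the time-varying pitch factor of the extra credit:

```python
import numpy as np

# Made-up stand-in for an SDIF sinusoidal model: one row per frame,
# columns (frame time in seconds, frequency in Hz, amplitude).
model = np.array([[0.0, 2000.0, 0.5],
                  [0.5, 2400.0, 0.5],
                  [1.0, 2800.0, 0.5]])

def pitch_scale(model, factor):
    """Scale all frequencies; factor may be a constant or a function
    of frame time (the extra-credit case)."""
    out = model.copy()
    if callable(factor):
        out[:, 1] *= np.array([factor(t) for t in out[:, 0]])
    else:
        out[:, 1] *= factor
    return out

def time_stretch(model, factor):
    """Scale all frame times; the additive synthesizer then
    interpolates amplitude and frequency over the longer spans."""
    out = model.copy()
    out[:, 0] *= factor
    return out

print(pitch_scale(model, 0.25))   # all frequencies divided by 4
print(time_stretch(model, 2.0))   # twice as slow, same pitches

# Extra credit: pitch factor sliding from 1/2 down to 1/6 of the
# original pitch over the course of the file.
dur = model[-1, 0]
glide = lambda t: 1.0 / (2.0 + 4.0 * t / dur)
print(pitch_scale(model, glide))
```

Because the model is just frame data, both programs reduce to rewriting one column of each frame before handing the file back to the additive synthesizer.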
Solution:
(a) (5 pts) See the solution code hw5chirp.m: all frequency estimates are divided
by 4 (see code). The output soundfile is wrenout2.wav [13].
(b) (5 pts) Also see hw5chirp.m: the length used to interpolate the instantaneous
amplitudes and frequencies is increased by a factor of 2 (see code). The output
soundfile is wrenout3.wav [14].
[13] http://ccrma.stanford.edu/~jos/hw421/hw5sol/wrenout2.wav
[14] http://ccrma.stanford.edu/~jos/hw421/hw5sol/wrenout3.wav
9. (5 pts) Download the sound file peaches.wav [15]. Repeat the analysis and synthesis
processes in the first problem. Tracking three peaks through time is enough for starters.
Describe your result and compare its quality to that of the birdsong case. How many
partials (peaks) are needed to make the speech intelligible?
Solution: (5 pts) Only 2 or 3 partials are needed to make the speech intelligible.
However, the algorithm used with the birdsong earlier does not work well with human
speech, because:
(a) The spectrum of speech is more complex, so many more sinusoids are needed to
model it. Even when a large number is used, the simple tracking used with the
birdsong will not give correct trajectories.
(b) Human speech has noise and transient components that are not well modeled by
sinusoidal modeling alone.
[15] http://ccrma.stanford.edu/~jos/hw421/hw5/peaches.wav