Spectral Modeling and Signal Processing Intro421

MUS421/EE367B Lecture 1
Introduction and Overview
Julius O. Smith III (jos@ccrma.stanford.edu)

Center for Computer Research in Music and Acoustics (CCRMA)
Department of Music, Stanford University
Stanford, California 94305
April 10, 2007
Course Overview
• The focus of this course is spectral modeling and

signal processing using the short-time Fourier
transform (STFT).
• Applications include musical sound synthesis and
audio signal processing.
1
Administrative Information
Units
You may sign up for either 3 or 4 units:
• 3 units = lectures + assignments + final
• 4 units adds a final project based on outside reading
and/or a software project
Important Pointers
• The course schedule and outline1 (reachable from the
class home page2) lists the following information:
– Assignments
– Weekly class schedule
– Pointers to all lecture overheads
• The 421 home page further contains pointers to
– Programming examples
– Sound examples
– Related items of interest online
• The MUS421/EE367B Overview3 contains this
administrative info and more.
1
http://ccrma.stanford.edu/~jos/intro421/Schedule_Assignments.html
2
http://ccrma.stanford.edu/CCRMA/Courses/421/
3
http://ccrma.stanford.edu/~jos/intro421/
2
Why The Fourier Transform
• Natural for visualizing audio signals:

The ear performs a kind of Fourier analysis
• Spectral models can be very compact and flexible:
– MPEG audio coding
– Sinusoidal modeling (“additive synthesis”)
∗ AES talk4 on history of spectral modeling at
CCRMA and elsewhere.
• Any Linear Time Invariant (LTI) system can be
implemented in the frequency domain by means of
the Fourier Transform
• Efficient FFT implementations exist which make it
possible to implement very large LTI systems in real
time, e.g., room impulse-response convolutions of
length 10,000 to 100,000
4
http://ccrma.stanford.edu/~jos/pdf/AES-Heyser.pdf
3
Applications of the
Short-Time Fourier Transform (STFT)
• Frequency-domain display of audio signals

• Fast convolution
• Time-varying linear filtering
• Time-varying nonlinear filtering
• Fourier analysis, modification, and resynthesis
• Musical sound synthesis via spectral modeling:
– Additive synthesis using sinusoids
– Sines + Noise modeling
– Sines + Noise + Transients modeling
• Speech synthesis
• Vocoders
• Time scaling
• Pitch shifting
• Pitch Detection
• Noise reduction
• Audio compression
4
Applications, Cont’d
• Transform coders for audio compression, such as

– MPEG AAC (10X common)
– “MP3” (MPEG-II, Layer III — about half the
quality of AAC)
– Dolby AC-2 and AC-3 (6X, fixed)
– Philips DCC, Sony Minidisc (4 or 5X))
Music 422 (EE 367C) is an entire CCRMA course
devoted to this topic. It is offered only in Winter
quarters.
• Perfect Reconstruction Filter banks
• Computational Auditory Modeling
• Filter Design and Implementation
• System Identification
• Signal Separation (Auditory Scene Analysis)
• Non-Parametric Spectral Estimation
5
Emerging Applications in Audio Coding
MPEG-4 supports an “object oriented” approach to

compression algorithms:
• First transmit
objects = synthesizer patches = “decoder”
• Next transmit
messages = performance data (like MIDI) =
“encoded bit stream”
• Main challenge: to develop classifiers and coders for
general purpose audio
Final Project (4th Unit)

• Individual or group research project
• Must be related to lecture/lab topics (FFT+Audio)
• One-page project proposal due by 4th class meeting
• Final written report due by the end of finals week
• Oral presentation during the last class invited!
• Project Types
– Programming project and report
6
– Reading and report
• Suggested Outside Reading: See
http://ccrma.stanford.edu/~jos/refs421/
(also available in Appendix P of the text.5)
• Example Project Topics
– Windows
∗ New FFT window types
∗ Explore window types not covered in class
– Spectrum Analysis
∗ Short-time spectrum analysis of recorded data
∗ Study of statistical spectrum estimation
∗ Matching spectrum analysis parameters to
human hearing
∗ Alternative time-frequency representations
(Wavelets, Wigner, ...)
– Sinusoidal Modeling
∗ Readings in additive synthesis
∗ Implement your own sines+noise
analysis/synthesis system
∗ Noise reduction based on sinusoidal modeling
∗ Source separation based on sinusoidal modeling
5
http://ccrma.stanford.edu/~jos/sasp/
7
∗ Transient detection (for “sines + noise +
transients” modeling)
– Short-Time Fourier Transform (STFT) based
Analysis, Modification, and Resynthesis
∗ Software system development
∗ Noise reduction
∗ Pitch detection/tracking
∗ Time compression/expansion
∗ Transform coding
∗ Pitch-synchronous phase vocoder (adapt
window to pitch period)
∗ Signal reconstruction from the STFT magnitude
only (phase discarded)
∗ Spectral interpolation schemes to compensate
for lost frames
– Software Development
∗ Modify course Matlab examples to be
compatible with Octave
(See http://www.octave.org/.)
∗ Develop missing components of the Signal
Processing Toolbox for Octave (looking only at
Matlab help info).
∗ PD FFT-processing patches (see
http://www.pure-data.org/)
8
∗ LADSPA spectral-processing plug-ins (see
http://www.ladspa.org/
9
Main Pointer
The course schedule and outline6 (reachable from the

class home page7) lists the following information:
• Assignments
• Weekly class schedule
• Pointers to all lecture overheads
6
http://ccrma.stanford.edu/~jos/intro421/Schedule_Assignments.html
7
http://ccrma.stanford.edu/CCRMA/Courses/421/
10
Notation
Before we begin, we will review the notation we will use

in this class. (We will try to stay consistent.)
Frequency and Time:
• ω denotes continuous radian frequency (rad/sec)

• f denotes continuous frequency in Hertz (Hz)
• ω = 2πf
• ωk denotes discrete frequency, ωk = 2π(k/N )fs
• ω, ωk ∈ R (frequencies are always real)
• T = sampling interval (sec)
1
• fs = sampling rate, fs = T
• tn = nT (discrete time)
• n, k ∈ Z (integers)
• t, tn ∈ R (times are always real)
11
Introduction to Audio Spectrum Analysis
Spectrum analysis of real-world signals typically occurs in

short segments. We are therefore most interested in
short-time spectrum analysis:
• Spectral content typically varies over time.

• The human ear uses less than one second of past
sound to form a spectrum.
• There is a limit to the length of signal we can analyze
at once.
To extract and analyze a sound segment, it is necessary

to apply a window function. An unmodified segment
extraction corresponds to a “rectangular window”.
Everything we ‘look at’ will be through a ‘window’, hence
it is important to realize what the window is doing to our
underlying signal.
Applications we’ll discuss first:
• Spectral Analysis for Display

• FIR Filter Design
12
Example of Windowing
Lets look at a simple example of windowing to

demonstrate what happens when we turn an infinite
duration signal into a finite duration signal through
windowing.
Complex Sinusoid:
x(n) = ejωnT , 0 ≤ ωT < π
Notes:
• real part = cos(ωnT )

• The frequencies present in our signal are only positive.
A fancy name for this is an ‘analytic signal’
This signal is infinite duration. (It doesn’t die out as n

increases.) In order to end up with a signal which dies
out eventually (so we can use the DFT), we need to
multiply our signal by a window (which does die out).
13
The following is a diagram of a typical window function:
Zero−Phase Window
1
0.9
0.8
0.7
0.6
Amplitude
0.5
0.4
0.3
0.2
0.1
0
−1000 −800 −600 −400 −200 0 200 400 600 800 1000
Time (samples)
This is loosely called a “zero-centered” (or “zero phase”,

or “even”) window function, which means its phase in the
frequency domain is either zero or π, as we will see in
detail later. (Recall that a real and even function has a
real and even Fourier transform.) The window is also
nonnegative, as is typical.
14
We might also require that our window be zero for
negative time. Such a window is said to be ‘causal’.
Causal windows are necessary for real-time processing:
Linear Phase Window (Causal)
1
0.9
0.8
0.7
0.6
Amplitude
0.5
0.4
0.3
0.2
0.1
0
−1000 −800 −600 −400 −200 0 200 400 600 800 1000
Time (samples)
By shifting the original window in time by half its length,

we have turned the original non-causal window into a
causal window. The Shift property of the Fourier
Transform tells us that we have introduced a linear phase
term.
15
Putting all this together, we get the following:
Our original signal (unwindowed, infinite duration), is
x(n) = ejω0nT , n ∈ Z
A portion of the real part, cos(ω0nT ), is plotted below:
1
0.8
0.6
0.4
0.2
Amplitude
−0.2
−0.4
−0.6
−0.8
−1
−2000 −1500 −1000 −500 0 500 1000 1500 2000
Time (samples)
The imaginary part, sin(ω0nT ), is of course identical but

for a 90-degree phase-shift to the right.
16
The Fourier Transform of this infinite duration signal is a
delta function at ω0: X(ω) = δ(ω − ω0)
δ(ω − ω0)
0 ω0 ω
The windowed version is: xw (n) = w(n)e−jω0nT n ∈ Z

(Note carefully the difference between w and ω.)
17
1
0.8
0.6
0.4
0.2
Amplitude
−0.2
−0.4
−0.6
−0.8
−1
−2500 −2000 −1500 −1000 −500 0 500 1000 1500 2000 2500
Time (samples)
The Convolution Theorem tells us that our multiplication

in the time domain results in a convolution in the
frequency domain. Hence, in our case, we will obtain the
convolution of a delta function at frequency ω0, and the
transform of the window.
The result of convolution with a delta function is the
original function, shifted to the location of the delta
function. (The delta function is the identity element for
convolution.)
18
0
−5
main lobe
−10
−15
−20
Amplitude − dB
−25
−30
−35
sidelobes
−40
−45
−50
−3 −2 −1 0
ω0 1 2 3
ωT
19
Summary
• Windowing in the time domain resulted in a

‘smearing’ or ‘smoothing’ in the frequency domain.
We need to be aware of this if we are trying to resolve
sinusoids which are close together in frequency.
• Windowing also introduced side lobes.
This is important when we are trying to resolve low
amplitude sinusoids in the presence of higher
amplitude signals. When we look at specific windows,
we will be looking at this behavior.
• A sinusoid at amplitude A, frequency ω0, and phase φ
becomes a window transform shifted out to frequency
ω0, and scaled by Aejφ.
There are many type of windows which serve various

purposes and exhibit various properties.
20
The Rectangular Window
The rectangular window may be defined as:

(
∆ 1, |n| ≤ M2−1
wR (n) =
0, otherwise
Zero−Phase Rectangular Window − M = 21
0.8
Amplitude
0.6
0.4
0.2
0
−20 −15 −10 −5 0 5 10 15 20
Time (samples)
• “Zero centered” definition (even in time domain)

• Need M odd in zero-centered case
• Scale window by 1/M to obtain unity dc gain
21
To see what happens in the frequency domain, we need
to look at the DTFT of the window:
∞
∆
X
WR (ω) = DTFTω (wR ) = wR(n)e−jωn
n=−∞
M −1
2 M −1 M +1
X ejω 2 − e−jω 2
= e−jωn =
1 − e−jω
n=− M2−1
where we used the closed form of a geometric series:

U
X
n rL − rU +1
r =
1−r
n=L
We can factor out linear phase terms from the numerator

and denominator of the above expression to get
1
" M M
#
e−jω 2 ejω 2 − e−jω 2
WR (ω) = 1 1 1
e−jω 2 ejω 2 − e−jω 2
ω

sin M 2 ∆
= ω
= M · asincM (ω)
sin 2
where asincM (ω) denotes the aliased sinc function.
sin(M ω/2)
∆
asincM (ω) =
M · sin(ω/2)
(also called the Dirichlet function)
22
Rectangular Window Transform (Cont’d)
Above, we found the rectangular window transform to be

the aliased sinc function:
ω

∆ sin M 2
WR (ω) = M · asincM (ω) = ω

sin 2
DFT of a Rectangular Window of length M = 11
12
Amp
10
6
Complex Amplitude
−2
−4
−6 −4 −2 0 2 4 6
freq
This (real) result is for the zero-centered rectangular

window. For the causal case, a linear phase term appears:
M −1 ω
WRc (ω) = e−j 2 M asincM (ω)
As the sampling rate goes to infinity, the aliased sinc

function approaches the regular sinc function
∆ sin(πx)
sinc(x) =
πx
23
More generally, we may plot both the magnitude and
phase of the window versus frequency:
Magnitude of Rectangular Window Transform (M = 11)
1
0.9
0.8
0.7
Magnitude (Linear)
0.6
0.5
0.4
0.3
0.2
0.1
0
−6 −4 −2 0 2 4 6
ω/ΩM
Phase of Rectangular Window Transform (M = 11)

3.5343
3.1416
2.7489
2.3562
1.9635
Phase
1.5708
1.1781
0.7854
0.3927
Main Lobe
0
−0.3927
−6 −4 −2 0 2 4 6
ω/ΩM
24
In audio work, we more typically plot the window
transform magnitude on a decibel (dB) scale:
DFT of a Rectangular Window − M = 11
0
−5
main lobe
−10
13 dB down
−15
Magnitude (dB)
sidelobes sidelobes
−20
−25
−30
−35
nulls nulls
−40
−3 −2 −1 0 1 2 3
Normalized Frequency ω (rad/sample)
25
Since the DTFT of the rectangular window approximates
the sinc function, it should “roll off” at approximately 6
dB per octave, as verified in the log-log plot below:
DFT of a Rectangular Window − M = 20
0
−6.0206
Ideal −6 dB per octave line
−12.0412
−18.0618 Partial
Main
−24.0824 Lobe
Amplitude (dB)
−30.103
−36.1236
−42.1442
−48.1648
−54.1854
0.1 0.2 0.4 0.8 1.6 3.2 6.4

Normalized Frequency (rad/sample)
As the sampling rate approaches infinity, the rectangular

window transform (asinc) converges exactly to the sinc
function. Therefore, the departure of the roll-off from
that of the sinc function can be ascribed to aliasing in the
frequency domain, due to sampling in the time domain.
26
Sidelobe Roll-Off Rate
In general, if the first n derivatives of a continuous

function w(t) exist (i.e., they are finite and uniquely
defined), then its Fourier Transform magnitude is
asymptotically proportional to
constant
|W (ω)| → n+1
(as ω → ∞)
ω
Proof: Look up “roll-off rate” in text index.
Thus, we have the following rule-of-thumb:
n derivatives ←→ −6(n + 1) dB per octave roll-off rate
(since −20 log10(2) = 6.0205999 . . .).
This is also −20(n + 1) dB per decade.
To apply this result, we normally only need to look at the
window’s endpoints. The interior of the window is usually
differentiable of all orders.
Examples:
• Amplitude discontinuity ←→ −6 dB/octave roll-off
• Slope discontinuity ←→ −12 dB/octave roll-off
• Curvature discontinuity ←→ −18 dB/octave roll-off
For discrete-time windows, the roll-off rate slows down at
high frequencies due to aliasing.
27
In summary, the DTFT of the M -sample rectangular
window is proportional to the ‘aliased sinc function’:
∆ sin(ωM T /2)
asincM (ωT ) =
sin(ωT /2)
sin(πf M T ) ∆
≈ = M sinc(f M T )
πf T
Some important points:
∆ 2π
• Zero crossings at integer multiples of ΩM = M
∆
where ΩM = 2πM = frequency sampling interval used
by a length M DFT
4π
• Main lobe width is 2ΩM = M
• As M gets bigger, the mainlobe narrows
(better frequency resolution)
• M has no effect on the height of the side lobes
(Same as the “Gibbs phenomenon” for Fourier series)
• First sidelobe only 13 dB down from main-lobe peak
• Side lobes roll off at approximately 6dB per octave
• A phase term arises when we shift the window to
make it causal, while the window transform is real in
the zero-centered case (i.e., when the window w(n) is
an even function of n)
28
Frequency Resolution
The next series of plots shows the effect that an increased
window length has on our ability to resolve 2 sinusoids.
Two Cosines (“In-Phase” Case)

2π
• 2 cosines separated by ∆ω = 40
• Rectangular Windows of lengths: 20, 30, 40, 80
(∆ω = 12 ΩM , 34 ΩM , ΩM , 2ΩM )
M = 20 M = 30
0.7 0.7
0.6 0.6
0.5 0.5
Magnitude
Magnitude
0.4 0.4
0.3 0.3
0.2 0.2
0.1 0.1
0 0
0 1 2 3 0 1 2 3
Frequency ωT (rad/sample) Frequency ωT (rad/sample)
M = 40 M = 80
0.7 0.7
0.6 0.6
0.5 0.5
Magnitude
Magnitude
0.4 0.4
0.3 0.3
0.2 0.2
0.1 0.1
0 0
0 1 2 3 0 1 2 3
29
One Sine and One Cosine
(“Phase Quadrature” Case)
• As above, but 1 sine and 1 cosine

• Note: least-resolved case appears resolved!
• Note: M = 40 case suddenly looks much worse
• Only the M = 80 case looks good at all phases
M = 20 M = 30
0.7 0.7
0.6 0.6
0.5 0.5
Magnitude
Magnitude
0.4 0.4
0.3 0.3
0.2 0.2
0.1 0.1
0 0
0 1 2 3 0 1 2 3
M = 40 M = 80
0.7 0.7
0.6 0.6
0.5 0.5
Magnitude
Magnitude
0.4 0.4
0.3 0.3
0.2 0.2
0.1 0.1
0 0
0 1 2 3 0 1 2 3
30
One Sine and One Cosine
(“Phase Quadrature” Case)
All Four Resolutions Overlaid
• Same plots as on previous page, just overlaid

• Peak locations are biased in under-resolved cases,
both in amplitude and frequency
0.5
M=20
M=30
0.45 M=40
M=80
0.4
0.35
0.3
Magnitude
0.25
0.2
0.15
0.1
0.05
0
0 0.5 1 1.5 2 2.5 3
Frequency ωT (rad/sample)
The preceding figures suggest that, for a rectangular

window of length M , two sinusoids can be most easily
31
resolved when they are separated in frequency by

∆ 2π
∆ω ≥ 2ΩM ΩM =
M
This implies there must be at least two full cycles of the
difference-frequency under the window. (We’ll see later
that this is an overly conservative requirement—a more
careful study reveals that 1.44 cycles is sufficient for the
rectangular window.)
In principle, arbitrarily small frequency separations can be
resolved if
• there is no noise, and

• we are sure we are looking at the sum of two ideal
sinusoids under the window.
However, in practice, there is almost always some noise

and/or interference, so we prefer to require sinusoidal
frequency separation by at least one main-lobe width (of
the sinc-function in this case, or the window transform
more generally) whenever possible.
The rectangular window provides an abrupt transition at
its edge. We will later look at some other windows which
have a more gradual transition. This is usually done to
reduce the height of the side lobes.
32
Resolution Bandwidth (resolving sinusoids)
Our ability to resolve two closely spaced sinusoids is
determined primarily by the main lobe width of the
Fourier transform of the window we are using.
Let Bw denote the main lobe width in Hz, with the main
lobe width defined as the width between zero crossings:
6 WR (ω)
5 Μ=7
2
2πBw
1
0
−3Ω −2Ω −Ω Ω 2Ω 3Ω ω
-1
-2
-4 -3 -2 -1 0 1 2 3 4
For the Rectangular Window (length M ), we have

∆ sin (M ωT /2) sin (M πf T )
WR(ω) = asincM (ω) = =
sin(ωT /2) sin(πf T )
Main lobe width is “two sidelobes wide”
ΩM fs
⇒ Bw = 2 =2 (Hz)
2π M
33
Choosing Window Length to Resolve Sinusoids
A conservative requirement for resolving 2 sinusoids (in

noisy conditions) with a spacing of ∆Hz is to choose a
window length M long enough so that their main lobes
are clearly discernible. For example, we may require that
their main lobes meet at the first zero crossings.
Closely spaced sinusoids - Rectangular window - M = 21

1
0.9
0.8
∆
0.7
0.6
Amplitude
0.5
0.4
0.3
0.2
0.1
0
0 0.5 1 1.5 2 2.5 3
Bw Bw ωT (radians per sample)
To obtain the separation shown above, we must have

Bw ≤ ∆, where Bw is the main lobe width in Hz, and ∆
is the sinusoidal frequency separation in Hz.
34
For the rectangular window, Bw can be expressed as
fs
Bw = 2
M
Hence we need:
fs
Bw = 2 ≤∆
M
fs
⇒ M ≥2
∆
or
fs
M ≥2
|f2 − f1|
Thus, to resolve the frequencies f1 and f2 under a
rectangular window, it is sufficient for the window length
M to span at least 2 periods of the difference frequency
f2 − f1, measured in samples, where 2 is the width of the
main lobe, measured in sidelobe-widths.
A rectangular window of length or greater is said to
resolve the sinusoidal frequencies f1 and f2.
35

Spectral Modeling and Signal Processing Intro421

Transféré par

Informations du document

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

Spectral Modeling and Signal Processing Intro421

Transféré par

Droits d'auteur :

Formats disponibles

MUS421/EE367B Lecture 1

Introduction and Overview

Julius O. Smith III (jos@ccrma.stanford.edu)

April 10, 2007

• The focus of this course is spectral modeling and

• Natural for visualizing audio signals:

• Frequency-domain display of audio signals

• Transform coders for audio compression, such as

MPEG-4 supports an “object oriented” approach to

Final Project (4th Unit)

The course schedule and outline6 (reachable from the

Before we begin, we will review the notation we will use

• ω denotes continuous radian frequency (rad/sec)

Spectrum analysis of real-world signals typically occurs in

• Spectral content typically varies over time.

To extract and analyze a sound segment, it is necessary

• Spectral Analysis for Display

Lets look at a simple example of windowing to

• real part = cos(ωnT )

This signal is infinite duration. (It doesn’t die out as n

This is loosely called a “zero-centered” (or “zero phase”,

By shifting the original window in time by half its length,

The imaginary part, sin(ω0nT ), is of course identical but

The windowed version is: xw (n) = w(n)e−jω0nT n ∈ Z

The Convolution Theorem tells us that our multiplication

• Windowing in the time domain resulted in a

There are many type of windows which serve various

The rectangular window may be defined as:

• “Zero centered” definition (even in time domain)

where we used the closed form of a geometric series:

We can factor out linear phase terms from the numerator

Above, we found the rectangular window transform to be

This (real) result is for the zero-centered rectangular

As the sampling rate goes to infinity, the aliased sinc

Phase of Rectangular Window Transform (M = 11)

0.1 0.2 0.4 0.8 1.6 3.2 6.4

As the sampling rate approaches infinity, the rectangular

In general, if the first n derivatives of a continuous

Two Cosines (“In-Phase” Case)

• As above, but 1 sine and 1 cosine

• Same plots as on previous page, just overlaid

The preceding figures suggest that, for a rectangular

• there is no noise, and

However, in practice, there is almost always some noise

For the Rectangular Window (length M ), we have

A conservative requirement for resolving 2 sinusoids (in

Closely spaced sinusoids - Rectangular window - M = 21

To obtain the separation shown above, we must have

Vous aimerez peut-être aussi