Time Frequency Analysis of Speech Signals

This lecture is concerned with the spectrogram as a representation of how the frequency components within a signal vary with time.
- Define what we mean by normalised time and frequency
- Define the short-term discrete Fourier transform
- Look at the effect of different window lengths on time and frequency resolution
- Derive a quantitative description of the frequency resolution of the short-term DFT and compare the performance of common windows
- Derive a quantitative description of the time resolution of the short-term DFT
- Explain the uncertainty principle and illustrate its effects with some examples
- Review some of the properties of the DFT.
Time-Frequency Representation
E.4.14 Speech Processing
Lecture 3
TIMEFREQ.PPT (15/04/2002)

Normalised Time

With sampled data systems it is customary to change the time scale and work in units of the sample period.
In normalised units the sampling frequency (fs) and period (T) both equal 1. They can therefore be omitted from equations: any such omissions can be deduced using dimensional consistency arguments.
The Nyquist frequency is 0.5 Hz = π radians/second.
To convert back to real units:
- any time quantity must be multiplied by the real sample period (or divided by the real sample frequency)
- any frequency or angular frequency quantity must be multiplied by the real sample frequency (or divided by the real sample period)
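The conversion rules for normalised units can be illustrated with a minimal Python sketch. The function names and the 8 kHz sample rate are assumptions chosen for illustration; they do not come from the lecture.

```python
def to_real_time(t_norm, fs):
    """Normalised time (in samples) -> seconds: multiply by T = 1/fs."""
    return t_norm / fs


def to_real_freq(f_norm, fs):
    """Normalised frequency (cycles per sample) -> Hz: multiply by fs."""
    return f_norm * fs


fs = 8000.0                       # assumed sample rate in Hz (illustrative only)
nyquist_hz = to_real_freq(0.5, fs)    # normalised Nyquist frequency is 0.5
window_s = to_real_time(401, fs)      # duration of a 401-sample window
print(nyquist_hz, window_s)
```

With this assumed sample rate the normalised Nyquist frequency of 0.5 maps to 4000 Hz.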
Spectrogram

The spectrogram shows the energy in a signal at each frequency and at each time. We calculate this by evaluating the short-term discrete Fourier transform.
[Figure: spectrogram of "my speech"; frequency axis in Hz, intensity in dB. Dark areas of spectrogram show high intensity.]
Short-Term Discrete Fourier Transform

We often want to estimate the power spectrum of a non-stationary signal at a particular instant of time.
Multiply by a finite length window and take the DFT.
For a window of length N ending on sample m we have:

$$X(k;m) = \sum_{i=0}^{N-1} w(i)\, x(m-i) \exp\left(-\frac{2\pi j}{N} k (m-i)\right)$$
[k = frequency, m = time]

Notes:
- The values X(k;m) are based on the N signal values from m-N+1 to m.
- The window samples are numbered backwards in time (for convenience later), hence the summation is performed backwards in time.
- The (m-i) term in the exponent means that the phase origin remains consistent by cancelling out the linear phase shift introduced by a delay of m samples.
- |X(k;m)|^2 gives the power at a frequency of k/N Hz (normalised) for a window centred at m-(N-1)/2.
- The frequency resolution is 1/N Hz (normalised units).
- The spectrum is periodic and (since w and x are real) conjugate symmetric: X(k) = X(k+N) = X*(-k), so there are only N/2+1 independent frequency values.
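As a rough illustration of the definition of X(k;m), the sum can be evaluated directly with NumPy. This is a sketch under stated assumptions: the function name `stdft`, the random test signal and the window length of 16 are illustrative choices, not part of the lecture.

```python
import numpy as np


def stdft(x, w, m):
    """X(k; m) for k = 0..N-1, for a window w of length N ending on sample m."""
    N = len(w)
    i = np.arange(N)                 # window index, counted backwards in time
    k = np.arange(N)[:, None]        # frequency index as a column vector
    seg = w * x[m - i]               # w(i) x(m-i)
    return (seg * np.exp(-2j * np.pi * k * (m - i) / N)).sum(axis=1)


rng = np.random.default_rng(0)
x = rng.standard_normal(64)          # arbitrary real test signal
w = np.hamming(16)                   # real window, N = 16
X = stdft(x, w, m=40)

# Conjugate symmetry for real w and x: X(k) = X*(-k) = X*(N-k)
sym_err = np.max(np.abs(X[1:] - np.conj(X[:0:-1])))
print(sym_err)
```

The symmetry check shows why only the bins k = 0 to N/2 carry independent information when the signal and window are real.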
/Is/ from "my speech"

Hamming window of length 401 samples centred on 400 (i = 0 to 400, m = 600).
Line spectrum from larynx oscillation (at about 0.01 normalised Hz) is superimposed on vocal tract resonances.
[Figure: windowed data against sample number and its spectrum in dB, with peaks labelled f1 to f5.]

Three different Hamming windows:
- N=401: the window used above.
- Short window eliminates the fine detail (N=29).
- Zero padded window gives more spectrum points and an illusion of more detail (N=232=2916). This is the normal case for a spectrogram.
[Figure: spectrum in dB for each of the three windows.]
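The remark that zero padding only gives an illusion of more detail can be checked numerically: the padded spectrum passes through the unpadded spectrum points and interpolates between them. The sketch below assumes a 29-sample segment padded by a factor of 8 to 232 points; that factor is one reading of the slide's figures, not a confirmed value, and the random signal is a stand-in for speech.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(29)            # stand-in for 29 speech samples
y = np.hamming(29) * x                 # short windowed segment

X_short = np.fft.fft(y)                # 29 spectrum points
X_padded = np.fft.fft(y, n=29 * 8)     # zero padded to 232 points (assumed factor)

# Every 8th point of the padded spectrum coincides with the unpadded spectrum,
# so the extra points add no information that was not already in the 29 samples.
max_diff = np.max(np.abs(X_padded[::8] - X_short))
print(X_padded.shape, max_diff)
```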
$$X(k;m) = \sum_{i=0}^{N-1} w(i)\, x(m-i) \exp\left(-\frac{2\pi j}{N} k (m-i)\right)$$

By setting r = N-1-i we can rewrite this as:

$$X(k;m) = \exp\left(-\frac{2\pi j (m-N+1)}{N} k\right) \sum_{r=0}^{N-1} y_m(r) \exp\left(-\frac{2\pi j}{N} k r\right)$$

where $y_m(r) = w(N-1-r)\, x(m-N+1+r)$

This is a standard DFT multiplied by a phase-shift term that is proportional to k: this compensates for the starting time of the window: m-N+1.
y(r) is a product of two signals so its DFT is the convolution of the DFTs of w(N-1-r) and x(m-N+1+r).
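The equivalence between the direct sum and a standard DFT with a phase-shift term can be verified with a short sketch. The function names and test values are assumptions for illustration; `numpy.fft.fft` uses the same exp(-2πj kr/N) convention as the formula above.

```python
import numpy as np


def stdft_direct(x, w, m):
    """Direct evaluation of the sum over i = 0..N-1."""
    N = len(w)
    i = np.arange(N)
    k = np.arange(N)[:, None]
    return (w * x[m - i] * np.exp(-2j * np.pi * k * (m - i) / N)).sum(axis=1)


def stdft_fft(x, w, m):
    """Same quantity as a standard DFT of y_m(r) times a phase-shift term."""
    N = len(w)
    r = np.arange(N)
    y = w[N - 1 - r] * x[m - N + 1 + r]               # y_m(r)
    phase = np.exp(-2j * np.pi * np.arange(N) * (m - N + 1) / N)
    return phase * np.fft.fft(y)


rng = np.random.default_rng(2)
x = rng.standard_normal(64)
w = np.hamming(16)
diff = np.max(np.abs(stdft_direct(x, w, 40) - stdft_fft(x, w, 40)))
print(diff)
```

For a spectrogram only |X(k;m)|^2 is needed, so the phase-shift term can be dropped in that case.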
The 6 dB bandwidth and the main-lobe (null-to-null) bandwidth of an N-point window, in normalised frequency, are a/N and b/N respectively. Common windows and values for a and b are shown below. In the formulae c_k = cos(2πk(n - N/2)/N).

Window                    w(n)                                    a      b    Sidelobe
Rectangular               1                                       1.21   2    -13 dB
Hanning                   0.5 + 0.5c1                             1.65   4    -23 dB
Hamming                   0.54 + 0.46c1                           1.81   4    -43 dB
Blackman-Harris: 3 term   0.423 + 0.498c1 + 0.079c2               1.81   6    -67 dB
Blackman-Harris: 4 term   0.359 + 0.488c1 + 0.141c2 + 0.012c3     2.72   8    -92 dB

[Figure: for each window (rectangle, hanning, hamming, harris 3, harris 4), the window shape against sample number and its spectrum in dB against normalised frequency.]
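The 6 dB bandwidth factor a can be estimated numerically from a finely sampled window spectrum. The following is a sketch only: the helper name `six_db_factor`, the padding factor and the use of NumPy's symmetric `np.hamming` (which differs slightly from a window defined with period N) are assumptions, so the results agree with the tabulated values only approximately.

```python
import numpy as np


def six_db_factor(w, pad=64):
    """Estimate a, where the two-sided 6 dB bandwidth of w is a/N (normalised)."""
    N = len(w)
    W = np.abs(np.fft.fft(w, n=pad * N))       # finely sampled magnitude spectrum
    W /= W[0]                                  # normalise to the peak at DC
    below = np.nonzero(W < 0.5)[0][0]          # first bin more than 6 dB down
    # linear interpolation between the two bins either side of the crossing
    frac = (W[below - 1] - 0.5) / (W[below - 1] - W[below])
    f_half = (below - 1 + frac) / (pad * N)    # one-sided width in cycles/sample
    return 2 * f_half * N


N = 401
a_rect = six_db_factor(np.ones(N))
a_hamming = six_db_factor(np.hamming(N))
print(round(a_rect, 2), round(a_hamming, 2))
```

Zero padding is used here purely to interpolate the spectrum finely enough to locate the half-amplitude point.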
Time Resolution: Filter-Bank Viewpoint

Concentrate on one particular value of k and define:

$$z_k(r) = x(r) \exp\left(-\frac{2\pi j}{N} k r\right)$$

z_k(r) is just the same as x(r) but shifted down in frequency by k/N. E.g. if x is a complex exponential at frequency f:

$$x(r) = \exp(2\pi j f r) \quad\Rightarrow\quad z_k(r) = \exp\left(2\pi j (f - k/N) r\right)$$

[Figure: spectra X(jω) with a line at f, and Z_k(jω) with a line at f - k/N.]

Now we have:

$$X(k;m) = \sum_{i=0}^{N-1} w(i)\, x(m-i) \exp\left(-\frac{2\pi j}{N} k (m-i)\right) = \sum_{i=0}^{N-1} w(i)\, z_k(m-i)$$

Thus the kth frequency bin is a filtered version of z_k in which the filter has an impulse response of w(i). From the previous slide, this is a low-pass filter with a 6 dB bandwidth of a/N.

[Block diagram: x -> Frequency Shift by k/N -> z_k -> Low-pass filter (window) -> X(k;·)]
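The filter-bank viewpoint can be checked with a small sketch: shift the signal down by k/N, filter it with the window as an FIR impulse response, and compare one output sample with the direct sum. The signal, window length and bin index below are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(200)                  # arbitrary real test signal
N, k = 32, 5                                  # assumed window length and bin
w = np.hamming(N)

r = np.arange(len(x))
z_k = x * np.exp(-2j * np.pi * k * r / N)     # x shifted down in frequency by k/N
bin_k = np.convolve(z_k, w)[:len(x)]          # FIR filter with impulse response w(i)

# Direct evaluation of X(k; m) at one time index for comparison
m = 100
i = np.arange(N)
direct = np.sum(w * x[m - i] * np.exp(-2j * np.pi * k * (m - i) / N))
err = abs(bin_k[m] - direct)
print(err)
```

Each element of `bin_k` is therefore the time history of one frequency bin, which is what a horizontal slice through a spectrogram shows.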
/aI/ from "my" with 45 Hz and 300 Hz bandwidth spectrograms
[Figure: two spectrograms of the same segment, one with BW = 45 Hz (NT = 44 ms) and one with BW = 300 Hz (NT = 7 ms); frequency axis in Hz, time axis in seconds.]

Horizontal slice through spectrogram (k/NT = 3.5 kHz): 300 Hz gives finer time resolution.
Vertical slice through spectrogram (mT = 0.1 s): 45 Hz gives finer frequency resolution.
[Figure: level in dB against time (s) and against frequency (Hz) for the 45 Hz, 44 ms and 300 Hz, 7 ms analyses.]

Uncertainty Principle

Duration of window = N/fs
Frequency resolution = a fs/N
- Equal amplitude frequency components with this separation will give distinct peaks.
- Most windows have a ≈ 2 (see earlier).
Time resolution = 2N/(a fs)
- Amplitude variations with this period will be attenuated by 6 dB.
For all window functions, the product of the time and frequency resolutions is equal to 2.
You cannot get good time resolution and good frequency resolution from the same spectrogram.
Linguistic analysis typically uses a window length of 10 to 20 ms. The transfer function of the vocal tract does not change significantly in this time.
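The trade-off between the two resolutions can be tabulated with a few lines of Python. This is a sketch under assumptions: the function name, the 8 kHz sample rate, the value a = 2 and the two window lengths are illustrative choices, picked so that the window durations come out at 44 ms and 7 ms.

```python
def resolutions(N, fs, a=2.0):
    """Return (window duration in s, frequency resolution in Hz, time resolution in s)."""
    duration = N / fs
    freq_res = a * fs / N
    time_res = 2 * N / (a * fs)
    return duration, freq_res, time_res


fs = 8000.0                       # assumed sample rate in Hz
for N in (352, 56):               # assumed lengths giving 44 ms and 7 ms windows
    duration, freq_res, time_res = resolutions(N, fs)
    print(N, duration, freq_res, time_res, freq_res * time_res)
```

Under these assumptions the long window gives a frequency resolution of roughly 45 Hz and the short one roughly 286 Hz, while the product of the two resolutions stays at 2 in both cases.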
Overlapping Windows

To keep all the information about time variation of spectral components, you need only sample the spectrum twice as fast as the spectral magnitudes are varying. Using normalised frequencies:
- the main-lobe (null-to-null) bandwidth for an N-point window = b/N
- significant variation of spectral components occurs at frequencies below b/(2N)
- we must sample the spectrum at a frequency of b/N
- the separation between spectral samples is the window width divided by b
b = 4 for a Hamming window, so successive windows are N/4 samples apart.
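The window spacing rule can be turned into a minimal spectrogram sketch. The function names, the random test signal and the window length of 64 are assumptions for illustration; `np.fft.rfft` returns only the N/2+1 non-redundant bins of a real segment, and the phase-shift term is omitted because only the power is kept.

```python
import numpy as np


def frame_hop(N, b):
    """Spacing between successive windows in samples: window width divided by b."""
    return max(1, N // b)


def spectrogram_frames(x, N, b=4):
    """Power spectra of Hamming-windowed segments spaced N/b samples apart."""
    hop = frame_hop(N, b)
    w = np.hamming(N)
    ends = range(N - 1, len(x), hop)          # each window ends on sample m
    return np.array([np.abs(np.fft.rfft(w * x[m - N + 1:m + 1])) ** 2
                     for m in ends])


rng = np.random.default_rng(5)
x = rng.standard_normal(1024)                 # stand-in for a speech signal
S = spectrogram_frames(x, N=64)               # b = 4 for a Hamming window
print(frame_hop(64, 4), S.shape)
```

With b = 4 each window overlaps its predecessor by three quarters of its length.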
/A/ (ah) sung as an arpeggio
fx = f0, 1.25f0, 1.5f0, 2f0
50 Hz Bandwidth: Note harmonic spacing increases + fx warbles
300 Hz Bandwidth: Note constant formant frequencies
DFT Properties

$$X_k = \sum_{m=0}^{N-1} x_m \exp\left(-\frac{2\pi j}{N} k m\right) = X(z) \text{ evaluated at } z = \exp\left(\frac{2\pi j}{N} k\right)$$

The DFT can be interpreted as:
- Exact line-spectrum of a periodic signal {xm}
- Sampled continuous spectrum of zero-extended {xm}
- Sampled continuous spectrum of infinite {xm} convolved with spectrum of rectangular window

FFT is an algorithm for calculating DFT in time proportional to N log N.

Energy Conservation (Parseval's theorem)

$$E_x = \sum_{m=0}^{N-1} |x_m|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X_k|^2$$

Symmetries

{xm}                                  {Xk}
Discrete                              Periodic: X_{k+N} = X_k
Real                                  Hermitian: X_k = X*_{-k}
Periodic: x_{m+N/r} = x_m             Discrete: X_k = 0 for k ≠ ir
Skew Periodic: x_{m+N/2r} = -x_m      Odd Harmonics: X_k = 0 for k ≠ (2i+1)r
Even: x_m = x_{N-m}                   Real
Odd: x_m = -x_{N-m}                   Purely Imaginary
Real & Even                           Real & Even
Real & Odd                            Purely Imaginary and Odd
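Several of these DFT properties are easy to confirm numerically. The sketch below uses `numpy.fft.fft`, which follows the same unnormalised convention as the definition of X_k; the random sequence and its length of 32 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(32)                   # real sequence {x_m}
N = len(x)
X = np.fft.fft(x)                             # X_k = sum_m x_m exp(-2j*pi*k*m/N)

# Energy conservation (Parseval's theorem)
energy_time = np.sum(x ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / N

# Real {x_m} gives a Hermitian spectrum: X_k = X*_{-k} = X*_{N-k}
hermitian_err = np.max(np.abs(X[1:] - np.conj(X[:0:-1])))

# Build a real and even sequence (x_m = x_{N-m}); its DFT is real and even
x_even = np.concatenate(([x[0]], x[1:16], [x[16]], x[15:0:-1]))
X_even = np.fft.fft(x_even)
max_imag = np.max(np.abs(X_even.imag))
print(energy_time, energy_freq, hermitian_err, max_imag)
```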
