
This lecture is concerned with the spectrogram as a representation of how the frequency components within a signal vary with time.
- Define what we mean by normalised time and frequency
- Define the short-term discrete Fourier transform
- Look at the effect of different window lengths on time and frequency resolution
- Derive a quantitative description of the frequency resolution of the short-term DFT and compare the performance of common windows
- Derive a quantitative description of the time resolution of the short-term DFT
- Explain the uncertainty principle and illustrate its effects with some examples
- Review some of the properties of the DFT

Time-Frequency Representation

E.4.14 Speech Processing

Normalised Time

With sampled-data systems it is customary to change the time scale and work in units of the sample period.
- In normalised units the sampling frequency (fs) and period (T) both equal 1. They can therefore be omitted from equations: any such omissions can be deduced using dimensional consistency arguments.
- The Nyquist frequency is 0.5 Hz = π radians/second.
- To convert back to real units:
  - any time quantity must be multiplied by the real sample period (or divided by the real sample frequency)
  - any frequency or angular frequency quantity must be multiplied by the real sample frequency (or divided by the real sample period)

TIMEFREQ.PPT(15/04/2002)
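The conversion rules for normalised units can be sketched in a few lines of Python. The function names and the 8 kHz sample rate in the usage note are illustrative assumptions, not part of the lecture.

```python
import math

def to_real_time(t_norm, fs):
    """Time in samples -> seconds: divide by the real sample frequency."""
    return t_norm / fs

def to_real_frequency(f_norm, fs):
    """Normalised Hz (cycles per sample) -> real Hz: multiply by fs."""
    return f_norm * fs

def to_real_angular(omega_norm, fs):
    """Normalised rad/s (radians per sample) -> real rad/s: multiply by fs."""
    return omega_norm * fs

# Assumed sample rate, for illustration only.
FS = 8000.0
nyquist_hz = to_real_frequency(0.5, FS)        # normalised Nyquist frequency is 0.5
nyquist_rad = to_real_angular(math.pi, FS)     # equivalently pi radians per sample
```

With the assumed 8 kHz sample rate, the normalised Nyquist frequency of 0.5 maps to 4000 Hz, and a window of 401 samples lasts 401/8000 seconds.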


Lecture 3


Spectrogram

The spectrogram shows the energy in a signal at each frequency and at each time. We calculate this by evaluating the short-term discrete Fourier transform.

[Figure: spectrogram of "my speech"; vertical axis frequency (Hz), intensity in dB. Dark areas of the spectrogram show high intensity.]
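As a rough illustration of how such a picture is computed, the sketch below frames a signal, applies a Hamming window, and takes the power of each frame's DFT in dB. The function name `spectrogram_db` and the default values of `N`, `hop` and `nfft` are assumptions chosen for the example. Only magnitudes are kept, so any phase convention of the short-term DFT does not affect the result.

```python
import numpy as np

def spectrogram_db(x, N=64, hop=16, nfft=256):
    """Return power in dB, one row per frame and one column per frequency."""
    w = np.hamming(N)
    starts = range(0, len(x) - N + 1, hop)           # first sample of each frame
    frames = np.array([x[s:s + N] * w for s in starts])
    power = np.abs(np.fft.rfft(frames, n=nfft, axis=1)) ** 2
    return 10 * np.log10(power + 1e-12)              # small offset avoids log(0)

# Test signal: a sine wave at 0.125 normalised Hz.
n = np.arange(512)
S = spectrogram_db(np.sin(2 * np.pi * 0.125 * n))
```

Each row of `S` is one vertical slice of the spectrogram. For the test sine the peak in every row lies near column 0.125 * nfft = 32.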


Short-Term Discrete Fourier Transform

We often want to estimate the power spectrum of a non-stationary signal at a particular instant of time.

Multiply by a finite length window and take the DFT.

For a window of length N ending on sample m we have:

$$X(k;m) = \sum_{i=0}^{N-1} w(i)\,x(m-i)\exp\left(-\frac{2\pi j}{N}k(m-i)\right)$$   [k=frequency, m=time]

Notes:
- $|X(k;m)|^2$ gives the power at a frequency of k/N Hz (normalised) for a window centred at m−½(N−1).
- The (m−i) term in the exponent means that the phase origin remains consistent by cancelling out the linear phase shift introduced by a delay of m samples.
- The window samples are numbered backwards in time (for convenience later), hence the summation is performed backwards in time.
- The values X(k;m) are based on the N signal values from m−N+1 to m.
- The frequency resolution is 1/N Hz (normalised units).
- The spectrum is periodic and (since w and x are real) conjugate symmetric: $X(k) = X(k+N) = X^*(-k)$
- There are only ½N+1 independent frequency values.
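A direct, unoptimised evaluation of the definition can help check the notes. This is a minimal sketch: the function name `stdft` is an assumption, and it requires m ≥ N−1 so that all N signal values exist.

```python
import numpy as np

def stdft(x, w, m):
    """X(k;m) = sum_i w(i) x(m-i) exp(-2*pi*j*k*(m-i)/N) for k = 0..N-1."""
    N = len(w)
    i = np.arange(N)                       # window index, counted backwards in time
    k = np.arange(N).reshape(-1, 1)        # one row per frequency index
    seg = w * x[m - i]                     # w(i) x(m-i)
    return (seg * np.exp(-2j * np.pi * k * (m - i) / N)).sum(axis=1)

# Real cosine at 5/32 normalised Hz, Hamming window of length 32 ending at m = 63.
n = np.arange(128)
x = np.cos(2 * np.pi * 5 * n / 32)
X = stdft(x, np.hamming(32), m=63)
```

For this real signal the result is conjugate symmetric, so only the values for k = 0..16 are independent, and the largest of them is at k = 5.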

[Figure: waveform of /Is/ from "my speech" (samples 0 to 1000) with the window marked from i=400 to i=0 (m=600), and its spectrum in dB with formants f1 to f5.]

Hamming window of length 401 samples centred on 400 (N=401, m=600).

Line spectrum from larynx oscillation (at about 0.01 normalised Hz) is superimposed on vocal tract resonances.

Three different Hamming windows:

- Window of length N=401.
- Short window eliminates the fine detail (N=29).
- Zero padded window gives more spectrum points and an illusion of more detail (N=232=2916). This is the normal case for a spectrogram.

[Figure: spectrum in dB of the same data (samples 0 to 1000) for each of the three windows.]
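The effect of zero padding can be checked numerically. In the sketch below, `padded_spectrum` is an assumed helper name and the lengths 32 and 128 are arbitrary. Padding a short windowed segment gives a denser set of spectrum points, but every fourth point is identical to the unpadded DFT, so no new information is created.

```python
import numpy as np

def padded_spectrum(segment, nfft):
    """DFT of a Hamming-windowed segment, zero padded to nfft points."""
    w = np.hamming(len(segment))
    return np.fft.rfft(w * segment, n=nfft)

rng = np.random.default_rng(0)
segment = rng.standard_normal(32)          # stand-in for a short stretch of speech

coarse = padded_spectrum(segment, 32)      # no padding: 17 points
fine = padded_spectrum(segment, 128)       # padded by a factor of 4: 65 points
```

The extra points in `fine` interpolate between those in `coarse`; the underlying frequency resolution is still set by the 32-sample window.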

$$X(k;m) = \sum_{i=0}^{N-1} w(i)\,x(m-i)\exp\left(-\frac{2\pi j}{N}k(m-i)\right)$$

By setting r = N−1−i we can rewrite this as:

$$X(k;m) = \exp\left(-\frac{2\pi j(m-N+1)}{N}k\right)\sum_{r=0}^{N-1} y_m(r)\exp\left(-\frac{2\pi j}{N}kr\right)$$

where $y_m(r) = w(N-1-r)\,x(m-N+1+r)$

This is a standard DFT multiplied by a phase-shift term that is proportional to k: this compensates for the starting time of the window, m−N+1.

$y_m(r)$ is a product of two signals, so its DFT is the convolution of the DFTs of w(N−1−r) and x(m−N+1+r).
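The rewritten form maps directly onto a standard FFT call. The sketch below, with the assumed name `stdft_via_fft`, builds y_m(r) from the time-reversed window and the N most recent samples, then applies the phase-shift term.

```python
import numpy as np

def stdft_via_fft(x, w, m):
    """X(k;m) as a phase term times the standard DFT of y_m(r)."""
    N = len(w)
    y = w[::-1] * x[m - N + 1 : m + 1]     # y_m(r) = w(N-1-r) x(m-N+1+r)
    k = np.arange(N)
    phase = np.exp(-2j * np.pi * k * (m - N + 1) / N)
    return phase * np.fft.fft(y)

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
w = rng.random(16)                          # deliberately asymmetric test window
X_fft = stdft_via_fft(x, w, m=40)
```

An asymmetric window is used in the example so that a mistake in the time reversal would show up when comparing against the direct summation.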


The −6 dB and null-to-null (−∞ dB) normalised bandwidths of an N-point window are a/N and b/N respectively. Common windows and their values of a and b are shown below. In the formulae, $c_k = \cos(2\pi k(n - \tfrac{1}{2}N)/N)$.

Window                    w(n)                                        a      b    Sidelobe
Rectangular               1                                           1.21   2    −13 dB
Hanning                   0.5 + 0.5c1                                 1.65   4    −23 dB
Hamming                   0.54 + 0.46c1                               1.81   4    −43 dB
Blackman-Harris, 3 term   0.423 + 0.498c1 + 0.079c2                   1.81   6    −67 dB
Blackman-Harris, 4 term   0.359 + 0.488c1 + 0.141c2 + 0.012c3         2.72   8    −92 dB

[Figure: for each window (rectangle, Hanning, Hamming, Harris 3, Harris 4), the window shape against sample number and its magnitude response in dB (0 to −80 dB) against normalised frequency (−0.5 to 0.5).]
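The value of a for a given window can be estimated numerically from a heavily zero-padded DFT. This is a sketch under stated assumptions: `bandwidth_6db` is a hypothetical helper, the window length 64 is arbitrary, and `np.hamming` returns the symmetric form of the window, so its result can differ slightly from a tabulated figure.

```python
import numpy as np

def bandwidth_6db(w, oversample=256):
    """Two-sided -6 dB width of the window's main lobe, in units of 1/N."""
    N = len(w)
    mag = np.abs(np.fft.fft(w, N * oversample))
    half = mag[0] / 2                      # -6 dB is half the peak amplitude (at DC)
    idx = int(np.argmax(mag < half))       # first frequency sample below half amplitude
    return 2 * idx / oversample            # 2 * (idx / nfft) * N

a_rect = bandwidth_6db(np.ones(64))
a_hamming = bandwidth_6db(np.hamming(64))
```

For a 64-point window this gives roughly 1.21 for the rectangular window and a value close to 1.8 for the Hamming window.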

Time Resolution: Filter-Bank Viewpoint

Concentrate on one particular value of k and define:

$$z_k(r) = x(r)\exp\left(-\frac{2\pi j}{N}kr\right)$$

$z_k(r)$ is just the same as x(r) but shifted down in frequency by k/N. E.g. if x is a complex exponential at frequency f:

$$x(r) = \exp(2\pi jfr) \quad\Rightarrow\quad z_k(r) = \exp\left(2\pi j(f - k/N)r\right)$$

[Figure: spectrum X(jω) with a line at f, and spectrum Z_k(jω) with a line at f − k/N.]

Now we have:

$$X(k;m) = \sum_{i=0}^{N-1} w(i)\,x(m-i)\exp\left(-\frac{2\pi j}{N}k(m-i)\right) = \sum_{i=0}^{N-1} w(i)\,z_k(m-i)$$

Thus the kth frequency bin is a filtered version of $z_k$ in which the filter has an impulse response of w(i). From the previous slide, this is a low-pass filter with a 6 dB bandwidth of a/N.

[Block diagram: x → frequency shift by −k/N → z_k → low-pass filter (window) → X(k;·)]
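The filter-bank view can be sketched as a frequency shift followed by an ordinary convolution with the window. The name `stdft_bin_as_filter` is an assumption. Outputs for m < N−1 correspond to a window that is only partly filled, so only m ≥ N−1 matches the definition of X(k;m).

```python
import numpy as np

def stdft_bin_as_filter(x, w, k):
    """Return X(k;m) for every m by filtering z_k with impulse response w."""
    N = len(w)
    r = np.arange(len(x))
    z = x * np.exp(-2j * np.pi * k * r / N)    # shift x down in frequency by k/N
    return np.convolve(z, w)[: len(x)]         # out[m] = sum_i w(i) z_k(m-i)

rng = np.random.default_rng(2)
x = rng.standard_normal(200)
w = np.hamming(16)
track = stdft_bin_as_filter(x, w, k=3)         # time history of frequency bin 3
```

Computing one bin this way makes it clear why each row of a spectrogram can only vary as fast as the window's low-pass response allows.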


/aI/ from "my" with 45 Hz and 300 Hz bandwidth spectrograms

[Figure: two spectrograms of /aI/, 0 to 0.25 s and 0 to 6000 Hz: BW = 45 Hz (NT = 44 ms) and BW = 300 Hz (NT = 7 ms).]

Vertical slice through spectrogram (mT = 0.1 s): 45 Hz gives finer frequency resolution.
[Figure: level in dB against frequency (Hz) for the 300 Hz, 7 ms and 45 Hz, 44 ms windows.]

Horizontal slice through spectrogram (k/NT = 3.5 kHz): 300 Hz gives finer time resolution.
[Figure: level in dB against time (s) for the 300 Hz, 7 ms and 45 Hz, 44 ms windows.]

Uncertainty Principle

- Duration of window = N/fs
- Most windows have a ≈ 2 (see earlier)
- Frequency resolution = a·fs/N. Equal amplitude frequency components with this separation will give distinct peaks.
- Time resolution = 2N/(a·fs). Amplitude variations with this period will be attenuated by 6 dB.
- For all window functions, the product of the time and frequency resolutions is equal to 2.
- You cannot get good time resolution and good frequency resolution from the same spectrogram.
- Linguistic analysis typically uses a window length of 10 to 20 ms. The transfer function of the vocal tract does not change significantly in this time.
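The resolution formulas are easy to evaluate for a given window. In this sketch the function name `resolutions`, the 16 kHz sample rate and the window lengths are assumptions chosen so that the durations come out at 44 ms and 7 ms, and a is taken as exactly 2.

```python
def resolutions(N, fs, a=2.0):
    """Return (window duration in s, frequency resolution in Hz, time resolution in s)."""
    duration = N / fs
    freq_res = a * fs / N
    time_res = 2 * N / (a * fs)
    return duration, freq_res, time_res

FS = 16000                                  # assumed sample rate, for illustration
long_window = resolutions(704, FS)          # 44 ms window
short_window = resolutions(112, FS)         # 7 ms window
```

With a = 2 the 44 ms window resolves about 45 Hz and the 7 ms window about 286 Hz, while the product of time and frequency resolution is 2 in both cases.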

Overlapping Windows

To keep all the information about time variation of spectral components, you need only sample the spectrum twice as fast as the spectral magnitudes are varying. Using normalised frequencies:
- Null-to-null (−∞ dB) bandwidth for an N-point window = b/N
  - b = 4 for a Hamming window
- Significant variation of spectral components occurs at frequencies below ½b/N
- We must sample the spectrum at a frequency of b/N
- The separation between spectral samples is the window width divided by b
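In practice this rule sets the hop between successive windows. A minimal sketch, in which `window_ends` is an assumed helper name that lists the end sample m of each window for a hop of N/b samples:

```python
def window_ends(n_samples, N, b=4):
    """End samples m of successive length-N windows spaced N/b samples apart."""
    hop = max(1, N // b)                    # window width divided by b
    return list(range(N - 1, n_samples, hop))

ends = window_ends(100, 16, b=4)            # Hamming window: b = 4, so hop = 4
```

For a Hamming window (b = 4) consecutive windows therefore overlap by three quarters of their length.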

/A/ (ah) sung as an arpeggio

[Figure: two spectrograms of the sung vowel, with the pitch fx stepping through f0, 1.25f0, 1.5f0 and 2f0.]

300 Hz Bandwidth: Note constant formant frequencies
50 Hz Bandwidth: Note harmonic spacing increases + fx warbles

DFT Properties

$$X_k = \sum_{m=0}^{N-1} x_m \exp\left(-\frac{2\pi j}{N}km\right) = X(z) \text{ evaluated at } z = \exp\left(2\pi j\frac{k}{N}\right)$$

The DFT is:
- Exact line-spectrum of a periodic signal {x_m}
- Sampled continuous spectrum of zero-extended {x_m}
- Sampled continuous spectrum of infinite {x_m} convolved with spectrum of rectangular window

FFT is an algorithm for calculating the DFT in time ∝ N log N.

Energy Conservation (Parseval's theorem)

$$E_x = \sum_{m=0}^{N-1} |x_m|^2 = \frac{1}{N}\sum_{k=0}^{N-1} |X_k|^2$$

Symmetries {x_m} ⇔ {X_k}

{x_m}                                   {X_k}
Discrete                                Periodic: X_{k+N} = X_k
Real                                    Hermitian: X_{−k} = X*_k
Periodic: x_{m+N/r} = x_m               Discrete: X_k = 0 for k ≠ ir
Skew Periodic: x_{m+N/2r} = −x_m        Odd Harmonics: X_k = 0 for k ≠ (2i+1)r
Even: x_m = x_{N−m}                     Real
Odd: x_m = −x_{N−m}                     Purely Imaginary
Real & Even                             Real & Even
Real & Odd                              Purely Imaginary and Odd
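Two of these properties can be checked with NumPy, whose `np.fft.fft` uses the same unnormalised forward definition as above. The helper names `parseval_energies` and `is_hermitian` are assumptions for the example.

```python
import numpy as np

def parseval_energies(x):
    """Return (sum |x_m|^2, (1/N) sum |X_k|^2); the two should agree."""
    X = np.fft.fft(x)
    return np.sum(np.abs(x) ** 2), np.sum(np.abs(X) ** 2) / len(x)

def is_hermitian(X):
    """True if X_{-k} = conj(X_k), using periodicity X_{-k} = X_{N-k}."""
    return bool(np.allclose(X[1:], np.conj(X[1:][::-1])))

rng = np.random.default_rng(3)
x = rng.standard_normal(16)                 # a real test signal
e_time, e_freq = parseval_energies(x)
hermitian = is_hermitian(np.fft.fft(x))
```

For the real test signal the two energies agree and the spectrum is Hermitian; a complex signal would in general fail the second check.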
