Lecture overview
- Overview of the sound processing part of the module
- Sound processing coursework
- Speech processing applications
- Analysis of speech signals
  - Time domain
  - Frequency domain
  - Time-frequency domain (spectrogram)
Sound processing coursework:
- Design and implementation of feature extraction
- Training of a speech recogniser
- Evaluation of the speech recogniser
Aim: combine the use of commercial tools and toolkits with our own implementation in MATLAB
HTK (Hidden Markov Model Toolkit): http://htk.eng.cam.ac.uk/
Speech processing
The sound part of the module will concentrate mainly on speech processing. We will study core signal processing techniques:
- Spectral analysis and spectrograms
- Fourier transform
- Acoustic and articulatory phonetics
- Feature extraction
- Acoustic modelling
- Language modelling
- Classification
- Filtering
The speech recognition market was worth $40 billion in 2010, with growth of 8.8% per year expected. Signal processing is even more wide-ranging, e.g. acoustics, sonar, medical applications, image recognition/processing, ...
Speech analysis
Audio signals are perceived and (in the case of speech) understood by their frequency content, not by their waveform representation. The ear acts as a frequency analyser and feeds information about frequency content to the brain. Many speech processing applications, such as speech recognition, coding, synthesis and enhancement, require the frequency content of signals. Frequency analysis is also important for other signals such as images and radio-frequency signals: in fact, for any information-bearing signal. To highlight this, we will now compare the time-domain and frequency-domain representations of signals to see what information can be obtained from each.
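To make the contrast between waveform and frequency content concrete, here is a small Python sketch (Python is used only for illustration; the same steps translate directly to MATLAB). It builds two signals from the same two sinusoids, changing only the phase of the second component. The sampled waveforms differ, but their DFT magnitude spectra are identical. The sample rate, tone frequencies and the helper `dft` are illustrative assumptions, and the DFT is computed directly from its definition rather than with an FFT library.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

fs = 8000           # assumed sample rate in Hz
N = 80              # 10 ms of signal at 8 kHz, so bins are 100 Hz apart
f1, f2 = 500, 1000  # assumed tone frequencies; both fall exactly on DFT bins

# Same two components; in signal b the second one is shifted by 90 degrees.
a = [math.sin(2 * math.pi * f1 * n / fs) + 0.5 * math.sin(2 * math.pi * f2 * n / fs)
     for n in range(N)]
b = [math.sin(2 * math.pi * f1 * n / fs) + 0.5 * math.cos(2 * math.pi * f2 * n / fs)
     for n in range(N)]

mag_a = [abs(X) for X in dft(a)]
mag_b = [abs(X) for X in dft(b)]

waveform_diff = max(abs(p - q) for p, q in zip(a, b))
spectrum_diff = max(abs(p - q) for p, q in zip(mag_a, mag_b))
print(f"max waveform difference: {waveform_diff:.3f}")
print(f"max magnitude-spectrum difference: {spectrum_diff:.2e}")
```

Both signals contain the same frequencies at the same strengths, which is exactly what the magnitude spectrum captures, even though the two waveforms differ sample by sample.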
[Figure: 30 ms signal segment]
So we need a frequency analysis technique that can analyse short periods of signal. This is the Discrete Fourier Transform (DFT): discrete because it operates on a sampled (not continuous) signal and gives the spectrum at a discrete set of frequencies.
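As a sketch of short-time analysis, the Python below takes a single 30 ms frame of a sampled signal and applies the DFT, X(k) = sum over n = 0..N-1 of x(n) e^(-j2πkn/N), where N is the number of samples in the frame and samples are indexed from n = 0. The 8 kHz sample rate, the 1 kHz test tone and the helper `dft` are assumptions for illustration. With these values the frame holds N = 240 samples and the DFT bins are spaced fs/N ≈ 33.3 Hz apart, so the tone falls in bin 30.

```python
import cmath
import math

fs = 8000                    # assumed sample rate (Hz)
frame_ms = 30                # analysis frame length (ms)
N = fs * frame_ms // 1000    # samples per frame

# Assumed test signal: one 30 ms frame of a 1 kHz tone.
f0 = 1000
frame = [math.sin(2 * math.pi * f0 * n / fs) for n in range(N)]

def dft(x):
    """Direct DFT from the definition (an FFT gives the same result faster)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

X = dft(frame)
bin_spacing = fs / N         # frequency resolution in Hz per bin

# For a real signal only bins 0..N/2 are unique; the rest mirror them.
half = [abs(v) for v in X[: N // 2 + 1]]
peak_bin = max(range(len(half)), key=half.__getitem__)
print(f"N = {N}, bin spacing = {bin_spacing:.1f} Hz")
print(f"peak at bin {peak_bin} -> {peak_bin * bin_spacing:.0f} Hz")
```

Shortening the frame widens the bin spacing fs/N, so the choice of frame length trades frequency resolution against the ability to follow rapid changes in the signal.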
[Figure: waveform sampled every Ts seconds along the time axis]
- Sampling: take samples of the waveform every Ts seconds
- Quantisation: allocate sample amplitudes to the nearest quantisation levels
- The discrete-time, discrete-amplitude signal can be represented as a vector, x(n):
  x = [1, 3, 7, 7, 5, 3, 3, 3, 3, 1, -3, -7, -7]
  so x(1) = 1; x(2) = 3; ... x(11) = -3, ...
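The sampling and quantisation steps can be sketched in Python as follows. The continuous waveform is stood in for by a hypothetical function `analog`, sampled every `Ts` seconds and rounded to the nearest integer quantisation level; the waveform, Ts and the step size of 1 are assumptions, so the result is not the vector above. The x(n) notation counts from 1 (as MATLAB does), whereas Python lists count from 0, so the hypothetical helper `x_of` maps x(1) to x_slide[0].

```python
import math

def analog(t):
    """Hypothetical continuous-time waveform: a 50 Hz sinusoid of amplitude 7."""
    return 7 * math.sin(2 * math.pi * 50 * t)

Ts = 0.001    # assumed sampling interval: 1 ms (fs = 1 kHz)
step = 1.0    # assumed spacing between quantisation levels

def quantise(value, step):
    """Map an amplitude to the nearest quantisation level (a multiple of step)."""
    return step * round(value / step)

# Sampling: evaluate the waveform at t = n*Ts. Quantisation: round each sample.
samples = [analog(n * Ts) for n in range(13)]
x = [int(quantise(s, step)) for s in samples]
print(x)

# The vector from the slide, read with the 1-based x(n) convention.
x_slide = [1, 3, 7, 7, 5, 3, 3, 3, 3, 1, -3, -7, -7]

def x_of(n):
    """Return x(n) using 1-based indexing, as in the slide's notation."""
    return x_slide[n - 1]

print(x_of(1), x_of(2), x_of(11))
```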
Example utterance: "A tanker is a ship designed to carry large volumes of oil or other liquid cargo"
What does the time-domain waveform show?
- Duration of the utterance
- A guide to energy: speech or non-speech
- Possibly an indication of voicing: voiced or unvoiced
- Quite limited in terms of the detail shown
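The energy and voicing cues can be quantified with two classic time-domain measures computed over short frames: short-time energy (high for speech, near zero for silence) and zero-crossing rate (typically low for voiced sounds, high for noise-like unvoiced sounds such as "s"). These measures are not named on the slide; the Python sketch below, with its assumed 20 ms frames, synthetic test signal and hypothetical helpers `frames`, `short_time_energy` and `zero_crossing_rate`, is only an illustration, and any speech/non-speech threshold would need tuning on real recordings.

```python
import math
import random

def frames(x, frame_len):
    """Split a signal into consecutive non-overlapping frames."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]

def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(s * s for s in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

fs = 8000              # assumed sample rate (Hz)
frame_len = fs // 50   # 20 ms frames -> 160 samples

random.seed(0)
# Synthetic stand-ins: silence, a low-frequency "voiced" tone, "unvoiced" noise.
silence = [0.0] * frame_len
voiced = [math.sin(2 * math.pi * 150 * n / fs) for n in range(frame_len)]
unvoiced = [random.uniform(-0.3, 0.3) for _ in range(frame_len)]
signal = silence + voiced + unvoiced

for name, f in zip(["silence", "voiced", "unvoiced"], frames(signal, frame_len)):
    print(f"{name:8s} energy={short_time_energy(f):7.2f} zcr={zero_crossing_rate(f):.2f}")
```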
[Figure: DFT analysis]
[Figure: spectrogram (frequency against time)]
Summary
- Considered methods for analysing the characteristics and features of an audio signal
- Observed that real signals (e.g. speech) are not stationary but can vary rapidly over time
- The time domain can show this variation
- The frequency domain usually provides more information but requires a quasi-stationary signal for analysis
- One solution is a time-frequency representation (spectrogram)
- We will return to spectrograms when we study acoustic phonetics
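A spectrogram can be sketched as repeated short-time DFTs: slide a short window along the signal, take the DFT magnitude of each frame, and stack the results so that each column is one instant and each row one frequency. The Python below assumes an 8 kHz rate, 32-sample (4 ms) frames with 50% overlap, a Hamming window, and a synthetic signal whose frequency jumps from 1 kHz to 2 kHz halfway through; these values and the helpers `dft` and `spectrogram` are illustrative choices, not the module's settings, and a practical implementation would use an FFT.

```python
import cmath
import math

def dft(x):
    """Direct DFT from the definition."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def spectrogram(x, frame_len, hop):
    """Magnitude of the DFT of successive Hamming-windowed frames.

    Returns a list of columns (one per frame); each column holds the
    magnitudes of bins 0..frame_len//2.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    columns = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(x[start:start + frame_len], window)]
        columns.append([abs(v) for v in dft(frame)[: frame_len // 2 + 1]])
    return columns

fs = 8000
# Assumed test signal: 1 kHz for the first 128 samples, then 2 kHz.
x = [math.sin(2 * math.pi * (1000 if n < 128 else 2000) * n / fs) for n in range(256)]

S = spectrogram(x, frame_len=32, hop=16)
bin_hz = fs / 32
for t, col in enumerate(S):
    peak = max(range(len(col)), key=col.__getitem__)
    print(f"frame {t:2d} (starts {t * 16 / fs * 1000:4.1f} ms): peak at {peak * bin_hz:.0f} Hz")
```

Each printed line reports the dominant frequency of one frame, so the jump from 1 kHz to 2 kHz shows up as a change between columns; a single DFT over the whole signal would contain both peaks but could not say when the change happened.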