Lecture 1 Speech Analysis

Lecture 1 Overview of Sound
Lecture overview
Overview of sound processing part of module
Sound processing coursework
Speech processing applications
Analysis of speech signals
  Time-domain
  Frequency-domain
  Time-frequency domain (spectrogram)
Sound processing component overview

Lecture 1 Analysis of sound signals
Lecture 2 Fourier transform
Lecture 3 Acoustic phonetics
Lecture 4 Articulatory phonetics
Lecture 5 Speech recognition I - overview
Lecture 6 Speech recognition II - feature extraction
Lecture 7 Speech recognition III - acoustic modelling
Lecture 8 Speech recognition IV - language modelling
Lecture 9 Speech enhancement
Lecture 10 Case study: Formula 1 motor racing
Sound processing coursework

Design, implementation and testing of a speech recogniser capable of providing voice dialling for the class of CMPE3I07 in noisy conditions
Coursework to be undertaken in pairs
Two parts to assessment:
  Technical report outlining design and evaluation of speech recogniser
  Practical demonstration
Tasks to be carried out:
  Speech collection
  Speech labelling
  Design and implementation of feature extraction
  Training of speech recogniser
  Evaluation of speech recogniser
Tools:
  Speech Filing System (SFS) http://www.phon.ucl.ac.uk/resource/sfs/
  HMM Toolkit (HTK) http://htk.eng.cam.ac.uk/
  MATLAB
Aim to combine use of commercial tools and toolkits and own implementation through MATLAB
Speech processing
Sound part of module will concentrate mainly on speech processing
Will study core signal processing techniques:
  Spectral analysis and spectrograms
  Fourier transform
  Acoustic and articulatory phonetics
  Feature extraction
  Acoustic modelling
  Language modelling
  Classification
  Filtering
Examine them in the context of speech processing applications:
  Speech recognition
  Speech enhancement
  Speech synthesis
Work in speech processing requires a wide range of skills and knowledge
Where are the jobs in speech processing?

Speech (and signal) processing is found in a very broad range of applications, services and products
Many companies involved in speech processing
  Specific speech processing companies, e.g. Nuance, SRC, ...
  Computer companies: Apple, IBM, Microsoft, ...
  Internet companies: Google, Yahoo, Skype, ...
  Mobile phone companies and providers: Nokia, Motorola, ...
  Telcos: BT, AT&T, France Telecom, Deutsche Telekom, ...
  Plus many other smaller companies in a range of areas
Speech recognition market worth $40 billion in 2010, growth of 8.8% per year expected
Signal processing even more wide ranging, e.g. acoustics, sonar, medical, image recognition/processing, ...
Speech analysis
Audio signals are perceived and understood (in the case of speech) by their frequency content, not by their waveform representation
The ear acts as a frequency analyser and feeds information about frequency content to the brain
In many speech processing applications the frequency content of signals is required: speech recognition, coding, synthesis, enhancement, etc.
However, frequency analysis is also important for other signals such as images and radio-frequency signals: in fact, for any information-bearing signal
To highlight this, we will now compare the time-domain and frequency-domain representations of signals to see what information can be obtained from each
Time-varying nature of speech

A speech signal changes constantly as different speech sounds are made
For speech recognition, synthesis and coding applications the frequency content of the signal needs to be measured every 10-50 ms (pseudo-stationary regions)
[Figure: speech waveform with a 500 ms span and a 30 ms span marked]
So we need a frequency analysis technique that can analyse short periods of signal
This is the Discrete Fourier Transform (DFT) (discrete because the signal is sampled, not continuous)
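The idea of analysing short, pseudo-stationary regions can be sketched as a framing step that cuts a sampled signal into fixed-length pieces. This is only an illustrative sketch: the function name `split_into_frames`, the 8 kHz sampling rate and the placeholder signal are assumptions, not part of the lecture material.

```python
def split_into_frames(x, fs, frame_ms=30.0, hop_ms=None):
    """Split samples x (sampled at fs Hz) into short frames.

    frame_ms is the frame duration; hop_ms is the step between frame starts
    (defaults to frame_ms, i.e. no overlap). Trailing samples that do not
    fill a whole frame are dropped.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = frame_len if hop_ms is None else int(round(fs * hop_ms / 1000.0))
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame and hop must be at least one sample long")
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


fs = 8000                   # assumed sampling rate in Hz
signal = [0.0] * (fs // 2)  # placeholder for 500 ms of speech samples
frames = split_into_frames(signal, fs, frame_ms=30.0)
overlapping = split_into_frames(signal, fs, frame_ms=30.0, hop_ms=10.0)
```

Each element of `frames` is one 30 ms region that can be passed to a frequency analysis on its own. A smaller hop gives overlapping frames, so the spectrum is measured more often.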
Digital speech sampling and quantisation

Take analogue speech signal: continuous in amplitude, continuous in time
[Figure: waveform, amplitude against time, with sampling period Ts and quantisation levels q1 to q8 marked]
Sampling: take samples of the waveform every Ts seconds
Quantisation: allocate sample amplitudes to nearest quantisation levels
Can represent the discrete-time, discrete-amplitude signal as a vector, x(n)
x = [1, 3, 7, 7, 5, 3, 3, 3, 3, 1, -3, -7, -7]
So, x(1) = 1; x(2) = 3; ... x(11) = -3, ...
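A minimal sketch of the two steps, sampling every Ts seconds and then rounding each sample to the nearest quantisation level. The eight levels used here (odd integers from -7 to 7) are an assumption chosen to match the values in the example vector, and the 100 Hz sine wave stands in for the analogue speech signal. Note that Python lists are indexed from 0, whereas x(1) above is the first sample.

```python
import math

def sample(signal, duration_s, ts):
    """Evaluate a continuous-time function every ts seconds."""
    n_samples = int(round(duration_s / ts))
    return [signal(k * ts) for k in range(n_samples)]

def quantise(samples, levels):
    """Map each sample amplitude to the nearest quantisation level."""
    return [min(levels, key=lambda q: abs(q - s)) for s in samples]


LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]  # assumed levels q1..q8

def analogue(t):
    # Stand-in for the analogue speech signal: a 100 Hz sine, amplitude 7.
    return 7.0 * math.sin(2 * math.pi * 100.0 * t)

ts = 0.001  # sampling period Ts in seconds (1 kHz sampling rate, assumed)
x = quantise(sample(analogue, 0.01, ts), LEVELS)
```

The result `x` is a discrete-time, discrete-amplitude vector of the same kind as the example above.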
Time-domain analysis of speech

Examine a time-domain waveform of a sentence of speech
  x-axis shows time (seconds or samples)
  y-axis shows amplitude of each sample
[Figure: waveform of the utterance "A tanker is a ship designed to carry large volumes of oil or other liquid cargo"]
What does it show?
  Duration of utterance
  Guide to energy: speech or non-speech
  Maybe indication of voicing: voiced or unvoiced
  Quite limited in terms of detail shown
Now zoom in to look at a small section of the utterance
  This shows more detail
  Speech/non-speech, voiced/unvoiced more clearly identified
  Still limited: cannot identify actual sound (phoneme)
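The "guide to energy" reading of the waveform can be illustrated with a short-time energy measure, computed directly from the time-domain samples. This is a sketch under stated assumptions: the synthetic signal (silence, a 200 Hz tone, silence), the 20 ms frame length and the threshold value are all hypothetical, and a real speech/non-speech decision would need more care.

```python
import math

def short_time_energy(x, frame_len):
    """Mean squared amplitude of each non-overlapping frame of x."""
    return [
        sum(s * s for s in x[i:i + frame_len]) / frame_len
        for i in range(0, len(x) - frame_len + 1, frame_len)
    ]


fs = 8000  # assumed sampling rate in Hz
silence = [0.0] * 800  # 100 ms of silence
tone = [0.5 * math.sin(2 * math.pi * 200.0 * n / fs) for n in range(800)]
x = silence + tone + silence

energy = short_time_energy(x, frame_len=160)  # 20 ms frames
threshold = 0.01                              # hypothetical decision threshold
is_active = [e > threshold for e in energy]
```

The frames covering the tone have high energy and the others do not, which is about as much as the time-domain view can say: it separates activity from silence but does not identify the sound.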
Frequency-domain analysis of speech

For frequency-domain analysis need to transform the time-domain signal into the frequency-domain
Several methods exist to do this, e.g. Fourier transform, filterbank
For signal processing applications most common is to use the Fourier transform
Fourier transform comes in different forms:
  Fourier transform
  Discrete Fourier transform (DFT)
  Fast Fourier transform (FFT) - fft function in MATLAB
[Diagram: time-domain signal -> DFT -> frequency-domain signal]

The frequency-domain representation is useful as it shows which frequencies are present in a signal and how much energy is present at each frequency
This is important for analysing and classifying signals and generating signals
[Figures: example time-domain signals and their DFT magnitude spectra]
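To make the transform concrete, here is a minimal sketch of the DFT written straight from its definition and applied to a short cosine. The names `dft` and `magnitude_spectrum`, the 8 kHz sampling rate and the 1 kHz test tone are assumptions for illustration. The direct sum is slow for long signals; an FFT routine such as MATLAB's `fft` computes the same result far faster.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform by direct summation (O(N^2))."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]

def magnitude_spectrum(x):
    """Magnitudes of bins 0..N/2, the non-redundant half for a real signal."""
    return [abs(c) for c in dft(x)[:len(x) // 2 + 1]]


fs = 8000          # assumed sampling rate in Hz
n_samples = 80     # 10 ms of signal, so bins are fs / n_samples = 100 Hz apart
x = [math.cos(2 * math.pi * 1000.0 * n / fs) for n in range(n_samples)]

mags = magnitude_spectrum(x)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * fs / n_samples
```

The magnitude spectrum has a single large value at the bin corresponding to 1000 Hz, which is the "which frequencies are present, and how strongly" information that the waveform does not show directly.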
Magnitude spectrum provides much more information
  Spectral envelope (phoneme sound)
  Harmonics (pitch)
  Energy
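As a rough illustration of reading harmonics from a magnitude spectrum, the sketch below builds a synthetic signal from three harmonics of 200 Hz, finds the spectral peaks and takes their spacing as a pitch estimate. The signal is not real speech, and the harmonic amplitudes, the peak-picking rule and its 10% floor are assumptions made for the example.

```python
import cmath
import math

def magnitude_spectrum(x):
    """Magnitudes of DFT bins 0..N/2, computed by direct summation."""
    n_samples = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
                for n in range(n_samples)))
        for k in range(n_samples // 2 + 1)
    ]


fs = 8000                     # assumed sampling rate in Hz
n_samples = 240               # one 30 ms frame
f0 = 200.0                    # fundamental frequency of the synthetic signal
amplitudes = [1.0, 0.6, 0.3]  # assumed amplitudes of harmonics 1, 2 and 3

x = [
    sum(a * math.sin(2 * math.pi * f0 * (h + 1) * n / fs)
        for h, a in enumerate(amplitudes))
    for n in range(n_samples)
]

mags = magnitude_spectrum(x)
floor = 0.1 * max(mags)  # ignore numerically tiny bins
peak_bins = [
    k for k in range(1, len(mags) - 1)
    if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] > mags[k + 1]
]
harmonics_hz = [k * fs / n_samples for k in peak_bins]
pitch_hz = harmonics_hz[1] - harmonics_hz[0]  # spacing between harmonics
```

The peak positions give the harmonics, their spacing gives the pitch, and the way the peak heights fall off gives a crude picture of the spectral envelope.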
Time-frequency analysis of speech

Time-domain analysis shows that speech signals change substantially over time: speech is a time-varying signal
Frequency-domain analysis enables us to see the frequency composition of signals, which is much more useful for analysis than the time-domain signal
However, frequency-domain analysis can only be performed on quasi-stationary portions of the signal, which are short by nature (10-50 ms)
Solution is time-frequency analysis, or spectrogram
Process to create a spectrogram:

1. Extract short-duration window of signal
2. Take DFT and obtain magnitude spectrum
3. Allocate spectral amplitudes different colours
4. Plot colours
5. Return to 1 until end of signal
[Figure: spectrogram, frequency against time, of "large volumes of oil or other liquid cargo"]
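The five steps can be sketched in a few lines. This is a simplified illustration rather than a production spectrogram: frames are cut with a plain rectangular window and no overlap, the "colours" are eight integer levels rendered as text characters, and the test signal (500 Hz followed by 1500 Hz) and all function names are assumptions.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Step 2: DFT magnitudes for bins 0..N/2 of one frame."""
    n_samples = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
                for n in range(n_samples)))
        for k in range(n_samples // 2 + 1)
    ]

def spectrogram(x, frame_len, hop):
    """Steps 1, 2 and 5: one magnitude spectrum per short window of x."""
    return [
        magnitude_spectrum(x[s:s + frame_len])
        for s in range(0, len(x) - frame_len + 1, hop)
    ]

def to_levels(spec, n_levels=8):
    """Step 3: map amplitudes to integer 'colour' levels 0..n_levels-1."""
    peak = max(max(row) for row in spec) or 1.0
    return [[min(n_levels - 1, int(n_levels * m / peak)) for m in row]
            for row in spec]


fs = 8000  # assumed sampling rate in Hz
# 40 ms of a 500 Hz tone followed by 40 ms of a 1500 Hz tone.
x = [math.sin(2 * math.pi * (500.0 if n < 320 else 1500.0) * n / fs)
     for n in range(640)]

spec = spectrogram(x, frame_len=80, hop=80)  # 10 ms frames, 100 Hz per bin
levels = to_levels(spec)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]

# Step 4: "plot" with time left to right and frequency bottom to top.
SHADES = " .:-=+*#"
picture = "\n".join(
    "".join(SHADES[levels[t][k]] for t in range(len(levels)))
    for k in reversed(range(len(levels[0])))
)
```

Calling `print(picture)` shows a dark band at 500 Hz for the first half of the signal and at 1500 Hz for the second half, a change that a single spectrum of the whole signal would not place in time.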
Time-frequency analysis of other signals

What characteristics does this signal have?
What could have produced this sound?

Compare to equivalent time-domain signal
  Very hard to identify any features: only duration and gradual increase in energy

What characteristics does this signal have?
What could have produced this sound?
Also information about the recording
Summary
Considered methods for analysing the characteristics and features of an audio signal
Observed that real signals (e.g. speech) are not stationary but can vary rapidly over time
Time-domain can show this variation
Frequency-domain usually provides more information but requires a quasi-stationary signal for analysis
One solution is time-frequency representation (spectrogram)
Will return to spectrograms when we study acoustic phonetics
