
Lecture 1 Overview of Sound


Lecture overview
- Overview of sound processing part of module
- Sound processing coursework
- Speech processing applications
- Analysis of speech signals
  - Time-domain
  - Frequency-domain
  - Time-frequency domain (spectrogram)

Sound processing component overview


- Lecture 1: Analysis of sound signals
- Lecture 2: Fourier transform
- Lecture 3: Acoustic phonetics
- Lecture 4: Articulatory phonetics
- Lecture 5: Speech recognition I - overview
- Lecture 6: Speech recognition II - feature extraction
- Lecture 7: Speech recognition III - acoustic modelling
- Lecture 8: Speech recognition IV - language modelling
- Lecture 9: Speech enhancement
- Lecture 10: Case study: Formula 1 motor racing

Sound processing coursework


- Design, implementation and testing of a speech recogniser capable of providing voice dialling for the class of CMPE3I07 in noisy conditions
- Coursework to be undertaken in pairs
- Two parts to assessment:
  - Technical report outlining design and evaluation of speech recogniser
  - Practical demonstration
- Tasks to be carried out:
  - Speech collection
  - Speech labelling
  - Design and implementation of feature extraction
  - Training of speech recogniser
  - Evaluation of speech recogniser
- Tools referenced:
  - Speech Filing System (SFS): http://www.phon.ucl.ac.uk/resource/sfs/
  - MATLAB
  - HMM Toolkit (HTK): http://htk.eng.cam.ac.uk/
- Aim to combine use of commercial tools and toolkits and own implementation through MATLAB

Speech processing
- Sound part of module will concentrate mainly on speech processing
- Will study core signal processing techniques:
  - Spectral analysis and spectrograms
  - Fourier transform
  - Acoustic and articulatory phonetics
  - Feature extraction
  - Acoustic modelling
  - Language modelling
  - Classification
  - Filtering
- Examine them in the context of speech processing applications:
  - Speech recognition
  - Speech enhancement
  - Speech synthesis
- Work in speech processing requires a wide range of skills and knowledge



Where are the jobs in speech processing?

- Speech (and signal) processing is found in a very broad range of applications, services and products
- Many companies involved in speech processing:
  - Specific speech processing companies, e.g. Nuance, SRC, ...
  - Computer companies: Apple, IBM, Microsoft, ...
  - Internet companies: Google, Yahoo, Skype, ...
  - Mobile phone companies and providers: Nokia, Motorola, ...
  - Telcos: BT, AT&T, France Telecom, Deutsche Telekom, ...
  - Plus many other smaller companies in a range of areas
- Speech recognition market worth $40 billion in 2010, growth of 8.8% per year expected
- Signal processing even more wide ranging, e.g. acoustics, sonar, medical, image recognition/processing, ...


Speech analysis
- Audio signals are perceived and understood (in the case of speech) by their frequency content, not by their waveform representation
- The ear acts as a frequency analyser and feeds information about frequency content to the brain
- In many speech processing applications the frequency content of signals is required: speech recognition, coding, synthesis, enhancement, etc.
- However, frequency analysis is also important for other signals such as images and radio-frequency signals: in fact, for any information-bearing signal
- To highlight this, we will now compare the time-domain and frequency-domain representations of signals to see what information can be obtained from each


Time-varying nature of speech


- A speech signal changes constantly as different speech sounds are made
- For speech recognition, synthesis and coding applications the frequency content of the signal needs to be measured every 10-50 ms (pseudo-stationary regions)
- [Figure: speech waveform; annotated durations: 500 ms and 30 ms]
- So we need a frequency analysis technique that can analyse short periods of signal
- This is the Discrete Fourier Transform (DFT) (discrete because the signal is sampled, not continuous)
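
As a rough illustration of cutting a signal into short pseudo-stationary sections, the sketch below splits a sampled signal into 30 ms frames. The 8 kHz sampling rate, the 10 ms step between frames, and the name `frame_signal` are assumptions made for this example; the slides only give the 10-50 ms range.

```python
def frame_signal(x, fs, frame_ms=30.0, hop_ms=10.0):
    """Split x into frames of frame_ms, starting a new frame every hop_ms."""
    frame_len = int(round(fs * frame_ms / 1000.0))  # frame length in samples
    hop_len = int(round(fs * hop_ms / 1000.0))      # step between frame starts
    frames = []
    start = 0
    while start + frame_len <= len(x):
        frames.append(x[start:start + frame_len])
        start += hop_len
    return frames


fs = 8000                  # assumed sampling rate in Hz (not given in the slides)
x = [0.0] * (fs // 2)      # placeholder 500 ms signal: 4000 samples
frames = frame_signal(x, fs)
print(len(frames[0]), "samples per frame,", len(frames), "frames")
```

Each frame can then be analysed on its own, on the assumption that the signal is roughly stationary within it.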


Digital speech sampling and quantisation


- Take analogue speech signal: continuous in amplitude, continuous in time
- [Figure: waveform, amplitude against time; amplitude axis marked with quantisation levels q1 to q8, time axis marked with sampling period Ts]
- Sampling: take samples of the waveform every Ts seconds
- Quantisation: allocate sample amplitudes to nearest quantisation levels
- Can represent the discrete-time, discrete-amplitude signal as a vector, x(n):
  x = [1, 3, 7, 7, 5, 3, 3, 3, 3, 1, -3, -7, -7]
- So, x(1) = 1; x(2) = 3; ... x(11) = -3, ...
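
The two steps can be sketched in a few lines of Python. This is only an illustration: the 100 Hz sine standing in for the analogue signal, the sampling period, and the eight odd-valued levels standing in for q1 to q8 are all assumptions chosen to resemble the vector above. Note also that Python indexes from 0, whereas the slide's x(1) notation is 1-based as in MATLAB.

```python
import math


def sample(signal, ts, n_samples):
    """Sampling: evaluate a continuous-time function every ts seconds."""
    return [signal(n * ts) for n in range(n_samples)]


def quantise(samples, levels):
    """Quantisation: map each sample to the nearest allowed level."""
    return [min(levels, key=lambda q: abs(q - s)) for s in samples]


def analogue(t):
    """Hypothetical analogue signal: 100 Hz sine with peak amplitude 7."""
    return 7.0 * math.sin(2 * math.pi * 100 * t)


TS = 1.0 / 1600                         # assumed sampling period in seconds
LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]   # eight assumed quantisation levels

x = quantise(sample(analogue, TS, 16), LEVELS)
print(x)        # discrete-time, discrete-amplitude signal
print(x[0])     # first sample: x(1) in the slide's 1-based notation
```

A sample that falls exactly midway between two levels is assigned to the lower one here; that tie-breaking rule is a choice of this sketch.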

Time-domain analysis of speech


- Examine a time-domain waveform of a sentence of speech
- x-axis shows time: seconds or samples
- y-axis shows amplitude of each sample
- What does it show?


Time-domain analysis of speech

What does it show?
- Duration of utterance
- Guide to energy: speech or non-speech
- Maybe indication of voicing: voiced or unvoiced
- Quite limited in terms of detail shown

[Figure: waveform of the utterance "A tanker is a ship designed to carry large volumes of oil or other liquid cargo"]
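
Two simple time-domain measurements are often used for these cues: short-time energy (speech or non-speech) and zero-crossing rate (a rough voicing indicator). Zero-crossing rate is not named in the slides, so treat it as an addition for illustration; the test signals below are synthetic stand-ins, not real speech.

```python
import math


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


fs = 8000   # assumed sampling rate in Hz
n = 240     # 30 ms frame at 8 kHz

# Synthetic stand-ins: strong low-frequency tone (voiced-like),
# weak high-frequency tone (unvoiced-like), and silence.
voiced = [math.sin(2 * math.pi * 100 * i / fs) for i in range(n)]
unvoiced = [0.1 * math.sin(2 * math.pi * 3000 * i / fs + 0.5) for i in range(n)]
silence = [0.0] * n

for name, frame in [("voiced", voiced), ("unvoiced", unvoiced), ("silence", silence)]:
    print(name, short_time_energy(frame), zero_crossing_rate(frame))
```

High energy with a low zero-crossing rate suggests voiced speech, low energy with a high rate suggests unvoiced speech; neither measure identifies the actual sound.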

Time-domain analysis of speech


- Now zoom in to look at a small section of the utterance
- This shows more detail
- Speech/non-speech, voiced/unvoiced more clearly identified
- Still limited: cannot identify actual sound (phoneme)


Frequency-domain analysis of speech


- For frequency-domain analysis, need to transform the time-domain signal into the frequency domain
- Several methods exist to do this, e.g. Fourier transform, filterbank
- For signal processing applications, most common is to use the Fourier transform
- Fourier transform comes in different forms:
  - Fourier transform
  - Discrete Fourier transform (DFT)
  - Fast Fourier transform (FFT): fft function in MATLAB

Time-domain signal -> DFT -> Frequency-domain signal
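
To make the transform concrete, here is a minimal sketch of the DFT written directly from its definition, X[k] = sum over n of x[n] exp(-j 2 pi k n / N). It is a slow O(N^2) version for illustration only; an FFT routine such as MATLAB's fft gives the same result far faster. The 1 kHz test tone, the 8 kHz sampling rate and the function names are assumptions made for this example.

```python
import cmath
import math


def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


def magnitude_spectrum(x):
    """Magnitude of each DFT coefficient."""
    return [abs(X) for X in dft(x)]


fs = 8000   # assumed sampling rate in Hz
N = 64      # number of samples analysed
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(N)]  # 1 kHz test tone

mags = magnitude_spectrum(x)
peak_bin = max(range(N // 2), key=lambda k: mags[k])  # search up to fs/2
print(peak_bin, peak_bin * fs / N)  # bin index and its frequency in Hz
```

Bin k corresponds to frequency k * fs / N, so the 1 kHz tone lands in bin 8 and the peak is reported at 1000.0 Hz.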


Frequency-domain analysis of speech


- This is useful as it shows which frequencies are present in a signal and how much energy is present at each frequency
- This is important for analysing and classifying signals and generating signals

[Figure: three example time-domain signals, each mapped by the DFT to its frequency-domain signal]

Frequency-domain analysis of speech

[Figure: DFT magnitude spectrum]

Magnitude spectrum provides much more information:
- Spectral envelope (phoneme sound)
- Harmonics (pitch)
- Energy

Time-frequency analysis of speech


- Time-domain analysis shows that speech signals change substantially over time: speech is a time-varying signal
- Frequency-domain analysis enables us to see the frequency composition of signals, which is much more useful for analysis than the time-domain signal
- However, frequency-domain analysis can only be performed on quasi-stationary portions of the signal, which are short by nature (10-50 ms)
- Solution is time-frequency analysis, or spectrogram


Time-frequency analysis of speech

Process to create a spectrogram:

1. Extract short-duration window of signal
2. Take DFT and obtain magnitude spectrum
3. Allocate spectral amplitudes different colours
4. Plot colours
5. Return to 1 until end of signal

[Figure: spectrogram axes, frequency (vertical) against time (horizontal)]
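
The five steps can be followed in a short sketch. Everything specific here is an assumption for illustration: the two-tone test signal, the 8 kHz sampling rate, the 64-sample non-overlapping windows, and the use of text characters in place of colours. No tapering window (e.g. Hamming) is applied and amplitudes are mapped linearly, both simplifications compared with common practice.

```python
import cmath
import math

SHADES = " .:-=+*#%@"   # stand-in for colours: blank = low, '@' = high


def magnitude_spectrum(frame):
    """Step 2: DFT magnitudes for bins 0..N/2 of one frame."""
    N = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
        for k in range(N // 2 + 1)
    ]


def spectrogram(x, frame_len, hop_len):
    """Steps 1, 2 and 5: one magnitude spectrum per short window."""
    columns = []
    start = 0
    while start + frame_len <= len(x):               # step 5: until end of signal
        frame = x[start:start + frame_len]           # step 1: extract window
        columns.append(magnitude_spectrum(frame))    # step 2: DFT magnitude
        start += hop_len
    return columns


def render(columns):
    """Steps 3 and 4: map amplitudes to shades and lay them out as text."""
    peak = max(max(col) for col in columns)
    rows = []
    for k in reversed(range(len(columns[0]))):       # highest frequency on top
        row = ""
        for col in columns:
            idx = int(col[k] / peak * (len(SHADES) - 1)) if peak > 0 else 0
            row += SHADES[idx]
        rows.append(row)
    return "\n".join(rows)


fs = 8000   # assumed sampling rate in Hz
# Test signal: 500 Hz for the first 512 samples, then 2000 Hz.
x = [math.sin(2 * math.pi * (500 if n < 512 else 2000) * n / fs) for n in range(1024)]

cols = spectrogram(x, frame_len=64, hop_len=64)
print(render(cols))   # time runs left to right, frequency bottom to top
```

With these settings each bin is 125 Hz wide, so the printout shows a dark band low down for the first half of the signal and a higher band for the second half.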


Time-frequency analysis of speech


[Figure: spectrogram of the utterance segment "large volumes of oil or other liquid cargo"; horizontal axis: time]


Time-frequency analysis of other signals


What characteristics does this signal have? What could have produced this sound?


Time-frequency analysis of other signals


- Compare to equivalent time-domain signal
- Very hard to identify any features: only duration and gradual increase in energy


Time-frequency analysis of other signals


What characteristics does this signal have? What could have produced this sound? Also information about the recording


Summary
- Considered methods for analysing the characteristics and features of an audio signal
- Observed that real signals (e.g. speech) are not stationary but can vary rapidly over time
- Time-domain can show this variation
- Frequency-domain usually provides more information but requires a quasi-stationary signal for analysis
- One solution is time-frequency representation (spectrogram)
- Will return to spectrograms when we study acoustic phonetics
