Overview
Introduction. Main principles of speaker recognition. Selected Method. Speaker recognition Models. Feature Extraction. Feature Extraction Implementation. Conclusion.
Introduction:
Our project: Graduation Project I (voice recognition) and Graduation Project II (face recognition).
Identification
Verification
Text Independent:
Speaker models capture characteristics of somebody's speech that show up irrespective of what is being said.
Selected Method
Text Independent:
Identify the person who is speaking, regardless of what is being said.
Feature Extraction
Feature Matching
[Figure: waveform of the word "zero", with the syllables "ze" and "ro" marked]
There is silence at the beginning and at the end of the signal. The word consists of two syllables.
Frame Blocking
Continuous speech -> Frame Blocking -> frame -> Windowing -> FFT -> spectrum -> Mel-frequency wrapping -> Cepstrum
The continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N).
Frame 1: consists of the first N samples.
Frame 2: begins M samples after the first frame and overlaps it by N-M samples.
Frame 3: begins 2M samples after the first frame and overlaps it by N-2M samples.
The speech signals were blocked into frames of N samples with overlap.
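The frame blocking step can be sketched as follows. This is a minimal illustration in plain Python, not the project's implementation: the function name `frame_blocking` and the toy values N = 4, M = 2 are assumptions chosen for readability, and trailing samples that do not fill a whole frame are simply dropped.

```python
def frame_blocking(signal, n, m):
    """Split `signal` into frames of n samples; each frame starts m samples
    after the previous one, so adjacent frames overlap by n - m samples."""
    if not 0 < m < n:
        raise ValueError("expected 0 < m < n so that adjacent frames overlap")
    frames = []
    start = 0
    # Assumption: samples left over at the end (fewer than n) are discarded.
    while start + n <= len(signal):
        frames.append(signal[start:start + n])
        start += m
    return frames


# Toy signal: the sample values are just their own indices.
signal = list(range(10))
frames = frame_blocking(signal, n=4, m=2)
for index, frame in enumerate(frames, start=1):
    print("Frame", index, frame)
```

With N = 4 and M = 2, each frame shares its last N-M = 2 samples with the start of the next frame.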
Windowing
Each individual frame is windowed. A Hamming window is used in this project.
[Figure: a speech frame before windowing and after windowing]
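A Hamming window can be applied to a frame as sketched below. The coefficients 0.54 and 0.46 are the standard Hamming definition; the helper names `hamming` and `apply_window` and the constant test frame are assumptions made for this illustration.

```python
import math


def hamming(n_samples):
    """Return the Hamming window w[k] = 0.54 - 0.46*cos(2*pi*k/(N-1))."""
    if n_samples == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n_samples - 1))
        for k in range(n_samples)
    ]


def apply_window(frame):
    """Multiply each sample of the frame by the matching window coefficient."""
    window = hamming(len(frame))
    return [sample * weight for sample, weight in zip(frame, window)]


# A constant frame makes the taper visible: the edges shrink, the middle stays large.
frame = [1.0] * 8
windowed = apply_window(frame)
print([round(value, 3) for value in windowed])
```

The window tapers both ends of the frame towards a small value, which reduces the discontinuities at the frame edges before the frame is transformed.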
FFT
The FFT converts each frame of N samples from the time domain into the frequency domain.
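The time-to-frequency conversion can be sketched with a small recursive radix-2 FFT. This is an illustration only: it assumes the frame length is a power of two, the function name `fft` and the test tone are hypothetical, and a real system would normally call an optimised library routine instead.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(samples)
    if n < 1 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    spectrum = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-length transforms.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        spectrum[k] = even[k] + twiddle
        spectrum[k + n // 2] = even[k] - twiddle
    return spectrum


# Test frame: a cosine that completes exactly 2 cycles in 16 samples.
frame = [math.cos(2.0 * math.pi * 2 * k / 16) for k in range(16)]
power = [abs(value) ** 2 for value in fft(frame)]
# Only the first half of the bins is unique for a real-valued frame.
peak_bin = max(range(len(frame) // 2), key=lambda k: power[k])
print("strongest bin:", peak_bin)
```

The energy of the test tone lands in bin 2, matching its two cycles per frame.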
Mel-Frequency Wrapping
The mel frequency scale has linear frequency spacing below 1 kHz and logarithmic spacing above 1 kHz.
A filter bank spaced uniformly on the mel scale is applied to the spectrum.
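The mel scale and the spacing of a mel filter bank can be sketched as below. The formula mel = 2595 * log10(1 + f/700) is a widely used approximation and is an assumption here, since the slides do not state which formula the project used; the 20 filters and the 0 to 4000 Hz range are likewise illustrative values.

```python
import math


def hz_to_mel(f_hz):
    """Common approximation of the mel scale (assumed, not from the slides)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_centres(num_filters, f_low, f_high):
    """Centre frequencies in Hz of filters spaced uniformly on the mel scale."""
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (num_filters + 1)
    return [mel_to_hz(mel_low + step * (i + 1)) for i in range(num_filters)]


centres = filter_centres(num_filters=20, f_low=0.0, f_high=4000.0)
print("1000 Hz in mel:", round(hz_to_mel(1000.0), 1))
print("first gap (Hz):", round(centres[1] - centres[0], 1))
print("last gap (Hz):", round(centres[-1] - centres[-2], 1))
```

Because the filters are equally spaced in mel, their centres sit close together at low frequencies and spread out at high frequencies.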
[Figure: data points of all sounds after passing them through the LBG algorithm, first set]
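The LBG algorithm named above builds a VQ codebook by repeatedly splitting codewords and refining them. The sketch below is a generic textbook version under stated assumptions, not the project's code: the splitting factor `eps`, the stopping tolerance, the helper names, and the toy 2-D vectors are all hypothetical, and the codebook size is assumed to be a power of two.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def lbg(vectors, size, eps=0.01, tol=1e-6, max_iter=100):
    """Grow a codebook by splitting until it holds at least `size` codewords."""
    if not vectors:
        raise ValueError("need at least one training vector")
    codebook = [_mean(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        # Note: a codeword exactly at the origin would not separate.
        codebook = [[c * (1 + s) for c in word]
                    for word in codebook for s in (eps, -eps)]
        previous = float("inf")
        for _ in range(max_iter):
            clusters = [[] for _ in codebook]
            total = 0.0
            for vector in vectors:
                # Nearest-neighbour search over the current codebook.
                dists = [_dist2(vector, word) for word in codebook]
                nearest = dists.index(min(dists))
                clusters[nearest].append(vector)
                total += dists[nearest]
            # Centroid update; an empty cluster keeps its old codeword.
            codebook = [_mean(cluster) if cluster else word
                        for cluster, word in zip(clusters, codebook)]
            distortion = total / len(vectors)
            if previous - distortion <= tol * max(distortion, 1e-12):
                break
            previous = distortion
    return codebook


def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(min(_dist2(v, w) for w in codebook) for v in vectors) / len(vectors)


# Toy feature vectors forming two obvious groups.
vectors = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
           [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]
codebook = lbg(vectors, size=2)
print(sorted(codebook))
print(average_distortion(vectors, codebook))
```

In a feature-matching step, one plausible use is to compute `average_distortion` of a test utterance against each speaker's codebook and pick the speaker with the smallest value.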
VQ advantages
The model is trained much faster than with other methods such as back propagation.
It can reduce large datasets to a smaller number of codebook vectors.
It can handle data with missing values.
The generated model can be updated incrementally.
Unlike nearest-neighbour techniques, it is not limited in the number of dimensions of the codebook vectors.
It is easy to implement and accurate.
Performance Rate
Sounds (word: twenty)
Speaker        Recognized as       Rate
Laila (S1)     S1  S1  S1  S1      100%
Mona (S2)      S2  S4  S2  S3      50%
Mariam (S3)    S3  S3  S3  S2      75%
Fatema (S4)    S4  S2  S1  S4      50%
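The performance rates follow from counting, for each speaker, how many test utterances were recognized correctly. The sketch below reproduces that arithmetic; it assumes that S1 to S4 correspond to the four speakers in the order listed and that each speaker has four test utterances, which is how the figures above appear to be laid out.

```python
# Recognition results per true speaker (assumed layout, see the note above).
results = {
    "S1": ["S1", "S1", "S1", "S1"],
    "S2": ["S2", "S4", "S2", "S3"],
    "S3": ["S3", "S3", "S3", "S2"],
    "S4": ["S4", "S2", "S1", "S4"],
}


def performance_rate(true_id, recognised):
    """Percentage of test utterances recognized as the true speaker."""
    hits = sum(1 for label in recognised if label == true_id)
    return 100.0 * hits / len(recognised)


rates = {speaker: performance_rate(speaker, labels)
         for speaker, labels in results.items()}
for speaker, rate in rates.items():
    print(speaker, rate)
```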
1: matches with speaker 1
2: no match
3: matches with speaker 3
4: matches with speaker 4
5: no match
6: no match
Conclusion