Académique Documents
Professionnel Documents
Culture Documents
Y.RAJENDER REDDY(08H61A04C5)
INTRODUCTION
Speech recognition is the process of capturing spoken words using microphone or telephone and converting them into a digitally stored set of words
BLOCK DIAGRAM
Speech acquisition
Through Microphone
Speech preprocessing
Text storage
External Hardware
SPEECH ACQUISITION
The microphone input port with the audio codec receives the signal, amplifies it,
and converts it into 16-bit PCM digital samples at a sampling rate of 8 KHz
application running on the processor that acquires and stores data in memory.
The received samples are stored into memory on the Altera Development and
SPEECH PREPROCESSING
Preprocessing involves taking the speech samples as input, blocking the samples into frames, and returning a unique pattern for each sample. The unique pattern can be achived by following steps 1. The digital samples are divided into overlapped frames.
2. The system checks the frames for voice activity using endpoint detection and energy threshold calculations.
CONTINUE
3. The speech samples are passed through a pre-emphasis filter.
4.The frames with voice activity are passed through a Hamming window.
5. The system finds linear predictive coding (LPC) coefficients for frames .
6. From the LPC coefficients, the system determines the cepstral coefficients The cepstral coefficients serve as feature vectors.
TRAINING
An important part of speech-to-text conversion using pattern recognition is training.
HMM-BASED RECOGNITION
Recognition is the process of comparing the unknown test pattern with each sound class reference pattern and computing a measure of
DIGIT MODELS
The input speech sample is preprocessed and the feature vector is extracted.
Then, the index of nearest codebook vector for each frame is sent to all digit models.
The model with the maximum probability is chosen as the recognisied digit.
TEXT STORAGE
The Nios II processor on the DE2 board sends the digital speech data to a PC.
A target program running on the PC receives the text and writes it to the disk.
APPLICATIONS
Interactive voice response system (IVRS) Voice-dialing in mobile phones and telephones Hands-free dialing in wireless bluetooth headsets PIN and numeric password entry modules Automated teller machines (ATMs)
REFERENCES
1. Topic taken from seminartopics.co.in/ece-seminar-topics/
2. Garg, Mohit. Linear Prediction Algorithms. Indian Institute of Technology, Bombay, India, Apr 2003.
3. Li, Gongjun and Taiyi Huang. An Improved Training Algorithm in HmmBased Speech Recognition.National Laboratory of Pattern Recognition. Chinese Academy of Sciences, Beijing.