
CVSR COLLEGE OF ENGINEERING

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING


TECHNICAL SEMINAR ON

SPEECH TO TEXT CONVERSION


BY

Y. RAJENDER REDDY (08H61A04C5)

INTRODUCTION
Speech recognition is the process of capturing spoken words using a microphone or telephone and converting them into a digitally stored set of words.

Speech-to-text conversion is one of the applications of speech recognition.

A speech-to-text system improves accessibility by providing data entry options for blind, deaf, or physically handicapped users.

BLOCK DIAGRAM

Speech acquisition through microphone

Speech preprocessing

Hidden Markov model

Text storage

External Hardware

SPEECH ACQUISITION
The microphone input port with the audio codec receives the signal, amplifies it, and converts it into 16-bit PCM digital samples at a sampling rate of 8 kHz.

The system needs a parallel/serial interface to the Nios II processor and an application running on the processor that acquires and stores data in memory.

The received samples are stored in memory on the Altera Development and Education (DE2) board.
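To make the 16-bit, 8 kHz format concrete, the following Python sketch quantizes a floating-point signal to 16-bit PCM and estimates the memory a recording would need. It runs on a PC and is not the Nios II application itself; the function names, the little-endian byte order, and the 440 Hz test tone are assumptions made for illustration.

```python
import math
import struct

SAMPLE_RATE_HZ = 8000   # sampling rate stated on the slide
BYTES_PER_SAMPLE = 2    # 16-bit PCM

def to_pcm16(samples):
    """Quantize floats in [-1.0, 1.0] to signed 16-bit integers, clipping overshoot."""
    pcm = []
    for x in samples:
        x = max(-1.0, min(1.0, x))
        pcm.append(int(round(x * 32767)))
    return pcm

def pack_pcm16(pcm):
    """Pack 16-bit samples as bytes (little-endian order is an assumption)."""
    return struct.pack("<%dh" % len(pcm), *pcm)

def buffer_bytes(duration_s):
    """Memory needed for duration_s seconds of mono 16-bit audio at 8 kHz."""
    return int(duration_s * SAMPLE_RATE_HZ) * BYTES_PER_SAMPLE

# Hypothetical test signal: 10 ms (80 samples) of a 440 Hz tone.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ) for n in range(80)]
pcm = to_pcm16(tone)
raw = pack_pcm16(pcm)
```

At this rate, one second of speech occupies 8000 samples of 2 bytes each, so the memory budget grows by 16,000 bytes per second of recording.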

SPEECH PREPROCESSING
Preprocessing takes the speech samples as input, blocks the samples into frames, and returns a unique pattern for each sample. The unique pattern is obtained by the following steps:

1. The digital samples are divided into overlapping frames.

2. The system checks the frames for voice activity using endpoint detection and energy threshold calculations.

3. The speech samples are passed through a pre-emphasis filter.

4. The frames with voice activity are passed through a Hamming window.

5. The system finds linear predictive coding (LPC) coefficients for the frames.

6. From the LPC coefficients, the system determines the cepstral coefficients. The cepstral coefficients serve as feature vectors.
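The six steps can be sketched in plain Python as below. This is an illustration under stated assumptions, not the code used in the seminar's system: the frame length (240 samples, 30 ms at 8 kHz), hop (80 samples), LPC order (10), pre-emphasis factor (0.95), and energy threshold are all assumed values, the voice activity check is reduced to a bare energy threshold, and pre-emphasis is applied per frame for brevity. LPC uses the autocorrelation method with the Levinson-Durbin recursion, and the cepstrum follows the standard LPC-to-cepstrum recursion.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Step 1: split samples into overlapping frames (hop < frame_len)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def is_voiced(frame, threshold):
    """Step 2 (simplified): mean energy above a fixed threshold."""
    return sum(x * x for x in frame) / len(frame) > threshold

def pre_emphasis(frame, alpha=0.95):
    """Step 3: y[n] = x[n] - alpha * x[n-1]."""
    return [frame[0]] + [frame[n] - alpha * frame[n - 1]
                         for n in range(1, len(frame))]

def hamming(frame):
    """Step 4: apply a Hamming window to the frame."""
    last = len(frame) - 1
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * n / last))
            for n, x in enumerate(frame)]

def lpc(frame, order):
    """Step 5: predictor coefficients a[1..p] via Levinson-Durbin."""
    r = [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def lpc_to_cepstrum(a):
    """Step 6: c[m] = a[m] + sum_{k=1}^{m-1} (k/m) c[k] a[m-k], m = 1..p."""
    p = len(a)
    c = [0.0] * (p + 1)
    for m in range(1, p + 1):
        c[m] = a[m - 1] + sum((k / m) * c[k] * a[m - k - 1]
                              for k in range(1, m))
    return c[1:]

def extract_features(samples, frame_len=240, hop=80, order=10, threshold=1e-4):
    """Run steps 1-6 and return one cepstral vector per voiced frame."""
    features = []
    for frame in frame_signal(samples, frame_len, hop):
        if is_voiced(frame, threshold):
            windowed = hamming(pre_emphasis(frame))
            features.append(lpc_to_cepstrum(lpc(windowed, order)))
    return features

# Hypothetical input: 60 ms of silence followed by 60 ms of a two-tone signal.
signal = [0.0] * 480 + [
    0.4 * math.sin(2 * math.pi * 300 * n / 8000)
    + 0.2 * math.sin(2 * math.pi * 1100 * n / 8000)
    for n in range(480)
]
features = extract_features(signal)
```

Frames that fall entirely in the silent part are discarded by the energy check, so only frames containing speech-like energy contribute feature vectors.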

HIDDEN MARKOV MODEL

A hidden Markov model (HMM) is used for speech recognition, which converts speech to text.

This model consists of three steps:

1. Training

2. HMM-based recognition

3. Digit models

TRAINING
An important part of speech-to-text conversion using pattern recognition is training.

Training involves creating a pattern representative of the features of a class using one or more test patterns that correspond to speech sounds of the same class.
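The slides do not say how the representative patterns are built. As one hedged illustration, a vector-quantization codebook (the kind of codebook the digit-model stage relies on) can be formed by k-means clustering of training feature vectors. The function names, the initialization from the first vectors, and the toy data below are assumptions, and this sketch does not cover HMM parameter training itself.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(vector, codebook):
    """Index of the codebook vector closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))

def train_codebook(vectors, size, iterations=20):
    """k-means sketch: seed with the first `size` vectors, then refine."""
    codebook = [list(v) for v in vectors[:size]]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook

# Toy 2-D training vectors forming two well-separated groups.
training = [[0.0, 0.0], [10.0, 10.0], [0.0, 2.0], [10.0, 12.0]]
codebook = train_codebook(training, size=2)
```

Each codebook entry ends up as the mean of the training vectors assigned to it, which is one simple way to obtain a pattern that is representative of a group of similar sounds.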

HMM-BASED RECOGNITION
Recognition is the process of comparing the unknown test pattern with each sound class reference pattern and computing a measure of similarity between the test pattern and each reference pattern.
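With HMMs, a common similarity measure is the likelihood of the observed sequence under each model, computed with the forward algorithm. The sketch below assumes discrete HMMs whose observations are symbol indices; the two small models and their probabilities are invented for illustration and are not taken from the seminar's system.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, using the scaled forward algorithm.

    obs   : list of observation symbol indices
    start : start[i] is the probability of beginning in state i
    trans : trans[i][j] is the probability of moving from state i to j
    emit  : emit[i][k] is the probability of state i emitting symbol k
    """
    n_states = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n_states))
                     * emit[j][obs[t]] for j in range(n_states)]
        scale = sum(alpha)
        if scale == 0.0:          # sequence is impossible under this model
            return float("-inf")
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob

# Two hypothetical two-state, left-to-right reference models.
model_low = dict(start=[1.0, 0.0], trans=[[0.6, 0.4], [0.0, 1.0]],
                 emit=[[0.9, 0.1], [0.8, 0.2]])
model_high = dict(start=[1.0, 0.0], trans=[[0.6, 0.4], [0.0, 1.0]],
                  emit=[[0.1, 0.9], [0.2, 0.8]])

test_pattern = [0, 0, 1, 0]
scores = {
    "low": forward_log_likelihood(test_pattern, **model_low),
    "high": forward_log_likelihood(test_pattern, **model_high),
}
```

The reference model with the larger log-likelihood is the closer match to the test pattern.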

DIGIT MODELS
The input speech sample is preprocessed and the feature vector is extracted.

Then, the index of the nearest codebook vector for each frame is sent to all digit models.

The model with the maximum probability is chosen as the recognized digit.
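The selection logic described here can be sketched as follows. Each digit model is represented as a callable that returns a log-probability for an index sequence; the toy models below are memoryless stand-ins rather than real HMMs, and the codebook, frames, and function names are assumptions for illustration.

```python
import math

def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codebook vector."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(codebook)), key=lambda i: dist(f, codebook[i]))
            for f in frames]

def recognize(indices, digit_models):
    """Score the same index sequence with every model; pick the maximum."""
    scores = {digit: score(indices) for digit, score in digit_models.items()}
    return max(scores, key=scores.get), scores

def make_toy_model(symbol_probs):
    """Stand-in scorer: sums per-symbol log-probabilities (not an HMM)."""
    return lambda indices: sum(math.log(symbol_probs[i]) for i in indices)

codebook = [[0.0, 0.0], [1.0, 1.0]]
digit_models = {
    "0": make_toy_model([0.9, 0.1]),
    "1": make_toy_model([0.1, 0.9]),
}

frames = [[0.1, -0.1], [0.2, 0.0], [0.9, 1.1]]
indices = quantize(frames, codebook)
digit, scores = recognize(indices, digit_models)
```

In a full system each callable would be replaced by a trained digit HMM scoring the sequence, while the quantize-then-argmax structure stays the same.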

TEXT STORAGE

The Nios II processor on the DE2 board sends the digital speech data to a PC.

A target program running on the PC receives the text and writes it to the disk.
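A minimal sketch of the PC-side program is shown below. The slides do not specify the link between the board and the PC, so the input is modeled as a generic byte stream delivering newline-terminated text; the function name, file name, and line-based format are assumptions.

```python
import io
import os
import tempfile

def store_text(stream, path):
    """Read newline-terminated text from a byte stream and append it to a file.

    Returns the number of non-empty lines written.
    """
    count = 0
    with open(path, "a", encoding="utf-8") as out:
        for raw in stream:
            text = raw.decode("ascii", errors="replace").strip()
            if text:
                out.write(text + "\n")
                count += 1
    return count

# Simulated link: an in-memory stream stands in for the board connection.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "recognized.txt")
    written = store_text(io.BytesIO(b"3\n1\n\n4\n"), path)
    with open(path, encoding="utf-8") as f:
        stored = f.read()
```

Opening the file in append mode keeps earlier results on disk when the program is run again.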

APPLICATIONS
Interactive voice response system (IVRS)

Voice dialing in mobile phones and telephones

Hands-free dialing in wireless Bluetooth headsets

PIN and numeric password entry modules

Automated teller machines (ATMs)

