
Final Year Project Presentation

Speech Recognition System

Submitted by
Aditya Sharma

Under the supervision of
Dr. Y N Singh
Associate Professor

Dept. of Computer Science and Engineering
Institute of Engineering and Technology, Lucknow
Problem Definition and Scope
The aim of this project is to build a connected-word speech recognition system
for the English language. The system receives speech input consisting of a
series of connected words and outputs the corresponding text sequence. It uses
a continuous Hidden Markov Model (HMM) for acoustic modelling of the speech.
The system is designed to work in noisy as well as user-constrained
environments. It can also recognize a word and convert it into text that can be
linked to an action, such as executing a command, opening a program or sending
mail. It can recognize a single word as well as a sequence of connected words
separated by short pauses.
Design of the System
Word detection and Extraction
In speech recognition it is important to detect when a word is spoken. The
system detects the regions of silence; anything other than silence is treated
as a spoken word. To find the silent regions it uses both the energy pattern of
the sound signal and its zero-crossing rate. Using both matters because energy
alone tends to miss some important parts of the sound. This process is also
called Voice Activity Detection (VAD).
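The idea of combining energy with zero-crossing rate can be sketched as follows. This is a minimal illustration, not the project's actual detector: the function names, the frame sizes and all three thresholds are assumptions chosen for the example. A frame counts as speech if it is clearly loud, or if it is only faintly audible but changes sign often, which is the kind of sound an energy-only test tends to drop.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_speech(samples, frame_len=200, hop=100,
                  energy_high=0.01, energy_low=0.001, zcr_thresh=0.25):
    """Return one boolean per frame: True where the frame is not silence.

    All thresholds are illustrative assumptions; a real system would
    estimate them from a stretch of background noise.
    """
    flags = []
    for frame in frame_signal(samples, frame_len, hop):
        energy = short_time_energy(frame)
        loud = energy > energy_high
        audible = energy > energy_low
        busy = zero_crossing_rate(frame) > zcr_thresh
        flags.append(loud or (audible and busy))
    return flags
```

Consecutive frames flagged True can then be merged into one word segment and passed on to feature extraction.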
Feature Extraction
Acoustic Modelling
Acoustic modelling is the process of representing a sound using probabilistic
mathematical models.
There are two types of acoustic modelling:
1. Word Model
2. Phone Model
Word models are generally used for small-vocabulary systems. In this model
each word is modelled as a whole.
In the phone model, instead of modelling the whole word, we model only parts of
words, generally phones. The word itself is then modelled as a sequence of
phones.
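The difference between the two approaches can be illustrated with a small sketch. The lexicon below is hypothetical: the three words and their phone symbols are made up for the example and are not taken from the project's vocabulary.

```python
# Hypothetical pronunciation lexicon (illustrative phone symbols only).
LEXICON = {
    "open": ["OW", "P", "AH", "N"],
    "pen": ["P", "EH", "N"],
    "no": ["N", "OW"],
}

def word_model_units(vocabulary):
    """Word model: one acoustic model has to be trained per word."""
    return sorted(vocabulary)

def phone_model_units(vocabulary, lexicon):
    """Phone model: one acoustic model per distinct phone, shared by all words."""
    return sorted({phone for word in vocabulary for phone in lexicon[word]})

def expand_word(word, lexicon):
    """Represent a word as the sequence of phone models that spell it."""
    return list(lexicon[word])
```

With a word model, every new vocabulary entry needs its own trained model. With a phone model, a new word only needs a lexicon entry as long as its phones already have models, which is why word models suit small vocabularies.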
Word Model
Phone Model
Hidden Markov Model
HMM and Speech Recognition
When an HMM is used for recognition, we provide the observations (occurrences)
to the model and it returns a number. This number is the probability with which
the model could have produced that observation sequence. In speech recognition
the observations are feature vectors rather than discrete symbols, so each
observation is a group of real numbers. Thus, what we need for speech
recognition is a continuous, multi-dimensional HMM.
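The number returned by the model is usually computed with the forward algorithm. The sketch below is a generic log-domain version, not the project's code. It assumes the per-frame emission scores have already been evaluated into a table `log_emis[t][s]` (frame t, state s); that table and the function names are assumptions made for the example.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == float("-inf"):
        return top
    return top + math.log(sum(math.exp(v - top) for v in values))

def forward_log_likelihood(log_start, log_trans, log_emis):
    """Log-likelihood of an observation sequence under one HMM.

    log_start[s]    : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_emis[t][s]  : log emission score of frame t in state s
    """
    n_states = len(log_start)
    # Initialisation: start in each state and emit the first frame.
    alpha = [log_start[s] + log_emis[0][s] for s in range(n_states)]
    # Recursion: sum over all predecessor states, then emit the next frame.
    for frame_scores in log_emis[1:]:
        alpha = [
            log_sum_exp([alpha[p] + log_trans[p][s] for p in range(n_states)])
            + frame_scores[s]
            for s in range(n_states)
        ]
    # Termination: sum over all states the sequence may end in.
    return log_sum_exp(alpha)
```

In a recognizer with one model per word, the same feature sequence is scored against every model and the word whose model returns the highest value is chosen. Working in the log domain avoids underflow on long sequences.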
The feature vectors could be quantized by clustering them into a vector
codebook. In this project they are instead modelled with a Gaussian Mixture
Model, in which each Gaussian component is described by two parameters, a mean
and a variance.
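A Gaussian mixture score for a single feature vector can be sketched as below. The sketch assumes diagonal covariances (one variance per feature dimension) and explicit mixture weights that sum to one; neither detail is stated in the slides, and the function names are chosen for the example.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_density(x, weights, means, variances):
    """Log density of x under a mixture of diagonal Gaussians.

    weights[k]   : mixture weight of component k (assumed to sum to 1)
    means[k]     : mean vector of component k
    variances[k] : per-dimension variances of component k
    """
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    # Stable log of the weighted sum of component densities.
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

In a continuous HMM, a value like this serves as the emission score of one frame in one state, replacing the symbol lookup of a discrete HMM.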
Main Interface
Type of condition                       No. of samples   Correct outputs   Accuracy (%)
Known user and known environment        30               30                100
Known user and unknown environment      20               14                80
Unknown user and known environment      25               12                48
Unknown user and unknown environment    15               3                 20
The trained model was used to recognize the speech.
The recognizer performed best when both the user and the environment were
known to it.
It performed worst when both the user and the environment were unknown. This
happened because the models were trained in a constrained environment with
only one user. The problem can be addressed by including more training data
from different speakers and different environments.
The other cases produced average results.
Thank You