
MANIPAL INSTITUTE OF TECHNOLOGY

(A constituent Institute of MANIPAL UNIVERSITY)

MANIPAL - 576 104, KARNATAKA, INDIA

Synopsis

TITLE

SUBMITTED BY STUDENT NAME REG NO E-MAIL ADDRESS/PHONE

Under the Guidance of:

GUIDE'S NAME
Department
Designation

Name of the Organisation

Details of the organisation (with postal address):

Name of Guide with contact details and email address:

Date of commencement of the project:

Signature of Guide:

1. Introduction:
1.1 Topic

Language is man's most important means of communication, and speech is its primary medium. A speech signal is a complex combination of airborne pressure waveforms; this pattern must be detected by the human auditory system and decoded by the brain. Human listeners combine audio and visual cues to perceive speech more effectively. The project aims to emulate this mechanism in human-machine communication systems by exploiting the acoustic and visual properties of human speech.

1.2 Organization

2. Need for the project:


Current speech recognition engines employing only acoustic features are not 100% robust. Visual cues can be used to resolve the ambiguity in the auditory modality. Hence a flexible and reliable system for speech perception can be designed, which finds a variety of applications in:

o Dictation systems
o Voice-based communication in tele-banking, voice mail, database query systems, information retrieval systems, etc.
o System control in automobiles, robotics, airplanes, etc.
o Security systems for speaker verification

3. Objective:
Recognise 10 English words (speaker independent) with at least 90% accuracy in a noisy environment.

4. Methodology:
The project is carried out in the following parts:

Processing of Audio Signals
o Detection of end points to demarcate word boundaries
o Analysis of various acoustic features such as pitch and formants, energy, and time difference of speech signals
o Extraction of selected features

Processing of Video Signals
o Demarcation of frames from the video sequence
o Identification of faces, and then lip regions
o Extraction of features from the lip profile

Recognition of Speech by Synchronizing Audio and Visual Data

o Synchronize audio and video features for pattern recognition using standardized algorithms
o Train the system to recognize the spoken word under adverse acoustic conditions
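The end-point detection step above can be sketched with a simple short-time-energy approach: frames whose energy exceeds a fraction of the peak frame energy are taken as speech, and the first and last such frames bound the word. This is a minimal illustration, not the project's actual algorithm; the frame length, hop, and threshold ratio are assumed values for the sketch.

```python
import numpy as np

def short_time_energy(signal, frame_len=256, hop=128):
    """Short-time energy of a 1-D signal, one value per frame."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([
        np.sum(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])

def detect_endpoints(signal, frame_len=256, hop=128, threshold_ratio=0.1):
    """Return (start, end) sample indices of the region whose frame
    energy exceeds threshold_ratio * max frame energy, or None if
    no frame crosses the threshold."""
    energy = short_time_energy(signal, frame_len, hop)
    threshold = threshold_ratio * energy.max()
    active = np.where(energy > threshold)[0]
    if active.size == 0:
        return None
    start = active[0] * hop
    end = active[-1] * hop + frame_len
    return start, end
```

In practice a fixed ratio of the peak energy is fragile under noise; adaptive thresholds or zero-crossing rates are common refinements, which is consistent with the project's goal of robustness in adverse acoustic conditions.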

5. Project Schedule:
January 2011
o Processing of audio signals
o Feature extraction from the chosen training database
o Pattern recognition and signature extraction from the features
o Training the HMM with the training set

February 2011
o Processing of video signals
o Feature extraction from the chosen training database
o Pattern recognition and signature extraction from the features

March 2011
o Synchronization of audio and video features for pattern recognition
o Extension of the training data set to 10 words

April 2011
o Upgrading the system for speaker-independent applications
o Performance analysis comparing results of the audio-only approach with those of the joint audio-visual approach

May 2011
o Documentation
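The synchronization of audio and video features can be sketched as feature-level fusion: since video is captured at a lower frame rate than the audio analysis frames, the video feature stream is upsampled to the audio frame rate by linear interpolation and the two are concatenated per frame. This is one common fusion strategy assumed for illustration; the feature dimensions and interpolation scheme are placeholders, not the project's chosen design.

```python
import numpy as np

def fuse_features(audio_feats, video_feats):
    """Feature-level fusion of two (n_frames, n_dims) streams.

    Upsamples the video stream to the audio frame count by linear
    interpolation along a normalized time axis, then concatenates
    the streams per frame.
    """
    n_audio = audio_feats.shape[0]
    n_video = video_feats.shape[0]
    # Map both streams onto a common normalized time axis [0, 1].
    src = np.linspace(0.0, 1.0, n_video)
    dst = np.linspace(0.0, 1.0, n_audio)
    video_up = np.column_stack([
        np.interp(dst, src, video_feats[:, d])
        for d in range(video_feats.shape[1])
    ])
    return np.hstack([audio_feats, video_up])
```

The fused vectors can then be fed to a single recognizer (e.g. the HMM trained in January), which is the "joint audio-visual approach" the performance analysis compares against audio-only recognition.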

References:
1. Tsuhan Chen, "Audiovisual Speech Processing, Lip Reading and Lip Synchronization", IEEE Signal Processing Magazine, January 2001.
2. R. Chellappa, C. L. Wilson and S. Sirohey, "Human and Machine Recognition of Faces: A Survey", Proceedings of the IEEE, vol. 83, no. 5, May 1995.
