
SPEECH RECOGNITION

FOR MOBILE SYSTEMS


BY:
PRATIBHA CHANNAMSETTY
SHRUTHI SAMBASIVAN

Introduction
What is speech recognition?
Automatic speech recognition (ASR) is the process by
which a computer maps an acoustic speech signal to text.

CLASSIFICATION OF SPEECH RECOGNITION SYSTEM
Users
- Speaker dependent system
- Speaker independent system
- Speaker adaptive system
Vocabulary
- small vocabulary: tens of words
- medium vocabulary: hundreds of words
- large vocabulary: thousands of words
- very-large vocabulary: tens of thousands of words

CLASSIFICATION OF SPEECH RECOGNITION SYSTEM
Word Pattern
- isolated-word system: single words at a time
- continuous speech system: words are connected together

HOW SPEECH RECOGNITION WORKS

APPLICATIONS
Healthcare
Military
- Helicopters
- Training air traffic controllers
Telephony and other domains

WHY SPEECH RECOGNITION?


Speech is the easiest and most common way for people
to communicate.
Speech is also faster than typing on a keypad and more
expressive than clicking on a menu item.
It is accessible to users with low literacy.
Cellphones have proliferated widely in the market.

CHALLENGES ON MOBILE DEVICES


Limited available storage space
Cheap and variable microphones
No hardware support for floating point arithmetic
Low processor clock-frequency
Small cache of 8-32 KB
Highly variable and challenging acoustic environments ranging
from heavy background traffic noises to a small room with
reverberation of multiple speakers speaking simultaneously
High energy consumption during algorithm execution

ASR MODELS
Embedded speech recognition
Speech recognition in the cloud
Distributed speech recognition
Shared speech recognition with user-based
adaptation (proposed model of use)

EMBEDDED MOBILE SPEECH RECOGNITION

EMBEDDED MOBILE SPEECH RECOGNITION
Advantages
Does not rely on communication with a central server
Cost effective
Not affected by network latency

EMBEDDED MOBILE SPEECH RECOGNITION


Disadvantages
Cannot perform complex computations
Limited in terms of speed and memory
To achieve reliable performance, modifications
need to be made to every sub-system of the ASR to take
both factors into account.

SPEECH RECOGNITION IN THE CLOUD

SPEECH RECOGNITION IN THE CLOUD


Advantages
Improves speed and accuracy
Provides an easy way to upgrade or modify the central
speech recognition system.
Can be used for speech recognition with low-end mobile
devices such as cheap cellphones.

SPEECH RECOGNITION IN THE CLOUD


Disadvantages
Performance degradation
Acoustic models on the central server need to account for
large variations in the different channels.
Each data transfer over the telephone network can cost
money for the end user.

DISTRIBUTED SPEECH RECOGNITION

DISTRIBUTED SPEECH RECOGNITION


Advantages
Does not require transmitting high-quality speech, since
features are extracted on the device
Improved word error rates

DISTRIBUTED SPEECH RECOGNITION


Disadvantages
The major disadvantage of this mode remains cost and
the need for a continuous and reliable cellular connection.
There is a need for standardized feature extraction
processes that account for variabilities arising from
differences in channel, multi-linguality, variable accents,
gender differences, etc.

SHARED SPEECH RECOGNITION WITH USER-BASED ADAPTATION

SHARED SPEECH RECOGNITION WITH USER-BASED ADAPTATION
Advantages
The ability to function even without network connectivity.
Works well for the limited set of conditions it encounters.
It can be covered successfully by existing mobile devices, if
trained or adapted accordingly.
Server capacity has to be provided only for average, not
peak use.

Speech Recognition Process in Detail

Front-end Process
Involves spectral analysis that derives feature vectors to
capture salient spectral characteristics of speech input.
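The front-end stage can be sketched in a few lines of Python. This is a simplified illustration (pre-emphasis, Hamming-windowed frames, log energy per frame), not the full MFCC pipeline real recognizers use; the sample rate and frame sizes are typical assumed values.

```python
import math

def frame_features(samples, rate=16000, frame_ms=25, step_ms=10):
    """Toy front-end: pre-emphasis, Hamming-windowed frames, log energy.

    Real front-ends go further (FFT, mel filterbank, DCT -> MFCCs);
    this sketch only shows the framing stage described above.
    """
    # Pre-emphasis boosts high frequencies: s'[n] = s[n] - 0.97*s[n-1]
    emph = [samples[0]] + [samples[n] - 0.97 * samples[n - 1]
                           for n in range(1, len(samples))]
    size = rate * frame_ms // 1000   # samples per frame
    step = rate * step_ms // 1000    # hop between successive frames
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1))
              for n in range(size)]  # Hamming window
    feats = []
    for start in range(0, len(emph) - size + 1, step):
        frame = [emph[start + n] * window[n] for n in range(size)]
        energy = sum(x * x for x in frame)
        feats.append(math.log(energy + 1e-10))  # one feature per frame
    return feats
```

With 25 ms frames every 10 ms at 16 kHz, one second of audio yields 98 overlapping frames, each reduced to a feature value.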

Back-end Process
Combines word-level matching and sentence-level search
to perform an inverse operation to decode the message
from the speech waveform.

Acoustic model
Provides a method of calculating the likelihood of any
feature vector sequence Y given a word W.
Each phone is represented by an HMM (Hidden Markov Model).

Language Model
The purpose of the language model is to take advantage of
linguistic constraints to compute the probability of different
word sequences.
Assuming a sequence of words W = {w1, w2, ..., wk}, the
probability P(W) can be expanded as
P(W) = P(w1, w2, ..., wk) = P(w1) P(w2|w1) ... P(wk|w1, ..., wk-1)
We generally make the simplifying assumption that any
word depends only on the previous N-1 words in the
sequence.
This is known as an N-gram model.
Grammars: use context-free grammars represented by
Finite State Automata (FSA).
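A minimal bigram (N = 2) model can be sketched as below. It is unsmoothed, whereas real language models add smoothing for unseen word pairs; the `<s>`/`</s>` sentence-boundary tokens are a conventional assumption.

```python
from collections import defaultdict

def train_bigram(sentences):
    """Estimate P(w | prev) from raw pair counts (no smoothing)."""
    pair, ctx = defaultdict(int), defaultdict(int)
    for words in sentences:
        # Pad with sentence-boundary tokens <s> and </s>
        for prev, w in zip(["<s>"] + words, words + ["</s>"]):
            pair[(prev, w)] += 1
            ctx[prev] += 1
    return lambda prev, w: pair[(prev, w)] / ctx[prev] if ctx[prev] else 0.0

def sentence_prob(model, words):
    # P(W) = product over i of P(w_i | w_{i-1})
    p = 1.0
    for prev, w in zip(["<s>"] + words, words + ["</s>"]):
        p *= model(prev, w)
    return p
```

Trained on the two sentences "call home" and "call work", the model gives P(home | call) = 0.5, so "call home" scores 0.5 overall.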

Statistical Speech recognition model

Overview of Statistical Speech recognition

Statistical Speech recognition model


A word sequence is postulated and the language model
computes its probability.
Each word is converted into sounds or phones using a
pronunciation dictionary.
Each phoneme has a corresponding statistical Hidden
Markov Model (HMM).
The HMMs of the phonemes are concatenated to form a word
model, and the likelihood of the data given the word sequence
is computed.
This process is repeated for many word sequences, and the
best is chosen as the output.
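The loop just described implements the fundamental decoding rule W* = argmax over W of P(Y|W) P(W). A sketch, working in log space as practical decoders do; the candidate list and the two scoring functions are stand-ins for the real search, acoustic model, and language model.

```python
import math

def decode(candidates, acoustic_logprob, lm_logprob):
    """Choose the word sequence maximizing log P(Y|W) + log P(W).

    `candidates` stands in for the search over word sequences; the two
    scoring callables stand in for the HMM acoustic model and the
    N-gram language model described above.
    """
    best, best_lp = None, -math.inf
    for words in candidates:
        lp = acoustic_logprob(words) + lm_logprob(words)
        if lp > best_lp:
            best, best_lp = words, lp
    return best
```

The language model can rescue an acoustically ambiguous hypothesis: "wreck a nice beach" may score well acoustically, but "recognize speech" wins once its higher LM probability is added in.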

Speech recognition on embedded platforms
Embedded ASR can be deployed either locally or in a
distributed environment with both advantages and
disadvantages.
For LVCSR, embedded devices are limited in terms of CPU
power and amount of memory.
Most importantly, speed is a limiting factor.

Decoding algorithm
Asynchronous stack-based decoder: memory efficient but
complex.
Viterbi-based decoder: most efficient.
Three types of search implementation:
- Combination of static graph and static search space
- Static graph space with dynamic search space
- Dynamic graph
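The Viterbi decoder mentioned above can be sketched as dynamic programming over HMM states. The two-state discrete-emission model used below is an illustrative assumption; production decoders work in log space with beam pruning over far larger search graphs.

```python
def viterbi(obs, states, start, trans, emit):
    """Viterbi decoding: the single best state path through an HMM."""
    # delta[s] = probability of the best path ending in state s
    delta = {s: start[s] * emit[s][obs[0]] for s in states}
    back = []  # one backpointer table per time step
    for o in obs[1:]:
        prev = delta
        delta, ptr = {}, {}
        for s in states:
            # Best predecessor of s, and the probability through it
            p, arg = max((prev[r] * trans[r][s], r) for r in states)
            delta[s] = p * emit[s][o]
            ptr[s] = arg
        back.append(ptr)
    # Trace the backpointers from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

Unlike the forward algorithm, which sums over all paths, Viterbi keeps only the best path at each step, which is what makes it efficient enough for embedded decoding.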

Mobile speech frameworks

Nuance - Dragon mobile SDK


OpenEars
Sphinx
CeedVocal SDK
Vlingo

Dragon Mobile SDK


The Dragon Mobile SDK provides speech recognition and
text-to-speech functionality.
The Speech Kit framework provides the classes necessary to
perform network-based speech recognition and text-to-speech
synthesis.
It uses the SystemConfiguration and AudioToolbox frameworks.

Speech Kit architecture

OpenEars
OpenEars is an iOS framework for iPhone voice recognition
and speech synthesis (TTS).
It uses the open source CMU Pocketsphinx, CMU Flite, and
CMUCLMTK libraries.
OpenEars works by doing the recognition inside the iPhone
without using the network.

Sphinx
CMU Sphinx is an open-source toolkit for speech recognition
developed by Carnegie Mellon University.
CMU Sphinx is a speaker-independent large-vocabulary
continuous speech recognizer.
Pocketsphinx: a lightweight recognizer library written in C.
Sphinx4: an adjustable, modifiable recognizer written in
Java.

CeedVocal SDK
CeedVocal SDK is an isolated-word speech recognition SDK for
iOS.
It operates locally on the device and supports 6 languages:
English, French, German, Dutch, Spanish and Italian.

Mobile applications using speech recognition

Google Now
Siri
S-Voice
Dragon Search
Dragon Dictation
Trippo-Mondo
Verbally

References
1. Anuj Kumar, Anuj Tewari, Seth Horrigan, Matthew Kam, Florian Metze and
John Canny, "Rethinking Speech Recognition on Mobile Devices".
2. Miroslav Novak, "Towards large vocabulary ASR on embedded platforms".
3. L. R. Rabiner and B.-H. Juang, "Speech Recognition: Statistical Methods".
4. http://www.nuancemobiledeveloper.com, accessed 9th April 2013.
5. http://cmusphinx.sourceforge.net, accessed 9th April 2013.
6. http://www.politepix.com/openears
7. http://www.creaceed.com/ceedvocal
