Voice Browser Original

Presented By Akhil Sajeendran R7A CS Roll No:05 Reg No : 10016175 SNGCE
Browser technology is changing very fast these days, and we are moving from the visual paradigm to the voice paradigm. The voice browser is the technology for entering this paradigm. A voice browser is a device that interprets a (voice) markup language and is capable of generating voice output and/or interpreting voice input, and possibly other input/output modalities.
A voice browser is a device:

that interprets voice input and voice markup languages to generate voice output.
that interprets a script which specifies exactly what to verbally present to the user, as well as when to present each piece of information.
Time frame: 1998 to ??
Hands-free access to the web.

Pragmatic interface for functionally blind users.
Speech Recognition
Speech Synthesis
Speech recognition: voice input -> VoXML file -> text

Automatic speech recognition is the process by which a computer maps an acoustic speech signal to text.

Speech is first digitized and then matched against a dictionary of coded waveforms. The matches are converted into text.
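The digitize-and-match step described above can be sketched in a few lines of Python. This is only a toy illustration, not how a production recognizer works: the TEMPLATES dictionary, the three-number feature vectors and the function names are all hypothetical, and real engines use far richer acoustic models.

```python
import math

# Hypothetical "dictionary of coded waveforms": each word maps to a short
# feature vector standing in for a digitized reference waveform.
TEMPLATES = {
    "yes": [0.9, 0.1, 0.3],
    "no": [0.1, 0.8, 0.5],
    "stop": [0.4, 0.4, 0.9],
}

def digitize(samples, levels=10):
    """Quantize samples in the range [0, 1] to a fixed number of levels."""
    return [round(s * levels) / levels for s in samples]

def recognize(samples):
    """Return the dictionary word whose template is closest to the input."""
    coded = digitize(samples)
    return min(TEMPLATES, key=lambda word: math.dist(coded, TEMPLATES[word]))

def transcribe(segments):
    """Convert a sequence of speech segments into text, one word per segment."""
    return " ".join(recognize(segment) for segment in segments)

print(transcribe([[0.88, 0.12, 0.31], [0.12, 0.79, 0.52]]))
```

Each incoming segment is quantized first, then compared with every stored template, and the nearest match is emitted as text.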
Speech synthesis: text -> VoXML file -> output (prerecorded)
The specification defines a markup language for prompting users via a combination of prerecorded speech, synthetic speech and music. You can select voice characteristics (name, gender and age) and the speed, volume, pitch, and emphasis. There is also provision for overriding the synthesis engine's default pronunciation.
World Wide Web Consortium (W3C)
Voice Browser Working Group Speech Interface Framework
Established on 26 March 1999. Re-chartered through 31 January 2009.
W3C Team Contacts are Kazuyuki Ashimura and Matt Womer. Co-chaired by Jim Larson and Scott McGlashan.
VoiceXML
Speech Synthesis, Speech Recognition, Speech Grammars, Semantic Interpretation, Stochastic Language Models
VoiceXML is a dialog markup language designed for telephony applications, where users are restricted to voice and DTMF (touch tone) input.
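To give a feel for what such a dialog looks like, here is a minimal sketch of a VoiceXML 2.0 form held in a Python string and inspected with the standard library. The form, the field names and the grammar file cities.grxml are made up for illustration; the second field uses the built-in digits type, which a platform may fill from speech or from DTMF keys.

```python
import xml.etree.ElementTree as ET

# Hypothetical dialog: one spoken field and one digits field.
VXML = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather">
    <field name="city">
      <prompt>Which city would you like the weather for?</prompt>
      <grammar src="cities.grxml" type="application/srgs+xml"/>
      <filled>
        <prompt>Getting the weather for <value expr="city"/>.</prompt>
      </filled>
    </field>
    <field name="pin" type="digits">
      <prompt>Please say or key in your PIN.</prompt>
    </field>
  </form>
</vxml>
"""

NS = {"v": "http://www.w3.org/2001/vxml"}

def list_prompts(vxml_text):
    """Map each field name to the prompt the caller hears for that field."""
    root = ET.fromstring(vxml_text)
    prompts = {}
    for field in root.iterfind(".//v:field", NS):
        prompt = field.find("v:prompt", NS)  # direct child prompt only
        prompts[field.get("name")] = "".join(prompt.itertext()).strip()
    return prompts

print(list_prompts(VXML))
```

The Python part only parses the document. Interpreting it (playing prompts, listening, filling fields) is the job of the voice browser itself.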
[Figure: Browser (text.html), Web Server (text.vxml), Internet]
The specification defines a markup language for prompting users via a combination of prerecorded speech, synthetic speech and music. We can select voice characteristics (name, gender and age) and the speed, volume, pitch, and emphasis. There is also provision for overriding the synthesis engine's default pronunciation.
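The features listed above map onto elements of the W3C Speech Synthesis Markup Language (SSML). The sketch below builds a small SSML fragment with Python's standard library. The voice name, the audio URL and the function name are assumptions for illustration, and the fragment omits the namespace and language attributes that a complete document would carry.

```python
import xml.etree.ElementTree as ET

def build_prompt(greeting, stressed, jingle_url):
    """Build an SSML fragment mixing prerecorded audio and synthetic speech."""
    speak = ET.Element("speak", {"version": "1.0"})
    # Prerecorded speech or music is referenced with <audio>.
    ET.SubElement(speak, "audio", {"src": jingle_url})
    # Voice characteristics: name, gender and age (the name is hypothetical).
    voice = ET.SubElement(
        speak, "voice", {"name": "guide", "gender": "female", "age": "30"}
    )
    # Speed, volume and pitch are set on <prosody>.
    prosody = ET.SubElement(
        voice, "prosody", {"rate": "slow", "volume": "loud", "pitch": "high"}
    )
    prosody.text = greeting + " "
    # Emphasis is applied to a single word or phrase.
    emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emphasis.text = stressed
    # <phoneme> overrides the engine's default pronunciation.
    phoneme = ET.SubElement(voice, "phoneme", {"alphabet": "ipa", "ph": "təˈmɑːtəʊ"})
    phoneme.text = "tomato"
    return ET.tostring(speak, encoding="unicode")

print(build_prompt("Welcome to the", "voice", "http://example.com/jingle.wav"))
```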
Speech Grammars, Stochastic Language Models, Semantic Interpretation
In most cases, user prompts are very carefully designed to encourage the user to answer in a form that matches context-free grammar rules. Speech Grammars allow authors to specify rules covering the sequences of words that users are expected to say in particular contexts. These contextual clues allow the recognition engine to focus on likely utterances, improving the chances of a correct match.
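A small example may help show how grammar rules narrow down what the engine listens for. The sketch below uses a made-up, non-recursive grammar for a drinks order, written as a plain Python dictionary rather than in the W3C grammar format, and expands it into every word sequence it accepts.

```python
from itertools import product

# Hypothetical grammar: names starting with "$" are rules, the rest are words.
GRAMMAR = {
    "$order": [["$size", "$drink"], ["a", "$size", "$drink", "please"]],
    "$size": [["small"], ["large"]],
    "$drink": [["coffee"], ["tea"]],
}

def expand(symbol):
    """Return every word sequence a symbol can produce (no recursive rules)."""
    if symbol not in GRAMMAR:
        return [[symbol]]
    sequences = []
    for alternative in GRAMMAR[symbol]:
        parts = [expand(s) for s in alternative]
        for combo in product(*parts):
            sequences.append([word for seq in combo for word in seq])
    return sequences

# The full set of utterances the recognizer needs to consider in this context.
ACCEPTED = {" ".join(words) for words in expand("$order")}

def in_grammar(utterance):
    """Check whether an utterance is one the grammar expects."""
    return utterance.lower().strip() in ACCEPTED

print(sorted(ACCEPTED))
print(in_grammar("Large tea"), in_grammar("how can I help"))
```

With only eight accepted sentences, the engine has far fewer candidates to choose between than it would with unrestricted speech.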
In some applications it is appropriate to use open-ended prompts ("How can I help?"). In these cases, context-free grammars are not useful.
The solution is to use a stochastic language model. Such models specify the probability that one word occurs following certain others. The probabilities are computed from a collection of utterances collected from many users.
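One simple form of such a model is a bigram model, which estimates the probability of a word given the word before it. The sketch below counts word pairs in a tiny made-up set of utterances. The corpus and function names are illustrative only, and a real model would be trained on far more data and would apply smoothing.

```python
from collections import Counter, defaultdict

def train_bigrams(utterances):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for utterance in utterances:
        words = ["<s>"] + utterance.lower().split()  # <s> marks the start
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def probability(counts, prev, word):
    """Estimate P(word | prev) from the raw counts, without smoothing."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

# Hypothetical utterances collected from users.
corpus = [
    "I want to check my balance",
    "I want to pay a bill",
    "I need to check my email",
]
model = train_bigrams(corpus)
print(probability(model, "i", "want"))
print(probability(model, "check", "my"))
```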
The recognition process matches an utterance to a speech grammar, building a parse tree as a byproduct. There are two approaches to harvesting semantic results from the parse tree:
1. Annotating grammar rules with semantic interpretation tags.
2. Representing the result in XML.
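Both approaches can be illustrated together in a simplified form. In the sketch below, each grammar alternative carries a semantic tag, and the collected tags are written out as XML. It replaces the parse tree with a flat word lookup, and the rule table and element names are hypothetical rather than taken from the W3C specifications.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged rules: spoken word -> semantic value, grouped by slot.
RULES = {
    "size": {"small": "S", "large": "L", "big": "L"},
    "drink": {"coffee": "coffee", "tea": "tea"},
}

def interpret(utterance):
    """Collect the semantic tag of every rule alternative that matched."""
    result = {}
    for word in utterance.lower().split():
        for slot, tagged in RULES.items():
            if word in tagged:
                result[slot] = tagged[word]
    return result

def to_xml(result):
    """Represent the semantic result as a small XML document."""
    root = ET.Element("result")
    for slot, value in result.items():
        ET.SubElement(root, slot).text = value
    return ET.tostring(root, encoding="unicode")

meaning = interpret("a big coffee please")
print(meaning)
print(to_xml(meaning))
```

Note that "big" and "large" produce the same tag, so the application receives one normalized value regardless of the wording the user chose.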
Voice browser applications can be divided into three categories:

Web Browsing
Browse any web page using speech input.
Parsing for the purpose of voice recognition is done when the page is accessed.
May or may not produce voice feedback.

Limited Information Access
Useful information in limited domains, such as the weather in a city or stock updates.
Audio feedback.

Spoken Dialog Systems
Client-server architecture is used.
Used for connecting to a remote server from a Java applet (client).
Examples include connecting to email servers.
Voice is a very natural user interface which speeds up browsing.
Less space requirements.
Portable voice browsers can also be implemented.
Practical interface for functionally blind users.
Users can browse the web while keeping their hands and eyes free for other jobs.
Voice browsing will become visual (multimodal).
Can be integrated into an OS.
Can be integrated into every application.
Browser technology is changing very fast these days, and we are moving from the visual paradigm to the voice paradigm. The voice browser is the technology for entering this paradigm. A voice browser is a device which interprets voice input and generates voice output.
http://www.w3.org/standards/webofdevices/voice
http://xml.coverpages.org/ccxml.html
http://reactos.ccp14.ac.uk/Voice/
http://www.w3.org/Voice/1998/Workshop/PhilJenkins.html (for IBM)
