
Application Note

Media Resource Control Protocol

MRCP V1 Client Library User's Guide


Executive Summary
This application note provides information about the classes and methods that make up the Media Resource Control Protocol (MRCP) Version 1 Client Library. The MRCP client library was developed to provide a starting point for integrating MRCP-based applications with the Dialogic communications product set in order to achieve next-generation solutions. The library is available for download with this user's guide.


Table of Contents
Introduction
Library Overview
Library Classes
  MrcpClient Class
  GetRequestStatus
  InitializeMrcpSession
  MrcpAckRecognize
  MrcpDefineGrammar
  MrcpDescribe
  MrcpRecognizeSpeech
  MrcpRecognizeStop
  MrcpSpeak
  MrcpSpeakBargeIn
  MrcpSpeakStop
  MrcpStartRecognizeTimer
  ShutdownTtsLoop
  TearDownMrcpSession
  ShutdownAsrLoop
  GetRecognitionValue
  GetListLock
  UnlockList
Summary
Acronyms
For More Information


Introduction
This document serves as a reference for the Media Resource Control Protocol Version 1 (MRCP V1) Client Library (referred to in this application note as the client library) and is intended to help developers understand the classes and methods defined within it. Developers may choose to customize and extend the library to suit their individual requirements.

Library Overview
The client library provides methods for Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) support using the Media Resource Control Protocol (MRCP). Although the current version of the library supports MRCP v1.0, the MRCP v1.0-specific requirements have been abstracted to allow for support of future MRCP versions. A Zip file containing the client library can be downloaded. An MRCP Version 2 library, programmer's guide, and user's guide are available for download from the Dialogic web site. See the For More Information section for MRCP V1 and V2 client libraries and documentation.

Additionally, the client library was developed for Windows platforms; however, platform-specific Application Programming Interfaces (APIs) were kept to a minimum to ease porting to other operating systems such as Linux. The client library does not provide the methods or means to support telephony operations, but rather relies on the application using the library for that capability. Status logging is provided via simple printf statements that currently write to the application's console window. Real Time Control Protocol (RTCP) support has not been included within the client library. Due to the nature of MRCP, methods execute asynchronously and, with the exception of MrcpSpeak and MrcpRecognizeSpeech, complete in a near synchronous time frame.

Library Classes
The client library consists of five classes, two of which are instantiated within the application using the library (MrcpClient and MrcpRecognize). The remaining three classes (MrcpRtpHandler, RtspPacket, and MrcpUtils) are instantiated or used by the MrcpClient class and are transparent to the hosting application. As a result, these classes are not covered in detail in this application note. The client library defines a namespace of MrcpRefLibByIntel, which needs to be used in all applications using the library. Memory allocation and MRCP state tracking are not provided by this version of the library. The method definitions for each of the classes provide additional information.

The primary class instantiated by the client application is the MrcpClient class. This class contains the methods used to generate MRCP commands and receives MRCP results from an MRCP server. A single instantiation of this class will support both a TTS and an ASR session with the MRCP server. Session ID and sequence number generation is provided within this class, reducing the application's need to track these values. See the MrcpClient section for additional information.

The secondary class instantiated by the client application is the MrcpRecognize class. This class provides the means to set the large number of parameters required in the MRCP recognition method (MrcpRecognizeSpeech) contained in the MrcpClient class. When the MrcpRecognize class is instantiated, default recognition parameters are set for the class members. Accessor functions are provided to set the class parameters to meet the recognition requirements of an application. See the MrcpRecognizeSpeech section for additional information.

The remaining three classes are instantiated or used by the MrcpClient class. Direct access to the methods in each of these classes is not required by the client application. As a result, this application note does not document the methods available in these classes, but short descriptions are provided here.

The first of these classes is the MrcpRtpHandler class. This class handles the receipt and sending of Real-time Transport Protocol (RTP) audio to the MRCP server. This class provides the means to support MRCP on the Dialogic communications products that do not support RTP streaming.

The second class, RtspPacket, is instantiated based on the version information supplied when instantiating the MrcpClient class. This class provides the RTSP signaling required in MRCP v1.0. MrcpClient was abstracted to support multiple signaling classes, such as a Session Initiation Protocol (SIP) signaling class required by MRCP v2.0.

The final class, MrcpUtils, is a utilities class providing network communications and parsing methods used throughout the MrcpClient class. This class is not instantiated by the MrcpClient; rather, its methods are called directly.


Figure 1. MRCP Client Library Classes


MrcpClient Class
This class contains the methods used to send MRCP commands to, and receive MRCP results from, an MRCP server. Each method in the class returns a value to the calling application, which usually provides a status indication for that method's execution. Several of the methods are defined with default values commonly used in MRCP's processing of the request they generate.

Constructor
Summary
The constructor for the class takes parameters used to set class member values and governs the operation of the hosting application.
Format
MrcpClient(const string& MrcpSrvrIpAddr, const string& MrcpSrvrPort, int nMrcpVersion)

const string& MrcpSrvrIpAddr
The dotted decimal IP address of the MRCP server with which this object will work

const string& MrcpSrvrPort
The port number defined on the MRCP server to receive MRCP commands. Because MRCP v1.0 is RTSP-based, this port is usually defined as 554.

int nMrcpVersion
The default value for this field is 1. The version number will be used to determine which signaling class to instantiate in the future. Currently, the value is used to set version information during the formatting of MRCP commands.

Destructor
The destructor is responsible for deleting instantiated classes, triggering the shutdown of open sockets to the MRCP server, closing the mutex defined to protect the container classes used for audio transmission, and closing the socket subsystem.

GetRequestStatus
Summary
This method can be used to return the various status messages returned for longer-running asynchronous methods such as MrcpRecognizeSpeech and MrcpSpeak.
Format
bool GetRequestStatus(const string& synthRec, int nReqSeq, bool bLoop)

const string& synthRec
Values {synthesizer, recognizer} identify which MRCP session is getting the status

int nReqSeq
The sequence number returned by the method whose status is being tracked

bool bLoop
Default value is false; can be used to loop indefinitely

Returns
Blocks until an updated status has been received from the MRCP server and then returns that status.
Notes
Useful in tracking the completion of the MrcpSpeak and MrcpRecognizeSpeech methods.

InitializeMrcpSession
Summary
This method is called to establish an MRCP session with the MRCP server and should be called to establish both an ASR and a TTS session. One MrcpClient object can support one TTS and one ASR operation instance. The session ID generated by this method is tracked based on type.
Format
int InitializeMrcpSession(const string& synthRec, const string& sClientIpAddress, const string sRtpPort, const string sPhone)

const string& synthRec
String value {synthesizer, recognizer} used to provide synthesis and recognition sequence and session tracking

const string& sClientIpAddress
String in dotted decimal form (for example, 192.168.1.100) used to provide the IP address of the machine running the client application

const string sRtpPort
String containing the IP port number that will be opened to listen for audio port connection requests

const string sPhone
Value used in the formatted messages sent to the MRCP server

Returns
Status received from the MRCP server in response to the initialization request. A value of 200 indicates success; all other values indicate failure.
Notes
This method must be called before any of the other client library methods can be called. No additional methods should be called after an error return. Generally, this method is called just prior to ASR or TTS functions and should be closely paired with TearDownMrcpSession.

MrcpAckRecognize
Summary
Used to acknowledge a recognition complete message to the MRCP server.
Format
int MrcpAckRecognize()

Returns
Status received from the MRCP server in response to an acknowledge recognize request. A value of 200 indicates success; all other values indicate failure.
Notes
This method may not be needed, depending on the MRCP server used. It is recommended that the vendor documentation be reviewed to determine the need to use this method.

MrcpDefineGrammar
Summary
This method is used to identify the grammar that will be used to recognize the caller's utterance.
Format
int MrcpDefineGrammar(const string& sContType, const string& sContId, const string& sGrammar)

const string& sContType
Identifies the grammar content type; text/uri-list is common

const string& sContId
Specifies the Uniform Resource Identifier (URI) for the grammar. This URI is used later in the recognition method.

const string& sGrammar
HTTP address that points to the grammar file to be opened and compiled

Returns
Status received from the MRCP server in response to a define grammar request. A value of 200 indicates success; all other values indicate failure.
Notes
The URI defined for use in this method will also be required in the recognition method. Grammar files are in an eXtensible Markup Language (XML) format and can have an xml or Grammar XML (GRXML) extension. If a URI has been specified, then that URI will be retained and used in MrcpRecognizeSpeech.

MrcpDescribe
Summary
This method is used to request that the MRCP server send the parameters configured for the current MRCP session.
Format
int MrcpDescribe(string& sdpInfo, string synthRec)

string& sdpInfo
Reference to a string in the client application that will receive the parameter information returned by this method

string synthRec
Value {synthesizer, recognizer} indicates for which MRCP session the information is requested

Returns
Status received from the MRCP server in response to a describe request. A value of 200 indicates success; all other values indicate failure.

MrcpRecognizeSpeech
Summary
Called before the audio is streamed to the MRCP server for recognition.
Format
int MrcpRecognizeSpeech(MrcpRecognize& cRecognize, list<string> *rtpRecognizeAudio)

MrcpRecognize& cRecognize
Reference to the MrcpRecognize object instantiated within the client application

list<string> *rtpRecognizeAudio
A pointer to an STL list container that will be used to process RTP audio received from the MRCP server

Returns
Status received from the MRCP server in response to a recognize speech request.
Notes
This method is asynchronous, with the potential for a delay before processing completes. The GetRequestStatus method can be used to determine when this method has completed. Implementation of an MRCP state machine to track the state of this method is recommended. GetRecognitionValue can be used to return the value recognized from the caller's utterance. If the MrcpDefineGrammar method has been called and a URI value set, that URI will be used in this method.

MrcpRecognizeStop
Summary
Used to signal the MRCP server to stop its recognition processing.
Format
int MrcpRecognizeStop()

Returns
Status received from the MRCP server in response to a recognize stop request. A value of 200 indicates success; all other values indicate failure.
Notes
A likely use for this method would be a caller abandoning a session while speech recognition is underway.

MrcpSpeak
Summary
This method generates audio from the MRCP server for delivery to the client application. Both TTS and streaming file play are supported by this method.
Format
int MrcpSpeak(const string& SpeakString, list<string> *rtpAudio, const string sKillOnBarge, const string sContentType)

const string& SpeakString
Can be either a text string to be spoken or an HTTP pointer to a file to be streamed to the client application

list<string> *rtpAudio
This is a pointer to a Standard Template Library (STL) list container class of strings. This container class must be defined in the client application. Audio received from the MrcpSpeak request will be stored in this container as it is received.

const string sKillOnBarge {true, false}
Used to indicate to the MRCP server whether or not the audio stream sent can be barged in on. Default = true.

const string sContentType
Used to indicate the format of the request to the MRCP server. Default = application/synthesis+ssml.

Returns
Status received from the MRCP server in response to a speak request. A value of 200 indicates success; all other values indicate failure.
Notes
The user application is responsible for defining the STL list container that will be used to store audio buffers received from the MRCP server. A locking method is provided to safeguard this list of strings. Content type values other than the default used by this method may be possible. The default value provides support for both TTS and streaming file play.

MrcpSpeakBargeIn
Summary
This method is used to signal the MRCP server that a barge-in has occurred.
Format
int MrcpSpeakBargeIn()

Returns
Status received from the MRCP server in response to a speak barge-in request. A value of 200 indicates success; all other values indicate failure.
Notes
This method would be used if barge-in was used to halt an audio stream from the MRCP server and if an API-based event processor from Dialogic indicated that a barge-in condition had occurred.

MrcpSpeakStop
Summary
This method is used to halt the audio stream (TTS or streaming file play) being sent from the MRCP server.
Format
int MrcpSpeakStop()

Returns
Status received from the MRCP server in response to a speak stop request. A value of 200 indicates success; all other values indicate failure.
Notes
This method differs from MrcpSpeakBargeIn and would likely be used as part of a cleanup operation if a caller hangs up or abandons the call.

MrcpStartRecognizeTimer
Summary
This method is called to start the timers used to control the collection of caller utterances. Timers include silence timers and duration timers. The default timers are set via the MrcpRecognize class instantiation. Accessors allow the values to be changed.
Format
int MrcpStartRecognizeTimer()

Returns
Status received from the MRCP server in response to a start recognize timer request. A value of 200 indicates success; all other values indicate failure.
Notes
Should be called immediately after the MrcpRecognizeSpeech method.

ShutdownTtsLoop
Summary
Used to shut down the infinite loop running in the MrcpRtpHandler class to collect inbound RTP packets.
Format
void ShutdownTtsLoop(bool bSignal)

bool bSignal
Set to true by inline method

Notes
TTS RTP packets are collected in an infinite loop in a separate thread within the MrcpRtpHandler class. It's important to signal a shutdown of the loop when not in use to avoid spiking the CPU utilization.

TearDownMrcpSession
Summary
This method is called to close or tear down an MRCP client session with the MRCP server.
Format
int TearDownMrcpSession(const string& synthRec)

const string& synthRec
Indicates which session (ASR or TTS) should be closed with the MRCP server

Returns
Status received from the MRCP server in response to a tear down request. A value of 200 indicates success; all other values indicate failure.
Notes
Generally, this method is called just after the ASR or TTS function has completed and should be closely paired with InitializeMrcpSession.

ShutdownAsrLoop
Summary
Used to shut down the infinite loop running in the MrcpRtpHandler class to apply RTP headers and send outbound RTP packets.
Format
void ShutdownAsrLoop(bool bSignal)

bool bSignal
Set to true by inline method

Notes
ASR RTP packets are processed from the STL list-of-strings container in an infinite loop in a separate thread within the MrcpRtpHandler class. It's important to signal a shutdown of the loop when not in use to avoid spiking the CPU utilization.

GetRecognitionValue
Summary
This method parses the recognition complete packet from the MRCP server and returns the identified caller's utterance.
Format
string GetRecognitionValue(const string& sResult)

const string& sResult
String returned when recognition completes

Returns
The recognition value contained in the recognition complete message.
Notes
Parses the returned message for the recognition complete status that identifies the utterance match based on the grammar defined for the session. Works from the XML tags to identify the value.

GetListLock
Summary
This method should be used to lock the STL list-of-strings containers used for RTP stream processing.
Format
bool GetListLock(unsigned int delay, bool timeout_ok)

unsigned int delay
The time to wait to acquire the lock for the list

bool timeout_ok
Indicates whether or not to continue to wait for the lock if a timeout has occurred. Default = false

Returns
True, if the lock was obtained; otherwise, false.
Notes
Locking and unlocking the STL list-of-strings container is important and useful. Due to the real-time nature of the RTP stream, hold the lock for as few instructions as possible.

UnlockList
Summary
This method is used to release the lock for the STL list-of-strings container.
Format
bool UnlockList(int where)

int where
Unique numerical value that could be used to identify which application is holding the lock

Returns
True, if the lock is released; otherwise, false.
Notes
Locking and unlocking the STL list-of-strings container is important and useful. Due to the real-time nature of the RTP stream, hold the lock for as few instructions as possible.

Summary
The adoption of MRCP as a standard appears to be well underway, with a significant percentage of speech and equipment vendors offering full support for the protocol. Benefits to adopters of this technology include:
• Not being locked into one vendor: applications written using an MRCP approach should run seamlessly on all vendor platforms supporting the standard
• A migration strategy from circuit-switched solutions to VoIP-based solutions
• Close alignment with the XML standard, which lowers the entry barriers to the communications market
The MRCP class library provided in conjunction with this application note was developed as a starting point for users of the Dialogic communications product set who wish to migrate to MRCP implementations.

Acronyms
API    Application Programming Interface
ASR    Automatic Speech Recognition
GRXML  Grammar XML
HTTP   Hypertext Transfer Protocol
MRCP   Media Resource Control Protocol
RTCP   Real Time Control Protocol
RTP    Real-time Transport Protocol
SIP    Session Initiation Protocol
STL    Standard Template Library
TTS    Text-To-Speech
URI    Uniform Resource Identifier
XML    eXtensible Markup Language


For More Information


Implementing a Media Resource Control Protocol (MRCP) Client Application with Dialogic Telecommunications Products
http://www.dialogic.com/goto/?9591

A Zip file, MRCP V1 Client Library, containing the source code for the MRCP Client Library V1 can be downloaded at
http://www.dialogic.com/goto/?10568

MRCP V2 Client Library User's Guide
http://www.dialogic.com/goto/?10285

MRCP V2 Client Library Programmer's Guide
http://www.dialogic.com/goto/?10284

A Zip file, MRCP V2 Client Library, containing the source code for the MRCP Client Library V2 can be downloaded at
http://sourceforge.net/projects/openmrcpclient

To learn more, visit our site on the World Wide Web at http://www.dialogic.com.
Dialogic Corporation, 9800 Cavendish Blvd., 5th floor, Montreal, Quebec, CANADA H4M 2V9

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH PRODUCTS OF DIALOGIC CORPORATION OR ITS SUBSIDIARIES ("DIALOGIC"). NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN A SIGNED AGREEMENT BETWEEN YOU AND DIALOGIC, DIALOGIC ASSUMES NO LIABILITY WHATSOEVER, AND DIALOGIC DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF DIALOGIC PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY INTELLECTUAL PROPERTY RIGHT OF A THIRD PARTY.

Dialogic products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. Dialogic may make changes to specifications, product descriptions, and plans at any time, without notice.

Dialogic is a registered trademark of Dialogic Corporation. Dialogic's trademarks may be used publicly only with permission from Dialogic. Such permission may only be granted by Dialogic's legal department at 9800 Cavendish Blvd., 5th Floor, Montreal, Quebec, Canada H4M 2V9. Any authorized use of Dialogic's trademarks will be subject to full respect of the trademark guidelines published by Dialogic from time to time and any use of Dialogic's trademarks requires proper acknowledgement. Windows is a registered trademark of Microsoft Corporation in the United States and/or other countries. Other names of actual companies and products mentioned herein are the trademarks of their respective owners.

Dialogic encourages all users of its products to procure all necessary intellectual property licenses required to implement their concepts or applications, which licenses may vary from country to country.

Copyright 2007 Dialogic Corporation. All rights reserved. 09/07 9603-02

www.dialogic.com
