An Application That Talks and Listens (Duites)

AN APPLICATION THAT TALKS AND LISTENS (MANIPULATING TEXT FOR MOBILE SHORT MESSAGING SYSTEM)
DEEP XAVIER T. DUITES
SUBMITTED TO THE FACULTY OF THE COLLEGE OF COMPUTER STUDIES CEBU INSTITUTE OF TECHNOLOGY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER IN INFORMATION TECHNOLOGY
OCTOBER 2008
ABSTRACT
The Text-To-Speech and Speech-To-Text (TTS&STT) application was a tool that converted SMS text (also known as SMS language, Textese, or TxtSpk[1]) into speech and vice versa. It used English as its base language. The conversion process used an SMS dictionary of words, stored in a database file, to match each SMS word to an English word and each English word to an SMS word.
TTS&STT was implemented using Microsoft .NET technology and the Microsoft Speech Application Programming Interface (SAPI). SAPI handled both the speech recognition and the text-to-speech conversion. This underlying speech technology is built into the Speech API and was not itself developed as part of this project.
Aside from recognizing speech and reading text aloud (from a text file, a WAV file, or text typed into the user interface), the TTS&STT application could also save converted SMS text to a text file or a WAV file. These outputs make its results usable by other applications that rely on speech recognition and text-to-speech.
[1] SMS Language. http://wapedia.mobi/en/SMS_language
ACKNOWLEDGMENT
I would like to express my sincerest thanks to the following for extending their support from the start until the completion of this work:
My God, the Almighty Father, who has always been my source of life, wisdom and will. Without Him, I could not have completed this work.
My wife, Gilda, for always being there to give encouragement and love.
My son, Marcus, for being there to give the smile that fills my heart with so much joy.
My daughter, Margaret, for being there to give the sweetest smile that fills my day with so much love and affection.
My dean, Prof. Cherry Lyn C. Sta. Romana, for being one of the people who encouraged me to finish this work.
My adviser, Prof. Larmie T. Santos-Feliscuzo, for always being very supportive of me.
My mentors in the graduate studies, for their consideration, patience, and understanding.
My colleagues in the College of Computer Studies, for their words of encouragement and motivation.
My classmates in the graduate studies, for their shared time and thoughts.
TABLE OF CONTENTS
1. INTRODUCTION
  1.1 Background of the Study
  1.2 Objectives of the Study
  1.3 Significance of the Study
  1.4 Scope and Limitations
2. REVIEW OF RELATED LITERATURE
  2.1 Yap: Voice-To-Text Translation on Your Cell
  2.2 Jott: Speech-to-Text Mobile Interface
  2.3 NMS Communications Voice SMS
  2.4 SpinVox
  2.5 IBM ViaVoice
  2.6 Microsoft Voice Command
3. MATERIALS AND METHODOLOGIES
  3.1 Project Budget and Materials
  3.2 Work Schedule
  3.3 Implementation Plan
4. THEORETICAL FRAMEWORK
  4.1 TTS&STT Application
    4.1.1 Speech-To-Text Interface
    4.1.2 Text-To-Speech Interface
5. CONCLUSIONS AND RECOMMENDATIONS
APPENDIX A: USER'S MANUAL
  A.1 The Installer Package and the Installation Instructions
    Installation Instructions
  A.2 Getting Started
  A.3 Using the Speech-To-Text Interface
    Steps in Interacting with the Speech-To-Text Application
  A.4 Using the Text-To-Speech Interface
    Steps in Interacting with the Text-To-Speech Application
  A.5 Using the SMS Dictionary Interface
APPENDIX B: PROGRAM LISTINGS
BIBLIOGRAPHY
LIST OF FIGURES
Figure 1: The basic steps to convert speech to text
Figure 2: STT's System Architecture
Figure 3: The basic steps to convert text to speech
Figure 4: TTS' System Architecture
Figure 5: Contents of the SpeechUiSetup folder
Figure 6: Contents of the Debug folder
Figure 7: Contents of the SpeechUiSetup folder in Program Files
Figure 8: TTS&STT's main graphical user interface
Figure 9: The Launch ASR shortcut button
Figure 10: The STT's user interface
Figure 11: The Launch TTS shortcut button
Figure 12: The TTS user interface
Figure 13: View menu of the STT application
Figure 14: View menu of the TTS application
Figure 15: The SMS Dictionary interface
LIST OF TABLES
Table 1: Materials and Costs
Table 2: Gantt Chart
Table 3: Main GUI's menu
Table 4: Menu items of the drop-down File menu
Table 5: Menu item of the drop-down Help menu
Table 6: The STT's main menu
Table 7: Menu items of the drop-down File menu of the STT application
Table 8: Menu items of the drop-down Edit menu of the STT application
Table 9: Menu item of the drop-down View menu of the STT application
Table 10: Menu item of the drop-down Help menu of the STT application
Table 11: The TTS main menu
Table 12: Menu items of the drop-down File menu of the TTS application
Table 13: Menu item of the drop-down View menu of the TTS application
Table 14: Menu item of the drop-down Help menu of the TTS application
1. INTRODUCTION
1.1 Background of the Study
For the past several decades, designers have processed speech for a wide variety of applications, ranging from mobile communications to automatic reading machines. Speech recognition reduces the overhead of alternative communication methods. Speech has not been used much in electronics and computing because of the complexity and variety of speech signals and sounds. However, modern processes, algorithms, and methods make it much easier to process speech signals and recognize the words spoken.
Voice recognition software (VRS), also known as speech recognition, automatic speech recognition (ASR), or natural language recognition software, converts your voice to text on a computer. In essence, this means that you can create text files without typing. When you speak through a microphone (most voice recognition software includes this accessory), the software translates the sounds into written words. Getting started is initially time-consuming, but accuracy and ease of use have improved so much that it may finally be worth the investment of time and money.
Many first-generation voice recognition packages used discrete speech technology, which meant you had to pause between words for the computer to understand them. The latest generation uses continuous speech technology, which allows you to speak more naturally. All require an enrollment process, during which you sit at the computer and read sample text aloud to help train the software to understand your voice patterns. While most users want a large vocabulary in their software package, the larger the vocabulary, the more time-consuming the training process. That's because many words sound similar, and you need to train the recognition system to understand how you pronounce them.
1.2 Objectives of the Study
This study aims to develop application software capable of the following:
1. Convert speech or voice data into SMS text.
2. Convert SMS text into voice data.
In addition, the following specific objectives must also be achieved during the project's development and completion:
1. To create a desktop application that is a precursor to a mobile application (since, as of this project, no publicly available mobile technology is capable of meeting the objectives above).
2. To design an application interface that converts voice or speech data into SMS text.
3. To design an application interface that converts SMS text into voice or speech data.
4. To implement the designed system using Microsoft Visual Basic on the .NET platform.
5. To produce a single application that handles both speech-to-text and text-to-speech.
1.3 Significance of the Study
Mobile communication is a major part of today's wireless-device innovation, and text messaging (SMS) is especially prevalent within it. Yet text messaging has never been particularly easy: we have to press many buttons on our cell phones just to compose a single message. Some people do not mind, but many find it tedious. This study therefore aims to develop a system that makes text messaging a little easier, one that transcribes voice or speech data into a text message and vice versa. Such a system could make text messaging more convenient for many users.
1.4 Scope and Limitations
The study focuses only on developing a system capable of converting voice or speech data into a text message and vice versa. Development concentrates on the following: designing the interfaces for the conversion processes; the conversion mechanisms for both speech-to-text and text-to-speech; and defining the SMS dictionary of words.
The limitations of this study are the following:
1. The software cannot convert a saved or recorded audio message into a text message.
2. The software cannot extract text messages from a recorded or saved audio file.
3. The software cannot play back audio files in any format other than WAV.
4. The software cannot convert a recognized word into SMS text if that word is not in the defined SMS dictionary.
5. The software cannot recognize words in languages other than English.
6. The software cannot form a text message in any language other than English.
2. REVIEW OF RELATED LITERATURE
The following works serve as my basis for developing this system, because they are the existing applications of today's speech technologies that come closest to it.
2.1 Yap: Voice-To-Text Translation on Your Cell
Yap lets you send text messages just by talking into your phone, and it also lets you query web services like Google, Wikipedia, or YouTube with nothing but your voice. It provides voice-to-text translation services for mobile phones: users can say anything they like, and Yap will send a text copy to any of their contacts. The service is completely automated, so no intermediary Yap employees listen to your messages, type them, and send them out. Yap also offers a text messaging application called Yap9 that allows you to keep in touch with friends, family, and co-workers. Users can also use the application to instantly query mobile web services just by talking: they can search Google, Wikipedia, Yahoo, and YouTube, or interact with Facebook, without using their phones' miniature keyboards.[2]
[2] Kochanov, Ilya. "Yap: Voice-To-Text Translation On Your Cell." http://www.crunchgear.com/2007/09/17/yap-voice-to-text-translation-on-your-cell/; http://www.techcrunch.com/2007/09/17/techcrunch-40-session-2-mobile-communications/
2.2 Jott: Speech-to-Text Mobile Interface
Jott is an application from Jott Networks Inc. that is accessible by phone, web browser, text messaging, email, downloaded software, links created by third parties, and third-party applications that incorporate the Jott service through application programming interfaces. Jott Networks operates a voice-to-text service that makes staying organized and in touch easy. Jott allows consumers to easily and safely send emails and text messages, set reminders, organize lists, and post to web services with their voice.[3]
[3] Platzek, Dirk. "Jott: Speech-to-Text Mobile Interface." http://jott.com/default.aspx; http://www.wunschfeld.net/blog/2007/03/speech-to-text-mobile-interface.html
2.3 NMS Communications Voice SMS
Voice SMS is a fast way to send a short message to another mobile subscriber. In that sense it is similar to an SMS text message, but Voice SMS is much easier to use: there are no keystrokes to compose the message; you just talk. When someone sends you a Voice SMS, you receive an SMS text message saying "You have a Voice SMS from [someone in your phonebook]. Click here to listen to your message." One click and you are listening to the message. In some implementations, it says "Dial *0* to listen to your message." That is four clicks with the Send key, but it is equally easy for users.
Voice SMS has additional advantages over conventional text SMS. The text user interface on mobile phones is great if you use a European language, Kanji, or another widely used language, but once you get beyond the top twenty languages, there is little or no support for text messaging. And of course, text messages are of no use to people who are illiterate, yet they easily learn to use Voice SMS.
Voice SMS is perfect when you want to communicate but do not need or want a live conversation, for example when you think the other party is likely to be asleep or in a noisy environment. Voice SMS is a convenient way to give them some information or to start a non-real-time conversation in which the other party can delay their response until they are ready.[4]
[4] Turner, Brough. "Voice SMS: Creating a Service People Will Adopt." http://www.nmscommunications.com/News/NL/TIN/Jan2007/VoiceSMS.htm
2.4 SpinVox
SpinVox captures spoken messages and cleverly converts them into text. It then delivers your message to a destination of your choice: inbox, blog, wall or space. The captured spoken words are fed into a Voice Message Conversion System known as D2 (the Brain), which spits them out as text content. So D2's pretty smart. He's bound to be, as he's a combination of artificial intelligence, voice recognition and natural linguistics. But he also learns from us humans: he learns all the time about how we speak and what we say, from the mundane to the ridiculous. And even the smartest machines need a helping hand if they are to stay clever: help to understand and digest new words or phrases, to ensure that he converts what you mean to say. Over the past four years D2 has been chomping through our words, converting millions of messages from millions of different voices and accents in English, French, Spanish and German. But now he's onto his main course, and he wants to feast on your words to make him bigger and stronger.[5]
[5] "What's SpinVox All About Then?" http://www.spinvox.com/homepage.html; http://www.spinvox.com/how_it_works.html
2.5 IBM ViaVoice
ViaVoice allows email and web navigation by voice command, meaning that you can use your voice to create, manage, and send email, chat on the Internet, command your browser, launch URLs and surf the Web. It supports transcription from digital handheld recorders. Its new IBM speech engine has improved background noise adaptation, which can result in greater dictation and voice command accuracy. It has a vocabulary and backup dictionary of over 300,000 words, and you can add customized addresses, names, acronyms, terms and colloquialisms to the vocabulary.[6]
[6] IBM Corporation. IBM ViaVoice Advanced Edition Release 10. http://www.nuance.com/viavoice/advanced/
2.6 Microsoft Voice Command
Voice Command transforms your Windows Mobile smartphone into your own virtual personal assistant, letting you use your voice to look up contacts, make phone calls, get calendar information, play and control your music, and start programs. Voice Command makes it easier and more convenient than ever to take your digital lifestyle with you wherever you go. Because Voice Command uses state-of-the-art speech technology, you never have to prerecord important phone numbers or use difficult commands to access the information you need. Simply select whichever of the many commands is most natural to you and let Voice Command do the rest. You can even ask, "What can I say?" at any time to help find an appropriate command.[7]
[7] Microsoft Voice Command 1.6. http://www.microsoft.com/windowsmobile/voicecommand/features.mspx
3. MATERIALS AND METHODOLOGIES
3.1 Project Budget and Materials
The table below itemizes the tangible and intangible materials needed to develop and realize the system, together with the cost of each material (in pesos) as of this writing:
Material | Cost (pesos)
a. Personal Computer (1 unit) | 32,000.00
b. Microsoft Windows Vista | 7,500.00
c. Microsoft Office 2007 | 6,500.00
d. Microsoft .NET Framework 3.0 (free, available for download at microsoft.com) | 0.00
e. Microsoft Visual Studio 2008 Express Edition (free, available for download at microsoft.com) | 0.00
f. Standard headset (with microphone) | 200.00
g. Speakers | 300.00
Total Budget Cost | 46,500.00
Table 1: Materials and Costs
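As a quick arithmetic check, the itemized costs in Table 1 add up to the stated total budget:

```latex
32{,}000.00 + 7{,}500.00 + 6{,}500.00 + 0.00 + 0.00 + 200.00 + 300.00 = 46{,}500.00 \ \text{pesos}
```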
3.2 Work Schedule
The Gantt chart below shows the timeline of activities to be followed during the development of this system:
Table 2: Gantt Chart (activities: System Analysis and Design, Coding, Testing, Maintenance, Documentation; timeline: June 2008 to October 2008)
3.3 Implementation Plan
This application software will be deployed on a standalone desktop computer with the software listed in Table 1: Materials and Costs. The speech-to-text interface of the application requires a standard headset for speech input, and the text-to-speech interface requires a set of speakers for audio output. The SMS dictionary of words will be stored in a database in which the user can add, update and delete words.
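The add, update and delete operations on the SMS dictionary, together with the Sort operation shown in the system architecture figures, could be sketched as follows. This is a minimal illustration under stated assumptions, not the application's actual code: the thesis uses a Microsoft Access 2003 database from Visual Basic .NET, whereas this sketch substitutes Python's built-in sqlite3 module, and the table name sms_dictionary, its english/sms columns, the SmsDictionary class and the sample words are all hypothetical.

```python
import sqlite3

# Hypothetical stand-in: the thesis stores the SMS dictionary in a
# Microsoft Access 2003 database; SQLite is used here only for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sms_dictionary (
    english TEXT PRIMARY KEY,  -- full English word (one SMS form per word)
    sms     TEXT NOT NULL      -- its SMS (TxtSpk) form
)
"""


class SmsDictionary:
    """Hypothetical wrapper for the Add/Update/Delete/Sort operations."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(SCHEMA)

    def add_word(self, english: str, sms: str) -> None:
        # Raises sqlite3.IntegrityError if the English word already exists.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO sms_dictionary (english, sms) VALUES (?, ?)",
                (english.lower(), sms.lower()),
            )

    def update_word(self, english: str, sms: str) -> bool:
        # Returns True if an existing entry was changed.
        with self.conn:
            cur = self.conn.execute(
                "UPDATE sms_dictionary SET sms = ? WHERE english = ?",
                (sms.lower(), english.lower()),
            )
        return cur.rowcount == 1

    def delete_word(self, english: str) -> bool:
        # Returns True if an entry was removed.
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM sms_dictionary WHERE english = ?",
                (english.lower(),),
            )
        return cur.rowcount == 1

    def sorted_entries(self) -> list[tuple[str, str]]:
        # "Sort": all entries in alphabetical order of the English word.
        return self.conn.execute(
            "SELECT english, sms FROM sms_dictionary ORDER BY english"
        ).fetchall()


if __name__ == "__main__":
    words = SmsDictionary()
    words.add_word("tomorrow", "2moro")  # sample entries are illustrative
    words.add_word("great", "gr8")
    words.update_word("tomorrow", "tmrw")
    words.delete_word("great")
    print(words.sorted_entries())  # [('tomorrow', 'tmrw')]
```

Keying the table on the English word means each English word has exactly one SMS form; whether the original Access schema enforces the same rule is not stated in this document.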
4. THEORETICAL FRAMEWORK
4.1 TTS&STT Application
The TTS&STT application is designed and implemented using an object-oriented approach. It uses Microsoft Visual Basic .NET on the Microsoft .NET Framework as the programming language, the Speech Application Programming Interface (SAPI) for speech recognition and synthesis, and Microsoft Access 2003 as its database tool.
4.1.1 Speech-To-Text Interface
The Speech-To-Text interface of the TTS&STT application handles speech recognition using the Speech API's speech recognition engine. The speech recognition process of the speech-to-text interface follows the steps shown in Figure 1.
Figure 1: The basic steps to convert speech to text (Voice Input → SAPI [SR Engine, SS Engine] → Text Output)
The conversion of speech to text follows the basic steps shown in Figure 1. The detailed conversion of speech to text is shown in Figure 2.
Figure 2: STT's System Architecture (Voice Input → Recognized Text → Text Parser, which consults the SMS text database → Text Output; SMS text manipulation operations on the database: Add word, Delete word, Update word, Sort)
In this conversion process, the user utters a word into a standard microphone and SAPI recognizes it. The recognized word is then passed to the text parser, which compares it against the SMS text dictionary and outputs the corresponding SMS text.
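The text parser's lookup step could look roughly like the sketch below. It assumes the dictionary has already been loaded into memory as an English-to-SMS mapping; the sample entries, the WORD pattern and the to_sms function name are hypothetical, and keeping unknown words unchanged is an assumption (the limitations in Section 1.4 say only that such words cannot be converted).

```python
import re

# Hypothetical sample entries (English word -> SMS form); the application
# itself reads these from its SMS dictionary database.
ENGLISH_TO_SMS = {
    "see": "c",
    "you": "u",
    "are": "r",
    "great": "gr8",
    "tomorrow": "2moro",
}

# A "word" here is a run of letters, digits or apostrophes; spaces and
# punctuation between words are left exactly as recognized.
WORD = re.compile(r"[A-Za-z0-9']+")


def to_sms(recognized_text: str, dictionary: dict[str, str]) -> str:
    """Hypothetical text-parser step: replace each English word with its SMS form."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Assumption: words missing from the dictionary are kept unchanged.
        return dictionary.get(word.lower(), word)

    return WORD.sub(replace, recognized_text)


if __name__ == "__main__":
    print(to_sms("See you tomorrow!", ENGLISH_TO_SMS))  # c u 2moro!
```

Matching whole words with a regular expression, rather than replacing substrings, keeps an entry such as "you" from rewriting part of a longer word like "your".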
4.1.2 Text-To-Speech Interface
The Text-To-Speech interface of the TTS&STT application handles text-to-speech conversion, or so-called "speech synthesis," using the Speech API's speech synthesis engine. The speech synthesis process of the text-to-speech interface follows the steps shown in Figure 3.
Figure 3: The basic steps to convert text to speech (Text Input → Voice Output)
The conversion of text to speech follows the basic steps shown in Figure 3. The detailed conversion of text to speech is shown in Figure 4.
Figure 4: TTS' System Architecture (Text Input → Text Parser, which consults the SMS text database → Parsed Text → Voice Output; SMS text manipulation operations on the database: Add word, Delete word, Update word, Sort)
In this conversion process, the text parser parses the text input by comparing each of its words against the SMS text dictionary. SAPI's speech synthesis engine then processes each word of the parsed text and outputs it as voice.
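A matching sketch for the text-to-speech direction expands SMS tokens back into English before the text is handed to the synthesizer. As before, the sample entries, the TOKEN pattern and the expand_sms name are hypothetical, unknown tokens are passed through unchanged by assumption, and the actual call into SAPI's speech synthesis engine is omitted.

```python
import re

# Hypothetical sample entries (SMS form -> English word); the application
# itself reads these from its SMS dictionary database.
SMS_TO_ENGLISH = {
    "c": "see",
    "u": "you",
    "r": "are",
    "gr8": "great",
    "2moro": "tomorrow",
}

# SMS tokens often mix letters and digits ("gr8", "2moro"), so both count.
TOKEN = re.compile(r"[A-Za-z0-9']+")


def expand_sms(sms_text: str, dictionary: dict[str, str]) -> str:
    """Hypothetical text-parser step: expand SMS tokens into English words."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        # Assumption: tokens missing from the dictionary are kept unchanged.
        return dictionary.get(token.lower(), token)

    return TOKEN.sub(replace, sms_text)


if __name__ == "__main__":
    parsed = expand_sms("c u 2moro, gr8!", SMS_TO_ENGLISH)
    print(parsed)  # see you tomorrow, great!
    # The parsed text would then be passed to SAPI's speech synthesis
    # engine for voice output (not shown here).
```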
5. CONCLUSIONS AND RECOMMENDATIONS
The Speech API (application programming interface) for mobile applications is used by a number of developers today. However, it has been developed for and distributed exclusively to mobile companies, so it is of no use to anyone else who might want to use the technology. The IDE (integrated development environment) and other related tools for developing such applications are likewise distributed exclusively.
At the time this project was developed, no public API for developing mobile speech applications was available. Despite the limited resources and technology, the project was still pursued: the speech-to-text and text-to-speech interfaces that were meant for a mobile environment were developed in a desktop environment instead.
Because no public API for developing mobile speech applications was available, the project developer could not implement the application in its intended target environment, the mobile environment. Interested developers may consider the following suggestions for improving the application and making it suitable for that environment:
1. The speech-to-text and text-to-speech interfaces should be implemented in a mobile environment that supports speech technology.
2. The mobile Speech API used should support both speech recognition and voice output.
3. The SMS dictionary of words should be well defined so that the messages it creates are understandable and reliable.
4. The speech recognition process should be given more attention and training, since the engine requires a near-native English or American accent to recognize a word. Moreover, constant training with the user's voice can improve the engine's recognition capability.
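One concrete way to make the SMS dictionary "well defined," as the recommendations suggest, is to check that no SMS form is shared by several English words; otherwise the text-to-speech direction cannot tell which word to speak. The sketch below is a hypothetical helper (ambiguous_sms_forms is not part of the original application), and the sample entries in its usage are illustrative.

```python
from collections import defaultdict


def ambiguous_sms_forms(entries: dict[str, str]) -> dict[str, list[str]]:
    """Hypothetical check: list SMS forms shared by several English words.

    `entries` maps each English word to its SMS form. An SMS form used by
    more than one English word cannot be expanded back unambiguously in the
    text-to-speech direction.
    """
    by_sms: defaultdict[str, list[str]] = defaultdict(list)
    for english, sms in entries.items():
        by_sms[sms.lower()].append(english.lower())
    return {
        sms: sorted(words)
        for sms, words in sorted(by_sms.items())
        if len(words) > 1
    }


if __name__ == "__main__":
    sample = {"to": "2", "too": "2", "two": "2", "great": "gr8"}  # illustrative
    print(ambiguous_sms_forms(sample))  # {'2': ['to', 'too', 'two']}
```

An empty result means every SMS form maps back to exactly one English word, so the dictionary can safely be used in both conversion directions.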
APPENDIX A: USER'S MANUAL
A.1 The Installer Package and the Installation Instructions
The contents of the installation folder SpeechUiSetup are shown in Figure 5.
Figure 5: Contents of the SpeechUiSetup folder
The Debug folder, as shown in Figure 5, has the setup program to be used for the installation. The contents of the Debug folder are shown in Figure 6.
Figure 6: Contents of the Debug folder
Installation Instructions:
1. Run the setup.exe or SpeechUiSetup.exe file found inside the Debug folder of the installation folder (as shown in Figures 5 and 6). This installs the TTS&STT (Text-To-Speech & Speech-To-Text) program on the host PC. (This step assumes that Microsoft .NET Framework 3.0 or later is already installed on the host PC.)
2. After you run the setup program, an installation wizard will guide you through the installation. Leave the default settings as they are and proceed with the installation.
3. After a successful installation, follow the final instruction written in the file READ_ME.txt, located in the default directory C:\Program Files\DeepQuest Corp\SpeechUiSetup. The contents of this directory are shown in Figure 7.
Figure 7: Contents of the SpeechUiSetup folder in Program Files
4. After completing the instruction in the READ_ME.txt file, double-click the SpeechUI.exe file (not the XML configuration file) to run the program.
A.2 Getting Started
If the TTS&STT program is not running, start it by double-clicking the SpeechUI.exe file inside the SpeechUiSetup folder (as shown in Figure 7). After a few seconds, the TTS&STT's main graphical user interface (GUI) will appear on screen, as shown in Figure 8.
Figure 8: TTS&STT's main graphical user interface (labeled parts: Menu bar, Shortcut buttons, Minimize and Close control buttons)
The labeled parts of the main GUI are described below:
Menu bar: contains the drop-down menus used to launch the different features of the program.
Shortcut buttons: one-click controls for easy access to the main features of the program.
Minimize and Close control buttons: the Minimize button minimizes the window if the user decides to return to the running program later, and the Close button completely terminates the running application.
The following tables give details of the different menus provided in the program's main GUI:
Menu Name | Description | When Activated | When Deactivated
File | A drop-down menu; it contains the menu items for the main features of the program. | (always activated) | (always activated)
Help | A drop-down menu; it contains the menu item that launches a dialog box displaying the program's version, the copyright year, the author and a brief description of the program. | (always activated) | (always activated)
Table 3: Main GUI's menu
Menu Name | Description | When Activated | When Deactivated
Launch ASR | It opens the Speech-To-Text interface, which allows the user to use speech-to-text conversion. | (always activated) | (always activated)
Launch TTS | It opens the Text-To-Speech interface, which allows the user to use text-to-speech conversion. | (always activated) | (always activated)
Exit | It terminates the running program. | (always activated) |
Table 4: Menu items of the drop-down File menu
Menu Name | Description
About | It opens the dialog box that displays the program's version, the copyright year, the author and a brief description of the program.
Table 5: Menu item of the drop-down Help menu
A.3 Using the Speech-To-Text Interface
The Speech-To-Text (STT) interface can be launched by opening the File menu and choosing Launch ASR, or by clicking the Launch ASR shortcut button of the main GUI, as shown in Figure 9.
Figure 9: The Launch ASR shortcut button
Either option opens the STT graphical user interface, as shown in Figure 10.
Figure 10: The STT's user interface (labeled parts: Menu bar, Recognized text area, Parsed text area, Status bar, Minimize and Close control buttons)
The labeled parts of the Speech-To-Text GUI are described below:
Menu bar: contains the advanced and basic operations of the STT interface.
Recognized text area: contains the recognized text or words. (If the text area contains [Recognized text will be placed here.], which is the default text, the recognition engine has not yet detected any text.)
Parsed text area: contains the parsed text or words. (If the text area contains [Parsed text will be placed here.], which is the default text, the recognized text has not yet been parsed.)
Minimize and Close control buttons: the Minimize button minimizes the window if the user decides to return to the STT program later, and the Close button terminates the STT application and returns to the main GUI.
Status bar: indicates the status of the recognition engine.
The following tables give details of the different menus provided in the Speech-To-Text GUI:
Menu Name | Description | When Activated | When Deactivated
File | A drop-down menu; it contains the items for loading a grammar, saving to file and closing the interface. | (always activated) | (always activated)
Edit | A drop-down menu; it contains the items for parsing the recognized text to SMS text and clearing the textboxes. | (always activated) | (always activated)
View | A drop-down menu; it contains the item that displays the SMS dictionary of words. | (always activated) | (always activated)
Help | A drop-down menu; it contains the menu item that launches a dialog box displaying the application name, the copyright year and the author. | (always activated) | (always activated)
Table 6: The STT's main menu
Menu Name | Description | When Activated | When Deactivated
Load Default Grammar | It loads the default grammar supported by the Speech API installed on the host PC. | |
Load External Grammar | (not implemented; no support yet as of this writing) | (not implemented; no support yet as of this writing) | (not implemented; no support yet as of this writing)
Save To File | A submenu that contains the menu items for saving the recognized text or the parsed text into a WAV file. | (always activated) | (always activated)
Close | It terminates the STT application and returns to the main application. | (always activated) | (always activated)
Table 7: Menu items of the drop-down File menu of the STT application
Menu Name | Description | When Activated
Parse to TxtSpk | It parses (converts) the recognized text to SMS text based on the SMS dictionary of words. | (always activated)
Empty Textboxes | It clears the contents of the two textboxes. | (always activated)
Table 8: Menu items of the drop-down Edit menu of the STT application
Menu Name | Description | When Activated
Dictionary | It opens the interface of the SMS dictionary of words, as shown in Figure 15. | (always activated)
Table 9: Menu item of the drop-down View menu of the STT application
Menu Name | Description
About | It opens a dialog box that displays the name of the application, the copyright year and the name of the author.
Table 10: Menu item of the drop-down Help menu of the STT application
Steps in Interacting with the Speech-To-Text Application:
1. The user loads the default grammar by clicking on File > Load Default Grammar. (As the listing in Appendix B shows, this grammar consists of the English words stored in the SMS dictionary, and it is also loaded automatically when the STT interface opens.)
2. The user can now speak the words to be recognized by the speech recognition engine of the STT application, using a standard microphone.
3. After the spoken words are recognized, the user can parse (convert) the recognized text to SMS text based on the database of SMS words.
4. After parsing, the user can optionally save the recognized text to a text file by clicking on File > Save To File > Recognized Text, or save the parsed text to a text file by clicking on File > Save To File > Parsed Text.
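The steps above follow the event-driven design of the frmASR listing in Appendix B. The recognizer accepts only words from the default grammar. Each recognized phrase is appended to the recognized-text window, and its SMS form is appended to the parsed-text window. A phrase outside the grammar only sets the status to "Rejected". The Python sketch below simulates that flow without a speech engine. The class SttSession and its methods are hypothetical stand-ins for the System.Speech events, and a plain dict replaces the database. It also simplifies rejection to "not in the grammar", whereas a real recognizer rejects on low confidence.

```python
class SttSession:
    """Hypothetical simulation of the STT form's recognition event handlers."""

    def __init__(self, dictionary):
        self.dictionary = dictionary        # realWord -> digiWord
        self.grammar = set(dictionary)      # like DefaultGrammar(): every realWord
        self.recognized_text = ""           # like txtRecoString
        self.parsed_text = ""               # like txtParseText
        self.status = "Ready"               # like tssRecoText

    def on_utterance(self, phrase):
        """Mimic SpeechRecognized / SpeechRecognitionRejected for one phrase."""
        if phrase not in self.grammar:
            self.status = "Rejected"
            return
        self.recognized_text += phrase + " "
        self.parsed_text += self.dictionary.get(phrase, phrase) + " "
        self.status = "Recognized"

    def save(self, path, text):
        """Mimic Save To File: refuse empty text, otherwise write a text file."""
        if not text:
            raise ValueError("Cannot save an empty text to a file.")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text + "\n")
```

For example, after `on_utterance("see")` and `on_utterance("you")` with the pairs see/c and you/u, the parsed text holds "c u ", and an utterance outside the grammar leaves it unchanged.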
A.4 Using the Text-To-Speech Interface
The Text-To-Speech (TTS) interface can be launched from the File menu by choosing Launch TTS, or by clicking on the Launch TTS shortcut button of the main GUI, as shown in Figure 11 below.
Figure 11: The Launch TTS shortcut button
Whichever option is used, the TTS graphical user interface appears on screen, as shown in Figure 12.
Figure 12: The TTS user interface
The labeled parts of the Text-To-Speech GUI are described below:
Menu bar: contains the advanced and basic operations of the TTS interface.
Text entry area: contains the text typed by the user or loaded from a text file, which is to be spoken by the speech engine.
Parsed text area: contains the parsed text, that is, the words actually spoken by the speech engine.
Speech control buttons: control the speech engine by making it speak, stop, pause (or resume), and reset.
Voice settings: select among the installed voices (if more than one is installed) and the format of the audio output, and adjust the speaking rate (speed) and the volume.
Minimize and Close control buttons: the minimize button minimizes the window so that the user can return to the TTS application later; the close button terminates the TTS application and returns to the main GUI.
Status bar: indicates the status of the speech engine.
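The choices offered by Voice settings can be reconstructed from the frmTTS_Load listing in Appendix B. Nine sampling rates combined with four sample formats give 36 format entries, and the default index 18 selects "22kHz 16 Bit Mono". The Python sketch below rebuilds that list and clamps the rate and volume values to the ranges of the .NET SpeechSynthesizer.Rate property (-10 to 10) and Volume property (0 to 100). The function names are hypothetical.

```python
RATES = ["8kHz", "11kHz", "12kHz", "16kHz", "22kHz", "24kHz", "32kHz", "44kHz", "48kHz"]
SAMPLE_TYPES = ["8 Bit Mono", "8 Bit Stereo", "16 Bit Mono", "16 Bit Stereo"]
DEFAULT_FORMAT_INDEX = 18  # cbFormat.SelectedIndex in frmTTS_Load


def format_choices():
    """Combine every sampling rate with every sample type, rate by rate."""
    return [f"{rate} {kind}" for rate in RATES for kind in SAMPLE_TYPES]


def clamp_voice_settings(rate, volume):
    """Keep rate within -10..10 and volume within 0..100 (SpeechSynthesizer limits)."""
    return max(-10, min(10, rate)), max(0, min(100, volume))


if __name__ == "__main__":
    choices = format_choices()
    print(len(choices), choices[DEFAULT_FORMAT_INDEX])
    print(clamp_voice_settings(15, -5))
```

Because the list is built rate by rate, entry i corresponds to RATES[i // 4] and SAMPLE_TYPES[i % 4], which is why index 18 is 22kHz with 16 Bit Mono.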
The following tables give details of the menus provided in the Text-To-Speech GUI:

Menu Name | Description | Availability
File | A drop-down menu containing the menu items that allow the application to speak from a text file or from a WAV file, save to a WAV file, and close the interface. | Always activated
View | A drop-down menu containing the menu item that displays the SMS dictionary of words. | Always activated
Help | A drop-down menu containing the menu item that launches a dialog box displaying the application name, the copyright year, and the author. | Always activated
Table 11: The TTS main menu
Menu Name | Description | Availability
Speak from Text File | It opens the Open dialog box to let the user choose a text file. | Always activated
Speak from Wave File | It opens the Open dialog box to let the user choose a WAV file. | Always activated
Save to Wave File | It opens the Save As dialog box to let the user save the audio equivalent of the parsed text to a WAV file. | Always activated
Close | It terminates the TTS application and returns to the main application. | Always activated
Table 12: Menu items of the drop-down File menu of the TTS application
Menu Name | Description | Availability
Dictionary | It opens the interface of the SMS dictionary of words, as shown in Figure 15. | Always activated
Table 13: Menu item of the drop-down View menu of the TTS application
Menu Name | Description
About | It opens a dialog box that displays the name of the application, the copyright year and the name of the author.
Table 14: Menu item of the drop-down Help menu of the TTS application
Steps in Interacting with the Text-To-Speech Application:
1. The user types a word or set of words in the text entry area, or opens a text file (File > Speak from Text File) or a WAV file (File > Speak from Wave File).
2. The user can now adjust the voice settings, as shown in Figure 8, although the default settings work as they are.
3. After adjusting the voice settings (if needed), the user clicks on the Speak button in the Speech control buttons group, as shown in Figure 8. If the user chose to let the application speak from a WAV file, clicking on Speak is not necessary, since the application starts speaking the file automatically.
4. Optionally, the user can save the words typed in the text entry area to a WAV file (File > Save to Wave File).
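Before speaking or saving, the TTS application converts SMS words back to English through TextParser and ParseToRealWord (Appendix B). These functions look up each digiWord and return the matching realWord. The minimal Python sketch below shows that reverse lookup. A dict built from hypothetical sample pairs replaces the database query. When two English words share one SMS form, the VB.NET code takes the first row returned, and this sketch likewise keeps the first pair.

```python
import re

# Same delimiters as TextParser in Appendix B
DELIMITERS = re.compile(r"[ \r\n.,?!]+")


def build_reverse_index(pairs):
    """Map each SMS form (digiWord) to its English word (realWord); first pair wins."""
    index = {}
    for real_word, digi_word in pairs:
        index.setdefault(digi_word, real_word)
    return index


def parse_to_real_words(index, sms_text):
    """Replace every SMS token with its English word; unknown tokens pass through."""
    tokens = [t for t in DELIMITERS.split(sms_text) if t]
    return " ".join(index.get(t, t) for t in tokens)


if __name__ == "__main__":
    index = build_reverse_index([("see", "c"), ("you", "u"), ("later", "l8r")])
    print(parse_to_real_words(index, "c u l8r!"))  # text handed to the synthesizer
```

The returned string corresponds to what the application passes to SpeakAsync and shows in the parsed text area.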
A.5 Using the SMS Dictionary Interface
The SMS Dictionary interface can be launched using the View > Dictionary menu of both the Speech-To-Text and Text-To-Speech applications, as shown in Figures 13 and 14 respectively.
Figure 13: View menu of the STT application
Figure 14: View menu of the TTS application
Whichever option is used, the SMS Dictionary interface appears on screen, as shown in Figure 15.
Figure 15: The SMS Dictionary interface
In this interface, the user can add, update, or delete word pairs, or sort the words in the dictionary in ascending or descending order.
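The add, update, delete and sort operations map onto four SQL statements against the Words table. The Python sketch below uses an in-memory SQLite database as a stand-in for the Access file C:\Data\TextWords.mdb, and the class name SmsDictionary is hypothetical. It uses parameterized queries throughout. It rejects a duplicate English word, as the add dialog's comments intend, and it updates a pair by its English word.

```python
import sqlite3


class SmsDictionary:
    """Hypothetical stand-in for the Words table of the SMS dictionary."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE Words (realWord TEXT PRIMARY KEY, digiWord TEXT NOT NULL)"
        )

    def add(self, real_word, digi_word):
        """Append a word pair; refuse empty strings and duplicate English words."""
        if not real_word or not digi_word:
            raise ValueError("Cannot append with an empty string.")
        exists = self.conn.execute(
            "SELECT 1 FROM Words WHERE realWord = ?", (real_word,)
        ).fetchone()
        if exists:
            return False
        self.conn.execute("INSERT INTO Words VALUES (?, ?)", (real_word, digi_word))
        return True

    def update(self, real_word, new_digi_word):
        """Change the SMS form of an existing English word; True if a row changed."""
        cur = self.conn.execute(
            "UPDATE Words SET digiWord = ? WHERE realWord = ?",
            (new_digi_word, real_word),
        )
        return cur.rowcount == 1

    def delete(self, real_word, digi_word):
        """Remove one word pair; True if a row was deleted."""
        cur = self.conn.execute(
            "DELETE FROM Words WHERE realWord = ? AND digiWord = ?",
            (real_word, digi_word),
        )
        return cur.rowcount == 1

    def sorted_pairs(self, descending=False):
        """Return all pairs ordered by English word, A to Z or Z to A."""
        order = "DESC" if descending else "ASC"  # fixed keyword, never user input
        return self.conn.execute(
            f"SELECT realWord, digiWord FROM Words ORDER BY realWord {order}"
        ).fetchall()
```

Keying the update on realWord alone works because the add operation never lets two pairs share the same English word.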
APPENDIX B: PROGRAM LISTINGS

---------------------------------------------------------------------------------------------------------------------
Imports System.Data.OleDb
Imports System.Speech.Recognition

Public Class frmASR
    Dim SpRecognizer As New SpeechRecognitionEngine

    Private Sub mnuAbout_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuAbout.Click
        MsgBox("Speech Recognition application." & Chr(13) & "Copyright 2008" & Chr(13) & "Deepquest Corp.", MsgBoxStyle.OkOnly + MsgBoxStyle.Information, "ASR")
    End Sub

    Private Sub mnuDictionary_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuDictionary.Click
        frmDictionary.Show()
    End Sub

    Private Sub mnuClose_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuClose.Click
        Me.Close()
        frmSpeechEngineMain.Show()
    End Sub

    Private Sub frmASR_FormClosed(ByVal sender As Object, ByVal e As System.Windows.Forms.FormClosedEventArgs) Handles Me.FormClosed
        frmDictionary.Close()
        'Stop the restart loop in SpeechRecognizeCompletedEventHandler before disposing
        RemoveHandler SpRecognizer.RecognizeCompleted, AddressOf SpeechRecognizeCompletedEventHandler
        SpRecognizer.RecognizeAsyncCancel()
        SpRecognizer.Dispose()
        SpRecognizer = Nothing
        frmSpeechEngineMain.Show()
    End Sub

    Private Sub mnuParseToTxtSpk_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuParseToTxtSpk.Click
        Dim p_Str As String = txtRecoString.Text
        txtRecoString.Text = TextParser(p_Str)
    End Sub

    Private Function TextParser(ByVal p_Str As String) As String
        Dim n_Str As String = String.Empty
        Dim t_Str() As String
        Dim d_Str As String = " " & vbCrLf & "." & "," & "?" & "!"
        Dim d_Chars() As Char = d_Str.ToCharArray
        'Tokenize the string; RemoveEmptyEntries skips the empty tokens
        'produced by consecutive delimiters such as ". " or CR+LF
        t_Str = p_Str.Split(d_Chars, StringSplitOptions.RemoveEmptyEntries)
        'Compare every string if it's in the database
        For Each s As String In t_Str
            n_Str = n_Str & " " & ParseToDigiWord(s)
        Next
        Return n_Str.Trim
    End Function

    Private Function ParseToDigiWord(ByVal f_Str As String) As String
        'Connect to the database
        Dim objConnection As OleDbConnection = ConnectToDatabase()
        'Search if the string is in the dictionary; the ? parameter keeps a word
        'containing an apostrophe (e.g. don't) from breaking the SQL statement
        Dim strSQLSearch As String = "SELECT digiWord FROM Words WHERE realWord = ?"
        Dim objDataAdapter As New OleDbDataAdapter(strSQLSearch, objConnection)
        objDataAdapter.SelectCommand.Parameters.AddWithValue("@realWord", f_Str)
        Dim objDataTable As New DataTable("mWords")
        objDataAdapter.Fill(objDataTable)
        If objDataTable.Rows.Count > 0 Then
            f_Str = objDataTable.Rows(0)("digiWord").ToString
        End If
        objDataAdapter.Dispose()
        objDataAdapter = Nothing
        objDataTable.Dispose()
        objDataTable = Nothing
        objConnection.Close()
        objConnection = Nothing
        Return f_Str
    End Function

    Private Sub frmASR_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim grammar As Grammar = DefaultGrammar()
        'Register the event handlers before recognition starts
        AddHandler SpRecognizer.SpeechDetected, AddressOf SpeechDetectedEventHandler
        AddHandler SpRecognizer.SpeechRecognized, AddressOf SpeechRecognizedEventHandler
        AddHandler SpRecognizer.SpeechRecognitionRejected, AddressOf SpeechRecognitionRejectedEventHandler
        AddHandler SpRecognizer.RecognizeCompleted, AddressOf SpeechRecognizeCompletedEventHandler
        SpRecognizer.LoadGrammarAsync(grammar)
        SpRecognizer.SetInputToDefaultAudioDevice()
        SpRecognizer.RecognizeAsync()
    End Sub

    Private Sub SpeechRecognizedEventHandler(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)
        Dim result As RecognitionResult = e.Result
        Dim phrase As String = result.Text
        'Update the recognized text window's string
        txtRecoString.Text = txtRecoString.Text & phrase & " "
        tssRecoText.Text = "Recognized"
        'Update the parsed text window's string
        txtParseText.Text = txtParseText.Text & ParseToDigiWord(phrase) & " "
    End Sub

    Private Sub SpeechRecognitionRejectedEventHandler(ByVal sender As Object, ByVal e As SpeechRecognitionRejectedEventArgs)
        tssRecoText.Text = "Rejected"
    End Sub

    Private Sub SpeechDetectedEventHandler(ByVal sender As Object, ByVal e As SpeechDetectedEventArgs)
        tssRecoText.Text = "Ready"
    End Sub

    Private Sub SpeechRecognizeCompletedEventHandler(ByVal sender As Object, ByVal e As RecognizeCompletedEventArgs)
        'Start listening again for the next utterance
        SpRecognizer.RecognizeAsync()
    End Sub

    Private Sub mnuLoadDefaultGrammar_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuLoadDefaultGrammar.Click
        Dim grammar As Grammar = DefaultGrammar()
        SpRecognizer.UnloadAllGrammars()
        SpRecognizer.LoadGrammar(grammar)
    End Sub

    Private Function DefaultGrammar() As Grammar
        'Connect to the database
        Dim objConnection As OleDbConnection = ConnectToDatabase()
        'Extract all English words in the dictionary
        Dim strSQL As String = "SELECT realWord FROM Words"
        Dim objDataAdapter As New OleDbDataAdapter(strSQL, objConnection)
        Dim objDataTable As New DataTable("mWords")
        objDataAdapter.Fill(objDataTable)
        'The grammar accepts any single word from the dictionary
        Dim words As New Choices
        Dim gBuilder As New GrammarBuilder
        For Each row As DataRow In objDataTable.Rows
            words.Add(row.Item("realWord").ToString)
        Next
        gBuilder.Append(words)
        objDataAdapter.Dispose()
        objDataAdapter = Nothing
        objDataTable.Dispose()
        objDataTable = Nothing
        objConnection.Close()
        objConnection = Nothing
        Dim grammar As New Grammar(gBuilder)
        Return grammar
    End Function

    Private Function ConnectToDatabase() As OleDbConnection
        Dim strConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=C:\Data\TextWords.mdb;"
        Dim objConnection As New OleDbConnection(strConnectionString)
        Try
            objConnection.Open()
        Catch oleDbException As OleDbException
            MsgBox(oleDbException.Message, MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Error " & oleDbException.ErrorCode)
        Catch invalidOperationException As InvalidOperationException
            MsgBox(invalidOperationException.Message, MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Error")
        End Try
        Return objConnection
    End Function

    Private Sub txtRecoString_DoubleClick(ByVal sender As Object, ByVal e As System.EventArgs) Handles txtRecoString.DoubleClick
        txtRecoString.Text = String.Empty
        txtParseText.Text = String.Empty
    End Sub

    Private Sub txtParseText_DoubleClick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles txtParseText.DoubleClick
        txtParseText.Text = String.Empty
    End Sub

    Private Sub mnuEmptyTextboxes_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuEmptyTextboxes.Click
        txtRecoString.Text = String.Empty
        txtParseText.Text = String.Empty
    End Sub

    Private Sub mnuSaveRecognizedText_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuSaveRecognizedText.Click
        If String.IsNullOrEmpty(txtRecoString.Text) Then
            MsgBox("Cannot save an empty text to a file.", MsgBoxStyle.OkOnly + MsgBoxStyle.Exclamation, "Empty")
            Exit Sub
        End If
        Dim SaveAsDialog As New SaveFileDialog
        SaveAsDialog.Filter = "Text files (*.txt)|*.txt"
        If SaveAsDialog.ShowDialog = DialogResult.OK Then
            'Write string to file
            FileOpen(1, SaveAsDialog.FileName, OpenMode.Output)
            PrintLine(1, txtRecoString.Text)
            FileClose(1)
            'Report success only when the file was actually written
            tssRecoText.Text = "Saved"
        End If
        SaveAsDialog.Dispose()
        SaveAsDialog = Nothing
    End Sub

    Private Sub mnuSaveParsedText_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuSaveParsedText.Click
        If String.IsNullOrEmpty(txtParseText.Text) Then
            MsgBox("Cannot save an empty text to a file.", MsgBoxStyle.OkOnly + MsgBoxStyle.Exclamation, "Empty")
            Exit Sub
        End If
        Dim SaveAsDialog As New SaveFileDialog
        SaveAsDialog.Filter = "Text files (*.txt)|*.txt"
        If SaveAsDialog.ShowDialog = DialogResult.OK Then
            'Write string to file
            FileOpen(1, SaveAsDialog.FileName, OpenMode.Output)
            PrintLine(1, txtParseText.Text)
            FileClose(1)
            'Report success only when the file was actually written
            tssRecoText.Text = "Saved"
        End If
        SaveAsDialog.Dispose()
        SaveAsDialog = Nothing
    End Sub
End Class
---------------------------------------------------------------------------------------------------------------------
Imports System.Data.OleDb
Imports System.Speech.Synthesis
Imports System.Speech.AudioFormat

Public Class frmTTS
    Dim SSynthesizer As New SpeechSynthesizer

    Private Sub mnuAbout_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuAbout.Click
        MsgBox("Text-To-Speech application." & Chr(13) & "Copyright 2008" & Chr(13) & "Deepquest Corp.", MsgBoxStyle.OkOnly + MsgBoxStyle.Information, "TTS")
    End Sub

    Private Sub frmTTS_FormClosed(ByVal sender As Object, ByVal e As System.Windows.Forms.FormClosedEventArgs) Handles Me.FormClosed
        frmSpeechEngineMain.Show()
        SSynthesizer.Dispose()
        SSynthesizer = Nothing
        frmDictionary.Close()
    End Sub

    Private Sub frmTTS_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        'Populate the contents of the voice combo box
        For Each voice As InstalledVoice In SSynthesizer.GetInstalledVoices()
            cbVoice.Items.Add(voice.VoiceInfo.Name)
        Next
        cbVoice.SelectedIndex = 0
        'Populate the contents of the format combo box
        Dim hrtz As New ArrayList
        Dim type As New ArrayList
        hrtz.Add("8kHz")
        hrtz.Add("11kHz")
        hrtz.Add("12kHz")
        hrtz.Add("16kHz")
        hrtz.Add("22kHz")
        hrtz.Add("24kHz")
        hrtz.Add("32kHz")
        hrtz.Add("44kHz")
        hrtz.Add("48kHz")
        type.Add("8 Bit Mono")
        type.Add("8 Bit Stereo")
        type.Add("16 Bit Mono")
        type.Add("16 Bit Stereo")
        For Each f As String In hrtz
            For Each t As String In type
                cbFormat.Items.Add(f & " " & t)
            Next
        Next
        cbFormat.SelectedIndex = 18
        'Add an event handler to notify if speaking is completed
        AddHandler SSynthesizer.SpeakCompleted, AddressOf SpeakCompletedEventHandler
        'Add an event handler to notify if speaking is started
        AddHandler SSynthesizer.SpeakStarted, AddressOf SpeakStartedEventHandler
    End Sub

    Private Sub btnSpeak_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnSpeak.Click
        Dim p_Str As String = TextParser(txtTextWindow.Text)
        Try
            btnSpeak.Enabled = False
            btnStop.Enabled = True
            btnPause.Enabled = True
            SSynthesizer.SelectVoice(cbVoice.Text)
            SSynthesizer.SetOutputToDefaultAudioDevice()
            SSynthesizer.Volume = trVolume.Value
            SSynthesizer.Rate = trRate.Value
            SSynthesizer.SpeakAsync(p_Str)
            txtRealSpoken.Text = p_Str
        Catch ex As Exception
            MsgBox(ex.Message, MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Error")
        End Try
    End Sub

    Private Sub btnStop_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnStop.Click
        SSynthesizer.SpeakAsyncCancelAll()
        btnStop.Enabled = False
        btnPause.Enabled = False
        btnSpeak.Enabled = True
    End Sub

    Private Sub btnPause_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnPause.Click
        If btnPause.Text = "Pause" Then
            SSynthesizer.Pause()
            btnPause.Text = "Resume"
        Else
            SSynthesizer.Resume()
            btnPause.Text = "Pause"
        End If
    End Sub

    Private Sub mnuClose_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuClose.Click
        Me.Close()
    End Sub

    Private Sub btnReset_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnReset.Click
        'Reset buttons' status
        SSynthesizer.SpeakAsyncCancelAll()
        btnStop.Enabled = False
        btnPause.Enabled = False
        btnSpeak.Enabled = True
        'Reset voice, rate, volume and format values
        cbVoice.SelectedIndex = 0
        trRate.Value = 0
        trVolume.Value = 100
        cbFormat.SelectedIndex = 18
        'Clear textbox
        txtRealSpoken.Text = String.Empty
    End Sub

    Private Sub mnuDictionary_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuDictionary.Click
        frmDictionary.Show()
    End Sub

    Private Sub mnuSpeakTextFile_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuSpeakTextFile.Click
        Dim f_Str As String = String.Empty, f_pStr As String = String.Empty
        Dim TTSOpenFileDialog As New OpenFileDialog
        TTSOpenFileDialog.Filter = "Text files (*.txt)|*.txt"
        If TTSOpenFileDialog.ShowDialog() = DialogResult.OK Then
            If TTSOpenFileDialog.FileName <> String.Empty Then
                Try
                    FileOpen(1, TTSOpenFileDialog.FileName, OpenMode.Input)
                    Do Until EOF(1)
                        f_pStr = LineInput(1)
                        f_Str = f_Str & f_pStr & vbCrLf
                    Loop
                    txtTextWindow.Text = f_Str
                Catch ex As Exception
                    MsgBox(ex.Message, MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Error")
                Finally
                    FileClose(1)
                End Try
            End If
        End If
        TTSOpenFileDialog.Dispose()
        TTSOpenFileDialog = Nothing
    End Sub

    Private Function TextParser(ByVal p_Str As String) As String
        Dim n_Str As String = String.Empty
        Dim t_Str() As String
        Dim d_Str As String = " " & vbCrLf & "." & "," & "?" & "!"
        Dim d_Chars() As Char = d_Str.ToCharArray
        'Tokenize the string; RemoveEmptyEntries skips the empty tokens
        'produced by consecutive delimiters such as ". " or CR+LF
        t_Str = p_Str.Split(d_Chars, StringSplitOptions.RemoveEmptyEntries)
        'Compare every string if it's in the database
        For Each s As String In t_Str
            n_Str = n_Str & ParseToRealWord(s) & " "
        Next
        Return n_Str.Trim
    End Function

    Private Function ParseToRealWord(ByVal f_Str As String) As String
        'Connect to the database
        Dim strConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=C:\Data\TextWords.mdb;"
        Dim objConnection As New OleDbConnection(strConnectionString)
        'Open the database
        Try
            objConnection.Open()
        Catch oleDbException As OleDbException
            MsgBox(oleDbException.Message, MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Error " & oleDbException.ErrorCode)
        Catch invalidOperationException As InvalidOperationException
            MsgBox(invalidOperationException.Message, MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Error")
        End Try
        'Search if the string is in the dictionary (parameterized, so that an
        'apostrophe in the word cannot break the SQL statement)
        Dim strSQLSearch As String = "SELECT realWord FROM Words WHERE digiWord = ?"
        Dim objDataAdapter As New OleDbDataAdapter(strSQLSearch, objConnection)
        objDataAdapter.SelectCommand.Parameters.AddWithValue("@digiWord", f_Str)
        Dim objDataTable As New DataTable("mWords")
        objDataAdapter.Fill(objDataTable)
        If objDataTable.Rows.Count > 0 Then
            f_Str = objDataTable.Rows(0)("realWord").ToString
        End If
        objDataAdapter.Dispose()
        objDataAdapter = Nothing
        objDataTable.Dispose()
        objDataTable = Nothing
        objConnection.Close()
        objConnection = Nothing
        Return f_Str
    End Function

    Private Sub mnuSaveToWaveFile_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuSaveToWaveFile.Click
        If String.IsNullOrEmpty(txtTextWindow.Text) Then
            MsgBox("Cannot save an empty text to a wave file.", MsgBoxStyle.OkOnly + MsgBoxStyle.Exclamation, "Empty")
            Exit Sub
        End If
        Dim p_Str As String = TextParser(txtTextWindow.Text)
        Dim TTSSaveFileDialog As New SaveFileDialog
        TTSSaveFileDialog.Filter = "Wave files (*.wav)|*.wav"
        If TTSSaveFileDialog.ShowDialog() = DialogResult.OK Then
            'Speak synchronously into the file, then switch the output back to the
            'speakers so that the synthesizer finishes and releases the WAV file
            SSynthesizer.SetOutputToWaveFile(TTSSaveFileDialog.FileName)
            SSynthesizer.Speak(p_Str)
            SSynthesizer.SetOutputToDefaultAudioDevice()
        End If
        TTSSaveFileDialog.Dispose()
        TTSSaveFileDialog = Nothing
    End Sub

    Private Sub mnuSpeakWaveFile_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuSpeakWaveFile.Click
        Dim TTSOpenFileDialog As New OpenFileDialog
        TTSOpenFileDialog.Filter = "Wave files (*.wav)|*.wav"
        If TTSOpenFileDialog.ShowDialog() = DialogResult.OK Then
            txtTextWindow.Text = "[Enter the text you wish spoken here.]"
            txtRealSpoken.Text = String.Empty
            txtTextWindow.Enabled = False
            txtRealSpoken.Enabled = False
            'Disable | Enable control buttons
            btnSpeak.Enabled = False
            btnStop.Enabled = True
            btnPause.Enabled = True
            'Create a prompt to handle the wave file
            Dim promptBuilder As New PromptBuilder
            promptBuilder.AppendAudio(TTSOpenFileDialog.FileName)
            SSynthesizer.SpeakAsync(promptBuilder)
        End If
        TTSOpenFileDialog.Dispose()
        TTSOpenFileDialog = Nothing
    End Sub

    Private Sub SpeakCompletedEventHandler(ByVal sender As Object, ByVal e As SpeakCompletedEventArgs)
        btnSpeak.Enabled = True
        btnStop.Enabled = False
        btnPause.Enabled = False
        txtTextWindow.Enabled = True
        txtRealSpoken.Enabled = True
        tssStatus.Text = "Ready"
    End Sub

    Private Sub SpeakStartedEventHandler(ByVal sender As Object, ByVal e As SpeakStartedEventArgs)
        tssStatus.Text = "Speaking"
    End Sub

    Private Sub txtTextWindow_DoubleClick(ByVal sender As Object, ByVal e As System.EventArgs) Handles txtTextWindow.DoubleClick
        txtTextWindow.Text = String.Empty
    End Sub

    Private Sub txtRealSpoken_DoubleClick(ByVal sender As Object, ByVal e As System.EventArgs) Handles txtRealSpoken.DoubleClick
        txtRealSpoken.Text = String.Empty
    End Sub
End Class
---------------------------------------------------------------------------------------------------------------------
Imports System.Data.OleDb

Public Class frmDictionary
    Private Sub frmDictionary_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        PopulateListView()
    End Sub

    Private Sub mnuClose_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuClose.Click
        Me.Close()
    End Sub

    Private Sub PopulateListView()
        Dim wordCount As Integer = 0
        Dim objConnection As OleDbConnection = ConnectToDatabase()
        If objConnection.State = ConnectionState.Closed Then
            Exit Sub
        End If
        Dim strSQL As String = "SELECT * FROM Words"
        Dim objDataAdapter As New OleDbDataAdapter(strSQL, objConnection)
        Dim objDataTable As New DataTable("mWords")
        objDataAdapter.Fill(objDataTable)
        'Populate the ListView control
        lvwWords.Items.Clear()
        Dim listItem As ListViewItem
        Dim objDataRow As DataRow
        For Each objDataRow In objDataTable.Rows
            listItem = lvwWords.Items.Add(objDataRow.Item("realWord").ToString)
            listItem.SubItems.Add(objDataRow.Item("digiWord").ToString)
            wordCount += 1
        Next
        tssWordCount.Text = wordCount.ToString
        objDataAdapter.Dispose()
        objDataAdapter = Nothing
        objDataTable.Dispose()
        objDataTable = Nothing
        objConnection.Close()
        objConnection.Dispose()
        objConnection = Nothing
    End Sub

    Private Function ConnectToDatabase() As OleDbConnection
        Dim strConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=C:\Data\TextWords.mdb;"
        Dim objConnection As New OleDbConnection(strConnectionString)
        Try
            objConnection.Open()
        Catch oleDbException As OleDbException
            MsgBox(oleDbException.Message, MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Error " & oleDbException.ErrorCode)
        Catch invalidOperationException As InvalidOperationException
            MsgBox(invalidOperationException.Message, MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Error")
        End Try
        Return objConnection
    End Function

    Private Sub mnuDelete_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuDelete.Click
        Dim r_Word As String
        Dim d_Word As String
        'Get the data from the selected item in the listview
        If lvwWords.SelectedItems.Count <> 0 Then
            r_Word = lvwWords.SelectedItems(0).Text
            d_Word = lvwWords.SelectedItems(0).SubItems(1).Text
        Else
            MsgBox("No item is selected.", MsgBoxStyle.OkOnly + MsgBoxStyle.Exclamation, "No Selection")
            Exit Sub
        End If
        Dim objConnection As OleDbConnection = ConnectToDatabase()
        If objConnection.State = ConnectionState.Closed Then
            Exit Sub
        End If
        Dim strSQL As String = "DELETE FROM Words WHERE realWord = ? AND digiWord = ?"
        Dim objCommand As New OleDbCommand
        objCommand.Connection = objConnection
        objCommand.CommandText = strSQL
        objCommand.CommandType = CommandType.Text
        'OleDb parameters are positional: their order must match the ? placeholders
        objCommand.Parameters.AddWithValue("@realWord", r_Word)
        objCommand.Parameters.AddWithValue("@digiWord", d_Word)
        Dim m_Row As Integer = objCommand.ExecuteNonQuery
        If m_Row = 1 Then
            PopulateListView()
            MsgBox("Word is successfully deleted" & ControlChars.CrLf & "in the dictionary.", MsgBoxStyle.OkOnly + MsgBoxStyle.Information, "Deletion")
        Else
            MsgBox("Cannot find the word in the dictionary.", MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Not Found")
        End If
        objCommand.Dispose()
        objCommand = Nothing
        objConnection.Close()
        objConnection.Dispose()
        objConnection = Nothing
    End Sub

    Private Sub mnuUpdate_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuUpdate.Click
        Dim r_Word As String
        Dim d_Word As String
        'Get the data from the selected item in the listview
        If lvwWords.SelectedItems.Count <> 0 Then
            r_Word = lvwWords.SelectedItems(0).Text
            d_Word = lvwWords.SelectedItems(0).SubItems(1).Text
        Else
            MsgBox("No item is selected.", MsgBoxStyle.OkOnly + MsgBoxStyle.Exclamation, "No Selection")
            Exit Sub
        End If
        frmUpdateWordPair.txtRealWord.Text = r_Word
        frmUpdateWordPair.txtDigiWord.Text = d_Word
        frmUpdateWordPair.Show()
        Me.Close()
    End Sub

    Private Sub mnuAdd_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuAdd.Click
        Me.Close()
        frmAddWordPair.Show()
    End Sub

    Private Sub mnuSortAtoZ_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuSortAtoZ.Click
        Dim objConnection As OleDbConnection = ConnectToDatabase()
        If objConnection.State = ConnectionState.Closed Then
            Exit Sub
        End If
        Dim strSQL As String = "SELECT * FROM Words ORDER BY realWord"
        Dim objDataAdapter As New OleDbDataAdapter(strSQL, objConnection)
        Dim objDataTable As New DataTable("mWords")
        objDataAdapter.Fill(objDataTable)
        'Populate the ListView control
        lvwWords.Items.Clear()
        Dim listItem As ListViewItem
        Dim objDataRow As DataRow
        For Each objDataRow In objDataTable.Rows
            listItem = lvwWords.Items.Add(objDataRow.Item("realWord").ToString)
            listItem.SubItems.Add(objDataRow.Item("digiWord").ToString)
        Next
        objDataAdapter.Dispose()
        objDataAdapter = Nothing
        objDataTable.Dispose()
        objDataTable = Nothing
        objConnection.Close()
        objConnection.Dispose()
        objConnection = Nothing
    End Sub

    Private Sub mnuSortZtoA_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mnuSortZtoA.Click
        Dim objConnection As OleDbConnection = ConnectToDatabase()
        If objConnection.State = ConnectionState.Closed Then
            Exit Sub
        End If
        Dim strSQL As String = "SELECT * FROM Words ORDER BY realWord DESC"
        Dim objDataAdapter As New OleDbDataAdapter(strSQL, objConnection)
        Dim objDataTable As New DataTable("mWords")
        objDataAdapter.Fill(objDataTable)
        'Populate the ListView control
        lvwWords.Items.Clear()
        Dim listItem As ListViewItem
        Dim objDataRow As DataRow
        For Each objDataRow In objDataTable.Rows
            listItem = lvwWords.Items.Add(objDataRow.Item("realWord").ToString)
            listItem.SubItems.Add(objDataRow.Item("digiWord").ToString)
        Next
        objDataAdapter.Dispose()
        objDataAdapter = Nothing
        objDataTable.Dispose()
        objDataTable = Nothing
        objConnection.Close()
        objConnection.Dispose()
        objConnection = Nothing
    End Sub
End Class
---------------------------------------------------------------------------------------------------------------------
Imports System.Data.OleDb

Public Class frmAddWordPair
    Private Sub btnCancel_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnCancel.Click
        Me.Close()
        frmDictionary.Show()
    End Sub

    Private Sub btnSave_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnSave.Click
        Dim r_Word As String = txtRealWord.Text
        Dim d_Word As String = txtDigiWord.Text
        If String.IsNullOrEmpty(r_Word) OrElse String.IsNullOrEmpty(d_Word) Then
            MsgBox("Cannot append with an empty string.", MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Empty")
            Exit Sub
        End If
        Dim strConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=C:\Data\TextWords.mdb;"
        Dim objConnection As New OleDbConnection(strConnectionString)
        Try
            objConnection.Open()
        Catch oleDbException As OleDbException
            MsgBox(oleDbException.Message, MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Error " & oleDbException.ErrorCode)
        Catch invalidOperationException As InvalidOperationException
            MsgBox(invalidOperationException.Message, MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Error")
        End Try
        'Search if r_Word is in the database already since we
        'do not allow duplicates on our data
        Dim strSQLSearch As String = "SELECT realWord FROM Words WHERE realWord = ?"
        Dim objDataAdapter As New OleDbDataAdapter(strSQLSearch, objConnection)
        objDataAdapter.SelectCommand.Parameters.AddWithValue("@realWord", r_Word)
        Dim objDataTable As New DataTable("mWords")
        objDataAdapter.Fill(objDataTable)
        Dim isDuplicate As Boolean = objDataTable.Rows.Count > 0
        objDataAdapter.Dispose()
        objDataTable.Dispose()
        If isDuplicate Then
            'Actually enforce the no-duplicates rule stated above
            MsgBox("The word is already in the dictionary.", MsgBoxStyle.OkOnly + MsgBoxStyle.Exclamation, "Duplicate")
            objConnection.Close()
            objConnection.Dispose()
            Exit Sub
        End If
        'Append the word pair
        Dim strSQL As String = "INSERT INTO Words VALUES(?, ?)"
        Dim objCommand As New OleDbCommand
        objCommand.Connection = objConnection
        objCommand.CommandText = strSQL
        objCommand.CommandType = CommandType.Text
        objCommand.Parameters.AddWithValue("@realWord", r_Word)
        objCommand.Parameters.AddWithValue("@digiWord", d_Word)
        Dim m_Row As Integer = objCommand.ExecuteNonQuery
        If m_Row = 1 Then
            MsgBox("Word pair is successfully added" & ControlChars.CrLf & "into the dictionary.", MsgBoxStyle.OkOnly + MsgBoxStyle.Information, "Append")
        End If
        objCommand.Dispose()
        objCommand = Nothing
        objConnection.Close()
        objConnection.Dispose()
        objConnection = Nothing
        Me.Close()
        frmDictionary.Show()
    End Sub
End Class
---------------------------------------------------------------------------------------------------------------------
Imports System.Data.OleDb

Public Class frmUpdateWordPair
    Private Sub btnCancel_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnCancel.Click
        Me.Close()
        frmDictionary.Show()
    End Sub

    Private Sub btnSave_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnSave.Click
        Dim r_Word As String = txtRealWord.Text
        Dim d_Word As String = txtDigiWord.Text
        If String.IsNullOrEmpty(d_Word) Then
            MsgBox("Cannot update with an empty string.", MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Empty")
            Exit Sub
        End If
        Dim strConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=C:\Data\TextWords.mdb;"
        Dim objConnection As New OleDbConnection(strConnectionString)
        Try
            objConnection.Open()
        Catch oleDbException As OleDbException
            MsgBox(oleDbException.Message, MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Error " & oleDbException.ErrorCode)
        Catch invalidOperationException As InvalidOperationException
            MsgBox(invalidOperationException.Message, MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Error")
        End Try
        'Locate the pair by its English word (unique, since duplicates are rejected
        'when adding); matching on the new digiWord as well would never find an
        'edited pair
        Dim strSQL As String = "UPDATE Words SET digiWord = ? WHERE realWord = ?"
        Dim objCommand As New OleDbCommand
        objCommand.Connection = objConnection
        objCommand.CommandText = strSQL
        objCommand.CommandType = CommandType.Text
        'OleDb parameters are positional: their order must match the ? placeholders
        objCommand.Parameters.AddWithValue("@digiWord", d_Word)
        objCommand.Parameters.AddWithValue("@realWord", r_Word)
        Dim m_Row As Integer = objCommand.ExecuteNonQuery
        If m_Row = 1 Then
            MsgBox("Word is successfully updated" & ControlChars.CrLf & "in the dictionary.", MsgBoxStyle.OkOnly + MsgBoxStyle.Information, "Update")
        Else
            MsgBox("Cannot find the word in the dictionary.", MsgBoxStyle.OkOnly + MsgBoxStyle.Critical, "Not Found")
        End If
        objCommand.Dispose()
        objCommand = Nothing
        objConnection.Close()
        objConnection.Dispose()
        objConnection = Nothing
        Me.Close()
        frmDictionary.Show()
    End Sub
End Class
---------------------------------------------------------------------------------------------------------------------
Public NotInheritable Class AboutBox
    Private Sub AboutBox1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Me.LabelProductName.Text = "Speech Engine"
        Me.LabelVersion.Text = "Version 1.0"
        Me.LabelCopyright.Text = My.Application.Info.Copyright
        Me.LabelCompanyName.Text = "Deepquest Corp."
        Me.TextBoxDescription.Text = "An application that demonstrates speech recognition and text-to-speech."
    End Sub

    Private Sub OKButton_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles OKButton.Click
        Me.Close()
    End Sub
End Class
---------------------------------------------------------------------------------------------------------------------
Public Class frmSpeechEngineMain
    Private Sub mExit_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mExit.Click
        End
    End Sub

    Private Sub mAbout_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mAbout.Click
        AboutBox.Show()
    End Sub

    Private Sub mLaunchTTS_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mLaunchTTS.Click
        Me.Hide()
        frmTTS.Show()
    End Sub

    Private Sub mLaunchASR_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles mLaunchASR.Click
        Me.Hide()
        frmASR.Show()
    End Sub

    Private Sub pbASR_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles pbASR.Click
        Me.Hide()
        frmASR.Show()
    End Sub

    Private Sub pbTTS_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles pbTTS.Click
        Me.Hide()
        frmTTS.Show()
    End Sub
End Class
---------------------------------------------------------------------------------------------------------------------
BIBLIOGRAPHY
Dunn, Michael D., Pro Microsoft Speech Server 2007: Developing Speech Enabled Applications with .NET, Berkeley, California: Apress, 2007
Halvorson, Michael, Microsoft Visual Basic 2008, Redmond, Washington: Microsoft Press, 2008
Barker, F. Scott, Database Programming with Visual Basic .NET and ADO.NET: Tips, Tutorials, and Code, New York: Sams Publishing, 2003
Dobson, Rick, Programming Microsoft Visual Basic .NET for Microsoft Access Databases, Redmond, Washington: Microsoft Press, 2003
Willis, Thearon, Beginning Visual Basic 2005 Databases, Indianapolis, Indiana: Wiley Publishing, Inc., 2006
Pelland, Patrice, Microsoft Visual Studio 2005 Express Edition: Build A Program Now, Redmond, Washington: Microsoft Press, 2006
Kochanov, Ilya, Yap: Voice-To-Text Translation On Your Cell, http://www.crunchgear.com/2007/09/17/yap-voice-to-text-translation-on-your-cell/
Platzek, Dirk, Jott: Speech-to-Text Mobile Interface, http://www.wunschfeld.net/blog/2007/03/speech-to-text-mobile-interface.html
Turner, Brough, Voice SMS: Creating a Service People Will Adopt, http://www.nmscommunications.com/News/NL/TIN/Jan2007/VoiceSMS.htm
Wapedia.com, SMS Language, 2006, http://wapedia.mobi/en/SMS_language
Wikipedia.com, SMS Language, Common Abbreviations, http://en.wikipedia.org/wiki/SMS_language
TextingSlang.com, Common Abbreviations, http://www.textingslang.com/


