
Reviewing Automatic Language Identification

Speech Recognition System

The Audio Notebook

Accent identification

Languages have characteristic sound patterns: they are described subjectively as singsong, rhythmic, guttural, nasal, and so on. In monolingual spoken language systems, the objective is to determine the content of the speech, typically implemented by phoneme recognition coupled with word and sentence recognition. This requires that researchers cue in on small portions of the speech (frames, phonemes, syllables, subword units, and so on) to determine what the speaker said.

When a person speaks, air compression waves are produced. These waves travel through the air to where others can hear them, or into the microphone of a speech recognition system (possibly a noise-cancelling microphone). The microphone converts this original speech signal into an analog electrical signal. Once the speech signal is converted into an analog electrical signal, some processing may take place to keep the values of the signal within a reasonable range; this is called gain control. The analog speech signal is then converted into a digital speech signal with an analog-to-digital (A/D) converter.

Digital signal processing (DSP). The digital signal from the A/D converter comprises a tremendous amount of data per second. The digital speech signal, comprising the samples output by the A/D converter, is transformed by a digital signal processor into another digital speech signal. The DSP step uses various transform functions to determine several physical measurements of the speech signal for each frame of time. A frame may include, for example, the amplitudes of the speech in several frequency bands (frame features).

A plurality of parallel speech recognition circuits can each receive and store slightly altered versions of the input speech as part of the registration process. The intent is to improve the ability to recognize speech in adverse speech environments, such as high noise.
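The framing and frame-feature step described above can be sketched as follows. This is a minimal illustration assuming a 16 kHz mono signal; the frame length, hop size, and band edges are illustrative choices, not values taken from the text.

```python
import numpy as np

def frame_features(signal, rate=16000, frame_ms=25, hop_ms=10,
                   band_edges=(0, 500, 1000, 2000, 4000, 8000)):
    """Split a digital speech signal into frames and return, for each
    frame, the summed spectral amplitude in several frequency bands."""
    frame_len = int(rate * frame_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    window = np.hamming(frame_len)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))          # magnitude spectrum
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / rate)
        # One feature per band: total magnitude falling inside the band.
        feats = [spectrum[(freqs >= lo) & (freqs < hi)].sum()
                 for lo, hi in zip(band_edges[:-1], band_edges[1:])]
        features.append(feats)
    return np.array(features)

# Example: one second of a 440 Hz tone; its energy lands in the lowest band.
t = np.arange(16000) / 16000.0
feats = frame_features(np.sin(2 * np.pi * 440 * t))
```

Each row of `feats` is one frame's feature vector, the kind of per-frame physical measurement the DSP stage produces for the recognizer.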
The present invention is directed to an improved speech recognition system for analysing an input signal to recognize speech contained within the signal and outputting an accepted and recognized text signal. The accepted and recognized-text output signal represents in textual form the speech recognized in the input signal and accepted by the comparator.

A related problem is the one a listener experiences when attempting to capture information presented during a lecture, meeting, or interview: the listener must divide their attention between the talker and their note-taking activity. The goal is to retain the original audio while allowing a listener to quickly and easily access portions of interest.

Very informal listening tests have been performed using a subset of the same corpus, consisting of isolated words spoken by 6 female speakers, one from each country. 5 Portuguese listeners and 1 French one heard sequences

Accent modification http://www.asha.org/public/speech/development/accent-modification/

Language accent classification in American English

of words pronounced by the same speaker, until they could make a guess about her accent. The listeners are accustomed to hearing English spoken by foreigners (they frequently participated in international projects and conferences). Even so, it was very difficult for them to make a guess without hearing some 6 to 12 words from each speaker.

An accent is the unique way that speech is pronounced by a group of people speaking the same language. A person's accent depends on many factors. Accents are a natural part of spoken languages and reflect the unique characteristics and background of a person. Some people may have difficulty communicating because of their accent: others may not understand them or may avoid social interaction with them, and they may feel frustration from having to repeat themselves all the time, or find that people focus on the accent more than on what they are trying to say. There can also be negative effects on job performance, educational advancement, and everyday life activities. Accent includes sound pronunciation (consonants and vowels), stress, rhythm, and intonation of speech.

It is well known that speaker variability caused by accent is one factor that degrades the performance of speech recognition algorithms. If the speaker's accent can be estimated accurately, then a modified set of recognition models which addresses that accent could be employed to increase recognition accuracy. A database of foreign-accented speech is established that consists of words and phrases known to be sensitive to accent.

A speech recognizer is a device that automatically transcribes speech into text. It can be thought of as a voice-actuated typewriter in which a computer program carries out the transcription and the transcribed text appears on a workstation display. The designation word denotes a word form defined by its spelling.
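The idea of switching to accent-adapted recognition models once an accent estimate is available can be sketched as a simple dispatch. The accent labels, model names, and confidence threshold below are all hypothetical, chosen only to illustrate the selection logic.

```python
# Hypothetical mapping from an estimated accent to an accent-adapted
# model set, with a general model as the fallback.
ACCENT_MODELS = {
    "mandarin": "acoustic_model_mandarin_accented_en",
    "german":   "acoustic_model_german_accented_en",
    "native":   "acoustic_model_general_american",
}

def select_model(estimated_accent, confidence, threshold=0.7):
    """Use the accent-specific model only when the accent estimate is
    confident enough; otherwise fall back to the general model."""
    if confidence >= threshold and estimated_accent in ACCENT_MODELS:
        return ACCENT_MODELS[estimated_accent]
    return ACCENT_MODELS["native"]

chosen = select_model("german", 0.85)
```

A weak or unrecognized accent estimate simply routes the utterance to the general model, so a wrong guess never makes recognition worse than the baseline.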
In the early design strategy of the field, a recognizer would segment the speech into successive phones (basic pronunciation units), then identify the particular phones corresponding to the segments, and finally transcribe the recognized phone strings into an English text.

There are many sources of acoustical distortion that can degrade the accuracy of speech recognition systems. For example, obstacles to robustness include additive noise from machinery, competing talkers, reverberation from surface reflections in a room, and spectral shaping by microphones and

Statistical Methods for Speech Recognition

Environmental Robustness in automatic speech recognition

Robust Continuous Speech Recognition Using Parallel Model Combination

Speech and text messaging system with distributed speech recognition and speaker database transfers

the vocal tracts of individual speakers. These sources of distortion cluster into two complementary classes: additive noise (the first two examples) and distortions resulting from the convolution of the speech signal with an unknown linear system (the remaining ones).

The corrupted waveform may be preprocessed in such a way that the resulting parameters are closely related to those of clean speech. Techniques in this category include spectral subtraction, spectral mapping, and inherently robust parameterizations. These methods use only statistical information about the interfering noise in the compensation process; no account is taken of what was said. Other schemes have attempted to estimate the clean speech signal using information about the speech itself. These include inhomogeneous estimators using HMMs and minimum mean square error estimators.

A second class of methods attempts to modify the pattern matching stage in order to account for the interfering noise. Methods using this approach include noise masking, state-based filtering, cepstral mean compensation, HMM decomposition, and parallel model combination (PMC). Ideally there is no mismatch between training and test conditions, but invariably in real applications there is some mismatch, either in the form of additive noise or of variations in the channel conditions. Some method for compensating the parameters of the models is therefore required, and PMC compensates those parameters in a computationally efficient manner.

A system for automatically storing a message comprises a telecommunications device for transmitting and receiving an audio message, coupled to a telecommunication network. It further comprises a data processing system including a speech recognition system connected to the telecommunication device. The telecommunication device has a control unit which transfers the audio message to the data processing system. The data processing system then converts said audio message into a digital signal, and the system has a memory to store said digital signal.
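Returning to the preprocessing techniques listed above, spectral subtraction can be sketched as follows: estimate the noise magnitude spectrum from a noise-only segment and subtract it from each frame of the corrupted signal, keeping the noisy phase. The over-subtraction factor and spectral floor below are illustrative parameters, not values from the text.

```python
import numpy as np

def spectral_subtract(noisy_frame, noise_mag, alpha=1.0, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from one frame.
    The noisy phase is reused, a standard simplification."""
    spectrum = np.fft.rfft(noisy_frame)
    mag, phase = np.abs(spectrum), np.angle(spectrum)
    # Floor the result to avoid negative magnitudes ("musical noise" guard).
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy_frame))

# Example: a 1 kHz tone plus white noise, with the noise spectrum
# estimated from the noise itself for the sake of the illustration.
rng = np.random.default_rng(0)
t = np.arange(512) / 16000.0
clean = np.sin(2 * np.pi * 1000 * t)
noise = 0.3 * rng.standard_normal(512)
noise_mag = np.abs(np.fft.rfft(noise))
denoised = spectral_subtract(clean + noise, noise_mag)
```

Note that the compensation uses only the noise statistics (`noise_mag`); as the text says, no account is taken of what was said.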
The speech recognition system converts the digital signal into a text file and stores it in its memory. The converted audio message can then be stored and managed as a text file in a message database or a message managing system. This is advantageous because a user can easily select a message out of a plurality of messages when all messages are presented in visualized text form. If the speech recognition system is speaker independent, it receives the audio message and converts it into a text file, and the data processing system can then process this text file easily, for example in a message management program.
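The text-message store described above can be sketched as a small database of transcripts that the user browses as a numbered list. The transcripts and caller identifiers below are made up for illustration; a real system would get the text from the recognizer.

```python
from dataclasses import dataclass, field

@dataclass
class MessageDatabase:
    messages: list = field(default_factory=list)

    def add(self, caller, transcript):
        """Store one converted audio message as text; return its index."""
        self.messages.append({"caller": caller, "text": transcript})
        return len(self.messages) - 1

    def listing(self):
        """The visualized text form the user scans to pick a message."""
        return [f"{i}: {m['caller']}: {m['text'][:40]}"
                for i, m in enumerate(self.messages)]

db = MessageDatabase()
db.add("+33-1-2345", "Meeting moved to three o'clock")
db.add("+44-20-987", "Please call back about the report")
```

Because every message is plain text, selecting, searching, and managing messages reduces to ordinary text processing, which is the advantage the passage above points out.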

Automatic speech analysis

If the speech recognition system is speaker dependent, it has to be adapted to the respective speaker for each call. Each system is equipped with a speaker-dependent speech recognition system having a database or a parameter set which has been adapted individually to the respective owner's voice. This individually different database or parameter set is then transmitted from the respective caller's data processing device, via a data communication network, to the called person's data processing system, which then converts the audio message into a text file.

In some cases people speaking the same language cannot understand each other because of the accents they are accustomed to. The speaker's speech is recorded and a statistical model is created from the recorded speech. The statistical model is then compared to previously prepared statistical models of speakers with known different levels of conformity to the desired accent.
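The comparison against reference models with known conformity levels can be sketched with a toy example: model each level as a Gaussian over a single pronunciation feature and pick the level whose model best explains the speaker's recorded values. Real systems use far richer models (e.g. GMMs or HMMs over spectral features); the scalar feature, the level names, and the reference parameters here are purely illustrative.

```python
import math

# Hypothetical (mean, std) of one pronunciation feature per conformity level.
REFERENCE_MODELS = {
    "high conformity":   (0.20, 0.05),
    "medium conformity": (0.45, 0.10),
    "low conformity":    (0.80, 0.15),
}

def log_likelihood(values, mean, std):
    """Gaussian log-likelihood of the observed feature values."""
    return sum(-0.5 * math.log(2 * math.pi * std**2)
               - (v - mean)**2 / (2 * std**2) for v in values)

def classify(values):
    """Pick the reference model that best explains the recording."""
    return max(REFERENCE_MODELS,
               key=lambda k: log_likelihood(values, *REFERENCE_MODELS[k]))

level = classify([0.42, 0.48, 0.44])
```

The speaker's model is never matched against a single "correct" pronunciation; it is scored against each reference level and the best-scoring level is reported, mirroring the comparison described in the text.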
