
Design of MATLAB-Based Automatic Speaker Recognition Systems

Jamel Price
Ali Eydgahi
Department of Engineering and Aviation Science
University of Maryland Eastern Shore

Outline

- Introduction
  - Project Description
  - Motivation
- Speech Recognition Process
  - Digital Signal Processing
  - Feature Extraction
    - Mel Frequency Cepstrum Coefficients
  - Feature Matching
  - Signal Models
    - Hidden Markov Modeling
    - Dynamic Time Warping
- Proposed MATLAB-Based Speaker Recognition System
  - Sample Output
  - Automatic Speaker Recognition Data Files
- Future Work
  - Strategic Approach

Introduction

Project Description

- This project represents one of the many design and development activities offered to undergraduate students in the areas of Science, Technology, Engineering and Mathematics (STEM).
- This presentation describes the design of an automatic speaker recognition system using the MATLAB software environment. The work was part of a NASA Langley Research Center collaboration through the Chesapeake Information Based Aeronautic Consortium (CIBAC).

Motivation

- Advances in aerospace technology have brought tremendous resources within reach of today's pilots. This research responds to the urgent need to drastically and effectively reduce cockpit interface complexity and pilot workload.
- The proposed software attempts to deliver effective and commercially attractive voice interface solutions that allow pilots to interact with their cockpit environment in a safer and more efficient manner.

Speech Recognition Process

Digital Signal Processing

- Digital signal processing (DSP) is the processing of signals by digital means. Through various processing techniques, a discrete (digital) output analogous to the analog signal is produced. Signals are processed in order to improve signal quality or to extract information.
- A digital signal processor can be replicated in software using a data-manipulation and development environment such as MATLAB, as illustrated below.
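For illustration, a minimal sketch of handling a digitized signal in MATLAB follows. The file name command.wav is an assumption; wavread was the standard WAV reader in MATLAB releases of this era.

```matlab
% Read a digitized waveform and inspect it (file name is an assumption).
[x, fs] = wavread('command.wav');   % sampled signal and its sample rate
t = (0:length(x) - 1) / fs;         % time axis in seconds
plot(t, x);                         % view the discrete-time signal
xlabel('Time (s)'); ylabel('Amplitude');
```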

- Speaker recognition is the process of automatically recognizing who is speaking based on unique characteristics contained in speech waves. This makes it possible to use a speaker's voice to verify their identity and control access to private services.
- At the highest level, all speaker recognition systems contain two modules: feature extraction and feature matching.
  - Feature Extraction: the process of extracting unique information from speech files that can later be used to identify the speaker.
  - Feature Matching: the process of actually identifying the speaker by comparing the unknown voice data with a database of known speakers stored in the system.

Block Diagram of Speaker Recognition System

Feature Extraction Process

- A voiceprint represents the most basic, yet unique, features of the speech command in the frequency domain.
- A voiceprint is merely a matrix of numbers in which each number represents the energy or average power heard in a particular frequency band during a specific interval.
- During the feature extraction stage, a database of voiceprints is created to be used as a reference in the feature matching stage.

- Techniques used to parametrically represent a voice command for speech recognition tasks include, but are not limited to, Mel-frequency cepstrum coefficients (MFCC).
- The MFCC technique was used in this project. It is based on the known variation of the human ear's critical bandwidths with frequency: filters are spaced linearly at low frequencies, below 1000 Hz, and logarithmically at high frequencies, above 1000 Hz. A sketch of the computation is shown below.
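The following is a minimal MATLAB sketch of the MFCC computation described above, not the project's actual code. The frame length, hop size, and filter count are assumed values, and simple_mfcc and melfilterbank are hypothetical helper names.

```matlab
% Minimal MFCC sketch for a mono signal x sampled at fs = 12000 Hz,
% the rate used in this project. All parameter values are assumptions.
function c = simple_mfcc(x, fs)
    N = 256; M = 100;                        % frame length and hop size (assumed)
    K = 20;                                  % number of mel filterbank channels
    frames = buffer(x, N, N - M, 'nodelay'); % split into overlapping frames
    frames = frames .* repmat(hamming(N), 1, size(frames, 2));  % window frames
    P = abs(fft(frames, N)).^2;              % power spectrum of each frame
    P = P(1:N/2+1, :);                       % keep non-negative frequencies
    H = melfilterbank(K, N, fs);             % K x (N/2+1) triangular mel filters
    c = dct(log(H * P + eps));               % log filterbank energies -> cepstrum
    c = c(1:13, :);                          % keep the first 13 coefficients
end

function H = melfilterbank(K, N, fs)
    mel  = @(f) 2595 * log10(1 + f / 700);   % Hz -> mel
    imel = @(m) 700 * (10.^(m / 2595) - 1);  % mel -> Hz
    edges = imel(linspace(mel(0), mel(fs/2), K + 2));  % K+2 band edges in Hz
    bins = floor(edges / fs * N) + 1;        % map edges to FFT bin indices
    H = zeros(K, N/2 + 1);
    for k = 1:K                              % rising and falling triangle slopes
        H(k, bins(k):bins(k+1))   = linspace(0, 1, bins(k+1) - bins(k) + 1);
        H(k, bins(k+1):bins(k+2)) = linspace(1, 0, bins(k+2) - bins(k+1) + 1);
    end
end
```

Each column of the returned matrix is the cepstral feature vector for one frame; the matrix as a whole is the voiceprint stored in the reference database.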

MFCC Process for Feature Extraction

Feature Matching Process

- To verify a voice command, the MFCC process is performed on the unknown utterance spoken into the system.
- This newly obtained voiceprint is compared against the reference voiceprints created and stored in the database during the feature extraction stage.
- Using a pattern recognition technique, similarities and differences between the unknown signal model and the reference voiceprints are determined.

Signal Models

- Real-world processes generally produce observable outputs which can be characterized as signals. Signals can be discrete or continuous (e.g., speech) in nature. There are two types of signal models:
  - Deterministic models (Dynamic Time Warping): generally exploit some known, specific properties of the signal to determine the values of new signal parameters.
  - Statistical models (Hidden Markov Modeling): attempt to characterize only the statistical properties of the signal via a stochastic process. The likelihood of an unknown signal is computed against a given model.

Dynamic Time Warping

- A common task with continuous data is comparing one series with another. In the case where the time series have the same component shapes but do not line up in time, we must warp the time axis of one or both series to achieve better alignment.
- A warping path defines a mapping between the series in question. There are exponentially many warping paths, but we are interested in the path which minimizes the warping cost.
- The reference voiceprint with the lowest warping cost against the input voice command is declared the match. A minimal sketch of the cost computation follows.
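Below is a minimal dynamic-time-warping cost sketch; the function name and calling convention are assumptions. The inputs a and b are feature sequences such as MFCC voiceprints, with one frame per column.

```matlab
% Accumulated-cost DTW between two feature sequences (illustrative sketch).
function cost = dtw_cost(a, b)
    n = size(a, 2); m = size(b, 2);
    D = inf(n + 1, m + 1);                   % accumulated-cost matrix
    D(1, 1) = 0;
    for i = 1:n
        for j = 1:m
            d = norm(a(:, i) - b(:, j));     % local frame-to-frame distance
            D(i + 1, j + 1) = d + min([D(i, j + 1), D(i + 1, j), D(i, j)]);
        end
    end
    cost = D(n + 1, m + 1);                  % total cost of the optimal path
end
```

The reference voiceprint yielding the smallest dtw_cost against the unknown voiceprint is reported as the match.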

Hidden Markov Modeling

- Given the model parameters created for the reference speakers, the probability of the hidden-state sequence that could have generated a particular unknown output sequence is computed using the Viterbi algorithm.
- The Viterbi algorithm makes one key assumption: the most likely hidden sequence up to a point t depends only on the observed event at point t and the most likely sequence at point t-1.

- In order to implement a speech recognition system using HMMs, the following steps must be taken (a minimal Viterbi sketch appears after this list):
  - For each reference voice command, a Markov model must be built using parameters that optimize the observations of the word.
  - Model likelihoods for all possible reference models against the unknown utterance must be computed using the Viterbi algorithm, followed by selection of the reference with the highest model likelihood value.
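The following is a minimal log-domain Viterbi sketch, not the project's actual code; the variable names are assumptions. A is the S x S state-transition matrix, p0 the S x 1 initial state distribution, and logB an S x T matrix of log observation likelihoods for the T frames of the unknown utterance.

```matlab
% Log-likelihood of the single best hidden-state path (illustrative sketch).
function logp = viterbi_loglik(A, p0, logB)
    [S, T] = size(logB);
    V = log(p0) + logB(:, 1);                % best log-prob ending in each state
    logA = log(A);
    for t = 2:T
        % Extend the best path into each state j from its best predecessor i.
        V = max(repmat(V, 1, S) + logA, [], 1)' + logB(:, t);
    end
    logp = max(V);                           % score of the best state path
end
```

The unknown utterance is scored against each reference model in turn, and the model with the highest log-likelihood identifies the speaker.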

Proposed MATLAB-Based Speaker Recognition System

Training

- Typing main in the MATLAB command window executes the program.
- The main menu gives the user the option of training the system, testing the system, or clearing the system's database.
- Selecting Train gives the user the option of creating an entirely new reference database or adding a speaker to the existing database.
- During the training phase, analog voice data is converted to discrete numerical data using the MFCC code. These voiceprints are then automatically stored in an Excel database according to a unique identifier. A sketch of this step is shown below.
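An illustrative training-step sketch follows, reusing the simple_mfcc helper above. The recording length, the one-worksheet-per-speaker layout, and the speakerId variable are assumptions, not the project's actual design.

```matlab
% Record one utterance, extract its voiceprint, and store it in the database.
fs = 12000;                                  % 12 kHz, 16-bit, as in this project
r = audiorecorder(fs, 16, 1);                % mono recorder
recordblocking(r, 2);                        % capture a 2-second utterance (assumed)
x = getaudiodata(r);
c = simple_mfcc(x, fs);                      % compute the voiceprint
xlswrite('acoustic_data.xls', c, speakerId); % one worksheet per speaker (assumed)
```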

Reference Database in Excel

Testing

- During the testing phase, a voiceprint is created from the unknown voice data and stored as a temporary Excel file.
- If Dynamic Time Warping is selected as the pattern matching algorithm, this unknown voiceprint is compared against each voiceprint stored in the reference database, and the identifier of the voiceprint with the smallest warping cost is returned.
- If Hidden Markov Modeling is selected, an HMM is created for the unknown voiceprint as well as for each voiceprint in the database. The unknown model is then compared against the reference models, and the identifier of the voiceprint with the highest likelihood value is returned. A sketch of the DTW branch is shown below.
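The sketch below illustrates the DTW branch of the testing phase using the dtw_cost function above. The per-worksheet file layout and the numSpeakers variable are assumptions, not the project's actual code.

```matlab
% Compare the unknown voiceprint to every reference and return the best match.
unknown = xlsread('test_data.xls');          % voiceprint of the unknown utterance
best = inf; id = 0;
for k = 1:numSpeakers                        % numSpeakers: entries in the database
    ref = xlsread('acoustic_data.xls', k);   % reference voiceprint on sheet k
    c = dtw_cost(ref, unknown);
    if c < best
        best = c; id = k;                    % keep the smallest warping cost
    end
end
fprintf('Recognized speaker: %d (warping cost %.2f)\n', id, best);
```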

Dynamic Time Warping Command Window Output

Hidden Markov Modeling Command Window Output

Graphical Output

Automatic Speaker Recognition Data Files Created During the Recognition Process


- acoustic_data.xls: a workbook which contains the voiceprints of each speaker; analogous to a database.
- *.wav: the input voice waveform of each speaker, recorded at 16 bits with a sample rate of 12 kHz. Although these files are pictured here, they are normally deleted immediately after feature extraction.
- *.mat: a proprietary MATLAB file format which stores data in binary form. Hidden Markov models are stored with the .mat extension.
- test_data.xls, unknown.wav, unknown.mat (not pictured): contain the voiceprint, waveform, and Hidden Markov model of the unknown speaker, respectively. These files are deleted immediately following the testing stage.

Future Work

- Using an additional program referred to as the MATLAB Compiler, we will convert the application into self-contained C code.
- Implement the speech recognition program on Texas Instruments DSP hardware (TMS320).

Development of a Speech Recognition System

- An automatic speech recognition system is termed a speaker-independent system. Unlike a speaker recognition system, which is speaker dependent, a speaker-independent system is designed to work for any speaker of a specific language, such as English.
- Tasks will be executed upon system recognition of speaker-independent voice commands stored in the system's reference database.

Reference Database in MATLAB
