Vous êtes sur la page 1sur 46

Introduction of HMM Tool Kit (HTK)

Benefits of HTK

World recognized state-of-the-art speech recognition system


Support a variety of different input formats Support different features Support almost all common speech recognition technologies

Detail features of HTK

HTK can support a variety of different formats ex : pcm, wav, , ALIEN(unknown), etc. Feature extraction:

MFCC, filterbank, PLP, LPC, , etc.

Very free HMM definition Training Viterbi (segmentation) Forward/Backward (Baun-Welch) Single model re-estimation (change feature)
3

Detail features of HTK

HMM system refinement Context-dependent model Parameter tying/clustering Regression class tree (MLLR) Language Word grammar and network Bigram language model Decoding

Evaluate recognition results Forced alignment NBest lists/lattices


4

Detail features of HTK

HMM adaptation

MLLR/Regression Tree MAP

Mean/variance

HTK procedures

Data/Setting preparation Define Acoustic units (phone table) Define Dictionary (word) Define grammar/network Collect speech database Generate transcription Feature Extraction Set configuration file for MFCC feature extraction Prepare Script files (corpus file) Define HMMs structure (prototype) Training HMM models Prepare Script files (corpus file) Set configuration file for training, recognition, ,etc. Flat start (uniform segmentation) Viterbi search (forced alignment : segmentation ) Recognition/Performance Evaluation Viterbi search
6

HMM/Data Setting

Phone Table Dictionary Grammar Rule Define HMMs

Define Acoustic Units (Phone table)

Using our traditional 100 RCD initials 40 CI finals

Dictionary (411 Syllable table)

Using our traditional 411 syllables (plus silence) word phones_list

...

Task Grammar Rule

Task: Phone Dialing

Dial three two six five four


Dial nine zero four one oh nine Phone Woodland Call Steve Young

10

Generate Task Grammar Rule Network

Task: free-syllable decoding

Define gram file $syllable=zhi | chi | ri | a | ;


( SENT-START < $syllable [sil] > SENT-END )

Parsing gram by HParse wdnet

11

Free-Syllable Decoding wdnet


Syntax : # I Nodes # J arcs N=? L=? # define nodes I=x W=www # define arcs J=x S=y E=z ..

12

Database Preparation

Collect speech data MLF (Master Label File)

The content is in word level

Transcribe the collected speech database Corpus files (training/test set) Script files

Change MLF into Phone level labeling

Feature Extraction (MFCC)

13

Word/Phone-Level Transcriptions

Word Master Label File

Phone Master Label File

#!MLF!# */4_t0062_t0062331.lab tai yin . */4_t0062_t0062340.lab . .

using HLEd to transform

#!MLF!# */4_t0062_t0062331.lab sil t ai NULL yin sil . */4_t0062_t0062340.lab . .

14

EX
IS sil sil DE sp

15

Feature Extraction
HCOPY : Data Copy (with format changing)

16

Script files

codetr.scp
source destination

17

Feature Extraction Configuration


# byte order NATURALREADORDER=TRUE NATURALWRITEORDER=TRUE # Waveform parameters SOURCEFORMAT=ALIEN HEADERSIZE=256 SOURCERATE=1250.0 # Coding parameters TARGETKIND=MFCC_E TARGETRATE=100000.0 SAVECOMPRESSED=F SAVEWITHCRC=T WINDOWSIZE=320000.0 # ZMEANSOURCE=T USEHAMMING=T PREEMCOEF=0.97 NUMCHANS=20 USEPOWER=F #normalized the dynamic range of MFCC CEPLIFTER=22 LOFREQ=0 HIFREQ=4000 NUMCEPS=12 ENORMALISE=T DELTAWINDOW=2 ACCWINDOW=2

18

HMM Configuration

Config File (command-level) Command C config_file User Defaults > export HCONFIG=my_HTK_config Built-in Defaults ref Chap 18 in HTK manual

19

Define HMM Structure

20

HMM Prototype Definition


<TransP> 5 ~o <VecSize> 39 <MFCC_Z_E_D_A> 0.0 1.0 0.0 ~h "proto" 0.0 0.5 0.5 <BeginHMM> 0.0 0.0 0.5 <NumStates> 5 0.0 0.0 0.0 <State> 2 <NumMixes> 4 0.0 0.0 0.0 <Mixture> 1 0.25 <EndHMM> <Mean> 39 <Variance> 39 0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.5 0.0

21

Create HMM Prototypes

22

Training Procedure

Model Initialization

Flat start (unknown segmentation uniform segmentation) Viterbi search (given segmentation) Forward/backward only in word level
Mixture splitting

Model Refinement

23

Configuration file for Training/Test


# byte order #BYTEORDER=VAX NATURALREADORDER=TRUE NATURALWRITEORDER=TRUE # MFCC parameters SOURCEFORMAT=HTK SOURCERATE=100000.0 TARGETKIND=MFCC_E_D_A_Z TARGETRATE=100000.0 DELATWINDOW=2 ACCWINDOW=2 #new variable can replace the varFloor VARFLOORPERCANTILE = 20
24

Training Corpus
Mat4500_train.scp Mat4500_train_phones.mlf

25

Flat start

Viterbi search
Forward/Backward

26

Initialize HMMs from Flat Start

27

Initialize HMMs from Viterbi Search

28

Utterance Segmentation *.mlf

mat4500_train.mlf (phone-level with segmentation information)

29

Silence and short pause model


sp share the middle state for silence Sil.hed

AT 2 4 0.2 {sil.transP} AT 4 2 0.2 {sil.transP} AT 1 3 0.3{sp.transP} TI silst {sil.state[3],sp.state[2]}

30

Mixture Splitting

31

Mixture Splitting Script

MU2.hed

32

Recognition/Evaluation Procedure
Recognition
Evaluation

33

Test Corpus

Mat4500_test.scp

Mat4500_test.mlf

34

Two Type of Result Formats

35

Confusion Matrix Analysis

36

Force Alignment

Viterbi decoding

HVite using option -a You can get some statistics of the HMM segmentation Useful for mixture number determined

37

Speaker Adaptation MLLR, MAP

MLLR

In training phase generate the states occupation statistics % HERest s HHed RN models //ReName hmmid LS stats //loads states occupation statistics RC 32 rtree //Regression class = 32 or RC 32 rtree {sil.state[2-4].mix}
38

force alignment of adaptation data %Hvite -a -I adapWords.mlf -m . Find global MLLR %HEAdapt C -g -K global.tmf -I adapPhone.mlf . *.tmf : transform model file Find MLLR regression Tree] %HEAdapt C -J global.tmf K rc.tmf -I adapPhone.mlf Recognition %HVite -J rc.tmf .
39

MAP adaptation

HEAdapt C -j 0.9 -k -I adapPhone.mlf -j : weight -k : using MLLR before MAP

40

Further topics

Model/state tying (HMM definition) Context-dependent model Fast training/search (Beam search) Insertion/Deletion problem Duration constraint word transition penalty Word Lattice output

41

Detail options for the HTK commands

HCompV

Typical arguments HCompV C xxx f 0.01 m S *.scp M output_dir hmm -m : update mean -f f : set varFloor to f*global variance in hmm macro ~o ~v varFloor1 <Variance> 38 ..
42

Detail options for the HTK commands

HERest

Typical arguments HERest C xxx I *.mlf t 250.0 150.0 1000.0 -S *.scp H hmm_macros H hmm_defs M output_dir hmmlist -t f [i l] : set the pruning threshold to f f f+i until f=l -T tracing option octal number, command dependent

00020 show occupation counts


43

Detail options for the HTK commands

HVite

Typical arguments HVite H hmm_macros H hmm_defs S *.scp i output_mlf w wdnet p 0.0 s 5.0 t 250 dict tiedlist -t f [i l] : set the pruning threshold to f f f+i until f=l -m : show model boundaries -a : force alignment, -I input.mlf -p, -s : word insertion penalty, weight for grammar score
44

Detail options for the HTK commands

HResult

Typical arguments HResult I *.mlf hmmlist answer.mlf -n : use NIST -e s t : label t is made equivalent to s

45

Detail options for the HTK commands

HInit

Typical arguments HInit S *.scp M hmm_macro H hmm_defs model Typical arguments HRest S *.scp M hmm_macro H hmm_defs model Use wavesufer.

HRest

HSLab

46

Vous aimerez peut-être aussi