
Linear Predictive Coding Methods in Speech Processing

ECE 5525

Final Project Report

By: Mohamed M. Eljhani Fall 2010

Problem Description
The main purpose of this project is to show the differences between three linear predictive methods by implementing a MATLAB program that converts a frame of speech into a set of linear prediction coefficients using three basic methods: the Autocorrelation Method, the Covariance Method, and the Lattice Filter Method.

Write a MATLAB program to convert a frame of speech into a set of linear prediction coefficients using the three methods, i.e., the Autocorrelation Method, the Covariance Method, and the Lattice Filter Method. Choose a section of a steady-state vowel and a section of unvoiced speech, and plot the LPC spectra from the three methods along with the normal spectrum of the Hamming-window-weighted frame. Use N = 300 and p = 12, with Hamming window weighting for the autocorrelation method, and use the same parameters for the Covariance and Lattice Methods. Use the file ah.wav to get a steady-state vowel sound beginning at sample 3000, and the file test 16k.wav to get a fricative beginning at sample 3000. (Note that for the covariance and lattice methods, you also need to preserve the p samples before the starting sample at n = 3000 for computing the correlations and error signals.)

Problem Solution
The following MATLAB program reads in a file of speech, computes the original spectrum (of the signal weighted by a Hamming window), and plots on top of it the LPC spectrum from the autocorrelation, covariance, and lattice methods. There is a main program and three functions (durbin for the autocorrelation method, cholesky for the covariance method, although simple matrix inversion was used rather than the Cholesky decomposition, and lattice for the traditional lattice method).
%
% read in speech file, choose section of speech and solve for set of
% lpc coefficients using the autocorrelation method, the covariance
% method and the lattice method
%
% plot the resulting spectra from all three methods
%
% --- main program: test_lpc.m ---
% read in waveform for a speech file
% [xin,fs,mode,format]=loadwav('ah_truncated.wav');
filename=input('enter speech filename:','s');
[xin,fs,mode,format]=loadwav(filename);   % loadwav reads the wav file (helper function, not shown)
% normalize input to [-1,1] range and play out sound file
xinn=xin/max(xin);
sound(xinn,fs);
[nrows,ncol]=size(xin);
m=input('starting sample for speech frame:');
N=input('frame duration:');
p=input('lpc order:');
wtype=input('window type (1=Hamming, 0=Rectangular):');
stitle=sprintf('file: %s, ss: %d N: %d p: %d',filename,m,N,p);
% print out number of samples in file, read in plotting parameters
fprintf(' number of samples in file: %7.0f \n',nrows);
% autocorrelation method--choose section of speech, window using Hamming
% window, compute autocorrelation for i=0,1,...,p
% get original spectrum
xino=[(xin(m:m+N-1).*hamming(N))' zeros(1,512-N)];
h0=fft(xino,512);
f=0:fs/512:fs-fs/512;
plot(f(1:257),20*log10(abs(h0(1:257)))),title(stitle),ylabel('log magnitude (dB)'),...
    xlabel('frequency in Hz');
hold on;

% autocorrelation method
xf=xin(m:m+N-1);
[R,E,k,alpha,G]=durbin(xf,N,p,wtype);
alphap=alpha(1:p,p);
num=[1 -alphap'];
[ha,f]=freqz(G,num,512,fs);
plot(f,20*log10(abs(ha)),'--');
% fvtool(G,num);
% covariance method--choose section of speech, compute correlation matrix
% and covariance vector
xc=xin(m-p:m+N-1);
[phim,phiv,EC,alphac,GC]=cholesky(xc,N,p);
numc=[1 -alphac'];
[hc,f]=freqz(GC,numc,512,fs);
plot(f,20*log10(abs(hc)),'g--');
% fvtool(GC,numc);
% lattice method--choose section of speech, compute forward and backward
% errors
xl=xin(m-p:m+N-1);
[EL,alphal,GL,k]=lattice(xl,N,p);
alphalat=alphal(:,p);
numl=[1 -alphalat'];
[hl,f]=freqz(GL,numl,512,fs);
plot(f,20*log10(abs(hl)),'r-.');
legend('windowed speech','autocorrelation method (-)','covariance method (--)','lattice method (-.)');
% fvtool(GL,numl);

% --- durbin.m: autocorrelation (Durbin recursion) solution ---
function [R,E,k,alpha,G]=durbin(xf,N,p,wtype)
% function [R,E,k,alpha,G]=durbin(xf,N,p,wtype)
%
% compute window based on wtype; wtype=1 for Hamming window, wtype=0 for
% Rectangular window
%
% compute R(0:p) from windowed xf
%
% solve Durbin recursion for E,k,alpha in 8 easy steps
%  step 1--E(0)=R(0)
%  step 2--k(1)=R(1)/E(0)
%  step 3--alpha(1,1)=k(1)
%  step 4--E(1)=(1-k(1)^2)E(0)
%  steps 5-8--for i=2,3,...,p
%  step 5--k(i)=[R(i)-sum from j=1 to i-1 of alpha(j,i-1)*R(i-j)]/E(i-1)
%  step 6--alpha(i,i)=k(i)
%  step 7--for j=1,2,...,i-1
%          alpha(j,i)=alpha(j,i-1)-k(i)*alpha(i-j,i-1)
%  step 8--E(i)=(1-k(i)^2)E(i-1)
%
if wtype==1
    win=hamming(N);
else
    win=boxcar(N);
end
% window frame for autocorrelation method
xf=xf.*win;
% compute autocorrelation
for k=0:p
    R(k+1)=sum(xf(1:N-k).*xf(k+1:N));
end
% solve for lpc coefficients using Durbin's method
E=zeros(1,p);
k=zeros(1,p);
alpha=zeros(p,p);
E(1)=R(1);
ind=1;
k(ind)=R(ind+1)/E(ind);
alpha(ind,ind)=k(ind);
E(ind+1)=(1-k(ind)^2)*E(ind);
for ind=2:p
    k(ind)=(R(ind+1)-sum(alpha(1:ind-1,ind-1)'.*R(ind:-1:2)))/E(ind);
    alpha(ind,ind)=k(ind);
    for jnd=1:ind-1
        alpha(jnd,ind)=alpha(jnd,ind-1)-k(ind)*alpha(ind-jnd,ind-1);
    end
    E(ind+1)=(1-k(ind)^2)*E(ind);
end
G=sqrt(E(p+1));

% --- cholesky.m: covariance solution (direct matrix inverse) ---
function [phim,phiv,EC,alphac,GC]=cholesky(xc,N,p)
% cholesky decomposition
%
% first compute phim(1,1)...phim(p,p)
% next compute phiv(1,0)...phiv(p,0)
%
for i=1:p
    for k=1:p
        phim(i,k)=sum(xc(p+1-i:p+N-i).*xc(p+1-k:p+N-k));
    end
end
for i=1:p
    phiv(i)=sum(xc(p+1-i:p+N-i).*xc(p+1:p+N));
    phiz(i)=sum(xc(p+1:p+N).*xc(p+1-i:p+N-i));
end
phi0=sum(xc(p+1:p+N).^2);
% use simple matrix inverse to solve equations--come back to Cholesky
% decomposition later
phiv=phiv';
% solve using matrix inverse, phim*alphac=phiv
alphac=inv(phim)*phiv;
EC=phi0-sum(alphac'.*phiz);
GC=sqrt(EC);

% --- lattice.m: lattice (PARCOR) solution ---
function [EL,alphal,GL,k]=lattice(xc,N,p)
% lattice solution to lpc equations
%
% follow 10 step solution
%  step 1--set e(0)(m)=b(0)(m)=s(m)
%  step 2--compute k1=alpha(1,1) from the basic lattice reflection
%          coefficient equation 8.89
%  step 3--determine forward and backward errors e(1)(m) and b(1)(m) from
%          eqns 8.84 and 8.87
%  step 4--set i=2
%  step 5--determine ki=alpha(i,i) from eqn 8.89
%  step 6--determine alpha(j,i) for j=1,2,...,i-1 from eqn 8.70
%  step 7--determine e(i)(m) and b(i)(m) from eqns 8.84 and 8.87
%  step 8--set i=i+1
%  step 9--if i<=p, go to step 5
%  step 10--finished
e(:,1)=xc;
b(:,1)=xc;
k(1)=sum(e(p+1:p+N,1).*b(p:p+N-1,1))/sqrt(sum(e(p+1:p+N,1).^2)*sum(b(p:p+N-1,1).^2));
alphal(1,1)=k(1);
btemp=[0 b(:,1)']';
e(1:N+p,2)=e(1:N+p,1)-k(1)*btemp(1:N+p);
b(1:N+p,2)=btemp(1:N+p)-k(1)*e(1:N+p,1);
for i=2:p
    k(i)=sum(e(p+1:p+N,i).*b(p:p+N-1,i))/sqrt(sum(e(p+1:p+N,i).^2)*sum(b(p:p+N-1,i).^2));
    alphal(i,i)=k(i);
    for j=1:i-1
        alphal(j,i)=alphal(j,i-1)-k(i)*alphal(i-j,i-1);
    end
    btemp=[0 b(:,i)']';
    e(1:N+p,i+1)=e(1:N+p,i)-k(i)*btemp(1:N+p);
    b(1:N+p,i+1)=btemp(1:N+p)-k(i)*e(1:N+p,i);
end
EL=sum(xc(p+1:p+N).^2);
for i=1:p
    EL=EL*(1-k(i)^2);
end
GL=sqrt(EL);

The following plots are for a vowel from the file ah.wav, and a fricative from the file test 16k.wav.

Introduction
There exist many different types of speech compression that make use of a variety of different techniques. However, most methods of speech compression exploit the fact that speech production occurs through slow anatomical movements and that the speech produced has a limited frequency range. The frequency of human speech production ranges from around 300 Hz to 3400 Hz. Speech compression is often referred to as speech coding, which is defined as a method for reducing the amount of information needed to represent a speech signal. Most forms of speech coding are based on a lossy algorithm. Lossy algorithms are considered acceptable when encoding speech because the loss of quality is often undetectable to the human ear. There are many other characteristics of speech production that can be exploited by speech coding algorithms. One fact that is often used is that periods of silence take up greater than 50% of conversations. An easy way to save bandwidth and reduce the amount of information needed to represent the speech signal is simply not to transmit the silence. Another fact about speech production that can be taken advantage of is that there is a high correlation between adjacent samples of speech. Most forms of speech compression are achieved by modeling the process of speech production as a linear digital filter. The digital filter and its slowly changing parameters are usually encoded to achieve compression of the speech signal. Linear Predictive Coding (LPC) is one of the methods of compression that models the process of speech production. Specifically, LPC models this process as a linear sum of earlier samples using a digital filter driven by an excitation signal. An alternate explanation is that linear prediction filters attempt to predict future values of the input signal based on past samples. LPC "...models speech as an autoregressive process, and sends the parameters of the process as opposed to sending the speech itself". It was first proposed as a method for encoding human speech by the United States Department of Defense in Federal Standard 1015, published in 1984. Another name for Federal Standard 1015 is LPC-10, which is the method of linear predictive coding used there. Speech coding or compression is usually conducted with the use of voice coders, or vocoders. There are two types of voice coders: waveform-following coders and model-based coders. Waveform-following coders will exactly reproduce the original speech signal if no quantization errors occur. Model-based coders will never exactly reproduce the original speech signal, regardless of the presence of quantization errors, because they use a parametric model of speech production which involves encoding and transmitting the parameters of the model rather than the signal itself.

LPC vocoders are model-based coders, which means that LPC coding is lossy even if no quantization errors occur. All vocoders, including LPC vocoders, have four main attributes: bit rate, delay, complexity, and quality. Any voice coder, regardless of the algorithm it uses, has to make trade-offs between these attributes. The first attribute of vocoders, the bit rate, determines the degree of compression that a vocoder achieves. Uncompressed speech is usually transmitted at 64 kb/s using 8 bits/sample and a sampling rate of 8 kHz. Any bit rate below 64 kb/s is considered compression. The linear predictive coder transmits speech at a bit rate of 2.4 kb/s, an excellent rate of compression. Delay is another important attribute for vocoders that are involved with the transmission of an encoded speech signal. Vocoders that are involved with the storage of compressed speech, as opposed to transmission, are not as concerned with delay. The general delay standard for transmitted speech conversations is that any delay greater than 300 ms is considered unacceptable. The third attribute of voice coders is the complexity of the algorithm used. The complexity affects both the cost and the power of the vocoder. Linear predictive coding, because of its high compression rate, is very complex and involves executing millions of instructions per second; LPC often requires more than one processor to run in real time. The final attribute of vocoders is quality. Quality is a subjective attribute that depends on how the speech sounds to a given listener. One of the most common tests for speech quality is the absolute category rating (ACR) test, in which subjects are given pairs of sentences and asked to rate them as excellent, good, fair, poor, or bad. Linear predictive coders sacrifice quality in order to achieve a low bit rate and as a result often sound synthetic. An alternate method of speech compression, adaptive differential pulse code modulation (ADPCM), only reduces the bit rate by a factor of 2 to 4 (between 16 kb/s and 32 kb/s), but has a much higher quality of speech than LPC.

The general algorithm for linear predictive coding involves an analysis or encoding part and a synthesis or decoding part. In the encoding, LPC takes the speech signal in blocks or frames and determines the input signal and the coefficients of the filter that will be capable of reproducing the current block of speech. This information is quantized and transmitted. In the decoding, LPC rebuilds the filter based on the coefficients received. The filter can be thought of as a tube which, when given an input signal, attempts to output speech. Additional information about the original speech signal is used by the decoder to determine the input or excitation signal that is sent to the filter for synthesis.
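As an illustration of this encode/decode view (not part of the project code), the following minimal MATLAB sketch analyzes one frame with the built-in lpc function and resynthesizes it from its own prediction residual; the file name and frame parameters are placeholders.

% Minimal LPC analysis/synthesis sketch (illustrative only; assumes a mono
% wav file named 'speech.wav' and the Signal Processing Toolbox).
[x,fs] = audioread('speech.wav');   % read speech signal
m = 3000; N = 300; p = 12;          % frame start, frame length, LPC order
s = x(m:m+N-1).*hamming(N);         % windowed analysis frame

% analysis (encoder side): prediction coefficients and residual
a = lpc(s,p);                       % a = [1 -alpha(1) ... -alpha(p)]
e = filter(a,1,s);                  % prediction error (excitation estimate)

% synthesis (decoder side): drive the all-pole filter with the residual
shat = filter(1,a,e);               % reconstructs s up to numerical error
fprintf('max reconstruction error: %g\n',max(abs(s-shat)));

In a real coder the residual itself is not transmitted; it is replaced by a quantized excitation model (pulse train or noise), which is where the compression comes from.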

LPC Model
The particular source-filter model used in LPC is known as the linear predictive coding model. It has two key components: analysis or encoding, and synthesis or decoding. The analysis part of LPC involves examining the speech signal and breaking it down into segments or blocks. Each segment is then examined further to answer several key questions:
• Is the segment voiced or unvoiced?
• What is the pitch of the segment?
• What parameters are needed to build a filter that models the vocal tract for the current segment?
LPC analysis is usually conducted by a sender who answers these questions and transmits these answers to a receiver. The receiver performs LPC synthesis by using the answers received to build a filter that, when provided with the correct input source, will be able to accurately reproduce the original speech signal. Essentially, LPC synthesis tries to imitate human speech production. The accompanying figure demonstrates which parts of the receiver correspond to which parts of the human anatomy; the diagram is for a general voice or speech coder and is not specific to linear predictive coding. All voice coders tend to model two things: excitation and articulation. Excitation is the type of sound that is passed into the filter or vocal tract, and articulation is the transformation of the excitation signal into speech.
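As a rough sketch of how these three questions might be answered for one frame (a simplified illustration, not the decision logic of any particular standard; the function name vuv_pitch and the thresholds are hypothetical), frame energy and zero-crossing rate can drive the voiced/unvoiced decision, and the autocorrelation peak can give a pitch estimate:

% Simplified voiced/unvoiced decision and pitch estimate for one frame s
% (column vector) sampled at fs Hz; thresholds are illustrative only.
function [voiced,pitch] = vuv_pitch(s,fs)
    s  = s - mean(s);                            % remove DC offset
    zc = sum(abs(diff(sign(s))))/(2*length(s));  % zero-crossing rate
    en = sum(s.^2)/length(s);                    % average frame energy
    voiced = (zc < 0.1) && (en > 1e-4);          % placeholder thresholds
    pitch = 0;                                   % pitch undefined if unvoiced
    if voiced
        r = xcorr(s,'coeff');                    % normalized autocorrelation
        r = r(length(s):end);                    % keep lags 0,1,2,...
        lo = round(fs/400); hi = round(fs/60);   % search 60-400 Hz
        [~,lag] = max(r(lo+1:hi+1));
        pitch = fs/(lag+lo-1);                   % pitch estimate in Hz
    end
end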

LPC Methods
• LPC methods are the most widely used in speech coding, speech synthesis, speech recognition, speaker recognition and verification, and speech storage.
• LPC methods provide extremely accurate estimates of speech parameters, and do so extremely efficiently.
• The basic idea of linear prediction: the current speech sample can be closely approximated as a linear combination of past samples, i.e.,
  s(n) ≈ α1 s(n−1) + α2 s(n−2) + ... + αp s(n−p).

• LP methods have been used in control and information theory, where they are called methods of system estimation and system identification.
• They have been used extensively in speech under a group of names including:
  1. covariance method
  2. autocorrelation method
  3. lattice method
  4. inverse filter formulation
  5. spectral estimation formulation
  6. maximum likelihood method
  7. inner product method

LPC Estimation Issues


• Need to determine {αk} directly from the speech signal such that they give good estimates of the time-varying spectrum.
• Need to estimate {αk} from short segments of speech.
• Need to minimize the mean-squared prediction error over short segments of speech.
• The resulting {αk} are assumed to be the actual {ak} in the speech production model.
=> We intend to show that all of this can be done efficiently, reliably, and accurately for speech.
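For reference, the quantities these bullets refer to are the standard short-time linear prediction relations (written here in LaTeX; n indexes the analysis frame):

\[
E_n=\sum_m e_n^2(m)=\sum_m\Big(s_n(m)-\sum_{k=1}^{p}\alpha_k\,s_n(m-k)\Big)^2 .
\]

Setting \(\partial E_n/\partial\alpha_i=0\) for \(i=1,\ldots,p\) gives the normal equations

\[
\sum_{k=1}^{p}\alpha_k\,\phi_n(i,k)=\phi_n(i,0),\qquad
\phi_n(i,k)=\sum_m s_n(m-i)\,s_n(m-k),
\]

and the autocorrelation, covariance, and lattice methods differ only in how the limits on m are chosen and how this set of equations is solved.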

Autocorrelation Method
Assume s_n(m) = s(n+m) w(m) exists for 0 ≤ m ≤ L−1 and is exactly zero everywhere else (i.e., a window of length L samples), where w(m) is a finite-length window of L samples.

If s_n(m) is nonzero only for 0 ≤ m ≤ L−1, then the prediction error e_n(m) is nonzero only over the interval 0 ≤ m ≤ L−1+p. At values of m near 0 (i.e., m = 0, 1, ..., p−1) we are predicting the signal from zero-valued samples (outside the window range) => e_n(m) will be (relatively) large near m = 0. At values of m near L (i.e., m = L, L+1, ..., L+p−1) we are predicting zero-valued samples (outside the window range) from nonzero samples => e_n(m) will be (relatively) large there as well. For these reasons, we normally use windows that taper the segment to zero (e.g., the Hamming window).
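A minimal MATLAB sketch of the autocorrelation method as described here (Hamming-windowed frame, autocorrelation lags R(0..p), then the Levinson-Durbin recursion via the built-in levinson as a cross-check on the durbin function given earlier; x, p, and fs are assumed to be set as in the main program):

% autocorrelation-method LPC for one N-sample frame x (column vector)
N  = length(x);
xw = x.*hamming(N);                 % taper the frame to zero at its edges
R  = zeros(p+1,1);
for kk = 0:p                        % autocorrelation lags 0..p
    R(kk+1) = sum(xw(1:N-kk).*xw(kk+1:N));
end
[a,Ep] = levinson(R,p);             % a = [1 -alpha(1) ... -alpha(p)]
G = sqrt(Ep);                       % model gain from the residual energy
[H,f] = freqz(G,a,512,fs);          % LPC spectrum, comparable to the FFT spectrum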

Covariance Method
The covariance method is a second basic approach to defining the speech segment s_n(m) and the limits on the sums: fix the interval over which the mean-squared error is computed, giving

E_n = sum from m = 0 to L−1 of e_n^2(m),   with   phi_n(i,k) = sum from m = 0 to L−1 of s_n(m−i) s_n(m−k).

Changing the summation index gives

phi_n(i,k) = sum from m = −i to L−1−i of s_n(m) s_n(m+i−k),   1 ≤ i ≤ p, 0 ≤ k ≤ p,

which shows that samples of s_n(m) for −p ≤ m ≤ L−1 are required.

The key difference from the autocorrelation method is that the limits of summation include terms before m = 0 => the window effectively extends p samples backwards from m = 0. Since we are extending the window backwards, we don't need to taper it with a Hamming window, since there is no transition at the window edges.
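A minimal sketch of the covariance computation described above (x holds the p history samples followed by the N-sample frame, as the assignment requires; the equations are solved directly with backslash rather than by an actual Cholesky factorization, mirroring the report's cholesky function):

% covariance-method LPC: x is length p+N, no tapering window needed
phi = zeros(p,p);                        % correlation matrix phi(i,k)
psi = zeros(p,1);                        % right-hand side phi(i,0)
for i = 1:p
    for kk = 1:p
        phi(i,kk) = sum(x(p+1-i:p+N-i).*x(p+1-kk:p+N-kk));
    end
    psi(i) = sum(x(p+1-i:p+N-i).*x(p+1:p+N));
end
alphac = phi\psi;                        % optimum predictor coefficients
Ec = sum(x(p+1:p+N).^2) - alphac'*psi;   % minimum prediction error
Gc = sqrt(Ec);                           % covariance-method gain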

Autocorrelation and Covariance Summary


• Use a p-th order linear predictor to predict s(n) from the p previous samples.
• Minimize the mean-squared prediction error over an analysis window of duration L samples.
• The solution for the optimum predictor coefficients is based on solving a matrix equation => two solutions have evolved:

1. Autocorrelation Method => the signal is windowed by a tapering window in order to minimize discontinuities at the beginning (predicting speech from zero-valued samples) and end (predicting zero-valued samples from speech samples) of the interval; the matrix of correlations is shown to be an autocorrelation function, and the resulting autocorrelation matrix can be readily solved using standard matrix solutions.
2. Covariance Method => the signal is extended by p samples outside the normal range of 0 ≤ m ≤ L−1 to include the p samples occurring prior to m = 0 (they are available), which eliminates the need for a tapering window; the resulting matrix of correlations is symmetric, requiring a different method of solution and yielding a somewhat different set of optimal prediction coefficients.

Lattice Method
Both the covariance and autocorrelation methods use two-step solutions:
1. computation of a matrix of correlation values;
2. efficient solution of a set of linear equations.
Another class of LP methods, called lattice methods, has evolved in which the two steps are combined into a recursive algorithm for determining the LP parameters. It begins with the Durbin algorithm: at the i-th stage, the set of coefficients obtained are the coefficients of the i-th order optimum linear predictor.
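A compact MATLAB sketch of that recursion (the same forward/backward error update used in the lattice function given earlier, with the normalized cross-correlation form of the reflection coefficients; x is the p+N-sample segment as in the covariance method):

% lattice-method LPC: reflection coefficients k(i) built stage by stage
% from forward errors e and (delayed) backward errors b
e = x(:); b = x(:);                     % stage-0 forward/backward errors
alpha = zeros(p,p); k = zeros(p,1);
for i = 1:p
    num  = sum(e(p+1:p+N).*b(p:p+N-1));
    den  = sqrt(sum(e(p+1:p+N).^2)*sum(b(p:p+N-1).^2));
    k(i) = num/den;                     % reflection (PARCOR) coefficient
    alpha(i,i) = k(i);
    for j = 1:i-1                       % Durbin-style coefficient update
        alpha(j,i) = alpha(j,i-1) - k(i)*alpha(i-j,i-1);
    end
    bd   = [0; b(1:end-1)];             % backward error delayed one sample
    enew = e - k(i)*bd;                 % stage-i forward error
    b    = bd - k(i)*e;                 % stage-i backward error
    e    = enew;
end
% gain, as in the report: G = sqrt(sum(x(p+1:p+N).^2)*prod(1-k.^2))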

"inal Problem Solution


The following MATLAB program, LPC Matlab Files\test_lpc.m, reads in a file of speech, computes the original spectrum (of the signal weighted by a Hamming window), and plots on top of it the LPC spectrum from the autocorrelation method, the covariance method, and the lattice method. There is a main program and 3 functions:
1. durbin for the auto-correlation method, LPC Matlab Files\durbin.m
2. cholesky for the covariance method (although simple matrix inversion was used, rather than the Cholesky decomposition), LPC Matlab Files\cholesky_full.m
3. lattice for the traditional lattice method, LPC Matlab Files\lattice.m
LPC Matlab Files\autolpc.m is used to generate lpc parameters.

LPC Comparisons

LPC Computations

Conclusion
Linear Predictive Coding is an analysis/synthesis approach to lossy speech compression that attempts to model the human production of sound instead of transmitting an estimate of the sound wave itself. Linear predictive coding breaks a sound signal into segments and then sends information on each segment to the decoder. The encoder sends information on whether the segment is voiced or unvoiced and, for voiced segments, the pitch period, which is used to create an excitation signal in the decoder. The encoder also sends information about the vocal tract, which is used to build a filter on the decoder side that, when given the excitation signal as input, can reproduce the original speech.
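As a closing illustration of the decoder side described here (a toy sketch, not the LPC-10 decoder; the received quantities a, G, voiced, and the pitch period T in samples are assumed to be available), the excitation is an impulse train for voiced frames or white noise for unvoiced frames, passed through the all-pole filter built from the received coefficients:

% toy LPC frame decoder: a = [1 -alpha(1) ... -alpha(p)], G = gain,
% voiced = voiced/unvoiced flag, T = pitch period in samples, N = frame length
if voiced
    exc = zeros(N,1);                % impulse train at the pitch period
    exc(1:T:N) = 1;
else
    exc = randn(N,1);                % white noise for unvoiced frames
end
exc  = exc/sqrt(sum(exc.^2)/N);      % normalize excitation to unit power
shat = filter(G,a,exc);              % synthesized speech for this frame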

Bibliography & References

Lawrence Rabiner, Rutgers University, Dept. of Electrical and Computer Engineering. Course website: www.caip.rutgers.edu/~lrr (being changed to cronos.rutgers.edu/~lrr).
L. R. Rabiner and R. W. Schafer, Theory and Applications of Digital Speech Processing, Prentice-Hall Inc., 2010.
Jeremy Bradbury, Linear Predictive Coding, December 5, 2000.
