VOICE PREDICTION AND SPEECH
COMPRESSION USING ADSP2105

ABSTRACT

The objective of this paper is to develop an automatic voice prediction, compression
and reproduction system. Prediction is the process of identifying the voiced portions of a
signal and discarding the unvoiced portions. The predicted signal is stored in compressed
form and then reproduced whenever it is required.
The problem involved in preserving voice captured in a noisy environment is also
discussed here.
The paper concentrates on lossless and lossy compression of speech signals. LPC
has been shown to be very robust against noise and to perform well even for irregular
aperture times.
The ADSP2105 processor is used to implement the prediction, compression and
retrieval of speech. The performance of the system was tested with algorithms such as
CELP, ADPCM, LDCELP, CSACELP and LPC10.

INTRODUCTION

In today's scientific world, electronics has made it possible to control huge
machinery by means of a single tiny microchip. Likewise, in mobile technology, and in
cellular technology in particular, speech compression plays a vital role. The current
compression techniques are lossy, yet the bit rate can be reduced to 8 kbps with almost no
perceptible loss in quality, and at low cost. Our paper is based on the principle of Linear
Predictive Coding (LPC).

SPEECH ANALYSIS

Speech analysis is the process of measuring the parameters of speech, as well as the
study of speech production and phonemes.
The three parameters of speech are:
♦ Pitch -------- corresponds to the frequency of the sound.
♦ Loudness --- corresponds to the intensity of the sound.
♦ Quality ------ corresponds to the harmonic content of the sound.

‘Phonemes’ are the smallest basic units of sound that can be recognized in contrast
with their environment. The sound that stretches from the centre of one phoneme to the
centre of the next, thereby spanning the transition region, is called a ‘diphone’.
Some common methods of speech analysis are:
♦ Short-time Fourier analysis.
♦ Linear predictive coding.
♦ Homomorphic filtering.

THEORY ON SPEECH COMPRESSION

Speech compression is a form of data compression designed to reduce the size of
speech data files. Many lossy and lossless algorithms exist to achieve this compression.

Shannon's lossless source coding theorem is based on the concept of block coding.
To illustrate this concept, let us consider a special information source whose alphabet
consists of only two letters, A = {a, b}, where the letters 'a' and 'b' are equally likely to
occur.

An n-th order block code is simply a mapping that assigns to each block of n
consecutive characters a bit sequence of variable length. The following example
illustrates this concept.

Third-order block code: triplets of characters are mapped to bit sequences of lengths one
through six.

Char    P(Char)    Code word
aaa     0.405      0
bbb     0.405      10
aab     0.045      1100
abb     0.045      1101
bba     0.045      1110
baa     0.045      11110
aba     0.005      111110
bab     0.005      111111

Here R = 0.68 bits/character,

where R = (1/n) ∑ p(Bn) l(Bn) bits/character,

and l(Bn) is the length of the codeword assigned to the block Bn.

As an example, encoding a sample sequence with this code uses 17 bits to represent 24
characters, an average of about 0.71 bits/character.
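For illustration only (this sketch is not part of the original paper), the average rate R of
the third-order block code above can be recomputed in a few lines of Python:

    # Block probabilities and codewords taken from the table above.
    block_code = {
        "aaa": (0.405, "0"),
        "bbb": (0.405, "10"),
        "aab": (0.045, "1100"),
        "abb": (0.045, "1101"),
        "bba": (0.045, "1110"),
        "baa": (0.045, "11110"),
        "aba": (0.005, "111110"),
        "bab": (0.005, "111111"),
    }

    n = 3  # block length (third-order code)
    # R = (1/n) * sum of p(Bn) * l(Bn) over all blocks, in bits/character
    R = sum(p * len(code) for p, code in block_code.values()) / n
    print(f"R = {R:.2f} bits/character")  # prints R = 0.68 bits/character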

LINEAR PREDICTIVE CODING (LPC):

Linear Predictive Coding (LPC) is one of the most powerful speech analysis
techniques, and one of the most useful methods for encoding good quality speech at a low
bit rate. It provides extremely accurate estimates of speech parameters, and is relatively
efficient for computation.
LPC is frequently used for transmitting spectral envelope information, and as such
it has to be tolerant of transmission errors. Historically, digital speech signals are sampled
at a rate of 8000 samples/sec, and each sample is typically represented by 8 bits (using
mu-law). This corresponds to an uncompressed rate of 64 kbps.
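As a point of reference, the sketch below shows mu-law companding of a sample to 8 bits
(mu = 255, the usual value for 8-bit telephony); it only illustrates the 64 kbps baseline and
is not part of the paper's coder:

    import numpy as np

    MU = 255.0

    def mulaw_compress(x):
        # x in [-1, 1] -> companded value in [-1, 1]
        return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

    def mulaw_to_8bit(x):
        # quantize the companded value to an 8-bit code (256 levels)
        return np.round((mulaw_compress(x) + 1.0) / 2.0 * 255).astype(np.uint8)

    # 8000 samples/sec * 8 bits/sample = 64000 bits/sec uncompressed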

MATHEMATICAL MODELLING:

The digital speech signal is the output of a digital filter (called the LPC filter)
whose input is either a train of impulses or a white noise sequence.
The LPC filter is given by: H(z) = 1 / (1 + a1*z^-1 + … + a10*z^-10)

The input-output relationship of the filter is given by the linear difference equation:

s(n) + ∑ ak * s(n-k) = u(n),   where k runs from 1 to 10.

The LPC model can be represented in vector form as: A = {a1, a2, …, a10, G, V/UV, T},
where G is the gain, V/UV the voiced/unvoiced decision and T the pitch period.

A changes every 20 msec or so. At a sampling rate of 8000 samples/sec, 20 msec is
equivalent to 160 samples, so the digital speech signal is divided into frames of 20 msec,
giving 50 frames/second. The model says that the vector A is equivalent to
S = {s(0), s(1), …, s(159)}. Thus the 160 values of S are compactly represented by the 13
values of A.

LPC Synthesis: Given A, generate S.

LPC Analysis: Given S, find the best A.
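A minimal synthesis sketch (an illustration under the model above, not the paper's
ADSP2105 implementation; the function and parameter names are hypothetical): the
all-pole filter 1/A(z) is excited by an impulse train for voiced frames or by white noise for
unvoiced frames.

    import numpy as np
    from scipy.signal import lfilter

    def lpc_synthesize_frame(a, gain, voiced, pitch_period, frame_len=160, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        if voiced:
            # impulse train with period T (the pitch period, in samples)
            u = np.zeros(frame_len)
            u[::pitch_period] = 1.0
        else:
            # white-noise excitation for unvoiced frames
            u = rng.standard_normal(frame_len)
        # H(z) = G / (1 + a1*z^-1 + ... + a10*z^-10)
        return lfilter([gain], np.concatenate(([1.0], a)), u)

    # Example: a mildly resonant 10th-order filter, voiced frame, pitch of 80 samples
    frame = lpc_synthesize_frame(a=np.array([-0.9] + [0.0] * 9), gain=1.0,
                                 voiced=True, pitch_period=80)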

LPC ANALYSIS

Consider one frame of the speech signal, S = {s(0), s(1), …, s(159)}. The signal s(n) is
related to the innovation u(n) through the linear difference equation

s(n) + ∑ ak s(n-k) = u(n),   where k varies from 1 to 10. The ten LPC parameters
(a1, a2, …, a10) are chosen to minimize the energy of the innovation: f = ∑ u^2(n), where
n varies from 0 to 159. Using standard calculus, we take the derivative of f with respect to
each ak and set it to zero: df/dak = 0. From these ten equations the ten coefficients can be
determined.
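As an illustration only, the sketch below solves the resulting normal equations by the
autocorrelation method (one standard way of satisfying df/dak = 0; the paper does not say
which solver its ADSP2105 code uses, and the function name is hypothetical):

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lpc_analyze_frame(s, order=10):
        # autocorrelation values r(0) .. r(order) of the frame
        r = np.array([np.dot(s[:len(s) - k], s[k:]) for k in range(order + 1)])
        # normal equations R a = -r[1:], where R is the symmetric Toeplitz
        # autocorrelation matrix built from r(0) .. r(order-1)
        a = solve_toeplitz((r[:order], r[:order]), -r[1:])
        return a  # a1 .. a10 of the polynomial 1 + a1*z^-1 + ... + a10*z^-10

    # Example on one synthetic 160-sample frame
    rng = np.random.default_rng(0)
    s = np.sin(2 * np.pi * 0.05 * np.arange(160)) + 0.01 * rng.standard_normal(160)
    a = lpc_analyze_frame(s)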

The bit allocation can be shown as follows. The 34 bits for the LSP parameters are
allocated as:

LPC coefficient     No. of bits
a1                  3
a2 to a5            4 each
a6 to a10           3 each

VOICED OR UNVOICED

Voiced sounds are produced by vibrations of the vocal cords. Their waveform is nearly
periodic with some fundamental frequency (which corresponds to the pitch). Examples of
voiced sounds include all of the vowels. Unvoiced sounds do not have a fundamental
frequency or a harmonic structure; instead, they resemble white noise.
For voiced sounds (V): the excitation is an impulse train (the model is insensitive to a
phase shift).
For unvoiced sounds (UV): a white-noise sequence is used as the excitation.
S(z) = E(z) / A(z),
where 1/A(z) is the transfer function of the LPC filter and E(z) is the prediction error
(residual).

S(z) is the original speech signal.
The spectrum of the error signal E(z) has a different structure depending on whether
the sound it comes from is voiced or unvoiced. Thus we can predict whether a frame is
voiced or unvoiced.
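The paper infers the voicing decision from the structure of the prediction error E(z); as a
simpler stand-in, the sketch below uses the common energy / zero-crossing-rate heuristic
(the thresholds are illustrative assumptions, not values from the paper):

    import numpy as np

    def is_voiced(frame, energy_thresh=0.01, zcr_thresh=0.25):
        energy = np.mean(frame ** 2)
        # zero crossings per sample
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
        # Voiced frames tend to have high energy and few zero crossings;
        # unvoiced (noise-like) frames are the opposite.
        return energy > energy_thresh and zcr < zcr_thresh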

Code Excited Linear Prediction (CELP)

The block diagram of the CELP encoder is shown below:

CODEC
A CODEC is the combination of a coder and a decoder. It can convert an analog signal
into a digital signal and vice versa.

There are three types of codec. They are:
♦ Source codec.
♦ Waveform codec.
♦ Hybrid codec.
Here a hybrid codec is used because it combines the advantages of both the source and
waveform codecs.

Encoder:

Decoder:

ADSP2105 – OVER VIEW

The ADSP2105 is a 16-bit fixed-point DSP processor with on-chip memory and a
Harvard architecture giving three-bus performance (an instruction bus and dual data
buses). It has independent computation units, namely an ALU, a multiplier/accumulator
and a shifter. It offers single-cycle instruction execution with a multifunction instruction
set and delivers 13.824 MIPS. It also provides on-chip program memory RAM or ROM,
data memory RAM, and integrated peripherals (serial ports and timers).

IMPLEMENTATION

The various modules of the total task are listed below; a sketch of the pre-emphasis and
de-emphasis filters (modules 1 and 7) follows the list.

1. Pre-emphasis of the speech signal.
2. Gain calculation.
3. Autocorrelation of speech signal frames.
4. Pitch prediction.
5. LPC analysis.
6. Storing of signals.
7. LPC synthesis and de-emphasis of speech signals.
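A minimal sketch of the pre-emphasis and de-emphasis stages (modules 1 and 7). The
coefficient 0.95 is a typical value and an assumption here; the paper does not state the
value used:

    from scipy.signal import lfilter

    PREEMPH = 0.95  # assumed pre-emphasis coefficient

    def pre_emphasize(s):
        # y(n) = s(n) - 0.95*s(n-1): boosts high frequencies before LPC analysis
        return lfilter([1.0, -PREEMPH], [1.0], s)

    def de_emphasize(y):
        # inverse filter, applied after LPC synthesis to restore the spectral tilt
        return lfilter([1.0], [1.0, -PREEMPH], y)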

IMPLEMENTATION BLOCK DIAGRAM

TEST RESULTS

The hardware designed for speech compression was tested with five different
algorithms: ADPCM, LDCELP, CSACELP, CELP and LPC10.

COMPRESSION TEST

Total number of bits required by each algorithm for one second of speech:

ORIGINAL   ADPCM   LDCELP   CSACELP   CELP   LPC10
64000      32000   16000    8000      4800   2400
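For convenience, the compression ratios implied by the table (relative to the 64 kbps
uncompressed reference) can be computed directly:

    # Bit rates from the table above, in bits per second.
    rates = {"ORIGINAL": 64000, "ADPCM": 32000, "LDCELP": 16000,
             "CSACELP": 8000, "CELP": 4800, "LPC10": 2400}
    for name, bps in rates.items():
        print(f"{name:8s} {bps:6d} bps   ratio {rates['ORIGINAL'] / bps:4.1f} : 1")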

COMPRESSION STATUS (bar chart): bits required (0 to 70000) for each algorithm
(Original, ADPCM, LDCELP, CSACELP, CELP, LPC10).

QUALITY TEST

The performance of a coder is analysed through the quality of its reproduced signal. The
DRT (Diagnostic Rhyme Test) is a means of determining the intelligibility of a system as
the percentage of correct word recognitions from a standardized list of pairs of rhyming
words, e.g. 'goat' vs 'coat' or 'thick' vs 'tick'. The DAM (Diagnostic Acceptability
Measure) rates intelligibility and subjective acceptability using procedures that eliminate
much of the dependence on the personal preferences of the listeners.

TEST METHOD       NATURAL SPEECH   LPC10
DRT               95               90
DRT with noise    92               82
DAM               65               48

QUALITY TEST (bar chart): quality in percent (0 to 100) for natural speech and LPC10
under each testing method (DRT, DRT with noise, DAM).

DELAY TEST

The encoding and decoding delay introduced by a voice digitization/compression
algorithm is another effect that must be considered in the context of the particular
application.

ADPCM (ms)   LPC10 (ms)   CSACELP (ms)   CELP (ms)
0.125        2.5          10             30

DELAY TEST (bar chart): delay in ms (0 to 35) for each method (ADPCM, LPC10,
CSACELP, CELP).

APPLICATIONS

1. Digital cellular technology.
2. Internet bandwidth sharing.
3. Answering machines.
4. Pagers.

CONCLUSION

LPC and its derivative methods are superior to the alternatives because they can be
implemented easily, achieve a better compression ratio and give better signal quality.
LPC is one of the most powerful speech analysis techniques, one of the most useful
methods for encoding good-quality speech at a low bit rate, and it provides extremely
accurate estimates of the speech parameters. In addition, parallel processors can be
employed to process audio and video signals separately for real-time, low-cost
applications.


