
Master of Science Thesis Data Communication Systems

Vector Quantization and Speech encoding methods Part 2


(Continued from the previous part)

Brunel University Electronic and Computer Engineering London, UK

Ippokratis Karakotsoglou System Engineer Telecommunications

3.1 Vector Quantization Analysis


3.1.1 Principal operation
Although Vector Quantization is a relatively new area, it has developed very fast and is attracting more and more research. By using Vector Quantization we can achieve lower distortion than with scalar quantization at the same rate. Vector Quantization can essentially be viewed as a pattern-matching technique. In Vector Quantization (VQ) we group the source output into vectors. We can think, for example, of L consecutive samples of speech as constituting an L-dimensional vector. This is the input to the vector quantizer. The encoder and the decoder share a set of L-dimensional vectors that is called the codebook of the vector quantizer. Each input vector is encoded by comparison with the contents of the codebook, which are also known as codevectors or patterns. The encoder takes an input vector and outputs the index of the codeword that gives the lowest distortion. The lowest distortion is found by evaluating the Euclidean distance between the input vector and each codeword in the codebook. Once the codeword with the lowest distortion is found, the index of that codeword is sent through the channel. When the decoder receives the index of the codeword, it replaces the index with the associated codeword. As I have mentioned before, the mean squared error is used to measure the distortion. The representative codeword is the one closest in Euclidean distance to the input vector. The Euclidean distance is defined by:

d(x, yi) = sqrt( Σj (xj - yij)² )

(Khalid Sayood, Introduction to Data Compression [2]), where xj is the jth component of the input vector and yij is the jth component of the codeword yi.
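As an illustration of the encoding and decoding steps just described, here is a minimal Python sketch (the function names, the tiny codebook and the sample input are my own illustrative choices, not taken from the thesis): the encoder returns the index of the codeword at the smallest squared Euclidean distance from the input, and the decoder simply looks that index up in the same codebook.

```python
import numpy as np

def vq_encode(x, codebook):
    """Return the index of the nearest codeword to the input vector x."""
    # codebook has shape (K, L): K codewords, each of dimension L
    distances = np.sum((codebook - x) ** 2, axis=1)  # squared Euclidean distances
    return int(np.argmin(distances))

def vq_decode(index, codebook):
    """Replace the received index with the associated codeword."""
    return codebook[index]

# Example: L = 2 dimensions, K = 4 codewords (illustrative values)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]])
x = np.array([0.9, 0.8])
i = vq_encode(x, codebook)       # index sent over the channel
x_hat = vq_decode(i, codebook)   # reconstruction at the decoder
print(i, x_hat)                  # -> 1 [1. 1.]
```

Only the index i travels over the channel; both ends must hold the same codebook.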

If we use a codebook of size K, in order to inform the decoder which code-vector was selected we need to use log2 K bits. The number of bits per sample will then be (log2 K)/L. This is also referred to as the resolution or rate of the vector quantizer. For a fixed dimension L the resolution is determined by the size K of the codebook. Let us consider, for example, the quantization of two-dimensional vectors X = [x1, x2]. The two-dimensional space is divided into cells as you can see in the figure below. The cells Ck are hexagons. Each vector that falls in a cell is quantized into a vector Y that is the center of the hexagon (the centroid). The image at the end of this document shows 21 codevectors, one for each of the 21 hexagons into which the two-dimensional space has been partitioned. The distortion that this method introduces is measured by the mean square error:

d²(X, Y) = (1/n)(X - Y)T(X - Y) = (1/n) Σk=1..n (xk - yk)²

Gray [6] gives a definition of Vector Quantization as follows: a Vector Quantizer of dimension k and size N is a mapping of a vector (or a point) in k-dimensional Euclidean space, Rk, into a finite set C containing N output or reconstruction points, called code vectors or codewords. Thus, Q: Rk -> C, where C = {y1, y2, ..., yN} and yi ∈ Rk for each i ∈ J = {1, 2, ..., N}. The set C is called the codebook.

3.1.2 Structure of a Vector Quantizer


The task of the encoder of a Vector Quantizer is to examine each input vector x and find in which cell of the k-dimensional space Rk it lies. The encoder uses a codebook to identify the index of the cell, which is sent over to the decoder. The decoder then generates the code vector yi that represents this region. Gray [6] models this operation with a selector function

Si(x) = 1 if x ∈ Ri, and 0 otherwise.

The quantizer can then be represented as

Q(x) = Σi=1..N yi Si(x)

This model is called a structural decomposition (Gersho-Gray [6]) and can be seen in the image at the end of this document. Each multiplier, indicated by a circle, simply multiplies its stored code vector by either one or zero to produce its output, so only one output is non-zero and the sum reduces to a single non-zero term. The circle multiplier can also be seen as a memory cell storing the code vector, whose value is retrieved when the input is one. The S boxes are memoryless; each examines the input vector to see whether it lies in a particular cell. The components of the vector are consecutive samples 'blocked' (to use Shannon's term) into a vector. The geometrical character of a cell in k-dimensional Euclidean space determines the complexity of the operation needed to obtain the output of an S box.

3.1.3 Codebook Design


The process of placing the quantizer output points in the codebook is called codebook design. During this process we group the source outputs into vectors and obtain a representative set of output vectors. For simplicity, imagine that we plot the source outputs in two dimensions, so that each source output is a two-dimensional vector.

The codebook design can be fixed or adaptive. In the fixed design an a priori codebook is used, which is then improved by an iterative process in order to achieve a level of optimality. The main goal of quantizer design is to provide the minimum average distortion by partitioning the space into cells appropriately, that is, to produce cells in such a way that the distortion is minimized. This is fundamental, because it lets us measure performance and judge the ability of all quantizers. More specifically, the problem is to find the best encoding algorithm for a given decoder, so that we can say the encoder is optimal for this decoder. Encoders usually satisfy the nearest-neighbor (NN) condition, which is explained further below in this document. Based on this we can then speak about the optimality of the decoder: the centroid condition has to be satisfied under the squared error distortion measure. The centroid is the optimal output of a given cell; it is the center of the cell. The optimality conditions are therefore two: (a) the nearest-neighbor condition, which yields the minimum average distortion for a given codebook, and (b) the centroid condition. Gray [6] provides a lemma: a quantizer is regular if it satisfies the necessary conditions for optimality with a squared error distortion measure. A basic idea throughout the years of VQ research on optimal design is to produce a new, improved codebook from the previous codebook through an iteration process. The quality of a given codebook is judged by the measure of the average distortion. Stuart Lloyd [9] used a similar approach to generate a pdf-optimized scalar quantizer. Below you can see the Lloyd Algorithm concept (Gray-Gersho [6]): (a) Given a codebook Cm, find the optimal partition into quantization cells, that is, use the nearest-neighbor criterion to form the nearest-neighbor cells

(b) Using the centroid criterion find Cm+1, the optimal reproduction alphabet for the cells just found. The key to understanding the algorithm is that after each iteration it checks whether the average distortion of the codebook has changed since the last iteration; if the distortion has essentially stopped changing the algorithm stops, otherwise it continues. The algorithm just described was the first of two algorithms described by Lloyd. The second does not use the Lloyd iteration; instead it makes several passes through the set of quantizer parameters, where the decision boundaries and output points are iterated one at a time from left to right on the real number line (Gray [6]). A closely related algorithm is the Linde-Buzo-Gray (LBG) algorithm, designed by Linde, Buzo and Gray. Its iterative process is known as the k-means algorithm: from a large set of vectors known as the training set, each element is assigned to the closest representative pattern taken from an initial set of k representative patterns. Each pattern is then updated by computing the centroid of the training vectors assigned to it. When this assignment process finishes we will have k groups of vectors clustered around each of the output points. Based on this approach, Linde, Buzo and Gray came up with a general algorithm whose inputs are not restricted to scalars.
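The following Python sketch illustrates the generalized Lloyd (k-means style) iteration just described: (a) partition the training set with the nearest-neighbor rule, (b) replace each codeword by the centroid of its cell, and stop when the average distortion essentially stops improving. The function name, the random initialization and the stopping threshold are my own illustrative assumptions.

```python
import numpy as np

def design_codebook(training_set, K, tol=1e-6, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initial codebook C_0: K vectors picked at random from the training set
    codebook = training_set[rng.choice(len(training_set), K, replace=False)].copy()
    prev_distortion = np.inf
    for _ in range(max_iter):
        # (a) Nearest-neighbor partition of the training vectors
        d = ((training_set[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        distortion = d[np.arange(len(training_set)), nearest].mean()
        # Stop when the average distortion has (almost) stopped changing
        if prev_distortion - distortion < tol:
            break
        prev_distortion = distortion
        # (b) Centroid condition: move each codeword to the mean of its cell
        for k in range(K):
            members = training_set[nearest == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook

# Usage: 10,000 two-dimensional training vectors, codebook of size K = 16
training = np.random.default_rng(1).normal(size=(10000, 2))
C = design_codebook(training, K=16)
```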

3.1.4 Various Calculations on Vector Quantization


From the codebook size we can derive the number of bits needed to inform the receiver which one of the codebook vectors is the quantizer output. Suppose, for example, that we have K vectors in the codebook. Then we need log2 K bits to represent the quantizer output. If the vector is of dimension L, then we use those log2 K bits to represent and send the quantizer output for L samples, so we have a rate of (log2 K)/L bits per sample.

In the table at the end of the document you can see the calculations for a range of codebook sizes. I am assuming you understand that the same codebook needs to be available at both the receiver and the transmitter in order to reconstruct the speech. Regarding the codebook we have the following options: (a) the codebook is sent over to the receiver; (b) the codebook is already available at the receiver; (c) the receiver has the same training set and reproduces the codebook; (d) a generic codebook, available at both the transmitter and the receiver, is used.

Sending the codebook over to the receiver generates a significant overhead, but it is an option. To compute this overhead we proceed as follows. Let's say each codeword in the codebook is an array of L elements. If we use B bits to represent each element of the array, then we need B × L × K bits to transmit a K-level quantizer codebook. Now, suppose we want to encode samples using R bits per sample. By using an L-dimensional quantizer, i.e. grouping L samples together into vectors, we will have R × L bits available to represent each vector and 2^(RL) different output vectors. The product R × L is called the rate-dimension product.
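A small worked example of this arithmetic (the numbers below are my own illustration, not figures from the thesis):

```python
import math

L = 10          # vector dimension (samples per vector)
K = 1024        # codebook size (number of codewords)
B = 16          # bits used to represent each element of a codeword

bits_per_vector = math.log2(K)       # 10 bits to index the codebook
rate = bits_per_vector / L           # 1 bit per sample
codebook_overhead = B * L * K        # 163,840 bits to transmit the codebook itself

R = 1                                # target rate in bits per sample
output_vectors = 2 ** (R * L)        # 2^(R*L) = 1024 distinct output vectors
print(rate, codebook_overhead, output_vectors)
```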

3.1.5 Computational Complexity is an issue


Implementation complexity in signal processing is a very important issue and is measured by the amount of computation needed per unit time. Computational complexity is highly dependent on the hardware architecture, and one needs to consider the particularities of the various hardware architectures. Even so, we can get an indication of the complexity of a system by considering the multiplications it needs to perform, because they are the most demanding operations, followed closely by additions. Memory is also an issue, because the storage requirements for a codebook are very demanding. The required space (and, correspondingly, time) complexity is given by kN = k·2^(rk), where r is the resolution in bits per vector component and k is the dimension of the vector. The formula shows that time and space grow exponentially with dimension. One component of a vector occupies one word of storage. By time needed we mean the time to perform the operations required for a given vector; the number of operations per vector is the search complexity, and the operations can be multiplications or divisions. Consider the following example. Let's say we have a speech signal that is sampled at 8 kHz. If we use VQ with a resolution of one bit per sample, then we have a bit rate of 8 kb/s. For k consecutive samples grouped into a vector we need a codebook of size 2^k. The required processor speed is then given by s = N·fs = 2^k·fs, and 1/s is the maximum time available for one operation. In the table at the end of the document you can see the dimension as a function of time for a speech signal sampled at 8000 samples per second.
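The growth described by these formulas can be tabulated with a few lines of Python (fs and r follow the example above; the set of dimensions is an arbitrary illustrative choice):

```python
fs = 8000       # sampling rate in Hz
r = 1           # resolution in bits per vector component

for k in (1, 2, 4, 8, 10, 12):
    N = 2 ** (r * k)               # codebook size
    storage_words = k * N          # k * 2^(r*k) words for the codebook
    ops_per_second = N * fs        # s = N * fs, codeword evaluations per second
    time_per_op_us = 1e6 / ops_per_second
    print(f"k={k:2d}  N={N:5d}  storage={storage_words:7d} words  "
          f"time per operation = {time_per_op_us:.3f} us")
```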

3.1.6 Nearest-Neighbor Quantization


Because our goal in communications is to approach the Shannon limits, the encoding process for such a codebook requires a full search of the codebook to find the closest match. This is because the codebook is unstructured, having been designed by the LBG clustering algorithm. As a result, for a fixed rate the computational complexity grows exponentially with k, the dimension of the Euclidean space. A way to improve this situation is to reduce the complexity while maintaining performance. This can be done with tree-structured search algorithms. A quantizer of this type has the feature that the partition is completely determined by the distortion measure and the codebook. An encoder for this quantizer is optimal if it minimizes the average distortion. We saw above that the squared error distortion is measured by the formula

d(x, yi) = ||x - yi||² = Σk (xk - yik)²   (Spanias, Speech coding: a tutorial review [4])

A Voronoi or nearest-neighbor (NN) vector quantizer is one whose partition cells are defined in such a way that each cell Ri consists of all points x which have less distortion when reproduced with a given vector yi than with any other vector: Ri = {x : d(x, yi) ≤ d(x, yj) for all j ∈ J} (Gray [6]).

3.2 Vector Quantization and speech


3.2.1 Application of Vector Quantization to speech
Now, coming to speech, the range of the input is quite large. As a result we would need a very large codebook to represent the various vectors from the source. One way to solve this is to use Gain-Shape Vector Quantization, where the source output is normalized and then the normalized vector and the normalization factor are quantized separately. The incoming vectors are blocks of consecutive signal samples. We have an N×1-dimensional vector si = [si(0), si(1), ..., si(N-1)]T with real-valued amplitude components si(k), 0 ≤ k ≤ N-1. The quantizer maps the incoming vector si to a channel symbol from the set {un, n = 1, 2, ..., L}. We assume that the channel is noiseless. The codebook has L codevectors.

(Spanias, Speech coding: a tutorial review [4])

Both the transmitter and the receiver have the same codebook in their memory. The transmitter sends the index of the selected codevector, and the receiver maps the index back through the codebook. In effect, the incoming vector si is compared to each codeword to find the closest match according to a fidelity criterion that measures the distortion. The most commonly used distortion measure is

d(si, yn) = ||si - yn||² = Σk (si(k) - yn(k))²   (Spanias, Speech coding: a tutorial review [4])

The cell whose template gives the smallest distortion is the closest match to the signal vector. The L entries of the codebook, actually L N×1 real-valued vectors, are designed by dividing the vector space into L non-overlapping cells Cn. Each cell is associated with a template vector. The quantizer assigns the channel symbol un to the vector si; if si belongs to Cn, the centroid of that cell will represent it. It is important to say here that applying simple, primary VQ to speech does not give superb speech quality; we need modified, improved implementations of VQ to reach fairly good quality. If we think about how VQ can be modified to give better speech quality, we see that applying a more sophisticated distortion criterion can give a better reconstruction of speech: weighted squared error distortion measures give better results than the simple squared error distortion.

3.2.2 Simple VPCM


The simplest form of Vector Quantization is a generalized model of the scalar PCM and is called Vector PCM or VPCM. In Vector PCM, a block of consecutive samples is treated as one entity, one vector. The vector is encoded with a binary word and an approximation to the original vector is generated using only this binary word. The codebook is fully searched for each incoming vector. The number of bits per sample

is given by

B = (log2 L) / N

and the signal-to-noise ratio is given by

SNR = 6B + KN = 6 (log2 L) / N + KN
where B is the coded transmission rate in bits/sample, L is the size of the codebook, and KN is a constant, expressed in dB, that depends on the dimension N; the SNR is also expressed in dB. For a codebook with N = 6, L = 64 and a coded transmission rate of B = 1 bit/sample, the SNR of VPCM increases at approximately 6/N dB for each doubling of the codebook size, that is, for each additional bit used to code the entire vector. VPCM has improved SNR because it exploits the correlation between the samples of a vector. Coming to speech coding, Gersho and Cuperman [10] found that K2 is larger than K1 by more than 3 dB, and K8 is larger than K1 by more than 8 dB. Vector Quantization therefore offers a significant coding gain as N and L increase. On the other hand, the computational complexity for a given rate grows exponentially with N: the number of codebook searches is 2^(BN) and the number of memory locations required is N·2^(BN). We can see from this that the real benefits of Vector Quantization are achieved at rates of 1 bit per sample or less. The exponential growth of the encoding complexity with the dimension for a given rate is the main reason for not using VPCM with high-dimensional vectors. An example of a VPCM codebook for speech waveform coding can be seen in the image at the end of the document. Each of the 64 patterns consists of six samples joined by straight-line segments, and the various patterns are superimposed. Each of these patterns represents one of the possible shapes for a waveform segment of the same length, and the patterns are chosen under the minimum mean-square-error rule.
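A short check of the rate formula with the figures quoted above (L = 64, N = 6); the value of KN below is a placeholder, since the dimension-dependent constant is not given numerically here:

```python
import math

L = 64                      # codebook size
N = 6                       # vector dimension (samples per vector)
B = math.log2(L) / N        # coded rate: 1 bit per sample
K_N = 0.0                   # dimension-dependent constant in dB (placeholder)
snr_db = 6 * B + K_N        # SNR in dB
print(B, snr_db)
# Doubling the codebook size adds one bit to the vector index, i.e. 1/N bit per
# sample, so the SNR grows by roughly 6/N = 1 dB per doubling in this example.
```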

3.2.3 Adaptive Vector Quantization


Speech as a waveform is not stationary, but we can assume that it is stationary within segments of 10-20 ms duration; speech waveforms change their statistics over time. Observing this, we would like to design codebooks for speech encoders/decoders that adapt to the changing statistics of the speech waveform. This means that the encoder would learn the particular character of the talker and adapt the codebook accordingly. This is a complex approach but, from my point of view, one of the best speech encoding methods. Problems that researchers face include the large amount of information that has to be transmitted to the decoder whenever a new codebook is retransmitted; the processing speed and the delays are also a significant overhead. A simple adaptive scheme uses a speech selector/classifier which computes some simple statistics and then selects one of K classes to characterize the frame. The codebook corresponding to that class is then used to code the input, and the selected classification is transmitted to the receiver. A simple example would be an adaptive vector quantizer with three classes (Gersho-Cuperman [10]). In the first class the waveform is highly correlated, has high energy levels and contains speech. The second class is a different speech classification, voiced speech, with average correlation and high energy. The third class is unvoiced sounds and as such has low correlation and low energy.
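A hedged sketch of such a frame classifier: compute a couple of simple statistics per frame (energy and a normalized first-lag correlation) and use them to pick one of three class codebooks. The thresholds and the particular statistics chosen are illustrative placeholders, not values from the thesis.

```python
import numpy as np

def classify_frame(frame, energy_thresh=0.01, corr_thresh=0.5):
    energy = np.mean(frame ** 2)
    # normalized autocorrelation at lag 1 as a crude correlation/voicing measure
    corr = np.dot(frame[:-1], frame[1:]) / (np.dot(frame, frame) + 1e-12)
    if energy >= energy_thresh and corr >= corr_thresh:
        return 0        # class 1: high energy, highly correlated speech
    if energy >= energy_thresh:
        return 1        # class 2: high energy, average correlation (voiced)
    return 2            # class 3: low energy, low correlation (unvoiced)

# The selected class index is transmitted, and both ends then use the
# corresponding class codebook to quantize the frame.
```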

3.2.4 Adaptation in codebook for speech coding


Using structured codebooks, which allow an efficient search, can reduce the complexity of high-dimensional VQ. Tree-structured quantizers have lower encoding complexity at the expense of some loss in performance and increased memory requirements.

Because speech is a non-stationary process, we may want to adapt the codebook design on the fly to its changing statistics. Vector Quantizers with adaptive codebooks are called Adaptive VQ (A-VQ). There are forward-adaptive and backward-adaptive quantizers: forward-adaptive quantizers update the codebook based on current data, while in backward-adaptive quantizers the codebook is updated based on past data that are available to the decoder.

3.3 Memoryless Vector Quantization


3.3.1 Tree-Structured Vector Quantization (TSVQ)
Tree-structured vector quantization (TSVQ) is well known for its fast search properties. The fast search is achieved by having a properly structured codebook, so that we can locate the desired output vector easily. The idea is based on first rejecting a whole group of output points, so that fewer comparisons have to be made. To see why this helps, note that NN (Voronoi) encoding requires N distortion evaluations for a codebook of size N, taken sequentially for each input vector, and each evaluation needs k multiplications and k-1 additions (Gersho-Gray [6]). This is not an efficient algorithm; a cleverer implementation is needed, and TSVQ gives us a better solution. Consider a two-dimensional quantizer in which the output points in each quadrant are the mirror image of the output points in the neighbouring quadrants. For an input to this quantizer we can reduce the number of comparisons necessary for finding the closest output point by using the signs of the components of the input. That is why we say that this quantizer has low search complexity (search complexity being the number of comparisons per output bit). Its performance can of course be compared with scalar quantization at the same rate.

Example of TSVQ. Suppose we have the space partitioned as described below for a TSVQ. The space is divided by hyperplanes (Gersho-Gray [6]) denoted by capital letters. A sign is assigned depending on which side of a hyperplane a point (input vector) lies, and this in turn decides which code vector we select. This is another way of building a decision tree, as in Huffman coding, although in Huffman coding the distortion measure plays no role in building the tree. If the input x is on the right side of A, then vectors 1 and 5 are immediately eliminated; if the input is on the left side of A, then code vectors 2 and 3 are eliminated. Next we test to see on which side of C the input is located, performing different tests depending on whether it lies on the (-) side or the (+) side. As you can see, each test eliminates one or more vectors. Coming back to the search algorithm, the signs of the input vector's components tell us in which quadrant the input lies. Because all the quadrants are mirror images of the neighbouring quadrants, we only need to compare the input to the output points in the same quadrant; in this way we reduce the number of comparisons by a factor of four. For L dimensions, the signs of the L components indicate in which of the 2^L quadrants the input vector lies, so the number of comparisons is reduced by a factor of 2^L. Generally, in an m-ary search with a balanced tree, the input vector is compared with test vectors to select which of the m paths to follow to the next stage; at each stage the number of candidate vectors is reduced to 1/m of the previous stage. This method of searching is called a classification or decision tree. A tree is said to be unbalanced if different paths through the tree do not traverse the same number of nodes; otherwise the tree is balanced. How TSVQ works:

Let's see how TSVQ works in a non-symmetrical situation. First we divide the set of output points into two groups, G0 and G1, and we assign a test vector to each group in such a way that the output vectors in each group are closer to the test vector assigned to that group. When we get an input vector, we compare it against the two test vectors to find to which group it belongs; we then make comparisons only against the output points of that group, the output points of the other group having been rejected. In this way we halve the number of comparisons to be made. The process can continue by dividing the remaining group again into two groups, G00 and G01 or G10 and G11, and assigning test vectors to them. By continuing this process we end up with groups consisting of single points, provided the number of output points is a power of two. The number of comparisons that need to be made is reduced to 2 log2 K, so for a codebook of size 1024 we need to make 20 vector comparisons instead of 1024, which is a significant gain. However, you cannot get something for nothing: the price for this gain is an increase in distortion and in storage requirements. As we saw, conventional TSVQ consists of an l-level Q-ary tree, where Q = 2^b and b is the number of bits per level. When quantization is performed on the tree, an equivalent codebook of N = 2^(lb) codevectors is available, but only on the order of l·Q search operations per vector are needed, in comparison to the 2^(lb) searches of full-search vector quantization (FSVQ). TSVQ therefore has a computational advantage over FSVQ.
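A minimal sketch of a balanced binary TSVQ search of the kind just described. Each internal node holds two test vectors; the input descends the branch whose test vector is closer, so a codebook of size K needs roughly 2·log2 K vector comparisons instead of K. The tree layout and the vectors below are my own illustrative choices.

```python
import numpy as np

def tsvq_encode(x, tree):
    """tree: nested dict {'test': (v0, v1), 'children': (left, right)}; leaves
    are code vectors (numpy arrays). Returns (path_bits, codeword)."""
    bits, node = [], tree
    while isinstance(node, dict):
        v0, v1 = node['test']
        bit = 0 if np.sum((x - v0) ** 2) <= np.sum((x - v1) ** 2) else 1
        bits.append(bit)
        node = node['children'][bit]
    return bits, node

# Two-level example: 4 code vectors, 2 comparisons per level
tree = {
    'test': (np.array([-1.0, 0.0]), np.array([1.0, 0.0])),
    'children': (
        {'test': (np.array([-1.0, 1.0]), np.array([-1.0, -1.0])),
         'children': (np.array([-1.0, 1.0]), np.array([-1.0, -1.0]))},
        {'test': (np.array([1.0, 1.0]), np.array([1.0, -1.0])),
         'children': (np.array([1.0, 1.0]), np.array([1.0, -1.0]))},
    ),
}
bits, y = tsvq_encode(np.array([0.8, 0.9]), tree)
print(bits, y)   # the path bits form the transmitted index
```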

3.3.2 Gain-Shape Vector Quantization


This is another way to reduce both the memory requirements and the complexity of the coder. Gain-shape VQ is based on the idea that the same pattern of variation in a vector may appear over a wide range of amplitude values. The root mean square of the vector components is called the gain. Because the pattern itself is largely unaffected by the overall amplitude, the probability distribution of the shape is almost independent of the gain. Gersho-Gray [6] explain in detail how the gain-shape quantization process works. Briefly: before the encoding process starts, the energy level of the vector is extracted and quantized separately using a scalar quantizer, and the energy-normalized vector, called the shape, is quantized with its own codebook. The effectiveness of gain-shape VQ depends on the degree to which the gain of a randomly selected speech vector is statistically independent of its shape. Using this technique we can implement vectors of higher dimension, but optimality is lost. At a rate of 1 bit/sample (8 kb/s) the highest dimension that can be implemented in standard VQ is k = 8, with an SNR of 9.7 dB, while gain-shape VQ with a dimension of 12 improves the SNR by 0.7 dB at the same level of complexity. Again we see that the complexity problem is a tough one and cannot be solved easily.
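A hedged sketch of gain-shape quantization as outlined above: the gain (the RMS of the vector) is quantized with a scalar quantizer and the gain-normalized shape with a small vector codebook. The gain levels and the shape codebook below are illustrative placeholders.

```python
import numpy as np

def gain_shape_encode(x, gain_levels, shape_codebook):
    gain = np.sqrt(np.mean(x ** 2)) + 1e-12            # root mean square of the vector
    g_idx = int(np.argmin((gain_levels - gain) ** 2))  # scalar-quantize the gain
    shape = x / gain                                   # energy-normalized shape
    d = np.sum((shape_codebook - shape) ** 2, axis=1)
    s_idx = int(np.argmin(d))                          # vector-quantize the shape
    return g_idx, s_idx

def gain_shape_decode(g_idx, s_idx, gain_levels, shape_codebook):
    return gain_levels[g_idx] * shape_codebook[s_idx]

# Example with 4 assumed gain levels and 4 unit-RMS shape codewords of dimension 4
gain_levels = np.array([0.25, 0.5, 1.0, 2.0])
shape_codebook = np.array([[1, 1, 1, 1], [1, -1, 1, -1],
                           [1, 1, -1, -1], [1, -1, -1, 1]], dtype=float)
x = np.array([0.4, 0.5, 0.45, 0.5])
g, s = gain_shape_encode(x, gain_levels, shape_codebook)
x_hat = gain_shape_decode(g, s, gain_levels, shape_codebook)
```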

3.3.3 Multistage Vector Quantization


This method, first proposed by Juang, reduces both the memory requirements and the encoding complexity. The idea is to divide the encoding process into multiple stages, hence Multistage Vector Quantization (MSVQ). After the first stage is performed with a small codebook, the second stage uses the error vector between the first quantized output and the original vector to obtain a more refined approximation. Specifically, taking the difference between the original and the quantized vector gives the encoding error, which is then fed into a second-stage vector quantizer. A third stage can then be used to provide further refinement.

Let me explain this process, which is widely used in speech coding, a little further. Let's say we have a two-stage quantizer. The input vector X is quantized by the first-stage quantizer, which gives an approximation X̂1. This approximation is subtracted from X to produce an error vector E2 = X - X̂1. The error vector is then applied to the second-stage quantizer, giving a quantized error Ê2. The overall approximation is obtained by adding the two: X̂ = X̂1 + Ê2. The encoder then simply transmits the index words of the first and second stages, so the actual binary index word that has to be transmitted is a concatenation of the binary words identifying the chosen codeword at each stage. Consider an example where we use d stages. At the first stage we get an approximation X̂1 using a codebook K1 of size N1. At the second stage we quantize the error E2 = X - X̂1 using a codebook K2, obtaining Ê2, so that X̂2 = X̂1 + Ê2 is an improved approximation to X. Similarly, after the third stage we get an improved approximation X̂3 = X̂1 + Ê2 + Ê3. Continuing in this way, for a d-stage quantizer we finally get X̂d = X̂1 + Ê2 + ... + Êd. Please notice that there is a separate codebook for each of the d stages. The overall codeword is a concatenation of the codewords chosen from each of the codebooks. While the transmitter transmits the indexes of the codewords, the decoder does a table look-up in the different codebooks and forms the sum X̂d = X̂1 + Ê2 + ... + Êd.

The complexity and storage requirements can be greatly reduced using this quantizer. The vector dimension can be increased in proportion to the number of stages so as to keep the ratio k/d constant; the encoding complexity then grows roughly quadratically (as k²), whereas in one-stage VQ it grows exponentially rather than quadratically. Figures at the end of the document show the computational complexity and memory requirements against dimension for a range of multistage vector quantizers and for one-stage vector quantization.
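A minimal two-stage MSVQ sketch following the description above: the second codebook quantizes the error left by the first, and the decoder adds the two codewords. The random codebooks are placeholders; in practice each stage's codebook would be designed (for example with the LBG algorithm) on the corresponding residual signal.

```python
import numpy as np

def nearest(codebook, v):
    i = int(np.argmin(np.sum((codebook - v) ** 2, axis=1)))
    return i, codebook[i]

def msvq_encode(x, cb1, cb2):
    i1, x1_hat = nearest(cb1, x)      # first-stage approximation
    e2 = x - x1_hat                   # error vector E2
    i2, _ = nearest(cb2, e2)          # second-stage quantized error
    return i1, i2                     # transmitted: concatenated indexes

def msvq_decode(i1, i2, cb1, cb2):
    return cb1[i1] + cb2[i2]          # overall approximation

rng = np.random.default_rng(0)
cb1 = rng.normal(size=(16, 8))        # 4 bits, first stage
cb2 = 0.3 * rng.normal(size=(16, 8))  # 4 bits, second (finer) stage
x = rng.normal(size=8)
i1, i2 = msvq_encode(x, cb1, cb2)
x_hat = msvq_decode(i1, i2, cb1, cb2)
```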

3.4 Memory based Vector Quantization


3.4.1 Predictive Vector Quantization
The term predictive in vector quantization means that the input vector is coded in a way that depends on past input vectors. In this sense the quantizer has memory, whereas so far we have examined memoryless quantizers. The vector, as we said before, is a finite set of adjacent samples of the signal, usually a block of consecutive samples of, say, a speech segment. The sequence of vectors to be coded is generated by blocking the scalar signal: k consecutive samples of x(n), arriving serially, are converted into a parallel set of samples forming a k-dimensional vector. In a blocked speech signal, a speech waveform x(n) is partitioned into blocks of k samples, making the vector sequence Xm = [x(mk), x(mk+1), ..., x(mk+k-1)]T (Gersho-Gray [6]). It is obvious that comparing consecutive samples, or samples close to each other, shows that they are similar, which means they are highly correlated.

The idea in Predictive Vector Quantization is to guess the next input from the past sequence with as much accuracy as possible; the better this accuracy, the better the quantizer. The prediction is made from the past m samples, so the predictor is assumed to be of finite order m, which means that it depends only on the past m samples. This is called closed-loop prediction of Xn (Gray [6]). If the prediction is Yn and the input vector is Xn, the difference en = Xn - Yn between the input vector and the prediction is vector quantized to give ên = Q(en). We can conclude from this that the error in the overall vector quantization equals the error made in quantizing the difference signal. The distortion of PVQ, which quantizes the difference vector, is much smaller than that of a quantizer with the same codebook size operating directly on the input vector, because the difference vector has components that are smaller and less correlated. Generally, PVQ has increased performance over memoryless VQ.
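A hedged sketch of closed-loop predictive VQ: the prediction Yn is formed from the previously reconstructed vector (so encoder and decoder stay in step), the difference en = Xn - Yn is vector quantized, and the reconstruction is Yn + Q(en). The trivial "copy the last reconstruction" predictor and the random error codebook are simplifying assumptions of mine.

```python
import numpy as np

def pvq_encode_decode(vectors, error_codebook):
    recon_prev = np.zeros(vectors.shape[1])
    indices, reconstructions = [], []
    for x in vectors:
        y = recon_prev                                       # prediction Y_n
        e = x - y                                            # difference vector e_n
        i = int(np.argmin(np.sum((error_codebook - e) ** 2, axis=1)))
        recon = y + error_codebook[i]                        # Y_n + Q(e_n)
        indices.append(i)
        reconstructions.append(recon)
        recon_prev = recon                                   # feeds the next prediction
    return indices, np.array(reconstructions)

rng = np.random.default_rng(0)
error_codebook = 0.5 * rng.normal(size=(32, 4))              # 5-bit error codebook
signal = 0.1 * np.cumsum(rng.normal(size=(50, 4)), axis=0)   # correlated vector sequence
idx, recon = pvq_encode_decode(signal, error_codebook)
```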

3.4.2 Adaptive Vector Predictive Coding


This can be thought of as a vector analogue of Differential Pulse Code Modulation (DPCM), in which vector prediction is used to remove the redundant information. The performance is improved by classifying the input frames of speech into categories; an appropriate predictor and quantizer, selected according to the class into which the input frame has been classified, are then used to quantize the speech. The SNR is very high compared with other VQ methods, and this is one of the best speech coding methods.

3.4.3 Recursive Vector Quantization (RVQ)


As we said before, the operation of the decoder depends on the past history of channel symbols received by the decoder. If we have a sequence of input vectors, the encoder produces both a sequence of channel symbols and a sequence of states that define the encoder's behavior in response to the input vectors. If the input vectors are k-dimensional, i.e. Xn ∈ Rk, then the channel symbols take values from an alphabet of size N = 2^R, and the state variable takes values from the state space S. If R is the rate of the coding scheme in bits per input vector, then R/k is the resolution, or rate in bits per input sample. To make sure that the decoder can track the encoder state, we require that the decoder knows the initial state and that all the other states can be derived from the initial state and the channel symbols; this can be done with a state transition function. Then, when the encoder and the decoder both know the initial state, the operation of both is completely predictable. This coding method is called a recursive coding system or feedback source coding system (Gersho-Gray [6]). Now, suppose that the encoder operates on the basis of the NN, or minimum distortion, idea and that the encoder and decoder are in state s. The decoder has specific state-dependent possible outputs; these possible outputs are called the state codebook for the state s. The goal for Recursive Vector Quantizers (RVQ) is more or less the same: to reduce the average distortion. This can be achieved by designing an RVQ based on a long training sequence. One further point deserves thought: an RVQ depends on its initial state and is therefore nonstationary, but the long-term sample averages more or less converge, so an RVQ has ergodic properties and its long-term performance can be related to an expected average distortion.

3.4.4 Finite State Vector Quantizer (FSVQ)


A finite-state quantizer is a Recursive Quantizer with only a finite number of states. Please notice that simple VQ is an FSVQ with only one state. Again, it is all about predicting the next input based on the past encoded vectors. Each state defines a codebook, and the whole idea of predicting the next input vector rests on how good the codebook design is. The past encoded vector determines the current state. The number of possible predictions of the future is constrained to some finite number, countable and implementable. So we can think of an FSVQ as a collection of K separate memoryless quantizers. A rule decides which of the K codebooks will be used to encode the current input vector: the current state specifies the codebook, and the resulting channel index combined with the current state determines the next state. The VQ can thus be realized as a switch selecting the codebook to be used.
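A hedged sketch of this switch-between-codebooks idea: the current state selects one of the state codebooks, the input is encoded with that codebook, and a next-state function f(index, state) moves encoder and decoder to the same new state. The two state codebooks and the simple next-state rule below are illustrative assumptions, not a design from the thesis.

```python
import numpy as np

def fsvq_encode(vectors, state_codebooks, next_state, initial_state=0):
    state, indices = initial_state, []
    for x in vectors:
        cb = state_codebooks[state]                   # codebook chosen by the state
        i = int(np.argmin(np.sum((cb - x) ** 2, axis=1)))
        indices.append(i)
        state = next_state(i, state)                  # the decoder can track this too
    return indices

def fsvq_decode(indices, state_codebooks, next_state, initial_state=0):
    state, out = initial_state, []
    for i in indices:
        out.append(state_codebooks[state][i])
        state = next_state(i, state)
    return np.array(out)

rng = np.random.default_rng(0)
state_codebooks = [rng.normal(size=(8, 4)), 2.0 * rng.normal(size=(8, 4))]
next_state = lambda i, s: 0 if i < 4 else 1           # illustrative transition rule
x_seq = rng.normal(size=(20, 4))
idx = fsvq_encode(x_seq, state_codebooks, next_state)
x_hat = fsvq_decode(idx, state_codebooks, next_state)
```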

3.4.5 Tree and Trellis Coding (TCQ)


Suppose we have an RVQ that is in an initial state s and has a channel symbol alphabet of size N = 2^R, and suppose the channel symbols are binary words so that N = 2. The operation of the decoder can be seen through a tree: the paths of the tree represent the states taken by the decoder at successive time instants, so each level of the tree represents the appearance (time of arrival) of a new input vector. The difference between traditional TSVQ and this coding is that in TSVQ the entire tree represents the possible ways in which a single input vector is searched, whereas here each level of the tree corresponds to a new input vector in the sequence. The initial state is the root of the tree and all the other nodes are decoder states. Suppose that the encoder is in the initial state and the channel index is binary, either 0 or 1. If the encoder produces a 0, the decoder produces the reproduction associated with (0, s) and advances to the state s0 = f(0, s). If the encoder produces a 1, the reproduction is the one associated with (1, s) and the decoder advances to the state s1 = f(1, s).
