Speech Compression

SPEECH COMPRESSION USING POLYNOMIAL APPROXIMATION
Presented by Srijan Silwal(108671) Siddharth Sah(108672)
Contents
Introduction to speech compression and its need
Polynomial approximation
Frame parameters
Interpolation
Encoding
Compression of spectral components, gain, and pitch
Speech Signal
Human speech is an acoustic signal. It is converted to an electrical signal by a transducer such as a microphone.
Properties of electrical signals

1. It is a one-dimensional signal, with time as its independent variable.
2. It is random in nature.
3. It is non-stationary, i.e. the frequency spectrum is not constant in time.
4. Although human hearing spans roughly 20 Hz to 20 kHz, human speech has significant frequency components only up to about 4 kHz, a property that is exploited in the compression of speech.
Digital Representation of Speech
With the advent of digital computers, it became natural to use them for processing speech signals. The analog signal is sampled at a fixed rate, and each sample is then quantized to one of a finite set of discrete levels.
Parameters of Digital Speech

1. Sampling rate
2. Bits per sample
3. Number of channels

Sound files can be stored and played on digital computers. Various file formats have been proposed by different manufacturers, for example .wav and .au.
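As a rough illustration of how these parameters determine the size of uncompressed speech, the sketch below multiplies them to get a bit rate. The numeric values (8 kHz sampling, which is the minimum rate for a 4 kHz bandwidth by the Nyquist criterion, 16-bit samples, one channel) and the function names are illustrative assumptions, not figures from the presentation.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second of uncompressed (PCM) audio."""
    return sample_rate_hz * bits_per_sample * channels


def pcm_size_bytes(bit_rate_bps: int, seconds: float) -> float:
    """Storage needed for a recording of the given duration."""
    return bit_rate_bps * seconds / 8


# Assumed example: telephone-bandwidth speech, 16-bit mono.
speech_rate = pcm_bit_rate(sample_rate_hz=8000, bits_per_sample=16, channels=1)
one_minute = pcm_size_bytes(speech_rate, seconds=60)

print(speech_rate, "bits per second")
print(one_minute, "bytes per minute")
```

Any compression scheme is then judged against this uncompressed bit rate.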
What is COMPRESSION?
Compression is a process of converting an input data stream into another data stream that has a smaller size. Compression is possible only because input data has some amount of redundancy associated with it. The main objective of compression systems is to eliminate this redundancy.
Why Compression?
Multimedia files in general need plenty of disk space for storage, and sound files are no exception. Hence compression of these files has become a necessity. When compression is used to reduce storage requirements, overall program execution time may also be reduced, because a smaller file requires fewer disk accesses.
Applications of Compression
1. The use of compression in recording applications is extremely powerful. The playing time of the medium is extended in proportion to the compression factor.
2. In the case of tapes, the access time is improved because the length of tape needed for a given recording is reduced, so it can be rewound more quickly.
3. In digital audio broadcasting and in digital television transmission, compression is used to reduce the bandwidth needed.
4. The time required to display a web page, and the download time for files, is greatly reduced by compression.
Polynomial Approximation-Introduction
Methods for speech compression aim at reducing the transmission bit rate while preserving the quality and intelligibility of speech. A method for compressing speech is based on polynomial approximations of the trajectories in time of various speech features (i.e., spectrum, gain, and pitch).
Continued..
One method of compression, called segmental coding, uses polynomial functions to approximate trajectories of speech features present in successive time frames. Useful compression results if the number of bits per second needed to transmit the polynomial coefficients is smaller than the number of bits per second needed to transmit the original feature frames for the segment.
POLYNOMIAL SPEECH COMPRESSION: FRAMES

The input signal is analyzed in brief windows (frames) that usually span a few tens of milliseconds. This process is repeated for successive windows and results in a discrete stream of frame parameters. The speech samples contained in a frame are processed by a spectral analyzer that provides a relatively small number of spectral features for each frame. Other speech features, such as voicing, gain (energy), and pitch, are also obtained and assigned to frames. These frame parameters already represent a compressed form of the original signal.
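A minimal sketch of the framing step is shown here, assuming non-overlapping 20 ms frames at an 8 kHz sampling rate and using root-mean-square energy as the gain feature. The frame length, the hop size, the test signal, and the function names are all assumptions chosen for illustration; the spectral analysis itself is not shown.

```python
import math


def split_into_frames(samples, frame_len, hop):
    """Cut the signal into windows of frame_len samples, advancing by hop."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def frame_gain(frame):
    """Root-mean-square energy of one frame, used here as the gain feature."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


SAMPLE_RATE = 8000            # assumed sampling rate in Hz
FRAME_LEN = 160               # 20 ms at 8 kHz
signal = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

frames = split_into_frames(signal, FRAME_LEN, hop=FRAME_LEN)
gains = [frame_gain(f) for f in frames]

print(len(frames), "frames, first gain:", round(gains[0], 4))
```

One second of audio becomes a short sequence of per-frame values, which is the discrete stream of frame parameters that later stages work on.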
The frame parameters are transmitted to the receiver (decoder) where the signal is synthesized to resemble the original signal. Linear interpolation is employed usually in the decoder between successive frames to smooth the transitions of the parameters across frames. Successive frames are analyzed independently of each other and they usually contain some redundant information. Additional compression can be obtained by exploiting this redundancy across successive frames.
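The decoder-side smoothing described above can be sketched as plain linear interpolation between two successive frame parameter vectors. The function name, the number of intermediate steps, and the parameter values are hypothetical.

```python
def interpolate_frames(prev_params, next_params, steps):
    """Return `steps` vectors moving linearly from prev_params toward next_params.

    The first vector equals prev_params; next_params itself is not included,
    since it starts the following interval.
    """
    result = []
    for k in range(steps):
        t = k / steps
        result.append([(1 - t) * a + t * b for a, b in zip(prev_params, next_params)])
    return result


# Two assumed 2-dimensional parameter vectors from successive frames.
smoothed = interpolate_frames([0.0, 10.0], [4.0, 2.0], steps=4)
for vector in smoothed:
    print(vector)
```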
Coding of Frame Parameters

An efficient method for coding frame parameters is based on matrix quantization, in which a whole block of frame vectors is treated as a matrix. Such methods are suitable for the quantization of fixed-length segments, but they require a large amount of training data. This poses a problem for longer segments of speech because of the higher spectral-temporal variability encountered in such segments and the sparseness of the data. A method to alleviate this problem uses polynomial approximation of the speech features included in such segments.
Continued..
There are two main advantages to using polynomial functions for approximation. First, they can approximate various shapes of feature trajectories with arbitrary accuracy, depending on the polynomial order. Second, they are described by a relatively small number of parameters, which is necessary to achieve significant compression.
Approximation
One of the most popular approaches to function approximation is the least-squares method. For any function f(x) that is continuous on a closed interval [a, b], there exists an algebraic polynomial p(x) of order d that best approximates the function on that interval in the L2 norm.
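Written out, and taking a and b as assumed names for the interval endpoints and \Pi_d as an assumed symbol for the set of polynomials of order at most d, the least-squares criterion selects the polynomial that minimizes the integrated squared error:

```latex
p^{*} = \arg\min_{p \in \Pi_d} \int_a^b \bigl( f(x) - p(x) \bigr)^2 \, dx
```

For sampled data such as frame parameters, the integral becomes a sum of squared differences over the available points.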
It is assumed that there is a bidirectional transform between the original signal and its vector space representation. Such bidirectional transformation is also required in order to reconstruct the original signal from the vector-space representation.
Interpolation
Interpolation is done in the least-squares sense by a polynomial function defined as follows:

$$F_{i,P}(n) = a_{i,0} + a_{i,1} n + a_{i,2} n^2 + \dots + a_{i,P} n^P$$

where $a_{i,0}, \dots, a_{i,P}$ are the polynomial coefficients, $n$ is the frame index, and $P$ is the polynomial order for feature element $i$.
The maximum order of the polynomial is limited to P = N-1 because there are only N data points available to estimate the coefficients in the least-squares sense. The lower the polynomial order P in the range [0, ..., N-1], the higher the approximation error for an arbitrary trajectory.
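The relationship between polynomial order and approximation error can be checked with a small sketch. It fits one feature trajectory of N = 6 frames with every order from 0 to N-1 by solving the normal equations in exact rational arithmetic. The trajectory values and function names are made up for illustration, and in practice a numerical routine such as `numpy.polyfit` would be the usual choice.

```python
from fractions import Fraction


def polyfit_ls(values, order):
    """Least-squares coefficients a_0..a_P for values[n], n = 0..N-1."""
    n_pts, size = len(values), order + 1
    # Augmented normal equations (V^T V | V^T y), with V[n][k] = n**k.
    rows = []
    for j in range(size):
        row = [Fraction(sum(n ** (j + k) for n in range(n_pts))) for k in range(size)]
        row.append(sum(n ** j * Fraction(values[n]) for n in range(n_pts)))
        rows.append(row)
    # Gauss-Jordan elimination; exact because every entry is a Fraction.
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[k][size] for k in range(size)]


def polyval(coeffs, n):
    """Evaluate a_0 + a_1*n + ... + a_P*n**P."""
    return sum(a * n ** k for k, a in enumerate(coeffs))


def squared_error(values, coeffs):
    """Sum of squared differences between the trajectory and the polynomial."""
    return sum((Fraction(v) - polyval(coeffs, n)) ** 2 for n, v in enumerate(values))


trajectory = [1.0, 2.5, 3.0, 2.0, 1.5, 3.5]   # assumed feature values, N = 6
errors = [squared_error(trajectory, polyfit_ls(trajectory, p))
          for p in range(len(trajectory))]

for order, err in enumerate(errors):
    print("P =", order, "squared error =", float(err))
```

The error never increases as P grows, and at P = N-1 the polynomial passes through all N points, so the error is zero but nothing has been saved.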
Compression
Thus, the condition for achieving compression is P+1<N which presupposes some approximation errors. A feature compression factor is defined as follows:
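The defining formula does not appear at this point in the text. One definition that is consistent with the condition P+1 < N, offered here as an assumption rather than as the presenters' exact expression, is the ratio of the number of frames in the segment to the number of values transmitted in their place:

```latex
C = \frac{N}{P + 1}
```

Under this reading, a segment of N = 10 frames approximated with a polynomial of order P = 3 gives C = 10/4 = 2.5, and C > 1 exactly when P+1 < N.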
Encoding of Polynomial Coefficients

Instead of encoding the P+1 coefficients, these coefficients can be uniquely represented by sampling the polynomial function at P+1 arbitrary points and encoding these P+1 trajectory feature samples. These new P+1 feature samples can be encoded using the original VQ codebook because they are vectors in the original D dimensional feature space.
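The claim that P+1 samples determine the polynomial uniquely can be illustrated with a short sketch. It samples a polynomial at P+1 frame positions and then rebuilds every frame value from those samples alone using Lagrange interpolation. The coefficients, the sampling positions, and the function names are hypothetical, and the quantization of the samples by the VQ codebook is left out.

```python
from fractions import Fraction


def polyval(coeffs, n):
    """Evaluate a_0 + a_1*n + ... + a_P*n**P."""
    return sum(a * n ** k for k, a in enumerate(coeffs))


def lagrange_eval(points, samples, n):
    """Value at frame n of the unique polynomial through (points[j], samples[j])."""
    total = Fraction(0)
    for j, (xj, yj) in enumerate(zip(points, samples)):
        term = Fraction(yj)
        for m, xm in enumerate(points):
            if m != j:
                term *= Fraction(n - xm, xj - xm)
        total += term
    return total


# Encoder side: an assumed order-2 polynomial over a segment of N = 8 frames.
coeffs = [Fraction(2), Fraction(-1, 2), Fraction(1, 4)]
N = 8
sample_points = [0, 3, 7]                      # P + 1 = 3 assumed positions
samples = [polyval(coeffs, n) for n in sample_points]

# Decoder side: only sample_points and samples are known.
decoded = [lagrange_eval(sample_points, samples, n) for n in range(N)]
original = [polyval(coeffs, n) for n in range(N)]

print(decoded == original)
```

With D feature elements, each element has its own polynomial, and the D values taken at one sampling position form a vector in the original feature space, which is why the existing codebook can be reused.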
Polynomial Compression of Spectral Parameters

Among the spectral representations commonly used in speech coding are the LPC (linear predictive coding) and LSF (line spectral frequency) features.
Polynomial Compression of Gain Parameters

For good quality and intelligibility of the encoded speech, other parameters such as gain and pitch are also important. The trajectory of the gain feature is approximated by a polynomial function of order P on a segment S containing N frames. Then, instead of encoding the P+1 polynomial coefficients, they are transformed into P+1 gain feature samples by sampling the polynomial function at P+1 points. These gain feature samples can be encoded using the coder's original codebook for gain.
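Encoding a gain sample with an existing codebook amounts to picking the nearest codebook entry and transmitting its index. The sketch below assumes a small scalar codebook with made-up values; a real coder's gain codebook would be larger and trained on speech data.

```python
def quantize(value, codebook):
    """Index of the codebook entry closest to value."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))


gain_codebook = [0.0, 0.5, 1.0, 1.5, 2.0]      # hypothetical gain levels
gain_samples = [0.1, 0.8, 1.4]                 # P + 1 = 3 assumed trajectory samples

indices = [quantize(g, gain_codebook) for g in gain_samples]
decoded_gains = [gain_codebook[i] for i in indices]

print(indices)
print(decoded_gains)
```

Only the indices are transmitted; the decoder looks up the same codebook and then reconstructs the gain trajectory from the decoded samples.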
Polynomial Compression of Pitch

Because pitch is measurable only in voiced frames, an arbitrary speech segment S containing N frames can contain frames with no pitch measurements. A way of adapting the method to this special case is to build the polynomial function from the possibly reduced number of voiced frames (Nv) in the speech segment (Nv < N).
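A minimal sketch of this adaptation follows, restricted to an order-1 polynomial (a straight line) so that the least-squares solution has a simple closed form. Unvoiced frames are marked with `None`, and only the Nv voiced frames enter the fit. The pitch values and the function name are invented for illustration.

```python
def fit_pitch_line(pitch):
    """Least-squares line a0 + a1*n through the voiced frames only.

    pitch[n] is the pitch of frame n in Hz, or None if the frame is unvoiced.
    """
    voiced = [(n, f0) for n, f0 in enumerate(pitch) if f0 is not None]
    nv = len(voiced)
    if nv < 2:
        raise ValueError("an order-1 fit needs at least two voiced frames")
    mean_n = sum(n for n, _ in voiced) / nv
    mean_f = sum(f0 for _, f0 in voiced) / nv
    sxx = sum((n - mean_n) ** 2 for n, _ in voiced)
    sxy = sum((n - mean_n) * (f0 - mean_f) for n, f0 in voiced)
    a1 = sxy / sxx
    a0 = mean_f - a1 * mean_n
    return a0, a1


# Assumed segment of N = 7 frames, of which Nv = 4 are voiced.
segment_pitch = [None, 100.0, 102.0, None, 106.0, 108.0, None]
a0, a1 = fit_pitch_line(segment_pitch)

print("a0 =", a0, "a1 =", a1)
```

The polynomial order is then bounded by the number of voiced frames rather than by N, since only Nv data points are available to estimate the coefficients.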
Conclusion
Polynomial approximation has proved to be a useful and efficient method for compression of the speech parameters. Such a method can be applied to both coding and storing of speech. The spectral parameters, especially those with low dynamics such as LSF parameters, are particularly suitable to supplementary compression by polynomial approximation. In addition, the gain and pitch parameters can also be compressed by polynomial approximation methods.
