Multiple Descriptions Coding in MELP Coder for Voice over IP
M. Saidi, B. Boudraa
Laboratoire de communication parlée et de traitement du signal (LCPTS), Université USTHB
Algiers, Algeria
mosaidi@usthb.dz, bboudraa@usthb.dz

M. Bouzid, M. Boudraa
Laboratoire de communication parlée et de traitement du signal (LCPTS), Université USTHB
Algiers, Algeria
mbouzid@ustb.dz, mboudraa@usthb.dz

Abstract: In VoIP systems, CELP coders such as G.729 are commonly used because they offer good speech quality in the absence of packet loss. However, harmonic coders such as MELP may be a good alternative for VoIP because of their higher resilience to packet loss. In this paper, we examine the problem of packet loss in VoIP applications using MELP coders. We present a packetization scheme based on Multiple Description Coding (MDC) applied to the MELP coder. Each packet carries information from two MELP coders operating at 2.4 and 1.2 kbps respectively. A packet uses 135 bits per 22.5 ms, corresponding to a total rate of 6 kbps. The results show that, under typical VoIP operating conditions, the method performs well and outperforms CELP coders operating without MDC at 8 kbps.

This paper is organized as follows. First, we describe voice communication over IP; we then propose a method for recovering lost packets. Next, a packetization scheme is proposed and the packet recovery process is presented. Finally, simulation results and their evaluation are reported.

Keywords: VoIP, MELP, MDC, packet loss.

I. INTRODUCTION

Voice over Internet Protocol (VoIP) is achieved by sending and receiving packets. At the receiver, some packets are missing because of delays, congestion or transmission errors. This packet loss degrades the quality of service (QoS) at the receiver. Because voice is transmitted in real time, the receiver cannot request retransmission of lost packets: the additional transfer time would be too large. Loss concealment techniques are therefore used at the transmitter or the receiver to recover the lost packets. These techniques are called redundant descriptions [1]. Among them, Multiple Description Coding (MDC), based on information redundancy, increases robustness against packet loss. The goal is to preserve a certain quality and intelligibility of speech as the packet loss rate increases. The Real-time Transport Protocol (RTP) is often used to carry the packets.
In this work, we have developed and implemented a frame-by-frame packetization scheme applied to a harmonic coder designed for voice over IP, namely the mixed excitation linear prediction (MELP) coder [2]. Each packet contains the main information on the current frame and on its future neighbours. A received packet can help recover up to four lost frames.

978-1-4673-1591-3/12/$31.00 2012 IEEE

II. VOICE OVER INTERNET PROTOCOL (VOIP)


It is widely accepted that VoIP is an emerging Internet technology that will dominate the field of voice communications. It has begun to establish itself as a viable alternative to traditional voice communication systems. In this technique, the analog call is digitized, compressed and then split into packets for transmission over an IP network. At the other end, the process is reversed. Because the packets travelling over the Internet take different paths and frequently arrive out of order, they are first stored in buffers to be re-sequenced; they are then decompressed and converted into a sound signal before being routed through ordinary telephone equipment [5, 6]. Figure 1 shows the block diagram of a voice communication system over an IP network.

[Figure 1: Original speech -> Coding -> Packetization -> IP network -> De-packetization -> Decoding -> Synthetic speech]

Fig. 1: Block diagram of the transmission of voice over IP
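
The re-sequencing step performed by the receiver buffer can be sketched as follows. This is only an illustration, not the implementation used in the paper: the `resequence` helper, the `(sequence_number, payload)` tuple layout and the use of `None` to mark a packet that never arrived are all assumptions introduced here.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def resequence(arrivals: Iterable[Tuple[int, bytes]],
               first_seq: int,
               count: int) -> List[Optional[bytes]]:
    """Return payloads in sending order; None marks a packet that never arrived."""
    buffer: Dict[int, bytes] = {}
    for seq, payload in arrivals:
        # Keep the first copy of each sequence number and ignore duplicates.
        buffer.setdefault(seq, payload)
    return [buffer.get(seq) for seq in range(first_seq, first_seq + count)]


# Packets 0..4 were sent; packet 2 is lost and the others arrive out of order.
arrivals = [(1, b"p1"), (0, b"p0"), (4, b"p4"), (3, b"p3")]
ordered = resequence(arrivals, first_seq=0, count=5)
print(ordered)
```

The gap left at position 2 is what a loss concealment technique such as MDC has to fill.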

III. OUR PROPOSED METHOD

In this work, we propose a method for efficiently combating packet loss. The method is based on the MDC technique, which introduces redundancy: we code the signal as two descriptions carried in the same packet. The first uses a MELP coder to encode the current frame Fn at 2.4 kbps, while the second uses another MELP encoder running at 1.2 kbps to encode the three frames Fn+1, Fn+2 and Fn+3 that follow Fn. Fn is used to reconstruct the signal with good quality, while Fn+1, Fn+2 and Fn+3 are used to recover any lost packets. The second description thus helps to reconstruct the speech when one, two, three or even four successive packets are lost. This redundant information does not have the same quality as that extracted from Fn because it is coarsely quantized; it only helps to reconstruct intelligible speech when packets are lost. The packetization scheme is shown in Figure 2. Note that MELP 2.4 operates on a frame of 22.5 ms, while MELP 1.2 operates on a 67.5 ms frame [3, 4].

IV. PACKETIZATION

MELP coder parameters are given in Table I. Both coders encode the fundamental frequency (pitch), the aperiodicity flag, the five voicing bands, two gains corresponding to the energy of the two half-frames, ten LPC coefficients converted into LSFs, and the spectral amplitudes of ten pitch harmonics.
A fine description at 2.4 kbps is required to provide good speech quality in error-free conditions. The configuration also provides a coarse description at 1.2 kbps, of reasonable quality, that can recover up to three successive packet losses. The packetization scheme is shown in Figure 2. A packet contains the output of both MELP coders and is coded using 135 bits (54 + 81), corresponding to a rate of 6 kbps (135 bits every 22.5 ms). Forming a packet therefore requires four successive frames of 22.5 ms each. In a transmission without packet loss, only the first 54 bits are used to decode the signal. Each packet is assigned to the current frame, the one coded by MELP 2.4 (Figure 2). This causes a delay of 22.5 ms. Note that forming and sending the first packet requires a delay of 90 ms (4 x 22.5 ms). Afterwards, a packet is sent every 22.5 ms.

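[Figure 2: each packet combines the MELP 2.4 kbps description of frame Fn with the MELP 1.2 kbps description of frames Fn+1, Fn+2 and Fn+3]

Fig. 2: Packetization using two descriptions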
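
The packet layout and the figures quoted above (135 bits, 6 kbps, 90 ms start-up delay) can be checked with a short sketch. It is a hedged illustration rather than the authors' code: the `Packet` class and `packetize` function are hypothetical names, frames are represented by their indices only, and the handling of the last three frames of a stream (which have no complete look-ahead) is simply left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

FRAME_MS = 22.5    # duration of one MELP 2.4 frame
FINE_BITS = 54     # MELP 2.4 kbps description of Fn
COARSE_BITS = 81   # MELP 1.2 kbps description of Fn+1..Fn+3
LOOKAHEAD = 3      # future frames carried by the coarse description


@dataclass
class Packet:
    index: int                      # n, index of the current frame Fn
    fine_frame: int                 # frame coded at 2.4 kbps
    coarse_frames: Tuple[int, ...]  # frames coded together at 1.2 kbps


def packetize(num_frames: int) -> List[Packet]:
    """Build one packet for every frame that has three following frames."""
    packets = []
    for n in range(num_frames - LOOKAHEAD):
        coarse = tuple(range(n + 1, n + 1 + LOOKAHEAD))
        packets.append(Packet(index=n, fine_frame=n, coarse_frames=coarse))
    return packets


packet_bits = FINE_BITS + COARSE_BITS
rate_bps = packet_bits * 1000 / FRAME_MS            # one packet every 22.5 ms
first_packet_delay_ms = (1 + LOOKAHEAD) * FRAME_MS  # four frames must be available

packets = packetize(8)
print(packet_bits, rate_bps, first_packet_delay_ms)
print(packets[0])
```

Because consecutive packets overlap, every frame after the first three is described four times: once at 2.4 kbps and three times at 1.2 kbps.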

Table I: Bit allocation of the MELP 2.4 kbps and MELP 1.2 kbps encoders

                         MELP 2.4 kbps              MELP 1.2 kbps
Sampling frequency       8 kHz                      8 kHz
Frame size               180 samples (22.5 ms)      3 x 180 samples (67.5 ms)
Frame rate               44.44 frames/s             14.8148 frames/s

                         MELP 2.4 kbps        MELP 1.2 kbps
                                                    Mixed modes (UVV, UUV,
Voicing mode             Voiced   Unvoiced    VVV   VUV, VVU, UVU, VUU)     UUU
10 LSFs                  25       25          43    43     39     43        27
Pitch                    7        7           12    12     12     12        12
10 Fourier amplitudes    8        0           8     8      8      8         0
5 bands of voicing       4        0           6     4      4      2         0
2 gains                  8        8           10    10     10     10        10
Flag                     1        0           1     1      1      1         0
Protection               0        13          0     2      6      4         31
Synchronisation          1        1           1     1      1      1         1
Total bits per frame     54       54          81    81     81     81        81

Total bit rate           54 x 44.44 = 2400 bps      81 x 14.8148 = 1200 bps
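
As a sanity check on Table I, the sketch below sums each bit allocation and recomputes the two bit rates exactly. The numbers are transcribed from the table; the dictionary keys `mixed_a`, `mixed_b` and `mixed_c` are placeholder labels introduced here for the three mixed-mode allocations, and the zero entries for the unvoiced 2.4 kbps mode are an assumption where the table gives no value.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 8000

# Field order: LSFs, pitch, Fourier amplitudes, bands of voicing, gains,
# flag, protection, synchronisation.
MELP_2400 = {
    "voiced":   [25, 7, 8, 4, 8, 1, 0, 1],
    "unvoiced": [25, 7, 0, 0, 8, 0, 13, 1],
}
MELP_1200 = {
    "VVV":     [43, 12, 8, 6, 10, 1, 0, 1],
    "mixed_a": [43, 12, 8, 4, 10, 1, 2, 1],
    "mixed_b": [39, 12, 8, 4, 10, 1, 6, 1],
    "mixed_c": [43, 12, 8, 2, 10, 1, 4, 1],
    "UUU":     [27, 12, 0, 0, 10, 0, 31, 1],
}


def bits_per_frame(allocation):
    """Total number of bits used by one frame in a given voicing mode."""
    return sum(allocation)


def bit_rate_bps(bits, samples_per_frame):
    """Exact bit rate: bits per frame times frames per second."""
    return bits * Fraction(SAMPLE_RATE_HZ, samples_per_frame)


totals_2400 = {mode: bits_per_frame(a) for mode, a in MELP_2400.items()}
totals_1200 = {mode: bits_per_frame(a) for mode, a in MELP_1200.items()}
rate_2400 = bit_rate_bps(54, 180)
rate_1200 = bit_rate_bps(81, 3 * 180)

print(totals_2400, rate_2400)
print(totals_1200, rate_1200)
```

Every mode fills the frame exactly: the protection bits absorb whatever the voicing-dependent parameters do not use.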

V. PACKET RECOVERY
Figure 3 shows how our MDC system recovers lost packets. With this scheme, three successive lost packets can easily be recovered. Even a fourth frame can be retrieved by extrapolation [2]. From left to right, the figure shows the loss of a single packet, of 2 packets, of 3 packets and finally of 4 packets.
Explanation:
The first case corresponds to a single packet loss (frame F2). The current packet is lost, so the MELP 2.4 description is not available to provide good speech quality. The system falls back on the previously received packet, which contains information on the lost frame. This frame is then reconstructed with coarse quality using the 81 bits of the MELP 1.2 coder, which describe the three frames F2, F3 and F4.
The second case corresponds to two successive lost packets, namely F4 and F5 (Figure 3). As in the first case, the signal cannot be reconstructed using MELP 2.4, so recovery proceeds from the previous packet that was received correctly.
In the third case, three successive packets corresponding to F7, F8 and F9 are lost. The signal is decoded from the last received packet.
In the last case, four successive packets (F11, F12, F13 and F14) are lost. F11, F12 and F13 are recovered directly from packet No. 10, which contains both frame F10 coded at 2.4 kbps and the following frames F11, F12 and F13 coded at 1.2 kbps. Frame F14 is retrieved using an extrapolation method as in [2].
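
The decision made for each frame in the four cases above can be summarized in a small sketch. It only selects where each frame comes from and does not decode anything; the `recovery_plan` function, its labels (`"fine"`, `"coarse"`, `"extrapolate"`) and the choice of the most recent received packet are assumptions made for illustration.

```python
from typing import List, Tuple

LOOKAHEAD = 3  # packet k also carries frames k+1..k+3 at 1.2 kbps


def recovery_plan(received: List[bool]) -> List[Tuple[str, int]]:
    """For each frame n, return (source, index of the packet used).

    "fine": packet n arrived, decode at 2.4 kbps.
    "coarse": packet n is lost, decode the 1.2 kbps description carried
    by the most recent received packet among n-1, n-2 and n-3.
    "extrapolate": no received packet describes frame n (index is -1).
    """
    plan = []
    for n, ok in enumerate(received):
        if ok:
            plan.append(("fine", n))
            continue
        for k in range(n - 1, max(n - LOOKAHEAD, 0) - 1, -1):
            if received[k]:
                plan.append(("coarse", k))
                break
        else:
            plan.append(("extrapolate", -1))
    return plan


# Fourth case of Figure 3: packets 11 to 14 are lost, packet 10 is received.
received = [True] * 16
for lost in (11, 12, 13, 14):
    received[lost] = False

plan = recovery_plan(received)
for n in range(10, 16):
    print(f"F{n}: {plan[n]}")
```

For this loss pattern the plan matches the description in the text: F11, F12 and F13 come from packet 10 at 1.2 kbps, and F14 has to be extrapolated.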

[Figure 3: timeline of the original speech frames, the packets sent (each carrying the current frame at 2.4 kbps and the next three frames at 1.2 kbps) marked Received or Lost, and the resulting synthetic speech frames decoded at 2.4 kbps, at 1.2 kbps or by extrapolation]

Fig. 3: Process of recovery of lost packets based on the proposed MDC

VI. SIMULATION RESULTS AND EVALUATION


First, we evaluate the performance of the two MELP coders implemented separately; the aim is to quantify the perceptual quality of these coders before they are combined using MDC. A second evaluation is then conducted using MDC.

A. EVALUATION CORPUS
To assess and validate our method, we used a multilingual corpus combining Arabic, French and English. The Arabic part consists of phonetically balanced sentences [7] developed in our laboratory. It contains a total of 60 sentences: 10 sentences spoken by each of 3 male and 3 female speakers. For French and English, we used the well-known phonetically balanced texts "La bise et le soleil" and "The wind and sun".

B. EVALUATION OF CODERS
The performance of the two MELP coders implemented separately was evaluated using ITU-T (International Telecommunication Union) Recommendation P.862 [8], known as PESQ (Perceptual Evaluation of Speech Quality). This recommendation describes an objective method for predicting the subjective quality of telephony and of voice coders. It is intended to evaluate the influence of factors such as packet loss, variable delay and distortion due to channel errors, which are poorly evaluated by conventional methods. PESQ compares a reference (original) signal with the degraded signal obtained by coding and synthesizing this reference or by transmitting it. The results are shown in Table II.

To assess the MDC technique in simulation, we applied different packet loss rates using a random loss model [9] and tested the robustness of the MELP coder with MDC. We found an improvement in the robustness of the system against packet loss. The objective test results of our simulation are shown in Figure 5. They show that our method, based on a loss concealment technique for VoIP applications, significantly raises quality for loss rates of up to 30%. These results show that multiple description coding provides a significant improvement in speech quality, especially as the loss rate increases. Figures 6, 7 and 8 give sample results showing the reconstructed signal when one, two and three consecutive frames were lost, respectively. We observe that the lost frames are corrected in each case of packet loss. In voiced regions, the voicing is preserved after correction.
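
One way to see why the scheme remains effective at high loss rates is to compute how often each decoding path is used. The sketch below is not taken from the paper: it assumes that packets are lost independently with probability p, ignores the first frames of a stream, and uses a hypothetical function name `frame_outcomes`. A frame is decoded at 2.4 kbps when its own packet arrives, at 1.2 kbps when its packet is lost but at least one of the three previous packets arrived, and must be extrapolated only when four packets in a row are lost.

```python
def frame_outcomes(p: float, lookahead: int = 3):
    """Return the probabilities (fine, coarse, extrapolated) for one frame."""
    fine = 1.0 - p                          # own packet received
    coarse = p * (1.0 - p ** lookahead)     # own packet lost, an earlier one received
    extrapolated = p ** (lookahead + 1)     # own packet and all three earlier ones lost
    return fine, coarse, extrapolated


for loss in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30):
    fine, coarse, extrapolated = frame_outcomes(loss)
    print(f"loss {loss:.0%}: fine {fine:.4f}, "
          f"coarse {coarse:.4f}, extrapolated {extrapolated:.4f}")
```

Under these assumptions, even at a 30% loss rate fewer than 1% of frames have no description available in any received packet.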
VII. CONDUCT OF TESTS
This section presents the simulations used to perform our tests. We simulated various losses to introduce degradation into the synthetic signal. These losses were generated randomly using a function that follows a uniform distribution.
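
A minimal sketch of such a loss simulation is given below. The exact procedure used for the tests is not specified beyond the uniform distribution, so the `simulate_loss` function, its seed parameter and the thresholding of a uniform draw are assumptions made here.

```python
import random
from typing import List


def simulate_loss(num_packets: int, loss_rate: float, seed: int = 0) -> List[bool]:
    """Return one flag per packet; True means the packet is received."""
    rng = random.Random(seed)
    # A packet is dropped when a uniform draw in [0, 1) falls below loss_rate.
    return [rng.random() >= loss_rate for _ in range(num_packets)]


mask = simulate_loss(num_packets=10000, loss_rate=0.30, seed=1)
observed = mask.count(False) / len(mask)
print(f"observed loss rate: {observed:.3f}")
```

Fixing the seed makes a test run repeatable, so the same loss pattern can be applied to the coder with and without MDC.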
Table II: Results of objective tests of two MELP coders

Coder             PESQ
MELP 2.4 kbps     3.20
MELP 1.2 kbps     2.71

[Figure 4: block diagram with the blocks Reference signal, Coder, Simulation of loss, Decoder, Signal with loss, Synthetic signal corrected by MDC, and PESQ Evaluation]

Fig. 4: Results of objective tests of two MELP coders

[Figure 5: PESQ score versus packet loss (%) for the original signal, MELP, MELP with MDC and G.729]

Fig. 5: Results of objective tests using PESQ

[Figure 6: three panels of amplitude versus number of samples (x 10^4): Original signal; MELP signal at 2.4 kbps with packet loss; MELP signal with MDC]

Fig. 6: Sample output showing the correction made by the MDC

[Figure 7: three panels of amplitude versus number of samples (8000 to 10000): Original signal; Synthetic signal at 2.4 kbps with loss; Synthetic signal with MDC]

Fig. 7: Sample output showing the correction made by the MDC

[Figure 8: three panels of amplitude versus number of samples (8000 to 10000): Original signal; Synthetic signal at 2.4 kbps with loss; Synthetic signal with MDC]

Fig. 8: Sample output showing the correction made by the MDC

VIII. CONCLUSION

In this work, we have presented an original method using two harmonic MELP coders. The first encodes the speech transmitted over an IP network at 2.4 kbps. The second, operating at 1.2 kbps, is added to the first in the same packet in order to compensate for lost packets, using a technique called multiple description coding (MDC). The results of our simulations for a VoIP application show that this concealment method enhances quality for loss rates of up to 30%. The proposed concealment system is effective and provides a significant improvement in speech quality, especially when packet loss exceeds 15%. The redundancy added by MDC maintains good speech quality over the range of packet loss rates tested. In this study, we obtained an encoder that operates at 135 bits per 22.5 ms frame, corresponding to a total rate of 6 kbps. We therefore recommend this solution as an advantageous replacement for the CELP G.729 standard currently used in VoIP applications, which has a bit rate of 8 kbps without MDC.

REFERENCES
[1] M. Rui and F. Labeau, "Error-Resilient Multiple Description Coding," Proceedings of the IEEE, vol. 56, no. 8, 2008.
[2] E. Orozco, E. Orozco, and A. M. Kondoz, "Multiple Description Coding for Voice over IP using Sinusoidal Speech Coding," in Proc. of IEEE-ICASSP, 2006.
[3] McCree, K. Truong, E. B. George, T. P. Barnwell, V. Viswanathan, "A 2.4 kbits/s MELP Coder Candidate for the New U.S. Federal Standard," in Proc. of IEEE-ICASSP, 1996.
[4] T. Wang, K. Koishida, V. Cuperman and A. Gersho, "A 1200 bps Speech Coder Based on MELP," in Proc. IEEE Inter. Conf. Acoustics, Speech and Signal Processing, 2000.
[5] A. Nagle, "Enrichissement de la Conférence audio en Voix sur IP au travers de l'amélioration de la qualité et de la spatialisation sonore," Paris Tech, France, 2008.
[6] L. Ouakil et G. Pujolle, "Téléphonie sur IP," Edition groupe Eyrolles, 2008.
[7] M. Boudraa, B. Boudraa, B. Guérin, "Twenty lists of Ten Arabic Sentences for assessment," Acustica, vol. 86, pp. 870-882, 2000.
[8] ITU-T, "Perceptual evaluation for speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs," Recommendation P.862, International Telecommunications Union, 2001.
[9] M. Yajnik, S. Moon, J. Kurose, and D. Towsley, "Measurement and modeling of the temporal dependence in packet loss," in INFOCOM 99, Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Proceedings, IEEE, vol. 1, pp. 345-352, 1999.