
Final DRAFT

Key themes: T1,T3,T4

THE ADAPTIVE MULTI-RATE SPEECH CODER - THE NEW FLEXIBLE WORLD-STANDARD FOR VOICE COMPRESSION
Erik Ekudden*, Stefan Bruhn*, and Patrik Sörqvist**
{erik.ekudden, stefan.bruhn, patrik.sorqvist}@ericsson.com
*Ericsson Research, Ericsson Radio Systems AB, SE-164 80 Stockholm, Sweden
**Datacom Networks & IP Services, Ericsson Telecom AB, SE-126 25 Stockholm, Sweden

ABSTRACT
In this paper, we describe the recently standardised Adaptive Multi-Rate (AMR) speech coder and its implementation and performance in cellular systems such as GSM and 3G UMTS/WCDMA, in fixed circuit switched networks, and in IP-based networks. The coder is a multi-rate ACELP coder with 8 modes operating at bit rates from 12.2 kbit/s down to 4.75 kbit/s. In addition, an efficient, high quality source controlled rate functionality is specified, which lowers the average source rate by means of voice activity detection. The wide range of bit rates and the high speech quality make it suitable not only for cellular applications, where the rate can be controlled based on e.g. radio channel quality or cell load, but also for fixed network voice trunking applications and high quality voice over IP (VoIP) applications.

1. INTRODUCTION

The speech service, and its extensions to realtime multimedia, are important applications for network operators. Thus, there is a continued push for increased quality and capacity as technology advances and new transmission techniques emerge[1]. For the GSM system, ETSI conducted a feasibility study for next generation speech services in 1996. The goal was to provide wireline quality in the halfrate traffic channel and highly error robust operation in the fullrate traffic channel. The result of the study was an Adaptive Multi-Rate concept, in which the speech coder bit rate is continuously adapted to radio channel conditions, since no fixed rate solution would meet all the requirements. After a series of subjective tests, the AMR coder was selected in October 1998. Subsequently, AMR was adopted by 3GPP as the mandatory speech coder for UMTS/IMT-2000.

Traditionally, voice compression in circuit switched networks has been used on a per link basis. When low bandwidth links are connected, transcoding is applied for each link. These additional transcoding stages degrade quality significantly for low rate speech coders. In addition to the higher coding distortion, the transmission delay is usually increased, which further reduces the overall quality. Moreover, transmission capacity in the network is wasted. With all-digital networks, end-to-end coding from one phone (end-point) to another, i.e. tandem-free operation, is now a realistic goal for increased quality and transmission efficiency.

The outline of the paper is as follows. Section 2 discusses transmission network aspects, and Section 3 the basic building blocks of the AMR speech coder. Section 4 gives example applications, Section 5 provides performance data, and Section 6 the conclusions.

2. TRANSMISSION NETWORKS

To fully optimise the connections in terms of quality and capacity, the end-to-end transmission should be considered. Fig. 1 gives example connections, including calls from a mobile via IP transport to either PSTN/ISDN, a LAN PC phone or a second mobile network. It shows that several interconnected networks are likely to be involved even for short distance (local) calls.

[Diagram labels: GW, IP Network, PSTN/ISDN]

Figure 1. Network scenario involving GWs for voice transcoding

In Fig. 1, the need for media gateways (GWs), with the additional delay, complexity, cost and loss of quality incurred by transcoding, would be significantly reduced if the intermediate transport carried the compressed voice as far as possible through the transport networks.

In wireless environments, speech coders must withstand high error rates caused by interference, fading, etc. in order to obtain high network capacity and still provide the wireline speech quality expected by customers. As a result, speech coders designed for cellular networks are also well suited for e.g. fixed networks and IP networks with high jitter and packet loss rates. With the demand for higher capacity and lower transmission cost, networks are more often operated at a point with relatively poor network quality. Only robust speech coders such as the AMR coder can provide satisfactory performance under such network conditions.

3. AMR SPEECH CODING

AMR speech coding is specified in GSM 06.90 together with the related voice activity detector (VAD), source controlled rate system (SCR), and error concealment (ECU) of lost frames. The AMR coder can operate at 8 different source rates for speech, given in Table 1.

Table 1

Mode        Rate (bits/s)   Remarks
AMR-12.2    12200           GSM EFR 06.60
AMR-10.2    10200
AMR-7.95    7950
AMR-7.40    7400            TDMA EFR IS-641
AMR-6.70    6700            PDC-EFR RCR 27H
AMR-5.90    5900
AMR-5.15    5150
AMR-4.75    4750
AMR-SID     220*

Note: * Approximate average rate during non-speech
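
As a quick check on Table 1, each source rate corresponds to a fixed number of coded bits per speech frame. The sketch below assumes the 20 ms frame size given in Section 3.1; the dictionary and function names are illustrative and not part of the standard.

```python
# Illustrative only: bits per 20 ms frame for each AMR speech mode in Table 1.
FRAME_MS = 20  # frame size stated in Section 3.1

AMR_RATES_BPS = {
    "AMR-12.2": 12200,
    "AMR-10.2": 10200,
    "AMR-7.95": 7950,
    "AMR-7.40": 7400,
    "AMR-6.70": 6700,
    "AMR-5.90": 5900,
    "AMR-5.15": 5150,
    "AMR-4.75": 4750,
}


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of coded bits in one frame at the given source rate."""
    return rate_bps * frame_ms // 1000


for mode, rate in AMR_RATES_BPS.items():
    print(f"{mode}: {bits_per_frame(rate)} bits per frame")
```

For example, the 12.2 kbit/s mode carries 244 bits per frame and the 4.75 kbit/s mode carries 95 bits per frame.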

3.1 Speech Encoder and Decoder

The AMR coder is a scalable Multi-Rate Algebraic CELP (MR-ACELP) coder[2], capable of seamless switching between any of the 8 bit rates every frame. The frame size is 20 ms and the lookahead 5 ms, giving a total algorithmic delay of 25 ms. Interworking with existing high quality codecs is ensured since three of the modes are existing state-of-the-art coders. The 12.2 kbit/s mode is equivalent to the GSM Enhanced Fullrate (EFR) coder, the 7.40 kbit/s mode is the EFR coder for the IS-136 system, and the 6.70 kbit/s mode is the EFR coder for the Japanese PDC system. The relatively low complexity of the algorithm, typically less than 15 MIPS in a DSP for encoder and decoder, and less than 10% of a Pentium PC, allows cost effective implementations.

3.2 Error Concealment

The ECU algorithm is integrated in the decoder and uses a state-machine structure with extrapolation and gradually attenuated output when consecutive frames are erased. A feature of the algorithm is its source signal dependent actions, which provide enhancements for speech in background noise. The ECU is designed to handle both detected frame losses and non-detected bit errors in the least significant bits of the frame. Hence, the coder provides high performance for circuit and packet switched connections with FER and/or BER.

3.3 Source Controlled Rate

The coder includes an SCR scheme to lower the average source rate by detecting speech pauses and encoding non-speech segments with a lower rate. During non-speech, every 8th frame is encoded, giving an average rate of approximately 220 bits/s. For typical conversations the activity factor ranges from 35% to 80%, with an average somewhat below 50% depending on the situation. The average AMR source rate is thus reduced to 45%-55% of the maximum rate used, with maintained speech quality. With the lowest mode, the average rate may therefore be as low as 2.2 - 2.8 kbit/s.
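
The average-rate figures above follow from weighting the speech rate and the non-speech rate by the activity factor. The sketch below is a rough check only. It assumes that the activity factor is the fraction of time coded as speech and that non-speech costs the approximate 220 bits/s stated above; the function name is hypothetical.

```python
def average_source_rate(speech_rate_bps, activity, sid_rate_bps=220.0):
    """Long-term average source rate with SCR.

    activity is the fraction of time coded as speech (0..1); the remainder
    is assumed to be coded at the approximate non-speech rate.
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be between 0 and 1")
    return activity * speech_rate_bps + (1.0 - activity) * sid_rate_bps


# Lowest speech mode (4.75 kbit/s) at activity factors of 45% and 55%.
for activity in (0.45, 0.55):
    avg = average_source_rate(4750, activity)
    print(f"activity {activity:.0%}: {avg / 1000:.2f} kbit/s")
```

Under these assumptions the lowest mode averages about 2.3 and 2.7 kbit/s, which is consistent with the 2.2 - 2.8 kbit/s range quoted in the text.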
The VAD is designed to provide low activity factors without clipping the speech, while still detecting complex signals such as music-on-hold to avoid disturbing switching effects[5].

3.4 External Rate Control

The rate can be changed on a frame basis. This feature can be used to control the rate externally, either based on static configuration parameters or dynamically during a call. The rate is controlled via an inband control channel running between the end-points. The inband channel has two functions. In the forward direction, it is used to indicate the presently used mode, while in the backward direction it is used to signal requests for mode changes in the opposite direction. Requests can be issued by end-points as well as by network entities.
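
The two functions of the inband channel can be illustrated with a toy model. This is a sketch under stated assumptions, not the standardised signalling format: the class names, field names and the rule of ignoring requests outside the configured mode set are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InbandInfo:
    mode_indication: str  # mode used for speech sent in this direction
    mode_request: str     # mode the sender asks the peer to use in the opposite direction


class EndPoint:
    """Hypothetical end-point holding a configured mode set."""

    def __init__(self, name, mode_set, initial_mode):
        if initial_mode not in mode_set:
            raise ValueError("initial mode must be in the mode set")
        self.name = name
        self.mode_set = list(mode_set)
        self.tx_mode = initial_mode         # mode this end-point encodes with
        self.wanted_rx_mode = initial_mode  # mode it would like to receive

    def send(self):
        """Inband information accompanying an outgoing speech frame."""
        return InbandInfo(self.tx_mode, self.wanted_rx_mode)

    def receive(self, info):
        """Apply the peer's request if the mode is in the configured set (assumption)."""
        if info.mode_request in self.mode_set:
            self.tx_mode = info.mode_request


modes = ["AMR-12.2", "AMR-7.40", "AMR-5.90"]
mobile = EndPoint("mobile", modes, "AMR-12.2")
network = EndPoint("network", modes, "AMR-12.2")

# The network side asks the mobile to encode at a lower rate.
network.wanted_rx_mode = "AMR-5.90"
mobile.receive(network.send())
print(mobile.send().mode_indication)
```

After the request is received, the mobile's next frame indicates the requested mode in the forward direction, while the network's own transmit mode is unchanged.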

4. APPLICATIONS

Examples using AMR in GSM, UMTS/WCDMA and IP networks are given below. Other AMR applications are expected in the future.

4.1 GSM SYSTEMS

The AMR circuit switched speech service is defined for both halfrate (HR) and fullrate (FR) traffic channels. In the FR channel, a sub-set of up to 4 of the 8 modes can be used at any time. In HR, a sub-set of 4 of the 6 lowest modes may be used. During a call, the channel quality varies significantly due to fading, interference variations and path loss variations. These variations cannot be compensated for by the slow power control. However, the fast AMR mode adaptation (less than 150 ms latency) is able to track many of these changes. Hence, a high source rate is used when the channel is good, and a lower source rate (with a correspondingly higher channel coding bit rate) is used when the quality of the channel gets worse [3,4], see Fig. 2.

Figure 2. The Multi-Rate trade-off using 3 modes (axes: speech quality versus channel quality)
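
The mode-set rules for the FR and HR channels described above can be expressed as a simple check. The sketch below is illustrative: it reads the HR rule as "up to 4 of the 6 lowest modes", which is an assumption, and the function names are hypothetical.

```python
# AMR speech modes ordered from lowest to highest rate (see Table 1).
AMR_MODES = [
    "AMR-4.75", "AMR-5.15", "AMR-5.90", "AMR-6.70",
    "AMR-7.40", "AMR-7.95", "AMR-10.2", "AMR-12.2",
]

MAX_ACTIVE_MODES = 4  # at most 4 modes in a mode set, as described in the text


def allowed_modes(channel):
    """Modes that may appear in a mode set for the given GSM traffic channel."""
    if channel == "FR":
        return set(AMR_MODES)      # any of the 8 modes
    if channel == "HR":
        return set(AMR_MODES[:6])  # only the 6 lowest modes
    raise ValueError(f"unknown channel type: {channel!r}")


def is_valid_mode_set(mode_set, channel):
    """Check a candidate mode set against the FR/HR rules."""
    modes = set(mode_set)
    return 1 <= len(modes) <= MAX_ACTIVE_MODES and modes <= allowed_modes(channel)


candidate = ["AMR-12.2", "AMR-7.95", "AMR-5.90"]
print(is_valid_mode_set(candidate, "FR"))
print(is_valid_mode_set(candidate, "HR"))
```

The candidate set is accepted for FR but rejected for HR, because the 12.2 kbit/s mode is not among the 6 lowest modes.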

The configurability of the AMR speech service, in terms of used mode sets and adaptation logic, gives network operators full flexibility to tailor the service to the network characteristics or to other operator specific needs, such as capacity or highest possible quality.

4.1.1 Channel Coding and Link Adaptation

Unequally punctured convolutional codes are used, providing optimised error protection for each mode. The inband channel is protected with a block code, and has a lower error rate than the speech FER. The link adaptation is based on receiver link quality estimates, and allows the receiver to request the most suitable mode by a return link request[4]. The link quality estimation scheme is open, but suitable measures are C/I estimates, channel BER estimates or FER estimates. An example solution using C/I measurements is standardised. The adaptation is based on comparing the measurements to thresholds. Hysteresis is applied to avoid too frequent mode switching.
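
Threshold comparison with hysteresis can be sketched as follows. This is a minimal illustration of the general technique and not the standardised example solution: the class name, the threshold values and the hysteresis value are invented for the example.

```python
class ModeAdapter:
    """Receiver-side mode selection from C/I estimates (illustrative sketch).

    modes are ordered from most robust (lowest source rate) to highest rate.
    thresholds_db[i] is the C/I level separating modes[i] and modes[i + 1].
    Switching up requires the threshold plus hysteresis_db; switching down
    happens as soon as C/I falls below the threshold.
    """

    def __init__(self, modes, thresholds_db, hysteresis_db):
        if len(thresholds_db) != len(modes) - 1:
            raise ValueError("need one threshold between each pair of modes")
        self.modes = list(modes)
        self.thresholds_db = list(thresholds_db)
        self.hysteresis_db = hysteresis_db
        self.index = 0  # start in the most robust mode

    def update(self, ci_db):
        """Return the mode to request for a new C/I estimate in dB."""
        # Step up while C/I clears the next threshold by the hysteresis margin.
        while (self.index < len(self.thresholds_db)
               and ci_db > self.thresholds_db[self.index] + self.hysteresis_db):
            self.index += 1
        # Step down while C/I is below the threshold under the current mode.
        while self.index > 0 and ci_db < self.thresholds_db[self.index - 1]:
            self.index -= 1
        return self.modes[self.index]


# Hypothetical thresholds (dB) for a three-mode set, as in Fig. 2.
adapter = ModeAdapter(["AMR-4.75", "AMR-6.70", "AMR-12.2"], [7.0, 11.0], 2.0)
for ci in (8, 10, 8, 6, 20):
    print(ci, adapter.update(ci))
```

With these invented values, a C/I of 8 dB selects the lowest mode when approached from below but keeps the middle mode when approached from above, which is the effect of the hysteresis.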

4.2 UMTS / WCDMA SYSTEMS

The AMR speech coder with its SCR system is identical for UMTS and GSM. The compressed voice stream is thus fully interoperable between the two systems. The differences for the service are that for UMTS, a more generic radio access bearer is used, and that the inband channel has a different realisation. Additional flexibility is given since all 8 rates are usable at any time.

4.2.1 Maximum Rate Control

For wideband wireless systems with fast power control, such as WCDMA, the radio channel quality variations are generally significantly lower than for e.g. GSM. The rate control therefore has a different purpose. The main usage is as a relatively slow maximum rate control function to i) increase system capacity under high network load conditions, e.g. at certain times of day, in a geographical area, or dynamically based on measured network load, and ii) increase quality for MSs with power limitations at the cell border. In tandem-free connections with GSM, the system still accepts rapid mode changes, initiated by the GSM system, between modes below the maximum allowed rate.

4.3 IP-BASED NETWORKS

IP can be used over various networks to gain higher flexibility and save cost, and the concept is applicable to both wired and wireless IP networks. Voice may be transported over RTP/UDP/IP as a single media stream in a dedicated configuration. However, it is common to use existing general multimedia protocols, such as H.323 and SIP. For this purpose, and also to support AMR as part of H.324/H.324M, AMR capability is currently being standardised in the control protocol H.245.

4.3.1 Rate Control

The quality of IP networks varies significantly, and applications should provide reasonable quality also under packet loss conditions. The robustness of the AMR coder to lost frames makes it well suited to such networks. The multiple rates of AMR can be used by operators to increase capacity in high traffic networks, or to differentiate quality between user groups (business and private), e.g. by coupling pricing and user rate. These rate changes are expected to be relatively slow. More rapid rate changes may be beneficial under severe congestion conditions, when the rate can be lowered to reduce the required bandwidth. This may typically be initiated by an end-point which detects high packet loss or increased delay and jitter.
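
The combination of a slowly changing maximum rate and a faster end-point reaction to congestion can be sketched as below. This is an illustration only: the loss thresholds, the one-step-at-a-time policy and the function name are assumptions, since the text does not specify an adaptation algorithm for IP networks.

```python
# AMR speech modes in kbit/s, ordered from lowest to highest rate (Table 1).
AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]


def next_mode_index(current, loss_fraction, max_index,
                    high_loss=0.05, low_loss=0.01):
    """Mode index to request for the next reporting interval.

    current       index of the mode in use
    loss_fraction packet loss measured by the end-point (0..1)
    max_index     highest mode allowed by the operator (slow maximum rate control)
    high_loss and low_loss are hypothetical thresholds for stepping down or up.
    """
    if not 0 <= current < len(AMR_MODES_KBPS):
        raise ValueError("current mode index out of range")
    ceiling = min(max_index, len(AMR_MODES_KBPS) - 1)
    if loss_fraction > high_loss and current > 0:
        target = current - 1   # congestion: lower the required bandwidth
    elif loss_fraction < low_loss and current < ceiling:
        target = current + 1   # network looks healthy: try a higher rate
    else:
        target = current
    return min(target, ceiling)  # never exceed the operator's maximum


index = 7  # start at 12.2 kbit/s
for loss in (0.00, 0.08, 0.12, 0.03, 0.00):
    index = next_mode_index(index, loss, max_index=7)
    print(f"loss {loss:.0%}: request {AMR_MODES_KBPS[index]} kbit/s")
```

Lowering `max_index` models the operator-side maximum rate control, while the loss-driven steps model an end-point reacting to congestion.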

5. PERFORMANCE

The performance has been assessed for clean speech, speech with background noise, channel impairments, tandem connections, input level variations, etc. The conclusion from the extensive testing is that the higher modes provide quality equivalent to, or exceeding that of, wireline G.726 at 32 kbit/s. There is a graceful reduction in quality as the rate is lowered, and at e.g. 5.9 kbit/s, the quality is still at the level of G.723.1 operating at 6.3 kbit/s. The lowest modes provide competitive quality at the level of G.723.1 at 5.3 kbit/s.

Results from MOS tests on GSM FR channels are given in Fig. 3. The increase in C/I tolerance is approximately 6 dB for the lower rates. The increase in quality over the present EFR for dynamically varying channels often exceeds 1 MOS (5-point scale), which is highly significant. In terms of FER, undistorted speech is obtained up to 1% FER, and acceptable quality is obtained for FER up to 10%.

[Left panel: MOS (1.5 to 4.5) versus C/I (No Errors, 16, 13, 10, 7, 4 and 1 dB) for the 12.2, 6.7 and 4.75 modes. Right panel: MOS (1.00 to 4.00) for AMR and EFR under conditions DEC1 to DEC5.]

Figure 3. Left: Performance for C/I from 19 dB (0.5% channel BER) to 1 dB (30% BER) for three modes. Right: Performance for 5 dynamic channel (DEC) conditions.

6. CONCLUSIONS

The main features of the AMR speech coder have been described. The coder provides a flexible toolbox for operators to design high quality, high efficiency voice services for 2G/3G mobile systems as well as fixed and IP-based networks. The use of AMR in 2G/3G mobile networks, as well as in fixed and IP-based networks with full interoperability, enhances quality and at the same time reduces transmission costs significantly. The wide range of bit rates spanned by the 8 rates, from 12.2 kbit/s down to an average rate below 2.5 kbit/s for the lowest rate using SCR, and the robustness to errors allow operators to optimise both QoS and capacity for mobile, fixed and VoIP applications. The AMR coder has been extensively characterised in internationally coordinated tests. The quality for the higher modes exceeds that of wireline G.726 ADPCM at 32 kbit/s, with a graceful reduction of quality for the lower modes. The lowest modes provide competitive communication quality.

7. REFERENCES

[1] T.B. Minde, S. Bruhn, E. Ekudden, and H. Hermansson, "Requirements on Speech Coders Imposed by Speech Service Solutions in Cellular Systems," in Proc. IEEE Workshop on Speech Coding, Pocono, PA, pp. 7-10, 1997.

[2] E. Ekudden, R. Hagen, I. Johansson, and J. Svedberg, "The AMR Speech Coder," in Proc. IEEE Workshop on Speech Coding, Porvoo, Finland, pp. 117-119, 1999.

[3] O. Corbun, M. Almgren, and K. Svanbro, "Capacity and Speech Quality Aspects Using Adaptive Multi-Rate (AMR)," in Proc. PIMRC-98, Boston, MA, 1998.

[4] S. Bruhn, P. Blöcher, K. Hellwig, and J. Sjöberg, "Concepts and Solutions for Link Adaptation and Inband Signaling for the GSM AMR Speech Coding Standard," in Proc. IEEE VTC-99, Houston, TX, pp. 2451-2455, 1999.

[5] A. Vähätalo and I. Johansson, "Voice Activity Detection for GSM Adaptive Multi-Rate," in Proc. IEEE Workshop on Speech Coding, Porvoo, Finland, pp. 55-57, 1999.