
Voice Service Over LTE Networks

(VoLTE) and the Implications for

An Ascom Network Testing White Paper
By Dr. Irina Cotanis and Anders Hedlund


October 2012

NT12-13122., Rev. 1.0

Ascom (2012)
All rights reserved. TEMS is a trademark of Ascom. All other trademarks are the property of their respective holders.


Voice as One of the Many LTE Services
An In-Depth Look Into LTE Voice Service Solutions
    Circuit Switched Fallback (CSFB) Solution
    Voice over IMS (VoLTE) Solution
    Over the Top (OTT) Voice Solution
What Is LTE's Signature on Voice Service QoE?
    Traditional Sources Impacting Voice QoE
        Coding and Error Concealment
        Voice Enhancement Devices (VED)
        Devices (phones) and Clients
        Network-Centric Factors
    Subscriber Experience View on VoIP vs. PSTN Voice Quality
What Does It Take to Ensure High VoLTE QoE?
    Device Implemented Jitter-Buffer and Time Scaling
    Network-Centric View
        Protocol Optimization of the Radio Network
        Physical Layer Optimization
    Customer Experience Perspective and the Significance of On-Device Testing
        QoE Evaluation Metrics
        On-Device Testing
LTE Voice Quality Testing Scenarios
    VoLTE Troubleshooting
        Radio Interface Aspects
        Beyond the Radio Interface
    Circuit Switched Fallback Testing Scenario Results
Conclusions
References


Voice as One of the Many LTE Services

The introduction of 3G networks and the ensuing evolution to 4G/LTE,
combined with the exponential growth of smart devices, have changed the
wireless ecosystem from every perspective: technical, business and the
customer experience.
Advanced wireless technologies and sophisticated signal processing
techniques have unleashed the technical feasibility of communication rates
of 100 Mb/s and beyond. These high bit rates enabled an explosion of a
large variety of mobile broadband services (e.g., voice, data and
multimedia), which inevitably and irrevocably affected the customer mobile
experience, raising subscribers' expectations for fixed-like service delivery,
where mobility is no longer an acceptable excuse for poor quality of
service.
Commercial wireless operators face an impending data tsunami, with
analysts estimating [1] 82.5% smart device penetration (phones, tablets,
notebooks and laptops) and a 78% increase in mobile data traffic
consumption by 2016, while being strapped by limited spectrum and
CAPEX/OPEX constraints.
Additional trends [1] indicate that voice (VoIP) will use only 0.3% of the
mobile spectrum, while mobile video will take 70.5% and mobile data 20%.
So even though revenue from voice services will progressively decline as
voice becomes a commodity, today it still accounts for 70% of carriers' revenue,
and carriers will continue to face the challenge of meeting customers'
expectations for the high-quality voice services they have become used to
on their 2G and 3G networks.
The difficulty of maintaining user expectations of wireline-type voice quality
under spectrum and CAPEX/OPEX constraints is rooted in the convergence
and coexistence of voice, data and multimedia application services
delivered by a complex radio access air interface. This interface involves
multiple factors that produce new types of distortions that dynamically,
variably, and even randomly, affect voice quality.
In addition, the flat, all-IP LTE network, which was designed as an evolution
of HSPA+ (Evolved High Speed Packet Access) data networks, regards
voice service as just one of many data applications, albeit one with specific
requirements for real-time traffic, QoS and interoperation with existing
circuit-switched (CS) voice infrastructure. The delivery of voice services
over the IP Multimedia Subsystem (IMS) technology [2], based on the
Session Initiation Protocol (SIP) [3] that is widely used in fixed line VoIP
networks, also poses additional challenges specific to wireless.
These include unreliable radio connections, application servers for external
application development, international roaming, scalability, and security, all
of which must be addressed to provide high-quality voice services over LTE.
Regardless of these technological challenges, the ultimate goal for any
voice service is to optimally utilize the low latency and quality of service
(QoS) features available within LTE to ensure that the voice service offers
an improvement over the standards available on 2G and 3G networks.



This paper discusses the different LTE voice service solutions as well as
aspects of the key performance evaluation metrics that must be considered
when implementing them. It takes an in-depth look into the challenges that
accompany the delivery of high quality of experience (QoE) LTE voice
services, as well as what is required to cope with these challenges. The
paper concludes with several examples of LTE voice service
troubleshooting that can help carriers efficiently provide voice service at
exigent QoE levels, consequently easing the all-IP migration for the voice
service that still accounts for more than 70% of their revenue.

An In-Depth Look Into LTE Voice Service Solutions


For several years, the 3GPP and other wireless industry forums [4], [5], [6]
evaluated various voice service solutions that could optimally meet the
requirements imposed by the integration of voice within a data-oriented
network, such as LTE. Two 3GPP standardized solutions proved to be
feasible: Circuit Switched Fallback (CSFB) [7] and Voice over Internet
Protocol (VoIP) over IMS (also known as One Voice or Voice over LTE, VoLTE) [8].
CSFB is seen as an interim, transitional solution until IMS technology is
fully deployed for wireless and can reliably offer complete
mobile voice support.
In addition, the deployment of packet switched (PS) voice (VoIP), and the
evolution of smartphones and broadband services, made it possible for
third-party Over the Top (OTT) voice solutions (e.g., Skype and Viber) to be
offered wirelessly over LTE, as well as 3G.


Circuit Switched Fallback (CSFB) Solution

To support voice service similar to 2G and 3G networks, LTE voice service
needs to enable mobile-originated and mobile-terminated voice and video
telephony calls. Therefore, mobile devices with an integrated telephony
client that camp on the LTE radio access network can either originate or
terminate calls by performing a fallback to the legacy 2G/3G network. This
solution is known as Circuit Switched Fallback (CSFB). The user equipment
(UE) will not even camp on LTE unless the core network provides a
suitable voice-capable service such as CSFB. With CSFB, the speech path
of an established call runs over the legacy radio access
technology rather than over LTE. Once a CSFB call has been completed, the
UE moves back to LTE coverage, if available, or continues camping on
the 2G/3G cell. This high-level network architecture is presented in Figure 1.
3GPP Release 8 of the LTE standard already specifies means to fall back
to a circuit-switched voice service in a GSM, WCDMA, or CDMA network, if
available in the same coverage area. The specification also allows for SMS
to be carried along with voice. To achieve the fallback, the CSFB
functionality requires the availability of the SGs interface between the
Mobility Management Entity (MME) and the Mobile Switching Centre (MSC)
server, to enable it to provide circuit-switched paging to the LTE side, as
well as combined Evolved Packet System (EPS) and international mobile
subscriber identity (IMSI) attach and detach. These functionalities involve
updates of network elements such as the MSC as well as the Gs (MSC to
Serving GPRS Support Node, or SGSN) interface. Besides these required
network changes, the terminals need to support the CSFB solution.










Figure 1

By design, the CSFB solution does not allow LTE functionality during voice
calls and interrupts an ongoing data connection. In addition,
it has weak support for multilayer networks (e.g., femto cells). The CSFB
solution also offers only minimal flexibility for integration with the broadband voice
and multimedia services (e.g., presence, instant messaging, content
sharing) defined by the GSMA in the Rich Communication Suite enhanced
(RCS-e) [9] for LTE offerings.
As one would expect, because of the extensive signaling required to set up
the call, the CSFB solution comes with longer call setup times that could
significantly degrade the user experience. The call setup time also
increases with a change to another network. Estimated values show an
increase of 1.5 seconds in call setup time, regardless of call origination.
Some results of a live CSFB scenario are presented in section 5.2.
Therefore, the evaluation of the CSFB solution's performance requires
testing related to registration (e.g., MME translation of the Tracking Area
Identity [TAI] of the LTE domain to the MSC Location Area Identity [LAI] of
the 2G/3G domain) as well as the call setup, the latter potentially having a
significant negative impact on the QoE of the voice service. In addition,
evaluating how much an incoming CSFB call impacts an ongoing user
data session is important in understanding the overall QoE.
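The registration check mentioned above, in which the MME translates an LTE tracking area into a 2G/3G location area, can be pictured as a configured lookup. The following is a minimal Python sketch under stated assumptions: the `AreaId` layout, the table `TAI_TO_LAI`, the identity values, and the helpers `lai_for_tai` and `unmapped_tais` are all hypothetical illustrations, not an operator configuration or a 3GPP message format.

```python
from typing import NamedTuple, Optional


class AreaId(NamedTuple):
    """Simplified area identity: PLMN (MCC+MNC) plus an area code (hypothetical layout)."""
    plmn: str
    code: int


# Hypothetical operator-configured mapping from LTE tracking areas (TAI)
# to 2G/3G location areas (LAI). All values are made up for illustration.
TAI_TO_LAI = {
    AreaId("00101", 0x1001): AreaId("00101", 0x0201),
    AreaId("00101", 0x1002): AreaId("00101", 0x0201),
    AreaId("00101", 0x1003): AreaId("00101", 0x0202),
}


def lai_for_tai(tai: AreaId) -> Optional[AreaId]:
    """Return the location area configured for a tracking area, or None if unmapped."""
    return TAI_TO_LAI.get(tai)


def unmapped_tais(observed):
    """List observed tracking areas that have no configured location area."""
    return [tai for tai in observed if lai_for_tai(tai) is None]
```

In a drive test, tracking areas reported by `unmapped_tais` would be candidates for closer inspection of the registration procedure, since a missing or wrong mapping would be expected to affect CSFB registration and call setup.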


Voice over IMS (VoLTE) Solution

The VoLTE solution initiated by the GSMA [8] is based on IMS technology
as defined by the 3GPP. The high-level architecture is presented in Figure
2. LTE radio access does not support direct connectivity to the circuit-switched
core network and services; rather, the radio is connected to an
Evolved Packet Core (EPC) that provides IP connectivity for the end user
services and interworking toward existing circuit-switched networks. The
connectivity is achieved using the new IMS-based infrastructure: the
Telephony Application Server (TAS), which is designed to ensure seamless
service migration by using the MSC as the direct platform for TAS.















Figure 2 (figure label: Operator's IP, e.g., IMS, PSS)

VoLTE also defines a set of new interfaces (e.g., between the user
equipment and the operator's network, the Home and Visited Network
during roaming, and the networks of the two parties making a call).
On the network side, besides the new infrastructure and interfaces, VoLTE
standardization needs to address a series of functionalities required by the
integration within the LTE and the 2G/3G legacy networks. The subscribers'
requirement for a seamless, anytime and anywhere call makes mobility and
handover to a non-LTE radio access technology (RAT, e.g., GSM, CDMA,
WCDMA) one of the most important functionalities. This is achieved by
using the Single Radio Voice Call Continuity (SRVCC) [10] function. Other
functionalities address optimal routing of bearers for voice calls when
customers are roaming, commercial frameworks and provisioning
capabilities for roaming and interconnect, as well as security and fraud
threat audit to prevent hacking and unauthorized entry into any area within
the network.
On the terminal side, the phone needs to have VoIP client software loaded
to provide the VoLTE functionality, which can be implemented at the
application layer of the phone's protocol stack, in the form of an app. In this
case, the client's features, such as the time scaling of the voice signal used
in jitter-buffer handling, can be controlled and tuned for improved
voice service quality by the phone vendors. The VoLTE functionality also
can be embedded in the phone's chipset, in which case the modem-based
client's features are set by the chipset vendor (allowing less flexibility for
control and tuning). Details on the importance of the time-scaling feature on
the QoE of the voice service are presented in section 3.1.3.
Adopting the IMS-based specifications allows the VoLTE solution to be
integrated with the suite of applications that will become available on LTE
through the IMS core. A variety of services can run seamlessly, rather than
having several disparate applications operating concurrently. The GSMA
defined the Rich Communication Suite (RCS) to run both over LTE
and other networks such as 2G/3G. It covers multimedia services in three
areas: rich address book, rich messaging, and rich call. As part of the rich
call, RCS includes the voice service, regardless of whether it is realized on
circuit-switched or packet-switched networks. When using packet-switched
voice, RCS-e (RCS enhanced) aligns with the packet-switched voice
protocols as described in the GSMA recommendation [8], making VoLTE
the base for packet-switched voice in RCS. In this way, RCS-e and VoLTE
jointly provide a comprehensive set of communication services for the LTE
environment, from basic voice to a full set of rich multimedia services [22].
At a high level, the implementation of the VoLTE solution as well as its
functionality might appear straightforward. However, there are many
network- and terminal-related issues that are expected to impact VoLTE
QoE, especially the vagaries of the radio access network where time delays
and propagation anomalies add considerably to the complexity of delivering
high-quality voice services.
Radio access aspects of VoLTE are mainly related to the Radio Link
Control (RLC) mode functionality (acknowledged (ACK) or unacknowledged
(U-ACK) mode) as well as the handover performance, especially for the
SRVCC operation. The key role in achieving a high VoLTE QoE is played
by the optimization of the VoIP protocol stack configuration, such as the
scheduling scheme (e.g., semi-persistent scheduling), the VoIP bearer QoS
Class Identifier (QCI, e.g., QCI = 1), and the usage of Transmission
Time Interval (TTI) bundling, which ensures a more continuous transmission and,
therefore, shorter end-to-end delay. Non-RF-related aspects of VoLTE
pertain to the terminals, such as client implementation and jitter-buffer
handling, as well as issues related to the voice enhancement devices
(VEDs), such as echo and gain control (see section 3.1). All of these
represent potentially serious impediments to delivering and maintaining
good VoLTE QoE.
Details regarding these topics are presented in section 4.1, and a
demonstration case for troubleshooting problems related to these aspects
is in section 5.1. Section 3.1 describes in more detail the impact of terminal
and network performance on VoLTE service QoE.


Over the Top (OTT) Voice Solution

Similar to VoLTE, the third-party OTT solution relies on new infrastructure
(Telephony Application Server) and, therefore, comes with a series of
attractive advantages. These include being free from MSC legacy continuity
and IMS complexity, as well as being technically viable on LTE and
WCDMA networks. Integration with presence, support for nontraditional
voice apps, and the availability of app stores ensuring an easy user
installation, represent very attractive features which rival the ones offered
by RCS-e.
However, the OTT solution comes with important technical challenges that
can significantly impact the subscriber's voice QoE.
First, unlike VoLTE, the OTT solution does not benefit from a voice
dedicated bearer and VoIP optimized protocol stack. Therefore, it is
expected that OTT voice will be delivered on a non-Guaranteed Bit Rate
(GBR) bearer with something like QoS Class Identifier (QCI) = 7 [11]. It
will more likely be served by a dynamic scheduling scheme that allocates radio
resources for each transmission depending on the radio conditions and
load, instead of an optimized VoIP scheduler (e.g., semi-persistent) and
dedicated bearer (e.g., QCI = 1 [11]). Third-party voice service providers have
no control over these QoS aspects in the wireless network, and thus they
cannot ensure a good QoE under all load situations. This issue could
possibly be resolved by installing logic in the network that would ensure
QoS for data streams that are recognized as belonging to an external voice
service to which the user is subscribed. However, standardization work is
needed on this topic.
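The difference between a dedicated VoLTE bearer (QCI 1) and the default-style bearer an OTT call is likely to ride on (QCI 7) can be made concrete with a small lookup. This is a minimal Python sketch, not a scheduler: the `QciProfile` type and both helper functions are hypothetical, and the characteristic values are the ones commonly quoted from the 3GPP QCI table referenced as [11]; they should be treated as assumptions and checked against the specification release in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QciProfile:
    qci: int
    guaranteed_bit_rate: bool   # True for GBR bearers
    priority: int               # lower value means higher scheduling priority
    delay_budget_ms: int        # packet delay budget
    packet_loss_rate: float     # packet error loss rate target


# Assumed values, as commonly quoted from the standardized QCI table [11].
QCI_PROFILES = {
    1: QciProfile(1, True, 2, 100, 1e-2),    # conversational voice (VoLTE)
    7: QciProfile(7, False, 7, 100, 1e-3),   # non-GBR bearer an OTT call may use
}


def is_protected_under_load(qci: int) -> bool:
    """A GBR bearer keeps its bit rate under load; a non-GBR bearer does not."""
    return QCI_PROFILES[qci].guaranteed_bit_rate


def served_first(qci_a: int, qci_b: int) -> int:
    """Return the QCI that a strict-priority scheduler would serve first."""
    return min((qci_a, qci_b), key=lambda q: QCI_PROFILES[q].priority)
```

Under these assumptions both bearers share the same delay budget, so the practical gap for OTT voice comes from the missing bit-rate guarantee and the lower priority when the cell is loaded.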
Second, third-party OTT calls cannot be handed over to a circuit-switched
2G/3G network when a user leaves the LTE coverage area since the
external applications cannot easily be tied into the wireless network.
Therefore, testing OTT solution performance requires a careful analysis of
the aforementioned QoS aspects that could possibly generate poor or even
unacceptable QoE. In addition, the understanding of the OTT voice QoE
requires evaluation of the behavior and performance of OTT clients that
embed proprietary error concealment schemes and adaptive buffering
techniques on the subscriber's device.

What Is LTE's Signature on Voice Service QoE?

Traditionally, mobile voice service QoE is known to be impacted by the
performance of codecs, voice enhancement devices (VEDs) that are both
network- and terminal-implemented, user device characteristics, and, last
but not least, the network.
During the past decade, the following trends caused the rapid technology
evolution from 2G to today's 4G/LTE networks:

Maintaining and improving wireless voice service QoE

Increasing voice capacity

Minimizing CAPEX/OPEX costs

Adding many more wireless services, e.g., data, Web, video

As expected, technical complexity along with the constraints and
requirements that come with the 4G wireless ecosystem bring with them
more challenges for delivering, assuring and maintaining voice service
QoE, both for circuit-switched and packet-switched networks. Sophisticated
speech processing techniques implemented in new codecs, voice
enhancement devices and terminals have been designed to cope with
complex network conditions (e.g., radio conditions, traffic load). However,
the complexity of the signal processing made codecs and VEDs more
vulnerable to performance artifacts, and therefore made them more
susceptible to significant degradation of voice service QoE.



Therefore, understanding the evolution of the speech technology within the
4G/LTE network environment is very important for the evaluation and
control of the voice service's quality. Also, understanding the interaction
between different components impacting the voice QoE is significant for
detecting and troubleshooting their individual contribution to the overall
perceived quality. This section discusses aspects related to the speech
technology and to the network evolution that impact the voice QoE.


Traditional Sources Impacting Voice QoE


Coding and Error Concealment

Coding and error concealment techniques are moving quickly from
narrowband (NB) voice to high-definition (HD) quality, wideband (WB) and
even super WB (SWB) for some OTT voice applications such as Skype and
Google Talk. New codecs supporting these bandwidths and high bit rates,
whether standardized (like Adaptive Multi-Rate Wideband+, Enhanced
Variable Rate Codec, Enhanced Voice Services, Advanced Audio Coding) or
proprietary like SILK (Skype), need to offer a broad range of
compression levels while ensuring high speech quality in error-free
conditions. However, a high level of compression removes almost all of the
redundancy in the speech signal, which in turn leaves voice quality
sensitive to transmission errors. Although complex error concealment
schemes are implemented in these codecs to reconstruct the signal at the
receiving side, they are prone to errors, especially in high transmission error
environments such as the OFDM radio environment of LTE radio access
technology. In these cases, the reconstructed speech frames can exhibit an
artificial, robotic sound.
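To illustrate why concealed frames can sound artificial, here is a deliberately naive Python sketch of one textbook concealment idea: repeat the last good frame with a decaying gain. It is an assumption-laden illustration, not the scheme used by any of the codecs named above, whose concealment is far more elaborate; the function name `conceal`, its parameters, and the frame representation are hypothetical.

```python
def conceal(frames, frame_len, attenuation=0.5):
    """Replace lost frames (None) by a faded repeat of the last good frame.

    frames: list of sample lists, with None marking a lost frame.
    frame_len: samples per frame, used for silence before any good frame arrives.
    attenuation: gain multiplier applied per consecutive lost frame.
    """
    output = []
    last_good = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last_good = frame
            gain = 1.0                      # reset the fade after a good frame
            output.append(list(frame))
        elif last_good is not None:
            gain *= attenuation             # fade further on each consecutive loss
            output.append([s * gain for s in last_good])
        else:
            output.append([0.0] * frame_len)  # nothing to repeat yet: play silence
    return output
```

Repeating the same waveform over several consecutive losses is one plausible source of the periodic, robotic quality described above, which is why burst losses tend to be more audible than isolated ones.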

Voice Enhancement Devices (VED)

Voice enhancement devices, or VEDs [12], [13], such as noise reduction
(NR), automatic gain control (AGC), and echo cancellation (EC), are
designed to maintain, and even increase, the voice quality in conditions
prone to noise, level variation, or echo. However, they come with their own
unique set of challenges that impact QoE. Noise reduction techniques that
do not properly balance between noise and speech periods, or that
completely remove the natural background noise, can drastically degrade
voice quality. Similarly, AGCs that are too aggressive or too slow can
deteriorate the speech signal, resulting in a further perception of
degradation. An LTE voice service that offers superior speech bandwidths,
such as wideband and super wideband, is more sensitive to noise reduction
and AGC design. This is a result of the fact that speech degradations within
larger bandwidths are more acutely perceived than those in narrowband
scenarios. Acoustic and hybrid echo cancellers are designed for
attenuations up to -45 dB and for delays longer than 200 ms, values well
known to be perceived as annoying by subscribers. The high non-linearity
and time variance of the LTE radio environment challenges the echo
estimation and, thereby, can result in an annoying echo being neither
removed nor compensated. In addition, specific scenarios of VoIP could
exhibit a larger range of delays and/or attenuations than the ones the echo
canceller was designed to intervene with and compensate for.

Devices (phones) and Clients

Commonly used phones impact the voice quality due to time-variant linear
distortions, such as spectral shaping, and/or non-linear distortions like
microphone and transducer interfaces and reverberations caused by
hands-free set-ups at acoustical interfaces. Today's smartphones designed
for HD voice, and with technologically advanced acoustical interfaces, are
expected to have less impact on voice quality. However, LTE requirements
for high bandwidth efficiency (to support a multitude of data and multimedia
services while coexisting with voice delivered on PS IMS support) drove
the necessity of adaptive buffering schemes. These buffering schemes can
use various time-scaling or speech-frequency re-sampling algorithms to
cope with challenging network behaviours affected by packet loss and
delays, such as inter-RAT handovers and IP congestion. Time scaling can
be either stretching (under sampling) or compressing (over sampling) the
speech signal, depending on the rate with which it comes from the buffer
and/or if the buffer is over-run or empty [19].
There are two main categories of time-scaling algorithms: with speech-pitch
preservation or without [14], each exhibiting different trade-offs
between performance and speech processing complexity. The impact of
the algorithm's performance on the overall speech quality is determined by
the distribution of the time scaling and its frequency of occurrence within
the speech sample, as well as its length. All these characteristics are given
by the network behavior (e.g., packet loss, variable delay), which requires
different levels and distributions of algorithmic error compensation.
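As an illustration of the simpler of the two categories (time scaling without pitch preservation), the following Python sketch stretches or compresses a block of samples by plain linear-interpolation resampling. It is a minimal sketch under stated assumptions, not the algorithm of any particular client: the function name `time_scale` and its `factor` parameter are hypothetical, and real clients may use pitch-preserving overlap-add style methods instead.

```python
def time_scale(samples, factor):
    """Stretch (factor > 1) or compress (factor < 1) a block of samples.

    Uses linear interpolation between neighbouring samples. Because the
    waveform is simply resampled, the pitch shifts along with the duration.
    """
    if not samples or factor <= 0:
        raise ValueError("need at least one sample and a positive factor")
    out_len = max(1, round(len(samples) * factor))
    if out_len == 1 or len(samples) == 1:
        return [samples[0]] * out_len
    # Distance, in input samples, between consecutive output samples.
    step = (len(samples) - 1) / (out_len - 1)
    out = []
    for i in range(out_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

For example, assuming 8 kHz sampling, a 20 ms frame of 160 samples scaled by 1.25 yields 200 samples (25 ms), which buys the buffer 5 ms at the cost of a lower perceived pitch. That side effect is the trade-off that pitch-preserving algorithms pay extra complexity to avoid.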
The time-scaling algorithms are not standardized, leaving open the
possibility of various performances and, therefore, of different QoE trends.
The algorithms are implemented in the VoIP client that supports the voice
service in the network. The clients can be either software clients running as
applications on the device, or modem-based clients implemented in the phone's
chip. Therefore, the client's ability to cope with network behaviors
is client- and chipset vendor-dependent and phone-dependent.

Network-Centric Factors

These factors affecting the LTE voice service quality emerge from various
network radio frequency (RF) and non-RF characteristics described in
more detail in section 4. Those that have immediate impact generate
interruptions or loss and delay. Uncompensated packet loss can generate
perceived interruptions, whether caused by the non-ACK RLC
mode of the VoIP dedicated bearer, IP congestion, or an uncompensated
(inter-RAT) handover. Delay, especially when it is of variable length and
randomly distributed during speech rather than at the beginning of speech,
is very annoying to subscribers. Such delay can be caused by
uncompensated handover delays (LTE intra-RAT) or uncompensated RLC
retransmissions in scenarios for which the VoIP bearer is deployed using
ACK RLC mode. VoLTE might also show delays and interruptions due
to the IMS technology solutions that cope with mobility and unreliable radio
conditions, all still under standardization evolution. An important role is also
played by Discontinuous Transmission (DTX), which reduces RF
interference and, therefore, helps satisfy the bandwidth efficiency requirements and
capacity constraints that are more pressing in LTE. Network DTX in
conjunction with codec-based Voice Activity Detection schemes could be
too aggressive and cause significant speech front-end time clipping,
strongly impacting the perceived voice quality.


Subscriber Experience View on VoIP vs. PSTN Voice Quality


LTE's signature on voice service quality also comes with the human
perception dimension; that is, subscribers compare the experience of
VoIP quality, and additionally VoLTE (VoIP over IMS) quality, against the
circuit-switched (CS) voice quality they have experienced for the past 20
years.
PSTN/CS voice service, with a dedicated 56k (or 64k) time slice allocated
for each channel/circuit, is governed by highly optimized 2G/2.5G/3G
networks. In addition, CS voice service benefits from well established and
standardized codecs with highly efficient error concealment and rate
adaptation techniques encoding both NB and WB, as well as enhanced
speech processing procedures. All these factors contribute to raise voice
service quality to levels that are known to satisfy subscribers, and this is
categorized as providing a mean opinion score (MOS) of 4.2 to
4.4 for the entire call's length.
VoIP service is supported by packet switching that was not originally
designed for real-time sessions, such as voice and video traffic, and/or
mobility. VoIP requires the call to go through various transformations, such
as encoding/decoding at low and adaptive bit rates, changes in routing
during the call, packets out of sequence or lost, delays, and buffering/jitter
delays. To compensate for all of these challenges while sustaining the
customer experience, new protocols such as Real-time Transport Protocol
(RTP), Real-Time Transport Control Protocol (RTCP), and Session
Initiation Protocol (SIP), as well as new QoS strategies and policies such as
Multiprotocol Label Switching (MPLS) have been developed. In addition,
new codecs, with a large variety of rates and even variable rates, and
multiple bandwidths, from narrowband to super wideband with complex
error concealment techniques, create the foundation for a high quality voice
service, if provided at pre-established service level agreements (SLAs).
Therefore, given a dedicated bandwidth, minimum delay, HD voice and
efficient QoS policies such as MPLS, it is expected that the quality of VoIP
service will soon meet, and actually surpass, the voice service QoE that
subscribers are already used to and expect.

What Does It Take to Ensure High VoLTE QoE?

The previous section discussed aspects related to the speech
technology and to the network evolution that impact the voice QoE. This
section discusses the procedures required for ensuring a high QoE as well
as what, and how, measurements, evaluation and troubleshooting need to
be performed in order to achieve an accurate subscriber perception of the
VoLTE service.


Device Implemented Jitter-Buffer and Time Scaling

Real-time services for packet-switched networks need a buffer
to handle variations in the delays in the bit-pipe. Delays are introduced by
scheduling, cross traffic, retransmissions to cope with errors, and handover,
among others. Typically these buffers (referred to as jitter-buffers) contain
80-100 ms of buffered end-user data such as speech. Large delays cause
data to be consumed from the buffer at a faster pace than new data is
added. In this case the jitter-buffer could run empty and cause speech
degradation if not compensated for. However, in addition to this scenario,
the VoLTE service could experience cases in which speech arrives at
a higher rate than the play-out rate, mainly due to the traffic optimization
techniques implemented in the scheduler. In that case, the jitter-buffer may
overflow if it is not compensated for. As mentioned in section 3.1, the
empty or overflowing buffer is compensated for by using time scaling. Time
scaling of 40-80 ms using different techniques is not perceived as
noticeable degradation by the end user.
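The two failure modes described above, the buffer running empty and the buffer overflowing, can be reproduced with a toy simulation. The sketch below is a simplified Python model under stated assumptions: playout consumes one 20 ms frame per tick, the buffer holds at most 5 frames (100 ms) and starts with 4 (80 ms), and no time scaling is applied. The function name and parameters are hypothetical.

```python
def simulate_jitter_buffer(arrivals_per_tick, capacity_frames=5, initial_frames=4):
    """Count underruns and overflowed frames for a fixed-size jitter buffer.

    arrivals_per_tick: number of speech frames arriving in each 20 ms tick.
    Returns (underruns, overflows): ticks with nothing to play, and frames
    dropped because the buffer was already full.
    """
    depth = initial_frames
    underruns = 0
    overflows = 0
    for arrived in arrivals_per_tick:
        depth += arrived
        if depth > capacity_frames:
            overflows += depth - capacity_frames   # excess frames are discarded
            depth = capacity_frames
        if depth == 0:
            underruns += 1                         # playout has nothing to decode
        else:
            depth -= 1                             # playout consumes one frame
    return underruns, overflows
```

With a steady arrival of one frame per tick the buffer never underruns or overflows. A 120 ms stall (six empty ticks) drains the 80 ms of buffered speech and produces two empty playout slots, while a burst such as `[3, 3, 1]` discards four frames. These are the situations in which a client would be expected to stretch or compress the speech signal instead.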
The jitter-buffer is very tightly integrated within the real-time application,
such as the VoLTE client. Feeding the jitter-buffer with data at the same
pace as it is consumed is the main task for the lower layers of the radio
access network. Therefore, the performance of these network layers will
significantly impact VoLTE QoE. Too many packets lost or received out of
sequence, or long delays, can generate error levels and patterns that the
jitter-buffer's time scaling scheme either cannot cope with or handles by playing back
the speech signal with annoyingly perceived stretch and compression.


Network-Centric View

End-to-end VoLTE services involve several protocol layers, as shown in
Figure 3, for a mobile-to-mobile setup. The protocol stack (Figure 3) can be
divided into two parts, with the lowest three layers belonging to the radio
access network (RAN), and the higher layers travelling through the core
network to the other calling party.



Figure 3

Figure 4

Therefore, achieving the best possible VoLTE QoE requires the
optimization of the VoLTE protocol stack. Details are presented in sections
4.2.1, 4.2.2 and 4.2.3. The process (Figure 4) is complex because
the VoLTE protocol stack's configuration needs to be performed in a
completely different way than for typical non-real-time services in order to
support VoLTE in an appropriate manner. Several network nodes and the
subscriber device (user equipment, or UE) need to be involved. In
addition, per-service type configuration and optimization of the VoLTE
protocol stack is required. Otherwise, it is likely that the voice will travel
through a bit-pipe configured for a completely different type of service,
resulting in poor end-user-perceived quality and non-optimal utilization of
the network investments, resulting in a waste of precious frequency
In addition, devices travelling through the network experience constant
serving cell changes. For voice service, the resulting handovers should be
as infrequent as possible and, when they do take place, should complete
successfully in the shortest possible time. This is because handovers cause delays



impacting the quality of voice service. To achieve these kinds of handovers,
radio optimization needs to be performed differently than for other services,
which are less sensitive to long delays than real-time voice services.
Handovers to 2G/3G networks also require a smooth SRVCC function.

Protocol Optimization of the Radio Network

Radio Link Control (RLC)

The highest layer in the RAN part of the stack, the RLC layer, can be set up
in two different modes for voice: acknowledged (ACK) and
unacknowledged (U-ACK). The third existing mode is transparent, and it is
used for signaling broadcast-like system information.
The RLC acknowledged mode ensures an error-free radio interface, since
erroneous blocks from the Medium Access Control (MAC) layer are
retransmitted by the RLC layer. However, the price paid for an error-free air
interface is the delay caused by the retransmissions.
In LTE, the MAC layer handles retransmissions with a very short delay
(within 10ms in most cases). In this case, it is better to leave the eventual
remaining block errors (residual errors from the MAC layer) to be handled
by the error concealment mechanism in the UE-based VoIP client, rather
than introducing additional delays in the RLC layer.
Delays larger than the jitter-buffer depth leave the voice client with no
speech to decode, resulting in degraded speech quality. The jitter-buffer
holds 20ms speech frames delivered from the RTP layer. Its depth is
typically around 80 to 100ms and can be dynamically adjusted to the level
of variation measured on the packets received from the RTP layer.
Therefore, for a real-time, delay-sensitive service such as VoLTE, the RLC
layer should be set up in unacknowledged mode. A sequence numbering
mechanism is generally used to ease the RLC packet handling and ensure
an efficient, more reliable unacknowledged mode.
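
The relationship between delay and jitter-buffer depth described above can be illustrated with a minimal sketch. It assumes a static buffer and a playout clock that starts when the first frame arrives; real clients adapt the depth dynamically and apply time scaling, neither of which is modeled here. The function name and the sample delay values are hypothetical.

```python
FRAME_MS = 20  # speech frame duration, as stated in the text


def late_frames(delays_ms, buffer_ms=80):
    """Return the indices of frames that miss their playout deadline.

    Assumed model: frame i is sent at i * FRAME_MS, arrives delays_ms[i]
    later, and is due for playout at
    (arrival of first frame + buffer_ms + i * FRAME_MS).
    """
    if not delays_ms:
        return []
    first_arrival = delays_ms[0]
    late = []
    for i, delay in enumerate(delays_ms):
        arrival = i * FRAME_MS + delay
        playout = first_arrival + buffer_ms + i * FRAME_MS
        if arrival > playout:
            late.append(i)
    return late


# Hypothetical one-way delays (ms) for seven consecutive speech frames
delays = [30, 35, 50, 140, 45, 120, 30]
print(late_frames(delays, buffer_ms=80))
print(late_frames(delays, buffer_ms=100))
```

With these sample values, the 80ms buffer misses frames 3 and 5, while the 100ms buffer misses only frame 3. Each missed frame has to be covered by error concealment in the client.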

The MAC Layer

In the MAC layer, a set of parameters can be optimized and several
features can be used to improve the performance of a VoLTE service. The
most important one relates to how the radio resources are optimally shared
among all the end users in a cell. This function is accomplished by the
scheduler in various ways.

A VoLTE service can be scheduled with higher priority than other
non-real-time services, and prioritization between different services is
possible if the scheduler knows which type of service each user runs. This is
referred to as a QoS-aware scheduler, which is a feature of the scheduler
rather than a configuration of the MAC protocol itself.



Semi-Persistent Scheduling

The MAC layer allows for the possibility of persistent scheduling. The
VoLTE service follows a known pattern (typically 20ms blocks with a
limited size), so each radio block does not have to be assigned individually.
Instead, it is possible to reserve and dedicate a part of the resources to a
particular end-user service. This is called a Semi-Persistent Scheduling
(SPS) configuration; it significantly reduces the signaling overhead and
thereby the load on the Physical Downlink Control Channel (PDCCH).
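
The signaling saving can be sketched with a simple count of scheduling grants. This is a simplified model under stated assumptions: dynamic scheduling is assumed to need one grant per 20ms speech frame, SPS a single activation per talk spurt, and HARQ retransmission grants and SPS release signaling are ignored. The function name is hypothetical.

```python
def scheduling_grants(talk_spurt_ms, frame_ms=20, semi_persistent=False):
    """Count control-channel grants needed for one talk spurt.

    Assumption: dynamic scheduling sends one grant per speech frame,
    while SPS needs only one activation for the whole talk spurt.
    """
    frames = talk_spurt_ms // frame_ms
    if frames == 0:
        return 0
    return 1 if semi_persistent else frames


# A 2-second talk spurt carries 100 speech frames of 20ms each
print(scheduling_grants(2000))
print(scheduling_grants(2000, semi_persistent=True))
```

Under these assumptions, a 2-second talk spurt needs 100 grants with dynamic scheduling and a single activation with SPS.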

DRX Configuration

In addition, the VoLTE-specific pattern allows configuring a discontinuous
reception (DRX) scheme for the UE in order to save battery life; the UE can
be put in sleep mode when there is nothing to receive.

MAC Retransmissions

Whenever RLC runs in U-ACK mode, it is likely that more than the network
default setting of four retransmissions is needed. The LTE MAC can
configure the Hybrid Automatic Repeat reQuest (HARQ) retransmission
handling mechanism so that it performs better for voice services.
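
The trade-off behind the number of retransmissions can be sketched as follows. The model is deliberately crude: it assumes every transmission attempt fails independently with the same probability (HARQ soft combining usually does better) and an 8ms HARQ round-trip time, which is the LTE FDD value. The function names and the 10% block error rate are hypothetical.

```python
def residual_error(bler_per_tx, retransmissions):
    """Probability that a block is still in error after all attempts.

    Assumes independent attempts with identical error probability;
    total attempts = 1 initial transmission + retransmissions.
    """
    return bler_per_tx ** (1 + retransmissions)


def worst_case_extra_delay_ms(retransmissions, harq_rtt_ms=8):
    """Extra delay when every allowed retransmission is used."""
    return retransmissions * harq_rtt_ms


for retx in (2, 4, 6):
    print(retx, residual_error(0.1, retx), worst_case_extra_delay_ms(retx))
```

Each additional retransmission lowers the residual error left for the client's error concealment, but adds one more round trip that the jitter-buffer has to absorb.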

Physical Layer Optimization


Keeping overhead as small as possible, and consequently maximizing
transport block size, is critical for the LTE physical layer, which needs to
ensure low latency. This is achieved by adapting the physical layer to the
radio quality, based on the channel state information (CSI) feedback from
the UE and the eNodeB measurements on UE reference signals. However,
this is not optimal for low bit-rate services such as voice.
As an example, let's assume a scenario in which the UE reports the highest
possible Channel Quality Indicator (CQI=15), and that a 20MHz channel
with 100 resource blocks (RBs) per TTI (1ms) is available at the LTE cell to
which the UE connects. In a 100-RB configuration, there is always
a minimum of 4 RBs allocated to a communication link. The typical outcome
is that the eNodeB selects 64QAM modulation and a very large
transport block size (see Table 1, [20]). A new block with speech every
20ms then corresponds to 2984/0.02 = 149kbit/s, which is much higher than
the MAC layer payload for the voice service (typically only 20 to 30kbit/s).
Therefore, in the trade-off between overhead and payload governed by the
CQI and the LTE cell configuration respectively, the modulation scheme is
the key to optimizing VoLTE service.
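
The arithmetic in the example above can be checked with a short sketch. The transport block size of 2984 bits, the 20ms period and the 20 to 30kbit/s voice payload are taken from the text; the function names are illustrative only.

```python
def offered_rate_kbps(transport_block_bits, period_ms):
    """Capacity offered by one transport block per period (bits/ms = kbit/s)."""
    return transport_block_bits / period_ms


def utilization(payload_kbps, offered_kbps):
    """Share of the offered capacity actually filled by voice payload."""
    return payload_kbps / offered_kbps


offered = offered_rate_kbps(2984, 20)
print(offered)
print(utilization(20, offered), utilization(30, offered))
```

Even at the upper end of the voice payload range, only about one fifth of the allocated transport block carries speech, which is why a more robust modulation with a smaller block is the better fit for voice.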



Table 1

Frequency Hopping

Utilizing frequency diversity by using optimal parts of the spectrum over the
time domain is possible in LTE. In the downlink, the scheduler can use the
sub-band CQI feedback from the UE. For the uplink, methods for both
inter-TTI and intra-TTI hopping exist, and the sounding signals from the UE
can be used to detect optimal parts of the spectrum for specific UEs.
This also helps in optimizing the VoLTE service.

VoLTE Setup for Protocols Above the Radio Access


The protocols above the radio layer are controlled mainly by EPS Session
Management (ESM), where EPS is the Evolved Packet System. ESM sets up
a bearer context describing the service quality requirements. The
requirements of the VoLTE service on delay, bandwidth, and priority make
QCI 1 the suitable configuration for providing appropriate performance [7].
The Packet Data Convergence Protocol (PDCP) includes functions such as
encryption, header compression, and sequence numbering. Header
compression is very important in order to keep the protocol overhead at a
minimum level. A Robust Header Compression (RoHC) profile (e.g.,
0x0001 or 0x0101) is recommended for VoLTE services [15].
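
The importance of header compression for small voice packets can be illustrated with a rough calculation. The IPv4, IPv6, UDP and RTP header sizes are the standard minimum sizes; the 32-byte speech payload and the 3-byte compressed header are assumptions chosen only for illustration, and the function name is hypothetical.

```python
IPV4_HEADER = 20  # bytes, without options
IPV6_HEADER = 40  # bytes, without extension headers
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes, without CSRC entries or extensions


def overhead_share(payload_bytes, header_bytes):
    """Fraction of each packet occupied by headers."""
    return header_bytes / (header_bytes + payload_bytes)


PAYLOAD = 32      # assumed speech payload per 20ms frame, in bytes
ROHC_HEADER = 3   # assumed size of a compressed header, in bytes

uncompressed = IPV6_HEADER + UDP_HEADER + RTP_HEADER
print(uncompressed, round(overhead_share(PAYLOAD, uncompressed), 3))
print(ROHC_HEADER, round(overhead_share(PAYLOAD, ROHC_HEADER), 3))
```

Under these assumptions, uncompressed IPv6/UDP/RTP headers make up roughly two thirds of every voice packet, while a compressed header takes less than a tenth.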


Customer Experience Perspective and the Significance of On-Device Testing

The network elements and configurations that ensure an optimal VoLTE
service delivery are critical for understanding why VoLTE service
degradation happens. Equally important is accurately evaluating whether
the voice service quality degradation happened at a statistically significant
rate, when it happened, and how much it affected the customer experience.
Answering these questions cost-efficiently requires the most accurate QoE
assessment metric, such as the latest ITU-T P.863 standard (POLQA
algorithm), as well as statistically sound processing of the metric's output.
In addition, the test design and setup for the quality evaluation need to
closely emulate the real subscriber scenario.




QoE Evaluation Metrics

POLQA [16], an algorithm based on human perception and cognition of the
voice quality, is specifically designed to handle disruptive effects caused by
multicomponent distortions which are characteristic of the 4G ecosystem,
that is, the convergence of LTE and IMS technologies. The algorithm provides
accurate MOS estimation on standard and high-definition voice quality that
will be challenged by LTE technology's implementations. These include
new codecs and error concealment techniques, variable delay and time
scaling, front-end speech clipping, as well as linear and nonlinear
distortions characteristic of various terminals, all described in section 3.1.
Central to understanding and cost-efficiently controlling voice service
quality is the ability to correlate voice quality with network-centric metrics.
While the network dimension can be defined as described in section 4.1,
the speech-centric dimension is exposed by going beyond the MOS score
using QoE algorithms like POLQA. In addition to the MOS estimate, the
ITU-T P.863 standard provides a set of metrics to be used for diagnosing
speech quality degradation and correlating it to device performance (e.g.,
codecs, VEDs, error concealment schemes such as time scaling) and, to
some extent, to the behavior of the network (e.g., packet loss, delay).
Details on the POLQA algorithm, its additional speech quality
measurements and their importance, and the transition to POLQA from the
older speech quality evaluation technology called PESQ (ITU-T P.862), are
presented in previous Ascom white papers available on the Ascom Network
Testing website (www.ascom.com/networktesting) [17], [18].

On-Device Testing

An accurate evaluation of voice service quality requires a test design that
closely emulates the subscriber experience. It is important,
therefore, to design and set up evaluation measurements using
measurement software that resides on the actual connected test device,
rather than being part of the PC software. A device-based VoLTE client is
assigned a VoLTE logical channel with a high-priority QoS class (QCI=1).
In this way, test voice calls are prioritized in the network as a real-time
service, rather than being handled on a best-effort basis, which is the case
when the testing relies on a PC-based VoLTE client. The service
performance will not suffer from any limitations inherent in the test setup,
and it will therefore reflect the real-world subscriber experience in a way
that cannot be fully achieved with a PC-based VoLTE client. In addition, the
device-based VoLTE client will be adapted to, and optimized for, that
particular device, which ensures that no artificial delays are introduced in
the evaluation.



LTE Voice Quality Testing Scenarios


VoLTE Troubleshooting

Troubleshooting such a complex setup as VoLTE requires a tool capable of
monitoring the complete protocol stack, from the RF level up to the VoLTE
application. Many different aspects have to be considered, from static
configurations of the radio network to details on the radio interface. In
addition, the ESM (EPS Session Management) configuration, as well as the
VoLTE client information, needs to be analyzed. While the radio and the
core part are network-centric, the VoLTE client information is device- and
speech-processing-centric. Figure 5 provides guidelines for finding possible
root causes of poorly performing VoLTE service. Troubleshooting requires
the analysis of several parameters specific to the network and to the
device/client, as well as the correlation of the two.
Voice service troubleshooting should be performed only if QoE statistical
metrics (POLQA scores) show consistent and statistically significant
behavior below an acceptable level of QoE performance. [17] [18]
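
One simple way to express "consistent and statistically significant" behavior is a one-sided confidence bound on the mean score. The sketch below is an illustration, not the method used in [17] or [18]: it uses a normal approximation (a t-quantile would be more appropriate for small samples), and the threshold of 3.0 and the sample scores are hypothetical.

```python
import math
import statistics


def consistently_below(scores, threshold, z=1.645):
    """True if the one-sided upper confidence bound of the mean score
    (about 95% for z = 1.645, normal approximation) lies below threshold."""
    n = len(scores)
    if n < 2:
        return False  # not enough samples to estimate the spread
    mean = statistics.mean(scores)
    std_error = statistics.stdev(scores) / math.sqrt(n)
    return mean + z * std_error < threshold


# Hypothetical MOS values collected along two drive-test routes
route_a = [2.4, 2.6, 2.5, 2.7, 2.3, 2.5, 2.6, 2.4]
route_b = [2.0, 3.9, 2.2, 4.0]
print(consistently_below(route_a, threshold=3.0))
print(consistently_below(route_b, threshold=3.0))
```

In this example the first route would be flagged for troubleshooting, while the second would not, because its few widely scattered scores do not support a firm conclusion.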
In addition, the distribution of QoE values over time and geographical area
should be considered. Troubleshooting local problems (over time and area)
differs from troubleshooting severe problems over large areas, or problems
that occur over a longer period of time. Severe problems of that magnitude
may dictate another approach, with a more thorough investigation of the
parameter settings of the protocol stack (see section 4.2, Figure 3, above).
The troubleshooting results must provide the perceived VoLTE service
quality values (MOS scores) as well as uncover the reasons for
unacceptable QoE performance.
The VoLTE troubleshooting demo case considers the radio access network
(e.g., coverage, interference, high load, bad configuration or handover) as
well as a few topics outside the radio path, such as the VoIP client.



Figure 5 (diagram not reproduced). Items recoverable from the figure:
Network-centric reasons: ESM configuration (QCI of the PDN connection, RoHC in the PDCP protocol); cell bandwidth, UE category, TDD UL/DL configuration, MTU size and protocol stack configuration; dynamic parameters (transmission mode (MIMO), neighbouring cells detected, distance to site); RF measurements.
Speech- and client-centric reasons: client information (buffer overrun and underrun, time scaling, packet loss).


Radio Interface Aspects

Radio Access Network Related

The channel quality indicator (CQI) measured by the UE on the downlink
channel incorporates many aspects of the signal path and has a
lowest-to-highest scale of 0-15. A low value like CQI=6 indicates voice
channel problems that are more likely caused by the radio access network.
The block error rate on the physical layer can also give an indication of RF
problems resulting in a high number of retransmissions on the MAC layer.
The transmission distribution (Figure 6) of the radio blocks should be
verified against target expectations. In an unacknowledged RLC setup,
more retransmissions can be accepted than in an acknowledged setup.
However, high values of the residual block errors prevent blocks from being
sent to higher layers and, therefore, could cause lower voice QoE
performance. This is particularly true if triggering the error concealment
mechanism in the VoLTE client is not enough to compensate.



Figure 6

Coverage and Interference

Signal strength, RSRP (Reference Signal Received Power), and signal to
interference ratio, SINR/CINR (Signal, or Carrier, to Interference plus Noise
Ratio), are two of the most important RF parameters for identifying coverage
and interference problems. A coverage plot based on the RSRP distribution
on the map quickly shows whether the problem is coverage-related.
Interference in LTE typically comes from surrounding cells. Therefore, if a
low SINR value is detected, the available neighboring cells should be
checked to see whether any of them exhibits a signal strength close to that
of the serving cell. In LTE, it is very important to have distinct cell borders,
since all cells operate on the same frequency. A demo case for the described
scenario is presented in Figure 7. Also, in a TD-LTE network, good
performance is very sensitive to the timing between cells, which can be
measured by drive-testing.
If the coverage is good and no neighbors are detected, it is important to
investigate interference potentially coming from other sources. Using
scanning receiver equipment enables in-depth analysis of the radio
It is also important to analyze the uplink and downlink balance by
comparing the downlink path loss with the UE output power. Additional
details on the coverage and interference problems are presented in
Ascom's previously published Handbook on LTE Optimization. [21]
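
The decision logic described in this section can be summarized in a minimal sketch. All thresholds (an RSRP floor of -110dBm, an SINR floor of 0dB and a 3dB margin between serving and neighboring cell) are hypothetical values chosen for illustration, as are the function name and the category labels.

```python
def classify_sample(rsrp_dbm, sinr_db, neighbour_rsrp_dbm=(),
                    rsrp_min_dbm=-110.0, sinr_min_db=0.0, margin_db=3.0):
    """Classify one drive-test sample with illustrative thresholds."""
    if rsrp_dbm < rsrp_min_dbm:
        return "coverage"
    if sinr_db >= sinr_min_db:
        return "ok"
    # Low SINR despite good coverage: look for a neighbor nearly as
    # strong as the serving cell (indistinct cell border).
    if any(rsrp_dbm - n <= margin_db for n in neighbour_rsrp_dbm):
        return "neighbour-cell interference"
    # No strong neighbor detected: other sources, e.g. check with a scanner.
    return "other interference"


print(classify_sample(-118, -2))
print(classify_sample(-90, -3, [-91, -104]))
print(classify_sample(-90, -3))
```

The last case, good coverage with low SINR and no strong neighbor, corresponds to the situation in which scanning receiver equipment is needed.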



Figure 7

Handover and SRVCC

As already mentioned, excessive handovers generate delays which directly
impact VoLTE QoE. Measuring handover interruption time is important, but
tricky, because many layers are buffered on top of one another;
thus, correlating a gap in the physical layer or MAC layer with the speech
quality degradation is not straightforward. A better approach is to measure
within the VoLTE client itself, observing the jitter-buffer level and the
behavior of the error concealment mechanisms. When a handover problem is
detected, it should be checked whether the handover is inter-site or
intra-site, since the latter is faster and, therefore, has less impact on QoE.
In addition, the inter-RAT handovers could be
significantly affected by the SRVCC function which requires the use, and
definition, of drive testing measurement events to evaluate the functions
It is also important to verify the RACH (Random Access Channel)
configuration, which needs to show a reasonably low number of preambles,
with an appropriate output power transmitted at each handover, in order to
ensure low handover interruption time.
A demo case showing the voice quality degradation due to handover (HO)
interruption is presented in Figure 8.



Figure 8

Cell Load

Cell load can have a significant impact on the performance of the VoLTE
service. Three types of measurements should be monitored: RSRQ
(Reference Signal Received Quality), CFI (Control Format Indicator) and
resource block scheduling rate. RSRQ is a measure of the relationship
between the signal strength (RSRP) measured on the reference symbols and
the total signal strength received on all symbols (RSSI). It gives an
indication of the load in the cell, but can be difficult to use since large
differences in load may cause only small variations in RSRQ. The CFI
provides the number of symbols used for the PDCCH (Physical Downlink
Control Channel), which is the control channel for the downlink. The number
of active users in the cell can be monitored by checking the distribution of
the CFI values (0, 1, 2) (Figure 9); a distribution of the CFI toward higher
values indicates more users. Last, the scheduling can be checked via the
PDSCH (Physical Downlink Shared Channel) and PUSCH (Physical Uplink
Shared Channel) resource block allocation. These values show how many of
the total available resource blocks are assigned to the UE. Even though
these measurements are typically used to troubleshoot high-bandwidth
services, they can also provide an indication of how many users there are in
the cell.
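
The relationship between RSRQ, RSRP and RSSI can be made concrete with a short sketch. It uses the 3GPP definition RSRQ = N x RSRP / RSSI, where N is the number of resource blocks in the measurement bandwidth, expressed in dB. The function name and the sample values are hypothetical.

```python
import math


def rsrq_db(rsrp_dbm, rssi_dbm, n_resource_blocks):
    """RSRQ in dB from RSRP and RSSI (both in dBm) over N resource blocks."""
    return 10 * math.log10(n_resource_blocks) + rsrp_dbm - rssi_dbm


# Hypothetical measurement over a 20MHz carrier (100 resource blocks)
print(rsrq_db(rsrp_dbm=-95.0, rssi_dbm=-65.0, n_resource_blocks=100))
# Same RSRP, but 3dB more total received power (more traffic or interference)
print(rsrq_db(rsrp_dbm=-95.0, rssi_dbm=-62.0, n_resource_blocks=100))
```

In this example a doubling of the total received power moves RSRQ by only 3dB, which illustrates why load changes are hard to read from RSRQ alone.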



Figure 9


Beyond the Radio Interface

As discussed in previous sections, the analysis beyond the radio network in
the protocol stack relates to various sources that could impact VoLTE QoE:
the device and/or client, the voice codec and VEDs, the core network's
configuration, and the type of call (e.g., VoLTE mobile to/from VoLTE or
non-VoLTE mobile, VoLTE mobile to/from CS domain or to/from mobile
OTT). Therefore, it is important not only to perform an end-to-end QoE
evaluation, but also to be able to identify the main source of the voice
service degradation. Radio issues are presented in section 5.1.1.
Section 3.1 discussed the significant role of codecs, VEDs (e.g., speech
signal levels, echoes) and error concealment schemes (such as time
scaling implemented in the device/client) for achieving high VoLTE QoE.
These aspects can be captured, and their operability detected, using either
POLQA MOS and its additional measurements or additional speech signal
evaluation, such as echo detection and measurement (see Figure 10).
Round-trip time and one-way delay can also be estimated with POLQA.
The analysis of device-related problems should also consider high CPU load
and operating system scheduling.
The core network configuration addresses the QCI and RoHC, as well as
possible subscriber subscription limitations (SIM). The QCI values could be
used to explain possible QoE differences between VoLTE and OTT calls,
since the latter will always be assigned a higher QoS Class Identifier (QCI)
value, as described in section 2.3.
It is helpful to analyze voice quality differences between types of calls,
especially in the mobile-to-mobile call scenario, when the POLQA MOS
QoE values reflect the impact of the combined uplink and downlink. A low



QoE value on the downlink of one of the devices can be the result of a
handover on the uplink of the other device involved in the call.
A demo screenshot of some of these measurements, collected in drive
testing, is presented in Figure 10.

Figure 10


Circuit Switched Fallback Testing Scenario Results

As already mentioned in section 2.1, call setup time and the impact
on data sessions running in parallel are significant for Circuit Switched
Fallback (CSFB) QoE troubleshooting.
Drive test results for these measurements are discussed in this section
(Figures 11, 12, 13). The call setup time averages about 6 seconds, with
deviations of about +/- 2 seconds, which can be annoying for the customer.
In addition, the data sessions were interrupted for about 2.5 seconds on
average, with minimums of 1 second and maximums reaching 3.5 seconds.
However, these values are not likely to annoy users, since data sessions are
performed (and perceived) as background activities. The time needed to
move back to LTE once the CSFB call ends, which consistently shows
values larger than 25 seconds, can be far more troubling. Although this
does not have a direct impact on customer experience, it means that the
increased LTE data capacity cannot be used continuously.



Figure 11 (plot not reproduced): CSFB Call Setup Time. Call setup time (s) per call attempt.

Figure 12 (plot not reproduced): Data Session Interruption. Data session interruption time (s) per call attempt.

Figure 13 (plot not reproduced): Time to go back to LTE after CSFB Call End. Time to go back to LTE per call attempt.




The evolution from 3G to 4G/LTE networks has accelerated the availability
of myriad services, resulting in an impending data tsunami. Market
analysts show a 78% wireless data traffic increase by 2016. In the
emerging all-IP networks, voice will increasingly become just one of the
many data services, but today it still accounts for approximately 70% of the
wireless operators' revenue.
Two solutions defined by 3GPP and other wireless industry forums have
proved to be feasible: Circuit Switched Fallback (CSFB) and Voice over
Internet Protocol (VoIP) over IMS (VoLTE). CSFB is seen as an interim and
transitional solution until IMS technology is fully deployed for wireless
capabilities and can reliably offer complete mobile voice support. However,
the deployment of packet switched (PS) voice (VoIP), and the evolution of
smartphones and of broadband services, have made it possible for
third-party Over the Top (OTT) voice solutions (e.g., Skype, Viber) to be
offered wirelessly over LTE, as well as 3G.
Regardless of the solution, delivering voice over LTE, while ensuring and
maintaining the QoE that customers are accustomed to, raises a series of
technical challenges. These challenges arise from the complex LTE
network and its integration with the legacy 3G network, as well as from
codecs, VEDs and terminals (phones) designed to support the 4G voice
service. From the network perspective, the way to ensure the best VoLTE
QoE possible requires optimizing the VoLTE protocol stack. Therefore,
thorough voice service QoE troubleshooting requires comprehensive
analysis of the lower and upper layers of the protocol stack.
Achieving high VoLTE QoE also depends on a variety of other factors, such
as codecs, VEDs (e.g., speech signal levels, echoes) and error
concealment schemes (e.g., time scaling), implemented in the device/client.
These factors can be captured, and their operability can be detected, in a
straightforward manner using QoE metric (POLQA) MOS and its additional
measurements, as well as additional speech signal evaluation such as
echo detection and measurement.
However, cost-efficient VoLTE QoE troubleshooting needs to rely on a
thorough understanding of the human factor. Therefore, it is important to
perform end-to-end, accurate estimations of the subscribers' perception of
voice service quality using standardized QoE metrics. Equally important is
designing and setting up tests that emulate, as closely as possible, how
subscribers will perceive the voice service quality. This can be ensured only
by using on-device testing on the phones' native VoLTE software clients.
Voice service over LTE is at its dawn. Today, various solutions are
available (e.g., VoLTE, CSFB, OTT), but new perspectives such as VoIP
over HSPA (VoHSPA) are very likely to be standardized and deployed.
Therefore, we should expect and anticipate that a variety of testing and
evaluation scenarios beyond what the paper discusses will soon also be




Cisco VNI Mobile 2012
3GPP One Voice; Voice over IMS profile V1.0.0, Nov. 2009
IETF RFC 3261, Session Initiation Protocol (SIP)
3GPP TS 23.1xx, 2xx, 3xx series
VoLGA Forum, www.volga-forum.com
GSMA Forum
3GPP TS 23.272, Circuit Switched (CS) fallback in Evolved Packet System (EPS); Stage 2
3GPP TS 23.216, Single Radio Voice Call Continuity (SRVCC); Stage 2
3GPP TS 23.203, Policy and Charging Control Architecture
ITU-T G.168, Requirements for network echo cancellers, 2009
ITU-T G.169, Automatic level control devices, 1999
Werner Verhelst and Marc Roelands, An Overlap-Add Technique Based On Waveform Similarity (WSOLA) for High Quality Time Scale Modification of Speech, Acoustic Speech and Signal Processing Proceedings, 1993
3GPP specification 36.306, E-UTRAN-UE Radio Access
ITU-T P.863, Perceptual Objective Listening Quality Assessment (POLQA): An advanced objective perceptual method for end-to-end speech quality evaluation of fixed, mobile, and IP-based networks and speech codecs covering narrowband, wideband, and super-wideband signals, Jan 2012
I. Cotanis, POLQA technology, Ascom white paper, September
I. Cotanis, Understanding the move from PESQ to POLQA, Ascom white paper, November 2011
S. Chakraborty and others, IMS Multimedia Telephony over Cellular Systems, Wiley & Sons, 2007
3GPP TS 36.213, E-UTRAN Physical Layer procedures
Handbook on LTE Optimization, Ascom, 2011
Andrew Chisholm, RCS, Ascom white paper, August 2012