
Master Thesis

Software Engineering
Thesis no: MSE-2005:16
October 2005

Voice over IP for Sony Ericsson Cellular Phones

Petter Theander, Thomas Hultgren

School of Engineering
Blekinge Institute of Technology
Box 520
SE - 372 25 Ronneby
Sweden
This thesis is submitted to the School of Engineering at Blekinge Institute of Technology
in partial fulfillment of the requirements for the degree of Master of Science in Software
Engineering. The thesis is equivalent to 2 x 20 weeks of full time studies.

Contact Information:
Author(s):
Petter Theander
E-mail: di00pth@student.bth.se

Thomas Hultgren
E-mail: di00thu@student.bth.se

External advisor(s):
Tobias Åkesson
Company/Organisation: Sony Ericsson Mobile Communications AB
Address: Nya Vattentornet, SE - 221 83 Lund
Phone: +46 46 193 986

Pär Olsson
Company/Organisation: Sony Ericsson Mobile Communications AB
Address: Nya Vattentornet, SE - 221 83 Lund
Phone: +46 46 212 67 03

University advisor(s):
Håkan Grahn
School of Engineering, BTH

School of Engineering
Blekinge Institute of Technology
Box 520
SE - 372 25 Ronneby
Sweden
Internet: www.bth.se/tek
Phone: +46 457 38 50 00
Fax: +46 457 271 25
ABSTRACT
This report presents an investigation of the possibilities of implementing voice over IP (VoIP)
in Sony Ericsson cellular phones. The results of this investigation show that it is partially
possible to implement such a solution. The best option for doing so is to make use of the support
for the Session Initiation Protocol and the Real-time Transport Protocol offered by the
architecture. Another goal is to evaluate whether Bluetooth can meet the requirements of the
solution. The whole concept is proven by implementing a prototype. Measurements on this prototype
show that Bluetooth will be able to handle the requirements of most IP-based voice communication
with respect to latency and bandwidth.

Keywords: VoIP, Cellular phone, SIP, RTP


Contents

1 Introduction

2 A Need For New Communication Technologies
2.1 Circuit-switched Networks
2.2 Packet-switched Networks
2.3 The Internet

3 The Initial Idea
3.1 Background
3.2 Vision
3.3 The Basic Idea
3.3.1 Making an Outgoing Call
3.3.2 Handling Incoming Calls
3.4 Technical Requirements
3.4.1 The Cellular Phone
3.4.2 The Base Unit

4 Investigating the Options
4.1 Interview Methodology
4.2 Interview Results
4.3 Investigating the Current Architecture
4.4 IP Multimedia Subsystem
4.4.1 The SEMC IMS Architecture

5 Design of the VoIP Prototype
5.1 Solution Design
5.1.1 Maintaining Flexibility and Modularity using SIP
5.1.2 Using SIP and SDP for Negotiating the Media Format
5.1.3 Bluetooth with IP Capabilities
5.1.4 Overview of the SIP Solution
5.2 Prototype Design and IMS Relationship
5.2.1 SEMC IMS Client Interaction
5.2.2 IMS SL and the VoIP Server
5.2.3 The VoIPCore Component
5.2.4 The VoIPMediaHandler Component
5.2.5 The VoIP Callback Interface
5.3 Scenarios
5.3.1 Registering with a SIP Registrar
5.3.2 Sending a SIP Invite Request
5.3.3 Starting the Media Session
5.3.4 Requesting to Talk
5.3.5 Incoming Request Talk
5.3.6 Incoming SIP Invite Request
5.3.7 Sending a SIP Bye Request
5.3.8 Incoming Bye Request

6 Prototype Implementation
6.1 Bluetooth Connectivity
6.2 The VoIP Prototype
6.2.1 Changes in the Underlying Architecture
6.2.2 No Support for Full-duplex Audio

7 Evaluation of the Prototype
7.1 Answers to the Research Questions
7.1.1 Reasonable Response Times
7.1.2 Possible to Implement IP-Telephony
7.1.3 Support for New Communication Technologies
7.2 Suggestions for Further Research

8 Discussion and Related Work
8.1 Network Address Translation
8.1.1 VoIP in NAT Situations
8.1.2 Avoiding the NAT Problem
8.2 VoIP Security
8.3 Public Safety
8.4 Related Solutions

9 Conclusions

Acknowledgements

Bibliography

A The Session Initiation Protocol
A.1 Introduction to SIP
A.2 The Architecture of a SIP Network
A.2.1 User Agents
A.2.2 Registrars
A.2.3 Location Servers
A.2.4 Redirect Servers
A.2.5 Proxy Servers
A.3 Signaling in SIP
A.3.1 Responses
A.3.2 Requests
A.4 SIP Message Format
A.4.1 Request Line
A.4.2 Status Line
A.4.3 Headers
A.4.4 Bodies
A.5 Bridging SIP and the PSTN

B The Session Description Protocol

C The Real-time Transport Protocol

D Glossary

Chapter 1

Introduction

This master thesis work was undertaken to investigate the possibilities of introducing a new
communication technology into an already established communication interface. As new
communication technologies are emerging more rapidly today than a couple of years ago, the need
to merge them is also growing. The general trend among emerging technologies is that they are
more or less exclusively developed to fulfill the needs of voice communication in an IP-based
packet-switched network such as the Internet. Such technologies are commonly known as Voice
over IP (VoIP). Traditional telephony technologies, like the Public Switched Telephone Network
(PSTN), were on the other hand designed to work in circuit-switched networks.
The motivation for undertaking this investigative work was a general disappointment that we
observed: a new communication technology often meant that one, as a user, was forced to use a
computer, without any really good alternatives. Thus, there was a need for a solution that made
it possible to use the emerging technologies in a more comfortable way, for example through a
cellular phone.
To us, this shortcoming was a major drawback, and probably one of the factors that make it
difficult to introduce a new communication technology. These observations led to the initial
solution proposal presented in chapter 3. This proposal was sent to Sony Ericsson Mobile
Communications (SEMC), and earned us the opportunity to undertake more extensive research into
what is actually needed in order to introduce support for a new communication technology in a
cellular phone.
This report presents an investigation of the possibilities for introducing a new communication
technology, like VoIP, into a Sony Ericsson cellular phone. The investigation is based on the
following research questions:

1. Will Bluetooth be able to handle the communication between the cellular phone and the
base unit in accordance with what is seen as "normal" response times and quality in traditional
telephony?

2. Is it possible to integrate IP-telephony support into a cellular phone based on the Sony
Ericsson architecture?

3. Is it possible to use any pre-existing techniques from the Sony Ericsson mobile phone
architecture in order to ease the implementation?

4. Is it possible to integrate support for more communication technologies based on the
selected communication protocols and the Sony Ericsson mobile phone architecture?

Interviews and the implementation of a prototype were used to evaluate whether the SEMC
architecture supports new technologies. We find that the best option is to use the Session
Initiation Protocol (SIP) and the Real-time Transport Protocol (RTP), both of which are supported
through the SEMC IP Multimedia Subsystem (IMS) architecture. We will also see that a SIP and
RTP based solution supports interaction with other voice communication technologies through
the use of gateways. The evaluation of the prototype showed that Bluetooth will suffice for most
voice communication with respect to latency and bandwidth.

Chapter 2

A Need For New Communication Technologies

In order to understand why new voice communication technologies are introduced when a working
and well accepted system already exists, one must understand the main differences between the
traditional Public Switched Telephone Network (PSTN), which is circuit-switched, and the new
IP-based technologies, which are used in packet-switched networks. For this reason, this chapter
gives a short summary of the most important aspects of both circuit-switched and packet-switched
networks, along with their respective benefits and drawbacks.

2.1 Circuit-switched Networks


There exist different types of circuit-switched networks. The first, and probably the simplest
one, is a dedicated cable between two users. This system is however not very flexible when it
comes to adding more users, as each user would need a dedicated cable to every other user. This
would in fact mean that the number of cables in the network would grow quadratically with the
number of users [1]. To solve this issue a switch could be introduced. Adding a new user then
only requires connecting the new user to the switch. In the simplest case one could say that the
task of the switch is to form a connection between two users, and in this way attach the two as
if they were actually connected by the same dedicated cable [1].
Although simplified, this is the main concept of a circuit-switched network; the network simply
allocates resources along a path, between two or more end users, to form a dedicated line [1].
Over the years this paradigm has of course been refined and developed. Today’s circuit-switched
networks use, e.g., Frequency Division Multiplexing (FDM), digital transmission, and Time
Division Multiplexing (TDM) to better utilize the capacity of their bearers (cables) [1]. The
main task of the circuit-switched telephony network is still the same, i.e., to set up and manage
dedicated paths and resources between end users, without regard to what is actually being sent
over the connection. This means that much of the intelligence in the system resides in the
network, as it is the network that decides how to set up the path and manages it throughout an
entire call session [1].
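The scaling argument for introducing a switch can be illustrated with a small calculation. The sketch below is only an illustration with hypothetical function names: it counts the dedicated cables needed when every pair of users is wired directly, n(n-1)/2 for n users, and compares this with one cable per user when a single central switch is used.

```python
def full_mesh_links(users: int) -> int:
    """Dedicated cables needed when every user is wired to every other user."""
    return users * (users - 1) // 2


def switched_links(users: int) -> int:
    """Cables needed when every user is wired to one central switch."""
    return users


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(f"{n:>5} users: {full_mesh_links(n):>7} mesh cables, "
              f"{switched_links(n):>5} switched cables")
```

Even for a modest number of users the pairwise wiring quickly becomes impractical, which is the motivation for letting a switch form the connections on demand.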

2.2 Packet-switched Networks


Packet-switched networks were designed with a focus on data transmission, i.e., with the bursty
nature of data transmissions in mind (the amount of data sent during a session is not constant)
[1].
In a packet-switched network a packet of data is created by one node in the network, and the
address of the receiver is attached to the packet. The packet is then sent to the first network
node, or router as it is called in packet-switched networks. The router examines the packet’s
address field and passes the packet on to the next appropriate node in the network. When the
packet arrives at its destination the data in the packet is processed [1]. One could say that a
packet-switched network operates in much the same way as the traditional postal service.
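The hop-by-hop forwarding described above can be sketched in a few lines. This is a toy model and not a real routing implementation: the node names and the `NEXT_HOP` tables are hypothetical, and each node is assumed to know only the next node towards a given destination.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    destination: str          # address attached by the sending node
    payload: bytes
    hops: list = field(default_factory=list)  # nodes visited, for illustration


# Hypothetical topology: each node only knows the next hop per destination.
NEXT_HOP = {
    "A": {"D": "B"},
    "B": {"D": "C"},
    "C": {"D": "D"},
}


def forward(packet: Packet, start: str, max_hops: int = 16) -> Packet:
    """Pass the packet from node to node until it reaches its destination."""
    node = start
    while node != packet.destination:
        packet.hops.append(node)
        if len(packet.hops) > max_hops:
            raise RuntimeError("hop limit exceeded")
        table = NEXT_HOP.get(node, {})
        if packet.destination not in table:
            raise LookupError(f"no route from {node} to {packet.destination}")
        node = table[packet.destination]
    packet.hops.append(node)
    return packet
```

Calling `forward(Packet("D", b"hello"), "A")` records the path taken in `hops`. No node along the way reserves any resources for the packet, which is the key difference from a circuit-switched path.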
The nature of a packet-switched network means that no dedicated resources are allocated by the
network, i.e., the network offers no quality of service (QoS) [1]. This in turn means that no
resources are wasted when sending bursty data, unlike in circuit-switched networks.
The fact that most packet-switched networks do not offer any QoS means that a client using
the network cannot assume that a sent packet is actually received by the recipient. It therefore
becomes the client’s responsibility to handle the QoS aspects of a session [1]. This is however
only true when using UDP and not TCP, as TCP adds transport control functionality to handle
these issues.
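One common way for a client to take on this responsibility over UDP is to number its packets, so that the receiver can at least detect which ones never arrived. The function below is a minimal sketch of that idea under our own assumptions (consecutive integer sequence numbers starting at `first`); it is not taken from any particular protocol implementation.

```python
def find_missing(received_seq_numbers, first=0):
    """Return the sequence numbers that never arrived, given those that did.

    Packets may arrive out of order, so only gaps in the numbering count as loss.
    """
    if not received_seq_numbers:
        return []
    seen = set(received_seq_numbers)
    last = max(seen)
    return [seq for seq in range(first, last + 1) if seq not in seen]
```

For voice traffic the receiver would typically conceal such gaps rather than request a resend, since a retransmitted packet usually arrives too late to be played out.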

2.3 The Internet


The Internet was designed as a dumb network whose sole purpose is to provide connectivity
between senders and receivers, no matter what type of data is carried [1]. The Internet is
constructed as a packet-switched network with the Internet Protocol (IP) as its base for
addressing and routing. The structure of the Internet is therefore independent of the actual
bearer of the data, as long as the endpoints of each network support the IP paradigm.
As the Internet is a dumb network, and only provides unreliable transmission, it is left to
the sender and receiver of the data to handle retransmissions, flow control, error detection, etc.
The network itself is almost stateless and does not track whether the packets sent arrive [1].
This makes the network very fault tolerant: if one node in the network malfunctions, the receiver
only perceives this as a loss of packets, and a resend can be issued. This time the packets can
take another path through the network [1]. This is a great step away from what is seen as normal
conditions in traditional circuit-switched networks, such as the PSTN, where QoS is central.
However, the better utilization of resources in a packet-switched network, the sheer size of the
Internet, and the fact that it is more or less free to use have led voice communication to shift
towards solutions for packet-switched networks [1].
It was with these facts at hand that we first became interested in the evolving communication
technologies, and thus started to think about the possibilities of integrating the new
communication technologies into an already established communication interface.

Chapter 3

The Initial Idea

3.1 Background
In this chapter the initial idea is presented. This idea was used as reference material when
we applied for a master thesis project at SEMC. As stated, this is the initial idea, and as can be
seen throughout this report it was later adapted and modified. That there are deviations from
this initial idea is quite natural, as the idea presented in this section was not derived from
any pre-study, but rather from creative thinking and logical reasoning. In short, it was quite
clear to us from the very start that this material would mostly be used as a means of describing
one potential way to implement IP-telephony in cellular phones. The idea was thus derived without
any insight into what possibilities were available in the SEMC architecture. For now we will
leave it at this, and describe the idea which earned us a position within SEMC to investigate the
true possibilities for IP-telephony within their architecture.
The source of the idea was our dissatisfaction with the fact that one was more or less forced
to either buy a new phone or be stuck in front of a computer in order to use a new communication
technology, such as IP-telephony. This of course means that one has to change phones depending
on which communication technology one would like to use. The fact that a new communication
technology imposes the need to use new physical equipment is in our opinion one of the main
obstacles when introducing new technologies, as people are often reluctant to change their
behavioral patterns [bok].

3.2 Vision
To address the problems described above, we concluded that it would be a good idea to gather
all communication technologies under one physical interface. In order to overcome the problem
of people’s reluctance to change, it was decided that a cellular phone could be a good hardware
interface for all the different technologies. This decision was based on the fact that the
cellular phone is already a well accepted way to handle communication, both voice and video. It
also has the advantage, compared to other solutions, that it is mobile. This means that one would
always be free to choose among the supported communication technologies, independent of physical
location.
The freedom to choose communication technology and the possibility to support new technologies
fairly easily, without changing the physical equipment, would also lead to economic benefits.
This would be true for both companies and home users, as they can easily shift to the most
cost-effective communication technology. The greatest economic gains would of course be for
large companies, due to their larger traffic volumes.
The value of having a solution like this will only increase in the future, as new technologies
and communication protocols emerge more rapidly. Being able to support these new technologies
without major hardware modifications will therefore be even more important than it is today.
Another benefit of having this solution available on the market is that it can contribute to the
development of new communication technologies and protocols, as they can more easily be
introduced to the market.

3.3 The Basic Idea
The general idea, which can be seen in figure 3.1, revolves around a cellular phone (1), which is
connected via Bluetooth (2) to a base unit (3), which in turn is connected to an appropriate bearer
for that specific media type (4).

Figure 3.1: An overview of the basic idea

3.3.1 Making an Outgoing Call


When a call is made using this solution, the cellular phone first checks whether it is within
coverage of the base unit. If it does not have coverage, it initiates the call as a normal
cellular call, i.e., using GSM, UMTS, etc. If the cellular phone does have coverage from a base
unit, it passes the connection information to the base unit, which in turn selects the most
appropriate bearer based on the connection information given. The base unit then sets up the call
between the cellular phone and the intended recipient.
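The decision flow for an outgoing call can be summarized in a short sketch. Everything in it is an assumption made for illustration: the names `Route`, `select_bearer` and `place_call` are hypothetical, and the bearer is chosen from the address scheme only because that is the simplest stand-in for "the connection information given".

```python
from enum import Enum


class Route(Enum):
    CELLULAR = "cellular"    # normal call over GSM, UMTS, etc.
    BASE_UNIT = "base unit"  # call handed to the base unit over Bluetooth


# Hypothetical mapping from connection information to bearer.
BEARER_BY_SCHEME = {"sip": "IP-telephony", "tel": "PSTN"}


def select_bearer(address: str) -> str:
    """Base unit side: pick a bearer from the address scheme (assumed rule)."""
    scheme = address.split(":", 1)[0].lower()
    return BEARER_BY_SCHEME.get(scheme, "PSTN")


def place_call(address: str, base_unit_in_range: bool):
    """Phone side: fall back to the cellular network when out of coverage."""
    if not base_unit_in_range:
        return Route.CELLULAR, None
    return Route.BASE_UNIT, select_bearer(address)
```

For example, `place_call("sip:alice@example.com", True)` would hand the call to the base unit with the IP-telephony bearer, while the same call without base unit coverage falls back to the cellular network.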

3.3.2 Handling Incoming Calls


When the base unit receives an incoming call on one of the connected bearers, one of the
following things can happen: If the cellular phone is within coverage of the base unit, the base
unit sets up the call with that cellular phone. If the cellular phone is not within the coverage
area of the base unit, the call could for instance be connected to the reception or forwarded to,
e.g., an answering machine.

3.4 Technical Requirements


In this section the technical requirements for the solution will be presented. There will also be a
more technically detailed presentation of the different components that are needed by the solution
and suggestions on how these components could be implemented.

3.4.1 The Cellular Phone
The main requirement for the cellular phone, in this solution, is that it has Bluetooth
capabilities. This is quite natural as Bluetooth is the bearer for all data traffic between the
cellular phone and the base unit. However, the exact Bluetooth requirements are not fixed. There
are some alternative ways to solve the actual data transfer over Bluetooth. One of these is to
let the cellular phone implement the Bluetooth profile normally used for headsets. With this
solution the base unit can communicate with the cellular phone using the same standard as if it
were just sending audio to an ordinary headset. This solution would however also require that the
cellular phone is able to communicate the connection information using one of the Bluetooth
profiles for data communication. The second alternative is to simply handle all communication,
i.e., control information and voice packets, using normal data communication, without separating
the two. Besides the requirement already mentioned, there will of course also be requirements for
codec support, coverage handling, etc.

3.4.2 The Base Unit


The base unit could almost be seen as a router between different bearers and communication
technologies. This means that the main purpose of the base unit is to redirect and repack the
data received. It further means that there are real-time requirements on the handling of these
packets, so as not to introduce unacceptable delays. The handling and repacking of voice data
must also be done without any noticeable loss of sound quality.
In order to make the base unit as flexible as possible, a modular design is suggested. This
means that the base unit could support new communication technologies just by adding a software
module. Figure 3.2 describes the module-based base unit.

Figure 3.2: Overview of the module-based base unit

Bluetooth Interface. This part of the base unit represents the communication interface towards
the cellular phone, and is used when receiving and sending data. This data could be both control
and voice packets.

Packet Handling. This layer is used to filter the incoming packets, which are received on
the Bluetooth interface, according to their type, i.e., control and audio packets. These packets
are then forwarded to the appropriate module. The packet handling layer is also responsible
for repacking the data received by the base unit into the correct Bluetooth packet type, before
forwarding it to the Bluetooth interface.

Communication Logic. This module is responsible for handling connection logic, i.e., the logic
needed for setting up and maintaining the connection between the incoming and outgoing
interfaces. This means that it is this module that selects which bearer to use and manages the
connection with the cellular phone. The choice of which bearer to use is based on the connection
information given. The intention is to make the routing table behind this choice manually
configurable.

Audio Transformation. This module handles the incoming Bluetooth voice packets and transforms
these into an intermediate format. In the other direction, when packets are received by the base
unit from a bearer, this module transforms the intermediate format into voice packets for
Bluetooth.

Bearer Packing. These modules are represented in figure 3.1 as "PSTN", "IP-telephony" and
"...". Modules of this type are used to repack between the intermediate audio format and the
format expected by the specific bearer. This means that it is these modules that decide which
communication technologies and protocols are supported. The intention is to make this module
layer easy to expand, and thereby introduce support for new technologies. It should also be
mentioned that care must be taken when choosing the intermediate format, in order to maintain
flexibility.

Bearer Interfaces. These modules are the physical interfaces needed by the software modules
discussed in the previous paragraph. These could, e.g., be hardware interfaces for PSTN, LAN, and
WLAN. The hardware interfaces that are available also affect which communication technologies
can be supported.
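The modular design outlined above can be sketched as a small plug-in registry. This is a minimal illustration under our own assumptions, not a design taken from the proposal: the class name `BaseUnit`, its methods, and the choice of raw bytes as the intermediate audio format are all hypothetical. Each bearer packing module is represented by a pair of functions that convert between the intermediate format and the bearer format.

```python
from typing import Callable, Dict, Tuple

# Assumed intermediate format: raw audio bytes.
Encoder = Callable[[bytes], bytes]  # intermediate format -> bearer format
Decoder = Callable[[bytes], bytes]  # bearer format -> intermediate format


class BaseUnit:
    """Toy model of the module-based base unit (names are hypothetical)."""

    def __init__(self) -> None:
        self._bearers: Dict[str, Tuple[Encoder, Decoder]] = {}

    def register_bearer(self, name: str, encode: Encoder, decode: Decoder) -> None:
        """Adding a bearer packing module adds support for a new technology."""
        self._bearers[name] = (encode, decode)

    def supported_bearers(self) -> list:
        return sorted(self._bearers)

    def to_bearer(self, name: str, intermediate: bytes) -> bytes:
        encode, _ = self._bearers[name]
        return encode(intermediate)

    def from_bearer(self, name: str, data: bytes) -> bytes:
        _, decode = self._bearers[name]
        return decode(data)


if __name__ == "__main__":
    unit = BaseUnit()
    # Placeholder codecs: pass-through for PSTN, a one-byte header for IP-telephony.
    unit.register_bearer("PSTN", lambda pcm: pcm, lambda data: data)
    unit.register_bearer("IP-telephony", lambda pcm: b"\x00" + pcm,
                         lambda data: data[1:])
    print(unit.supported_bearers())
```

Because every module converts to and from the same intermediate format, a new bearer can be added without touching the Bluetooth side, which is why the choice of intermediate format matters for flexibility.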

Chapter 4

Investigating the Options

In order to understand the problem domain and the options, the first thing undertaken was a
series of interviews with people who have insight into the current phone architecture and the
future development of the cellular phones at SEMC. Interviews were a quite natural means of
obtaining initial knowledge about the capabilities offered by today’s phone architecture at SEMC,
as we had no previous personal knowledge about the internal architecture of their cellular
phones. This lack of previous knowledge means that the ideas presented so far in this report will
be modified quite a bit. However, it is our opinion that the initial idea presented previously
may be of interest, as it presents our vision for the project, and this was in fact what earned
us the possibility to conduct this master thesis at SEMC. That said, it should be pointed out
that many of the ideas presented in the initial proposal will be possible to implement using the
technology we finally decided to use. In the rest of this chapter the main focus will be on the
options offered by the SEMC architecture, i.e., which parts of the architecture can be used in
order to implement a solution that fulfills the vision for this master thesis.

4.1 Interview Methodology


We had no previous knowledge at all of what was offered by the SEMC architecture, and this
influenced the way the interviews were conducted quite a bit. It made us decide to use an
iterative interview process to investigate the options offered by the architecture. This means
that the first interviews were conducted with SEMC personnel who had a fairly good system
overview, but did not possess detailed knowledge about all parts of the system. These initial
interviews gave us the needed initial knowledge of the architecture. After we had gained an
initial understanding of which parts of the architecture could be of interest, the interviews
entered a new phase. As the architecture is quite complex, this phase more or less lasted
throughout the entire project. The interviews in this new phase had the goal of obtaining
in-depth knowledge about different capabilities offered by the architecture. For this reason, the
interviews were conducted with different persons, depending on who would be most likely to have
the needed information. As some parts of the needed architecture are developed abroad, some
interviews were conducted using telephone conferences, or when people from the concerned sections
were visiting.

4.2 Interview Results


After conducting the initial interviews it became apparent that the solution for implementing
IP-telephony in SEMC’s cellular phones was to be closely connected to SEMC’s IP Multimedia
Subsystem (IMS) architecture. In fact, it became quite clear that this was the best, and maybe
only, option if we were to implement a working IP-telephony prototype within the time frame of
the master thesis project.
Another issue that was revealed during the later phases of the interviews was that there may
be complications in handling IP-based connections over Bluetooth in a satisfactory manner, as
this is something that has not really been used extensively. According to the interviewee this
should not pose any major problems, as it should probably be rather straightforward to fix.

4.3 Investigating the Current Architecture
Even though the indications from the initial interviews were quite unanimous, i.e., IMS was
the way to go, we still decided to look into the phone architecture first hand. The reason for
doing so was twofold: one reason was to investigate the options, and the other was to
familiarize ourselves with the phone architecture. This inside knowledge was also used to direct
the interview process and the questions in its next phases.
This investigation proved to be quite valuable for two reasons. First and foremost we learned
how applications in a cellular phone are generally designed and implemented. This may seem
trivial, but the truth is that the internal architecture of a phone differs quite a bit from what
is seen as normal application development. In a Windows based environment, for instance, one does
not really need to care about process registration and interprocess communication in the same way
as in an embedded system.
The other reason was that we became certain that IMS really was the only option, i.e., with the
time frame in mind. This became clear as the architectural investigations found no good support
for redirecting and managing voice calls in a packet-switched manner. The reason for this was
that there simply was no design support in the current base architecture for manipulating, or
even getting hold of, audio streams in a satisfactory manner. The investigations also showed that
there was no sufficiently good native support for media protocols that could be used for
transporting media data over IP connections.
These facts meant that if we were to implement a solution with only the support found in the
current base architecture, we would first of all have to make modifications to the current
architecture, and secondly develop, or at least implement, a whole new protocol stack. As this
would have shifted the attention away from the initial goals, and would have taken too long to
actually realize, the focus from then on was on further investigating IMS and the capabilities
offered by the SEMC IMS architecture.

4.4 IP Multimedia Subsystem


IP Multimedia Subsystem (IMS) is a term used for merging Third Generation (3G) mobile
cellular networks with the Internet [2]. IMS is in fact one of the first steps away from the tra-
ditional circuit-switched domain. Although there have been data and Internet capabilities in the
circuit-switched networks, like PSTN and the mobile 2G networks, these networks are optimized
for handling voice transmissions, and only offer custom data capabilities by the use of a modem.
IMS, on the other hand, follows the current trend, and makes use of the packet-switched capa-
bilities in the third generation networks [2]. It should be noted that it is not the IMS that brings
packet-switched capabilities to the phone, as this is a feature of the third generation network. The
IMS is rather a term used for a system managing the QoS, billing, and mobility aspects that are needed
in addition to the packet-switched capabilities of the third generation network, in order to make it
appealing to both network operators and end users. In short, IMS is a system for making use of
the IP protocol in a mobile network.

4.4.1 The SEMC IMS Architecture


IMS is a quite general term, but represents the transition towards an architecture that better
conforms to the capabilities needed in a packet-switched data network like the Internet. In this
section there will be a presentation of the capabilities offered by the SEMC implementation of the
IMS. Focus will be on the aspects of the SEMC IMS that will be of value for implementation of
IP-telephony. The presentation below is however only a summary of the capabilities offered. For
a complete overview we refer to appendix A, B, and C, which were constructed as an investigation
and pre-study of the internal capabilities offered by each relevant part of the IMS architecture.

Session Initiation Protocol. Along with the SEMC implementation of the IMS architecture,
there will be support for the Session Initiation Protocol (SIP) [3], which is a standard for initiating
and managing media sessions over an IP network. For more detailed information about SIP,
see appendix A.

Session Description Protocol. In the SEMC IMS architecture there will also be support for the
Session Description Protocol (SDP) [4], which is used in combination with SIP. SDP is
carried in the body of a SIP message, and describes the media that will be used once
the session has been established with the help of the SIP signaling. For more information
about SDP, see appendix B.

Real-time Transport Protocol. The SEMC IMS architecture also provides the Real-time Transport
Protocol (RTP) [5], which is a protocol used to carry real-time data streams, like audio
and video, over an IP network. RTP supports real-time delivery through timestamps and
sequence numbers, which are carried in the packet header. Parallel to every RTP session there
is also a control session, which uses the RTP Control Protocol (RTCP) [5]. RTCP is
used for synchronization between sender and receiver, as well as for handling other session-specific
control information. For more detailed information about RTP and RTCP, see appendix
C.
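
To make the role of the sequence number and timestamp concrete, the following is a minimal sketch of how the 12-byte fixed RTP header can be packed and unpacked. It follows the header layout of the RTP standard, but it is not the SEMC implementation; the function names and the example values (payload type, SSRC) are assumptions chosen for illustration only.

```python
import struct

RTP_VERSION = 2  # current RTP version

def pack_rtp_header(payload_type, sequence_number, timestamp, ssrc, marker=False):
    """Pack the 12-byte fixed RTP header (no padding, no extension, no CSRC list)."""
    byte0 = RTP_VERSION << 6                        # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       sequence_number & 0xFFFF,    # 16-bit sequence number
                       timestamp & 0xFFFFFFFF,      # 32-bit media timestamp
                       ssrc & 0xFFFFFFFF)           # 32-bit source identifier

def unpack_rtp_header(data):
    """Unpack the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }

# Example values only; they are not taken from the prototype.
header = pack_rtp_header(payload_type=0, sequence_number=1,
                         timestamp=160, ssrc=0x1234ABCD)
fields = unpack_rtp_header(header)
```

A receiver can use the sequence number to detect lost or reordered packets, and the timestamp to schedule playback.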

Chapter 5

Design of the VoIP Prototype

This chapter describes the design of the VoIP prototype that needs to be created. First there will
be a description of how the protocols investigated in the pre-study (appendix A, B, and C) can be
used in order to fulfill the goals for this project. After this there will be a detailed description of
the VoIP prototype and its relation to the SEMC architecture. In order to illuminate the design, a
set of scenarios showing the interaction between the different parts (VoIP UI, VoIP-server, IMS
SL, etc.) is described in the last section of this chapter.

5.1 Solution Design


As could be seen by the initial investigation, there were some architectural restrictions that
narrowed the options for implementing a working IP-telephony prototype within the given time
frame. As a result, the focus shifted towards making use of the capabilities offered to us by the
SEMC IMS architecture, i.e., SIP, SDP, and RTP. This said, it is however our opinion that the
capabilities offered by these protocols are really powerful and would be one of the best solutions
for the prototype implementation, even if there would have been other options to consider.
One of the goals of this project was to investigate and make use of the possibilities offered
by the SEMC architecture. The main option offered is to use the IMS architecture. The other
goals were to have a solution that was flexible and could easily be adapted to make use of new
communication technologies. The chosen solution should furthermore be able to use Bluetooth
as the communication interface. These are all capabilities offered by the initial idea, which is not
surprising, as the initial idea proposal was constructed to stress exactly these capabilities.
In the remainder of this chapter there will be a presentation of the possibilities offered by
SIP, SDP and RTP, and how these protocols can be used to fulfill the goals of this project. The
solutions presented will be put in contrast to what was proposed by the initial idea. This is done
in order to show that a solution, which is based on the IMS capabilities, can really fulfill the goals
for this project, and to some extent even surpass the visions we had for this project.
When reading this chapter it is assumed that the reader is familiar with the capabilities offered
by SIP, SDP and the RTP protocols. The needed background information can be obtained by
reading appendix A, B, and C.

5.1.1 Maintaining Flexibility and Modularity using SIP


As will be shown in this section, it is fully possible to maintain the modularity concept presented
in the initial idea, by the use of SIP. In fact almost all aspects of the base unit presented in the initial
idea, can be constructed by the facilities provided by a normal SIP solution. The main difference
from the initial idea would be that instead of having one central base unit with many different
capabilities, there would in a SIP solution be a "virtual" base unit, with the same capabilities,
but these would be distributed among the different servers found in a normal SIP network, i.e.,
registrar, proxy and gateway servers.
The registrar and proxy servers will handle user registrations and the passing of communication
logic or call signaling to and from the end users of the system. Gateways are used as a bridge
between different communication technologies. As a matter of fact there already exists well
tested and accepted bridging between PSTN and SIP by the use of PSTN-gateways. These are all
capabilities offered by the base unit presented in our initial idea.

In short, by using SIP, there will be the possibility to add new technologies by adding a new
type of gateway to the network. In fact, the SIP solution allows for the separation of the different
servers and gateways in a network, and thus there is much better load balancing, reliability and
flexibility than was actually the case with the initial idea.

5.1.2 Using SIP and SDP for Negotiating the Media Format
Instead of using a fixed intermediate format for communication between the user interface and
the base unit as described in the initial idea, and then translating this intermediate format into the
bearer specific media format and protocol, one could with a SIP/SDP solution simply skip this
translation, as SDP and SIP allow for communication and negotiation of which media format
and protocol to use. This is done by the parties of the call telling each other their capabilities
and matching these. This means that when communicating there is no need for intermediate
processing of the media format or protocol, as in the initial idea. This is of course only true if the
recipient is also connected to a technology capable of handling SIP and SDP. If, e.g., the recipient
is using PSTN, the actual SIP and SDP communication takes place between the user interface
(in this case a cellular phone) and the PSTN-gateway, and the gateway handles the conversion
between SIP/SDP and its negotiated format to and from the PSTN.
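
The capability matching described above can be illustrated with a small sketch. It reads the `a=rtpmap:` lines of two SDP bodies and keeps the formats that both sides list, in the order preferred by the caller. The SDP bodies, addresses, and function names are made up for the example, and a real negotiation involves more attributes than the encoding name.

```python
def parse_rtpmap(sdp_text):
    """Return the encoding names (e.g. 'PCMU/8000') listed in a=rtpmap lines."""
    formats = []
    for line in sdp_text.splitlines():
        if line.startswith("a=rtpmap:"):
            _payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            formats.append(encoding.strip())
    return formats

def negotiate_media(offered, supported):
    """Keep the offered formats that the other side also supports."""
    supported_set = set(supported)
    return [fmt for fmt in offered if fmt in supported_set]

# Hypothetical offer from the caller (two audio formats).
OFFER = "\r\n".join([
    "v=0",
    "o=caller 1 1 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 96 0",
    "a=rtpmap:96 AMR/8000",
    "a=rtpmap:0 PCMU/8000",
])

# Hypothetical answer from the recipient (one audio format).
ANSWER = "\r\n".join([
    "v=0",
    "o=callee 1 1 IN IP4 192.0.2.20",
    "s=-",
    "c=IN IP4 192.0.2.20",
    "t=0 0",
    "m=audio 5004 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
])

common = negotiate_media(parse_rtpmap(OFFER), parse_rtpmap(ANSWER))
chosen = common[0] if common else None  # None means no common format exists
```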

5.1.3 Bluetooth with IP Capabilities


SIP is an IP-based protocol. This means that in order to have direct communication between
the cellular phone and the recipient using SIP, there is also a need to have an IP-connection be-
tween the cellular phone and the rest of the network. This requirement makes it quite obvious
that the best protocol, or Bluetooth profile, to use would be one that allows for normal IP-based
communication over Bluetooth, i.e., a Bluetooth connection which tunnels IP-based traffic. In the
IMS-based solution we decided on using the Bluetooth Network Access Profile (NAP) in order to
provide the needed connection.

5.1.4 Overview of the SIP Solution


As can be seen in figure 5.1, the entity depicted as the base unit in the initial idea, is now
represented by several network connected servers. It should however be noted that the solution
still offers the same possibilities as the initial idea. There is for instance still the option to initiate
a call to and from different communication technologies, through the use of gateways. The fact
that communication bridging between technologies is done through the use of special-purpose
gateway servers actually has some benefits that did not exist in our initial idea. First and foremost,
there will be even greater flexibility for new technologies, as there is actually no requirement
to add a gateway for the new technology in one’s own domain (or base unit). That is, the only
requirement is that the service is offered by someone connected to the Internet, and that access to
this service is allowed. Another benefit is that this enables better load balancing than was offered
by the initial solution.
Figure 5.1 also shows the option to communicate with other SIP capable entities on the net-
work or Internet. This is done using normal SIP signaling (appendix A) between the caller and
the recipient. After the call session has been established, the session data is transmitted over a
peer-to-peer (P2P) connection. The exact protocol being used is negotiated using SIP and SDP.
The scenario is almost the same when a call is initiated to a different technology, e.g., PSTN.
The main difference is that the SIP signaling and P2P session establishment take place between
the gateway and the caller, i.e., the SIP-enabled cellular phone. If one uses a solution that enables
PSTN entities to initiate the call, a similar thing happens: the gateway is informed of the
incoming call, and then initiates SIP signaling and session establishment towards the recipient,
in this case the SIP-enabled cellular phone. The gateway then answers the PSTN call, and starts
the intermediate processing of the media data. The scenario is similar for communication with
other technologies; the difference simply resides in the translation and protocol capabilities of the
gateway.

Figure 5.1: Overview of the SIP solution

5.2 Prototype Design and IMS Relationship


The VoIP prototype is a client-server based solution, i.e., there is one application running as a
server, the VoIP-server. The client, or user interface, interacts with the VoIP-server to get infor-
mation about incoming calls as well as to initiate calls. It is the VoIP-server that in turn interacts
and uses the SIP capabilities offered by the IMS Service Layer (SL).
As time was limited and as the purpose was to create a prototype rather than a finished
product, the main focus was on the VoIP-server. This means that no great effort was taken to
implement a neat user interface. The VoIP-server, however, offers support for a client, and there
should thus be little work integrating a user interface at a later stage.
This section will describe the internal structure of the VoIP prototype. First, there will be a
description of the IMS architecture and how it generally interacts with its clients and vice versa.
This is needed in order to better understand the other design descriptions in this section.

5.2.1 SEMC IMS Client Interaction


The SEMC IMS architecture parts that are of interest for the VoIP-solution can be split into
two categories: the IMS SL (service layer) and RTP. The IMS SL is the part of the underlying
architecture that supplies the VoIP server with support for handling SIP sessions. This means
that it is quite easy to make SIP requests like register and invite; the only thing needed is to set
the SIP-specific parameters and call the specific functionality in the IMS SL. In the same easy
manner, by implementing the IMS SL callback interfaces, the VoIP server will be notified by the
IMS SL when incoming SIP requests, like invites and byes, are received and will therefore be able
to act accordingly.
Not only does the IMS SL offer support for sending and receiving SIP requests and responses,
it also actually helps the overlying application, in this case the VoIP server, with setting up the
media session that has been offered by the SIP invites. This is achieved using the IMS SL specific
interfaces that an overlying application should, directly or indirectly, implement. Through the
different stages of an invite, or other request, the IMS SL calls the application-specific implemen-
tations of the IMS interfaces in order to handle the current operation.

5.2.2 IMS SL and the VoIP Server
The VoIP-server can be split up into two parts: the VoIPCore, which is the actual running
application, and the VoIPMediaHandler, which handles the media sessions. The VoIP server uses
the IMS SL for all SIP requests and responses. As said, the IMS SL also helps the overlying
application to set up the negotiated media session. As can be seen in figure 5.2, the VoIPCore
component uses the IMS SL to handle SIP requests. Incoming SIP requests are received by the
VoIPCore as events sent by the IMS SL.

Figure 5.2: Interaction between the VoIP Server and the SEMC IMS Architecture

Figure 5.2 also shows that the IMS SL uses the VoIPMediaHandler component. This is done
using the IMS specific interfaces implemented by the VoIPMediaHandler. The VoIPMediaHan-
dler’s responsibility is to set up the actual media sessions. This is done by using other parts of
the IMS architecture, mainly the RTP and CStreamingMedia. Once the connections between the
two peers have been established using RTP, it becomes the VoIPMediaHandler’s responsibility to
make sure that data is being recorded and sent as well as received and played.
The actual recording and playback of data is done by using the StreamingMedia component.
This is a component that allows for recording and playback to and from a memory buffer, which
is really a must for this solution. The StreamingMedia component is also meant to support full-duplex
audio, i.e., simultaneous recording and playback. This will, however, prove not to be completely true,
as discussed in the implementation chapter (section 6.2.2).

5.2.3 The VoIPCore Component


This component is the part of the VoIP-server solution that is the actual running server application,
i.e., it is this component that a user of the VoIP-server, such as a VoIP-client (GUI), uses to make
outgoing calls and to receive incoming calls. Therefore, a public interface called IVoIPCore was
created, which defines the functionality needed by a client, e.g., registering with a SIP registrar
or ending a VoIP-call. All of the methods defined by the IVoIPCore interface are asynchronous,
which means that in order for the client to know what happened with its request (function call) a
public callback interface is needed. Another reason why the callback interface is needed is that the
VoIPCore component must notify the client when incoming calls are received. The VoIP callback
interface is further explained in section 5.2.5.
In short, the VoIPCore component notifies the client about the status of ongoing SIP transac-
tions. In order to be able to do this, it needs to implement callback interfaces offered by the IMS

SL. Figure 5.3 shows what interfaces the VoIPCore component implements and also some of its
methods.

Figure 5.3: The main functionality of the VoIPCore component

5.2.4 The VoIPMediaHandler Component


The VoIPMediaHandler component handles the media sessions. This means that it is respon-
sible for sending and receiving the voice data that is transmitted between the peers using the
Real-time Transport Protocol. To do this, the VoIPMediaHandler uses a utility component offered
by the SEMC IMS, called RTP.
Besides making sure that data is sent and received, the VoIPMediaHandler component also
has the responsibility of recording as well as playing this data. This is accomplished using the
StreamingMedia component, which is able to record as well as to play streaming media.
In order for the VoIPMediaHandler to be able to do all this, it first needs to be informed by
the VoIPCore that a new session is about to start. Therefore, the VoIPMediaHandler component
implements the IVoIPMedia interface. Using the functionality provided by the IVoIPMedia interface,
the VoIPCore component can allocate (and deallocate when needed) the resources that are
needed before the media session is started. The design of the VoIPMediaHandler component, along
with the functionality that should be offered by implementing the IVoIPMedia interface, can be
seen in figure 5.4.

Figure 5.4: The main functionality of the VoIPMediaHandler Component

5.2.5 The VoIP Callback Interface


In order to be notified by the VoIP-server about ongoing SIP requests, as well as about
incoming SIP requests, the client needs to implement the ICBVoIP interface. This is because
the functionality that the VoIP-server offers the client is asynchronous. The need for
this is quite obvious: the client UI should not be locked while it is waiting for a specific function
to complete. Therefore, the results of such an operation are provided using a callback interface,
in this case ICBVoIP. The functionality that the ICBVoIP offers can be seen in figure 5.5.

Figure 5.5: The functionality that the VoIP Callback Interface provides
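
The asynchronous call-plus-callback pattern described in this section can be sketched as follows. This is an illustration of the pattern only: the class and method names are hypothetical stand-ins and do not reproduce the actual IVoIPCore or ICBVoIP signatures, and the registration result is a placeholder value.

```python
import threading

class VoIPCallback:
    """Hypothetical stand-in for a client-side callback interface such as ICBVoIP."""
    def on_register_result(self, status_code):
        raise NotImplementedError

    def on_incoming_call(self, caller):
        raise NotImplementedError

class VoIPCoreSketch:
    """Hypothetical server side: requests return at once, results arrive via callback."""
    def __init__(self, callback):
        self._callback = callback

    def register(self, registrar):
        # Do the work on another thread so that the caller (the UI) is not blocked.
        worker = threading.Thread(target=self._do_register, args=(registrar,))
        worker.start()
        return worker  # returned only so that the example can wait for completion

    def _do_register(self, registrar):
        status_code = 200  # placeholder; a real server would wait for the SIP response
        self._callback.on_register_result(status_code)

class RecordingClient(VoIPCallback):
    """Example client that simply records the notifications it receives."""
    def __init__(self):
        self.events = []

    def on_register_result(self, status_code):
        self.events.append(("register", status_code))

    def on_incoming_call(self, caller):
        self.events.append(("incoming", caller))

client = RecordingClient()
core = VoIPCoreSketch(client)
core.register("sip:registrar.example.org").join()
```

The same callback object also gives the server a way to report events that the client never asked for, such as an incoming call.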

5.3 Scenarios
This section will show, with help of scenarios, how the VoIP-server interacts with the rest of
the system, and vice versa, in its most crucial parts. Each scenario contains a sequence diagram
and a descriptive text explaining the scenario.

5.3.1 Registering with a SIP Registrar


In order to send and receive invites (make a call and receive a call) it is necessary to first have
registered with a SIP server. Figure 5.6 is a sequence diagram of the register scenario.

1. When the register method in the VoIPCore component is called, it sets up the register pa-
rameters needed for a successful SIP registration.

2. After this setup has been completed, the register method in the IMS SL is called, and upon a
response from the SIP server (or a network error) a response code is received. The user of the
VoIPCore component is notified with a callback method.

Figure 5.6: The VoIPCore component uses the IMS SL to perform a SIP registration
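
As an illustration of what the register parameters amount to on the wire, the sketch below builds a plain-text SIP REGISTER request and classifies the response code. In the prototype this work is done by the IMS SL; the function names, addresses, and identifiers here are assumptions made for the example.

```python
def build_register(user, domain, contact_host, contact_port, call_id,
                   cseq=1, expires=3600, branch="z9hG4bKexample", tag="12345"):
    """Build a minimal SIP REGISTER request as text (no authentication)."""
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {contact_host}:{contact_port};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{user}@{domain}>",
        f"From: <sip:{user}@{domain}>;tag={tag}",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REGISTER",
        f"Contact: <sip:{user}@{contact_host}:{contact_port}>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"

def is_success(status_code):
    """SIP responses in the 2xx class indicate success."""
    return 200 <= status_code < 300

# Example values only.
message = build_register("alice", "example.org", "192.0.2.10", 5060,
                         call_id="a84b4c76e66710@192.0.2.10")
```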

5.3.2 Sending a SIP Invite Request


Having registered, it should be possible to send and receive invitations to media sessions via
SIP. This section describes what happens when a SIP invite is sent to another user that accepts the
invitation.

Figure 5.7: The VoIPCore component uses the IMS SL to initialize a SIP invite request

1. When the Invite method is called in the VoIPCore component, it sets up the invite parame-
ters needed for a SIP invite request.

2. The next thing it does is to request that the IMS SL sends the invite.

Figure 5.8: The IMS SL uses the VoIPMediaHandler component to create the SIP invite and to
setup the media streams

3. The IMS SL uses implemented functionality in the VoIPMediaHandler component both to
create the SDP part of the SIP invite (GetSupportedMedia) and to prepare the upcoming
media session by creating and opening sockets (OpenMediaSockets).

4. After the invite has been sent and a response has been received from the remote end, the
IMS SL uses the VoIPMediaHandler to figure out which media sessions matched
(CompareMedia). Using that information, the IMS SL closes the sockets that will not be used
(CloseMediaSockets), and completes the setup of the media session sockets
(SetConnectionInfo).

5. The IMS SL notifies the VoIPCore component about the status of the sent SIP invite request,
and the status is forwarded to the user of the VoIPCore component.
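
The order in which the service layer calls the media handler during an outgoing invite (steps 3 and 4 above) can be sketched as follows. The method names mirror the ones mentioned in the scenario, but their parameters, the port numbers, and the data structures are hypothetical; no real sockets are opened.

```python
class MediaHandlerSketch:
    """Hypothetical media handler; 'sockets' are represented by port numbers only."""
    def __init__(self, supported):
        self.supported = list(supported)
        self.open_sockets = {}
        self.remote = None

    def get_supported_media(self):
        return list(self.supported)

    def open_media_sockets(self):
        # One placeholder port per offered format; real code would bind UDP sockets.
        for index, fmt in enumerate(self.supported):
            self.open_sockets[fmt] = 40000 + 2 * index

    def compare_media(self, answered):
        return [fmt for fmt in self.supported if fmt in answered]

    def close_media_sockets(self, keep):
        for fmt in list(self.open_sockets):
            if fmt not in keep:
                del self.open_sockets[fmt]

    def set_connection_info(self, remote_host, remote_port):
        self.remote = (remote_host, remote_port)

def send_invite(handler, remote_answer):
    """Mimic the call order used by the service layer for an outgoing invite."""
    offer = handler.get_supported_media()        # builds the SDP part of the invite
    handler.open_media_sockets()                 # prepares for every offered format
    # The invite carrying 'offer' would be sent here; remote_answer stands in
    # for the response received from the remote end.
    matched = handler.compare_media(remote_answer["formats"])
    handler.close_media_sockets(keep=matched)    # drop what will not be used
    if matched:
        handler.set_connection_info(remote_answer["host"], remote_answer["port"])
    return offer, matched

handler = MediaHandlerSketch(["AMR/8000", "PCMU/8000"])
offer, matched = send_invite(
    handler, {"formats": ["PCMU/8000"], "host": "192.0.2.20", "port": 5004})
```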

5.3.3 Starting the Media Session


After an invite has been sent (or received) and it has been accepted, all of the preconditions
are met (i.e., the correct sockets for sending and receiving data have been set up) to finally start
having a conversation. A normal phone call using either a cellular phone, a standard PSTN-connected
phone, or a VoIP-phone is usually full-duplex, i.e., it is possible for both participants to
talk at the same time. Because of the current limitations in the architecture mentioned in chapter
6, we have been forced to use half-duplex conversations, i.e., only one participant may talk at a
time.

Figure 5.9: Preparing the VoIPMediaHandler for the actual media session

1. When an invite-process has been successfully completed, the VoIPCore component calls
the StartSession method in the VoIPMediaHandler in order to get it ready to either start
listening or talking.

2. The VoIPMediaHandler creates the components necessary to record and play back
audio.

5.3.4 Requesting to Talk
When the user wants to say something to the other participant, he must make a talk "request".

Figure 5.10: Interaction between the different components when requesting to talk

1. Once the request talk has been received by the VoIPMediaHandler component, it requests
an audio channel used for recording.

2. When the request has been approved (which happens immediately unless some other part uses that
channel) and the channel has thus been opened, the recorder is configured.

3. Once a successful configuration of the recording has been completed, a message represent-
ing a request-talk is sent to the remote end.

4. When an ack from the remote end is received, the recorder is started and the VoIPCore
component’s user is notified.

5. Every time there is new data available to send to the remote end, an RTP-packet is created
and sent. This happens repeatedly until a request talk is received from the remote end,
signaling that it is time to start listening instead (see the incoming request talk scenario).
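
The steps above amount to a small state machine on each side of the call. The following sketch shows one possible way to express it; the state names and methods are hypothetical, audio channel handling is omitted, and the case where both ends request to talk at the same moment is not covered by the scenario and is therefore not handled here.

```python
from enum import Enum

class TalkState(Enum):
    LISTENING = "listening"
    WAITING_FOR_ACK = "waiting_for_ack"
    TALKING = "talking"

class HalfDuplexSession:
    """Hypothetical half-duplex turn taking; send_control delivers control messages."""
    def __init__(self, send_control):
        self.state = TalkState.LISTENING
        self._send_control = send_control

    def request_talk(self):
        # Steps 1-3: the recorder is set up (omitted), then the remote end is asked.
        if self.state is TalkState.LISTENING:
            self._send_control("request-talk")
            self.state = TalkState.WAITING_FOR_ACK

    def on_ack(self):
        # Step 4: the remote end has agreed to listen, so recording may start.
        if self.state is TalkState.WAITING_FOR_ACK:
            self.state = TalkState.TALKING

    def on_remote_request_talk(self):
        # The remote end wants to talk: stop sending and acknowledge.
        self._send_control("ack")
        self.state = TalkState.LISTENING

    def can_send_audio(self):
        # Step 5: RTP packets are only sent while this side is talking.
        return self.state is TalkState.TALKING

sent_messages = []
session = HalfDuplexSession(sent_messages.append)
session.request_talk()
state_after_request = session.state
session.on_ack()
state_after_ack = session.state
```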

5.3.5 Incoming Request Talk


This scenario will describe what happens when the remote end wants to talk, i.e., when an
incoming request-talk is received.

Figure 5.11: Interaction between the different components when a "request talk" is received

1. When a request-talk message is received, the current recording is stopped (if there is one)
and an audio channel used for playback is requested.

2. Once the request has been approved (which happens immediately unless some other part uses that
channel) and the channel has thus been opened, the player is configured.

3. Once the playback has been successfully configured, a message representing
an ack is sent to the remote end.

4. When the first data packet (RTP) arrives, a buffer holding temporary RTP packets is created.
The data from the packet is unpacked and sent to the player for playback.

5. Every time that a new RTP packet is received it is put in the buffer holding the temporary
packets.

6. Whenever the player runs out of data, the next packet is retrieved from the buffer holding
the temporary packets, unpacked, and sent to the player.
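
Steps 4 to 6 describe a simple first-in, first-out playback buffer. A minimal sketch, assuming that packets arrive in order and that the player signals when it needs more data, could look as follows. The class and method names are hypothetical, and reordering by RTP sequence number is deliberately left out since the scenario does not describe it.

```python
from collections import deque

class PlaybackSketch:
    """Hypothetical receive side: 'play' is whatever hands data to the audio player."""
    def __init__(self, play):
        self._play = play
        self._buffer = None  # created when the first packet arrives (step 4)

    def on_rtp_packet(self, payload):
        if self._buffer is None:
            self._buffer = deque()
            self._play(payload)            # step 4: first packet goes straight to the player
        else:
            self._buffer.append(payload)   # step 5: later packets are queued

    def on_player_out_of_data(self):
        # Step 6: feed the player the next queued packet, if there is one.
        if self._buffer:
            self._play(self._buffer.popleft())
            return True
        return False

played = []
playback = PlaybackSketch(played.append)
for chunk in (b"a", b"b", b"c"):
    playback.on_rtp_packet(chunk)
```

At this point only the first chunk has been played; the remaining two are delivered one at a time whenever `on_player_out_of_data` is called.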

5.3.6 Incoming SIP Invite Request


This scenario deals with what happens when an incoming SIP invite has been received by the
IMS SL.

Figure 5.12: The interaction between the IMS SL and the VoIP-server when a SIP invite is re-
ceived

1. When an incoming SIP invite request is received by the underlying architecture it notifies
the VoIPCore component, which in turn notifies its user.

2. Should the user accept the incoming invite, this is forwarded to the underlying architecture,
which sets up the media session sockets in a manner very much like the one shown in
the Invite scenario above. Once this is completed, the StartSession method is called in the
VoIPMediaHandler (see Starting the Media Session above), and the VoIPCore component’s user is
notified with the results.

3. If the user chooses to reject the incoming SIP invite, this is merely forwarded to the un-
derlying architecture, which notifies the VoIPCore component when it is completed. This
result is forwarded to the user of the VoIPCore.

5.3.7 Sending a SIP Bye Request


Whenever the user feels that the conversation is over, he may terminate the media session.
There is also a possibility that the remote user terminates the conversation, but that is covered in
the next scenario (Incoming bye request).

Figure 5.13: The interaction between the VoIP-server and the IMS SL when sending a SIP bye

1. When the VoIPCore component receives a terminate request from its user, it simply for-
wards this request to the underlying architecture.

2. The IMS SL makes sure that all the media specific sockets are closed by calling imple-
mented functionality in the VoIPMediaHandler component.

3. When the termination is completed, the StopSession method in the VoIPMediaHandler is
called, which de-allocates resources, and the VoIPCore component’s user is notified
about the finished termination.

5.3.8 Incoming Bye Request
This scenario describes what happens when a SIP bye request is received from the remote end.

Figure 5.14: The interaction between the VoIP-server and the IMS SL when a SIP bye is received

1. When an incoming SIP bye request destined for VoIPCore is received, the StopSession
method in the VoIPMediaHandler is called in order to de-allocate resources, and the VoIPCore
component’s user is notified.

2. The underlying IMS SL architecture makes sure that the media specific connections are
closed by calling implemented functionality in the VoIPMediaHandler.

Chapter 6

Prototype Implementation

In this chapter there will be a brief presentation of what was implemented in order to make a
working prototype. The aim of this chapter is simply to give some insight into the more
important things that had to be implemented in order to make the prototype a reality. Focus will
thus be on the most important aspects and issues that were encountered during the implementation
of the prototype.

6.1 Bluetooth Connectivity


One of the first things done, after having realized that the IMS would be one of the key factors
for making our VoIP prototype a reality, was to look into what requirements the IMS had on the
data connection it should use. This was especially important as one of the goals of the master
thesis was to see if Bluetooth would suffice as data carrier for the chosen solution.
The investigations of what was needed by the IMS, in order to use a certain data carrier,
soon revealed that it could handle any type of normal data accounts, like GSM/UMTS based
packet-switched and circuit-switched accounts. However, as was indicated during the investiga-
tive interviews, there proved to be no current support for handling and managing Bluetooth based
accounts. This obstacle was remedied by implementing a module that created Bluetooth based
accounts for every paired device in the vicinity that provides a service for network access. After
having managed to create the accounts, the focus shifted towards manipulating the "connection
manager", which is a module used for setting up the connection described by the data accounts.
When this module had been altered to support Bluetooth accounts, there were no longer any ob-
stacles for using a Bluetooth connection in the same way as any other connection. It was possible
for the IMS to use it as well as for any other service on the phone, e.g., the web browser.

6.2 The VoIP Prototype


As has been seen in the design chapter, the actual VoIP solution is implemented as two major
blocks, i.e., the VoIPCore and the VoIPMediaHandler. As could also be seen in the design, these
parts interact with the SEMC IMS in order to get things done. This section describes the issues
revealed during the implementation of these components.

6.2.1 Changes in the Underlying Architecture


One thing that influenced the implementation of the VoIPCore and VoIPMediaHandler, was the
fact that some parts of the needed architecture were undergoing some changes. This meant that
there were some uncertainties when work first started on the VoIP-solution. For the implemen-
tation of the VoIP-solution this meant that working code sometimes had to be discarded, as the
implementation was made obsolete by an update of the underlying architecture. This fact meant that
development took longer than would have been the case if all design aspects could have been
accounted for from the very start.

6.2.2 No Support for Full-duplex Audio
Another thing that was revealed during the implementation was that there was no actual support
for full-duplex audio in the base platform. This meant that it would only be possible to either
record or play back audio, but not both at the same time. The fact that this deficiency resided in the base
architecture of the platform meant that there was very little to do about it, as the base platform
is developed by a third-party company. As the goal for this thesis was to investigate and develop
a prototype to prove the possibilities for supporting new communication technologies with the
cellular phone as the interface, this was obviously a major drawback as it limits the scope to only
half-duplex solutions.
However, it was our opinion, after having implemented large parts of the VoIP-solution, that
once this deficiency in the architecture is remedied there will be no problem handling full-duplex audio
conversations. In order to temporarily avoid the problem, and still be able to provide some form of
proof that a VoIP-solution with the cellular phone as the interface is possible, we shifted
towards a half-duplex solution.
We decided that the simplest way would be to pass an application specific token between
the recipient and the caller, using the RTCP-protocol. This solution was chosen as there is good
support for this kind of token passing through the use of RTCP. In fact, the RTCP-protocol already
provides the possibility to create and pass application specific data with different subtypes.
This token passing solves the problem by only letting the one with the token speak, while the
other party listens. The token passing is described in the design scenarios "Requesting to Talk"
and "Incoming Request Talk" in the previous chapter. However, it must be understood that these
scenarios are not part of the actual VoIP design, but were added as a workaround for the fact that
the platform does not handle full-duplex audio. It should also be noted that it is our belief that
when the audio problem is fixed, there will be little problem shifting from the half-duplex solution
to a real full-duplex VoIP solution. In fact, less work is needed for a full-duplex solution, as neither
token passing nor the associated state handling is needed, unlike with the half-duplex workaround.
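
To illustrate how such a token can be carried, the sketch below packs and unpacks an RTCP application-defined (APP) packet, whose header has room for a 5-bit subtype and a four-character name. The packet layout follows the RTP standard, but the subtype values, the name "VOIP", and the function names are assumptions made for this example and are not the ones used in the prototype.

```python
import struct

RTCP_VERSION = 2
RTCP_PT_APP = 204  # packet type for application-defined RTCP packets

# Hypothetical subtype values for the token-passing messages.
SUBTYPE_REQUEST_TALK = 1
SUBTYPE_ACK = 2

def pack_rtcp_app(subtype, ssrc, name, data=b""):
    """Pack an RTCP APP packet: header, SSRC, 4-byte ASCII name, optional data."""
    if not 0 <= subtype <= 31:
        raise ValueError("subtype must fit in 5 bits")
    if len(name) != 4:
        raise ValueError("name must be exactly 4 bytes")
    if len(data) % 4 != 0:
        raise ValueError("data length must be a multiple of 4 bytes")
    total_words = (12 + len(data)) // 4
    byte0 = (RTCP_VERSION << 6) | subtype           # V=2, P=0, subtype
    # The length field holds the packet length in 32-bit words minus one.
    return struct.pack("!BBHI4s", byte0, RTCP_PT_APP,
                       total_words - 1, ssrc, name) + data

def unpack_rtcp_app(packet):
    """Unpack the fields of an RTCP APP packet."""
    byte0, packet_type, length, ssrc, name = struct.unpack("!BBHI4s", packet[:12])
    return {
        "subtype": byte0 & 0x1F,
        "packet_type": packet_type,
        "length": length,
        "ssrc": ssrc,
        "name": name,
        "data": packet[12:],
    }

request = pack_rtcp_app(SUBTYPE_REQUEST_TALK, ssrc=0x1234ABCD, name=b"VOIP")
parsed = unpack_rtcp_app(request)
```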

Chapter 7

Evaluation of the Prototype

In this chapter we will look back at the initial research goals and see what was actually con-
cluded. During the evaluation of the possibilities for IP-telephony in SEMC cellular phones, we
have also come across a topic that we feel might need further investigation. This topic will also
be presented below.

7.1 Answers to the Research Questions


In this section there will be a presentation of the answers to the research questions. As will
be seen, most of the questions have been tested and answered with the help of the VoIP-prototype
developed, while other questions have been answered by logical derivation, i.e., from the capabilities
offered by SIP and related concepts already available on the market.

7.1.1 Reasonable Response Times


Question: Will Bluetooth be able to handle the communication between the cellular phone and
the base unit in accordance to what is seen as "normal" response times and quality in traditional
telephony?

Answer: With the help of the prototype, the Bluetooth connection has been empirically tested to see
whether it provides acceptable latency for voice communication. However, as has been stated
before, our current prototype only operates with half-duplex audio, i.e., audio in only one direction
at a time. This means that the studies do not actually test whether Bluetooth is able to handle real
VoIP communication. To provide a likely answer to the question we refer to what is normally seen
as acceptable latency when dealing with real-time voice communication. The general opinion is
that latencies below 400 ms are acceptable for the parties of a conversation, although latency
below 150 ms is recommended [6].
The Bluetooth connection has been empirically tested in respect to this criterion. This was
done by measuring the one way latency between the cellular phone and the PC providing the
IP-connection, i.e., the latency imposed on the data when traveling over the actual Bluetooth
connection.
The results presented below are from tests with two different packet sizes. These were chosen
with respect to the real sizes of the data packets traversing the link during normal conversation,
i.e., best and worst quality when using the codec in question.
The test was conducted in a normal open office environment, over distances of 1-12 meters.
The fact that the test was conducted in an office environment means that the results should not
be interpreted as a true test of Bluetooth, but rather as an indication of the capabilities offered
to the solution in question, i.e., the results may be influenced by sources of disturbance in the
surrounding environment, such as wireless LAN, performance variations of the cellular phone
and/or PC, etc.

Packet Size (bytes)  Distance (m)  Average (ms)  Max. (ms)  Min. (ms)
194                  1             36            65         17
424                  1             40            61         35
Diff.                -             -4            4          -18
194                  4             34            61         19
424                  4             40            63         32
Diff.                -             -6            -2         -13
194                  8             33            62         18
424                  8             39            62         24
Diff.                -             -6            0          -6
194                  12            37            75         22
424                  12            44            72         32
Diff.                -             -7            3          -10

Table 7.1: Bluetooth latency measurement results

As can be concluded from table 7.1, packet size and distance seem to have little impact on the
latency. This has, however, only been verified for the tested packet sizes and distances, but what
is actually shown is that the Bluetooth connection in itself should be able to handle voice
communication, even for the larger packets, in a "normal" open office environment.
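The comparison against the latency limits can be sketched as follows. The figures are copied from table 7.1 and the limits are those cited from [6]; the function names are our own and only serve the illustration. Note that the Bluetooth hop is only one part of the path, so whatever is left of the limit has to cover the codec, the IP network and the remote side.

```python
# (packet size in bytes, distance in m) -> (average, max, min) latency in ms,
# copied from table 7.1.
MEASUREMENTS = {
    (194, 1): (36, 65, 17),
    (424, 1): (40, 61, 35),
    (194, 4): (34, 61, 19),
    (424, 4): (40, 63, 32),
    (194, 8): (33, 62, 18),
    (424, 8): (39, 62, 24),
    (194, 12): (37, 75, 22),
    (424, 12): (44, 72, 32),
}

RECOMMENDED_MS = 150  # recommended one-way latency limit [6]
ACCEPTABLE_MS = 400   # upper limit for an acceptable conversation [6]


def diff_rows():
    """Reproduce the 'Diff.' rows: 194-byte value minus 424-byte value."""
    return {
        distance: tuple(small - large for small, large in
                        zip(MEASUREMENTS[(194, distance)],
                            MEASUREMENTS[(424, distance)]))
        for distance in (1, 4, 8, 12)
    }


def worst_case_ms():
    """Largest latency observed in any of the test series."""
    return max(maximum for _, maximum, _ in MEASUREMENTS.values())


def headroom_ms(limit_ms):
    """Latency left for the rest of the path after the Bluetooth hop."""
    return limit_ms - worst_case_ms()
```

With the measured worst case, half of the recommended limit remains for the rest of the end-to-end path.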

7.1.2 Possible to Implement IP-Telephony


Question: Is it possible to integrate IP-telephony support into a cellular phone based on the
Sony Ericsson architecture?
Is it possible to use any pre-existing techniques from the Sony Ericsson mobile phone architecture
in order to ease the implementation?

Answer: As has been described in this report, there is evolving support for implementing
solutions like VoIP in the SEMC architecture. This is mainly because of the features of the SEMC
IMS. However, it is not possible today to implement a fully working VoIP solution, due
to the lack of support for full-duplex audio in the base architecture, i.e., the architecture on which
the current phones are built. This is, however, just a temporary problem, and as soon as it has been
remedied, little work will be needed to convert the current solution into a true full-duplex
VoIP prototype.

7.1.3 Support for New Communication Technologies


Question: Is it possible to integrate support for more communication technologies based on the
selected communication protocols and the Sony Ericsson mobile phone architecture?

Answer: Regarding the support for new communication technologies, we have already seen that
it is possible to support new technologies through the use of media gateways. Although this support
is not directly part of the SEMC architecture, it follows from the fact that
the SEMC architecture supports the Session Initiation Protocol (SIP). Because SIP supports
new communication technologies through the use of gateways, the support is separated
from the internal architecture, which leads to benefits such as extended flexibility and load
balancing.

7.2 Suggestions for Further Research


During the investigation of the possibilities for a VoIP-solution based on the SEMC architec-
ture, we have come across a topic that we feel might need further research and attention. As
we have seen the transition from traditional telephony (Mobile or ordinary PSTN), towards a
packet-switched technology is quite a big step. One fact that makes IP-telephony even more in-
teresting is that packet-switched communication is supported by the Third Generation Networks
(3G). However as the focus shifts more and more towards VoIP communication transported over
the Internet, the main requirement becomes that each entity or cellular phone maintains a constant
IP connection. This in turn means that the cellular phone will potentially be as exposed
to malicious attacks as every other entity connected to the Internet. Adding to our concerns, we
believe that parts of the SEMC architecture and base architecture were not designed with a focus on
handling potentially unsafe data. It is our opinion that further resources need to be devoted
to investigating the potential gaps in the design and implementation of the IMS and base
architecture, in order to make sure that all unsafe data communication is treated as if it were
potentially harmful.

Chapter 8

Discussion and Related Work

When conducting this master thesis we have naturally had a general interest in what is happen-
ing in different areas related to VoIP. What has been noticed is that the general interest in VoIP has
more or less exploded during the last years (2004-2005). The fact that the general public becomes
more and more interested in using this new technology means that there is also an increased focus
on the strengths and weaknesses of VoIP. In this chapter there will be a short presentation of the
issues that we have found the most interesting. The topics have been selected in order to address
aspects that are important to both developers and the general public.

8.1 Network Address Translation


One major problem that one faces when dealing with SIP-based VoIP is NAT. NAT
is short for Network Address Translation, and is a method used for mapping IP addresses between
different address scopes [7]. This often means translating between a public, globally recognized
IP address and an IP address residing within a private network. One of the main reasons for
employing NAT is the shortage of public global IP addresses on the Internet.
This has led to the creation of private networks with their own private IP addresses, where NAT is
used to allow computers on the private networks to access the Internet [8]. To make things worse,
there exist different types of NATs [9], which means that the behavior can differ from
one situation to another.
The general idea, although simplified, is that when an entity on the private network wishes to
exchange information with an entity residing on another network, the outgoing request is passed
through the NAT. The NAT then creates a session for the outgoing request. It also changes the
address and port fields in the outgoing request from the internal IP address and source port of the
initiating entity to the public address of the NAT entity and a port chosen by the
NAT. This means that the recipient will perceive the incoming request as if it originated from the
NAT entity. The answer to the request will therefore be sent to the address of the NAT entity,
with the destination port that was previously set by the NAT in the outgoing request. When the
NAT entity receives the answer on the specified port from the destination address, it remembers
that it assigned that session (port) to a request belonging to a certain entity on the private network
and can thus forward the answer to the intended recipient. For more extensive information on
NAT see [10].
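The session handling described above can be sketched as a small translation table. This is a deliberately simplified model that ignores protocol, timeouts and the different NAT types; the class name, the starting port and the addresses used are assumptions made for the illustration.

```python
from itertools import count


class SimpleNat:
    """Toy model of a NAT that rewrites source address and port."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._next_port = count(first_port)
        self._by_private = {}      # (private ip, private port) -> public port
        self._by_public_port = {}  # public port -> (private ip, private port)

    def outbound(self, src_ip, src_port):
        """Translate an outgoing request; create a session if none exists."""
        key = (src_ip, src_port)
        if key not in self._by_private:
            public_port = next(self._next_port)
            self._by_private[key] = public_port
            self._by_public_port[public_port] = key
        return self.public_ip, self._by_private[key]

    def inbound(self, dst_port):
        """Find the private entity an incoming answer belongs to, if any."""
        return self._by_public_port.get(dst_port)
```

An answer arriving on a port for which no session exists cannot be forwarded, which is why unsolicited incoming traffic does not reach entities on the private network.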

8.1.1 VoIP in NAT Situations


The NAT issue is quite troublesome when it comes to VoIP, as can be seen from the general problem
that NAT imposes on a SIP/SDP-based VoIP solution like the one described in this report. The
problem is that IP and port information is not only stored in the IP header of the packet, but also
in the message body of both SIP and SDP. This means that addressing information is embedded
in the application layer, and a normal NAT is unaware of this. As a result,
a message received from a VoIP client residing behind a NAT contains addressing
information belonging to the private domain of that client. When the recipient tries to reply, it will
fail, as the address it is trying to reach does not exist in its addressing realm.
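To make the problem concrete, the sketch below shows a hypothetical SDP body as a client behind a NAT might write it, together with a check of whether the advertised media address is a private one. The session values are invented for the example, and the check assumes that the connection line carries a literal IP address rather than a host name.

```python
import ipaddress

# Hypothetical SDP body written by a client with the private address
# 192.168.0.10. A NAT rewrites the IP header but not these lines.
EXAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=userA 2890844526 2890844526 IN IP4 192.168.0.10",
    "s=-",
    "c=IN IP4 192.168.0.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
]) + "\r\n"


def connection_address(sdp):
    """Return the address from the first 'c=' (connection) line, if any."""
    for line in sdp.splitlines():
        if line.startswith("c="):
            fields = line[2:].split()  # <nettype> <addrtype> <address>
            if len(fields) == 3:
                return fields[2]
    return None


def advertises_private_address(sdp):
    """True if the media address cannot be reached from outside the NAT."""
    address = connection_address(sdp)
    return address is not None and ipaddress.ip_address(address).is_private
```

A recipient on the public network that sends its media to the advertised address will never reach the client, even though the SIP message itself passed through the NAT.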

8.1.2 Avoiding the NAT Problem
There exist different types of NATs. This is one of the reasons that make the NAT issue, or
rather issues, even harder to solve, as a solution that is fully functional in one situation might
be inadequate in another. For this reason there exist different types of solutions, all with
their own benefits and drawbacks. Some solutions can handle every type of NAT, but this comes
at the expense of complexity.
Common solutions for handling the VoIP NAT problem are application layer aware firewalls
and NATs, MIDCOM, TURN and STUN [8].
MIDCOM is an architecture for controlling and modifying firewalls and NATs from a trusted
MIDCOM agent [8]. This of course means that MIDCOM must be supported by the firewalls and NATs.
In a MIDCOM architecture these entities are referred to as middleboxes. In short, a SIP client
residing behind a NAT should also implement a MIDCOM agent. This agent is then
allowed to modify the settings and port forwarding of the NAT (middlebox), provided that it is
trusted by the middlebox [11]. This means that the address information written in the SIP
and SDP messages is provided by the MIDCOM agent, and should thus be valid.
STUN and TURN approach the problem in a slightly different manner. Instead of actually
trying to control the NAT, they try to use the properties of the NAT to avoid the problem. The
general idea is that there is a STUN or TURN server residing on the public network. The SIP client
then exchanges information with this server in order to find out which public IP address and port
it should write in the SIP and SDP messages. The exact configuration and information exchange
between the SIP client and the STUN or TURN server varies depending on the type of NAT being used.
Which solution to use depends much on the scale at which one is operating, and which NAT
situation one is trying to solve. For more information about the NAT issue and proposed solutions
please see [12].
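The exchange with a STUN server can be sketched as below. Note that the sketch uses the message layout of RFC 5389, a later revision of STUN than the one available when this report was written, and that it only builds the request and decodes the answer; sending the request over UDP to an actual server is left out.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442     # fixed value in every RFC 5389 message
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020   # attribute carrying the public address


def build_binding_request(transaction_id=None):
    """Build a 20-byte STUN Binding Request without attributes."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction id must be 12 bytes")
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, tid)


def parse_xor_mapped_address(response):
    """Return (public ip, public port) from a Binding Success Response."""
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        # value: reserved byte, family (0x01 = IPv4), X-Port, X-Address
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are 32-bit aligned
    return None
```

The address and port returned by the server are the ones the NAT assigned to the outgoing request, and are thus the values the SIP client would write in its SIP and SDP messages.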

8.2 VoIP Security


As VoIP solutions are becoming more widely spread, the concern for security has also become
more apparent. The main concern is eavesdropping, which means that someone might listen to
your call [13]. In VoIP, this might mean more than just listening to the actual voice data;
it could also mean that someone is picking up the metadata that is sent in order to
set up the call. This data can then be used for denial of service attacks, unwanted advertising,
and hijacking of services, as it contains system-specific information about ports and capabilities
[13, 14].
To solve these issues, there is a need to secure both the signaling protocol and the media
session, i.e., both the call setup and the actual sending of voice data [14]. In
a solution like the one in this report, which is based on the Session Initiation Protocol (SIP),
one could use either end-to-end security or hop-by-hop security [14]. End-to-end security
is achieved using SIP features specifically developed for the purpose of establishing a secure
connection between the caller and the callee. SIP does not, however, provide any mechanisms for
supporting hop-by-hop security, i.e., providing security between the different SIP entities that
take part in the call signaling [14]. Security between hops is instead handled by the use of IPsec
(IP Security) or TLS (Transport Layer Security) [14]. The need for hop-by-hop security arises from
the fact that intermediate SIP entities, like SIP proxies, might need to read or write information
in the SIP message [14]. As security between the different hops along the signaling path
is handled separately between entities, the entire message can be encrypted and secured. This
means that information like the Via, From, and To headers will not be visible to outside parties,
which is not the case with end-to-end security. It will therefore not be possible for outside
parties to figure out who is calling whom and through which servers [14]. It should, however, be
noted that the usage of IPsec and TLS has limitations. IPsec can only work between SIP entities
which have a static relation, i.e., IPsec enables a secure connection between known entities, while
TLS has the limitation that it does not work with UDP [14]. In order to secure an entire call, one
might need to combine different techniques: end-to-end or hop-by-hop security for the call
signaling and for the information exchanged over the secure connection thus established, and then,
for example, SRTP (Secure Real-time Transport Protocol) for the actual media [15]. One should also
be aware of the fact that security impacts the performance of the call session [14].

8.3 Public Safety
An issue that has gained more attention as VoIP solutions have become more widely used is the
public safety issue. This issue arises when using VoIP in emergency situations. The basis for
being able to use VoIP at all in an emergency situation is that the VoIP service provider offers
some form of emergency handling. This handling could be more or less advanced, i.e., the service
provider could offer a special emergency solution or just put you through to the emergency service
by using PSTN bridging. The problem with forwarding emergency calls using PSTN bridging is
that this may actually cause confusion, as the number received by the emergency services is the
phone number of the PSTN gateway. This is troublesome as emergency services use the caller’s phone
number to find out the geographical position of the caller. This works when
a call really originates from a PSTN phone, but when the call originates from a
VoIP phone, the emergency response may be directed to the address of the PSTN
gateway and not to the actual caller’s address [16].
Another cause for concern, when it comes to using VoIP for placing critical calls, is the fact
that one cannot expect the same quality of service from VoIP as from PSTN, as the Internet, over
which the call is placed, is a best-effort network. Even though the quality of service of
VoIP is getting better all the time, it is something that must be considered. As VoIP
uses normal computer networks for placing calls, there is also an increased risk of not being able
to place an emergency call in case of a power outage [17]. To understand the severity of a power
outage, one could just look at a normal household. In case of a power failure in a normal family
home, no computer communication will work, as the home network relies on
power to function properly, i.e., all equipment, like cable modems, routers, and VoIP boxes, needs
an external power supply in order to function.
As VoIP becomes more widely used, the public safety issue has also received more focus, and
different solutions have been proposed in order to handle the safety issues. The proposed solutions
vary in complexity; everything from manually entering your location when signing on the VoIP
network [17], to solutions like direct truncating, where emergency calls are automatically routed
to public safety answering points [18], have been proposed.
It is, however, our opinion that the solution presented in this report, where the actual VoIP
client is implemented in a cellular phone, offers a good solution to these issues, as it enables
the option to route all emergency calls through the cellular network instead of relying on the
capabilities offered by the VoIP network. Although there are opinions that one should not rely on
other services to provide emergency handling, as this will slow down the development of VoIP, we
still believe that a solution like the one presented in this report will serve a purpose until
VoIP emergency handling has matured and a standard has been developed for public safety
using VoIP.
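The routing decision suggested above could look something like the following sketch. The set of emergency numbers and the function name are assumptions made for the illustration; a real phone would obtain the valid emergency numbers from the network and the SIM card.

```python
# Hypothetical list; the actual emergency numbers depend on country and operator.
EMERGENCY_NUMBERS = {"112", "911"}


def choose_bearer(dialled, voip_available):
    """Return 'cellular' or 'voip' for an outgoing call."""
    number = dialled.replace(" ", "").replace("-", "")
    if number in EMERGENCY_NUMBERS:
        # Emergency calls always use the cellular network, so that the
        # emergency services receive the caller's own number.
        return "cellular"
    return "voip" if voip_available else "cellular"
```

The design choice here is that the check for emergency numbers is made before any other consideration, so that the availability of the VoIP network never affects an emergency call.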

8.4 Related Solutions


In this section there will be a presentation of products and solutions that have capabilities related
to the solution presented in this report. The goal is to provide a simple comparison between the
capabilities offered by the solution presented in this report and other solutions.
When work first started on this master thesis project, the products on the VoIP market were
quite immature, at least this was our opinion, and this was basically what made us interested in
the subject for this project.
Most solutions revolved around software phones for different platforms or different USB-based
solutions for handling the voice. The USB solutions were often based on a handset that connects
to a computer, which in turn runs a soft phone application. The handsets in such solutions
are more or less only used as a microphone and loudspeaker [19, 20].
Other solutions that were common, and still are quite common, are solutions that use ATA
boxes. These boxes are used to translate from the VoIP protocol (like SIP) to standard PSTN
signaling, which enables the use of standard PSTN telephones, wired or wireless [21]. Although
these solutions are still used, the general trend is towards using WIFI-enabled Pocket PCs [22],
or, even more, solutions which combine VoIP capabilities with the capabilities of cellular phones
[23], much like the solution in this report. Most of these products have, however, not yet hit the
market.

The reason for the transition towards solutions like the one presented in this report is, in our
opinion, that such solutions offer much greater flexibility and usability than was achieved by
the earlier solutions. Earlier solutions were intended to make use of VoIP for handling normal
PSTN communication. This means that most of the earlier solutions offer capabilities for
accepting incoming calls and in some cases also the possibility to initiate calls to others by
simply entering the PSTN phone number on the handset. This is fine for a replacement
for PSTN, but such solutions are not intended to handle outgoing calls to pure VoIP entities, i.e.,
recipients using soft phones and SIP accounts. These simpler solutions will also be excluded from
other advances within the VoIP area, such as combinations of voice and video, as the handsets
will not have the needed capabilities.
When implementing VoIP support in a cellular phone, one solves many of these issues. The
solution will be easy to use; users can call pure VoIP as well as traditional PSTN numbers by
simply choosing the proper account for the intended recipient, i.e., SIP-URI or PSTN number.
The fact that the VoIP capabilities reside in a cellular phone also means that the solution is highly
portable, and as WIFI becomes more common in cellular phones, one will be able
to use VoIP wherever there is a hotspot. In situations where one, for one reason or another,
cannot use VoIP communication, one will not be left stranded, as there is still the possibility to
use the cellular phone in the traditional way.
In the longer run there will probably also be the possibility to use VoIP over cellular networks,
like UMTS. When this option becomes really interesting will, in our opinion, depend much on
the development of the cellular networks and the cost of using them.

Chapter 9

Conclusions

In this report there has been a presentation of the investigative work on the possibilities for
implementing VoIP in the Sony Ericsson cellular phones, using the SEMC architecture. The
possibilities to support such a solution over Bluetooth have also been investigated.
The investigations in this report have shown that there is partial support for VoIP in the SEMC
architecture. In order to have full VoIP-support, the issue of the base architecture only handling
half-duplex audio must be addressed. It has also been concluded that the best option for im-
plementing a VoIP-solution, in a Sony Ericsson cellular phone, is to use the Session Initiation
Protocol (SIP) for call signalling and the Real-time Transport Protocol (RTP) for media stream-
ing. The SIP and RTP protocols are supported through the use of the SEMC IMS architecture. It
has also been concluded that a SIP and RTP based solution could support other communication
technologies like PSTN, through the use of gateways.
The VoIP-support in the SEMC architecture was empirically tested by implementing a proto-
type. Measurements performed on this prototype show that Bluetooth will fulfil the requirements
for most VoIP-solutions, i.e., in respect to latency and bandwidth.

Acknowledgements

First of all we would like to thank everyone at UMTS and GSM Services at Sony Ericsson
Mobile Communications AB in Lund, Sweden. Everyone has been very understanding, helpful,
and willing to spend time answering questions regarding the SEMC mobile phone architecture
and the development environment.

We would like to give special thanks to Anna Göransson, Gary Cole, Håkan Grahn, Mikael
Kanstrup, Pär Olsson, Suri Maddhula, and Tobias Åkesson, as they have been particularly helpful.

Bibliography

[1] Gonzalo Camarillo. SIP Demystified. McGraw-Hill Inc., 2002.


[2] Gonzalo Camarillo and Miguel A. Garcia-Martin. The 3G IP Multimedia Subsystem. John
Wiley & Sons Ltd., 2004.
[3] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley,
and E. Schooler. SIP: Session Initiation Protocol. RFC 3261 (Proposed Standard), June
2002. Updated by RFCs 3265, 3853.
[4] M. Handley and V. Jacobson. SDP: Session Description Protocol. RFC 2327 (Proposed
Standard), April 1998. Updated by RFC 3266.
[5] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol for
Real-Time Applications. RFC 3550 (Standard), July 2003.
[6] Recommendation ITU-T G.114 – One-Way Transmission Time. International Telecommu-
nication Union, 1996.
[7] P. Srisuresh and M. Holdrege. IP Network Address Translator (NAT) Terminology and
Considerations. RFC 2663 (Informational), August 1999.
[8] Michael Stukas and Douglas C. Sicker. An Evaluation of VoIP Traversal of Firewalls and
NATs within an Enterprise Environment. Information Systems Frontiers, 6(3):219–228,
2004.
[9] V. Paulsamy and S. Chatterjee. Network convergence and the NAT/Firewall problems. Pro-
ceedings of the 36th Annual Hawaii International Conference on System Sciences, 2003.
[10] P. Srisuresh and K. Egevang. Traditional IP Network Address Translator (Traditional NAT).
RFC 3022 (Informational), January 2001.
[11] Y. Itoh and Y. Fukuda. A study on the applicability of MIDCOM method and a solution to
its topology discovery problem. The 9th Asia-Pacific Conference on Communications, 2003,
3:1133–1137, 2003.
[12] G. Camarillo and J. Rosenberg. NAT and Firewall Scenarios and Solutions for SIP. Internet
Draft, Internet Engineering Task Force, 2003.
[13] Johna Till Johnson. VoIP security concerns cannot be ignored. Network World, 22(31):28,
2005.
[14] Stefano Salsano, Luca Veltri, and Donald Papalilo. SIP Security Issues: The SIP Authenti-
cation Procedure and its Processing Load. IEEE Network, 16(6):38–45, 2002.
[15] K. Ono and S. Tachimoto. SIP signaling security for end-to-end communication. The 9th
Asia-Pacific Conference on Communications, 3:1042–1046, 2003.
[16] Colleen Boothby. Liability Issues In A VOIP Environment. Business Communications
Review, 35(2):43–45, 2005.
[17] Anna Henry. VoIP AND E-911: IS HELP ON THE WAY? Rural Telecommunications,
24(1):14–19, 2005.
[18] James Q. Crowe. A wake-up call for VoIP. Telephony, 246(12):34–35, 2005.

[19] Jenny Levine. Product Pipeline. Library Journal, 130:22–24, 2005.

[20] Wayne Rash. Two IP Phones Worth Picking Up. InfoWorld, 26(4):26, 2004.

[21] Vince Vittore. VoIP-enable CPE market fills with new product entries. Telephony,
245(24):17–18, 2004.

[22] John R. Quain and Marc Silver. Phones that love Wi-Fi. U.S. News and World Report,
137(9):75, 2004.

[23] Bob Brewin. Mobile Phones Move Toward Combined Calling Capabilities. Computerworld,
38(13):6, 2004.

[24] Alan B. Johnston. SIP: The Session Initiation Protocol. Artech House Inc., 2001.

[25] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee.
Hypertext Transfer Protocol – HTTP/1.1. RFC 2616 (Draft Standard), June 1999. Updated
by RFC 2817.

[26] J. Klensin. Simple Mail Transfer Protocol. RFC 2821 (Proposed Standard), April 2001.

Appendix A

The Session Initiation Protocol

A.1 Introduction to SIP


The Session Initiation Protocol (SIP) is a signaling protocol that is used to control, i.e. to
establish, modify and tear down, multimedia sessions over IP networks [2, 24]. SIP can be used
to set up practically any type of session, e.g. audio calls, video conferences, games, etc [1].
The SIP protocol is thus used to send invitations to multimedia sessions, to modify these
sessions and ultimately to tear them down. To be able to describe the multimedia session that is to
be set-up, there is a need for a description of this session. One of the advantages of SIP is that it
has been designed to be independent of the protocol that is used to describe the session, and hence
also the actual multimedia session [1]. The most common protocol used to describe multimedia
sessions is the Session Description Protocol (SDP) [1, 2, 24]. SDP is described in more
detail in appendix B.
The design of SIP is based on two of the most commonly used protocols, namely HTTP
(Hypertext Transfer Protocol, see [25]) and SMTP (Simple Mail Transfer Protocol, see [26])
[2, 24]. The design taken from HTTP is the request/response, i.e. client-server, approach [24].
From SMTP the headers, e.g. To, From, and Subject, were re-used [24].
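The heritage from HTTP and SMTP is easy to see in the text format of a SIP request. The sketch below builds a minimal INVITE with an HTTP-like request line followed by SMTP-like headers. The tag, branch and subject values are placeholders chosen for the example, and a real request would normally also carry a session description in its body.

```python
def build_invite(caller, callee, call_id, local_host):
    """Build a minimal, body-less SIP INVITE as text."""
    lines = [
        # Request line: method, request URI and protocol version, as in HTTP.
        f"INVITE sip:{callee} SIP/2.0",
        # Header fields, in the same "Name: value" style as SMTP and HTTP.
        f"Via: SIP/2.0/UDP {local_host};branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=example-tag",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Subject: Example call",
        "Content-Length: 0",
    ]
    # As in HTTP, an empty line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n"
```

Calling `build_invite("A@computer.domain.com", "B@domain.com", "call-1@computer.domain.com", "computer.domain.com")` gives a message that could be read and debugged by hand, which is one of the practical benefits of the text-based design.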

A.2 The Architecture of a SIP Network


There exist different entities in a SIP network, shown in figure A.1. It is important to understand
that entities are mere roles, and one physical SIP server may play one or more of these roles, i.e.
one SIP server may act as both a registrar and a proxy.

Figure A.1: The different entities in a SIP network

The purpose of this section is to give a description of each of these entities as well as an
understanding of how these entities are connected to and used by each other.

A.2.1 User Agents
A user agent (UA) is an entity that a user interacts with [1]. It may be an application in a
computer or in a cellular phone, a telephone dedicated to internet telephony, etc.

A.2.2 Registrars
A registrar is a SIP server that accepts registration requests, i.e. a server that a user tells where
they can be reached [1].

A.2.3 Location Servers


When a user registers with a registrar, the registrar usually stores this information in a location
server. A location server is actually not a SIP entity per se, but it plays an important role in the
SIP architecture. It is not a SIP entity because communication with a location server is not done
through SIP [1]. This means that the whereabouts of a user can be
stored in, e.g., a database, and that both updating and retrieving this information can be
performed using SQL.
Although the location server is not a SIP entity, it is depicted as such in figures A.1 to A.4
since it is an important part of a SIP network.

Figure A.2: The interaction between a SIP registrar and a location server when user A registers

1. User A registers and says that he will be reachable at A@computer.domain.com.

2. The registrar updates the location of User A in the location server.
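The two steps above can be sketched with an SQL database acting as the location server. The table layout and function names are assumptions made for this example, and the sketch stores only one location per user, whereas a real registrar may keep several.

```python
import sqlite3


def open_location_db():
    """Create an in-memory database acting as the location server."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE location ("
        "public_address TEXT PRIMARY KEY, "
        "contact TEXT NOT NULL)"
    )
    return db


def register(db, public_address, contact):
    """Step 2: the registrar stores or updates the location of a user."""
    db.execute(
        "INSERT OR REPLACE INTO location (public_address, contact) "
        "VALUES (?, ?)",
        (public_address, contact),
    )
    db.commit()


def lookup(db, public_address):
    """Ask the location server where a user currently can be reached."""
    row = db.execute(
        "SELECT contact FROM location WHERE public_address = ?",
        (public_address,),
    ).fetchone()
    return row[0] if row else None
```

After `register(db, "A@domain.com", "A@computer.domain.com")`, a redirect or proxy server sharing the same database can call `lookup(db, "A@domain.com")` to find User A.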

A.2.4 Redirect Servers


The idea behind redirect servers is to give user agents information about alternative locations
where a user may be reachable [1]. Figure A.3 shows an example where User A tries to call User
B.

Figure A.3: The interaction between the SIP entities when user A sends a SIP invite to user B,
when redirect servers are used

1. User A wants to invite User B to a multimedia session and tries to reach User B at his public
address, which is B@domain.com. At domain.com there is a redirect server that handles
incoming calls.

2. The redirect server at domain.com asks its location server where User B is.

3. The location server reports that User B is at B@domain2.com.

4. The redirect server tells User A that User B can be reached at B@domain2.com.

5. User A then sends the invitation to B@domain2.com. Domain2.com also has a redirect
server that handles incoming calls.

6. The redirect server at domain2.com asks its location server where User B is.

7. The location server reports that User B is at B@computer.domain2.com.

8. The redirect server tells User A that User B can be reached at B@computer.domain2.com.

9. User A then sends the invitation to B@computer.domain2.com.

10. User B sends a response to User A.
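The scenario above can be simulated with the sketch below, in which the redirect servers answer with the status code 302 and User A itself sends a new invitation to each suggested address. The data structure and function names are assumptions made for the illustration, and no actual SIP messages are sent.

```python
# Contents of the location servers in the scenario, per domain.
LOCATION_SERVERS = {
    "domain.com": {"B@domain.com": "B@domain2.com"},
    "domain2.com": {"B@domain2.com": "B@computer.domain2.com"},
}


def send_invite(address):
    """Return (status code, new address) as seen by the calling UA."""
    domain = address.split("@", 1)[1]
    new_address = LOCATION_SERVERS.get(domain, {}).get(address)
    if new_address is not None:
        return 302, new_address  # redirect server: "try this address"
    return 200, None             # the user agent of User B answers


def call(address, max_redirects=5):
    """Follow redirects; return every address the caller had to try."""
    attempts = [address]
    for _ in range(max_redirects + 1):
        status, new_address = send_invite(address)
        if status == 200:
            return attempts
        address = new_address
        attempts.append(address)
    raise RuntimeError("too many redirects")
```

The list returned by `call("B@domain.com")` contains three addresses, which shows that with redirect servers it is the calling user agent that does the work of following the user from domain to domain.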

A.2.5 Proxy Servers


A proxy server has largely the same functionality as a redirect server. A domain will use
either a redirect server or a proxy server. The difference between the two is that the proxy server
does not tell the UA where a specific user is, but rather forwards the request to the
domain where the specific user currently is [1]. Figure A.4 shows an example where User A tries
to call User B.

Figure A.4: The interaction between the SIP entities when user A sends a SIP invite to user B,
when proxy servers are used

1. User A wants to invite User B to a multimedia session and tries to reach User B at his public
address, which is B@domain.com. At domain.com there is a proxy server that handles the
incoming request.

2. The proxy server at domain.com asks its location server where User B is.

3. The location server reports that User B is at B@domain2.com.

4. The proxy server at domain.com forwards the invitation from User A to B@domain2.com.
Domain2.com also has a proxy server that handles the incoming request.

5. The proxy server at domain2.com asks its location server where User B is.

6. The location server reports that User B is at B@computer.domain2.com.

7. The proxy server at domain2.com forwards the invitation from User A to B@computer.domain2.com.

8. User B sends a response to User A.

A proxy can try more than one location for a user. This is called forking and can be done
either in parallel or sequentially, depending on how the proxy is configured [1]. Parallel means
that the proxy tries to reach all of the locations at the same time, whereas sequential means that
it tries one after another.
Parallel forking implies that many UAs receive a SIP invitation at once, e.g. more than one
phone rings when someone is calling. Sequential forking, on the other hand, can be seen as a
kind of forwarding. If the user is not available (does not answer) at one location, the next location
is tried, until there are no locations left or until the user answers.
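The difference between the two forking modes can be sketched as follows. The sketch is a plain simulation without any real concurrency or SIP messages; the `answers` argument is a hypothetical callback that tells whether the user picks up at a given location.

```python
def parallel_fork(locations, answers):
    """All locations ring at once; return (locations that rang, who answered)."""
    rang = list(locations)
    answered = next((loc for loc in rang if answers(loc)), None)
    return rang, answered


def sequential_fork(locations, answers):
    """Locations ring one after another until someone answers."""
    rang = []
    for location in locations:
        rang.append(location)
        if answers(location):
            return rang, location
    return rang, None
```

With three registered locations where the user only answers at the second one, parallel forking makes all three ring, while sequential forking never reaches the third.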

A.3 Signaling in SIP


As mentioned in section A.1, SIP has borrowed its signaling approach (request/response) from
HTTP. SIP therefore has a client/server approach. When a UA sends a request to another UA, it
acts as a user agent client (UAC), whereas the receiver, and thus the UA sending a response, acts as
a user agent server (UAS) [1].
This section delves deeper into SIP requests and responses. Although it may feel unnatural to
start out by describing responses, there is a reason for it. In order to give examples in the form
of scenarios for the different requests, it is necessary to first understand the different responses
that can be sent.

A.3.1 Responses
When a UAS receives a request it sends one or more responses. The purpose of a response is
to inform the UAC about the status of the transaction, and a large number of responses are
therefore defined.
A response contains both a status code and a reason phrase. The status code is an integer
between 100 and 699, whereas the reason phrase is a human-readable description of that status
code. The responses are divided into six classes, as seen in table A.1. A list of the responses
that the authors have deemed the most commonly used, and thus the most interesting, can be
found in table A.2. A complete list of the responses defined in the SIP core can be found in the
SIP RFC [3].

Range    Response class  Description

100-199  Provisional     For information purposes. These indicate that the server is performing
                         some action and does not have a definitive response yet. A server should
                         send such a response if it expects it to take more than 200 ms to send a
                         final response. The responses in this class are not transmitted reliably,
                         i.e. the client will not send an ACK upon receiving these responses.
200-299  Success         The action was successfully received, understood and accepted.
300-399  Redirection     Further action needs to be taken in order to complete the request.
400-499  Client error    The request contains bad syntax or cannot be fulfilled by this server.
500-599  Server error    The server failed to fulfill an apparently valid request.
600-699  Global failure  The request cannot be fulfilled at any server.

Table A.1: The Different Response Classes
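
The classification in table A.1 depends only on the first digit of the status code, which can be
sketched as below. The function names response_class and is_final are hypothetical helpers chosen for
this illustration.

```python
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
    6: "Global failure",
}


def response_class(status_code: int) -> str:
    """Map a status code (100-699) to its response class from table A.1."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a valid SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code: int) -> bool:
    """Every class except Provisional (100-199) holds final responses."""
    return response_class(status_code) != "Provisional"
```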

Status code  Reason phrase                  Comment

100          Trying                         -
180          Ringing                        -
181          Call is being forwarded        -
182          Queued                         -
200          OK                             -
301          Moved permanently              -
302          Moved temporarily              -
305          Use proxy                      -
400          Bad request                    -
401          Unauthorized                   Used by UASs and registrars; proxies should use 407
404          Not found                      User not found
405          Method not allowed             -
406          Not acceptable                 -
407          Proxy authentication required  -
408          Request timeout                Could not find user in time
415          Unsupported media type         -
480          Temporarily unavailable        -
486          Busy here                      -
487          Request terminated             -
491          Request pending                -
502          Bad gateway                    -
505          Version not supported          -

Table A.2: The most common SIP responses

A.3.2 Requests
A specific request is referred to as a method. The SIP core specification defines six types
of methods. There are, however, extensions to SIP that define additional requests. The request
method is denoted in a specific field in SIP [1]. SIP requests may also contain a body, which is
the packet’s payload. This payload is usually a session description [1].

Invite. As the name implies, the invite request invites users to a session [1]. The payload of this
request contains a session description, which can for example describe an audio session.

Ack. An ack request acknowledges the final response to an invite request, i.e. the UAC that sent
the invite request sends an ack after it has received the final response [1]. Figure A.5 shows an
example of an invite-response-ack exchange.

Figure A.5: Example of a SIP invite

Cancel. A cancel request cancels a pending transaction, provided that the server processing the
request has not yet sent a final response. If a final response has already been sent, the cancel
request has no effect on the transaction [1].

Figure A.6: Example of a cancelled SIP invite

Figure A.6 shows an example of User A sending an invite request to User B. The request first
passes a forking proxy, which sends out the invites in parallel to the two locations at which User B
can be reached. User B answers at domain2.com and sends its final response (200 OK) to the proxy,
which forwards it to User A. User A then responds with an ack. When the proxy receives the final
response from User B at domain2.com, it wants to cancel all other invitations, so it sends a cancel
request to User B at domain.com, which first sends a response for the cancel (200 OK) and then a
response for the invite (487 Request terminated). The proxy then sends an ack for the latter response.
The example described above shows the use of the cancel request. It should be mentioned
that the proxy could have sent a cancel request to all of the peers it sent an invite to, as a cancel
request does not affect a transaction for which a final response has already been sent [1].
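
The rule that a cancel only affects a pending invite can be sketched as a small state holder. The sketch
assumes the behaviour of the SIP core specification, in which the cancel itself is answered with 200 OK
and a cancelled invite is answered with 487 Request terminated; the class InviteServerTransaction is a
hypothetical simplification that ignores timers and retransmissions.

```python
from typing import List, Optional, Tuple


class InviteServerTransaction:
    """Toy model of how a UAS treats a cancel for an invite it has received."""

    def __init__(self) -> None:
        self.final_status: Optional[int] = None  # None while the invite is still pending

    def answer(self, status_code: int) -> None:
        """Record a response to the invite; only the first final one (>= 200) counts."""
        if status_code >= 200 and self.final_status is None:
            self.final_status = status_code

    def cancel(self) -> List[Tuple[str, int]]:
        """Handle a cancel; return the (request, status code) responses it triggers."""
        responses = [("CANCEL", 200)]  # the cancel request itself is answered with 200 OK
        if self.final_status is None:
            self.final_status = 487  # Request terminated
            responses.append(("INVITE", 487))
        return responses
```

A transaction that has already been answered with 200 OK keeps that status when a cancel arrives, which
is why a proxy can safely send cancel requests to every location it has invited.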

Bye. When a participant wants to leave a multimedia session, he or she sends a bye request (see
figure A.7). If the session consists of only two participants, the session itself is terminated.
If more participants are involved, the bye merely means that the sender leaves the session [1].

Figure A.7: Example of a SIP bye

Register. A user sends a register request to update his or her current location at a server, i.e.
the user tells the server where he or she wants to be reached. It is possible to register multiple
locations, and also to inform the server how long the registration should last. If User A wants to
be reached at domain2.com until one o’clock, he or she can register that, and at one o’clock the
registration ceases to be valid [1].
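
How registrations with a limited lifetime behave can be sketched as follows. The class Registrar and its
methods are hypothetical and only illustrate the idea; time is passed in explicitly as seconds so that the
example is deterministic, which a real server would not do.

```python
from typing import Dict, List


class Registrar:
    """Toy registrar: remembers where each user can be reached and until when."""

    def __init__(self) -> None:
        # user address -> {contact address: absolute expiry time in seconds}
        self._bindings: Dict[str, Dict[str, float]] = {}

    def register(self, user: str, contact: str, expires: float, now: float) -> None:
        """Add or refresh one location for user, valid for expires seconds from now."""
        self._bindings.setdefault(user, {})[contact] = now + expires

    def lookup(self, user: str, now: float) -> List[str]:
        """Return the locations of user that are still valid at time now."""
        contacts = self._bindings.get(user, {})
        return sorted(c for c, expiry in contacts.items() if expiry > now)
```

A user who registers two locations with different lifetimes is reachable at both until the shorter
registration expires, and after that only at the remaining one.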

Options. The options request is used to ask a server about its capabilities, i.e. which session
description protocols it understands, which requests it can answer, and which encodings it
understands [1].

A.4 SIP Message Format
SIP is a text-based protocol. The format of a message depends on whether it is a request or a
response. The two formats are however quite similar and consist of a request line (request) or a
status line (response), one or more header fields, an empty line (carriage return, line feed), and
finally an optional body [3].
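
The overall layout can be illustrated with a minimal sketch that joins the parts with carriage return
and line feed. The function build_message is a hypothetical helper; it makes no attempt to add all the
header fields that a valid SIP message requires and only appends Content-Length for the body.

```python
from typing import List, Tuple

CRLF = "\r\n"


def build_message(start_line: str, headers: List[Tuple[str, str]], body: str = "") -> str:
    """Assemble a start line, header fields, an empty line and an optional body."""
    lines = [start_line]
    lines += [f"{name}: {value}" for name, value in headers]
    # Content-Length counts the bytes of the body, not its characters.
    lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    return CRLF.join(lines) + CRLF + CRLF + body
```

The same function serves requests and responses, since only the first line differs between the two.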

A.4.1 Request Line


A request line is only a part of a SIP request and it consists of three parts (separated by a single
space character): the method, the Request-URI, and the protocol version. The first part, the method,
states which request method is used (invite, bye, cancel, etc.). The Request-URI identifies the user
or service that the request is addressed to. The protocol version states which protocol and which
version of it the message uses [3].
A request line could look like this: "INVITE sip:userA@domain.com SIP/2.0", where "INVITE"
is the method, "sip:userA@domain.com" is the Request-URI, and "SIP/2.0" is the protocol
and version. This is thus an invite request addressed to the SIP address userA@domain.com,
using version 2.0 of SIP.
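
Splitting a request line into its three parts can be sketched as below. The names RequestLine and
parse_request_line are hypothetical, and the sketch only checks the number of parts, not whether the
method or URI is valid.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    request_uri: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split a request line on single spaces into method, Request-URI and version."""
    parts = line.split(" ")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed request line: {line!r}")
    return RequestLine(*parts)
```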

A.4.2 Status Line


A status line is only a part of a SIP response and it consists of three parts: protocol version,
status code, and reason phrase [1]. The last part, the reason phrase, is a human-readable description
of the status code; it is thus redundant information and is not interpreted by SIP entities.
A status line could look like this: "SIP/2.0 200 OK", where "SIP/2.0" is the protocol version,
"200" is the status code, and "OK" is the reason phrase.
It should be mentioned that provisional responses (status code 100-199) are not reliably trans-
mitted, i.e. one cannot be sure whether these are received by the client or not.
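
A status line can be taken apart in a similar way, with the difference that the reason phrase may itself
contain spaces. The names StatusLine and parse_status_line are hypothetical helpers for this sketch.

```python
from typing import NamedTuple


class StatusLine(NamedTuple):
    version: str
    status_code: int
    reason_phrase: str


def parse_status_line(line: str) -> StatusLine:
    """Split a status line into protocol version, status code and reason phrase."""
    # Split at most twice so that a reason phrase containing spaces stays whole.
    parts = line.split(" ", 2)
    if len(parts) < 2 or not (parts[1].isascii() and parts[1].isdigit()):
        raise ValueError(f"malformed status line: {line!r}")
    code = int(parts[1])
    if not 100 <= code <= 699:
        raise ValueError(f"status code out of range: {code}")
    reason = parts[2] if len(parts) == 3 else ""
    return StatusLine(parts[0], code, reason)
```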

A.4.3 Headers
After the request line (for requests) or status line (for responses) there are one or more header
fields. A header field provides information about the request or response, and about the body of
the message. A list of some of the header fields, with information on where and whether they
should be used, can be found in table A.5; tables A.3 and A.4 explain its columns.
The format of a header field is the field-name followed by a colon, a space character, and one
or more comma-separated field-values, each optionally followed by parameters and their values, i.e.
"field-name: field-value1;parameter1=value, field-value2;parameter2=value, ..., field-valueN" [3].
It should be noted that this is the preferred format; there can actually be any number of spaces
between the field-name and the colon and between the colon and the field-value [3].
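
The tolerance for extra spaces around the colon can be sketched as follows. The function
parse_header_line is a hypothetical helper that only separates the field-name from the rest of the line;
it does not split multiple field-values or parameters, which would require handling of quoted strings.

```python
from typing import Tuple


def parse_header_line(line: str) -> Tuple[str, str]:
    """Split a header field at the first colon and strip surrounding whitespace."""
    name, separator, value = line.partition(":")
    if not separator or not name.strip():
        raise ValueError(f"malformed header field: {line!r}")
    return name.strip(), value.strip()
```

Only the first colon acts as a separator, so a value such as a SIP address that itself contains a colon
is left intact.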

Where  Description

Req    This header field may only appear in a request.
Resp   This header field may only appear in a response.
xxx    A numerical value that represents the status code with which the header
       field can be used.
Copy   The header field is copied from the request to the response.
All    The header field may be present in all requests and responses.

Table A.3: Explanation of the "Where" Column of Table A.5

Value  Description

c      Conditional. Requirements on the header field depend on the context of
       the message.
m      The header field is mandatory.
m*     The header field should be sent, but clients/servers need to be prepared to
       receive messages without the header field.
o      The header field is optional.
t      Same as m*, with the addition that if a stream-based protocol, e.g. TCP,
       is used as a transport, the header field must be sent.
*      The header field is required if the message body is not empty.
-      The header field is not applicable.

Table A.4: Explanation of the Method Columns of Table A.5

Header Field (Compact Form)  Where  ACK  BYE  CAN  INV  OPT  REG

Authorization                Req    o    o    o    o    o    o
Call-ID (i)                  Copy   m    m    m    m    m    m
Call-Info                    All    -    -    -    o    o    o
Contact (m)                  Req    o    -    -    m    o    o
                             1xx    -    -    -    o    -    -
                             2xx    -    -    -    m    o    o
                             3xx    -    o    -    o    o    o
                             485    -    o    -    o    o    o
Content-Encoding (e)         All    o    o    -    o    o    o
Content-Language             All    o    o    -    o    o    o
Content-Length (l)           All    t    t    t    t    t    t
Content-Type (c)             All    *    *    -    *    *    *
CSeq                         Copy   m    m    m    m    m    m
Date                         All    o    o    o    o    o    o
From (f)                     Copy   m    m    m    m    m    m
Subject (s)                  Req    -    -    -    o    -    -
Supported (k)                Req    -    o    o    m*   o    o
                             2xx    -    o    o    m*   m*   o
To (t)                       Copy   m    m    m    m    m    m
User-Agent                   All    o    o    o    o    o    o
Via (v)                      Copy   m    m    m    m    m    m

Table A.5: Some of the supported header fields in SIP core specification

The field-name is case-insensitive, and the same goes for the field-value, parameter names and
parameter values unless otherwise stated in the definition of a specific header field [3]. Field-
values that are expressed as quoted strings are however case-sensitive.
Some header field-names have compact (abbreviated) forms. Only the most common header
fields have a compact form, and the idea behind it is to prevent messages from becoming too
large [3, section 7.3.3]. SIP entities must accept both the normal header field-name and its
compact equivalent [3].
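
Accepting both forms, regardless of case, can be sketched with a lookup table built from the compact
forms listed in table A.5. The function canonical_name is a hypothetical helper that reduces every
spelling of a field-name to one lowercase form so that names can be compared.

```python
# Compact forms taken from table A.5, stored in lowercase for comparison.
COMPACT_FORMS = {
    "i": "call-id",
    "m": "contact",
    "e": "content-encoding",
    "l": "content-length",
    "c": "content-type",
    "f": "from",
    "s": "subject",
    "k": "supported",
    "t": "to",
    "v": "via",
}


def canonical_name(field_name: str) -> str:
    """Return the lowercase full field-name for a normal or compact field-name."""
    key = field_name.strip().lower()
    return COMPACT_FORMS.get(key, key)
```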

A.4.4 Bodies
Message bodies may be contained in requests as well as in responses [3]. The Internet media type
of the message body must be given in the Content-Type header field, and if the body is encoded in
a specific way, the encoding must be given in the Content-Encoding header field [3]. The most
common media type in the body is a session description [1], e.g. SDP. A SIP message can contain
several bodies [1, 3], e.g. a session description and an audio file.

A.5 Bridging SIP and the PSTN
The fact that SIP supports bridging in both directions, from the PSTN to SIP and from SIP to the
PSTN, is one of the capabilities that has had much impact when it comes to using SIP for VoIP
solutions. This interoperability is achieved by using gateways. The gateway’s job is to translate
between the PSTN protocol and SIP. In fact, from a SIP perspective there is no difference between
a call that originates from the PSTN via a gateway and one that originates from a native SIP entity.
Figure A.8 shows a sample of the signaling taking place when communicating between the PSTN and SIP.

Figure A.8: Signaling example in a PSTN-SIP situation

Appendix B

The Session Description Protocol

As mentioned in appendix A, SDP (Session Description Protocol) is the most commonly used
description protocol for establishing multimedia sessions. What makes SDP powerful is that it is
independent of the actual transport protocol that is used for the media session. It is thus possible
to use SDP to set up any type of multimedia session.
The information needed to establish a session is the type (e.g. audio or video) and format
(e.g. AMR or MPEG) of the media that is going to be sent, the transport protocol that is going to
be used, and the IP address and port that the media should be sent to. Besides providing this
information, SDP may also be used to convey additional information that can be of interest, e.g.
bandwidth information, session name, etc. [4].
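
A minimal sketch of such a description for a single audio stream is given below. It assumes the usual
line-based SDP layout of one "letter=value" field per line; the function build_sdp, the placeholder
origin values and the example address, port and payload type are all hypothetical and chosen only for
illustration.

```python
def build_sdp(ip: str, port: int, payload_type: int, encoding: str,
              session_name: str = "-") -> str:
    """Describe one audio stream that should be sent to ip:port over RTP."""
    lines = [
        "v=0",                                      # protocol version
        f"o=- 0 0 IN IP4 {ip}",                     # origin (placeholder identifiers)
        f"s={session_name}",                        # session name
        f"c=IN IP4 {ip}",                           # address the media should be sent to
        "t=0 0",                                    # unbounded session time
        f"m=audio {port} RTP/AVP {payload_type}",   # media type, port, transport, format
        f"a=rtpmap:{payload_type} {encoding}",      # maps the payload type to a codec
    ]
    return "\r\n".join(lines) + "\r\n"
```

For example, build_sdp("192.0.2.10", 49170, 96, "AMR/8000") describes an AMR audio stream, where the
media line carries the type, port, transport protocol and format mentioned in the text above.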

Appendix C

The Real-time Transport Protocol

The Real-time Transport Protocol (RTP) is a protocol that is used to transport real-time data,
such as audio in a VoIP call, and can be used over an unreliable transport protocol like UDP [5].
For every RTP connection there exists a corresponding RTCP (Real-time Transport Control Protocol)
connection, which provides QoS statistics, the ability to synchronize the media, and the possibility
to create application-specific messages [5].
When using IP to send packets, one cannot be certain that the packets arrive in the order in
which they were sent. Nor does it follow that if two packets are sent over IP 100 ms apart, the
second packet arrives 100 ms after the first; this variation in delay is called jitter. In order
to ensure that the data is used in the order intended, one needs to be able to compensate for the
reordering and jitter described. RTP does this by the use of timestamps.
In order to make use of these timestamps, the receiver of the RTP packets must place them
in a buffer sorted according to the timestamps of the packets. The packets can thus be retrieved
when needed. If a packet is needed but has not arrived yet, it is up to the receiver to take whatever
action it deems necessary, which means that it can either just ignore the fact that a packet is
missing, or try to conceal the gap using interpolation techniques. Should a packet arrive after its
timestamp has already been played out, it may just be dropped, as such a packet no longer serves
any purpose.
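
The buffering described above can be sketched with a priority queue ordered by timestamp. This is a
simplified illustration under stated assumptions and not the prototype's implementation: the class
JitterBuffer is hypothetical, it ignores timestamp wrap-around, and it reports a missing packet as None
so that the caller can decide whether to ignore or conceal the gap.

```python
import heapq
from typing import List, Optional, Tuple


class JitterBuffer:
    """Keeps received packets sorted by timestamp until they are played out."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, bytes]] = []
        self._last_played: Optional[int] = None

    def push(self, timestamp: int, payload: bytes) -> bool:
        """Store a packet; return False if it arrived too late and was dropped."""
        if self._last_played is not None and timestamp <= self._last_played:
            return False
        heapq.heappush(self._heap, (timestamp, payload))
        return True

    def pop(self, timestamp: int) -> Optional[bytes]:
        """Return the payload for the given timestamp, or None if it has not arrived."""
        self._last_played = timestamp
        # Discard anything older than the requested playout point.
        while self._heap and self._heap[0][0] < timestamp:
            heapq.heappop(self._heap)
        if self._heap and self._heap[0][0] == timestamp:
            return heapq.heappop(self._heap)[1]
        return None
```

Packets may be pushed in any order; they come out in timestamp order, and a packet that arrives after
its playout point has passed is rejected by push.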

Appendix D

Glossary

SEMC Sony Ericsson Mobile Communications


PSTN Public Switched Telephone Network
IP Internet Protocol
SIP Session Initiation Protocol
RTP Real-time Transport Protocol
SDP Session Description Protocol
IMS IP Multimedia Subsystem
QoS Quality of Service
LAN Local Area Network
WLAN Wireless Local Area Network
TCP Transmission Control Protocol
UDP User Datagram Protocol
NAT Network Address Translation
GSM Global System for Mobile Communications
UMTS Universal Mobile Telecommunications System
VoIP Voice over IP
