Software Engineering
Thesis no: MSE-2005:16
October 2005
School of Engineering
Blekinge Institute of Technology
Box 520
SE - 372 25 Ronneby
Sweden
This thesis is submitted to the School of Engineering at Blekinge Institute of Technology
in partial fulfillment of the requirements for the degree of Master of Science in Software
Engineering. The thesis is equivalent to 2 x 20 weeks of full time studies.
Contact Information:
Author(s):
Petter Theander
E-mail: di00pth@student.bth.se
Thomas Hultgren
E-mail: di00thu@student.bth.se
External advisor(s):
Tobias Åkesson
Company/Organisation: Sony Ericsson Mobile Communications AB
Address: Nya Vattentornet, SE - 221 83 Lund
Phone: +46 46 193 986
Pär Olsson
Company/Organisation: Sony Ericsson Mobile Communications AB
Address: Nya Vattentornet, SE - 221 83 Lund
Phone: +46 46 212 67 03
University advisor(s):
Håkan Grahn
School of Engineering, BTH
Contents
1 Introduction
6 Prototype Implementation
6.1 Bluetooth Connectivity
6.2 The VoIP Prototype
6.2.1 Changes in the Underlying Architecture
6.2.2 No Support for Full-duplex Audio
9 Conclusions
Acknowledgements
Bibliography
D Glossary
Chapter 1
Introduction
This master thesis work was undertaken to investigate the possibilities of introducing a new
communication technology into an already established communication interface. As new
communication technologies are emerging more rapidly today than a couple of years ago, the need
to merge these is also becoming greater. The general trend amongst emerging technologies is that
they are more or less exclusively developed to fulfill the needs of voice communication in an
IP-based packet-switched network such as the Internet. Such technologies are commonly known as
Voice over IP (VoIP). Traditional telephony technologies, like the Public Switched Telephone
Network (PSTN), were on the other hand designed to work in circuit-switched networks.
The motivation for undertaking this investigative work was a general disappointment we observed
with the fact that a new communication technology often meant that one, as a user, was forced
to use a computer, with no really good alternatives. Thus, there was a need for a solution
that made it possible to use the emerging technologies in a more comfortable way, for example
through a cellular phone.
To us, this shortcoming was a major drawback, and probably one of the factors that makes it
hard to introduce a new communication technology. These observations led to the initial
solution proposal presented in chapter 3. This proposal was sent to Sony Ericsson Mobile
Communications (SEMC), and earned us the opportunity to undertake more extensive research into
what is actually needed in order to introduce support for a new communication technology in a
cellular phone.
This report presents an investigation of the possibilities for introducing a new communication
technology, like VoIP, into a Sony Ericsson cellular phone. The investigation is based on the
following research questions:
1. Will Bluetooth be able to handle the communication between the cellular phone and the
base unit in accordance to what is seen as "normal" response times and quality in traditional
telephony?
2. Is it possible to integrate IP-telephony support into a cellular phone based on the Sony
Ericsson architecture?
3. Is it possible to use any pre-existing techniques from the Sony Ericsson mobile phone ar-
chitecture in order to ease the implementation?
4. Is it possible to integrate support for more communication technologies based on the se-
lected communication protocols and the Sony Ericsson mobile phone architecture?
Interviews and the implementation of a prototype were used to evaluate whether the SEMC
architecture supports new technologies. We find that the best option is to use the Session
Initiation Protocol (SIP) and the Real-time Transport Protocol (RTP), both of which are
supported through the SEMC IP Multimedia Subsystem (IMS) architecture. We will also see that a
SIP and RTP based solution supports interaction with other voice communication technologies
through the use of gateways. The evaluation of the prototype showed that Bluetooth will suffice
for most voice communication with respect to latency and bandwidth.
Chapter 2
In order to understand why new voice communication technologies are introduced when a working
and well accepted system already exists, one must understand the main differences between the
traditional Public Switched Telephone Network (PSTN), which is circuit-switched, and the new
IP-based technologies, which are used in packet-switched networks. For this reason, there will
be a short summary of the most important aspects of both circuit-switched networks and
packet-switched networks, along with their respective benefits and drawbacks.
no resources will be wasted when sending bursty data, as is the case in circuit-switched networks.
The fact that most packet-switched networks do not offer any QoS means that a client using
the network cannot assume that a sent packet is actually received by the recipient. It
therefore becomes the client’s responsibility to handle the QoS aspects of a session [1]. This
is however only true when using UDP and not TCP, as TCP adds transport control functionality to
handle these issues.
Chapter 3
3.1 Background
In this chapter the initial idea will be presented. This idea was used as reference material when
we applied for a master thesis project at SEMC. As stated, this is the initial idea, and as can be
seen throughout this report it was subsequently adapted and modified. That the final solution
deviates from this initial idea is quite natural, as the idea presented in this section was not
derived from any pre-study, but rather from creative thinking and logical reasoning. In short, it
was quite clear to us from the very start that this material would mostly be used as a means of
describing one potential way to implement IP-telephony in cellular phones. The idea was thus
derived without any insight into what possibilities were available in the SEMC architecture. For
now we will leave it at this, and describe the idea which earned us a position within SEMC to
investigate the true possibilities for IP-telephony within their architecture.
The source of the idea was our dissatisfaction with the fact that one was more or less forced
to either buy a new phone or be stuck in front of a computer in order to use a new
communication technology, such as IP-telephony. This means that one has to change phones
depending on which communication technology one would like to use. The fact that a new
communication technology imposes the need to use new physical equipment is in our opinion one
of the main obstacles when introducing new technologies, as people are often reluctant to change
their behavioral patterns [bok].
3.2 Vision
To address the problems described above, we concluded that it would be a good idea to gather
all communication technologies under one physical interface. In order to overcome the problem
of people’s reluctance to change, it was decided that a cellular phone could be a good hardware
interface for all the different technologies. This decision was based on the fact that the
cellular phone is already a well accepted way to handle communication, both voice and video. It
also has the advantage, compared to other solutions, that it is mobile. This means that one
would always be free to choose among the supported communication technologies, independent of
physical location.
The freedom to choose communication technology and the possibility to support new technologies
fairly easily, without changing the physical equipment, would also lead to economic benefits.
This would be true for both companies and home users, as they can easily shift to the most
cost effective communication technology. The greatest economic gains would of course be for
large companies, due to their larger traffic volumes.
The value of having a solution like this will only increase in the future, as new technologies
and communication protocols will emerge more rapidly. Being able to support these new
technologies without major hardware modifications will therefore be even more important than is
the case today. Another benefit of having this solution available on the market is that it can
contribute to the development of new communication technologies and protocols, as they can more
easily be introduced to the market.
3.3 The Basic Idea
The general idea, which can be seen in figure 3.1, revolves around a cellular phone (1), which is
connected via Bluetooth (2) to a base unit (3), which in turn is connected to an appropriate bearer
for that specific media type (4).
3.4.1 The Cellular Phone
The main requirement for the cellular phone, in this solution, is that it has Bluetooth
capabilities. This is quite natural as Bluetooth is the bearer for all data traffic between the
cellular phone and the base unit. However, the exact Bluetooth requirements are not fixed. There
are some alternative ways to solve the actual data transfer over Bluetooth. One of these is to
let the cellular phone implement the Bluetooth profile normally used for headsets. This means
that the base unit can communicate with the cellular phone using the same standard as if it were
just sending audio to an ordinary headset. This solution would however also require that the
cellular phone is able to communicate the connection information using one of the Bluetooth
profiles for data communication. The second alternative is to simply handle all communication,
i.e., control information and voice packets, using normal data communication without separating
the two. Apart from the requirement already mentioned, there will of course also be requirements
for codec support, coverage handling, etc.
Bluetooth Interface. This part of the base unit represents the communication interface towards
the cellular phone, and is used when receiving and sending data. This data could be both control
and voice packets.
Packet Handling. This layer is used to filter the incoming packets, which are received on
the Bluetooth interface, according to their type, i.e., control or audio packets. These packets
are then forwarded to the appropriate module. The packet handling layer is also responsible
for repacking the data received by the base unit into the correct Bluetooth packet type, before
forwarding it to the Bluetooth interface.
Communication Logic. This module is responsible for handling connection logic, i.e., the logic
needed for setting up and maintaining the connection between the incoming and outgoing inter-
face. This means that it is this module that handles the selection of which bearer to use and
manages the connection with the cellular phone. The choice of which bearer to use is based on
the connection information given. The intention is to make it possible to manually configure this
routing table.
Audio Transformation. This module handles the incoming Bluetooth voice packets and trans-
forms these into an intermediate format. When packets are received by the base unit, this module
transforms the intermediate format into voice packets for Bluetooth.
Bearer Packing. These modules are represented in figure 3.1 as "PSTN", "IP-telephony" and
"...". Modules of this type are used to repack between the intermediate audio format and the
format expected by the specific bearer. This means that it is these modules that decide which
communication technologies and protocols are supported. The intention is to make this module
layer easy to expand, and thereby make it easy to introduce support for new technologies. It
should also be mentioned that care must be taken when choosing the intermediate format, in order
to maintain flexibility.
Bearer Interfaces. These modules are the physical interfaces needed by the software modules
discussed in the previous section, e.g., hardware interfaces for PSTN, LAN, and WLAN. The
hardware interfaces that are available also affect which communication technologies can be
supported.
Chapter 4
In order to understand the problem domain and the options, the first thing undertaken was a
series of interviews with people who have insight into the current phone architecture and the
future development of the cellular phones at SEMC. Interviews were a quite natural means of
obtaining initial knowledge about the capabilities offered by today’s phone architecture at SEMC,
as we had no previous personal knowledge about the internal architecture of their cellular
phones. This lack of previous knowledge means that the ideas presented so far in this report will
be modified quite a bit. However, it is our opinion that the initial idea presented previously
may be of interest, as it presents our vision for the project, and this was in fact what earned
us the possibility to conduct this master thesis at SEMC. That said, it should be pointed out
that many of the ideas presented in the initial proposal will be possible to implement using the
technology we finally decided to use. In the rest of this section the main focus will be on the
options offered by the SEMC architecture, i.e., which parts of the architecture can be used in
order to implement a solution that fulfills the vision for this master thesis.
4.3 Investigating the Current Architecture
Even though the indications from the initial interviews were quite unanimous, i.e., IMS was
the way to go, we still decided to look into the phone architecture first hand. The reason for
doing so was twofold: one reason was to investigate the options, and the other was to
familiarize ourselves with the phone architecture. This inside knowledge was also used to direct
the interview process and questions in its next phases.
This investigation proved to be quite valuable for two reasons. First and foremost we learned
how applications in a cellular phone are generally designed and implemented. This may seem
trivial, but the truth is that the internal architecture of a phone differs quite a bit from what
is seen as normal application development. In a Windows based environment, for instance, one does
not really need to care about process registration and process intercommunication in the same
way as in an embedded system.
The other reason was that we became certain that IMS really was the only option within the
available time frame. This became clear as the architectural investigations found no good support
for redirecting and managing voice calls in a packet-switched manner. The reason for this was
that there simply was no design support in the current base architecture for manipulating, or
even getting hold of, audio streams in a satisfactory manner. The investigations also showed that
there was no sufficiently good native support for media protocols that could be used for
transporting media data over IP-connections.
These facts meant that if we were to implement a solution with only the support found in the
current base architecture, we would first have to make modifications to the current architecture,
and secondly develop, or at least implement, a whole new protocol stack. As this would have
shifted the attention away from the initial goals, and would have taken too long to actually
realize, the focus from then on was on further investigation of IMS and the capabilities offered
by the SEMC IMS architecture.
Session Initiation Protocol. Along with the SEMC implementation of the IMS architecture,
there will be support for the Session Initiation Protocol (SIP) [3], which is a standard for
initiating and managing media sessions over an IP network. For more detailed information about
SIP, see appendix A.
Session Description Protocol. In the SEMC IMS architecture there will also be support for the
Session Description Protocol (SDP) [4], which is used in combination with SIP. SDP is carried in
a SIP message, and is used to describe the media that is going to be used once the session has
been established with the help of the SIP signaling. For more information about SDP, see
appendix B.
Real-time Transport Protocol. The SEMC IMS architecture also provides the Real-time Transport
Protocol (RTP) [5], which is a protocol used to carry real-time data streams, like audio and
video, over an IP network. RTP supports real-time delivery through timestamps and sequence
numbers, which are carried in the packet header. Parallel to every RTP session there is also a
control session, which uses the RTP Control Protocol (RTCP) [5]. RTCP is used for
synchronization between sender and receiver, as well as for handling other session specific
control information. For more detailed information about RTP and RTCP, see appendix C.
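To make the role of the sequence number and timestamp more concrete, the following minimal Python
sketch packs and parses the 12-byte fixed RTP header as laid out in the RTP specification. It is
an illustration only: the function names and the example field values are assumptions and are not
taken from the SEMC implementation, and the optional CSRC list and header extension are left out.

```python
import struct

RTP_VERSION = 2  # protocol version carried in the two top bits of the first byte


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension or CSRC list)."""
    byte0 = RTP_VERSION << 6                        # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet):
    """Extract the fixed header fields from the start of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,              # lets the receiver detect loss and reordering
        "timestamp": timestamp,  # lets the receiver schedule playback
        "ssrc": ssrc,            # identifies the sending source
    }
```

A receiver would typically use the sequence number to detect lost or reordered packets and the
timestamp to decide when each payload should be played.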
Chapter 5
This chapter describes the design of the VoIP prototype that needs to be created. First there will
be a description of how the protocols investigated in the pre-study (appendix A, B, and C) can be
used in order to fulfill the goals for this project. After this there will be a detailed description of
the VoIP prototype and its relation to the SEMC architecture. In order to illuminate the design, a
set of scenarios showing the interaction between the different parts (VoIP UI, VoIP-server, IMS
SL, etc.) are described in the last section of this chapter.
In short, by using SIP, there will be the possibility to add new technologies by adding a new
type of gateway to the network. In fact, the SIP solution allows for the separation of the different
servers and gateways in a network, and thus there is much better load balancing, reliability and
flexibility than was actually the case with the initial idea.
5.1.2 Using SIP and SDP for Negotiating the Media Format
The initial idea used a fixed intermediate format for communication between the user interface
and the base unit, and then translated this intermediate format into the bearer specific media
format and protocol. With a SIP/SDP solution one can simply skip this translation, as SDP and
SIP allow for communication and negotiation of which media format and protocol to use. This is
done by the parties of the call telling each other their capabilities and matching these. This
means that when communicating there is no need for intermediate processing of the media format or
protocol, as in the initial idea. This is of course only true if the recipient is also connected
to a technology capable of handling SIP and SDP. If, e.g., the recipient is using PSTN, the
actual SIP and SDP communication takes place between the user interface (in this case a cellular
phone) and the PSTN-gateway, and the gateway handles the conversion between SIP/SDP with its
negotiated format and the PSTN.
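The capability matching described above can be sketched in a few lines of Python. This is a
simplified illustration under stated assumptions, not the negotiation logic of the SEMC IMS
architecture: the function names are hypothetical, only the first audio media line is considered,
and the example offer (port, payload types and the dynamic AMR mapping) is made up for the
example.

```python
def audio_payload_types(sdp):
    """Return the payload types listed on the first m=audio line of an SDP body."""
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # Layout of a media line: m=<media> <port> <proto> <fmt> ...
            return line.split()[3:]
    return []


def match_formats(offer_sdp, local_formats):
    """Keep the offered formats that are also supported locally, in the offerer's order."""
    return [pt for pt in audio_payload_types(offer_sdp) if pt in local_formats]


# Hypothetical offer: PCMU (0), PCMA (8) and a dynamic payload type mapped to AMR.
offer = "v=0\r\nm=audio 49170 RTP/AVP 0 8 97\r\na=rtpmap:97 AMR/8000\r\n"
print(match_formats(offer, {"97", "0"}))  # -> ['0', '97']
```

If the resulting list is empty, the two parties share no audio format, which is the situation in
which a gateway or transcoder would be needed.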
Figure 5.1: Overview of the SIP solution
5.2.2 IMS SL and the VoIP Server
The VoIP-server can be split up into two parts: the VoIPCore, which is the actual running
application, and the VoIPMediaHandler, which handles the media sessions. The VoIP-server uses
the IMS SL for all SIP requests and responses. As said, the IMS SL also helps the overlying
application to set up the negotiated media session. Figure 5.2 shows that the VoIPCore
component uses the IMS SL to handle SIP requests. Incoming SIP requests are received by the
VoIPCore as events sent by the IMS SL.
Figure 5.2: Interaction between the VoIP Server and the SEMC IMS Architecture
Figure 5.2 also shows that the IMS SL uses the VoIPMediaHandler component. This is done
using the IMS specific interfaces implemented by the VoIPMediaHandler. The VoIPMediaHan-
dler’s responsibility is to set up the actual media sessions. This is done by using other parts of
the IMS architecture, mainly the RTP and CStreamingMedia. Once the connections between the
two peers have been established using RTP, it becomes the VoIPMediaHandler’s responsibility to
make sure that data is being recorded and sent as well as received and played.
The actual recording and playback of data is done by using the StreamingMedia component.
This is a component that allows for recording and playback to and from a memory buffer, which
is a must for this solution. According to its design, the StreamingMedia component also supports
full-duplex audio, i.e., simultaneous recording and playback. This later proved not to be
entirely true, as discussed in the implementation chapter.
SL. Figure 5.3 shows what interfaces the VoIPCore component implements and also some of its
methods.
Figure 5.5: The functionality that the VoIP Callback Interface provides
5.3 Scenarios
This section will show, with help of scenarios, how the VoIP-server interacts with the rest of
the system, and vice versa, in its most crucial parts. Each scenario contains a sequence diagram
and a descriptive text explaining the scenario.
1. When the register method in the VoIPCore component is called, it sets up the register pa-
rameters needed for a successful SIP registration.
2. After this setup has been completed, the register method is called, and upon a response from
the SIP server (or a network error) a response code is received. The user of the
VoIPCore component is notified with a callback method.
Figure 5.6: The VoIPCore component uses the IMS SL to perform a SIP registration
Figure 5.7: The VoIPCore component uses the IMS SL to initialize a SIP invite request
1. When the Invite method is called in the VoIPCore component, it sets up the invite parame-
ters needed for a SIP invite request.
2. The next thing it does is to request that the IMS SL sends the invite.
Figure 5.8: The IMS SL uses the VoIPMediaHandler component to create the SIP invite and to
set up the media streams
4. After the invite has been sent and a response has been received from the remote end, the
IMS SL uses the VoIPMediaHandler to figure out which media sessions matched (CompareMedia).
Using that information, the IMS SL closes the sockets that will not be used
(CloseMediaSockets), and completes the setup of the media session sockets (SetConnectionInfo).
5. The IMS SL notifies the VoIPCore component about the status of the sent SIP invite request,
and the status is forwarded to the user of the VoIPCore component.
Figure 5.9: Preparing the VoIPMediaHandler for the actual media session
1. When an invite-process has been successfully completed, the VoIPCore component calls
the StartSession method in the VoIPMediaHandler in order to get it ready to either start
listening or talking.
2. The VoIPMediaHandler creates the components needed to record and play back audio.
5.3.4 Requesting to Talk
When the user wants to say something to the other participant, he must make a talk "request".
Figure 5.10: Interaction between the different components when requesting to talk
1. Once the request talk has been received by the VoIPMediaHandler component, it requests
an audio channel used for recording.
2. When the request has been approved (happens immediately unless some other part uses that
channel) and thus opened, the recorder is configured.
3. Once a successful configuration of the recording has been completed, a message represent-
ing a request-talk is sent to the remote end.
4. When an ack from the remote end is received, the recorder is started and the VoIPCore
component’s user is notified.
5. Every time there is new data available to send to the remote end, an RTP packet is created
and sent. This happens repeatedly until a request-talk is received from the remote end,
signaling that it is time to start listening instead (see the incoming request talk scenario).
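The talk-request handshake in the steps above, together with the incoming request talk scenario,
can be summarized as a small state machine. The Python sketch below is illustrative only: the
class, state and message names are hypothetical and do not correspond to identifiers in the
prototype, and the audio channel and recorder setup is deliberately left out.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WAITING_FOR_ACK = auto()
    TALKING = auto()
    LISTENING = auto()


class HalfDuplexSession:
    """Tracks which side currently holds the right to talk (hypothetical sketch)."""

    def __init__(self, send_control):
        self.state = State.IDLE
        self.send_control = send_control  # callable that sends a control message

    def request_talk(self):
        # The local user wants to speak: ask the remote end for the floor.
        if self.state in (State.IDLE, State.LISTENING):
            self.send_control("request-talk")
            self.state = State.WAITING_FOR_ACK

    def on_ack(self):
        # The remote end has switched to listening, so recording may start.
        if self.state is State.WAITING_FOR_ACK:
            self.state = State.TALKING

    def on_remote_request_talk(self):
        # The remote end wants to speak: stop sending and acknowledge.
        self.send_control("ack")
        self.state = State.LISTENING

    def may_send_audio(self):
        return self.state is State.TALKING
```

Audio packets are only sent while `may_send_audio()` is true, which mirrors the rule that the
recorder is started only after the ack has arrived.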
Figure 5.11: Interaction between the different components when a "request talk" is received
1. When a request-talk message is received, the current recording is stopped (if there is one)
and an audio channel used for playback is requested.
2. Once the request has been approved (happens immediately unless some other part uses that
channel) and the channel thus opened, the player is configured.
3. Once the playback has been successfully configured, a message representing an ack is sent
to the remote end.
4. When the first data packet (RTP) arrives, a buffer holding temporary RTP packets is created.
The data from the packet is unpacked and sent to the player for playback.
5. Every time that a new RTP packet is received it is put in the buffer holding the temporary
packets.
6. Whenever the player runs out of data, the next packet is retrieved from the buffer holding
the temporary packets, unpacked, and sent to the player.
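Steps 4 to 6 above describe a simple first-in, first-out buffer between the network and the
player. The following Python sketch shows that behavior under simplifying assumptions: the class
and method names are hypothetical, payloads are treated as already unpacked, and reordering by
RTP sequence number is not handled.

```python
from collections import deque


class Receiver:
    """Feeds received RTP payloads to a player, buffering while it is busy."""

    def __init__(self, play):
        self.play = play    # callable that hands audio data to the player
        self.buffer = None  # created when the first packet arrives

    def on_rtp_packet(self, payload):
        if self.buffer is None:
            # First packet: create the buffer and start playback directly.
            self.buffer = deque()
            self.play(payload)
        else:
            # Later packets wait in the buffer until the player needs them.
            self.buffer.append(payload)

    def on_player_starved(self):
        # Called whenever the player has run out of data.
        if self.buffer:
            self.play(self.buffer.popleft())
```

A production receiver would normally also order packets by sequence number and bound the buffer
size, both of which are omitted here for brevity.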
Figure 5.12: The interaction between the IMS SL and the VoIP-server when a SIP invite is re-
ceived
1. When an incoming SIP invite request is received by the underlying architecture it notifies
the VoIPCore component, which in turn notifies its user.
2. Should the user accept the incoming invite, this is forwarded to the underlying architecture,
which sets up the media session sockets in a manner very similar to the one shown in
the Invite scenario above. Once this is completed, the StartSession method is called in the
VoIPMediaHandler (see Start media session above), and the VoIPCore component’s user is
notified of the result.
3. If the user chooses to reject the incoming SIP invite, this is merely forwarded to the un-
derlying architecture, which notifies the VoIPCore component when it is completed. This
result is forwarded to the user of the VoIPCore.
Figure 5.13: The interaction between the VoIP-server and the IMS SL when sending a SIP bye
1. When the VoIPCore component receives a terminate request from its user, it simply for-
wards this request to the underlying architecture.
2. The IMS SL makes sure that all the media specific sockets are closed by calling imple-
mented functionality in the VoIPMediaHandler component.
5.3.8 Incoming Bye Request
This scenario describes what happens when a SIP bye request is received from the remote end.
Figure 5.14: The interaction between the VoIP-server and the IMS SL when a SIP bye is received
1. When an incoming SIP bye request destined for VoIPCore is received, the StopSession
method in the VoIPMediaHandler is called in order to de-allocate resources, and the VoIPCore
component’s user is notified.
2. The underlying IMS SL architecture makes sure that the media specific connections are
closed by calling implemented functionality in the VoIPMediaHandler.
Chapter 6
Prototype Implementation
In this chapter there will be a brief presentation of what was implemented in order to make a
working prototype. The aim of this chapter is simply to give some insight into the more
important things that had to be implemented in order to make the prototype a reality. Focus will
thus be on the most important aspects and issues that were encountered during the implementation
of the prototype.
6.2.2 No Support for Full-duplex Audio
Another thing that was revealed during the implementation was that there was no actual support
for full-duplex audio in the base platform. This meant that it would only be possible to either
record or play back audio, but not both at the same time. The fact that this limitation resided
in the base architecture of the platform meant that there was very little we could do about it,
as the base platform is developed by a third-party company. As the goal for this thesis was to
investigate and develop a prototype to prove the possibilities for supporting new communication
technologies with the cellular phone as the interface, this was obviously a major drawback as it
limits the scope to only half-duplex solutions.
However, it was our opinion, after having implemented large parts of the VoIP-solution, that
once this limitation in the architecture is removed there will be no problem handling full-duplex
audio conversations. In order to temporarily avoid the problem, and still be able to provide some
form of proof that a VoIP-solution with the cellular phone as the interface is possible, we
shifted towards a half-duplex solution.
We decided that the simplest way would be to pass an application specific token between
the recipient and the caller, using RTCP. This solution was chosen as there is good support for
this kind of token passing in RTCP. In fact, RTCP already provides the possibility to create and
pass application specific data with different subtypes.
This token passing solves the problem by only letting the party holding the token speak, while
the other party listens. The token passing is described in the design scenarios "Requesting to
Talk" and "Incoming Request Talk" in the previous chapter. However, it must be understood that
these scenarios are not part of the actual VoIP design, but were added as a workaround for the
fact that the platform does not handle full-duplex audio. It should also be noted that it is our
belief that when the audio problem is fixed, there will be little problem shifting from the
half-duplex solution to a real full-duplex VoIP solution. In fact, less work is needed for a
full-duplex solution, as no token passing or state handling is needed, unlike with the
half-duplex workaround.
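The application specific RTCP data mentioned above corresponds to the APP packet type of the RTP
specification, which carries a 5-bit subtype, a four-character name and optional application
data. The Python sketch below builds and parses such a packet. It is a hedged illustration: the
name "TALK" and the two subtype values are hypothetical choices made for this example and are not
the values used in the prototype.

```python
import struct

RTCP_APP = 204            # RTCP packet type for application-defined packets
SUBTYPE_REQUEST_TALK = 1  # hypothetical subtype: sender asks for the token
SUBTYPE_ACK = 2           # hypothetical subtype: receiver grants the token


def build_app_packet(subtype, ssrc, name, data=b""):
    """Build an RTCP APP packet with a 4-byte name and optional data."""
    if not 0 <= subtype <= 31:
        raise ValueError("subtype must fit in 5 bits")
    if len(name) != 4:
        raise ValueError("name must be exactly 4 bytes")
    if len(data) % 4:
        raise ValueError("application data must be a multiple of 4 bytes")
    length_words = (12 + len(data)) // 4 - 1  # length in 32-bit words minus one
    byte0 = (2 << 6) | subtype                # V=2, P=0, 5-bit subtype
    header = struct.pack("!BBHI4s", byte0, RTCP_APP, length_words, ssrc, name)
    return header + data


def parse_app_packet(packet):
    """Return (subtype, ssrc, name, data) from an RTCP APP packet."""
    byte0, ptype, length_words, ssrc, name = struct.unpack("!BBHI4s", packet[:12])
    if ptype != RTCP_APP:
        raise ValueError("not an RTCP APP packet")
    end = (length_words + 1) * 4
    return byte0 & 0x1F, ssrc, name, packet[12:end]
```

With this layout, a talk request could for instance be sent as
`build_app_packet(SUBTYPE_REQUEST_TALK, ssrc, b"TALK")` and answered with a packet carrying
`SUBTYPE_ACK`.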
Chapter 7
In this chapter we will look back at the initial research goals and see what was actually con-
cluded. During the evaluation of the possibilities for IP-telephony in SEMC cellular phones, we
have also come across a topic that we feel might need further investigation. This topic will also
be presented below.
Answer: With the help of the prototype, the Bluetooth connection has been empirically tested to
see whether it provides acceptable latency for voice communication. However, as has been stated
before, our current prototype only operates with half-duplex audio, i.e., audio in only one
direction at a time. This means that the studies do not actually test whether Bluetooth is able
to handle real VoIP communication. To provide a likely answer to the question we refer to what is
normally seen as acceptable latencies when dealing with real-time voice communication. The
general opinion is that latencies below 400 ms are acceptable for the parties of a conversation,
although a latency below 150 ms is recommended [6].
The Bluetooth connection has been empirically tested in respect to this criterion. This was
done by measuring the one way latency between the cellular phone and the PC providing the
IP-connection, i.e., the latency imposed on the data when traveling over the actual Bluetooth
connection.
The results presented below are from tests with two different packet sizes. These were chosen
to reflect the real sizes of the data packets traversing the link during normal conversation,
i.e., best and worst quality when using the codec in question.
The test was conducted in a normal open office environment, over distances of 1-12 meters.
The fact that the test was conducted in an office environment means that the results should not
be interpreted as a true test of Bluetooth, but rather as an indication of the capabilities offered
to the solution in question, i.e., the results may be influenced by sources of disturbance in the
surrounding environment, such as wireless LAN, performance variations of the cellular phone
and/or PC, etc.
Packet Size (bytes)   Distance (m)   Average (ms)   Max. (ms)   Min. (ms)
194                   1              36             65          17
424                   1              40             61          35
Diff.                 -              -4             4           -18
194                   4              34             61          19
424                   4              40             63          32
Diff.                 -              -6             -2          -13
194                   8              33             62          18
424                   8              39             62          24
Diff.                 -              -6             0           -6
194                   12             37             75          22
424                   12             44             72          32
Diff.                 -              -7             3           -10
As can be concluded from table 7.1, packet size and distance seem to have little impact on the
latency. This holds only up to a certain point, but what the results do show is that the
Bluetooth connection in itself should be able to handle voice communication, even for the
larger packets, in a "normal" open office environment.
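The comparison against the latency criterion can be sketched in a few lines of Python. The
measurements are copied from table 7.1 and the two thresholds from [6]; the function names are
our own and purely illustrative. Note that the measured values cover only the Bluetooth hop,
so the remaining margin would have to accommodate the rest of the end-to-end path.

```python
# Measurements from table 7.1:
# (packet size in bytes, distance in m) -> (average, max, min) latency in ms.
MEASUREMENTS = {
    (194, 1): (36, 65, 17),
    (424, 1): (40, 61, 35),
    (194, 4): (34, 61, 19),
    (424, 4): (40, 63, 32),
    (194, 8): (33, 62, 18),
    (424, 8): (39, 62, 24),
    (194, 12): (37, 75, 22),
    (424, 12): (44, 72, 32),
}

RECOMMENDED_MS = 150  # recommended latency for voice communication [6]
ACCEPTABLE_MS = 400   # upper limit for acceptable latency [6]


def diff_row(distance):
    """Return the 'Diff.' row: small-packet values minus large-packet values."""
    small = MEASUREMENTS[(194, distance)]
    large = MEASUREMENTS[(424, distance)]
    return tuple(s - l for s, l in zip(small, large))


def worst_case_ms():
    """Return the largest latency observed in any test run."""
    return max(maximum for _, maximum, _ in MEASUREMENTS.values())


def headroom_ms(limit=RECOMMENDED_MS):
    """Return how far the worst observed latency stays below the given limit."""
    return limit - worst_case_ms()
```

Calling `headroom_ms()` gives the margin left below the recommended limit, and
`headroom_ms(ACCEPTABLE_MS)` the margin below the acceptable limit.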
Answer: As has been described in this report, there is evolving support for implementing
solutions like VoIP in the SEMC architecture. This is mainly because of the features of the SEMC
IMS. However, it is at present not possible to implement a fully working VoIP solution, due
to the lack of support for full-duplex audio in the base architecture, i.e., the architecture on which
the current phones are built. This is however just a temporary problem, and as soon as it has been
remedied there will be little work needed to convert the current solution into a true
full-duplex audio VoIP prototype.
Answer: Regarding the support for new communication technologies, we have already seen that
it is possible to support new technologies through the use of media gateways. Even though this
support is not directly part of the SEMC architecture, it follows from the fact that
the SEMC architecture supports the Session Initiation Protocol (SIP). Because SIP supports
new communication technologies through the use of gateways, the support is separated
from the internal architecture, which leads to useful properties such as extended flexibility and
load balancing.
the Internet, the main requirement becomes that each entity or cellular phone maintains a
constant IP-connection. This in turn means that the cellular phone will potentially be as exposed
to malicious attacks as every other entity connected to the Internet. Adding to our concerns, we
believe that parts of the SEMC architecture and base architecture were not designed with a focus on
handling potentially unsafe data. It is our opinion that further resources need to be devoted
to investigating the potential gaps in the design and implementation of the IMS and base
architecture, in order to make sure that all unsafe data communication is treated as if it were
potentially harmful.
Chapter 8
When conducting this master thesis we have naturally had a general interest in what is happen-
ing in different areas related to VoIP. What has been noticed is that the general interest in VoIP has
more or less exploded during the last years (2004-2005). The fact that the general public becomes
more and more interested in using this new technology means that there is also an increased focus
on the strengths and weaknesses of VoIP. In this chapter there will be a short presentation of the
issues that we have found the most interesting. The topics have been selected in order to address
aspects that are important to both developers and the general public.
8.1.2 Avoiding the NAT Problem
There exist different types of NATs. This is one of the reasons why the NAT issue, or
rather issues, is hard to solve, as a solution which is fully functional in one situation might
be inadequate in another. For this reason there exist different types of solutions, all with
their own benefits and drawbacks. Some solutions can handle every type of NAT, but this comes
at the expense of complexity.
Common solutions for handling the VoIP NAT problem are application layer aware firewalls
and NATs, MIDCOM, TURN and STUN [8].
MIDCOM is an architecture for controlling and modifying firewalls and NATs from a trusted
MIDCOM agent [8]. This of course means that it must be supported by the firewalls and NATs.
In a MIDCOM architecture these entities are referred to as middleboxes. In short, a SIP client
residing behind a NAT should also implement a MIDCOM agent. This agent is then
allowed to modify the settings and port forwarding of the NAT (middlebox), provided it is
trusted by the middlebox [11]. This means that the actual address information written in the SIP
and SDP messages is provided by the MIDCOM agent, and should thus be valid.
STUN and TURN approach the problem in a slightly different manner. Instead of actually
trying to control the NAT, they try to use the properties of the NAT to avoid the problem. The
general idea is that there is a STUN or TURN server residing on the public network. The SIP client
then exchanges information with this server in order to find out which public IP address and port
it should write in the SIP and SDP messages. The exact configuration and information exchange
between the SIP client and the STUN or TURN server varies depending on the type of NAT being used.
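As an illustration of the STUN exchange, the following Python sketch builds a Binding Request
and extracts the public address from a Binding Response. It assumes the message layout of
RFC 5389 (magic cookie and XOR-MAPPED-ADDRESS attribute), which is a later revision than the
STUN described in [8]; it handles IPv4 only, and the function names are our own. Sending the
request over UDP to an actual server is left out.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442        # fixed value in every RFC 5389 STUN header
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020
FAMILY_IPV4 = 0x01


def build_binding_request(transaction_id=None):
    """Return (message, transaction_id) for a Binding Request without attributes."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + tid, tid


def parse_binding_response(data, transaction_id):
    """Return the (ip, port) seen by the server, or None if it cannot be found."""
    if len(data) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    if data[8:20] != transaction_id:
        return None  # response belongs to another request
    pos, end = 20, min(20 + length, len(data))
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        if (attr_type == XOR_MAPPED_ADDRESS and len(value) == 8
                and value[1] == FAMILY_IPV4):
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None
```

The returned address and port are the values a SIP client would write into its SIP and SDP
messages instead of its private address.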
Which solution to use depends largely on the scale at which one is operating, and on which NAT
situation one is trying to solve. For more information about the NAT issue and proposed solutions,
please see [12].
8.3 Public Safety
An issue that has gained more attention as VoIP solutions have become more widely used is the
public safety issue. This issue arises when using VoIP in emergency situations. The basis for
being able to use VoIP at all in an emergency situation is that the VoIP service provider offers
some form of emergency handling. This handling could be more or less advanced, i.e., the service
provider could offer a special emergency solution or just put you through to the emergency service
by using PSTN bridging. The problem with forwarding emergency calls using PSTN bridging is
that this may actually cause confusion, as the number received by the emergency services is the
phone number of the PSTN gateway. This is troublesome as emergency services use the caller’s phone
number to find out the geographical position of the caller. This works when
a call really originates from a PSTN phone, but when the call originates from a
VoIP phone, the emergency response may be directed to the address of the PSTN
gateway and not to the actual caller’s address [16].
Another cause for concern, when it comes to using VoIP for placing critical calls, is the fact
that one cannot expect the same quality of service from VoIP as from PSTN, as the Internet, over
which the call is placed, is a best-effort network. Even though the quality of service, when it
comes to VoIP, is getting better all the time, it is something that must be considered. As VoIP
uses normal computer networks for placing calls, there is also an increased risk of not being able
to place an emergency call in case of a power outage [17]. To understand the severity of a power
outage one need only look at a normal household. In case of power failure in a normal family
home, no computer communication will work, as the computer-based home network relies on
power to function properly, i.e., all equipment, like cable modems, routers, and VoIP boxes, needs
an external power supply in order to function.
As VoIP becomes more widely used, the public safety issue has also received more focus, and
different solutions have been proposed in order to handle the safety issues. The proposed solutions
vary in complexity; everything from manually entering your location when signing on to the VoIP
network [17], to solutions like direct trunking, where emergency calls are automatically routed
to public safety answering points [18], has been proposed.
It is however our opinion that the solution presented in this report, where the actual VoIP
client is implemented in a cellular phone, offers a good answer to these issues, as it enables
the option to route all emergency calls through the cellular network instead of relying on the
capabilities offered by the VoIP network. Although there are opinions that one should not rely on
other services to provide emergency handling, as this will slow down the development of VoIP, we
still believe that a solution like the one presented in this report will serve a purpose until
VoIP emergency handling has matured and a standard has been developed for public safety
using VoIP.
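The routing rule argued for above can be sketched as follows. This is a minimal illustration
and not the prototype's actual logic: the set of emergency numbers, the function name and the
two bearer labels are all hypothetical.

```python
# Hypothetical set of emergency numbers; a real phone would obtain these from
# the SIM card or the network rather than hard-code them.
EMERGENCY_NUMBERS = {"112", "911"}


def select_bearer(dialled_number, voip_available):
    """Return "cellular" or "voip" for an outgoing call to a phone number."""
    number = dialled_number.replace(" ", "")
    if number in EMERGENCY_NUMBERS:
        # Emergency calls never depend on the VoIP network.
        return "cellular"
    # Ordinary calls use VoIP when possible and fall back to the cellular network.
    return "voip" if voip_available else "cellular"
```

The fall-back in the last line mirrors the observation that the user can still use the phone
in the traditional way when no VoIP connection is available.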
The reason for the transition towards solutions like the one presented in this report is, in our
opinion, that such solutions offer much greater flexibility and usability than was achieved by
the earlier solutions. Earlier solutions were intended to make use of VoIP for handling normal
PSTN communication. This means that most of the earlier solutions offer capabilities for
accepting incoming calls and in some cases also the possibility to initiate calls to others by
simply entering the PSTN phone number on the handset. This is fine from the perspective of a
replacement for PSTN, but such solutions are not intended to handle outgoing calls to pure VoIP
entities, i.e., recipients using soft phones and SIP accounts. These simpler solutions will also be
excluded from other advances within the VoIP area, such as combinations of voice and video, as the
handsets will not have the needed capabilities.
When implementing VoIP support in a cellular phone, one solves many of these issues. The
solution will be easy to use; users can call pure VoIP as well as traditional PSTN numbers by
simply choosing the proper account for the intended recipient, i.e., SIP-URI or PSTN number.
The fact that the VoIP capabilities reside in a cellular phone also means that the solution is highly
portable, and as Wi-Fi becomes more common in cellular phones one will be able
to use VoIP wherever there is a hotspot. In situations where one, for one reason or another,
cannot use VoIP communication, one will not be left stranded as there is still the possibility to
use the cellular phone in the traditional way.
In the longer run there will probably also be the possibility to use VoIP over cellular networks,
like UMTS. When this option becomes really interesting will, in our opinion, depend largely on
the development of the cellular networks and the cost of using them.
Chapter 9
Conclusions
In this report there has been a presentation of the investigative work on the possibilities for
implementing VoIP in the Sony Ericsson cellular phones, using the SEMC architecture. The
possibilities to support such a solution over Bluetooth have also been investigated.
The investigations in this report have shown that there is partial support for VoIP in the SEMC
architecture. In order to have full VoIP-support, the issue of the base architecture only handling
half-duplex audio must be addressed. It has also been concluded that the best option for im-
plementing a VoIP-solution, in a Sony Ericsson cellular phone, is to use the Session Initiation
Protocol (SIP) for call signalling and the Real-time Transport Protocol (RTP) for media stream-
ing. The SIP and RTP protocols are supported through the use of the SEMC IMS architecture. It
has also been concluded that a SIP and RTP based solution could support other communication
technologies like PSTN, through the use of gateways.
The VoIP-support in the SEMC architecture was empirically tested by implementing a proto-
type. Measurements performed on this prototype show that Bluetooth will fulfil the requirements
for most VoIP-solutions, i.e., in respect to latency and bandwidth.
Acknowledgements
First of all we would like to thank everyone at UMTS and GSM Services at Sony Ericsson
Mobile Communications AB in Lund, Sweden. Everyone has been very understanding, helpful,
and willing to spend time answering questions regarding the SEMC mobile phone architecture
and the development environment.
We would like to give special thanks to Anna Göransson, Gary Cole, Håkan Grahn, Mikael
Kanstrup, Pär Olsson, Suri Maddhula, and Tobias Åkesson, as they have been particularly helpful.
Bibliography
[19] Jenny Levine. Product Pipeline. Library Journal, 130:22–24, 2005.
[20] Wayne Rash. Two IP Phones Worth Picking Up. InfoWorld, 26(4):26, 2004.
[21] Vince Vittore. VoIP-enable CPE market fills with new product entries. Telephony,
245(24):17–18, 2004.
[22] John R. Quain and Marc Silver. Phones that love Wi-Fi. U.S. News and World Report,
137(9):75, 2004.
[23] Bob Brewin. Mobile Phones Move Toward Combined Calling Capabilities. Computerworld,
38(13):6, 2004.
[24] Alan B. Johnston. SIP: The Session Initiation Protocol. Artech House Inc., 2001.
[26] J. Klensin. Simple Mail Transfer Protocol. RFC 2821 (Proposed Standard), April 2001.
Appendix A
The purpose of this section is to give a description of each of these entities as well as an
understanding of how these entities are connected and used by each other.
A.2.1 User Agents
A user agent (UA) is an entity that a user interacts with [1]. It may be an application in a
computer or in a cellular phone, a telephone dedicated to internet telephony, etc.
A.2.2 Registrars
A registrar is a SIP server that accepts registration requests, i.e. a server that a user tells where
they can be reached [1].
Figure A.2: The interaction between a SIP registrar and a location server when user A registers
Figure A.3: The interaction between the SIP entities when user A sends a SIP invite to user B,
when redirect servers are used
1. User A wants to invite User B to a multimedia session and tries to reach user B at his public
address, which is B@domain.com. At domain.com there is a redirect server that handles
incoming calls
2. The redirect server at domain.com asks its location server where User B is.
4. The redirect server tells User A that User B can be reached at B@domain2.com
5. User A then sends the invitation to B@domain2.com. Domain2.com also has a redirect
server that handles incoming calls.
6. The redirect server at domain2.com asks its location server where User B is.
8. The redirect server tells User A that User B can be reached at B@computer.domain2.com.
Figure A.4: The interaction between the SIP entities when user A sends a SIP invite to user B,
when proxy servers are used
1. User A wants to invite User B to a multimedia session and tries to reach User B at his public
address, which is B@domain.com. At domain.com there is a proxy server that handles the
incoming request.
2. The proxy server at domain.com asks its location server where User B is.
4. The proxy server at domain.com forwards the invitation from User A to B@domain2.com.
Domain2.com also has a proxy server that handles the incoming request.
5. The proxy server at domain2.com asks its location server where User B is.
7. The proxy server at domain2.com forwards the invitation from User A to B@computer.domain2.com
A proxy can try more than one location for a user. This is called forking and can be done
either in parallel or sequentially, depending on how the proxy is configured [1]. Parallel means
that the proxy tries to reach all of the locations at the same time, whereas sequential means that
it tries one after another.
Parallel forking implies that many UAs receive a SIP invitation at once, e.g. more than one
phone rings when someone is calling. Sequential forking, on the other hand, can be seen as a
kind of forwarding. If the user is not available (does not answer) at one location, the next location
is tried, until there are no locations left or until the user answers.
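The difference between the two forking strategies can be illustrated with a small simulation.
The sketch below does not send any SIP messages; `answer_delay` is a hypothetical callback that
returns the number of seconds after which a location answers, or None if it never answers.

```python
def fork_sequential(locations, answer_delay):
    """Try one location after another.

    Returns (answering location or None, list of locations that were tried).
    """
    tried = []
    for location in locations:
        tried.append(location)
        if answer_delay(location) is not None:
            return location, tried
    return None, tried


def fork_parallel(locations, answer_delay):
    """Invite all locations at once; the location that answers first wins.

    Returns (answering location or None, list of locations whose invitation
    the proxy would have to cancel).
    """
    delays = {location: answer_delay(location) for location in locations}
    answered = [location for location in locations if delays[location] is not None]
    winner = min(answered, key=lambda location: delays[location]) if answered else None
    to_cancel = [location for location in locations if location != winner]
    return winner, to_cancel
```

With sequential forking the order of the location list decides who is reached, whereas with
parallel forking the fastest answer decides and the remaining invitations must be cancelled.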
A.3.1 Responses
When a UAS receives a request it will send out one or more responses. The meaning of the
response is to give the UAC information about the status of the transaction. This results in that
there are a vast amount of responses that can be sent.
A response contains both a status code and a reason phrase. The status code is an integer
between 100 and 699, whereas the reason phrase is a humanly-readable translation of that status
code. The responses are divided into six different classes, as seen in table A.1. A list of responses
that the authors have deemed as the most commonly used, and thus the most interesting, can be
found in table A.2. A complete list of the responses defined in the SIP core can be found in the
SIP rfc [3].
Status code   Reason phrase                   Comment
100           Trying                          -
180           Ringing                         -
181           Call is being forwarded         -
182           Queued                          -
200           OK                              -
301           Moved permanently               -
302           Moved temporarily               -
305           Use proxy                       -
400           Bad request                     -
401           Unauthorized                    Used only by registrars, proxies should use 407
404           Not found                       User not found
405           Method not allowed              -
406           Not acceptable                  -
407           Proxy authentication required   -
408           Request timeout                 Could not find user in time
415           Unsupported media type          -
480           Temporarily unavailable         -
486           Busy here                       -
487           Request terminated              -
491           Request pending                 -
502           Bad gateway                     -
505           Version not supported           -
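The division of status codes into six classes follows directly from the first digit of the
code. The sketch below shows this mapping; the class names are those used in the SIP core
specification (RFC 3261), while the function names are our own.

```python
# Response classes of the SIP core specification, keyed by the first digit.
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
    6: "Global failure",
}


def response_class(status_code):
    """Return the class name for a SIP status code between 100 and 699."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a valid SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code):
    """Every response outside the provisional (1xx) class is a final response."""
    return response_class(status_code) != "Provisional"
```

For example, 180 Ringing is a provisional response, whereas 200 OK and 487 Request terminated
are both final responses.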
A.3.2 Requests
A specific request is referred to as a method. The SIP core specification defines six types
of methods. There are, however, extensions to SIP that define additional requests. The request
method is denoted in a specific field in SIP [1]. SIP requests may also contain a body, which is
the packet’s payload. This payload is usually a session description [1].
Invite. As the name implies, the invite request invites users to a session [1]. The payload in this
request contains a session description, which can e.g. describe an audio session.
Ack. An ack request acknowledges the final response to an invite request, i.e. the UAC that sent
the invite request will send an ack after it has received the final response [1]. Figure A.5 shows
an example of an invite-response-ack.
Cancel. A cancel request will cancel any pending transaction, provided that the server processing
the request has not yet sent a final response. If a final response has already been sent, the cancel
request has no effect [1].
Figure A.6: Example of a cancelled SIP invite
Figure A.6 shows an example of User A sending an invite request to User B. The request first
passes a forking proxy which sends out the invites in parallel to two locations at which User B
can be reached. User B answers at domain2.com and sends its final response (200 OK) to the proxy,
which forwards it to User A. User A then responds with an ack. When the proxy receives the final
response from User B at domain2.com, it wants to cancel all other invitations, so it sends a cancel
request to User B at domain.com, which first sends a response for the cancel (200 OK) and then a
response for the invite (487 Request terminated). The proxy then sends an ack for the latter response.
The example described above shows the use of the cancel request. It should be mentioned
that the proxy could have sent a cancel request to all of the peers it sent an invite to, as the cancel
request will not affect an ongoing transaction [1].
Bye. When a person wants to leave a multimedia session, he sends a bye request (see figure
A.7). If the session consists of only two persons, the session itself will be
terminated. If there are more persons involved in the session, it merely means that the user leaves
the session [1].
Register. A user sends a register request when he wants to update his current location at a server,
i.e. the user tells a server where he wants to be reached. It is possible to register multiple
locations at which to be reached. It is also possible to inform the server how long the registration
should last. If User A wants to be reached at domain2.com until one o’clock he can register that,
and at one o’clock that registration is no longer valid [1].
Options. The options request is used to ask a server about its capabilities, i.e. which session
description protocols it understands, which requests it can answer, and which encodings it
understands [1].
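The behaviour described for the register request, i.e. multiple locations per user and
registrations that expire, can be sketched as a small in-memory table. The class and method
names are hypothetical, and time is passed in explicitly as a number of seconds to keep the
example deterministic.

```python
class LocationService:
    """Toy registrar storage: public address -> {contact address: expiry time}."""

    def __init__(self):
        self._bindings = {}

    def register(self, public_address, contact, expires_s, now):
        """Add or refresh a contact that stays valid for expires_s seconds."""
        self._bindings.setdefault(public_address, {})[contact] = now + expires_s

    def lookup(self, public_address, now):
        """Return the contacts that are still valid at time now, sorted by name."""
        contacts = self._bindings.get(public_address, {})
        return sorted(c for c, expiry in contacts.items() if expiry > now)
```

A registrar would call `register` when it accepts a register request, and a proxy or redirect
server would call `lookup` when it asks the location server where a user is.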
A.4 SIP Message Format
SIP is a text-based protocol. The format of the message depends on whether it is a request or a
response. These are however quite similar and consist of a request-line (request) or a status-line
(response), one or more header fields, an empty line (carriage return, line feed), and finally an
optional body [3].
A.4.3 Headers
After the request line (for requests) or status line (for responses) there are one or more header
fields. A header field provides information about the request, or response, and about the body of
the message. A complete list of the different header fields and information on where and whether
or not they should be used can be found in tables A.5.
The format of a header field is the field-name followed by a colon, a space character, and one
or more field-values, each optionally followed by parameters and their values, i.e.
"field-name: field-value1;parameter1=value, field-value2;parameter2=value, ..., field-valueN" [3].
It should be noted that this is the preferred format; there can actually be any number of spaces
between the field-name and the colon and between the colon and the field-value [3].
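The message layout described above can be illustrated with a minimal parser. The sketch splits
a message into its first line, its header fields and its body, and tolerates extra spaces
around the colon. It is a simplification under stated assumptions: header fields that continue
over several lines are not handled, and the function name is our own.

```python
def parse_sip_message(raw):
    """Split a SIP message into (start line, [(field-name, field-value)], body)."""
    # The header section and the body are separated by an empty line.
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = []
    for line in header_lines:
        # Only the first colon separates name and value; values may contain colons.
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


def is_response(start_line):
    """A status-line starts with the protocol version, a request-line with a method."""
    return start_line.startswith("SIP/")
```

Applied to a message whose first line is "INVITE sip:B@domain.com SIP/2.0", the parser returns
that request-line unchanged together with one (name, value) pair per header field.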
Where   Description
Req     This header field may only appear in a request.
Resp    This header field may only appear in a response.
xxx     A numerical value that represents the status code with which the header
        field can be used.
Copy    The header field is copied from the request to the response.
All     The header field may be present in all requests and responses.
Where   Description
c       Conditional. Requirements on the header field depend on the context of
        the message.
m       The header field is mandatory.
m*      The header field should be sent, but clients/servers need to be prepared to
        receive messages without the header field.
o       The header field is optional.
t       Same as m*, with the addition that if a stream-based protocol, e.g. TCP,
        is used as a transport, the header field must be sent.
*       The header field is required if the message body is not empty.
-       The header field is not applicable.
Header Field (Compact Form)   Where   ACK   BYE   CAN   INV   OPT   REG
Authorization                 Req     o     o     o     o     o     o
Call-ID (i)                   Copy    m     m     m     m     m     m
Call-Info                     All     -     -     -     o     o     o
Contact (m)                   Req     o     -     -     m     o     o
                              1xx     -     -     -     o     -     -
                              2xx     -     -     -     m     o     o
                              3xx     -     o     -     o     o     o
                              485     -     o     -     o     o     o
Content-Encoding (e)          All     o     o     -     o     o     o
Content-Language              All     o     o     -     o     o     o
Content-Length (l)            All     t     t     t     t     t     t
Content-Type (c)              All     *     *     -     *     *     *
CSeq                          Copy    m     m     m     m     m     m
Date                          All     o     o     o     o     o     o
From (f)                      Copy    m     m     m     m     m     m
Subject (s)                   Req     -     -     -     o     -     -
Supported (k)                 Req     -     o     o     m*    o     o
                              2xx     -     o     o     m*    m*    o
To (t)                        Copy    m     m     m     m     m     m
User-Agent                    All     o     o     o     o     o     o
Via (v)                       Copy    m     m     m     m     m     m
Table A.5: Some of the supported header fields in SIP core specification
The field-name is case-insensitive, and the same goes for the field-value, parameter names and
parameter values unless otherwise stated in the definition of a specific header field [3].
Field-values that are expressed as quoted strings are however case-sensitive.
Some header field-names have compact (abbreviated) forms. Only the most common
headers have a compact form, and the idea behind it is to prevent messages from
becoming too large [3, section 7.3.3]. SIP entities must accept both the normal header field-name
and the compact equivalent [3].
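Since both forms must be accepted, a receiver can simply normalise every field-name to its long
form before further processing. The sketch below does this for the compact forms listed in
table A.5; the function name is our own, and header fields outside that table are passed
through unchanged.

```python
# Compact forms as listed in table A.5.
COMPACT_FORMS = {
    "i": "Call-ID",
    "m": "Contact",
    "e": "Content-Encoding",
    "l": "Content-Length",
    "c": "Content-Type",
    "f": "From",
    "s": "Subject",
    "k": "Supported",
    "t": "To",
    "v": "Via",
}


def expand_header_name(name):
    """Return the long form of a field-name; unknown names are returned as-is."""
    stripped = name.strip()
    # Field-names are case-insensitive, so "V" and "v" both mean Via.
    return COMPACT_FORMS.get(stripped.lower(), stripped)
```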
A.4.4 Bodies
Message bodies may be contained in requests as well as in responses [3]. The type of internet
media in the message body must be given in the Content-Type header field, and the same goes for
Content-Encoding if the body is encoded in a specific way [3]. The most common media type in
the body is a session description [1], e.g. SDP. A SIP message can contain several bodies [1, 3],
e.g. a session description and an audio file.
A.5 Bridging SIP and the PSTN
The fact that SIP supports bridging between PSTN-SIP and SIP-PSTN is one of the capabilities
that has had much impact when it comes to using SIP for VoIP solutions. This interoperability
is achieved by using gateways. The gateway’s job is to translate between the PSTN protocol and
SIP. In fact, from a SIP perspective it makes no difference whether the call originates from PSTN
via a gateway or from a native SIP entity. Figure A.8 shows a sample of the signaling taking place
when communicating between PSTN and SIP.
Appendix B
As mentioned in appendix A, SDP (Session Description Protocol) is the most commonly used
description protocol to establish multimedia sessions. What makes SDP powerful is that it is
independent of the actual transport protocol that is used for the media session. It is thus possible
to use SDP to set up any type of multimedia session.
The information needed to establish a session is what type (e.g. audio or video) and format
(e.g. AMR or MPEG) of media is going to be sent, what transport protocol is going to
be used, and to what IP address and port the media should be sent. Besides being able to
provide this information, SDP may also be used to define additional information that can be of
interest, e.g. bandwidth information, session name, etc [4].
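To make the listed pieces of information concrete, the following sketch builds a minimal SDP
description for a single audio stream. The default values (dynamic payload type 96, AMR at
8000 Hz, "-" as user and session name) are assumptions chosen for illustration, and the
function name is our own.

```python
def build_sdp(ip, port, payload_type=96, codec="AMR", clock_rate=8000,
              user="-", session_id=0, session_name="-"):
    """Return a minimal SDP description for one audio stream sent over RTP."""
    lines = [
        "v=0",                                                # protocol version
        f"o={user} {session_id} {session_id} IN IP4 {ip}",    # origin of the session
        f"s={session_name}",                                  # session name
        f"c=IN IP4 {ip}",                                     # address to send media to
        "t=0 0",                                              # unbounded session time
        f"m=audio {port} RTP/AVP {payload_type}",             # media type, port, transport
        f"a=rtpmap:{payload_type} {codec}/{clock_rate}",      # format of the media
    ]
    return "\r\n".join(lines) + "\r\n"
```

The "m=" line carries the media type, port and transport protocol, the "c=" line the IP
address, and the "a=rtpmap" line the format, which together cover the information listed above.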
Appendix C
The Real-time Transport Protocol (RTP) is a protocol that is used to transport real-time data,
such as audio in a VoIP call, and can be used over an unreliable transport protocol like UDP [5].
For every RTP connection there exists a corresponding RTCP (Real-time Transport Control Protocol)
connection, which can provide QoS statistics, the ability to synchronize the media, and the ability
to create application-specific messages [5].
When using IP to send packets, one cannot be certain that the packets arrive in the order in which
they were sent. This also means that if two packets are sent over IP 100 ms
apart, the second packet does not necessarily arrive 100 ms after the first. In order
to ensure that the data sent is used in the order intended, one needs to be able to control the
jitter effect described. RTP does this by the use of timestamps.
In order to make use of these timestamps, the receiver of the RTP packets must place them
in a buffer sorted according to the timestamps of the packets. The packets can thus be retrieved
when needed. If a packet is needed but has not arrived yet, it is up to the receiver to take whatever
action it deems necessary, which means that it can either just ignore the fact that a packet is
missing, or try to do something about it using interpolation techniques. Should a packet arrive
after the point in time given by its timestamp has already passed, it may just be dropped, as such
a packet no longer serves any purpose.
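The buffering scheme described above can be sketched as a small jitter buffer that keeps the
packets sorted by timestamp. This is an illustration only: the class and method names are our
own, a missing packet is simply reported as None instead of being interpolated, and the
wrap-around of real RTP timestamps is ignored.

```python
import heapq


class JitterBuffer:
    """Keeps received packets ordered by timestamp until they are played out."""

    def __init__(self):
        self._heap = []           # min-heap of (timestamp, payload)
        self._last_played = None  # timestamp most recently requested by the player

    def push(self, timestamp, payload):
        """Store a packet; return False if it arrived too late and was dropped."""
        if self._last_played is not None and timestamp <= self._last_played:
            return False
        heapq.heappush(self._heap, (timestamp, payload))
        return True

    def pop(self, expected_timestamp):
        """Return the payload due at expected_timestamp, or None if it is missing."""
        self._last_played = expected_timestamp
        # Discard anything older than what is being played now.
        while self._heap and self._heap[0][0] < expected_timestamp:
            heapq.heappop(self._heap)
        if self._heap and self._heap[0][0] == expected_timestamp:
            return heapq.heappop(self._heap)[1]
        return None
```

The player calls `pop` at regular intervals with the timestamp it needs next, while the network
side calls `push` whenever a packet arrives, in whatever order that happens.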
Appendix D
Glossary