This paper examines and describes various virtualization architectures, their challenges for soft real-time workloads, and the effects on an Online Charging System (OCS) deployment architecture.
1 Abstract
Virtualization technology and Cloud computing have revolutionized general-purpose computing applications in the past decade. The cloud paradigm offers advantages through reduction
of operation costs, server consolidation, flexible system configuration and elastic resource provisioning. However, despite the success of cloud computing for general-purpose computing,
existing cloud computing and virtualization technology face tremendous challenges in supporting emerging real-time applications such as online charging, online video streaming, and other telecommunication management applications. These applications demand real-time performance in open,
shared and virtualized computing environments. This paper studies various virtualization architectures to identify the technical challenges in supporting real-time applications therein, focusing on online charging. It also surveys recent advancements in real-time virtualization and cloud computing technology, and research directions to enable cloud-based real-time applications in the future.
Keywords: Virtualization, Cloud computing, Hypervisor, OCS, real-time, NFVI, Charging, distributed system.
TABLE OF CONTENTS

1   Abstract
2   Terminology
3   Introduction
    3.1  Background
    3.2  Concerns
    3.3  Objectives
    3.4  Scope
    3.5  Stakeholders
    3.6  Methods
    3.7  Note to the reader
    3.8  Outline
4   Current situation
10  NFV challenges
11  Solution architecture
12
13  Conclusions
14  Discussions
15  References
16
17
18
19
2 Terminology
4G        Fourth-generation mobile networks
ABMF      Account Balance Management Function
BE        Back-end
BRM
BSS       Business Support System
CAPEX     Capital Expenditures
CBA
CDR       Charging Data Record
CPU       Central Processing Unit
CRM       Customer Relationship Management
CSP       Communication Service Provider
CTF       Charging Trigger Function
DCCA      Diameter Credit-Control Application
EM        Element Management
EPC       Evolved Packet Core
EPG       Evolved Packet Gateway
ERP       Enterprise Resource Planning
ETSI      European Telecommunications Standards Institute
FE        Front-end
GGSN      Gateway GPRS Support Node
IaaS      Infrastructure-as-a-Service
IETF      Internet Engineering Task Force
IMS       IP Multimedia Subsystem
IT        Information Technology
KVM       Kernel-based Virtual Machine
LTE       Long-Term Evolution
MMS       Multimedia Messaging Service
MTAS      Multimedia Telephony Application Server
NaaS      Network-as-a-Service
NFV       Network Functions Virtualization
NFVI      NFV Infrastructure
NFV-MANO  NFV Management and Orchestration
NIC       Network Interface Card
OCF       Online-Charging Function
OCS       Online-Charging System
OM        Operation and Maintenance
OPEX      Operating Expenditures
OS        Operating System
OSS       Operation Support System
PaaS      Platform-as-a-Service
PCI       Peripheral Component Interconnect
PL        Payload
QoS       Quality of Service
RT        Rating Function
RTO       Retransmission TimeOut
SaaS      Software-as-a-Service
SC        System Controllers
SCUR      Session Charging with Unit Reservation
SDP       Service Data Point
SLA       Service Level Agreement
SR-IOV    Single Root I/O Virtualization
TCO       Total Cost of Ownership
TTM       Time to market
UDR
vCPU      Virtual CPU
vDC       Virtual Data Center
VF        Virtual Functions
VIM       Virtualized Infrastructure Manager
VM        Virtual Machine
VMM       Virtual Machine Monitor
VNF       Virtualized Network Function
VNFC      VNF Component
VNFM      VNF Manager
vNIC      Virtual NIC
VoLTE     Voice over LTE
VOMS
3 Introduction
3.1 Background
The current online charging system (OCS) for TeliaSonera has provided real-time charging successfully since 2006 in three of the Nordic countries. It offers prepaid services for brands like Refill, Halebop, Netcom, Telia DK & Chess.
These kinds of systems have high demands on availability and performance, which usually require software installed on proprietary hardware. The rapid pace of innovation for this type of hardware, the OCS dependencies on it, and the drive for cost-efficient solutions have reduced the system life cycle, today resulting in a more or less constant flow of tedious upgrades. The consequences are not limited to increased costs and time consumption; they also risk limiting the business delivery capabilities.
As the industry demands faster time to market (TTM) and reduced total cost of ownership (TCO), this is not sustainable in the long run and calls for action.
Ideas about migrating the system from today's native, dedicated hardware into a private cloud have been discussed. That would most likely simplify today's life cycle management of the system infrastructure, provide elastic scaling, reduce costs and bring other virtualization benefits. However, there are doubts on whether a soft real-time system like OCS in a virtual environment can provide the same overall quality attributes as the current setup, in particular reliability, availability, manageability, security and performance.
3.2 Concerns
The virtualized data centre is currently considered state-of-the-art technology in the Information Technology (IT) domain, while in the telecom domain there are no widespread deployments yet. One key differentiator between the IT and telecom domains is the level of service continuity required. In the IT domain, outages lasting seconds are tolerable and the service user typically initiates retries, whereas in the telecom domain there is an underlying service expectation that outages will be below the recognizable level (i.e. milliseconds) and that service recovery is performed automatically.
One of the disruptive technologies emerging in the area of telecom cloud computing and data center architectures is Network Functions Virtualization (NFV) [1]: a shift from hardware-based provisioning of network functions to software-based provisioning, where so-called virtualized network functions (VNF) [1] are deployed in private or hybrid clouds of communication service providers1 (CSP) [2].
The main concerns addressed in this paper are:
- Identifying the technical challenges in supporting real-time applications like OCS in the cloud.
- How compliant is the OCS architecture to the Telco-cloud (NFV reference architecture)?
1 CSP is a broad category encompassing telecommunications, entertainment and media, and Internet/Web service businesses.
3.3 Objectives
The objectives of this paper are to:
- give an overview of the virtualization technology for cloud computing, focusing on the real-time issues that appear therein.
- identify real-time challenges in cloud computing.
- present a subset of selected examples that illustrate some of the design decisions taken by virtualization technology to integrate real-time support.
- give an overview of the OCS architecture and the NFV framework.
- propose a deployment of OCS as a VNF in the Telco-cloud (NFVI) and identify the architectural impact.
- learn more about the areas mentioned above.
3.4 Scope
Because of the limited time, I will focus on the OCS integration points with the network handling real-time charging of IMS services, e.g. voice calls over 4G and messaging. This framework is regarded as future-proof for charging solutions. Online charging of 2G/3G voice calls is out of scope for this paper, as are top-up, voucher management and balance control.
The deployment scenario in this paper will focus on a Monolithic Operator, where the same organization that operates the virtualized network functions also deploys and controls the hardware and hypervisors they run on, and physically secures the premises in which they are located.
3.5 Stakeholders
- IT solution owner
- IT solution manager
- Solution and infrastructure architects
- CSP vendor(s)
- Study counselor for the DF IT architecture training
3.6 Methods
- Unstructured interviews with various stakeholders to explore requirements and gain more knowledge.
- System observations, to provide as objective a picture of reality as possible, using an inductive approach.
- Literature studies and perusal of available material to gather information about the theories needed to meet the objectives.
3.8 Outline
Section 5 overviews and explains various important requirements affecting OCS.
Section 6 offers an overview of the OCS architecture and how it links to the current solution.
Section 7 offers an overview of the virtualization technology for cloud computing.
Section 8 identifies different real-time challenges in cloud computing.
Section 9 gives an overview of the technology and architecture for network function virtualization.
Section 10 identifies VNF challenges in NFV, with suggested solutions and requirements.
Section 11 proposes a solution architecture for OCS as a VNF in the NFVI.
Section 12 points out open areas of research related to NFV.
Section 13 concludes the paper.
4 Current situation
Besides what is mentioned in the background section, converged charging has gained momentum during the last years as the IMS technology becomes more widespread, requiring operators to integrate or replace their various types of charging systems, e.g. for fixed, mobile and broadband, with a new common one. The new common charging & rating system is often an OCS or similar. Furthermore, as the company's plans for a new system are slowly taking shape, the legacy prepaid OCS is facing a major upgrade of its platform software version and hardware during next year.
The company can today provide a private cloud with Platform-as-a-Service (PaaS) and will soon have a Telco-cloud with Infrastructure-as-a-Service (IaaS) ready. These give us different options for the technical implementation. What type of cloud is most suitable based on the Online Charging System requirements, if any?
A virtualized context, where a multitude of virtual machines (VMs) share the same physical hardware to provide a plethora of services with highly varying performance requirements to independent customers/end-users, brings many challenges, some of which can be tackled as summarized in this paper.
One of the most important of these performance metrics is the Retransmission Timeout (RTO). The table below shows the figures for the OCS Rc/Re reference point (Figure 3), which is one of many end-to-end integration points for a charging control transaction, e.g. setting up a VoLTE call.

Maximum RTO: 200 ms
Minimum RTO: 80 ms

The charging and billing information is to be generated, processed, and transported to a desired conclusion in less than 1 second. The customer will expect the call to be set up in reasonable time, normally 3-4 seconds. Figure 1 below illustrates the expected performance requirements.
(Figure: statistics from the DCCA server showing max response time for CCR-I and number of PADS, initial statistics from the current OCS solution. See Appendix D: OCS RTO calculation & Latency for more details.)
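To make the RTO budget above concrete, the following sketch derives a retransmission schedule that respects the table's bounds and the 1-second end-to-end target. This is an illustration only: the backoff strategy and the assumption that retransmissions use exponential backoff are mine, not taken from the OCS implementation.

```python
# Illustrative sketch: exponential backoff clamped to the RTO bounds from
# the table above, fitted inside the 1 s end-to-end charging budget.
MIN_RTO_MS = 80       # minimum RTO from the table
MAX_RTO_MS = 200      # maximum RTO from the table
E2E_BUDGET_MS = 1000  # charging must conclude in less than 1 second

def next_rto(attempt: int) -> int:
    """Exponential backoff starting at MIN_RTO_MS, clamped to MAX_RTO_MS."""
    return min(MIN_RTO_MS * (2 ** attempt), MAX_RTO_MS)

def retransmit_schedule() -> list[int]:
    """Cumulative send times (ms) that still fit the end-to-end budget."""
    times, elapsed, attempt = [0], 0, 0
    while True:
        rto = next_rto(attempt)
        if elapsed + rto >= E2E_BUDGET_MS:
            break
        elapsed += rto
        times.append(elapsed)
        attempt += 1
    return times

print(retransmit_schedule())  # [0, 80, 240, 440, 640, 840]
```

With these assumed values, at most five retransmissions fit before the 1-second deadline, which shows how little slack the budget leaves for virtualization-induced latency.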
(Figure: OCS context. Billing systems, CRM systems and ERP systems sit above the Online Charging layer, which sits on top of the Network layer.)
The 3rd Generation Partnership Project (3GPP) charging architecture divides OCS into three main functions: the online-charging function (OCF), the account balance management function (ABMF) and the rating function (RT). The different functions are described below:
- The Online Charging Function (OCF) consists of two distinct charging modules, namely the Session Based Charging Function (SBCF) and the Event Based Charging Function (EBCF). Charging events are forwarded from the CTF, outside of OCS, to the OCF in order to obtain authorization for the chargeable event and/or network resource usage3 requested by the end-user. The OCF communicates with the Rating Function in order to determine the value of the requested bearer resources or session, and then with the Account Balance Management Function to query and update the subscriber's account and counter status [3].
- The Account Balance Management Function (ABMF) is the location of the subscriber's account balance within the OCS, e.g. recharged/top-up money, or account counters such as free gigabytes, calls or messages.
- The Rating Function (RF) determines the value of the network resource usage on behalf of the Online Charging Function. To this end, the Online Charging Function furnishes the necessary information, obtained from the charging event, to the RF and receives in return the rating output (monetary or non-monetary units) via the Re reference point. The RF may handle a wide variety of ratable instances [3].
The Charging Trigger Function (CTF)4 is outside of OCS and is able to delay the actual resource usage until permission has been granted by the OCS. It also tracks the availability of resource usage permission ("quota supervision") during the network resource usage, and can enforce termination of the end user's network resource usage when permission by the OCS is not granted or expires. From the online charging architecture perspective, the IMS gateway function (GWF), the Gateway GPRS Support Node (GGSN) and the Telephony Application Server (TAS, called MTAS in this paper) are examples of online-charging-capable CTFs.
The Ro reference point supports interaction between a Charging Trigger Function and an Online Charging Function. The information flow across this reference point consists of online charging events. The protocol crossing this reference point, the Diameter Credit-Control Application (DCCA) [4], supports real-time transactions in stateless mode (event-based charging) and stateful mode (session-based charging) [3].
3 Typical examples of network resource usage are a voice call of a certain duration, the transport of a certain volume of data, or the submission of an MMS of a certain size.
4 Parts of the network provide functions that implement offline and/or online charging mechanisms on the bearer (e.g. EPC), subsystem (e.g. IMS) and service (e.g. MMS) levels.
Figure 3 below provides an overview of the online parts in a common charging architecture.
(Figure 3: OCS (Online Charging System) overview. CTFs in the CN domain, service elements and subsystems connect via Ro/CAP to the OCF, which uses the ABMF via the Rc reference point and the RT via the Re reference point.)
Figure 4 shows the transactions that are required on the Ro reference point in order to perform the Session Charging with Unit Reservation (SCUR), e.g. for IMS Voice Call or Data session.
(Figure 4: Ro transaction sequence for SCUR between the CTF and the OCS, with steps including 1. Session Request, 5. Session Delivery and 9. Session Delivery.)
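The SCUR loop can be sketched as a simple credit-control simulation: an initial request reserves units, an update reports usage and reserves a fresh quota, and a terminate request settles the account. The class and unit values below are hypothetical illustrations, not the 3GPP or OCS implementation.

```python
# Hedged sketch of Session Charging with Unit Reservation (SCUR).
# Names and unit values are invented for illustration.
class OcsAccount:
    def __init__(self, balance_units: int):
        self.balance = balance_units   # remaining units (ABMF side)
        self.reserved = 0              # units reserved for the session

    def ccr_initial(self, requested: int) -> int:
        """CCR-I: reserve units before the session starts."""
        granted = min(requested, self.balance)
        self.balance -= granted
        self.reserved = granted
        return granted

    def ccr_update(self, used: int, requested: int) -> int:
        """CCR-U: report used units, then reserve a fresh quota."""
        self.reserved -= used          # consumed part of the reservation
        self.balance += self.reserved  # return the unused remainder
        return self.ccr_initial(requested)

    def ccr_terminate(self, used: int) -> int:
        """CCR-T: report final usage, release the rest of the reservation."""
        self.reserved -= used
        self.balance += self.reserved
        self.reserved = 0
        return self.balance

acct = OcsAccount(balance_units=100)
print(acct.ccr_initial(30))     # 30 granted, balance now 70
print(acct.ccr_update(30, 30))  # all 30 used, another 30 granted
print(acct.ccr_terminate(10))   # 10 used, 20 returned; balance 60
```

The point of the reservation step is that the CTF can let resource usage proceed without a round trip to the OCS for every unit consumed, which is what makes the RTO budget discussed earlier workable.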
(Figure 5: the EPG and MTAS act as DCCA/charging clients connecting over Diameter Ro to the Front-End's DCCA server and OCF client; the Back-End contains the OCF server (Charging Server), the ABMF and the RT (Rating Engine), and uses the Rc and Re reference points towards the BE-SDP, which holds the Customer and Rating data.)
FIGURE 5 : CURRENT OCS DEPLOYMENT VIEW WITH 3GPP OCS ARCHITECTURE MAPPING
Front-End (FE): has two components, the DCCA server and the OCF client. The DCCA server communicates externally with the CTF's DCCA client via the Ro (or Gy) integration point and internally with the OCF client, which externally communicates with the OCF server.
Back-End (BE): has three components: the OCF server, the ABMF and the RT. The BE can have one or many FEs connected. The OCF server communicates externally with the OCF client(s), and internally with the ABMF and RT. It also contains a Voucher Manager (VOM) component, but that is outside the scope of this paper.
BE-Service Data Point (SDP): holds the data for Customer and Rating information in a relational database.
Note: the FE is stateless during normal operation, and stateful in case the front-end cannot do online charging against the back-end. The front-end then stores transactions and post-charges them once the connection to the back-end is established again.
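The store-and-forward fallback described above can be sketched as follows. The queue, the callable interface and the retry handling are assumptions for illustration, not the actual FE implementation.

```python
# Hedged sketch of the front-end fallback: charge online while the
# back-end is reachable, otherwise buffer transactions and post-charge
# them when the connection is re-established. Names are illustrative.
from collections import deque

class FrontEnd:
    def __init__(self, backend_charge):
        self.backend_charge = backend_charge  # callable: txn -> bool
        self.pending = deque()                # buffered transactions

    def charge(self, txn) -> str:
        if self.backend_charge(txn):
            return "charged-online"
        self.pending.append(txn)              # back-end down: go stateful
        return "buffered"

    def on_backend_restored(self) -> int:
        """Post-charge everything buffered while the back-end was down."""
        count = 0
        while self.pending:
            txn = self.pending.popleft()
            if self.backend_charge(txn):
                count += 1
            else:                             # still failing: keep and stop
                self.pending.appendleft(txn)
                break
        return count
```

A consequence of this design, relevant for the virtualized deployment later in the paper, is that a brief back-end outage degrades charging accuracy (post-charging) rather than blocking traffic.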
(Figure: Type 1 (bare-metal) and Type 2 (hosted) hypervisor stacks. In both, VMs containing App, Bin/libs and a Guest OS run on the hypervisor; in the hosted case the hypervisor itself runs on a Host OS above the hardware.)
- Bare-metal, or type 1, hypervisors run directly on the physical hardware platform. They virtualize the critical hardware devices, offering several independent isolated partitions, and also provide basic services for inter-partition control and communication.
- Type 2, or hosted, hypervisors run on top of an operating system that acts as a host. These are hosted hypervisors since they run within a conventional operating system environment. The hypervisor layer is typically a differentiated software level on top of the host operating system (which runs directly above the hardware), and the guest operating system runs at a different level.
Note: the distinction between these two types is not necessarily clear. For example, Linux's Kernel-based Virtual Machine (KVM), which effectively converts the host operating system to a type 1 hypervisor, can also be categorized as a type 2 hypervisor.
Conclusion: type 1 hypervisors are suitable for real-time systems since their VMs are close to the hardware and able to use hardware resources directly rather than going through an operating system. This is more efficient than a hosted architecture and delivers greater scalability, robustness and performance. Therefore, type 2 hypervisors are not in the scope of this paper.
Bare-metal hypervisors are divided into two subcategories: monolithic and micro-kernelized designs. The difference between them is the way they deal with device drivers. For example, Hyper-V (Microsoft) and Xen (open source) are micro-kernelized hypervisors which leverage para-virtualization together with full virtualization, while VMware ESXi is a monolithic hypervisor which leverages hardware emulation [7].
- The biggest advantage of the monolithic design is that it does not require a host operating system. The hypervisor acts as the operating platform with drivers, so it is easily possible to run multiple operating systems, even heterogeneous ones, on the same hardware. The hypervisor also takes care of the necessary network emulation, in order to let the VMs communicate with the outside world and with each other.
- The drawbacks are limited hardware support and instability: the device drivers are directly incorporated into the layers of functionality, which means that if one driver is hit by an update, bug, or security vulnerability, the entire virtual architecture within that physical machine can be compromised [8].
(Figure 7: Monolithic vs. micro-kernelized hypervisor. In the monolithic design, drivers live inside the hypervisor beneath the virtual machines; in the micro-kernelized design, drivers live in a parent partition alongside the management console, with the child virtual machines beside it, all on top of the hypervisor and hardware.)
In contrast, the micro-kernelized design (Figure 7), e.g. Hyper-V Server and Xen, does not require the device drivers to be part of the hypervisor layer. Instead, drivers for the physical hardware are installed in the operating system running in the parent partition (VM) of the hypervisor [9]. This means that there is no need to install device drivers supporting the physical hardware for each guest operating system running as a child partition; when these guest operating systems need to access physical hardware resources on the host computer, they simply do so by communicating through the parent partition [10]. This communication can be via a very fast memory-based bus in case the child partition is para-virtualized, or via the emulated devices provided by the parent partition in the case of full virtualization.
Micro-kernelized designs are out of scope for this paper. However, it is worth mentioning that studies have shown that Xen's predictability is very close to that of the bare machine (non-virtualized hardware), which makes it a candidate soft real-time hypervisor [11].
We have to take a closer look at VMware ESXi, as it comes from a mainstream vendor and is the hypervisor chosen by the company. ESXi is a type 1 monolithic hypervisor. It is an operating-system-independent hypervisor based on the VMkernel OS, interfacing with agents and approved third-party modules that run atop it. VMkernel is a POSIX-like (Unix-style) operating system developed by VMware that provides certain functionality similar to that found in other operating systems, such as process creation and control, signals, file systems, and process threads. It is designed specifically to support running multiple virtual machines and provides such core functionality as resource scheduling, I/O stacks, and device drivers [12].
A full-virtualization environment like ESXi will cause continuous traps to the hypervisor, which intercepts the special privileged instructions that a guest operating system kernel believes it executes but whose effect is actually solely emulated by the hypervisor. Examples of such privileged instructions are accesses to peripheral registers. The implied performance issues may be mitigated by resorting to hardware-assisted virtualization [13].
The biggest advantages of this design are improved I/O throughput, reduced CPU utilization, lower latency and improved scalability.
The design's drawbacks are portability (live migration is not possible) and reduced features for VM configuration and patching. But probably the major problem is the lack of full support for SR-IOV in management and orchestration solutions for NFV.
7.2.3 Para-virtualization
It was the open-source Xen hypervisor that introduced the concept of para-virtualization, also used by Hyper-V (MS product name Enlightened [15]). It is a technique by which the guest operating system is modified so as to be aware that it is running within a VM. This avoids the wasteful emulation of virtualized hardware: the modified kernel and drivers of the guest operating system are capable of performing direct calls to the hypervisor (also known as hypercalls) whenever needed6. The evolution of hardware-assisted virtualization, coupled with para-virtualization techniques, nowadays allows virtualized applications to achieve an average performance only slightly below the native one.
The guest OS must be tailored specifically to run on top of the virtual machine monitor (VMM), which eliminates the need for the virtual machine to trap privileged instructions. Trapping, a means of handling unexpected or unallowable conditions, can be time-consuming and can adversely impact performance in systems that employ full virtualization.
6 The performance advantages of para-virtualization over full virtualization have to be weighed against the weakened isolation from a security perspective, but these aspects are outside the scope of this paper.
(Figure: hypervisor-based virtualization vs. operating-system-level (container) virtualization. On the left, VMs each contain Apps, Bin/libs and a Guest OS on top of a hypervisor and hardware; on the right, containers share Bins/libs and a single Host OS on the hardware.)
The separation can improve security and hardware independence, and adds resource management features [16].
However, this operating-system-level virtualization is not as flexible as other virtualization approaches. For example, with Linux, different distributions are fine, but other operating systems such as Windows cannot be hosted [17].
Note: The Internet Engineering Task Force (IETF) has an ongoing analysis of the challenges of using VMs for NFV workloads and how containers can potentially address these challenges [18].
Using Ixia's IxLoad to test HTTP connections/sec, increasing from 2 vCPUs to 6 vCPUs achieves significant performance improvements. When moving to 8 vCPUs or beyond, the performance flattens or can even decrease.
Real-time concerns include:
- Multi-tenancy: avoidance of interference between multiple workloads on the same platform.
- Dynamic provisioning.
- QoS guarantees.
- Temporal guarantees.
In this line, one may think of the increasing importance of network functions virtualization, which we will take a closer look at in the next section. Still, there is an increasing need for controlling the temporal behavior of virtualized software, making its behavior more predictable.
- NaaS
- IaaS
- NFVIaaS
As mentioned above, there are different service offerings in cloud computing. Where the cloud stack is split between tenant and provider defines the type of service offering (X-as-a-Service). For traditional IT systems the whole stack is managed by the owner, whereas in e.g. PaaS the tenant is responsible for the top layers (data and application) and the PaaS provider manages everything below. Compared to that, in the Telco-cloud (IaaS) the tenant (for the VNF) is responsible for the OS8 layer and above, and the IaaS provider manages everything below. Figure 13 below illustrates how the different offerings of cloud computing services are divided and managed.
8 This can vary; in some cases the OS stack is split into two, as in the figure below.
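The tenant/provider split can be captured as a simple mapping. The layer names and split points below are the common textbook ones, assumed for illustration rather than copied from Figure 13.

```python
# Hedged sketch: who manages which layer per cloud offering.
# Layer list and split indices are illustrative assumptions.
STACK = ["facility", "hardware", "virtualization", "os",
         "middleware", "runtime", "application", "data"]

# Index of the first layer the tenant manages; the provider manages the rest.
TENANT_FROM = {"traditional": 0, "iaas": 3, "paas": 6, "saas": 8}

def responsibilities(offering: str):
    split = TENANT_FROM[offering]
    return {"provider": STACK[:split], "tenant": STACK[split:]}

r = responsibilities("iaas")
print(r["tenant"])  # ['os', 'middleware', 'runtime', 'application', 'data']
```

For the Telco-cloud case this makes the point of the text explicit: under IaaS the VNF tenant still owns everything from the OS upwards, including its life cycle management.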
9.2 VNF
A Virtualized Network Function (VNF) is a functional element of the NFV architecture framework as represented in Figure 14 above. A VNF could, for example, be a telco application like OCS or an IMS component.
Note: When designing and developing the software that provides the VNF, VNF providers may structure that software into software components (the implementation view of the software architecture) and package those components into one or more images (the deployment view of the software architecture). In the following, these VNF-provider-defined software components are called VNF Components (VNFCs).
The VNF design patterns and properties are described in [22], but are briefly described next. The goal is to capture all practically relevant points in the design space.
- Statefulness creates another level of complexity, e.g. session (transaction) consistency has to be preserved and taken into account in procedures such as load balancing.
- The data repository holding the externalized state may itself be a stateful VNFC in the same VNF.
- The data repository holding the externalized state may be an external VNF.
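The state externalization described in the bullets above can be sketched as follows. The repository interface is a hypothetical stand-in for whichever stateful VNFC or external VNF holds the state.

```python
# Hedged sketch: a stateless VNFC externalizes session state to a
# repository, so any instance behind the load balancer can serve any
# session. Class and method names are invented for illustration.
class StateRepository:
    """Stand-in for a stateful VNFC or external VNF holding session state."""
    def __init__(self):
        self._sessions = {}
    def load(self, session_id):
        return self._sessions.get(session_id, {"events": 0})
    def store(self, session_id, state):
        self._sessions[session_id] = state

class StatelessWorker:
    """Any worker can handle any event, since state lives in the repo."""
    def __init__(self, repo: StateRepository):
        self.repo = repo
    def handle_event(self, session_id):
        state = self.repo.load(session_id)
        state["events"] += 1
        self.repo.store(session_id, state)
        return state["events"]

repo = StateRepository()
a, b = StatelessWorker(repo), StatelessWorker(repo)
a.handle_event("s1")
print(b.handle_event("s1"))  # 2: worker b sees the state worker a stored
```

This is exactly the property that makes load balancing and scaling simpler: the load balancer is free to route any request to any worker without session affinity.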
Auto scaling:
- The VNFM triggers the scaling of the VNF according to the rules defined for the VNF in the VNFM. For the VNFM to trigger scaling, the VNF instance's state has to be monitored by tracking its events on an infrastructure level and/or a VNF level. Infrastructure-level events are generated by the VIM; VNF-level events may be generated by the VNF instance or its EM. Auto scaling supports both horizontal and vertical scaling.
On-demand scaling:
- The VNFM triggers the scaling of the VNF through an explicit request from the VNF or its EM, based on monitoring the states of the VNF's VNFCs. On-demand scaling supports both horizontal and vertical scaling.
Scaling based on management request:
- Manually triggered scaling: the OSS/BSS triggers scaling based on rules in the NFVO, or a human operator triggers scaling through the VNFM.
Horizontal scaling (out/in) allows adding/removing VNFC instance(s) that belong to the VNF. Vertical scaling (up/down) allows dynamically adding/removing resources to/from existing VNFCs that belong to the VNF.
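A minimal sketch of the rule-driven trigger described above follows. The thresholds and metric name are invented for illustration; in practice the rules live in the VNF descriptor and are evaluated by the VNFM.

```python
# Hedged sketch of a VNFM-style horizontal scaling decision.
# Thresholds and metric names are assumptions, not from a real descriptor.
RULES = {"metric": "cpu_load", "scale_out_above": 0.8,
         "scale_in_below": 0.3, "min_instances": 2, "max_instances": 8}

def scaling_decision(cpu_load: float, instances: int) -> str:
    """Return 'scale-out', 'scale-in' or 'hold'."""
    if cpu_load > RULES["scale_out_above"] and instances < RULES["max_instances"]:
        return "scale-out"   # add a VNFC instance
    if cpu_load < RULES["scale_in_below"] and instances > RULES["min_instances"]:
        return "scale-in"    # remove a VNFC instance
    return "hold"

print(scaling_decision(0.9, 4))  # scale-out
print(scaling_decision(0.2, 2))  # hold (already at min_instances)
```

Note the floor of two instances: for a charging system, scaling in below redundancy would trade cost for a single point of failure.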
10 NFV challenges
This section identifies some of the VNF challenges in NFV, with suggested solutions and requirements.
Although NFV is a promising solution for CSPs, it faces certain challenges that could degrade its performance and hinder its implementation in the telecommunications industry. In this section, some of the NFV requirements, challenges and proposed solutions are discussed. Table 3 summarizes this section.
Security
Virtualization security risks according to functional domains:
1) Virtualization environment domain (hypervisor): unauthorized access or data leakage.
2) Computing domain: shared computing resources (CPU, memory, disk etc.).
3) Infrastructure networking domain: shared logical networking layer (vSwitches) and shared physical NICs. Requires network isolation for tenants and applications.

Computing performance

VNF interconnection

Portability
VNFs should be decoupled from any underlying hardware and software. VNFs should be deployable on different virtual environments to take advantage of virtualization techniques like live migration.

Manageability
VNFs should be easy to manage and migrate alongside existing legacy systems without losing the specification of a carrier-grade service, and should be able to interact with legacy management systems with minimal effects on existing networks. The NFVO must monitor network function performance almost in real time.

Scalability
To achieve automated dynamic scaling of resources, the Auto- or On-demand scaling design patterns should be implemented. For a more manual approach, the Scaling based on management request pattern is good enough.
11 Solution architecture
This section proposes a solution architecture for OCS as a VNF in the Telco-cloud (NFVI) and identifies the architectural impacts.
The OCS system has the following high-level requirements:
Front-End
- Stateless
- Active/active
- Communicates with BE+ELM via the internal network for charging and monitoring.
Back-End
- Stateful
- Active/standby
- Communicates with FE+ELM via the internal network for charging and monitoring.
- Needs an OAM network connection for administration and integration with the Business Support System (BSS).
DB replication
- Active/standby
- Needs an OAM network connection for O&M and integration with the Operation Support System (OSS).
We will start by describing a generic VNF cluster architecture for OCS and later deploy the components therein.
FIGURE 20 : GENERIC VNF CLUSTER ARCHITECTURE WITH CONNECTION TO MULTIPLE SIG SUBNETS
There are two distinct types of VNF included in the cluster architecture: System Controller (SC) VNFs and Payload (PL) VNFs. The SC VNFs are non-traffic nodes and handle system management/control of the PL VNFs. The PL VNFs are traffic nodes and are connected to the signaling network.
The figure above shows the LB stretched over both types of VNF: SC and PL. In order to keep a homogeneous networking design for all VNFs in the NFVI, two SCs communicating externally with both the VRF-enabled OM network (OM-SC) and the emergency OM network (OM-CN) are used. OM-SC is used for communicating with the BSS domain. OM-CN is also used for some auxiliary functions, such as NTP sync.
Independently of the number of PL VNFs included in the cluster as a result of the dimensioning output, there are two dedicated PL VNFs which communicate externally through the Signaling VRF.
Independently of the VNF type, there is an internal (backplane) network for intra-cluster communication. Different types of VNF may require a different number of internal networks for their operation; the number of internal networks may vary between one and three. However, one internal network is always present in any VNF. The backplane is also used for backup/restore via the vSAN.
It is important to notice that the current VNF cluster architecture is not limited to such a design; however, it is chosen for the following reasons:
- Traffic dimensioning results show that two traffic VNFs for OM and Signaling are enough to cope with the traffic load.
- It is possible to increase the number of traffic VNFs for each type of traffic during runtime (horizontally), although this will most likely require a cluster reboot for the changes to take effect.
Another example of network connectivity is when all PL nodes take part in sending and receiving traffic along with load balancing of processing capacity. This example, as generic as the previous one, is shown in Figure 21.
LB
LB
VNF (PL)
VNF (PL)
VNF (PL)
VRF
VRF
VRF
VNF (PL)
VRF
Net: SIG A
VNF (SC)
VRF
VNF (SC)
VRF
Net: OM-SC
Net: OM-CN
FIGURE 21 : GENERIC VNF CLUSTER ARCHITECTURE WITH ALL PL VNF CONNECTED TO THE SAME SUBNET AND
TAKING PART IN LOAD SHARING OF BOTH TRAFFIC AND PROCESSING CAPACITY
The BE and the ELM in OCS are not traffic nodes and will therefore not follow the generic
design with separated SC and PL. They will be plain VNFs with an external load balancer and VRF.
Note: ELM in this context is not the same as the element manager (EM) in the NFV architecture.
FIGURE 23 : DEPLOYMENT VIEW OF OCS BACK-END AND ELM AS VNFS ON ONE SITE
The reason for having two BE-VNFs on each site is to provide redundancy (no SPOF). Most
likely just one BE-VNF per site would be enough, but it is unclear to me whether the virtual
routers could be configured redundantly for a single VNF at this point. Further research is needed.
In principle, the splitting of a cluster is done by keeping the primary database on one site
and moving the standby database to another site. Furthermore, the traffic-handling nodes (FE)
are divided between the sites. The external VNF load balancer for ELM and BE shall direct
flows to the active VNF that has the appropriate configured/learned state. Replication is done
between the active and standby databases. Figure 24 shows the geographically split cluster
following a high availability (HA)/disaster recovery (DR) model. Note: the example has one
BE-VNF per site.
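The HA/DR split can be sketched as follows: flows are routed to the site hosting the active (primary) database, and a DR failover promotes the standby. The class and method names are illustrative assumptions, not a real load-balancer API:

```python
# Sketch of the geographically split cluster: the external load balancer
# directs flows to whichever site currently hosts the active database.
class GeoSplitCluster:
    def __init__(self):
        self.primary_site = "A"          # primary DB on site A, standby on site B

    def standby_site(self) -> str:
        return "B" if self.primary_site == "A" else "A"

    def route(self, flow_id: str) -> str:
        # All stateful flows go to the active VNF on the primary site.
        return self.primary_site

    def failover(self):
        # DR: promote the standby site; replication direction reverses afterwards.
        self.primary_site = self.standby_site()

c = GeoSplitCluster()
assert c.route("session-1") == "A"
c.failover()                             # site A lost: site B becomes active
assert c.route("session-1") == "B" and c.standby_site() == "A"
```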
The BE-VNFs and FE-VNFs are deployed in a single-tenant virtual data center (vDC) on each site
to isolate the served VM space, with access provided only through authentication controls, as
shown below in Figure 25. The VNFs in the vDC should be able to temporarily scale their
resource assignments vertically if this does not cause any temporal or spatial conflict.
Carefully plan how Allocation Models are placed in Provider vDCs [24]. The average latency
between site A and site B must not exceed 10 ms during normal operation.
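The inter-site latency requirement above can be expressed as a simple monitoring check. The sample values are invented for illustration; only the 10 ms bound comes from the text:

```python
# Sketch of a check on the inter-site latency requirement:
# the AVERAGE latency between site A and B must not exceed 10 ms.
from statistics import mean

MAX_AVG_LATENCY_MS = 10.0

def inter_site_latency_ok(samples_ms) -> bool:
    """True if the average of the measured RTT samples meets the requirement."""
    return mean(samples_ms) <= MAX_AVG_LATENCY_MS

assert inter_site_latency_ok([8.5, 9.0, 10.5, 9.5])    # average 9.375 ms: OK
assert not inter_site_latency_ok([12.0, 11.5, 13.0])   # average above 10 ms: not OK
```

Note that the requirement bounds the average, so individual samples above 10 ms are tolerated as long as the mean stays within the limit.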
FIGURE 25 : TENANT'S VIEW OF VIRTUAL DATA CENTERS EXECUTION WITH OCS VNFS
Note 10: the figure is simplified, e.g. it is missing the load balancer for the OAM network.
The business support system (BSS) communication via the OAM network should be encrypted.
The VNFs should take advantage of accelerated vSwitches and use NICs that support single-root
I/O virtualization (SR-IOV). A VNFC VM should not have more than 6 vCPUs; if more capacity is
needed, it should be scaled horizontally.
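The 6-vCPU limit above implies a simple layout rule: extra demand is met by adding VMs rather than growing one VM. A sketch of that rule, with the helper name being an illustrative assumption:

```python
# Sizing sketch for the rule above: a VNFC VM gets at most 6 vCPUs,
# so any larger demand is spread over several VMs (horizontal scaling).
import math

MAX_VCPU_PER_VM = 6

def vnfc_layout(total_vcpus_needed: int):
    """Return (number of VMs, vCPUs per VM) respecting the 6-vCPU cap."""
    vms = math.ceil(total_vcpus_needed / MAX_VCPU_PER_VM)
    per_vm = math.ceil(total_vcpus_needed / vms)   # spread the load evenly
    return vms, per_vm

assert vnfc_layout(6) == (1, 6)    # fits in a single VM
assert vnfc_layout(14) == (3, 5)   # scaled horizontally: 3 VMs of <= 6 vCPUs
```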
12 Future directions
This section points out open areas of research related to NFV. They are additional to the
ones identified earlier in this paper, and were identified during the research but not
highlighted.
Network virtualization for cloud computing is focused on the different networking layers
(layer 3, 2, or 1). For example, Layer 3 Virtual Private Networks (L3VPN) are distinguished by
their use of layer 3 protocols (e.g., IP or MPLS) in the VPN backbone to carry data between
the distributed end points, with possibly heterogeneous tunneling techniques. QoS with respect
to network performance has to be guaranteed even when multiple users share a specific
infrastructure simultaneously. Some studies (such as [25]) have shown that performance can
improve significantly if virtual machines are interconnected via a high-speed cluster
interconnect. There are some implementations that back up this idea based on the usage of
InfiniBand [26], providing improved network latencies compared to IaaS solutions based on
Ethernet. However, merging networking research with real-time design and scheduling
techniques is an open area of research.
The VNFM (Appendix C: MANO architecture) is attracting a lot of interest from all directions:
the ETSI standards are not yet set in stone, and neither are companies' MANO strategies and
products [27]. As the NFV initiative has developed over the past several years, the focus of
attention in NFV management has shifted perceptibly. Initially, the VIM (e.g., OpenStack,
VMware, etc.) at the bottom of the stack was the issue; then the special requirements and
complexities of telco network orchestration kicked in, and attention shifted up to the NFVO at
the top. Now it is the turn of the VNFM, which, notwithstanding the considerable
responsibilities of the NFVO, is the bit in the middle that often turns out to be the most
contentious.
Put very simply, the VNFM is responsible for the lifecycle management of the VNF under the
control of the NFVO, which it achieves by instructing the VIM. However, the big question is:
who is best placed to supply the VNFM? Vendors are taking different approaches to VNFM
development; these will need to be harmonized if carriers are to realize multi-vendor NFV
MANO. Is the mark of a good VNF supplier one that also provides its own VNFM? Is the mark of a
good MANO supplier one that can accommodate a VNF without a VNFM? Is the mark of a good NFVI
platform vendor one that takes away the need for a VNF supplier to even develop a VNFM? There
are likely many more angles to explore around the VNFM, but from a CSP's perspective, research
on how to reduce the risk of multi-vendor NFV implementations is an open area.
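The division of responsibility described above (NFVO instructs the VNFM, which realizes VNF lifecycle operations by instructing the VIM) can be sketched as a toy model. The class and method names are illustrative assumptions and do not follow any specific ETSI interface:

```python
# Toy model of the MANO roles described above: the VNFM manages a VNF's
# lifecycle by instructing the VIM to create and delete the backing VMs.
class Vim:
    """Stand-in for the Virtualized Infrastructure Manager."""
    def __init__(self):
        self.vms = {}
    def create_vm(self, name):
        self.vms[name] = "running"
    def delete_vm(self, name):
        self.vms.pop(name, None)

class Vnfm:
    """Stand-in for the VNF Manager, acting under NFVO control."""
    def __init__(self, vim: Vim):
        self.vim = vim
        self.vnfs = {}                       # vnf_id -> list of VM names
    def instantiate(self, vnf_id, num_vms=1):
        names = [f"{vnf_id}-vm{i}" for i in range(num_vms)]
        for n in names:
            self.vim.create_vm(n)
        self.vnfs[vnf_id] = names
    def scale_out(self, vnf_id):
        n = f"{vnf_id}-vm{len(self.vnfs[vnf_id])}"
        self.vim.create_vm(n)
        self.vnfs[vnf_id].append(n)
    def terminate(self, vnf_id):
        for n in self.vnfs.pop(vnf_id):
            self.vim.delete_vm(n)

vim = Vim()
vnfm = Vnfm(vim)
vnfm.instantiate("ocs-fe", num_vms=2)        # lifecycle: instantiate
vnfm.scale_out("ocs-fe")                     # lifecycle: scale
assert len(vim.vms) == 3
vnfm.terminate("ocs-fe")                     # lifecycle: terminate
assert vim.vms == {}
```

The open question in the text is precisely who supplies this middle layer; the sketch only shows why it sits between the NFVO's decisions and the VIM's resources.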
13 Conclusions
Although it has been a reality for some years, cloud computing is a fairly new paradigm for
dynamically provisioning computing services (X-as-a-service), located in data centers and
using virtualization technology to allow server consolidation and efficient resource usage in
general. In the last decades, important advances, mainly at the machine virtualization,
network, and storage levels, have contributed to the widespread usage and adoption of this
paradigm in different domains.
Real-time application domains are still behind in the full adoption of cloud computing, due to
their strong timing requirements and needed predictability guarantees. Merging cloud computing
with real-time is a complex problem that requires real-time virtualization technology to
deliver the required predictability.
This paper has analyzed some of the problems and challenges in achieving real-time cloud
computing, as a first step towards presenting an abstract map of the situation today and
identifying the elements needed on different levels to make it happen. The concerns presented
range from the hypervisor structure and role, through the different possible types of
virtualization and their respective performance, to general resource management concerns and
the important role of the network in the overall picture of virtualization technology. For the
latter, this paper has described the OCS architecture, its different components and how they
interact, and the real-time challenges that appear therein. A terminology mapping between the
cloud and real-time systems domains has been settled in order to connect both areas.
Furthermore, an overview has been presented of the technology and architecture for NFV, which
aims to revolutionize the telecommunications industry by decoupling network functions from the
underlying proprietary hardware. Although NFV is a promising solution for CSPs, it faces
certain challenges that could degrade its performance and hinder its implementation in the
telecommunications industry. Some of the NFV challenges and proposed solutions have been
discussed. Lastly, a proposed solution architecture for OCS as VNFs in the Telco-cloud (NFVI)
has been presented, introducing a generic VNF design for the traffic nodes and pointing out
concerns related to scalability with the current back-end architecture in a virtualized
environment.
Besides all the advantages brought by NFV to the telecommunications industry, it faces technical challenges that might hinder its progress. Therefore, IT organizations, network enterprises, telecommunication equipment vendors, and academic researchers should be aware of
these challenges and explore new approaches to overcome them.
14 Discussions
It turned out that both NFVI and cloud computing are two very large areas to study. In
combination with how the OCS architecture is affected by them, this made the workload more or
less overwhelming. It has been difficult to keep a constant level of detail and scope, because
under each stone I turned, I found something new and exciting. In retrospect, it would have
been enough with one area, explored in more depth, or with a more formal analysis and scoping
method. I feel that the OCS as VNF did not get enough attention to cover all the intended
aspects.
Better research before beginning would probably have changed the scope of this paper; e.g.,
during the writing I identified that some of the architectural decisions had already been made
for the Telco-cloud (NFVI). I still hope this paper can give a new real-time perspective on the
subject. In any case, it has been an extremely instructive and interesting area to explore.
15 References
[1]
[2]
[3] 3GPP TS 29.078, "Customised Applications for Mobile network Enhanced Logic (CAMEL) Phase X; CAMEL Application Part (CAP) specification," 08 01 2016. [Online]. Available: http://www.3gpp.org/DynaReport/29078.htm.
[4]
[5]
[6]
[7] A. Syrewicze, "VMware vs. Hyper-V: Architectural Differences," 2013. [Online]. Available: http://syrewiczeit.com/vmware-vs-hyper-v-architectural-differences/.
[8] C. Bradford, "Virtualization Wars: VMware vs. Hyper-V: Which is Right For Your Virtual Environment?," 2014. [Online]. Available: http://www.storagecraft.com/blog/virtualization-wars-vmware-vs-hyper-v-which-is-right-for-your-virtual-environment/.
[9] N. Sharma, "Hyper-V and VMware vSphere Architectures: Pros and Cons," [Online]. Available: http://www.serverwatch.com/server-tutorials/microsoft-hyper-v-and-vmware-vsphere-architectures-advantages-and-disadvantages.html.
[10] Z. H. Shah, "Windows Server 2012 Hyper-V," in Deploying Hyper-V Enterprise Server Virtualization Platform, 2013.
[11] H. Fayyad-Kazan, L. Perneel and M. Timmerman, "Benchmarking the Performance of Microsoft Hyper-V server, VMware ESXi and Xen Hypervisors," 2013. [Online]. Available: http://www.cisjournal.org/journalofcomputing/archive/vol4no12/vol4no12_5.pdf.
[12] VMware, "ESXi architecture," [Online]. Available: http://www.vmware.com/files/pdf/ESXi_architecture.pdf.
[13] Intel Corp., "Hardware-Assisted Virtualization Technology," [Online]. Available: http://www.intel.com/content/www/us/en/virtualization/virtualization-technology/intel-virtualization-technology.html.
[14] Intel Corp., "An Introduction to SR-IOV Technology," [Online]. Available: http://www.intel.com/content/dam/doc/application-note/pci-sig-sr-iov-primer-sr-iov-technology-paper.pdf.
[15] Microsoft, "Hyper-V Architecture," [Online]. Available: https://msdn.microsoft.com/en-us/library/cc768520%28v=bts.10%29.aspx.
[16] M. G. Xavier et al., "Performance Evaluation of Container-based Virtualization for High Performance Computing Environments," [Online]. Available: http://marceloneves.org/papers/pdp2013-containers.pdf.
[17] Wikipedia, "Operating-system-level virtualization," [Online]. Available: https://en.wikipedia.org/wiki/Operating-system-level_virtualization.
[18] IETF, "An Analysis of Container-based Platforms for NFV," [Online]. Available: https://tools.ietf.org/html/draft-natarajan-nfvrg-containers-for-nfv-01.
[19] Wikipedia, "Cloud computing," [Online]. Available: https://en.wikipedia.org/wiki/Cloud_computing.
[20] ETSI, "NFV - Use Cases," [Online]. Available: http://www.etsi.org/deliver/etsi_gs/nfv/001_099/001/01.01.01_60/gs_nfv001v010101p.pdf.
[21] ETSI, "NFV - Architectural Framework," [Online]. Available: https://www.etsi.org/deliver/etsi_gs/NFV/001_099/002/01.01.01_60/gs_NFV002v010101p.pdf.
[22] ETSI, "NFV - Virtual Network Functions Architecture," [Online]. Available: https://www.etsi.org/deliver/etsi_gs/NFV-SWA/001_099/001/01.01.01_60/gs_nfv-swa001v010101p.pdf.
[23] OpenStack, "OpenStack-Foundation-NFV-Report," [Online]. Available: http://www.openstack.org/assets/telecoms-and-nfv/OpenStack-Foundation-NFV-Report.pdf.
[24] VMware, "Allocation Models for Organizations using vCloud Director," [Online]. Available: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1026290.
[25] N. Regola and J.-C. Ducom, "Recommendations for Virtualization Technologies in High Performance Computing," 2010. [Online]. Available: http://www3.nd.edu/~nregola/files/cloudcom2010.pdf.
[26] InfiniBand Trade Association, "About InfiniBand," [Online]. Available: http://www.infinibandta.org/content/pages.php?pg=about_us_infiniband.
[27] D. Snow, "Stuck in the MANO with you: who supplies the VNFM?," Networkmatter, 04 08 2014. [Online]. Available: https://networkmatter.com/2014/08/04/stuck-in-the-mano-with-you-who-supplies-the-vnfm/.
[28] MustBeGeek, "Difference between vSphere, ESXi and vCenter," [Online]. Available: http://www.mustbegeek.com/difference-between-vsphere-esxi-and-vcenter/.
[29] S. Lowe, "Mastering VMware vSphere 5.5," [Online]. Available: http://books.denisio.ru/VMware/Mastering%20VMware%20vSphere%205.5.pdf.
[30] VMware, "Creating a vCloud NFV," [Online]. Available: http://www.vmware.com/files/pdf/techpaper/vcloud-nfv-reference-architecture2.0.pdf.
[31] B. Giri, "Difference between vSphere, ESXi and vCenter," MustBeGeek, [Online]. Available: http://www.mustbegeek.com/difference-between-vsphere-esxi-and-vcenter/.
[32] Wikipedia, "Linux Containers," [Online]. Available: https://en.wikipedia.org/wiki/LXC.
A special thanks to Mikael Gardh and Sven-Gunnar Nyberg (opponents), Michael Thurell
(study counselor, Dataföreningen) and Magnus Stomfelt (Symsoft, OCS vendor).
16 Appendix A: VMware
VMware vSphere is a software suite that contains many software components, such as vCenter,
ESXi, and the vSphere client. vSphere is not a particular piece of software that you can
install and use; it is a package name for these sub-components [28]. The core of the vSphere
product suite is the hypervisor, the virtualization layer that serves as the foundation for
the rest of the product line [29]. All the virtual machines (guest OS) are installed on the
ESXi server. To install, manage and access the virtual servers that sit on top of the ESXi
server, another part of the vSphere suite, called the vSphere client or vCenter, is needed [28].
We assume that the RTO is configured to 1 second in the DCCA server, because none of the
peaks are higher. We can also see that the average max response time is ~100 ms.
Now, by matching the max response peak of 1000 ms in the left diagram with the right diagram
showing the number of requests, we can see a drop at the corresponding time, but only at one
of the peaks. By also looking at the bottom graph, showing the average response time for
CCR-I, we can see something interesting: average peaks around 100 ms do not cause any trouble
in the max graph, but going over 100 ms causes disturbance. From this we assume that the
actual RTO should be at most 200 ms and at least 80 ms.
The delta between the two bottom (average) graphs shows the network latency between the two
geographically separated sites: ~10 ms.