
Experience Report: Processing 6 Billion CDRs/day - from Research to Production
Eric Bouillet1 Ravi Kothari3 Vibhore Kumar2 Laurent Mignet3 Senthil Nathan2
Anand Ranganathan2 Deepak S. Turaga2 Octavian Udrea2 Olivier Verscheure1
IBM Technology Campus, Damastown Industrial Estate, Mulhuddart, Dublin 15, Ireland1
Thomas J. Watson Research Center, IBM Research, 19 Skyline Drive, Hawthorne, NY 10532, USA2
IBM Research - India, 4 Block C, Institutional Area, Vasant Kunj, New Delhi - 110070, India3

{bouillet,verscheure}@ie.ibm.com
{vibhorek,sen,arangana,turaga,oudrea}@us.ibm.com {rkothari,lamignet}@in.ibm.com
ABSTRACT

A call detail record (CDR) is a data record produced by a telephone exchange or other telecommunications equipment, documenting the details of a phone call that passed through the exchange or equipment. Telecommunications companies (or telcos) use CDRs for purposes of billing, extracting business intelligence, fraud detection, etc. However, they face a Big Data challenge: many telcos receive billions of CDRs per day and are unable to keep up with these data rates. In this paper, we describe a stream processing solution for processing CDRs that allows scaling the processing to handle 6 billion CDRs per day for a certain telco. We describe the stream processing application (running on the IBM InfoSphere Streams platform) that performs CDR mediation and analysis in real-time. We also describe various business and operational constraints and the legacy software ecosystem - seldom discussed in academic gatherings - that make the problem more challenging than meets the eye. The outcome of our work is a highly configurable and scalable CDR processing stream with several functional and performance capabilities that are a first for the telecommunication industry.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Miscellaneous

Keywords

Call Detail Records, Mediation, Real-Time Analytics, IBM InfoSphere Streams

1. INTRODUCTION

In today's competitive market, telecommunications companies are in a race to differentiate themselves from the competition and are striving hard to maintain their profit margins. The key to staying ahead of the competition, as is becoming evident in several other domains as well, lies in the ability to derive timely, actionable insights from the massive amounts of customer and operational data available within the organization. For telecommunications companies that predominantly offer wireless services, a large volume of their data takes the form of call detail records (CDRs), and an event processing system that is capable of ingesting and analyzing these CDRs in real-time, as they are generated by the network equipment, can provide valuable insights. These insights range from real-time billing, to location-dependent marketing offers, to detecting, in real-time, the issues being faced by subscribers (e.g. dropped calls), to expediting the detection and diagnosis of issues with the network infrastructure.

This paper describes our experience with implementing and deploying a novel CDR processing application using the IBM InfoSphere Streams [2] middleware. The deployed real-time CDR processing application is a mediation and analytics solution capable of ingesting CDRs from a variety of network elements, transforming and enriching such CDRs in real-time, performing on-the-fly aggregation and analytics on the CDR stream, and finally loading the CDRs into a warehouse for archival and for performing deeper analytics. This paper attempts to capture the challenges and the specific solutions that were implemented for the real-time CDR processing application.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DEBS '12, July 16-20, 2012, Berlin, Germany.
Copyright 2012 ACM 978-1-4503-1315-5 ...$10.00.

2. SOLUTION OVERVIEW

Call Detail Records are structured event records generated within a wireless telecommunication network, by network switches and elements, to summarize various aspects of individual connections for different types of services, including voice, Short Message Service, Multimedia Message Service, etc. Typical CDRs contain information about the call origin, call destination, timestamp, call duration and sequence number, as well as additional information such as the call status (busy, dropped, connected), fault conditions, the number to be charged, etc. Capturing and processing all generated CDRs is central to supporting critical telecommunication service provider applications, including billing, revenue assurance (RA), and fraud management (FM). Additionally, analysis of CDRs can provide several insights into the state of the network, call distribution, and user behavior - all of which are necessary for several business intelligence (BI) applications. These applications range from long-term provisioning, load-balancing, and system design all the way to several real-time services

Figure 1: Solution Architecture. Components: File Ingest & Parallelization; File Parsing & Error Handling; Rules: Lookups & Transforms; Rules Compiler; De-Duplication; Checkpoint Controller; Parallel or Serial Write; Master Script; Config Files; External Data; CDR Statistics; CDR Repository; Real-Time Aggregates & Dashboards.


for fault recovery, customer experience management, content- and location-driven advertising, e-commerce applications, and several other novel applications (e.g. social networking, real-time transportation services, etc.).

Mediation is the first step in processing these CDRs, and involves capturing CDRs from upstream network systems and making them ready for processing by downstream applications (RA, BI, FM, warehousing). This is a complex task composed of several steps that include:

Collection: capture and ingest CDRs from various source sub-systems
Validation and Filtering: identify relevant CDRs and discard invalid or corrupted CDRs
Collation: correlate and aggregate all CDRs corresponding to one call
Format Conversion and Normalization: parse binary and proprietary formats to extract fields of interest
Enrichment and Transformation: apply business rules to enrich and transform CDRs
De-duplication: filter duplicate CDRs that may have been injected into the data-flow by source sub-systems
Analysis and Summarization: compute aggregates and summaries of CDR data, and visualize results
Distribution: transmit CDRs for further downstream processing

Additionally, given that this is a business-critical task, there are several requirements on the performance and the fault-tolerance of a mediation system.

In this paper we describe the mediation and analysis system we built and deployed (in production at a large telecommunication provider) using the InfoSphere Streams platform. Specifically, we provide an overview of the architecture and components developed for these different functions (Figure 1), as well as the appropriate systems support for the required performance and guarantees on failure recovery. We also describe our tooling for user interaction and management, including support for application development and extension, monitoring and end-to-end provenance, and finally for real-time result visualization and validation. We emphasize the key technical challenges associated with each of these tasks, which made this a complex, multi-disciplinary, multi-month effort. We provide details of these challenges, our design decisions, and our implementation and results in the following sections.

3. REQUIREMENTS AND CHALLENGES

The key challenges in the implementation of the CDR mediation and analysis solution using IBM InfoSphere Streams were around performance, scalability, latency, ease-of-use and fault-tolerance. The following sections briefly describe these challenges.

3.1 Performance

The key requirement was the ability to process 6 billion CDRs per day, which translates to around 70,000 CDRs per second. However, various operational constraints meant that the desired throughput was about three times that value (i.e. around 220,000 CDRs per second). This is because of frequent power and infrastructure outages, sporadic human operator errors in configuring the system, delays in getting source data from switches, etc. These issues often result in backlogs of unprocessed CDRs that need to be processed as soon as possible. Hence, the system had to support a higher throughput in order to overcome these issues.

3.2 Scalability

Since the subscriber base of telcos is growing rapidly, another key requirement was the ability to scale easily in the future as the number of CDRs per day increases. For example, telcos frequently see growth rates of 10-20% every year, and they would like their applications to scale up seamlessly when the rate increases, potentially by just adding more hardware.
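The throughput and growth figures in Sections 3.1 and 3.2 are easy to sanity-check. The short sketch below is our own back-of-the-envelope arithmetic (not part of the deployed system); it reproduces the quoted per-second rates and projects volume under the stated yearly growth:

```python
# Back-of-the-envelope check of the targets in Sections 3.1 and 3.2.
# Figures from the paper: 6 billion CDRs/day, a ~220,000 CDRs/s
# catch-up target, and 10-20% yearly subscriber growth.

CDRS_PER_DAY = 6_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

steady_rate = CDRS_PER_DAY / SECONDS_PER_DAY
print(f"steady-state rate: {steady_rate:,.0f} CDRs/s")  # ~69,444, i.e. ~70,000

catchup_rate = 220_000  # quoted in Section 3.1
print(f"headroom factor: {catchup_rate / steady_rate:.1f}x")  # ~3.2x

# Section 3.2: projected daily volume after n years at 20% growth.
for years in (1, 3, 5):
    projected = CDRS_PER_DAY * 1.20 ** years
    print(f"after {years} yr @ 20%/yr: {projected / 1e9:.2f} B CDRs/day")
```

The headroom factor of roughly three is what lets the system drain a day-long backlog in well under a day while still keeping up with live traffic.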

3.3 Latency

While most of the applications downstream of a mediation system are not very sensitive to small delays

Figure 2: Snapshot of deployed application showing parallel chains (region-wise parallel processing of CDRs) and intra-region parallelism
in the arrival of data, most of them can benefit from a mediation system that can bring the latency down from a few hours/days to a few seconds/minutes. These include applications like fraud management, customer experience management and some early fault-detection applications. When implementing a mediation solution using InfoSphere Streams, the latency introduced by the ingest and processing of CDRs was virtually eliminated; however, detecting duplicate CDRs in real-time over large windows of time (e.g. 15 days = 90 billion CDRs) while maintaining high throughput and low latency for inserts into the warehouse was a challenge that we had to address.
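To get a feel for why duplicate detection over a 15-day window is hard (and why Section 4.3 adopts a probabilistic structure for it), a rough sizing estimate helps. The sketch below is our own illustration; the false-positive rates and the per-key byte cost are assumed values, not the deployed parameters:

```python
import math

# Rough sizing for duplicate detection over the 15-day window from
# Section 3.3: n = 90 billion CDRs. Per-key cost and false-positive
# rates below are illustrative assumptions.

n = 90_000_000_000

# An exact in-memory hash set: even at a lean ~16 bytes/key this is
# on the order of terabytes of RAM.
print(f"exact hash set: ~{n * 16 / 1e12:.2f} TB")  # ~1.44 TB

# Standard Bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits,
# with k = (m/n) * ln 2 hash functions.
for p in (0.01, 0.001):
    m_bits = -n * math.log(p) / math.log(2) ** 2
    k = round(m_bits / n * math.log(2))
    print(f"p={p}: ~{m_bits / 8 / 1e9:.0f} GB, k={k} hashes")
```

At a 1% false-positive rate the filter needs roughly 108 GB (about 9.6 bits per element), which is large but feasible to hold in memory across parallel chains, whereas an exact set is not.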

3.4 Ease of Use

Telcos in many countries face the problem of not necessarily having highly skilled programmers or researchers to develop and manage highly performant applications for them. They also often face the problem of employee churn. Hence, a key requirement is that the application must be very easy to deploy and manage by human operators, who may not be familiar with high-performance computing or stream processing. It should also be easy for them to change certain portions of the application as required, particularly the enrichment, transformation and lookup rules.

3.5 Fault-Tolerance

Telcos operate huge computing infrastructures and often face problems such as frequent power and infrastructure outages and human operator errors in configuring the system. Hence, there is a good chance that a long-running stream processing application will fail at some point, and a key requirement is that the system must recover from different kinds of failures and come back up. In addition, no data should be lost when the system fails. There is a need to provide transactional guarantees for processing CDRs (i.e. a CDR must be processed completely or not at all). This means that the system should keep track of which CDRs are at different stages of processing. If the system fails, it should come back up and reprocess partially processed CDRs.

4. KEY FEATURES OF THE SYSTEM

In this section, we describe some of the key features of the CDR processing application that were developed to address the challenges described above.

4.1 Region-Based & Intra-Region Parallelism

In order to achieve the requirements of high performance and scalability, one of the key features of the system is parallelized CDR processing. A natural way of parallelizing the processing is by region, which represents a location-based zone of operations for a telco. All the processing for each region (CDR transformation, enrichment, de-duplication, aggregates, etc.) can happen in parallel. One of the challenges, though, is that different regions have different loads at different times. Hence, our application has various strategies to balance the processing across different parallel chains based on current loads. Figure 2 shows a snapshot of one version of the application where the incoming CDRs are split across 15 parallel chains of processing (one corresponding to each region). The degree of parallelism, based on the number of regions, can be easily modified by means of a configuration parameter. To address issues that may arise due to load disparity between the various regions, the application exposes a configuration parameter that allows one to select the number of parallel sub-chains for each region. While such a parallelization strategy seems obvious, the fault-tolerance issues and the need to maintain throughput lead to several interesting challenges.
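The paper does not show the routing logic itself; the following minimal Python sketch illustrates what region-keyed dispatch with a configurable number of sub-chains per region might look like. All names, the region-numbering scheme, and the choice of hash are our own assumptions, not the deployed SPL code:

```python
import zlib

# Sketch of region-based routing (Section 4.1): each CDR goes to its
# region's parallel chain, and a per-region sub-chain count (set in a
# config file) absorbs load disparity between regions.

NUM_REGIONS = 15                    # one parallel chain per region
SUBCHAINS = {"R03": 4}              # busier regions get more sub-chains
DEFAULT_SUBCHAINS = 2

def route(cdr: dict) -> tuple:
    """Return (chain, sub_chain) for a CDR, keyed by its region."""
    region = cdr["region"]          # e.g. "R03"
    chain = int(region[1:]) % NUM_REGIONS
    width = SUBCHAINS.get(region, DEFAULT_SUBCHAINS)
    # Hash the calling number so a given subscriber's CDRs always land
    # on the same sub-chain, keeping per-subscriber state local.
    sub = zlib.crc32(cdr["caller"].encode()) % width
    return chain, sub

print(route({"region": "R03", "caller": "+911234567890"}))
```

Keying the sub-chain on a stable attribute of the CDR (rather than round-robin) is what makes stateful steps like de-duplication and aggregation safe to parallelize, since all records for one key meet the same state.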

Figure 3: Real-time visualization of aggregates determined from CDRs. (a) The number of system- and user-terminated calls for enterprise and non-enterprise customers in the last hour. (b) A drill-down showing the system call termination reason for enterprise customers for cell-site 201 in the last hour.

4.2 In-Memory Processing

In order to improve throughput, an absolute requirement is to avoid referring to any files or databases during processing, as far as possible. Hence, all lookup tables, de-duplication information, aggregates, etc. are maintained in memory. This does make things interesting for fault tolerance, though. We also implemented a capability that allows operators to hot-swap the lookup tables in a running application.

4.3 De-duplication Using Bloom Filters

In order to handle the potential scale of de-duplication, we chose to use Bloom filters [1] for detecting duplicates. Bloom filters have excellent scaling and memory properties. The main choice is in setting an appropriate false-positive threshold. Our system also has mechanisms for periodically checkpointing the Bloom filters to recover from any software or hardware failures.

4.4 Log-Replay Based Fault-Tolerance

The primary interface for receiving CDRs from network elements is files, whose sizes can vary widely (from a few KB to GBs). The log-replay based approach to fault-tolerance exploits the fact that CDRs arrive in files. A checkpointing mechanism keeps track of the CDRs that have been processed through the Bloom filters and the ones that have been committed to the warehouse. In case of a failure, the application is restarted, the appropriate checkpoints are loaded, and the required files are replayed to bring the system back to a stable state. An interesting optimization that we are currently implementing is the ability to restart only parts of the application (e.g. the parallel chain that contains the data-flow operator that failed).

4.5 Parallel Insertion into Database

The processed CDRs finally need to be inserted into a database at very high rates. DB2 has a very useful partitioning feature that makes it possible to insert CDRs into the same database table in parallel. In some instantiations of the application, there are 216 parallel DB2 insertion operations running simultaneously.

4.6 Real-Time Aggregates & Analytics

An interesting ability offered by Streams is the ability to analyze data on-the-fly; this includes the calculation of aggregates over varying time windows (e.g. dropped calls in the last hour). The CDR processing application has access to enriched records even before they are inserted into the warehouse, which allows the application to calculate and maintain several pre-configured aggregates in real-time. An adapter to a dashboard is then used to visualize such aggregates and monitor activity as it happens. Such aggregates not only reduce the load on the warehouse but also enable early detection of anomalies. Figure 3 shows a sample dashboard that makes use of aggregates from the CDR application.

4.7 Rules Language

To simplify the specification and modification of rules by operators who may not be highly skilled, the CDR processing application comes with its own domain-specific rules language. The rules, along with enrichment and lookups, are specified in a separate file, and this file is the interface exposed to operators for modifying the business logic that runs as part of the CDR processing application.

4.8 Master Script and Configuration Files

In order to improve the usability and manageability of the application, we developed a master script that allows one-command operations to orchestrate all the moving parts of the system. Also, all configuration parameters of the application are exposed in a single file. Hence, the operators never need to change the actual SPL (Stream Processing Language) application.

5. CONCLUSION

In this paper, we briefly presented a stream-processing based system for processing CDRs. We discussed some of the requirements, challenges and design decisions in the building of the application and the supporting infrastructure.

6. REFERENCES

[1] Bloom Filter. http://en.wikipedia.org/wiki/Bloom_filter, 2012. [Online; accessed 30-May-2012].
[2] IBM InfoSphere Streams. http://www-01.ibm.com/software/data/infosphere/streams/, 2012. [Online; accessed 30-May-2012].
