Research to Production
Eric Bouillet1 Ravi Kothari3 Vibhore Kumar2 Laurent Mignet3 Senthil Nathan2
Anand Ranganathan2 Deepak S. Turaga2 Octavian Udrea2 Olivier Verscheure1
IBM Technology Campus, Damastown Industrial Estate, Mulhuddart, Dublin 15, Ireland1
Thomas J. Watson Research Center, IBM Research, 19 Skyline Drive, Hawthorne, NY 10532, USA2
IBM Research - India, 4 Block C, Institutional Area, Vasant Kunj, New Delhi - 110070, India3
{bouillet,verscheure}@ie.ibm.com
{vibhorek,sen,arangana,turaga,oudrea}@us.ibm.com {rkothari,lamignet}@in.ibm.com
ABSTRACT
Keywords
1. INTRODUCTION
In today's competitive market, telecommunications companies are in a race to differentiate themselves from the competition and are striving hard to maintain their profit margins. The key to staying ahead of the competition, as is
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
DEBS '12, July 16-20, 2012, Berlin, Germany.
Copyright 2012 ACM 978-1-4503-1315-5 ...$10.00.
2. SOLUTION OVERVIEW
[Figure: solution architecture diagram. Component labels: Master Script; Rules Compiler; Checkpoint Controller; De-Duplication; Parallel or Serial Write; Config Files; CDR Repository; External Data; CDR Statistics]
3.
The key challenges in the implementation of the CDR mediation and analysis solution using IBM InfoSphere Streams
were around performance, scalability, latency, ease of use
and fault-tolerance. The following sections briefly describe
these challenges.
3.1 Performance
3.2 Scalability
3.3 Latency
Figure 2: Snapshot of deployed application showing parallel chains and intra-region parallelism (annotations: intra-region parallelism; region-wise parallel processing of CDRs)
in arrival of data, most of them can benefit from a mediation system that can bring down the latency from a few
hours/days to a few seconds/minutes. These include applications like fraud management, customer experience management and some early fault-detection applications. When
implementing a mediation solution using InfoSphere Streams,
the latency introduced by ingest and processing of
CDRs was virtually eliminated. The remaining challenges that we
had to address were the real-time detection of
duplicate CDRs over large windows of time (e.g. 15 days =
90 billion CDRs) and maintaining high throughput and low
latency for inserts into a warehouse.
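To give a sense of scale for the duplicate-detection window described above, the following Python sketch works through the arithmetic. The 15-day, 90-billion-CDR window is taken from the text. The 0.1% false-positive target, the SHA-256 double-hashing scheme and all function and class names are assumptions made for illustration, not details of the deployed system. The sizing uses the standard Bloom filter formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2.

```python
import hashlib
import math
from typing import List, Tuple


def cdr_rate_per_second(total_cdrs: int, window_days: int) -> float:
    """Average CDR arrival rate implied by a de-duplication window."""
    return total_cdrs / (window_days * 24 * 60 * 60)


def bloom_parameters(n_items: int, false_positive_rate: float) -> Tuple[int, int]:
    """Standard Bloom filter sizing: number of bits m and hash count k."""
    m = math.ceil(-n_items * math.log(false_positive_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


class BloomFilter:
    """Minimal in-memory Bloom filter (illustrative, not the paper's code)."""

    def __init__(self, n_items: int, false_positive_rate: float) -> None:
        self.m, self.k = bloom_parameters(n_items, false_positive_rate)
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        # Double hashing: derive k bit positions from two 64-bit halves
        # of one SHA-256 digest (an assumed scheme, chosen for brevity).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def seen_before(self, key: str) -> bool:
        """Return True if key was probably seen already; record it either way."""
        positions = self._positions(key)
        present = all(self.bits[p >> 3] & (1 << (p & 7)) for p in positions)
        for p in positions:
            self.bits[p >> 3] |= 1 << (p & 7)
        return present


# Figures from the text: 15 days = 90 billion CDRs.
rate = cdr_rate_per_second(90_000_000_000, 15)
# Assumed target: 0.1% false positives (not stated in the paper).
m_bits, k_hashes = bloom_parameters(90_000_000_000, 0.001)
print(f"average rate: {rate:.0f} CDRs/s")
print(f"filter size: {m_bits / 8 / 1e9:.1f} GB, hash functions: {k_hashes}")
```

Under these assumptions the window corresponds to an average of roughly 69,000 CDRs per second, and a single filter covering it would need on the order of 160 GB of bits with 10 hash functions. A Bloom filter can report a false duplicate but never misses a true one, so the false-positive target bounds how many genuine CDRs could be wrongly discarded.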
3.4 Ease of Use
Telcos in many countries do not necessarily have highly skilled programmers or researchers to develop and manage high-performance applications for them.
They also often face the problem of employee churn. Hence,
a key requirement is that the application must be very easy
for human operators to deploy and manage, even if they are not
familiar with high-performance computing or stream processing. Also, it should be easy for them to change certain
portions of the application as required, particularly the enrichment, transformation and lookup rules.
3.5 Fault-Tolerance
4.2 In-Memory Processing
ing application comes with its own domain-specific rules language. The rules along with enrichment and lookups are
specified in a separate file and this is the interface exposed
to operators for modifications to the business logic that runs
as part of the CDR processing application.
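The idea of keeping business rules in a separate file, apart from the processing code, can be sketched as follows. This is not the paper's rules language, whose syntax is not shown here. The JSON format, the rule operations (`lookup`, `transform`, `enrich`), the field names and the `apply_rules` function are all hypothetical and serve only to show how an operator could change behaviour by editing data rather than code.

```python
import json

# Hypothetical contents of a rules file that an operator would edit.
RULES_TEXT = """
[
  {"op": "lookup", "field": "cell_id", "table": "cell_to_region",
   "target": "region", "default": "UNKNOWN"},
  {"op": "transform", "field": "duration_ms", "target": "duration_s",
   "divisor": 1000},
  {"op": "enrich", "target": "source", "value": "mediation"}
]
"""


def apply_rules(cdr, rules, tables):
    """Apply each rule in order to a copy of one CDR record (a dict)."""
    out = dict(cdr)
    for rule in rules:
        op = rule["op"]
        if op == "lookup":
            # Map a field value through an external lookup table.
            table = tables[rule["table"]]
            out[rule["target"]] = table.get(out.get(rule["field"]), rule.get("default"))
        elif op == "transform":
            # Derive a new field by rescaling an existing numeric field.
            out[rule["target"]] = out[rule["field"]] / rule["divisor"]
        elif op == "enrich":
            # Attach a constant value to every record.
            out[rule["target"]] = rule["value"]
        else:
            raise ValueError(f"unknown rule op: {op}")
    return out


rules = json.loads(RULES_TEXT)
tables = {"cell_to_region": {"C1": "north"}}
record = apply_rules({"cell_id": "C1", "duration_ms": 2000}, rules, tables)
print(record)
```

Because the rules are plain data, changing a lookup table name or adding an enrichment step requires no change to the processing code, which is the property the text asks for.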
5. CONCLUSION
4.6
The primary interface for receiving CDRs from network elements is files, the sizes of which can vary widely (from a few
KB to GB). The log-replay based approach to fault-tolerance
exploits the fact that CDRs arrive in files. A checkpointing
mechanism keeps track of the CDRs that have been processed through the Bloom filters and those that have been
committed to the warehouse. In case of a failure, the application is restarted, the appropriate checkpoints are loaded and
the required files are replayed to bring the system to a stable state. An interesting optimization that we are currently
implementing is the ability to restart only parts of the application (e.g. the parallel chain that contains the data-flow
operator that failed).
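The checkpoint-and-replay idea described above can be sketched in Python. This is a simplified illustration, not the authors' implementation: it tracks progress per input file rather than per CDR, records only the warehouse-commit stage (not the Bloom filter stage), and the `Checkpoint` class, the JSON checkpoint format and all file names are assumptions.

```python
import json
import os
import tempfile


class Checkpoint:
    """Records which input files have been fully committed to the warehouse."""

    def __init__(self, path):
        self.path = path
        self.committed = set()
        # On restart, reload whatever the previous run managed to record.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.committed = set(json.load(fh))

    def mark_committed(self, filename):
        self.committed.add(filename)
        # Write to a temporary file and rename it into place, so a crash
        # cannot leave a half-written checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(sorted(self.committed), fh)
        os.replace(tmp_path, self.path)


def files_to_replay(arrived_files, checkpoint):
    """Files that arrived but were not committed before the failure."""
    return [name for name in arrived_files if name not in checkpoint.committed]


def run(arrived_files, checkpoint, process_file):
    """Process every uncommitted file, checkpointing after each one."""
    for name in files_to_replay(arrived_files, checkpoint):
        process_file(name)
        checkpoint.mark_committed(name)
```

After a restart, constructing `Checkpoint` with the same path and calling `run` again processes only the files that were not committed before the failure. With file-level granularity, a file that was partially written to the warehouse is replayed in full, so the warehouse insert would need to be idempotent or the checkpoint would need to record positions within a file.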
4.5
6. REFERENCES
Rules Language