Introduction
Building high performance production systems for
billing and mediation relies heavily on making the
right architectural choices needed to achieve the high
availability and system throughput demanded by such
systems. This paper examines some of the common
key functional areas of these processing engines with a
particular focus on data management strategies. We use
the Berkeley DB database engine, an open source devel-
oper database used in many carrier grade applications,
to illustrate how data management strategies
can be implemented.
Makers of Berkeley DB
MANAGING DATA WITHIN BILLING, MEDIATION AND RATING SYSTEMS
Contents
Introduction
Global Requirements
Data Throughput
Real-Time Data
Data Durability
Data Availability
Module-Specific Data Requirements
Reference Data
Live Data
Critical Design Considerations
Predictable Data Access
Simple Data Types
Service Orientation
Repetitive Access
Live Data Management Strategies
In-Memory Data Storage
Client-Server Data Storage
Bespoke Flat File Data Storage
In-Process Data Storage
Berkeley DB
Database Performance Comparison
Aggregating/Mediating CDRs
Real-Time Processing of CDRs
Rating/Fraud Detection
Compiling Reports of CDRs
Exporting Data
Deployment Considerations
Data Replication and Load Balancing
Data High Availability
Data Security
Database Maintenance
Summary
WHITE PAPER
Global Requirements

Data Throughput
The proliferation of new services, new billing methods and new
subscribers continues to drive data volumes higher throughout the
system. Subscribers number in the millions and billable events are of
the order of tens or hundreds of millions, depending on the billing
cycle.

Real-Time Data
Support for pre-paid services in particular requires that full
end-to-end processing of billable services is possible in real time.
Latency within a system architecture is often a limiting factor on
meeting real-time requirements. System latency can be more difficult to
address than system throughput as it is often a feature of the software
architecture and cannot be efficiently remedied by the availability of
additional hardware.

Data Durability
Data loss equals revenue loss for an operator. Systems simply cannot
afford to drop data under any conditions.

Data Availability
Billing is a real time operation (particularly when supporting pre-paid
services) and must always be available. Billing requires rating
information which, in turn, requires usage data. Any failure of any part
of the system (e.g., rating engine) will result in failure of a whole
system or a failure to achieve the operator’s desired level of revenue
assurance.

Module-Specific Data Requirements

Reference Data
Reference data relating to the operational or business knowledge of an
organization, such as subscriber details, account histories or plans, is
generally accessed infrequently and is of a relatively complex nature.
Examples of reference data applications are rate plan management and
subscriber relationship management. The frequency of changes/writes is
relatively low (< 10,000-1,000,000 per day), and there are a high number
of queries/data reads relative to data writes. Additionally, queries are
more likely to be of an ad-hoc nature, and the associated database
schema is generally more complex.

The storage of reference data is usually well understood and well suited
to database management systems with support for high level schema
definition and ad-hoc queries. Often, the choice of database is dictated
by the demands of specific operators, who may have existing software or
who rely on SQL as the common denominator between applications for
integration and interoperability.

Live Data
Live data is data that has to be processed before it can become useful
business knowledge or reference data. Using a call data record (CDR) as
an example, it is of little use to the business without significant
processing. Once it has been collected, rated and presented as part of a
subscriber bill, it has distinctly more business value. Live data is
also vital business data that is accessed very frequently in the course
of the system operation, such as subscriber information caches/lookups,
pre-paid balance management and event/audit logs. By its nature, live
data exists in large amounts since it is the lifeblood of the business.
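
To put the volumes quoted in this section into per-second terms, the following minimal sketch converts the reference data write range (10,000 to 1,000,000 per day) and the billable event range (tens to hundreds of millions per billing cycle) into average rates. The 30-day billing cycle and the representative event counts of 10 million and 100 million are assumptions made for illustration; the paper does not state them. Averages also hide traffic peaks, so real sizing would need peak figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def per_second(events, days=1):
    """Average events per second for a total spread evenly over `days`."""
    return events / (days * SECONDS_PER_DAY)

# Reference data: write range quoted in the text (per day).
reference_low = per_second(10_000)
reference_high = per_second(1_000_000)

# Live data: billable events per billing cycle.
# ASSUMPTION: a 30-day cycle and 10M / 100M events as representative values.
ASSUMED_CYCLE_DAYS = 30
live_low = per_second(10_000_000, ASSUMED_CYCLE_DAYS)
live_high = per_second(100_000_000, ASSUMED_CYCLE_DAYS)

print(f"reference writes: {reference_low:.2f} to {reference_high:.2f} per second")
print(f"billable events:  {live_low:.2f} to {live_high:.2f} per second (average)")
```

Under these assumptions the average rates look modest; it is the burstiness of live data and the real-time latency requirement, rather than the daily average, that make it harder to manage.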
Database Performance Comparison

To understand the magnitude of performance gains available through
using Berkeley DB, it is important to understand its relative strengths.
In a recent simple benchmarking exercise, a small data record of a
similar size to a CDR (~60-80 bytes) was written and read using three
database configurations:

1. A C++ application using Berkeley DB 4.2
2. A Java application using Berkeley DB Java Edition (JE) 1.5
3. A Java application connecting to a relational database management
   system (in this case, MySQL 4.0.21 accessed via JDBC).

Write Time (ms)
# Records   MySQL     Berkeley DB JE   Berkeley DB C++
1000        1168      492              10
2000        2261      680              20
5000        5673      1103             60
7000        7156      1396             90
10000       10167     1905             130
100000      100787    14624            1880
1000000                                3167

Query Time (ms)
# Records   MySQL     Berkeley DB JE   Berkeley DB C++ *
1000        9         1                0
2000        10        2                0
5000        15        5                0
7000        17        8                10
10000       20        14               10
100000      142       138              80
1000000                                800

Comparative time to write data and perform a single query.
*The accuracy of the system clock (clock_t) on the Linux x86 test
machine was insufficient for measuring the time taken for small numbers
of queries using Berkeley DB C++. Note that Berkeley DB offers other
configurations (DB_TXN_NOSYNC, DB_TXN_WRITE_NOSYNC) for even higher
performance where relaxing durability is a possibility.
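
As a rough reading of the write figures, the sketch below converts the 100,000-record row of the write table into records per second and a relative speed-up. The timings are copied from the table; the function and variable names are illustrative only, and a single benchmark on one machine should not be read as a general result.

```python
# Write times in milliseconds for 100,000 records, taken from the table.
RECORDS = 100_000
WRITE_MS = {
    "MySQL": 100787,
    "Berkeley DB JE": 14624,
    "Berkeley DB C++": 1880,
}

def records_per_second(records, elapsed_ms):
    """Convert a record count and elapsed milliseconds to records/second."""
    return records * 1000.0 / elapsed_ms

def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms

rates = {name: records_per_second(RECORDS, ms) for name, ms in WRITE_MS.items()}

for name, rate in rates.items():
    factor = speedup(WRITE_MS["MySQL"], WRITE_MS[name])
    print(f"{name:16s} {rate:10.0f} records/s  ({factor:.1f}x MySQL)")
```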
To correlate the events, one might employ a join to find log events
where the trunk and dialed number match each other and then pair the
results to produce a single billing object per call. For example:

Date      Time   Trunk    Number       Duration
25/12/04  12:01  1238523  00441234567  13:05

Berkeley DB uses a simple key-value model for indexing data. Typically,
the key value will be an index value corresponding to a column of a
database table. Berkeley DB supports and manages multiple sets of keys
on the same table of values. In this example, assume the unique primary
key is the LogId. Additionally, we can use [Trunk + Number] as a
secondary key/index to the data, so that multiple entries can share the
same (duplicate) secondary key. By using a simple cursor to the data,
all the entries for a particular [Trunk + Number] combination can be
read very efficiently.

Real-Time Processing of CDRs

... store where access speed, isolation and durability are paramount.
Berkeley DB supports a configurable cache for data storage and can also
be configured to run entirely in-memory if needed. It is routinely
deployed as part of server groups using replication, which process
submitted events from a central event queue/buffer to perform balance
updates or record mediation.

Berkeley DB supports a number of internal database structures. The most
common and general purpose of these is the b-tree. Berkeley DB also
supports hash, queue and recno access methods internally. For real-time
processing of CDRs, a recno/queue indexed database operates as a very
efficient queue/buffer which is particularly suited to storing and
retrieving fixed length records. CDRs can be pushed into and popped out
of the database, which prevents the data loss or out-of-memory problems
that can occur when real time billing servers are subjected to “bursty”
data traffic. Berkeley DB is frequently used as a high speed persistent
buffer in data critical and high throughput applications (e.g., SMS/MMS
messaging and email gateways) to make applications more resilient to
high traffic loads and denial of service attacks.
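
The push/pop buffering of fixed length records described above can be sketched as follows. This is not Berkeley DB's queue access method or its API: it is a stand-in built on a plain file, with records addressed by record number, intended only to show why fixed length records make a disk-backed FIFO simple. The class name, the 64-byte record size and the file layout are all assumptions, and the sketch omits the transactions, locking and space reclamation a real deployment would need.

```python
import os
import struct
import tempfile

RECORD_SIZE = 64  # ASSUMPTION: hypothetical fixed CDR size in bytes

class FixedRecordQueue:
    """Disk-backed FIFO of fixed-length records (illustrative only)."""

    HEADER = struct.Struct("<QQ")  # head and tail record numbers

    def __init__(self, path):
        self.path = path
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(self.HEADER.pack(0, 0))

    def _read_header(self, f):
        f.seek(0)
        return self.HEADER.unpack(f.read(self.HEADER.size))

    def _write_header(self, f, head, tail):
        f.seek(0)
        f.write(self.HEADER.pack(head, tail))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to put the data on disk

    def push(self, record):
        """Append one record at the tail, padding it to RECORD_SIZE."""
        if len(record) > RECORD_SIZE:
            raise ValueError("record longer than RECORD_SIZE")
        with open(self.path, "r+b") as f:
            head, tail = self._read_header(f)
            # Record N lives at a fixed offset, so no index is needed.
            f.seek(self.HEADER.size + tail * RECORD_SIZE)
            f.write(record.ljust(RECORD_SIZE, b"\0"))
            self._write_header(f, head, tail + 1)

    def pop(self):
        """Return the oldest record, or None if the queue is empty."""
        with open(self.path, "r+b") as f:
            head, tail = self._read_header(f)
            if head == tail:
                return None
            f.seek(self.HEADER.size + head * RECORD_SIZE)
            record = f.read(RECORD_SIZE)
            self._write_header(f, head + 1, tail)
            # Padding is stripped; records ending in NUL bytes would be
            # truncated, which is acceptable for this sketch only.
            return record.rstrip(b"\0")

# Usage: a burst of CDRs is buffered on disk and drained later.
with tempfile.TemporaryDirectory() as tmp:
    queue = FixedRecordQueue(os.path.join(tmp, "cdr.queue"))
    for i in range(3):
        queue.push(f"cdr-{i}".encode())
    drained = []
    while (item := queue.pop()) is not None:
        drained.append(item)
    print(drained)
```

Because the buffer lives on disk rather than in process memory, a burst that outpaces the consumer grows the file instead of exhausting memory, and records pushed before a restart are still there to be popped afterwards.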
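
Returning to the [Trunk + Number] secondary key described for aggregating CDRs, the following sketch shows the idea with plain Python dictionaries standing in for the primary and secondary databases. It is not Berkeley DB code. Only the trunk, number, start time and 13:05 duration of the first call come from the example row; the LogId values, the event layout, the stop time and the second call are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Primary store: LogId -> (event kind, timestamp, trunk, number).
# ASSUMPTION: LogIds, stop times and the second call are invented.
log_events = {
    1: ("start", "25/12/04 12:01:00", "1238523", "00441234567"),
    2: ("start", "25/12/04 12:03:10", "1238524", "00441230000"),
    3: ("stop", "25/12/04 12:05:40", "1238524", "00441230000"),
    4: ("stop", "25/12/04 12:14:05", "1238523", "00441234567"),
}

def build_secondary_index(events):
    """Map each (trunk, number) key to the LogIds that share it."""
    index = defaultdict(list)
    for log_id in sorted(events):
        _, _, trunk, number = events[log_id]
        index[(trunk, number)].append(log_id)
    return index

def cursor(events, index, trunk, number):
    """Yield every entry stored under one duplicate secondary key."""
    for log_id in index.get((trunk, number), []):
        yield log_id, events[log_id]

def parse(timestamp):
    return datetime.strptime(timestamp, "%d/%m/%y %H:%M:%S")

def billing_record(events, index, trunk, number):
    """Pair the start and stop events of a call into one billing object."""
    start = stop = None
    for _, (kind, timestamp, _, _) in cursor(events, index, trunk, number):
        if kind == "start":
            start = parse(timestamp)
        elif kind == "stop":
            stop = parse(timestamp)
    if start is None or stop is None:
        return None  # call still in progress or an event is missing
    seconds = int((stop - start).total_seconds())
    return {
        "date": start.strftime("%d/%m/%y"),
        "time": start.strftime("%H:%M"),
        "trunk": trunk,
        "number": number,
        "duration": f"{seconds // 60}:{seconds % 60:02d}",
    }

index = build_secondary_index(log_events)
print(billing_record(log_events, index, "1238523", "00441234567"))
```

The lookup touches only the entries that share the secondary key, which is the property the cursor-based read relies on.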
Database Maintenance
Berkeley DB does not require regular maintenance - or
even a DBA - which greatly reduces the operational
expense of systems deployed using it. For this reason,
many of the best known names in the telecommunica-
tions business depend on Berkeley DB. A partial list
includes Cisco, Motorola, Amdocs, Ericsson, LogicaCMG,
Alcatel, Tellabs, Openwave, Jabber, Hitachi, Lucent and
AT&T. With over 200 million deployments, Berkeley DB
is the natural storage solution wherever data is on the
move – in the handset, at the switch, at the message
center, in the OSS and in the billing system. Berkeley
DB offers exceptional performance, zero maintenance
deployments and unmatched reliability.
Sleepycat Software (www.sleepycat.com) makes Berkeley DB, the most widely used open source developer
database in the world, with over 200 million deployments. Customers such as Amazon.com, AOL,
British Telecom, Cisco Systems, EMC, Google, Hitachi, HP, Motorola, RSA Security, Sun Microsystems,
TIBCO and Veritas also rely on Berkeley DB for fast, scalable, reliable and cost-effective data manage-
ment for their mission-critical applications. Profitable since it was founded in 1996, Sleepycat is a
privately held company with offices in California, Massachusetts and the United Kingdom.
For further information, please contact Sleepycat by sending email to info@sleepycat.com or visiting
Sleepycat’s website at www.sleepycat.com
Sleepycat Software Inc.
118 Tower Road
Lincoln, MA 01773
USA

Sleepycat Software Inc.
5858 Horton St. Suite 265
Emeryville, CA 94608
USA

Sleepycat Europe Ltd.
Coronation House, Guildford Rd.
Woking, GU22 7QD
United Kingdom

+1-978-897-6487
+1-877-SLEEPYCAT (Toll-free, USA only)

wp_billmed_0305