
DIPLOMA THESIS

Stochastic Purchase Models for
Noncontractual Consumer Relations

carried out at the

Institut für Handel und Marketing
(Institute for Retailing and Marketing)

of the Wirtschaftsuniversität Wien
(Vienna University of Economics and Business Administration)

under the supervision of

a.o. Univ.-Prof. Dr. Thomas Reutterer

by

Michael Platzer
Speisingerstrasse 76/1
1130 Wien
Matr.Nr. 9650359

Date                       Signature
Stochastic Purchase Models for
Noncontractual Consumer Relations

Michael Platzer
michael.platzer@gmail.com

MASTER THESIS AT THE


VIENNA UNIVERSITY OF ECONOMICS
AND BUSINESS ADMINISTRATION
OCTOBER 2008
Abstract

The primary goal of this master thesis is to evaluate several well-established
probabilistic models for forecasting customer behavior in noncontractual
settings on an individual level. This research has been carried out with the
particular purpose of participating in a lifetime value competition organized
by the Direct Marketing Educational Foundation throughout fall 2008.
We undertake an in-depth exploratory analysis and visually summarize the
key characteristics of the dataset at hand. Subsequently we apply the
Pareto/NBD (Schmittlein et al., 1987), the BG/NBD (Fader et al., 2005a) and
the CBG/NBD (Hoppe and Wagner, 2007) model to the data. But because
the data seem to violate the Poisson assumption, a prevalent assumption
regarding the random nature of transaction timings, the existing models
under investigation produce rather unsatisfactory results. This becomes
apparent as we show that a simple linear regression model outperforms the
probabilistic models on the DMEF data.
As a consequence the author develops a new variant based upon the CBG/NBD
model, the CBG/CNBD-k model. This model, which is introduced here for the
first time, accounts for a certain degree of regularity in the observed
interevent times by modeling Erlang-k interpurchase times, and is thereby
able to deliver considerably better predictions for the dataset at hand.
Out of the 25 teams that submitted forecasts on an individual level to the
competition, the author finished in second place, only marginally behind
the winning model. This result shows that under certain conditions the new
variant is able to outperform numerous other existing, in particular
stochastic, models.
Keywords: marketing, consumer behavior, lifetime value, stochastic predic-
tion models, customer base analysis, Pareto/NBD

Contents

Abstract

1 Introduction
  1.1 Background
  1.2 Problem Scope
  1.3 Presented Models
  1.4 Use Cases

2 DMEF Competition
  2.1 Contest Details
  2.2 Data Set
  2.3 Game Plan

3 Exploratory Data Analysis
  3.1 Key Summary
  3.2 Distributions on Individual Level
  3.3 Trends on Aggregated Level
  3.4 Interpurchase Times

4 Forecast Models
  4.1 NBD Model
    4.1.1 Assumptions
    4.1.2 Empirical Results
  4.2 Pareto/NBD Model
    4.2.1 Assumptions
    4.2.2 Empirical Results
  4.3 BG/NBD Model
    4.3.1 Assumptions
    4.3.2 Empirical Results
  4.4 CBG/NBD Model
    4.4.1 Assumptions
    4.4.2 Empirical Results
  4.5 Heuristic
  4.6 LM25 + NN25
  4.7 Variants of BG/NBD
  4.8 CBG/CNBD-k Model
    4.8.1 Assumptions
    4.8.2 Empirical Results
  4.9 Model Comparison
  4.10 Estimation of Monetary Component

5 Conclusion / Discussion

A Derivation of CBG/CNBD-k
  A.1 Overview of Used Methods
  A.2 Overview of Used Stochastic Distributions
  A.3 Assumptions
  A.4 Erlang-k
  A.5 Individual Likelihood
  A.6 Aggregate Likelihood
  A.7 Parameter Estimation
  A.8 Probability Distribution of Purchase Frequencies
  A.9 Probability of Being Active
  A.10 Expected Number of Transactions
    A.10.1 Unconditional Expectation for Condensed Poisson
    A.10.2 Unconditional Expectation for Grouped Poisson
    A.10.3 Expectations for Condensed NBD
    A.10.4 Unconditional Expectation for CBG/CNBD-k
    A.10.5 Conditional Expectation for CBG/CNBD-k
  A.11 Concluding Remarks

B R Source Code
  B.0.1 Data Preprocessing
  B.0.2 EDA
  B.1 NBD
  B.2 Pareto/NBD
  B.3 BG/NBD
  B.4 CBG/CNBD-k
  B.5 Data Simulation
  B.6 Usage Example for CDNOW

Bibliography
Chapter 1

Introduction

1.1 Background
Over 80% of the companies that participated in a German study on the
usage of information instruments in retail controlling (Schröder et al., 1999)
regarded the application of customer lifetime value as useful. But fewer than
10% actually had a working implementation at that time. No other
consumer-related piece of information, for example customer satisfaction,
penetration or sociodemographic variables, showed such a large discrepancy
between assessed usefulness and actual usage. Accurate lifetime value models
will therefore become, despite but also because of their inherent complexity,
a crucial information advantage in highly competitive markets.
Typical fundamental managerial questions that arise are (Schmittlein et al.,
1987; Morrison and Schmittlein, 1988):

• How much is my current customer base worth (“customer equity”)?

• Which sales volume can I expect from my clientele in the future?

• How many customers are still active? Who has already defected, and who
is likely to defect?

• Who will be my most, and who my least, profitable customers?

• Who should we target with a specific marketing activity?

• How much of the sales volume can be attributed to such a marketing
activity?


A key part of finding answers to these questions is the accurate assessment
of lifetime value, on an aggregated as well as on an individual level.
Hardly any organization can afford to make budget plans for the upcoming
period without careful estimates regarding the income side. Such estimates
on the aggregate level are therefore widely common, and numerous methods
exist, ranging from simple managerial heuristics to advanced time series
analysis. Considerably more challenging is the prediction of future sales
broken down between trial and repeat customers. And, considering how little
information we have on an individual level, making accurate forecasts for
each single client is an undoubtedly demanding task.
Nevertheless, the ongoing rise of computerized transaction systems and the
drop in data storage costs over the past decade provide more and more
companies with customer databases coupled with large records of transaction
history (“Who bought which product at what price at what time?”). But the
data itself is useless unless models and tools are implemented that condense
the desired characteristics, trends and forecasts out of it. Such tools are
commonly provided as part of Customer Relationship Management software,
which enables organizations to act and react individually to each customer.
This ability takes the heterogeneity in one’s customer base into account and
subsequently allows an organization to further optimize its marketing
activities and their efficiency¹. One essential piece of information for CRM
implementations is the (monetary) valuation of an individual customer
(Rosset et al., 2003).

1.2 Problem Scope


The primary focus of our work is the evaluation and implementation of
several probabilistic models for forecasting customer behavior in
noncontractual settings on an individual level. This research has been
carried out with the main goal of participating in a lifetime value
competition organized by the Direct Marketing Educational Foundation in
fall 2008.
The limits of the research scope of this thesis are fairly well defined by the
main task of the competition, which is to assess the future purchase amount
for an existing customer base on a disaggregated level based upon
transaction history. As such, we will not provide a complete overview of
existing lifetime value models (see Gupta et al. (2006) for such an
overview), but will rather focus on models that can make accurate future
predictions on an individual level.

¹ Clustering a customer base into segments can be seen as a first step in
dealing with heterogeneity. One-to-one marketing, as described here, is the
logical continuation of this approach.
Due to the large portion of one-time purchases and the long time span of
the data, we have to use models that, besides modeling the purchase
frequency, can also incorporate the defection of customers. Additionally, we
are faced with noncontractual consumer relations, a characteristic that is
widespread in commerce but adds complexity to the forecasting task, since
we have no information regarding the status of a consumer relation, neither
now nor later; i.e. we do not know whether a specific customer is still
active or whether she has already defected. In a contractual setting², such
as the client base of a telecommunications service provider, we know exactly
when a customer cancels her contract and is therefore lost for good³. In a
noncontractual setting, such as shop visitors, air carrier passengers or
donors to an NPO, we cannot observe the current status of a customer
relation (i.e. it is a latent variable), but must rely on other data, such
as the transaction history, to make proper judgements. We will therefore
limit our research to models that can handle this kind of uncertainty.
Further, because the dataset only provides transaction records⁴, we put the
emphasis on models that extract the most out of the transaction history and
do not rely on incorporating other covariates, such as demographic
variables, competitor activity or other exogenous variables.

1.3 Presented Models


Table 1.1 displays an overview of the probabilistic models that are
evaluated and applied to the competition data within this thesis.
In section 4.1 we start out by investigating the ground-breaking work of
Ehrenberg, who proposed the Negative Binomial Distribution (NBD) as a
model for repeated buying as early as 1959. Further, we evaluate the

² Also known as a subscription-based setting.
³ Models that explicitly model churn rates include, among others, logistic
regression models and survival models. See Rosset et al. (2003) and Mani
et al. (1999) for examples of the latter kind.
⁴ Actually the dataset also includes detailed records of direct marketing
activities, but we neglect these data, as they are not available for the
target period. See section 2.3 for further reasoning.

Model Author(s) Year


NBD Ehrenberg 1959
Pareto/NBD Schmittlein, Morrison, Colombo 1987
BG/NBD Fader, Hardie, Lee 2005
CBG/NBD Hoppe, Wagner 2007
CBG/CNBD-k Platzer 2008

Table 1.1: Overview of presented models.

well-known Pareto/NBD model (section 4.2) and two of its variants, the
BG/NBD (section 4.3) and the CBG/NBD (section 4.4) model, which are all
extensions of the NBD model but make additional assumptions regarding
the defection process and its heterogeneity among customers. In order to
get a feeling for the forecast accuracy of these probabilistic models, we
subsequently benchmark them against a simple linear regression model.
Finally, a new model is introduced in section 4.8, the CBG/CNBD-k model,
which has been developed by the author as a variation of the CBG/NBD
model. This variant makes different assumptions regarding the timing of
purchases, in particular it allows for a certain degree of regularity, and
thereby improves forecast quality considerably for the competition dataset.
Detailed derivations for this model are provided in appendix A.
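Since a certain degree of regularity is the key property that distinguishes the CBG/CNBD-k model, a brief numerical illustration may help. An Erlang-k interpurchase time is the sum of k exponential phases, so its coefficient of variation is 1/√k: for k = 1 it reduces to the exponential timing implied by the Poisson assumption, while larger k produces more regular timing. The following is a minimal simulation sketch (in Python for illustration only; the thesis implementation itself uses R):

```python
import random

random.seed(42)

def erlang_interevent_times(k, n=100000):
    # An Erlang-k variable is the sum of k i.i.d. exponential phases;
    # a rate of k per phase keeps the mean interevent time fixed at 1.
    return [sum(random.expovariate(k) for _ in range(k)) for _ in range(n)]

cvs = {}
for k in [1, 2, 4]:
    times = erlang_interevent_times(k)
    mean = sum(times) / len(times)
    var = sum((t - mean) ** 2 for t in times) / len(times)
    cvs[k] = var ** 0.5 / mean  # coefficient of variation, theoretically 1/sqrt(k)
    print(k, round(cvs[k], 2))
```

The decreasing coefficient of variation for larger k is exactly the "regularity" that the CNBD-k timing component exploits.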

1.4 Use Cases


But before diving into the details of the models at hand, we try to spark
the reader’s motivation by providing some common usage scenarios of
noncontractual relations with repeated transactions. These use cases have
already been studied in various articles and should give an idea of the
broad field of applications for such models.

• Customers of the (former) online music store CDNOW (Fader et al.,
2005a). This dataset is also publicly available at
http://brucehardie.com/notes/008/, and has been used in numerous other
articles (Abe, 2008; Hoppe and Wagner, 2007; Batislam et al., 2007; Fader
et al., 2005b; Fader and Hardie, 2001; Wübben and von Wangenheim, 2008)
to benchmark the quality of various models.

• Clients of a financial service broker (Schmittlein et al., 1987).

• Members of a frequent shopper program at a department store in Japan
(Abe, 2008).

• Consumers buying at a grocery store (Batislam et al., 2007). Individual
data can be collected by providing client cards combined with some sort of
loyalty program.

• Business customers of an office supply company (Schmittlein and
Peterson, 1994).

• Clients of a catalog retailer (Hoppe and Wagner, 2007).

But actually, whenever “a customer purchases from a catalog retailer, walks
off an aircraft, checks out of a hotel, or leaves a retail outlet, the firm
has no way of knowing whether and how often the customer will conduct
business in the future” (Wübben and von Wangenheim, 2008, p. 82). As such,
the usage scenarios are practically unlimited.
One other example from the author’s own business experience is the
challenge of assessing the number of active users of a free web service,
such as a blogging platform. Users can be uniquely identified by a
permanent cookie stored in the browser client when they access the site.
Each posting of a new blog entry can be seen as a transaction, and
therefore these models could also provide answers to questions like “How
many of the registered users are still active?” and “How many blog entries
will be posted within the next month by each one of them?”.
This thesis hopefully sheds some light on how to find accurate answers to
questions like these.
Chapter 2

DMEF Competition

2.1 Contest Details


The Direct Marketing Educational Foundation¹ (DMEF) is a US-based
nonprofit organization with the mission “to attract, educate, and place top
college students by continuously improving and supporting the teaching of
world-class direct / interactive marketing”². It is an affiliate of the
Direct Marketing Association Inc.³, and is founder and publisher of the
“Journal of Interactive Marketing”⁴.
The purpose of the competition is “to compare and improve the estimation
methods and applications for [lifetime value and customer]”, which “have
attracted widespread attention from marketing researchers [..] over the past
15 years” (May, Austin, Bartlett, Malthouse, and Fader, 2008). The
participating teams were provided with a data set from a “leading US
nonprofit organization” (which is not named), containing “detailed
transaction and contact history of a cohort of 21,166 donors, that it
acquired during the first half of 2002” (May et al., 2008), over a period of
4 years and 8 months. The transaction records include a unique donor ID,
the timing and the amount of each single donation, together with a (rather
cryptic) code for the type of contact. The contact data include records of
each single contact, together with the contacted donor, the timing and the
implied costs of that contact.

¹ cf. http://www.directworks.org/
² http://www.directworks.org/About/Default.aspx?id=386, retrieved on 9 Oct
2008
³ cf. http://www.the-dma.org/
⁴ cf. https://www.directworks.org/Educators/Default.aspx?id=220


The first phase of the competition consisted of three separate estimation
tasks for a target period of two years:

1. Estimate the donation sum on an aggregated level.

2. Estimate the donation sum on an individual level.

3. Estimate which of the donors whose last donation occurred before
September 1, 2004 will donate at all.

The prize for winning any of these three tasks is $500; in addition, the
winning teams are “invited to write a short note describing the winning
model [...], which will be published in the ‘Journal of Interactive
Marketing’ ” (May et al., 2008)⁵.
Task 1 results in a single figure, whereas for tasks 2 and 3 a data file
containing donor IDs and the corresponding estimates had to be submitted.
Winning task 1 is rather a guessing game, considering the number of
participants and the irregular fluctuations throughout the training period.
Our main focus has therefore been on task 2, while our guess for task 3 is
derived straightforwardly from our calculations for task 2.
An error measure is defined for each of the three tasks by the contest
organizing committee, and the submissions of the participating teams are
evaluated with respect to these measures. Closeness on the aggregated level
(task 1) is simply defined as the absolute deviation from the actual
donation amount. The error measure for task 3 is the percentage of
correctly classified cases. The error measure for task 2 is somewhat
uncommon, and is defined as the mean squared logarithmic error:

    MSLE = ∑_i (log(y_i + 1) − log(ŷ_i + 1))² / 21,166,        (2.1)

with the 1 added to avoid taking the logarithm of 0, and with 21,166 being
the size of the cohort. More common distance measures for evaluating
forecasts on an individual level are the mean absolute error (MAE), the
root mean squared error (RMSE), the root median of squared errors (Wübben
and von Wangenheim, 2008), or simply the correlation between the estimated
and the actual data (Fader et al., 2005a). Hoppe and Wagner (2007, p. 85)
used the geometric mean relative absolute error (GMRAE), which measures
errors relative to some particular benchmark model (in their article they
benchmarked against the NBD model) and thereby also allows comparisons
between different data sets.

⁵ It is noteworthy that Pete Fader, author of the BG/NBD model, which is
presented in the following, is a member of the contest organizing committee.
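For reference, these error measures are straightforward to implement. A minimal sketch in Python with made-up example vectors (illustrative only; the calculations in this thesis are carried out in R):

```python
import math

def msle(y, y_hat):
    # Mean squared logarithmic error as in equation (2.1).
    return sum((math.log(a + 1) - math.log(b + 1)) ** 2
               for a, b in zip(y, y_hat)) / len(y)

def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def gmrae(y, y_hat, y_bench):
    # Geometric mean of the absolute errors relative to a benchmark forecast.
    ratios = [abs(a - b) / abs(a - c) for a, b, c in zip(y, y_hat, y_bench)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Made-up example: actual vs. forecast donation counts for three donors,
# plus the forecasts of a hypothetical benchmark model.
y = [0.0, 1.0, 3.0]
y_hat = [0.5, 1.5, 2.0]
y_bench = [1.0, 2.0, 1.0]

print(round(msle(y, y_hat), 4), round(mae(y, y_hat), 4),
      round(rmse(y, y_hat), 4), round(gmrae(y, y_hat, y_bench), 4))
```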
The author assumes that the mean squared logarithmic error has been chosen
over the root mean squared error because it is less sensitive to large
values, which are generated by only a very small portion of cases, and thus
puts the emphasis on an accurate estimate for the dominant low-purchase
class. But, as we will show later on in the simulation studies, the MSLE
favors forecasts that systematically underestimate. This also becomes
apparent when considering that in the case of a 50% chance of y = 0 and a
50% chance of y = 1 the MSLE is minimized by ŷ = √2 − 1 ≈ 0.414, as opposed
to ŷ = 0.5 · 0 + 0.5 · 1 = 0.5 for minimizing the RMSE. For the competition
we tried to take advantage of this particular characteristic of the MSLE.
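The underestimation bias of the MSLE can also be checked numerically. The following sketch (in Python for illustration; the calculations in this thesis are carried out in R) performs a grid search for the constant forecast that minimizes the expected MSLE, resp. MSE, in the 50/50 case described above:

```python
import math

# Outcome y is 0 with probability 0.5 and 1 with probability 0.5.
def expected_msle(y_hat):
    # E[(log(y + 1) - log(y_hat + 1))^2] for a constant forecast y_hat
    return 0.5 * (math.log(1) - math.log(y_hat + 1)) ** 2 \
         + 0.5 * (math.log(2) - math.log(y_hat + 1)) ** 2

def expected_mse(y_hat):
    # E[(y - y_hat)^2] for a constant forecast y_hat
    return 0.5 * (0 - y_hat) ** 2 + 0.5 * (1 - y_hat) ** 2

# Grid search over candidate constant forecasts in [0, 1].
grid = [i / 10000 for i in range(10001)]
best_msle = min(grid, key=expected_msle)
best_mse = min(grid, key=expected_mse)

print(best_msle)  # close to sqrt(2) - 1 = 0.4142..., i.e. below the mean
print(best_mse)   # 0.5
```

The MSLE-optimal forecast lies below the expected value, which is exactly the property exploited for the competition.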
The deadline for submitting calculations for phase 1 (tasks 1 to 3) was
September 15, 2008. The results for the 25 participating teams were
announced a couple of weeks afterwards and have been discussed at the
DMEF’s Research Summit in Las Vegas⁶.

2.2 Data Set


The data set contains records of 53,998 donations by 21,166 distinct donors,
ranging from January 2, 2002 until August 31, 2006. In compliance with the
definition of the cohort, each donor made at least one donation during the
first half of 2002. Each donation record includes the unique donor ID, the
date, the amount in USD and the type of contact associated with the
transaction. See table 2.1.
Additionally, detailed contact records for each donor of the cohort are
provided, together with their associated costs. These 611,188 records range
from September 10, 1999 until August 28, 2006. Each contact record contains
the donor ID, the date, the type of contact and the associated costs of the
contact. See table 2.2.
According to May et al. (2008), “the full data set, including 1 million
customers, 17 years of transaction and contact history, and contact costs,
will be released for general research purposes”, and should become available
at https://www.directworks.org/Educators/Default.aspx?id=632.

⁶ cf. http://www.researchsummit.org/

id date amt source


8128357 2002-02-22 5 02WMFAWUUU
9430679 2002-01-10 50 01ZKEKAPAU
9455908 2002-04-19 25 02WMHAWUUU
9652546 2002-04-02 100 01RYAAAPBA
9652546 2003-01-06 100 02DEKAAGBA
9652546 2004-01-05 100 04CHB1AGCB
.. .. .. ..
13192422 2005-02-11 50 05HCPAAICD
13192422 2005-02-16 50 05WMFAWUUU

Table 2.1: Transaction Records


id date source cost
9652546 2000-07-20 00AKMIHA28 0.2800000
9430679 2000-07-07 00AXKKAPAU 0.3243999
9455908 2000-07-07 00AXKKAPAU 0.3243999
11303542 2000-07-07 00AXKKAPAU 0.3243999
11305422 2000-01-14 00CS31A489 0.2107999
11261005 2000-01-14 00CS31A489 0.2107999
.. .. .. ..
11335783 2005-09-01 06ZONAAMGE 0.4068198
11303930 2005-09-01 06ZONAAMGE 0.4068198

Table 2.2: Contact Records

The competition data set therefore represents only a small subset of the
complete available data, which has been provided by the NPO after the
competition.

2.3 Game Plan


Before starting out with the model building, an in-depth exploratory
analysis of the data set is performed in order to gain insight into its key
characteristics. Various visualizations provide a comprehensive overview of
these characteristics, and will help in understanding certain features
throughout the modeling process.
As mentioned above, our main emphasis has been on winning task 2, i.e. on
finding the “best” forecast model, the one that yields the lowest MSLE for
the target period. But of course no data for the target period has been
available before the deadline of the competition, and therefore we have

to split the provided data into a training period and a validation period.
The training data is used for calibrating the model and its parameters,
whereas the validation data enables us to compare forecast accuracy among
the models. By choosing several different lengths of training periods, as
has also been done by Schmittlein and Peterson (1994), Batislam et al.
(2007) and Hoppe and Wagner (2007), for example, we can further improve the
robustness of our choice. After picking a model for the competition, the
complete provided data set is used for its final calibration.
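The calibration/holdout splitting described above can be sketched as follows (a minimal illustration in Python with made-up records; the actual preprocessing in this thesis is done in R, see appendix B):

```python
from datetime import date

# Made-up transaction log: (donor_id, donation_date, amount in USD).
transactions = [
    (1, date(2002, 2, 22), 5.0),
    (1, date(2004, 6, 1), 10.0),
    (2, date(2002, 1, 10), 50.0),
    (2, date(2005, 3, 15), 25.0),
    (2, date(2006, 4, 2), 25.0),
]

def split_at(records, cutoff):
    # Records up to the cutoff date calibrate the model; later records
    # serve as the holdout against which forecasts are validated.
    training = [r for r in records if r[1] <= cutoff]
    validation = [r for r in records if r[1] > cutoff]
    return training, validation

# Vary the training-period length to check the robustness of the model choice.
for cutoff in [date(2004, 8, 31), date(2005, 8, 31)]:
    training, validation = split_at(transactions, cutoff)
    print(cutoff, len(training), len(validation))
```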
Despite the fact that a strong causal relation between contacts and actual
donations can be assumed, we do not include the contact information in our
model building. The main reason is that such data is not available for the
target period and cannot be reliably estimated either. We therefore
implicitly assume that direct marketing activities will follow a similar
pattern as in the past, and simply neglect this information. A similar
assumption is made regarding all other possible exogenous influences, such
as competition, advertisement, public opinion, etc., due to the lack of
information.
The probabilistic models under investigation all model the purchase
opportunity⁷ as opposed to the actual purchase amount. Assuming independence
between the purchase amount and the purchase rate, resp. the defection rate,
we simply estimate the average amount per customer in a separate step (see
section 4.10), on top of the estimated number of future purchases.
An estimate for task 3 is directly derived from task 2, as we assume that
any customer with an estimated number of purchases of 0.5 or higher will
actually make a purchase. Task 1 could likewise be deduced from task 2 by
simply building the sum over all individual estimates.
simply building the sum over all individual estimates.
All of our following calculations and visualizations are carried out on top
of the statistical programming environment R⁸ (R Development Core Team,
2008), which is freely available, well documented, widely used in academic
research, and provides a large repository of additional libraries.
Unfortunately, the presented probabilistic models are not (yet) part of an
existing library, and therefore we implement them ourselves, based upon the
analytical results of the referenced articles. Together with the estimates
regarding the CDNOW data set⁹ published within those articles, we are able
to verify the correctness of our implementations.

⁷ From time to time we will refer to donations as purchases, and to donors
as consumers or clients, as these are the general terms used in the
marketing literature.
⁸ Version 2.7.2
⁹ http://brucehardie.com/notes/008/
Chapter 3

Exploratory Data Analysis

In this chapter an in-depth descriptive analysis of the contest data set is
undertaken. Several key characteristics are outlined and concisely
visualized in the corresponding graphs. These will provide valuable input
for the subsequent model fitting process in chapter 4.

3.1 Key Summary

Nr of donors: 21,166
Cohort time length: 6 months
Available time frame: 4 years 8 months
Available time units: days
Nr of zero repeaters (absolute; relative): 10,626; 50.2%
Nr of repetitive donations (mean; sd¹; max): 1.55; 2.93; 55
Donation amount (mean; sd; max): $39.31; $119.32; $10,000
Time between donations (mean; sd; max): 296 days; 260 days; 1,626 days
Time until last donation (mean; sd): 460 days; 568 days

Table 3.1: Descriptive Statistics

The data set consists of a rather large², heterogeneous cohort of donors.
Heterogeneity can be observed in the donation frequency, in the donation
amount, in the time lapse between succeeding donations, as well as in the
overall recorded lifetime.

² In comparison to the CDNOW data set, for example.

On the one hand, the majority (50.2%) did not donate at all after their
initial donation; on the other hand, some individuals made up to 55
donations. The amount per transaction ranges from as little as a quarter of
a dollar up to $10,000, with the standard deviation being three times larger
than the mean. These simple statistics already make it clear that any model
considered for fitting the data should be able to account for this kind of
heterogeneity.
It can also be noted that the covered time span of the records is quite long
(as is the target period of 2 years). This implies that people who are still
active at the end of that 4-year-and-8-month period are rather loyal,
long-term customers. But it also means that assuming stationarity of the
underlying mechanism (resp. of the model parameters) might not hold true.

[Figure: event plot “Various Timing Patterns”, showing the individual
donation dates of 12 donors as tick marks; x-axis: Time Scale (2002 to
2006), y-axis: Donor ID.]

Figure 3.1: Timing patterns for 12 randomly selected donors.

An important feature of the data set is that donation (as well as contact)
records are given with their exact timing, and are neither aggregated to
longer time spans nor condensed to simple frequency counts. Therefore we can
and should use the information on the exact timing of the donations in our
further analysis. A first ad-hoc visualization (see figure 3.1) of 12
randomly selected donors already displays some of the characteristic timing
patterns, ranging from one-time donors (e.g. ID 11259736), over sporadic
donors (e.g. ID 11359536), to regular donors that have obviously already
defected (e.g. ID 10870988). Therefore, besides heterogeneity in donation
frequency, we should also focus on models that can account for some sort of
defection process.

3.2 Distributions on Individual Level

[Figure: bar chart “Distribution of Nr of Donations”; x-axis: Nr of
Donations (1 to 8+), y-axis: Nr of Donors. The relative shares per bar are
50.2%, 16.9%, 10.8%, 7.6%, 6.3%, 3.9%, 2.6% and 1.6%.]

Figure 3.2: Distribution of number of donations.

Besides the aforementioned 50.2% of one-time donors, another large share of
individuals (42%, see figure 3.2) donated fewer than 6 times, i.e. on
average about once a year or less. Only 8% of the customer base (1,733
people in total) can be considered regular donors with 6 or more donations.
Yet these 8% of the donors account for over half of the transactions (51.5%)
in the last year of the observation period.
It is important to point out that a low number of recorded donations can be
the result of two different causes: either it really stems from a (very) low
donation frequency, or from the fact that people have defected, i.e. turned
away from the NPO and do not intend to donate anymore. An upcoming
challenge will be to distinguish these two mechanisms within the data.
The recorded donation amounts span a wide value range.

[Figure: histogram “Distribution of Donation Amounts” on a logarithmic
scale; x-axis: Donation Amount ($0.25 to $10,000), y-axis: Relative
Frequency. The most frequent amounts are $25, $10, $50, $20, $15, $5 and
$100.]

Figure 3.3: Distribution of donation amounts.

little as a quarter dollar up to a single generous donation of $10,000. Regard-


ing the overall distribution of donation amount a visual inspection of figure
3.3 indicates that, to some degree, the amounts follow a log-normal distri-
bution3 , whose values are restricted to certain integers. 89% of the 53,998
donations are accounted by some very specific dollar amounts (namely $5,
$10, $15, $20, $25, $50 and $100). The other donation amounts seem to play
a minor role. Though special attention should be directed to those few large
donations, as the 3% of donations, which exceed 100 dollars, actually sum
up to 30% of the overall donation sum.
Looking for a possible relation between the average amount of a single donation and the number of donations per individual in figure 3.4, we can state that single-time donors as well as very active donors (7+) tend to spend a little less money per donation act. This result seems plausible, as single-time donors rather "cautiously try out the product", and heavy donors spread their overall donation over several transactions. Nevertheless, the observed correlation between these two variables is minimal, and will be neglected
by us in the following.

³ The dashed gray line in figure 3.3 represents a kernel density estimation with a broad bandwidth.

[Figure: box plots of average donation amount (up to $100) by number of donations (1 to 8+)]

Figure 3.4: Conditional distribution of average donation amounts vs. number of donations. The widths of the drawn boxes are proportional to the square roots of the number of observations in the corresponding groups.

3.3 Trends on Aggregated Level

Possible trends in the data are now analyzed on an aggregated level. Most of the following charts in this section share the same structure: the connected line represents sums over quarters of a year, and the horizontal lines are averages over 4 of these quarters. We have chosen to smooth over quarters instead of charting a day-to-day time series in order to reduce noise. The displayed percentage changes indicate the change from one year to the next. Note, however, that these averages cover the second half of one year and the first half of the next year. This shifted year average has been chosen since the covered time range of the competition data ends slightly after the second quarter of 2006.
[Figure: quarterly donation sums 2002 to 2006, with year-over-year changes of +8%, -24% and -3%]

Figure 3.5: Overall trend in donation sum.

Charting the evolution of overall donation sums (figure 3.5) already displays various interesting properties. First of all it is apparent that donations show a sharp decline right after the second quarter of 2002. This effect is plausible if we recall that our cohort by definition consists of new donors who donated within the first half of 2002, and that a large portion of donors donated just a single time. Further, the data shows a strong seasonal fluctuation, with the third quarter being the weakest and the fourth and first quarters being the strongest periods, each showing about twice as many donations as the third quarter. There also seems to be a downward trend in donation sums, but the percentage changes show that the speed of this trend is unclear: in the beginning we even record an 8% increase, then a sharp 24% drop, followed by a moderate 3% decrease over the last year. Task 1 of the competition is to estimate the future trend of these aggregated donation sums for the next two years. Considering the erratic movements, this is quite a challenge.
The overall donation sum is the product of the number of donations and the average donation amount. Charting the trends in these two variables separately in figure 3.6 provides some further insight.

[Figure: quarterly number of donations (changes -13%, -15%, -14%) and average donation amount (changes +24%, -10%, +12%), 2002 to 2006]

Figure 3.6: Trend in number of donations and average donation amount.

Regarding the number of donations, the seasonality, with the peak around the Christmas holidays, is again apparent and seems plausible. The continuous downward trend (-13%, -15%, -14%) in the transaction numbers is quite stable, and as such predictable. A simple heuristic could, for example, assume a constant decrease rate of 14% for the next two years. As has been noted in the preceding section, this downward trend can either result from a decreasing donation frequency for each donor, or from an ongoing defection process. Figure 3.7 indicates that the latter is rather the case: the share of active donors is steadily decreasing⁴, whereas the average number of donations per active donor is slightly increasing.

[Figure: yearly share of donors who donated within that year (declining from 29.5% to 18.8% over 2002 to 2005) and average number of donations per active donor (between 1.42 and 1.55)]

Figure 3.7: Trend in activity.

⁴ Note that we do not count the initial donation, as otherwise the share for 2002 would simply be 100%.

Regarding the erratic movement of the overall sum we can conclude, due to the stable decline of donation numbers, that it results from the ups and downs in the average donation amount. This chart (on the right-hand side of figure 3.6) also, surprisingly, shows seasonal fluctuation, and no clear overall trend at all, making it hard to extrapolate into the future.

[Figure: four panels showing quarterly donation sums (+8%, -24%, -3%), contact costs (+25%, -16%, -33%), number of contacts (-3%, -30%, -7%) and average contact cost (+22%, +19%, -24%), 2002 to 2007]

Figure 3.8: Trend in contacts.

One possible explanation might be contained in the contact records, which have been provided by the organizing committee together with the transaction details. Each donation is linked to a particular contact, but certainly not each contact resulted in a donation. A natural assumption is therefore that the amount of contacts (their number and their costs) has a strong influence on the donation sums. Figure 3.8 strongly supports this assumption: we see the same seasonal variation in the contact activities, regarding both the number and the average costs. Furthermore, the increase in donation sums in 2003/2004 can now be explained by a tremendous increase of 25% in contact spending during that period. On the other hand, the NPO was able to cut costs in 2005/2006 by 33% (mostly due to a 24% drop in average contact costs) without hurting the generated contributions. But, as has been argued before in section 2.3, despite the obvious relation between contacts and donations, we cannot take any advantage of it for our forecast tasks, as we have absolutely no information regarding future contact activities.

3.4 Interpurchase Times

[Figure: histogram of the number of months between donations (0 to 51), rescaled to months]

Figure 3.9: Interpurchase times.

The disaggregated availability of transaction data on a day-to-day basis allows us an inspection of the observed intertransaction times, i.e. the lapsed time between two succeeding donations of an individual⁵. Figure 3.9 depicts the overall distribution, rescaled to months in order to reduce some noise in the chart. The distribution contains two peaks, the first being one-month intervals, the second being one-year intervals. Further, we see that only very rarely (in 1.4% of the cases) do donations occur within a single month. It seems that there is a dead period of about a month until someone is willing to make another donation. It is also interesting to note that in 5% of the cases we have a waiting period of more than 24 months, and that there are even values higher than 4 years. This is a sign that some customers remain inactive for a very long period, but subsequently decide to make a single donation again. This characteristic will make it hard to model the defection process correctly, as it seems that some long-living customers just never defect, but are rather "hibernating" and can be reactivated anytime.

⁵ Also commonly termed "interpurchase times" or "interevent times".

Figure 3.10 shows that light and frequent donors have differing distributions of intertransaction times, with the former donating about every year, and the latter donating regularly each month. As we will see, this particular observed regularity will play a major role in the upcoming modeling phase.

[Figure: histograms of intertransaction times in days for light donors (2, 3 or 4 donations; 8,814 donors, 18,352 donations; yearly intervals ~8%) and for frequent donors (5 or more donations; 1,733 donors, 14,480 donations; monthly intervals ~10%)]

Figure 3.10: Intertransaction times split by frequency.


Chapter 4

Forecast Models

4.1 NBD Model

4.1.1 Assumptions

As early as 1959 Andrew Ehrenberg¹ published his groundbreaking article "The Pattern of Consumer Purchases" (Ehrenberg, 1959), in which he suggested the Negative Binomial Distribution (abbr. NBD) as a fit to aggregated count data of sales of non-durable consumer goods². Ever since, Ehrenberg's paper has been cited numerous times in the marketing literature, and various models have been derived upon his work, proving that his assumptions are reasonable and widely applicable.
Besides the sheer benefit that a well-fitting probability distribution is found, Ehrenberg further provides a logical justification for choosing that particular distribution within his paper. He reasons that each consumer purchases according to a Poisson process, and that the average purchase rate varies among consumers according to a gamma distribution³. Now, the Negative Binomial Distribution is exactly the distribution that arises from such a Gamma-Poisson mixture. Table 4.1 summarizes the postulated assumptions.
¹ See http://www.marketingscience.info/people/Andrew.html for a brief summary of his major achievements in the field of marketing science.
² I.e. a discrete distribution is proposed that should fit the data represented in figure 3.2 on page 14.
³ He actually assumed a χ²-distribution in his original article, but this is simply a special case of the more general gamma distribution.

A1 The number of transactions follows a Poisson process with rate λ.

A2 Heterogeneity in λ follows a gamma distribution with shape parameter r and rate parameter α across customers.

Table 4.1: NBD Assumptions
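The resulting Gamma-Poisson mixture can be illustrated with a minimal numerical check (parameter values chosen arbitrarily): integrating the Poisson probabilities over a gamma-distributed rate reproduces the NBD probabilities.

```python
import math

def nbd_pmf(x, r, alpha, T=1.0):
    """NBD probability of observing x transactions in (0, T]
    under assumptions A1 (Poisson) and A2 (gamma heterogeneity)."""
    return (math.gamma(r + x) / (math.gamma(r) * math.factorial(x))
            * (alpha / (alpha + T)) ** r
            * (T / (alpha + T)) ** x)

def gamma_poisson_mixture(x, r, alpha, T=1.0, steps=20000, upper=50.0):
    """Midpoint-rule integration of Poisson(x | lam * T) against the
    gamma(r, alpha) density of lam; should reproduce nbd_pmf."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        gamma_pdf = (alpha ** r) * lam ** (r - 1) * math.exp(-alpha * lam) / math.gamma(r)
        poisson = math.exp(-lam * T) * (lam * T) ** x / math.factorial(x)
        total += poisson * gamma_pdf * h
    return total

r, alpha = 2.0, 1.5   # arbitrary shape and rate parameters
for x in range(5):
    assert abs(nbd_pmf(x, r, alpha) - gamma_poisson_mixture(x, r, alpha)) < 1e-4
```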

In order to support the reader's understanding of the postulated assumptions, visualizations of the aforementioned distributions are provided in figures 4.1, 4.2 and 4.3 for various parameter constellations.
The Poisson distribution is characterized by the fact that its mean λ equals its variance. Further, it can be shown that assuming Poisson distributed transaction counts for an individual is equivalent to assuming that the lapsed time between two succeeding transactions follows an exponential distribution, i.e. the Poisson process with rate λ is the corresponding counting process for a timing process with independent exponentially distributed waiting times with mean 1/λ (Chatfield and Goodhardt, 1973).
The exponential distribution itself is a special case of the gamma distribution, with the shape parameter being equal to 1 (see the middle chart in figure 4.3). An important property of exponentially distributed random variables is that they are "memoryless": any information regarding the time lapsed since the last event does not change the probability of an event occurring within the immediate future,

P(T > s + t | T > s) = P(T > t)   for all s, t ≥ 0.

Mathematically such a property might be appealing, as it simplifies some derivations, but applied to our use case it implies that the timing of a purchase does not depend at all on the timing of the last purchase. This conclusion is quite contrary to common intuition, which would rather suggest that non-durable consumer goods are purchased with a certain regularity: if a consumer buys a certain good, such as a package of detergent, she will wait with her next purchase until that package is nearly consumed. But, even more troublesome, the memoryless property further implies that the most likely time for another purchase is immediately after a purchase has occurred⁴ (Morrison and Schmittlein, 1988, p. 148).

[Figure: probability mass functions of the Negative Binomial Distribution for (r = 1, p = 0.4), (r = 1, p = 0.2) and (r = 3, p = 0.5)]

Figure 4.1: Probability mass function of the Negative Binomial Distribution for different parameter values.

[Figure: probability mass functions of the Poisson Distribution for λ = 0.9, 2.5 and 5]

Figure 4.2: Probability mass function of the Poisson Distribution for different parameter values.

[Figure: probability density functions of the Gamma Distribution for shape = 0.5, 1 and 2, each with rate = 0.5]

Figure 4.3: Probability density function of the Gamma Distribution for different parameter values.
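The memoryless identity can be verified directly from the exponential survival function P(T > t) = e^(−λt); a minimal sketch (the rate and time spans are arbitrary):

```python
import math

def survival(t, lam):
    """P(T > t) for an exponentially distributed interpurchase time T."""
    return math.exp(-lam * t)

lam = 0.8        # arbitrary transaction rate
s, t = 2.0, 3.0  # arbitrary elapsed time and look-ahead horizon

# P(T > s + t | T > s) = P(T > s + t) / P(T > s)
conditional = survival(s + t, lam) / survival(s, lam)

# memorylessness: conditioning on having already waited s changes nothing
assert abs(conditional - survival(t, lam)) < 1e-12
```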


For other fields of application, such as the decay of radioactive particles, the occurrence of accidents or the arrival of customers in a queue, where the Poisson distribution also fits very well, the memoryless property does withstand basic face validity checks. For example, it seems intuitive that the particular arrival time of one customer in a queue is absolutely independent of the arrival of the next customer, as the two do not interact with each other. In other words, the fact that a customer has just arrived does not change the timing of the arrival of the next one; we therefore have a memoryless process.
As has been argued above, this is not the case for purchases of non-durable consumer goods by an individual customer. The regularity of consumption of a good does lead to a certain degree of regularity regarding its purchase. Ehrenberg was aware of this effect (Ehrenberg, 1959, p. 30), but only requires in his original paper that the observed periods should not be "too short, so that the purchases made in one period do not directly affect those made in the next" (Ehrenberg, 1959, p. 34).
Assumption A2, regarding the heterogeneity of purchase rates, postulates a gamma distribution across customers. Considering the different possible shapes of this two-parameter continuous probability distribution, it is safe to state that such an assumption adds quite some flexibility to the model. But other than the added flexibility and the fact that the distribution is positively skewed, no reasoning for choosing that particular distribution is provided.
Nevertheless, by explicitly modeling heterogeneity, Ehrenberg applies a powerful trick: he utilizes information of the complete customer base for modeling on an individual level, and thereby takes advantage of the well-established "regression to the mean" phenomenon. "[We] can better predict what the person will do next if we know not only what that person did before, but what other people did." (Greene, 1982, p. 130, cited according to Hoppe and Wagner, 2007, p. 80). In other words: "While there is not enough information to reliably estimate [the purchase rate] for each person, there will generally be enough to estimate the distribution of [it] over customers. [..] This approach, estimating a prior distribution from the available data, is usually called an empirical Bayes method" (Schmittlein et al., 1987, p. 5).
Concluding, despite a possibly violated assumption A1⁵ and a somewhat arbitrary assumption A2 regarding the shape of heterogeneity, the Negative Binomial Distribution fits empirical market data very well (Wagner and Taudes, 1987, p. 16).

⁴ This can also be seen from the middle chart of figure 4.3, as the density function reaches its maximum at zero.
⁵ See the histograms in Herniter (1971, p. 104) for some empirical evidence.

4.1.2 Empirical Results

We will now apply the NBD model to the dataset from the DMEF competition. First, we estimate the according parameters, then check how well the model fits the data on an aggregated level, and finally we calculate individual estimates.
Ehrenberg suggests an estimation method for the parameters α and r that only requires the mean number of purchases m and the proportion p0 of non-buyers (Ehrenberg, 1959). Yet with modern computational power it is no problem anymore to apply Maximum Likelihood Estimation (abbr. MLE) for these parameters for the cohort size at hand. The MLE method finds those parameter values for which the "likelihood" of the observed data is maximized, and has the favorable property of being an asymptotically unbiased, asymptotically efficient and asymptotically normal estimator.
The calculation of the likelihood for the NBD model requires two pieces of information for each individual: the length of the observed time span T, and the number of transactions x within the interval (0, T]. The time span T differs for each donor, resp. customer, as the date of the first transaction has taken place anytime within the first half of 2002. Also note that x does not include the initial transaction, as that transaction occurred by definition for each person of our cohort. As we will see, the upcoming models will also require another piece of information, namely the recency of each customer, i.e. the timing tx of the last recorded transaction. With this notation we closely follow the variable conventions used in Schmittlein et al. and Fader et al. Generally, the summary for each customer consisting of recency, frequency and a monetary value is often referred to as RFM variables, and is commonly, not just for probabilistic models, the condensed data base of many customer base analyses. The layout of the transformed data is depicted in table 4.2.
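The transformation of a raw transaction log into this (x, tx, T) layout can be sketched as follows (pure Python; the field names, and the use of the average amount for amt, are assumptions of this sketch):

```python
from collections import defaultdict

def to_rfm(transactions, end_of_period):
    """Summarize a transaction log into (x, t_x, T, amt) per customer.

    transactions: list of (customer_id, day, amount) tuples, where day is
    measured in days since a common origin.  x counts repeat transactions
    only, t_x is the recency relative to the first (cohort-defining)
    donation, and T is the observed time span.
    """
    by_customer = defaultdict(list)
    for cust, day, amount in transactions:
        by_customer[cust].append((day, amount))

    rfm = {}
    for cust, events in by_customer.items():
        events.sort()
        first_day = events[0][0]            # initial donation defines the cohort entry
        x = len(events) - 1                 # repeat transactions only
        t_x = events[-1][0] - first_day     # recency: timing of last transaction
        T = end_of_period - first_day       # observed time span
        amt = sum(a for _, a in events) / len(events)  # average amount (assumption)
        rfm[cust] = (x, t_x, T, amt)
    return rfm

log = [("A", 10, 25.0), ("A", 380, 50.0), ("B", 40, 10.0)]
print(to_rfm(log, end_of_period=1000))
# {'A': (1, 370, 990, 37.5), 'B': (0, 0, 960, 10.0)}
```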
id         x   tx     T     amt
10458867   0   0      1605  25.42
10544021   1   728    1602  175.00
10581619   7   1339   1592  80.00
...        ..  ...    ...   ...
9455908    0   0      1595  25.00
9652546    4   1365   1612  450.00
9791641    4   1488   1687  275.00

Table 4.2: DMEF data converted to RFM

The MLE estimation procedure applied on the transformed data results in

the following parameter estimates:

r = 0.475 (shape parameter), and
α = 498.5 (rate parameter),

with both parameters being highly significantly different from zero. The general shape of the resulting gamma distribution can be seen in the left chart of figure 4.3, i.e. it is reverse J-shaped. This implies that the mass of donors has a very low donation frequency, with the mode at zero, the median at 0.00042 and the mean at 0.00095 donations per day. In terms of average intertransaction times this result reflects an average period of 1048 days (= 2.9 years) between two succeeding donations, with half of the donors donating less often than every 2406 days (= 6.6 years). Considering that the majority of donors did not re-donate at all during the observation period (see section 3.2), these long intertransaction times are obviously a consequence of the overall low observed donation frequencies.
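The log-likelihood that the MLE procedure maximizes can be sketched as follows (a minimal version of the standard NBD likelihood; the optimizer itself, e.g. a Nelder-Mead search over (r, α), is omitted, and this is not necessarily the exact code used for the thesis):

```python
import math

def nbd_log_likelihood(r, alpha, customers):
    """Sum of per-customer NBD log-likelihoods.

    customers: list of (x, T) pairs, i.e. x repeat transactions within
    the observed time span (0, T], as in the RFM layout of table 4.2.
    """
    ll = 0.0
    for x, T in customers:
        # log of the NBD pmf: Gamma(r+x)/(Gamma(r) x!) * (a/(a+T))^r * (T/(a+T))^x
        ll += (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
               + r * math.log(alpha / (alpha + T))
               + x * math.log(T / (alpha + T)))
    return ll

# evaluate for two hypothetical donors at arbitrary parameter values;
# an optimizer would search (r, alpha) for the maximum of this function
print(nbd_log_likelihood(0.5, 500.0, [(0, 1605), (7, 1592)]))
```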
But how well does our model now represent the data? To answer this question, we compare the actual observed donation counts with the corresponding theoretical counts calculated by the NBD model. Table 4.3 contains the result.

Nr of donations   0       1      2      3      4      5    6    7+
Actual            10,626  3,579  2,285  1,612  1,336  548  348  832
NBD               10,617  3,865  2,183  1,379  918    629  439  1,135

Table 4.3: Comparison of actual vs. theoretical count data for the complete time span.

As we can see, despite the irritating distribution of λ, we have a nearly perfect fit for the large share of non-repeaters. Yet the estimated group sizes of the more frequent donors drift apart for higher counts, indicating that the model is not able to fit our data structure well.
Let us now take a look at the predictive accuracy of the NBD model on an individual level, which is, considering the DMEF competition, our main focus when evaluating models. We therefore split the overall observation period of 4 years and 8 months into a calibration period of 3.5 years and a validation period of one year⁶. By considering only the first 3.5 years for model calibration, slightly different parameter estimates (r = 0.53, α = 501) are returned than before. We now calculate a conditional estimate for each individual for a one-year period, based on the observed frequency x and the observed time span T. Table 4.4 contains the actual and the estimated average number of donations during the validation period, split by the according number of donations during the training period. I.e. those people that did not donate at all within the first 3.5 years donated on average 0.038 times in the following year, whereas the NBD model predicted an average of only 0.001 donations. On the other hand, as can be seen, the future donations of the frequent donors are vastly overestimated. Overall, on an aggregated level, the NBD model estimates 11,088 donations, which is nearly twice the actual number (6,047).

Nr of donations   0      1      2      3      4      5      6      7+
Actual            0.038  0.196  0.428  0.686  0.747  1.061  1.540  2.442
NBD               0.001  0.423  0.844  1.266  1.687  2.109  2.530  4.676

Table 4.4: Comparison of actual vs. theoretical avg. number of donations during the validation period.
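For reference, the textbook way to form such conditional estimates under the plain NBD assumptions is empirical-Bayes updating of the gamma prior: given x transactions in (0, T], the posterior of λ is gamma with shape r + x and rate α + T. The sketch below shows that standard expression; it is not necessarily the exact computation behind table 4.4:

```python
def nbd_conditional_expectation(r, alpha, x, T, t):
    """Expected number of transactions in a future window of length t for a
    customer with x transactions in (0, T]: the gamma prior (r, alpha) is
    updated to a gamma posterior (r + x, alpha + T), and its mean rate is
    multiplied by the forecast horizon t."""
    return (r + x) * t / (alpha + T)

# hypothetical donor: 4 repeat donations in 3.5 years, one-year forecast,
# using the calibration estimates r = 0.53, alpha = 501 (time in days)
print(nbd_conditional_expectation(0.53, 501.0, 4, 3.5 * 365, 365))
```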

One possible explanation for the poor performance of the NBD model is the long overall time period, in combination with the assumption that all donors remain active. The upcoming section presents a model that explicitly takes a possible defection of customers into account.

4.2 Pareto/NBD Model

4.2.1 Assumptions

CONTINUE

⁶ We will only consider validation periods that are multiples of one year, in order to circumvent modeling the strong seasonal influence detected in section 3.3.

The Pareto/NBD model has received growing attention among researchers and managers within recent years⁷. It is derived based on the following assumptions⁸:

A1 While active, the number of transactions follows a Poisson process with rate λ.

A2 Heterogeneity in λ follows a gamma distribution with shape parameter r and rate parameter α across customers.

A3 Customer lifetime is exponentially distributed with death rate µ.

A4 Heterogeneity in µ follows a gamma distribution with shape parameter s and rate parameter β across customers.

A5 The purchasing rate λ and the death rate µ are distributed independently of each other.

Table 4.5: Pareto/NBD Assumptions
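The data-generating story behind assumptions A1-A5 can be made concrete with a small simulation (a sketch with arbitrary parameter values, time measured in weeks):

```python
import random

def simulate_pareto_nbd(r, alpha, s, beta, T, n, seed=42):
    """Simulate n customers according to A1-A5: gamma-distributed purchase
    rate lambda and death rate mu, an exponential lifetime tau, and Poisson
    purchasing (exponential waiting times) while the customer is alive."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        lam = rng.gammavariate(r, 1.0 / alpha)  # A1/A2: purchase rate
        mu = rng.gammavariate(s, 1.0 / beta)    # A3/A4: death rate
        tau = rng.expovariate(mu)               # A3: unobserved lifetime
        active_until = min(tau, T)
        t, x = rng.expovariate(lam), 0
        while t <= active_until:                # A1: purchasing while active
            x += 1
            t += rng.expovariate(lam)
        counts.append(x)
    return counts

counts = simulate_pareto_nbd(r=0.5, alpha=10.0, s=0.5, beta=10.0, T=52.0, n=2000)
print(sum(counts) / len(counts))  # average number of transactions per customer
```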

4.2.2 Empirical Results

TODO: Estimation Methods

4.3 BG/NBD Model

4.3.1 Assumptions

TODO: plot of the beta distribution;

⁷ See Fader et al. (2005a, p. 275).
⁸ For consistency reasons the ordering and wording of the above assumptions are changed compared to the originating paper, in order to ease comparison with the other models presented here.

A1 While active, the number of transactions follows a Poisson process with rate λ.

A2 Heterogeneity in λ follows a gamma distribution with shape parameter r and rate parameter α across customers.

A3 Directly after each purchase there is a constant probability p that the customer becomes inactive. Therefore, dropouts follow a (shifted) geometric distribution with dropout probability p.

A4 Heterogeneity in p follows a beta distribution with parameters a and b across customers.

A5 The transaction rate λ and the dropout probability p are distributed independently of each other.

Table 4.6: BG/NBD Assumptions

4.3.2 Empirical Results

4.4 CBG/NBD Model

4.4.1 Assumptions

The CBG/NBD is a modified variant of the BG/NBD model. It has been developed by Daniel Hoppe and Udo Wagner and published in 2007 in "Marketing - Journal of Research and Management" (Hoppe and Wagner, 2007). This variant makes similar assumptions, but inserts an additional dropout opportunity at time zero, and thereby resolves the rather unrealistic implication of the BG/NBD model that all customers who have not (re-)purchased at all after time zero are still active. Hoppe/Wagner also showed that their modification results in an overall better fit to the publicly available CD-Now dataset that has been used in Fader et al. (2005a).
Besides providing a variant of the BG/NBD, Hoppe/Wagner additionally derived their key mathematical expressions by focusing on counting processes instead of timing processes (as Fader/Hardie/Lee did), and could thereby reduce the inherent complexity of the derivations significantly.
Around the same time as Hoppe/Wagner worked on their model, Batislam, Denizel and Filiztekin developed the very same variation of the BG/NBD, termed it MBG/NBD (the M stands for "modified") and published it in the International Journal of Research in Marketing (Batislam et al., 2007). Within this thesis we chose the abbreviation CBG/NBD over MBG/NBD when referring to this variant, as the C points to the underlying central geometric distribution of the dropout process.

A1 While active, the number of transactions follows a Poisson process with rate λ.

A2 Heterogeneity in λ follows a gamma distribution with shape parameter r and rate parameter α across customers.

A3 At time zero and directly after each purchase there is a constant probability p that the customer becomes inactive. Therefore, dropouts follow a central geometric distribution with dropout probability p.

A4 Heterogeneity in p follows a beta distribution with parameters a and b across customers.

A5 The transaction rate λ and the dropout probability p are distributed independently of each other.

Table 4.7: CBG/NBD Assumptions

Assumptions A1, A2, A4 and A5 are identical to the corresponding assumptions of the BG/NBD model. Even A3 is only slightly modified, now allowing an immediate defection of a customer at time zero. Note that the same constant probability p is used for this additional dropout opportunity.
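The difference this additional dropout opportunity makes can be illustrated by simulating the dropout story with and without the time-zero opportunity (a sketch with arbitrary parameter values):

```python
import random

def simulate_dropout_counts(a, b, lam, T, n, drop_at_zero, seed=1):
    """Simulate purchase counts under the (C)BG/NBD dropout story: a
    beta(a, b) distributed dropout probability p per customer, applied at
    time zero only when drop_at_zero is True (CBG/NBD) and after every
    purchase (both models); purchasing is Poisson with rate lam while alive."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        p = rng.betavariate(a, b)                         # A4: heterogeneity in p
        alive = (rng.random() >= p) if drop_at_zero else True  # time-zero opportunity
        t, x = 0.0, 0
        while alive:
            t += rng.expovariate(lam)                     # A1: exponential waits
            if t > T:
                break
            x += 1
            alive = rng.random() >= p                     # A3: dropout after purchase
        counts.append(x)
    return counts

bg = simulate_dropout_counts(1.0, 3.0, 0.1, 100.0, 4000, drop_at_zero=False)
cbg = simulate_dropout_counts(1.0, 3.0, 0.1, 100.0, 4000, drop_at_zero=True)
# the extra dropout opportunity at time zero shifts mass toward zero purchases
print(sum(bg) / len(bg), sum(cbg) / len(cbg))
```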

4.4.2 Empirical Results

4.5 Heuristic

TODO: CITE wuebben regarding similar results

4.6 LM25 + NN25

4.7 Variants of BG/NBD

TODO:
* long-living customers
* CITE wuebben's discussion section, which tries to explain why this happens

4.8 CBG/CNBD-k Model

4.8.1 Assumptions

TODO:
* timing pattern - worst 10
* regularity measure (Wheat and Morrison, 1990)
* see figure 4.4; each horizontal line represents a donor, and each vertical dash the timing of a donation

TODO:
* note in SMC paper
* Haight: s.c.d. = generalized Poisson, a.c.d. = Morse-Jewell distribution (Haight, 1965)
* but Consul/Jain reused "generalized Poisson" differently (Consul and Jain, 1973); mixed Poisson; compound Poisson
* condensed Poisson, CNBD (Goodhardt/Chatfield) (Chatfield and Goodhardt, 1973)
* common belief: using Erlang-2 does not improve forecasts
* gap characterization vs. counting characterization (see Haight, 1965)
* plot several distributions as IPT: normal, lognormal, Erlang, Maxwell (= Weibull?), ...

[Figure: two panels of timing patterns, for the 20 worst overestimated and the 20 worst underestimated donors; each horizontal line represents a donor, each vertical dash the timing of a donation, split into training and validation period]

Figure 4.4: Timing patterns of those donors for whom the BG/NBD model produced the worst estimates for the validation period.

4.8.2 Empirical Results

4.9 Model Comparison

TODO:
* tabular overview of assumptions?
* all models implicitly assume stationarity
* performance comparison regarding CDNOW
* error measures?

4.10 Estimation of Monetary Component

          2002     2003     2004      2005     2006
Min.      0.25     0.25     1.00      1.00     1.00
1st Qu.   10.00    10.00    10.00     10.00    15.00
Median    25.00    25.00    25.00     25.00    25.00
Mean      36.06    39.64    43.32     44.91    44.57
3rd Qu.   25.00    35.00    45.00     50.00    50.00
Max.      5000.00  5000.00  10000.00  5000.00  5000.00

Table 4.8: Trend in Donation Amount.
Chapter 5

Conclusion / Discussion

TODO:
* competition results
* discussion of the results
* future research
* CNBD can be applied for all kinds of existing models where the NBD is currently used
* Gamma-Gamma mixture?
Appendix A

Derivation of CBG/CNBD-k

A.1 Overview of Used Methods

The Gaussian hypergeometric function is defined as

₂F₁(a, b; c; z) = Σ_{j=0}^{∞} [(a)_j (b)_j / (c)_j] · z^j / j!,    c ≠ 0, −1, −2, . . . ,

with the Pochhammer symbol

(a)_j = Γ(a + j) / Γ(a).

A.2 Overview of Used Stochastic Distributions

A.3 Assumptions

A1 While active, transactions of customers occur with Erlang-k (rate parameter λ) distributed waiting times.

A2 Heterogeneity in λ follows a gamma distribution.

A3 At time zero and directly after each transaction there is a constant probability p that the customer becomes inactive.

A4 Heterogeneity in p follows a beta distribution.

A5 The transaction rate λ and the dropout probability p vary independently across customers.

A6 The observation period of each individual starts out with a transaction at time 0.

These assumptions differ from the CBG/NBD model only in the modified assumption A1 and the newly introduced assumption A6.

A.4 Erlang-k

The Erlang-k distribution with parameters k and λ is defined by the probability density

f_Γ(t | k, λ) = λ^k t^{k−1} e^{−λt} / (k − 1)!    ∀ t > 0; k ∈ ℕ⁺, λ > 0.    (A.1)

The Erlang-k distribution is a specialization of the more general gamma distribution, with the restriction that k is an integer. For k = 1 we are dealing with the exponential distribution again.
The Erlang-k distribution can also be seen as the sum of k i.i.d. exponentially distributed random variables with parameter λ. Therefore, the corresponding counting process for events with Erlang-k distributed waiting times can be deduced from the Poisson process in a straightforward manner. Under the assumption that an event actually occurred at time zero, the probability of encountering x events until time t is

P_k(X(t) = x) = Σ_{j=0}^{k−1} P_P(X(t) = kx + j).    (A.2)

This result becomes apparent if we take a look at the following figure, which renders the relation between a Poisson process (t′₀, t′₁, t′₂, ...) and the timings of Erlang-k events (t₀, t₁, t₂, ...). We consider the occurrence of an event as the k-th realization of the underlying exponentially distributed process (t_x = t′_{kx}). Therefore, the probability of encountering x events until time t is the sum of the probabilities of encountering kx, kx + 1, ..., kx + k − 1 Poisson events.
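Equation (A.2) can be checked numerically; a small sketch that assembles the Erlang-k counting probabilities from the underlying Poisson terms:

```python
import math

def poisson_pmf(j, mean):
    """Ordinary Poisson probability of j events."""
    return math.exp(-mean) * mean ** j / math.factorial(j)

def erlang_count_pmf(x, t, k, lam):
    """P_k(X(t) = x) from equation (A.2): probability of x Erlang-k events
    until time t, given an event at time zero, as the sum of k consecutive
    Poisson probabilities."""
    return sum(poisson_pmf(k * x + j, lam * t) for j in range(k))

# k = 1 collapses to the ordinary (synchronous) Poisson counting process
assert abs(erlang_count_pmf(3, 2.0, 1, 1.5) - poisson_pmf(3, 3.0)) < 1e-15

# for k = 3 the probabilities still sum to one over all counts
total = sum(erlang_count_pmf(x, 2.0, 3, 1.5) for x in range(20))
assert abs(total - 1.0) < 1e-9
```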
The notion, that we start counting with an event at time zero is important,
since we are not dealing with a memoryless process, such as an exponentially
APPENDIX A. DERIVATION OF CBG/CNBD-K 38

t
t0 = 0 t1 t2
3·2 × × × × × × × × × -
t00 t01 t02 t03 t04 t05 t06 t07 t08

3·2+1 × × × × × × × × × -
t07 t08

3·2+2 × × × × × × × × × -
t07 t08

Figure A.1: Illustration for Erlang-3 distributed interevent times. P3 (X(t) =


2) is the probability of encountering 6, 7 or 8 Poisson events.

distributed variable, anymore. Being memoryless implies, that the chances


of the event to occur within a marginally small time span remains constant,
independent of the time that has past sine the last event. Whereas the
Erlang-k distribution clearly has a spike unequal to 0. This is also the reason,
why we had to postulate assumption A6 for our model.
According to (Chatfield and Goodhardt, 1973, see footnote 4 on page 829)
F.A. Haight distinguished between counting processes that start out with an
event at time zero, and those who do not. He termed them synchronous,
resp asynchronous counting processes. Chatfield and Goodhardt studied the
asynchronous counting of Erlang-k events and termed the resulting process
“Condensed Poisson Process”. Also according to them a synchronous count-
ing equivalent of Erlang-k would be called “Grouped Poisson Process”. It
should be noted, that the author did not find evidence, that the latter term
actually is widely accepted.
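The counting relation (A.2) can be checked numerically. The following sketch (written in Python for illustration, not part of the thesis' R code; all parameter values are arbitrary) compares the synchronous Erlang-k counting probabilities against a simulation that starts with an event at time zero, as assumption A6 requires.

```python
import numpy as np
from scipy.stats import poisson

def p_sync(x, t, k, lam):
    # eq. (A.2): P_k(X(t) = x) = sum_{j=0}^{k-1} P_P(X(t) = kx + j)
    return poisson.pmf(k * x + np.arange(k), lam * t).sum()

# Monte-Carlo cross-check: an event occurs at time 0, then Erlang-k waits follow
rng = np.random.default_rng(1)
k, lam, t = 3, 1.0, 5.0
waits = rng.gamma(shape=k, scale=1.0 / lam, size=(50_000, 20))
counts = (np.cumsum(waits, axis=1) <= t).sum(axis=1)

for x in range(4):
    assert abs((counts == x).mean() - p_sync(x, t, k, lam)) < 0.01
```

Since every x covers exactly k consecutive Poisson counts, the probabilities p_sync(x, ...) sum to one over all x.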

A.5 Individual Likelihood

The individual likelihood of parameters λ and p for a recorded purchase pattern (t_1, ..., t_x, T) can be deduced analogously to the cited papers. It is simply the likelihood of the observed interevent times t_1−t_0, t_2−t_1, ..., t_x−t_{x−1}, times the probability of having "survived" the dropout opportunities at time 0 and after the first x−1 purchases, times the probability of seeing no transaction within (t_x, T]. The latter arises either because the customer defected immediately after the last purchase, or simply because the time until the next transaction happens to be larger than T − t_x.

    L(λ, p | t_1, ..., t_x, T) = (1−p) f_Γ(t_1 | k, λ) ··· (1−p) f_Γ(t_x − t_{x−1} | k, λ)
                                 · { p + (1−p) P(X(T − t_x) = 0 | k, λ) }

Inserting the Erlang-k pdf (A.1) and our previous result (A.2), it follows that

    L(λ, p | t_1, ..., t_x, T)
      = (1−p)^x · [λ^k t_1^{k−1} e^{−λt_1} / (k−1)!] ··· [λ^k (t_x − t_{x−1})^{k−1} e^{−λ(t_x − t_{x−1})} / (k−1)!]
        · { p + (1−p) Σ_{j=0}^{k−1} P_P(X(T − t_x) = j | λ) }
      = (1−p)^x λ^{kx} e^{−λt_x} t̃ · { p + (1−p) e^{−λ(T−t_x)} Σ_{j=0}^{k−1} λ^j (T − t_x)^j / j! },
            with t̃ := (1/(k−1)!)^x (t_x − t_{x−1})^{k−1} ··· (t_1 − 0)^{k−1}
      = t̃ · p (1−p)^x λ^{kx} e^{−λt_x} + t̃ · (1−p)^{x+1} λ^{kx} e^{−λT} Σ_{j=0}^{k−1} λ^j (T − t_x)^j / j!    (A.3)

One major difference between this result and the likelihoods of models with exponential timing is that the actual timing of the transactions t_1, ..., t_x (which we subsumed into the variable t̃) still appears in the final formula; (x, t_x, T) is therefore no longer a sufficient statistic for the likelihood. But, as we will see shortly, we do not need these timings for the estimation of the parameters, and therefore do not impose any extra requirements on the input data.
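As a sanity check, the condensed form (A.3) can be compared against the literal product form of the likelihood. The Python sketch below (an illustration, not the thesis' R code; the timing pattern and parameter values are arbitrary) confirms that both expressions agree.

```python
import math

def erlang_pdf(t, k, lam):
    # eq. (A.1)
    return lam ** k * t ** (k - 1) * math.exp(-lam * t) / math.factorial(k - 1)

def poisson_pmf(j, mu):
    return mu ** j * math.exp(-mu) / math.factorial(j)

def lik_literal(ts, T, k, lam, p):
    # (1-p) f(t1) (1-p) f(t2-t1) ... * { p + (1-p) P(X(T-t_x) = 0) }
    out, prev = 1.0, 0.0
    for t in ts:
        out *= (1.0 - p) * erlang_pdf(t - prev, k, lam)
        prev = t
    no_event = sum(poisson_pmf(j, lam * (T - ts[-1])) for j in range(k))
    return out * (p + (1.0 - p) * no_event)

def lik_condensed(ts, T, k, lam, p):
    # eq. (A.3), with t~ spelled out explicitly
    x, t_x = len(ts), ts[-1]
    tt, prev = (1.0 / math.factorial(k - 1)) ** x, 0.0
    for t in ts:
        tt *= (t - prev) ** (k - 1)
        prev = t
    tail = sum((lam * (T - t_x)) ** j / math.factorial(j) for j in range(k))
    return (tt * p * (1 - p) ** x * lam ** (k * x) * math.exp(-lam * t_x)
            + tt * (1 - p) ** (x + 1) * lam ** (k * x) * math.exp(-lam * T) * tail)

assert math.isclose(lik_literal((1.2, 3.0, 4.5), 6.0, 2, 0.8, 0.3),
                    lik_condensed((1.2, 3.0, 4.5), 6.0, 2, 0.8, 0.3), rel_tol=1e-12)
```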

A.6 Aggregate Likelihood

Taking into account assumptions A2, A4 and A5 regarding the distribution of λ and p, we have to mix in the gamma and the beta distribution by integration.

    L(r, α, a, b | t_1, ..., t_x, T)
      = t̃ ∫_0^1 ∫_0^∞ p (1−p)^x λ^{kx} e^{−λt_x} f_Γ(λ|r,α) f_B(p|a,b) dλ dp
      + t̃ ∫_0^1 ∫_0^∞ (1−p)^{x+1} λ^{kx} e^{−λT} ( Σ_{j=0}^{k−1} (T−t_x)^j λ^j / j! ) f_Γ(λ|r,α) f_B(p|a,b) dλ dp    (A.4)
Due to assumption A5 we can solve these integrals separately, and will for this purpose use the following definitions and results from (Hoppe and Wagner, 2008):

    I_Γ(i, j, r, α) := ∫_0^∞ λ^i e^{−λj} f_Γ(λ|r,α) dλ = α^r (r)_i / (j+α)^{r+i}    (A.5)

    I_B(i, j, a, b) := ∫_0^1 p^i (1−p)^j f_B(p|a,b) dp = B(a+i, b+j) / B(a, b)    (A.6)

where B(a, b) is the beta function and (r)_x Pochhammer's symbol:

    B(a, b) = Γ(a)Γ(b) / Γ(a+b)    (A.7)
    (r)_x = Γ(r+x) / Γ(r)    (A.8)

Furthermore, using Γ(a+1) = aΓ(a) we can easily see that

    B(a+1, b+x) = a/(b+x) · B(a, b+x+1)    (A.9)
    (r)_{x+y} = (r+x)_y · (r)_x    (A.10)
hold. Therefore:

    L(r, α, a, b | t_1, ..., t_x, T)
      = t̃ · I_B(1, x, a, b) · I_Γ(kx, t_x, r, α)
      + t̃ · I_B(0, x+1, a, b) · Σ_{j=0}^{k−1} ((T−t_x)^j / j!) · I_Γ(kx+j, T, r, α)    (A.11)
      = t̃ · (b)_{x+1}/(a+b)_{x+1} · α^r (r)_{kx}
        · ( a/(b+x) · (1/(α+t_x))^{r+kx} + Σ_{j=0}^{k−1} (T−t_x)^j (r+kx)_j / (j! (α+T)^{r+kx+j}) )    (A.12)

For the Erlang-2 case this is

    L(r, α, a, b | t_1, ..., t_x, T)
      = t̃ · (b)_{x+1}/(a+b)_{x+1} · α^r (r)_{2x}
        · ( a/(b+x) (1/(α+t_x))^{r+2x} + (1/(α+T))^{r+2x} + (T−t_x)(r+2x) (1/(α+T))^{r+2x+1} )    (A.13)

with t̃ being t_1 · (t_2 − t_1) ··· (t_x − t_{x−1}).
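The closed form (A.13) can be verified against a direct numerical evaluation of the mixing integrals (A.4). The following Python sketch (illustrative only; the purchase history and parameter values are hypothetical, not taken from the thesis) integrates the individual likelihood (A.3) over the gamma/beta heterogeneity and compares the result with (A.13).

```python
import math
import numpy as np
from scipy import integrate, stats
from scipy.special import poch

# hypothetical parameters and a toy Erlang-2 purchase history
r, alpha, a, b = 2.0, 5.0, 1.5, 2.0
k, ts, T = 2, (3.0, 7.0), 10.0
x, t_x = len(ts), ts[-1]
tt = ts[0] * (ts[1] - ts[0])            # t~ = t1 (t2 - t1) for k = 2

def lik(lam, p):
    # individual likelihood, eq. (A.3)
    tail = sum((lam * (T - t_x)) ** j / math.factorial(j) for j in range(k))
    return (tt * p * (1 - p) ** x * lam ** (k * x) * math.exp(-lam * t_x)
            + tt * (1 - p) ** (x + 1) * lam ** (k * x) * math.exp(-lam * T) * tail)

# closed form, eq. (A.13)
closed = (tt * poch(b, x + 1) / poch(a + b, x + 1) * alpha ** r * poch(r, k * x)
          * (a / (b + x) * (alpha + t_x) ** -(r + k * x)
             + (alpha + T) ** -(r + k * x)
             + (T - t_x) * (r + k * x) * (alpha + T) ** -(r + k * x + 1)))

# numerical mixing over the gamma/beta heterogeneity, eq. (A.4)
num, _ = integrate.dblquad(
    lambda lam, p: lik(lam, p) * stats.gamma.pdf(lam, r, scale=1.0 / alpha)
                   * stats.beta.pdf(p, a, b),
    0.0, 1.0, 0.0, np.inf)

assert abs(num - closed) / closed < 1e-4
```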

A.7 Parameter Estimation

A well-known parameter estimation method, which under quite general conditions is asymptotically optimal (i.e., consistent and asymptotically efficient), is maximum likelihood estimation. This method tries to find the parameter set (r, α, a, b) at which the likelihood of the given data (t_{i,1}, ..., t_{i,x_i}, T_i)_{i=1..N} reaches its global maximum.

    (r̂, α̂, â, b̂) = argmax_{r,α,a,b} L(r, α, a, b | (t_{i,1}, ..., t_{i,x_i}, T_i)_{i=1..N})
                  = argmax_{r,α,a,b} Π_{i=1}^N L(r, α, a, b | t_{i,1}, ..., t_{i,x_i}, T_i)

As we can now see, we can simply drop the cumulative term t̃_i for the exact timing patterns, since this multiplicative factor has no effect on the location of the maximum, i.e. on the estimated parameters. Therefore we can stick to (x, t_x, T) as input data for our further calculations.
To circumvent problems with numerical precision it is common to optimize the logarithm of the likelihood instead, which transforms the multiplication (of very small numbers) into a sum.

    (r̂, α̂, â, b̂) = argmax_{r,α,a,b} Σ_{i=1}^N log L(r, α, a, b | t_{i,1}, ..., t_{i,x_i}, T_i)    (A.14)
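The maximization (A.14) can be sketched as follows (written in Python for illustration, not the thesis' R implementation; the (x, t_x, T) summaries below are toy values, not the DMEF data). The log-likelihood follows (A.12) with the constant t̃ dropped, and the bracket is multiplied through by (α+T)^{r+kx} for numerical stability.

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize

def neg_ll(log_params, k, x, t_x, T):
    # negative log-likelihood of eq. (A.12) with the data constant t~ dropped
    r, alpha, a, b = np.exp(log_params)      # log scale keeps all parameters positive
    kx = k * x
    ll = (gammaln(b + x + 1) - gammaln(b)
          - gammaln(a + b + x + 1) + gammaln(a + b))       # log (b)_{x+1}/(a+b)_{x+1}
    ll += (r * np.log(alpha) + gammaln(r + kx) - gammaln(r)
           - (r + kx) * np.log(alpha + T))                 # log alpha^r (r)_{kx}, rescaled
    term1 = a / (b + x) * ((alpha + T) / (alpha + t_x)) ** (r + kx)
    term2 = np.zeros_like(T)
    poch_j, fact_j, dt_j = np.ones_like(T), 1.0, np.ones_like(T)
    for j in range(k):     # sum_{j=0}^{k-1} (T-t_x)^j (r+kx)_j / (j! (alpha+T)^j)
        if j > 0:
            poch_j = poch_j * (r + kx + j - 1)
            fact_j *= j
            dt_j = dt_j * (T - t_x)
        term2 += dt_j * poch_j / (fact_j * (alpha + T) ** j)
    return -(ll + np.log(term1 + term2)).sum()

# toy (x, t_x, T) summaries for four customers
x   = np.array([0.0, 2.0, 5.0, 1.0])
t_x = np.array([0.0, 20.0, 35.0, 5.0])
T   = np.array([40.0, 40.0, 40.0, 40.0])

start = np.log([1.0, 10.0, 1.0, 1.0])
res = minimize(neg_ll, start, args=(2, x, t_x, T), method="Nelder-Mead")
r_hat, alpha_hat, a_hat, b_hat = np.exp(res.x)
```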
i=1

A.8 Probability Distribution of Purchase Frequencies

We now deduce an expression for P(X(t) = x | r, α, a, b), i.e. the probability distribution of the purchase frequencies conditional on the (estimated) parameters, and will again closely follow the mathematical derivation in (Hoppe and Wagner, 2008).
For a single customer (with given λ and p) the probability of encountering x transactions until time t can be split into two cases: either the customer simply had x transactions and is still active at time t, or the underlying process would have produced at least x transactions but she defected immediately after her x-th purchase.

    P(X(t) = x | λ, p) = (1−p)^{x+1} P(X(t) = x) + p (1−p)^x P(X(t) ≥ x)    (A.15)

Using P(X(t) ≥ x) = 1 − P(X(t) < x) and result (A.2), we derive

    P(X(t) = x | λ, p) = (1−p)^{x+1} Σ_{j=kx}^{kx+k−1} P_P(X(t) = j)
                        + p (1−p)^x ( 1 − δ_{x>0} Σ_{j=0}^{kx−1} P_P(X(t) = j) ).    (A.16)

Note that we added the indicator δ_{x>0}, which is 1 for x > 0 and 0 otherwise, to correctly cover the case x = 0, for which the second summation term simply reduces to the dropout probability p at time zero.
Again we mix in our heterogeneity assumptions:

    P(X(t) = x | r, α, a, b)
      = ∫_0^1 ∫_0^∞ P(X(t) = x | λ, p) f_Γ(λ|r,α) f_B(p|a,b) dλ dp
      = ∫_0^1 (1−p)^{x+1} f_B dp · ∫_0^∞ ( Σ_{j=kx}^{kx+k−1} (λt)^j / j! ) e^{−λt} f_Γ dλ
      + ∫_0^1 p (1−p)^x f_B dp · ∫_0^∞ ( 1 − δ_{x>0} Σ_{j=0}^{kx−1} ((λt)^j / j!) e^{−λt} ) f_Γ dλ    (A.17)

and apply the results (A.5) and (A.6):

    P(X(t) = x | r, α, a, b)
      = I_B(0, x+1, a, b) · Σ_{j=kx}^{kx+k−1} (t^j / j!) I_Γ(j, t, r, α)
      + I_B(1, x, a, b) · ( 1 − δ_{x>0} Σ_{j=0}^{kx−1} (t^j / j!) I_Γ(j, t, r, α) )
      = B(a, b+x+1)/B(a, b) · Σ_{j=kx}^{kx+k−1} t^j α^r (r)_j / (j! (α+t)^{r+j})
      + B(a+1, b+x)/B(a, b) · ( 1 − δ_{x>0} Σ_{j=0}^{kx−1} t^j α^r (r)_j / (j! (α+t)^{r+j}) )    (A.18)

Considering the probability distribution of the negative binomial distribution,

    P_NBD(X(t) = j) = t^j α^r (r)_j / (j! (α+t)^{r+j}),    (A.19)

we can also write

    P(X(t) = x | r, α, a, b)
      = B(a, b+x+1)/B(a, b) · Σ_{j=kx}^{kx+k−1} P_NBD(X(t) = j)
      + B(a+1, b+x)/B(a, b) · ( 1 − δ_{x>0} Σ_{j=0}^{kx−1} P_NBD(X(t) = j) ).    (A.20)

Thus, for the Erlang-2 case this expression is


P (X(t) = x|r, α, a, b) =
B(a, b + x + 1)
= · (PNBD (X(t) = 2x) + PNBD (X(t) = 2x + 1))
B(a, b)
à !
B(a + 1, b + x) X
2x−1
+ · 1 − δx>0 PNBD (X(t) = j) . (A.21)
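Since (A.20) defines a proper probability distribution, it must sum to one over all x. The Python sketch below (illustrative only; parameter values are arbitrary) evaluates (A.19) and (A.20) on the log scale and checks this.

```python
import numpy as np
from scipy.special import gammaln, betaln

def p_nbd(j, t, r, alpha):
    # eq. (A.19), evaluated on the log scale for numerical stability
    j = np.asarray(j, dtype=float)
    return np.exp(j * np.log(t) + r * np.log(alpha)
                  + gammaln(r + j) - gammaln(r) - gammaln(j + 1)
                  - (r + j) * np.log(alpha + t))

def p_x(x, t, k, r, alpha, a, b):
    # eq. (A.20): still active with exactly x purchases, or defected after the x-th
    ratio1 = np.exp(betaln(a, b + x + 1) - betaln(a, b))   # B(a,b+x+1)/B(a,b)
    ratio2 = np.exp(betaln(a + 1, b + x) - betaln(a, b))   # B(a+1,b+x)/B(a,b)
    active = ratio1 * p_nbd(np.arange(k * x, k * x + k), t, r, alpha).sum()
    seen = p_nbd(np.arange(0, k * x), t, r, alpha).sum() if x > 0 else 0.0
    return active + ratio2 * (1.0 - seen)

total = sum(p_x(x, 10.0, 2, 1.0, 3.0, 1.2, 3.0) for x in range(400))
assert abs(total - 1.0) < 1e-6
```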
B(a, b) j=0

A.9 Probability of Being Active

As Schmittlein et al. (1987) pointed out, one of the key expressions for models of this kind is the probability that a single customer is still active at the end of the observation period, based on his past transaction history. I.e., we ask for P(τ > T | t_1, ..., t_x, T, r, α, a, b), with τ being the unobserved customer lifetime.

    P(τ > T | t_1, ..., t_x, T, λ, p)
      = 1 − P(τ ≤ T | t_1, ..., t_x, T, λ, p)
      = 1 − p / P(X(T − t_x) = 0)
      = 1 − p / ( p + (1−p) Σ_{j=0}^{k−1} P_P(X(T − t_x) = j) )
      = (1−p) Σ_{j=0}^{k−1} P_P(X(T − t_x) = j) / ( p + (1−p) Σ_{j=0}^{k−1} P_P(X(T − t_x) = j) )

By expanding this term with t̃ (1−p)^x λ^{kx} e^{−λt_x}, and comparing the denominator with equation (A.3), it follows that

    P(τ > T | t_1, ..., t_x, T, λ, p) = t̃ (1−p)^{x+1} λ^{kx} e^{−λT} Σ_{j=0}^{k−1} λ^j (T − t_x)^j / j!
                                       / L(λ, p | t_1, ..., t_x, T)    (A.22)

Building the double integral

    P(τ > T | t_1, ..., t_x, T, r, α, a, b) = ∫_0^1 ∫_0^∞ P(τ > T | t_1, ..., t_x, T, λ, p) f(λ, p | t_1, ..., t_x, T) dλ dp    (A.23)

and using the following result from Section 3.2.3 of (Hoppe and Wagner, 2008),

    f(λ, p | t_1, ..., t_x, T) = L(λ, p | t_1, ..., t_x, T) f_Γ(λ|r,α) f_B(p|a,b) / L(r, α, a, b | t_1, ..., t_x, T),    (A.24)

yields

    P(τ > T | t_1, ..., t_x, T, r, α, a, b)
      = t̃ / L(r, α, a, b | t_1, ..., t_x, T) · ∫_0^1 (1−p)^{x+1} f_B(p|a,b) dp
        · ∫_0^∞ λ^{kx} e^{−λT} Σ_{j=0}^{k−1} ((T − t_x)^j / j!) λ^j f_Γ(λ|r,α) dλ
      = t̃ · I_B(0, x+1, a, b) · Σ_{j=0}^{k−1} ((T − t_x)^j / j!) I_Γ(kx+j, T, r, α)
        / L(r, α, a, b | t_1, ..., t_x, T)    (A.25)

Comparing this with equation (A.11), we see that the numerator is actually one of the summation terms of the aggregate likelihood function in the denominator. Considering A/(A+B) = (1 + B/A)^{−1}, the fraction can be reduced to

    P(τ > T | t_1, ..., t_x, T, r, α, a, b)
      = ( 1 + t̃ · I_B(1, x, a, b) · I_Γ(kx, t_x, r, α)
            / [ t̃ · I_B(0, x+1, a, b) · Σ_{j=0}^{k−1} ((T − t_x)^j / j!) I_Γ(kx+j, T, r, α) ] )^{−1}    (A.26)
Fortunately the term t̃ cancels out, so we still do not need the exact timing of the transactions for our calculations. Resolving the integral functions, extracting common terms (α^r (r)_{kx} cancels), and using the relation (r)_{kx+j} = (r)_{kx} · (r+kx)_j, we obtain

    P(τ > T | x, t_x, T, r, α, a, b)
      = ( 1 + [B(a+1, b+x) / B(a, b+x+1)] · [(α+T)^{r+kx} / (α+t_x)^{r+kx}]
            / Σ_{j=0}^{k−1} (T − t_x)^j (r+kx)_j / (j! (α+T)^j) )^{−1}
      = ( 1 + a/(b+x) · ((α+T)/(α+t_x))^{r+kx}
            / Σ_{j=0}^{k−1} (T − t_x)^j (r+kx)_j / (j! (α+T)^j) )^{−1}.    (A.27)

Thus, for Erlang-2:

    P(τ > T | x, t_x, T, r, α, a, b)
      = ( 1 + a/(b+x) · ((α+T)/(α+t_x))^{r+2x} / ( 1 + (r+2x)(T − t_x)/(α+T) ) )^{−1}    (A.28)
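Expression (A.27) translates into a few lines of code. The Python sketch below (an illustration, not the thesis' R implementation; all parameter values used in examples are hypothetical) shows it directly; for k = 1 the sum in the denominator collapses to 1 and the CBG/NBD expression is recovered.

```python
from math import factorial
from scipy.special import poch

def p_alive(k, x, t_x, T, r, alpha, a, b):
    # eq. (A.27): probability of still being active at T, given (x, t_x, T)
    num = a / (b + x) * ((alpha + T) / (alpha + t_x)) ** (r + k * x)
    den = sum((T - t_x) ** j * poch(r + k * x, j) / (factorial(j) * (alpha + T) ** j)
              for j in range(k))
    return 1.0 / (1.0 + num / den)

# recency matters: a customer whose last purchase is more recent (t_x = 35 vs.
# t_x = 10, both observed until T = 40) is judged more likely to be active
assert (p_alive(2, 3, 35.0, 40.0, 1.0, 10.0, 1.0, 3.0)
        > p_alive(2, 3, 10.0, 40.0, 1.0, 10.0, 1.0, 3.0))
```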

A.10 Expected Number of Transactions

In order to arrive at a closed-form solution for the predicted number of transactions for a single customer with given purchase history, E(Y(T, T+t) | x, t_x, T, r, α, a, b), we follow the same steps as in (Hoppe and Wagner, 2008). Unfortunately we do not succeed. Nevertheless we come up with a heuristic approximation, and provide some reasoning for our simplifications. As the calculations for the DMEF competition have shown, such an approach can still outperform existing models that assume a Poisson process.

A.10.1 Unconditional Expectation for Condensed Poisson

The expected number of transactions for an active customer with exponentially distributed interevent times is known to be E(X(t)|λ) = λt. The asynchronous counting process for Erlang-2 waiting times has an expectation of E(X(t)|λ) = λt/2 (see Chatfield and Goodhardt, 1973). Similarly, we will now prove that the generalization for Erlang-k, E(X(t)|λ) = λt/k, also holds true. Let us recall that asynchronous counting for Erlang-k can also be seen as a censored counting of a Poisson process, where every k-th event is being counted. As we start the counting independently of a particular event, the recording of r censored events can arise from rk−k+1, ..., rk, ..., rk+k−1 uncensored events. Or, if we look at it the other way around, rk+j (0 ≤ j < k) uncensored events result in either r (with probability (k−j)/k) or r+1 (with probability j/k) censored events being counted. Therefore
    E(X(t)|λ) = Σ_{r=1}^∞ r P_C(r)
      = Σ_{r=1}^∞ r Σ_{j=−k+1}^{k−1} ((k−|j|)/k) P_P(kr+j)
      = (1/k) Σ_{r=1}^∞ kr P_P(kr)
        + Σ_{j=1}^{k−1} [ (j/k²) Σ_{r=1}^∞ kr P_P(kr−k+j) + ((k−j)/k²) Σ_{r=1}^∞ kr P_P(kr+j) ],

where the term in brackets is denoted T_j.

T_j can be reduced to

    T_j = (j/k²) ( Σ (kr−k+j) P_P(kr−k+j) + (k−j) Σ P_P(kr−k+j) )
        + ((k−j)/k²) ( Σ (kr+j) P_P(kr+j) − j Σ P_P(kr+j) )
      = (j/k²) ( Σ (kr−k+j) P_P(kr−k+j) + (k−j) Σ P_P(kr−k+j) )
        + ((k−j)/k²) ( Σ (kr−k+j) P_P(kr−k+j) − j P_P(j) − j Σ P_P(kr−k+j) + j P_P(j) )
      = (1/k) Σ (kr−k+j) P_P(kr−k+j),

with all sums running over r = 1, ..., ∞,
and we receive our result for the unconditional expected number for asynchronous counting:

    E(X(t)|λ) = (1/k) Σ_{r=1}^∞ kr P_P(kr) + Σ_{j=1}^{k−1} (1/k) Σ_{r=1}^∞ (kr−k+j) P_P(kr−k+j)
             = (1/k) Σ_{r=1}^∞ r P_P(r) = λt/k    (A.29)

A.10.2 Unconditional Expectation for Grouped Poisson

For a synchronous counting process with Erlang-k waiting times the derivation of the expectation unfortunately becomes a bit trickier. Using result (A.2) we can deduce
    E(X(t)|λ) = Σ_{r=1}^∞ r P_G(r) = Σ_{r=1}^∞ r Σ_{j=0}^{k−1} P_P(rk+j)
      = (1/k) Σ_{j=0}^{k−1} Σ_{r=1}^∞ rk P_P(rk+j)
      = (1/k) Σ_{j=0}^{k−1} ( Σ_{r=1}^∞ (rk+j) P_P(rk+j) − j Σ_{r=1}^∞ P_P(rk+j) ),
            using Σ_{r=1}^∞ P_P(rk+j) = Σ_{r=0}^∞ P_P(rk+j) − P_P(j)
      = (1/k) ( Σ_{r=0}^∞ r P_P(r) − Σ_{r=0}^{k−1} r P_P(r) − Σ_{j=0}^{k−1} j ( Σ_{r=0}^∞ P_P(rk+j) − P_P(j) ) )
      = (1/k) ( λt − Σ_{j=1}^{k−1} j Σ_{r=0}^∞ P_P(rk+j) ).

For k = 2 it is possible to find a simple closed form for the unconditional expected number for synchronous counting:

    E(X(t)|λ) = (1/2) ( λt − Σ_{r=0}^∞ P_P(2r+1) )
             = (1/2) ( λt − e^{−λt} Σ_{r=0}^∞ (λt)^{2r+1} / (2r+1)! )
             = λt/2 − (1/2) e^{−λt} sinh(λt)    (A.30)
The result for the synchronous counting process (A.30) differs from the asynchronous result (A.29) only by an additional subtraction term, which for Erlang-2 converges to 1/4 as t → ∞. I.e., for a long time horizon we can assess the error we make if we use the simpler formula (A.29).
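The closed form (A.30) can be cross-checked both against a direct evaluation of the counting probabilities (A.2) and against a simulation. A Python sketch (illustrative; λ and t are arbitrary values):

```python
import numpy as np
from scipy.stats import poisson

lam, t = 1.0, 20.0

# closed form (A.30) for synchronous counting with Erlang-2 waiting times
closed = lam * t / 2.0 - 0.5 * np.exp(-lam * t) * np.sinh(lam * t)

# direct evaluation of E[X(t)] = sum_r r (P_P(2r) + P_P(2r+1)), via eq. (A.2)
r = np.arange(1, 200)
direct = (r * (poisson.pmf(2 * r, lam * t) + poisson.pmf(2 * r + 1, lam * t))).sum()

# Monte-Carlo: count Erlang-2 renewals in (0, t], starting with an event at time 0
rng = np.random.default_rng(0)
waits = rng.gamma(shape=2.0, scale=1.0 / lam, size=(20_000, 60))
mc = ((np.cumsum(waits, axis=1) <= t).sum(axis=1)).mean()

assert abs(direct - closed) < 1e-9
assert abs(mc - closed) < 0.1
```

For λt = 20 the subtraction term has already converged, so the result is very close to λt/2 − 1/4 = 9.75.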

A.10.3 Expectations for Condensed NBD

Schmittlein and Morrison (1983) published some findings regarding the Condensed Negative Binomial Distribution (but only considered the Erlang-2 case). They state formulas for the moments of the unconditional distribution, in particular

    E(X | r, α) = r / (2α), and    (A.31)
    Var(X | r, α) = r/(4α) + (1/8) ( 1 − (α/(α+2))^r ) + r/(4α²),    (A.32)
but also derived a formula for the conditional expectation. Due to its complexity we will not reproduce this result here, but rather point out two important characteristic differences from the NBD that Schmittlein and Morrison noted. First, the expected number of future transactions is no longer linear in the observed number of transactions, and second, the result now depends on the elapsed time between the observation and the prediction period. Both statements already indicate that deriving a formula for the conditional expectation of the CBG/CNBD-k model will be anything but trivial.

A.10.4 Unconditional Expectation for CBG/CNBD-k

Unfortunately the author did not succeed in deriving a closed form for the expression E(X(t) | r, α, a, b). We could derive a (rather complex) expression for E(X(t) | λ, p) for k = 2, but subsequently incorporating heterogeneity would have required solving double integrals of the form

    ∫_0^1 ∫_0^∞ p^{v_4} (1−p)^{v_5} λ^{v_1} e^{−λ(v_3 + v_2 √(1−p)) t} dλ dp.    (A.33)

Nevertheless we proceed with our calculations by applying some simple heuristic modifications to the results of Hoppe and Wagner (2007). They define

    G(v_1, v_2, v_3, v_4 | α, t) := 1 − (v_4/(v_4+t))^{v_1} · ₂F₁(v_1, v_2+1; v_3+a; t/(v_4+t))    (A.34)

with ₂F₁ again being the Gaussian hypergeometric function, and stated

    E(X(t) | r, α, a, b) = b/(a−1) · G(r, b, b, α | α, t)    (A.35)

for the unconditional expected number of transactions until time t in their CBG/NBD model.
Recalling our findings that the expectation for asynchronous counting is simply 1/k of the corresponding Poisson process (see equation (A.29)), and that the synchronous counting only differs by a term that becomes constant for long time horizons, we simply approximate the expected number of transactions for the CBG/CNBD-k model by

    Ê(X(t) | r, α, a, b) = (1/k) · b/(a−1) · G(r, b, b, α | α, t).    (A.36)

A.10.5 Conditional Expectation for CBG/CNBD-k

Even with a proper solution for the unconditional expectation, the next hurdle is to calculate the expected number of future transactions based on a given purchase history. Because the Erlang-k distribution, as opposed to the exponential distribution, is not memoryless, we cannot use the relation

    E(Y(T, T+t) | x, t_x, T, r, α, a, b) = E(X(t) | τ > T, λ, p) · P(τ > T | x, t_x, T, λ, p),    (A.37)

as is the case for the CBG/NBD model. Recency (T − t_x) actually does influence the expected number of future transactions (i.e. the first multiplicative term), and not just the probability of still being active. Assuming that the customer has survived the last transaction, a longer time since the last transaction actually makes it more likely that the next transaction will take place soon. Therefore we would systematically underestimate future transactions if we still used this relation for the CBG/CNBD-k model.
Nevertheless we proceed with our heuristic simplifications, and again adapt the findings of Hoppe and Wagner, who derived

    E(Y(T, T+t) | x, t_x, T, r, α, a, b) = (a+b+x)/(a−1) · G(r+x, b+x, b+x, α+T | α, t)
                                           · P(τ > T | x, t_x, T, r, α, a, b)    (A.38)

for the CBG/NBD model. In their erratum (Wagner and Hoppe, 2008) to (Batislam et al., 2007) they note that the forecast can be obtained by updating the parameters (r, α, a, b) to (r+x, α+T, a, b+x). We use our exact derivation (A.27) for P(τ > T | x, t_x, T, r, α, a, b), and combine it with our approximation of the expectation from the previous section. Additionally we update the parameters from (r, α, a, b) to (r+kx, α+T, a, b+x), since we encountered kx uncensored events within (0, T].

    Ê(Y(T, T+t) | x, t_x, T, r, α, a, b) = (1/k) · (a+b+x)/(a−1) · G(r+kx, b+x, b+x, α+T | α, t)
                                            · P(τ > T | x, t_x, T, r, α, a, b)    (A.39)
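The full conditional forecast (A.39) combines the hypergeometric expression (A.34), the updated parameters, and the exact P(active) from (A.27). A Python sketch (illustrative only, not the thesis' R code; p_alive is repeated here so the sketch is self-contained, and all example values below are hypothetical — note that a > 1 is required for the expectation to be finite):

```python
from scipy.special import hyp2f1, poch, factorial

def G(v1, v2, v3, v4, a, t):
    # eq. (A.34); note that the beta parameter a enters the third 2F1 argument
    return 1.0 - (v4 / (v4 + t)) ** v1 * hyp2f1(v1, v2 + 1.0, v3 + a, t / (v4 + t))

def p_alive(k, x, t_x, T, r, alpha, a, b):
    # eq. (A.27)
    num = a / (b + x) * ((alpha + T) / (alpha + t_x)) ** (r + k * x)
    den = sum((T - t_x) ** j * poch(r + k * x, j) / (factorial(j) * (alpha + T) ** j)
              for j in range(k))
    return 1.0 / (1.0 + num / den)

def cond_exp(t, k, x, t_x, T, r, alpha, a, b):
    # heuristic approximation (A.39): 1/k scaling, parameters updated to
    # (r + kx, alpha + T, a, b + x), times the exact P(active) from (A.27)
    return ((1.0 / k) * (a + b + x) / (a - 1.0)
            * G(r + k * x, b + x, b + x, alpha + T, a, t)
            * p_alive(k, x, t_x, T, r, alpha, a, b))
```

For example, cond_exp(t=10.0, k=2, x=3, t_x=30.0, T=40.0, r=1.0, alpha=10.0, a=1.5, b=2.0) forecasts the transactions in (T, T+10] for a customer with three observed purchases, the last at t_x = 30.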

A.11 Concluding Remarks

Despite the fact that we are only able to derive a biased approximation, we demonstrate in the main part of this thesis that this formula is still able to outperform classic models based on the Poisson assumption with respect to individual forecasts. The author assumes that the critical part of a correct prediction (especially for rather long prediction periods) is a proper assessment of whether a customer is still active. It seems that the error introduced by approximating the expected number of transactions is smaller than the precision gained in assessing whether a customer is still active.
Appendix B

R Source Code

All calculations for this thesis have been carried out with the statistical software R (R Development Core Team, 2008). The author releases the following code bits under the liberal open-source license Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0), and as such assumes no responsibility whatsoever. Nevertheless, the correctness of the implementations of the Pareto/NBD, BG/NBD, and CBG/NBD models could be verified by comparing calculated figures for the CDNOW dataset with those from the cited articles.
Listing B.1: Gaussian Hypergeometric Function

h2f1 <- function(a, b, c, z) {
  j <- 0
  uj <- 1
  y <- uj
  lteps <- 0
  while (lteps < 1) {
    lasty <- y
    j <- j + 1
    uj <- uj * ((a + j - 1) * (b + j - 1) * z) / ((c + j - 1) * j)
    y <- y + uj
    lteps <- sum(y == lasty) / length(y)
  }
  return(y)
}

B.0.1 Data Preprocessing

TODO:
* convertToRF: convert transaction data to RFM
* buildTrainData
* convertToTab


B.0.2 EDA

TODO:
* plot regularities?
* estimate shape?

B.1 NBD

B.2 Pareto/NBD

B.3 BG/NBD

B.4 CBG/CNBD-k

B.5 Data Simulation

B.6 Usage Example for CDNOW


Bibliography

M. Abe. Counting Your Customers One by One: A Hierarchical Bayes Extension to the Pareto/NBD Model. Marketing Science, forthcoming, 2008.

E.P. Batislam, M. Denizel, and A. Filiztekin. Empirical validation and comparison of models for customer base analysis. International Journal of Research in Marketing, 24(3):201–209, 2007.

C. Chatfield and G.J. Goodhardt. A Consumer Purchasing Model with Erlang Inter-Purchase Time. Journal of the American Statistical Association, 68(344):828–835, December 1973.

P.C. Consul and G.C. Jain. A generalization of the Poisson distribution. Technometrics, 15:791–799, 1973.

A.S.C. Ehrenberg. The Pattern of Consumer Purchases. Applied Statistics, 8(1):26–41, 1959.

P. Fader and B. Hardie. Forecasting Repeat Sales at CDNOW: A Case Study. Interfaces, 31(4):94–107, 2001.

P. Fader, B. Hardie, and K.L. Lee. Counting Your Customers the Easy Way: An Alternative to the Pareto/NBD Model. Marketing Science, 24:275–284, 2005a.

P. Fader, B. Hardie, and K.L. Lee. RFM and CLV: Using Iso-Value Curves for Customer Base Analysis. Journal of Marketing Research, 42:415–430, 2005b.

J.D. Greene. Consumer behavior models for non-statisticians: the river of time. Praeger, 1982.

S. Gupta, D. Hanssens, B. Hardie, W. Kahn, V. Kumar, N. Lin, N. Ravishanker, and S. Sriram. Modeling Customer Lifetime Value. Journal of Service Research, 9(2):139, 2006.

F.A. Haight. Counting distributions for renewal processes. Biometrika, 52(3–4):395–403, 1965.

J. Herniter. A Probabilistic Market Model of Purchase Timing and Brand Selection. Management Science, 18(4):102–112, 1971.

D. Hoppe and U. Wagner. Customer Base Analysis: The Case for a Central Variant of the Betageometric/NBD Model. Marketing - Journal of Research and Management, 2:75–90, 2007.

D. Hoppe and U. Wagner. Supplementary Appendix to "Customer Base Analysis: The Case for a Central Variant of the Betageometric/NBD Model". Appendix with detailed mathematical derivations, provided by the authors upon request, 2008.

D.R. Mani, J. Drew, A. Betz, and P. Datta. Statistics and data mining techniques for lifetime value modeling. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 94–103. ACM, New York, NY, USA, 1999.

L. May, D. Austin, T.L. Bartlett, E. Malthouse, and P. Fader. Lifetime Value and Customer Equity Modeling Competition, 2008. URL http://www.the-dma.org/dmef/2008DMEFDKContestAnnouncement.pdf.

D.G. Morrison and D.C. Schmittlein. Generalizing the NBD Model for Customer Purchases: What Are the Implications and Is It Worth the Effort? Reply. Journal of Business and Economic Statistics, 6(2):165–166, 1988.

R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2008. URL http://www.R-project.org. ISBN 3-900051-07-0.

S. Rosset, E. Neumann, U. Eick, and N. Vatnik. Customer Lifetime Value Models for Decision Support. Data Mining and Knowledge Discovery, 7(3):321–339, 2003.

D.C. Schmittlein and D.G. Morrison. Prediction of Future Random Events With the Condensed Negative Binomial Distribution. Journal of the American Statistical Association, 78(382):449–456, 1983.

D.C. Schmittlein and R.A. Peterson. Customer Base Analysis: An Industrial Purchase Process Application. Marketing Science, 13(1):41–67, 1994.

D.C. Schmittlein, D.G. Morrison, and R. Colombo. Counting your customers: who are they and what will they do next? Management Science, 33(1):1–24, 1987.

H. Schröder, M. Feller, and M. Großweischede. Kundenorientierung im Category Management. December 1999. URL http://cm.uni-essen.de/praxis/publikationen/download/MH_Publikationen_1999_ECR-Studie.pdf.

U. Wagner and D. Hoppe. Erratum on the MBG/NBD Model. International Journal of Research in Marketing, 2008.

U. Wagner and A. Taudes. Stochastic models of consumer behaviour. North-Holland, 1987.

R.D. Wheat and D.G. Morrison. Estimating Purchase Regularity with Two Interpurchase Times. Journal of Marketing Research, 27(1):87–93, 1990.

M. Wübben and F. von Wangenheim. Instant Customer Base Analysis: Managerial Heuristics Often "Get It Right". Journal of Marketing, 72:82–93, May 2008.
