by
Michael Platzer
Speisingerstrasse 76/1
1130 Wien
Matr.Nr. 9650359
Stochastic Purchase Models for
Noncontractual Consumer Relations
Michael Platzer
michael.platzer@gmail.com
Contents

Abstract

1 Introduction
  1.1 Background
  1.2 Problem Scope
  1.3 Presented Models
  1.4 Use Cases

2 DMEF Competition
  2.1 Contest Details
  2.2 Data Set
  2.3 Game Plan

3 Exploratory Data Analysis

4 Forecast Models
  4.1 NBD Model
    4.1.1 Assumptions
    4.1.2 Empirical Results
  4.2 Pareto/NBD Model
    4.2.1 Assumptions
    4.2.2 Empirical Results
  4.3 BG/NBD Model
    4.3.1 Assumptions
    4.3.2 Empirical Results
  4.4 CBG/NBD Model
    4.4.1 Assumptions
    4.4.2 Empirical Results
  4.5 Heuristic
  4.6 LM25 + NN25
  4.7 Variants of BG/NBD
  4.8 CBG/CNBD-k Model
    4.8.1 Assumptions
    4.8.2 Empirical Results
  4.9 Model Comparison
  4.10 Estimation of Monetary Component

5 Conclusion / Discussion

A Derivation of CBG/CNBD-k
  A.1 Overview of Used Methods
  A.2 Overview of Used Stochastic Distributions
  A.3 Assumptions
  A.4 Erlang-k

B R Source Code
  B.0.1 Data Preprocessing
  B.0.2 EDA
  B.1 NBD
  B.2 Pareto/NBD
  B.3 BG/NBD
  B.4 CBG/CNBD-k
  B.5 Data Simulation
  B.6 Usage Example for CDNOW

Bibliography
Chapter 1
Introduction
1.1 Background
Over 80% of the companies that participated in a German study on the usage of information instruments in retail controlling (Schröder et al., 1999) regarded the application of customer lifetime value as useful. Yet less than 10% actually had a working implementation at that time. No other consumer-related information, such as customer satisfaction, penetration or sociodemographic variables, showed such a large discrepancy between assessed usefulness and actual usage. Accurate lifetime value models will therefore become, despite but also because of their inherently challenging complexity, a crucial informational advantage in highly competitive markets.
Typical fundamental managerial questions that arise are (Schmittlein et al., 1987; Morrison and Schmittlein, 1988):
And a key part of finding answers to those questions is the accurate assessment of lifetime value, on an aggregated as well as on an individual level. Hardly any organization can afford to make budget plans for the upcoming period without careful estimates of the income side. Such estimates on the aggregate level are therefore very common, and numerous methods exist, ranging from simple managerial heuristics to advanced time series analysis. Considerably more challenging is the prediction of future sales broken down between trial and repeat customers. And, considering how little information we have on an individual level, making accurate forecasts for each single client is an undoubtedly demanding task.

Nevertheless, the ongoing rise of computerized transaction systems and the drop in data storage costs over the past decade provide more and more companies with customer databases coupled with large records of transaction history ("Who bought which product at what price at what time?"). But the data itself is useless unless models and tools are implemented that condense the desired characteristics, trends and forecasts out of it. Such tools are commonly provided as part of Customer Relationship Management (CRM) software, which enables organizations to act and react individually to each customer. This ability takes the heterogeneity of one's customer base into account, and subsequently allows an organization to further optimize its marketing activities and their efficiency. One essential piece of information for CRM implementations is the (monetary) valuation of an individual customer (Rosset et al., 2003).
well-known Pareto/NBD model (section 4.2), and two of its variants, the BG/NBD (section 4.3) and the CBG/NBD (section 4.4) model. These are all extensions of the NBD model, but make additional assumptions regarding the defection process and its heterogeneity among customers. In order to get a feeling for the forecast accuracy of these probabilistic models, we will subsequently also benchmark them against a simple linear regression model. Finally, a new model will be introduced in section 4.8, namely the CBG/CNBD-k model, which has been developed by the author as a variation of the CBG/NBD model. This variant makes differing assumptions regarding the timing of purchases, in particular it considers a certain degree of regularity, and as such improves forecast quality considerably for the competition data set. Detailed derivations for this model are provided in appendix A.
Chapter 2

DMEF Competition
3. Estimate which of the donors whose last donation occurred before September 1, 2004 will donate at all.
The prize for winning any of these three tasks is $500; in addition, the winning teams are "invited to write a short note describing the winning model [...], which will be published in the 'Journal of Interactive Marketing'" (May et al., 2008).

Task 1 results in a single figure, whereas for tasks 2 and 3 a data file containing donor IDs and the corresponding estimates had to be submitted. Winning task 1 is rather a simple guessing game, considering the number of participants and the irregular fluctuations throughout the training period. Therefore our main focus has been on task 2, while a guess for task 3 is derived straightforwardly from our calculations for task 2.
An error measure is defined for all 3 tasks by the contest organizing committee, and the submitted calculations of the participating teams are evaluated with regard to these measures. Closeness on an aggregated level (task 1) is simply defined as the absolute deviation from the actual donation amount. The error measure for task 3 is given directly by the percentage of correctly classified cases. The error measure for task 2 is somewhat uncommon, and is defined as the mean squared logarithmic error:

$$
\mathrm{MSLE} = \frac{1}{21{,}166} \sum_i \big(\log(y_i + 1) - \log(\hat y_i + 1)\big)^2, \tag{2.1}
$$
with the 1 added to avoid taking the logarithm of 0, and with 21,166 being the size of the cohort. More common distance measures for evaluating forecasts on an individual level are the mean absolute error (MAE), the root mean squared error (RMSE), the root median of squared errors (Wübben and von Wangenheim, 2008), or simply the correlation between the estimated and the actual data (Fader et al., 2005a). Hoppe and Wagner (2007, p. 85) used the geometric mean relative absolute error (GMRAE), which is a measure relative to some particular benchmark model (in their article they benchmarked against the NBD model) and which also allows comparisons between different data sets. (It is noteworthy that Pete Fader, author of the BG/NBD model presented in the following, is a member of the contest organizing committee.)
The author assumes that the mean squared logarithmic error has been chosen over the root mean squared error because it is less sensitive to large values, which are generated by only a very small portion of cases, and as such it rather emphasizes an accurate guess for the dominant low-purchase class. But, as we will show later in the simulation studies, the MSLE favors forecasts that systematically underestimate. This also becomes apparent when considering that in the case of a 50% chance of y = 0 and a 50% chance of y = 1 the MSLE is minimized by ŷ = √2 − 1 ≈ 0.414, as opposed to ŷ = 0.5 · 0 + 0.5 · 1 = 0.5, which minimizes the RMSE. For the competition we tried to take advantage of this particular characteristic of the MSLE.
The deadline for submitting calculations for phase 1 (tasks 1 to 3) was September 15, 2008. The results for the 25 participating teams were announced a couple of weeks afterwards, and were discussed at the DMEF's Research Summit in Las Vegas.
The competition data set therefore represents only a small subset of the complete available data, which has been provided by the NPO after the competition.
A common approach is to split the provided data into a training period and a validation period. The training data is used for calibrating the model and its parameters, whereas the validation data enables us to compare the forecast accuracy of the models. By choosing several different lengths of training periods, as has also been done by Schmittlein and Peterson (1994), Batislam et al. (2007) and Hoppe and Wagner (2007), for example, we can further improve the robustness of our choice. After picking a certain model for the competition, the complete provided data set is used for the final calibration of the model.
Although a strong causal relation between contacts and actual donations can be assumed, we will not include the contact information in our model building. The main reason is that such data is not available for the target period, and also cannot be reliably estimated. Therefore we implicitly assume that direct marketing activities will follow a similar pattern as in the past, and simply neglect this information. A similar assumption is made regarding all other possible exogenous influences, such as competition, advertisement, public opinion, etc., due to the lack of information.
The probabilistic models under investigation all try to model the purchase opportunity as opposed to the actual purchase amount. (From time to time we will refer to donations as purchases, and to donors as consumers or clients, as these are the general terms used in the marketing literature.) Assuming independence between the purchase amount and the purchase rate, resp. the defection rate, we will simply estimate the average amount per customer in a separate step (see section 4.10), on top of the estimated number of future purchases.

An estimate for task 3 is directly derived from task 2, as we assume that any customer with an estimated number of purchases of 0.5 or higher will actually make a purchase. Task 1 could also be deduced from task 2 by simply summing over all individual estimates.

All of our following calculations and visualizations are carried out with the statistical programming environment R, version 2.7.2 (R Development Core Team, 2008), which is freely available, well documented, widely used in academic research, and further provides a large repository of additional libraries. Unfortunately the presented probabilistic models are not (yet) part of an existing library, and therefore we program these models ourselves, based upon the presented analytical results of the referenced articles. Together with the published estimates regarding the CDNOW data set (http://brucehardie.com/notes/008/) within those articles, we are able to verify the correctness of our implementations.
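The derivation of tasks 1 and 3 from the individual task-2 estimates can be sketched in a few lines of R (the estimate values are purely illustrative assumptions, not thesis data):

```r
# Hypothetical per-donor estimates of future purchases (task 2 output)
est <- c(0.1, 0.7, 2.3, 0.4)
task3 <- est >= 0.5        # task 3: donor classified as "will donate"
task1 <- sum(est)          # task 1: aggregate estimate over all donors
```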
Chapter 3

Exploratory Data Analysis

Nr of donors:                                  21,166
Cohort time length:                            6 months
Available time frame:                          4 years 8 months
Available time units:                          days
Nr of zero repeaters (absolute; relative):     10,626; 50.2%
Nr of repetitive donations (mean; sd; max):    1.55; 2.93; 55
Donation amount (mean; sd; max):               $39.31; $119.32; $10,000
Time between donations (mean; sd; max):        296 days; 260 days; 1,626 days
Time until last donation (mean; sd):           460 days; 568 days
On the one hand, the majority of donors (50.2%) did not donate at all after their initial donation; on the other hand, some individuals made up to 55 further donations. The amount per transaction ranges from as little as a quarter of a dollar up to $10,000, with the standard deviation being three times larger than the mean. These simple statistics already make it clear that any model considered to fit the data should be able to account for this kind of heterogeneity.

It can also be noted that the covered time span of the records is quite long (as is the target period of 2 years). This implies that people who are still active at the end of that 4 year and 8 month period are rather loyal, long-term customers. But it also means that the assumption of stationarity of the underlying mechanism (resp. of the model parameters) might not hold true.
[Figure 3.1: Timing patterns of 12 randomly selected donors.]
An important feature of the data set is that donation (as well as contact) records are given with their exact timing, and are neither aggregated to longer time spans nor condensed to simple frequency counts. Therefore we can and should use the information on the exact timing of the donations for our further analysis. A first ad-hoc visualization (see figure 3.1) of 12 randomly selected donors already displays some of the differing characteristic timing patterns, ranging from single-time (e.g. ID 11259736), over sporadic, up to highly frequent donors (e.g. ID 10870988).
[Figure: Distribution of the number of donations per donor, with 50.2% of donors making a single donation.]

[Figure: Relative frequencies of donation counts.]

[Figure: Aggregated donation sums over time.]
a sharp decline right after the second quarter of 2002. This effect is plausible if we recall that our cohort by definition consists of new donors who donated within the first half of 2002, and that a large portion of donors donated just a single time. Further, the data shows a strong seasonal fluctuation, with the third quarter being the weakest and the fourth and first quarters being the strongest periods, each showing about twice as many donations as the third quarter. It also seems that we have a downward trend in donation sums, but by looking at the percentage changes we see that the speed of this trend is unclear. In the beginning we even record an 8% increase, then a sharp 24% drop, which is followed by a moderate 3% decrease over the last year. Task 1 of the competition is to estimate the future trend of these aggregated donation sums for the next two years. Considering the erratic movements, this is quite a challenge.
The overall donation sum is the product of the number of donations and the average donation amount. Charting the trends of these two variables separately in figure 3.6 provides some further insight. Regarding the number of donations, the seasonality, with its peak around the Christmas holidays, is also apparent and seems plausible. The continuous
downward trend (-13%, -15%, -14%) in the transaction numbers is quite
CHAPTER 3. EXPLORATORY DATA ANALYSIS 18
50
8000
40
30
4000
20
10
−13% −15% −14% +24% −10% +12%
0
0
2002 2004 2006 2002 2004 2006
Time Time
stable, and as such predictable. A simple heuristic could, for example, assume
a constant decreasing rate of 14% for the next two years. As has been noted in the preceding section, this downward trend can either be the result of a decreasing donation frequency of each donor, or of an ongoing
defection process. Figure 3.7 indicates that the latter is rather the case: the share of active donors is steadily decreasing (note that we do not count the initial donation, as otherwise the share for 2002 would simply be 100%), whereas the average number of donations per active donor is slightly increasing.

[Figure 3.7: Share of active donors per year (left) and average number of donations per active donor (right).]

Regarding the erratic movement of the overall sum we can conclude, due to the stable decline in donation numbers, that it results from the ups and downs in the average donation amount. This chart (on the right-hand side of figure 3.6) also, surprisingly, shows seasonal fluctuation and no clear overall trend at all, making it hard to extrapolate into the future.
[Figures: further aggregate time series of the data set, and a histogram of intertransaction times.]
of intertransaction times, with the former donating about every year and the latter donating regularly each month. As we will see, this particular observed regularity will play a major role in the upcoming modeling phase.
[Figure: Timing patterns of two exemplary donors with regular intertransaction times.]
Chapter 4

Forecast Models

4.1 NBD Model

4.1.1 Assumptions
[Figure 4.1: Probability mass function of the Negative Binomial Distribution for different parameter values (r = 1, p = 0.4; r = 1, p = 0.2; r = 3, p = 0.5).]

[Figure 4.2: Probability mass function of the Poisson Distribution for different parameter values.]

[Figure 4.3: Probability density function of the Gamma Distribution for different parameter values.]
We will now apply the NBD model to the data set from the DMEF competition. First, we estimate the corresponding parameters, then we check how well the model fits the data on an aggregated level, and finally we calculate individual estimates.

Ehrenberg (1959) suggests an estimation method for the parameters α and r that only requires the mean number of purchases m and the proportion of non-buyers p0. Yet, with modern computational power it is no problem anymore to apply Maximum Likelihood Estimation (MLE) for these parameters for the cohort size at hand. The MLE method searches for those parameter values for which the likelihood of the observed data is maximized, and it has the favorable property of being an asymptotically unbiased, asymptotically efficient and asymptotically normal estimator.
The calculation of the likelihood for the NBD model requires two pieces of information for each individual: the length of the observed time span T, and the number of transactions x within the interval (0, T]. The time span T differs for each donor, resp. customer, as the date of the first transaction can lie anywhere within the first half of 2002. Also note that x does not include the initial transaction, as that transaction occurred by definition for each person in our cohort. As we will see, the upcoming models will also require another piece of information, namely the recency of each customer, i.e. the timing tx of the last recorded transaction. With this notation we closely follow the variable conventions used in Schmittlein et al. and Fader et al. Generally, the summary of each customer consisting of recency, frequency and a monetary value is often referred to as the RFM variables, and it is commonly, not just for probabilistic models, the condensed data basis of many customer base analyses. The layout of the transformed data is depicted in table 4.2.
id          x    tx      T     amt
10458867    0     0   1605   25.42
10544021    1   728   1602  175.00
10581619    7  1339   1592   80.00
...        ..    ..     ..      ..
9455908     0     0   1595      25
9652546     4  1365   1612     450
9791641     4  1488   1687     275

Table 4.2: Layout of the transformed data.

The MLE estimation procedure applied to the transformed data results in the estimated parameters,
with both parameters being highly significantly different from zero. The general shape of the resulting gamma distribution can be seen in the left chart of figure 4.3, i.e. it is reverse J-shaped. This implies that the mass of donors has a very low donation frequency, with the mode being at zero, the median at 0.00042 and the mean at 0.00095 donations per day. In terms of average intertransaction times this reflects an average period of 1,048 days (2.9 years) between two succeeding donations, with half of the donors donating less often than every 2,406 days (6.6 years). Considering that the majority of donors have not re-donated at all during the observation period (see section 3.2), these long intertransaction times are obviously a consequence of the overall low observed donation frequencies.
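The maximum likelihood estimation described above can be sketched in R as follows. This is an illustrative sketch, not the thesis code: `nbd.negLL` and the simulated-data parameters (r = 0.5, α = 500, T = 1600 days) are assumptions chosen only to demonstrate the procedure.

```r
# Sketch of NBD maximum likelihood estimation. Individual likelihood is the
# gamma-mixed Poisson count:
#   P(X(T)=x | r, alpha) = Gamma(r+x)/(Gamma(r) x!) (alpha/(alpha+T))^r (T/(alpha+T))^x
nbd.negLL <- function(logpar, x, T) {
  r <- exp(logpar[1]); alpha <- exp(logpar[2])   # log scale keeps params positive
  ll <- lgamma(r + x) - lgamma(r) - lfactorial(x) +
        r * log(alpha / (alpha + T)) + x * log(T / (alpha + T))
  if (!all(is.finite(ll))) return(1e300)         # guard against numerical overflow
  -sum(ll)                                       # negative log-likelihood
}

# usage with simulated data under assumed true values r = 0.5, alpha = 500
set.seed(1)
T <- rep(1600, 5000)                             # observation lengths in days
lambda <- rgamma(5000, shape = 0.5, rate = 500)  # heterogeneous purchase rates
x <- rpois(5000, lambda * T)                     # observed transaction counts
est <- exp(optim(c(0, 0), nbd.negLL, x = x, T = T)$par)
```

With a cohort of this size the recovered estimates should lie close to the assumed true values.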
But how well does our model now represent the data? To answer this question, we compare the actual observed donation counts with the corresponding theoretical counts calculated by the NBD model. Table 4.3 contains the result.

Nr of donations      0      1      2      3      4    5    6     7+
Actual          10,626  3,579  2,285  1,612  1,336  548  348    832
NBD             10,617  3,865  2,183  1,379    918  629  439  1,135

Table 4.3: Comparison of actual vs. theoretical count data for the complete time span.
The actual and the theoretical counts drift apart for the more frequent donors, indicating that the model is not able to fit our data structure well.
Let us now take a look at the predictive accuracy of the NBD model on an individual level, which, considering the DMEF competition, is our main focus when evaluating models. We therefore split the overall observation period of 4 years and 8 months into a calibration period of 3.5 years and a validation period of one year. (We only consider validation periods that are multiples of one year, in order to circumvent modeling the strong seasonal influence detected in section 3.3.) By considering only the first 3.5 years for model calibration, slightly different parameter estimates (r = 0.53, α = 501) are returned than before. We now calculate a conditional estimate for each individual for a one-year period, based on the observed frequency x and the observed time span T. Table 4.4 contains the actual and the estimated average number of donations during the validation period, split by the number of donations during the training period. That is, those people who did not donate at all within the first 3.5 years donated on average 0.038 times in the following year, whereas the NBD model predicted an average of only 0.001 donations. On the other hand, the future donations of the frequent donors are vastly overestimated. Overall, on an aggregated level, the NBD model estimates 11,088 donations, which is nearly twice the actual number (6,047).

Nr of donations     0      1      2      3      4      5      6     7+
Actual          0.038  0.196  0.428  0.686  0.747  1.061  1.540  2.442
NBD             0.001  0.423  0.844  1.266  1.687  2.109  2.530  4.676

Table 4.4: Comparison of actual vs. estimated avg. number of donations during the validation period.

One possible explanation for the poor performance of the NBD model is the long overall time period, in combination with the assumption that all donors remain active. The upcoming section presents a model that explicitly takes a possible defection of customers into account.

4.2 Pareto/NBD Model

4.2.1 Assumptions
4.3 BG/NBD Model

4.3.1 Assumptions

4.4 CBG/NBD Model

4.4.1 Assumptions

4.5 Heuristic

Similar results are reported by Wübben and von Wangenheim (2008).

4.8 CBG/CNBD-k Model

4.8.1 Assumptions
[Figure 4.4: Timing patterns of those donors for whom the BG/NBD model produced the worst estimates for the validation period.]
Chapter 5

Conclusion / Discussion
Appendix A
Derivation of CBG/CNBD-k
$$
{}_2F_1(a, b; c; z) = \sum_{j=0}^{\infty} \frac{(a)_j (b)_j}{(c)_j} \frac{z^j}{j!}, \qquad c \neq 0, -1, -2, \ldots,
$$

with the Pochhammer symbol

$$
(a)_j = \frac{\Gamma(a + j)}{\Gamma(a)}.
$$
A.3 Assumptions
A1 While active, transactions of customers occur with Erlang-k (rate parameter λ) distributed waiting times.
These assumptions differ from those of the CBG/NBD model only regarding the modified assumption A1 and the newly introduced assumption A6.
A.4 Erlang-k
The Erlang-k distribution with parameters k and λ is defined by the probability density
$$
f_\Gamma(t \mid k, \lambda) = \frac{\lambda^k t^{k-1}}{(k-1)!}\, e^{-\lambda t} \qquad \forall\, t > 0;\; k \in \mathbb{N}^+,\; \lambda > 0. \tag{A.1}
$$

For a counting process with Erlang-k distributed waiting times, the probability of observing x events until time t can be expressed through the Poisson probabilities $P_P$:

$$
P_k(X(t) = x) = \sum_{j=0}^{k-1} P_P(X(t) = kx + j). \tag{A.2}
$$
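As a quick side check (not part of the thesis), equation (A.1) is simply the gamma density with integer shape parameter, which can be confirmed against R's built-in `dgamma` for arbitrary assumed values of k, λ and t:

```r
# The Erlang-k density (A.1) equals the Gamma(shape = k, rate = lambda) density.
k <- 3; lambda <- 2; t <- 1.7
f.erlang <- lambda^k * t^(k - 1) * exp(-lambda * t) / factorial(k - 1)
f.gamma  <- dgamma(t, shape = k, rate = lambda)
```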
[Figure: Relation between the Erlang-k event count and the underlying Poisson event count, illustrated for k = 3 and x = 2 Erlang events.]
Inserting the Erlang-k pdf (A.1) and our previous result (A.2), it follows that

$$
\begin{aligned}
L(\lambda, p \mid t_1, \ldots, t_x, T)
&= (1-p)^x \cdot \frac{\lambda^k t_1^{k-1} e^{-\lambda t_1}}{(k-1)!} \cdots \frac{\lambda^k (t_x - t_{x-1})^{k-1} e^{-\lambda (t_x - t_{x-1})}}{(k-1)!} \\
&\qquad \cdot \Big\{ p + (1-p) \sum_{j=0}^{k-1} P_P(X(T - t_x) = j \mid \lambda) \Big\} \\
&= (1-p)^x \lambda^{kx} e^{-\lambda t_x}\, \underbrace{\big(1/(k-1)!\big)^x (t_x - t_{x-1})^{k-1} \cdots (t_1 - 0)^{k-1}}_{=: \tilde{t}} \\
&\qquad \cdot \Big\{ p + (1-p)\, e^{-\lambda (T - t_x)} \sum_{j=0}^{k-1} \frac{\lambda^j (T - t_x)^j}{j!} \Big\} \\
&= \tilde{t} \cdot p\, (1-p)^x \lambda^{kx} e^{-\lambda t_x}
 + \tilde{t} \cdot (1-p)^{x+1} \lambda^{kx} e^{-\lambda T} \sum_{j=0}^{k-1} \frac{\lambda^j (T - t_x)^j}{j!} \tag{A.3}
\end{aligned}
$$
One major difference between this result and the likelihoods of models with exponentially distributed timing is that the actual timing of the transactions t1, ..., tx (which we subsumed into the variable t̃) still appears in our final formula. (x, tx, T) is therefore no longer a sufficient statistic for the likelihood. But, as we will see shortly, we do not need these timings for the estimation of the parameters, and therefore we actually do not impose any extra requirements regarding the input data.
$$
(\hat r, \hat \alpha, \hat a, \hat b)
= \underset{r, \alpha, a, b}{\operatorname{argmax}}\; L\big(r, \alpha, a, b \mid (t_{i,1}, \ldots, t_{i,x}, T_i)_{i=1..N}\big)
= \underset{r, \alpha, a, b}{\operatorname{argmax}} \prod_{i=1}^{N} L(r, \alpha, a, b \mid t_{i,1}, \ldots, t_{i,x}, T_i)
$$
And as we can now see, we can simply drop the cumulative terms t̃i for the exact timing patterns, since this multiplicative factor has no effect on the location of the maximum, i.e. on the estimated parameters. Therefore we can stick to (x, tx, T) as input data for our further calculations.

To circumvent problems with numerical precision it is common to optimize the logarithm of the likelihood, which transforms the multiplication (of very small numbers) into a sum.
$$
(\hat r, \hat \alpha, \hat a, \hat b)
= \underset{r, \alpha, a, b}{\operatorname{argmax}} \sum_{i=1}^{N} \log L(r, \alpha, a, b \mid t_{i,1}, \ldots, t_{i,x}, T_i) \tag{A.14}
$$
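The numerical motivation for working on the log scale is easy to demonstrate (an illustrative snippet, not thesis code):

```r
# Multiplying many small likelihood contributions underflows double precision,
# whereas summing their logarithms stays perfectly representable.
lik.terms <- rep(1e-12, 40)
prod(lik.terms)        # underflows to 0
sum(log(lik.terms))    # finite: 40 * log(1e-12)
```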
$$
P(X(t) = x \mid \lambda, p) = (1-p)^{x+1} \sum_{j=kx}^{kx+k-1} P_P(X(t) = j)
+ p\, (1-p)^x \Big( 1 - \delta_{x>0} \sum_{j=0}^{kx-1} P_P(X(t) = j) \Big). \tag{A.16}
$$
Note that we added the Kronecker delta, which is 1 for x > 0 and 0 otherwise, in order to correctly cover the case x = 0, for which the second summation term simply becomes the dropout probability p at time zero.
Again we mix in our heterogeneity assumptions:

$$
\begin{aligned}
P(X(t) = x \mid r, \alpha, a, b)
&= \int_0^1 \int_0^\infty P(X(t) = x \mid \lambda, p)\, f_\Gamma(\lambda \mid r, \alpha)\, f_B(p \mid a, b) \, d\lambda \, dp \\
&= \int_0^1 (1-p)^{x+1} f_B \, dp \int_0^\infty \sum_{j=kx}^{kx+k-1} \frac{(\lambda t)^j}{j!}\, e^{-\lambda t} f_\Gamma \, d\lambda \\
&\quad + \int_0^1 p\, (1-p)^x f_B \, dp \int_0^\infty \Big( 1 - \delta_{x>0} \sum_{j=0}^{kx-1} \frac{(\lambda t)^j}{j!}\, e^{-\lambda t} \Big) f_\Gamma \, d\lambda \tag{A.17}
\end{aligned}
$$
still being active at the end of the observation period, based on his past transaction history, i.e. we ask for P(τ > T | t1, ..., tx, T, r, α, a, b), with τ being the unobserved customer lifetime.
By expanding this term with $\tilde t\, (1-p)^x \lambda^{kx} e^{-\lambda t_x}$, and comparing the denominator with equation (A.3), it follows that

$$
P(\tau > T \mid t_1, \ldots, t_x, T, \lambda, p)
= \frac{\tilde t\, (1-p)^{x+1} \lambda^{kx} e^{-\lambda T} \sum_{j=0}^{k-1} \frac{\lambda^j (T - t_x)^j}{j!}}
{L(\lambda, p \mid t_1, \ldots, t_x, T)} \tag{A.22}
$$
and using the following result from section 3.2.3 of Hoppe and Wagner (2008)

$$
f(\lambda, p \mid t_1, \ldots, t_x, T)
= \frac{L(\lambda, p \mid t_1, \ldots, t_x, T)\, f_\Gamma(\lambda \mid r, \alpha)\, f_B(p \mid a, b)}
{L(r, \alpha, a, b \mid t_1, \ldots, t_x, T)} \tag{A.24}
$$

yields
Comparing this with equation (A.11), we can see that the numerator is actually one of the summation terms of the aggregated likelihood function in the denominator. Considering $\frac{A}{A+B} = \big(1 + \frac{B}{A}\big)^{-1}$, the fraction can be reduced to

$$
P(\tau > T \mid t_1, \ldots, t_x, T, r, \alpha, a, b)
= \left( 1 + \frac{\tilde t \cdot I_B(1, x, a, b) \cdot I_\Gamma(kx, t_x, r, \alpha)}
{\tilde t \cdot I_B(0, x+1, a, b) \cdot \sum_{j=0}^{k-1} \frac{(T - t_x)^j}{j!}\, I_\Gamma(kx+j, T, r, \alpha)} \right)^{-1}
\tag{A.26}
$$

Fortunately the term $\tilde t$ cancels out, and therefore we still do not need the exact timing of the transactions for our calculations. Now we resolve the integral functions, extract common terms, and use the relation $(r)_{kx+j} = (r)_{kx} \cdot (r + kx)_j$ to yield

$$
\begin{aligned}
P(\tau > T \mid x, t_x, T, r, \alpha, a, b)
&= \left( 1 + \frac{B(a+1, b+x)}{B(a, b+x+1)} \cdot \frac{\alpha^r (r)_{kx}}{(\alpha + t_x)^{r+kx}} \cdot \frac{(\alpha + T)^{r+kx}}{\alpha^r (r)_{kx}}
\Big/ \sum_{j=0}^{k-1} \frac{(T - t_x)^j (r + kx)_j}{j!\, (\alpha + T)^j} \right)^{-1} \\
&= \left( 1 + \frac{a}{b + x} \Big( \frac{\alpha + T}{\alpha + t_x} \Big)^{r+kx}
\Big/ \sum_{j=0}^{k-1} \frac{(T - t_x)^j (r + kx)_j}{j!\, (\alpha + T)^j} \right)^{-1}. \tag{A.27}
\end{aligned}
$$
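Equation (A.27) translates directly into a few lines of R. The following is a sketch based on the derivation above (the function names are the author's own, not thesis code); for k = 1, x = 0, tx = 0 it reduces to 1 / (1 + a/b · ((α + T)/α)^r), the CBG/NBD special case:

```r
# P(tau > T | x, tx, T) for the CBG/CNBD-k model, transcribed from (A.27).
pochhammer <- function(a, j) gamma(a + j) / gamma(a)
p.alive <- function(x, tx, T, k, r, alpha, a, b) {
  j <- 0:(k - 1)
  S <- sum((T - tx)^j * pochhammer(r + k * x, j) /
           (factorial(j) * (alpha + T)^j))
  1 / (1 + a / (b + x) * ((alpha + T) / (alpha + tx))^(r + k * x) / S)
}
# k = 1 check with illustrative parameters: 1 / (1 + 1/2 * (20/10)^1) = 0.5
p.alive(x = 0, tx = 0, T = 10, k = 1, r = 1, alpha = 10, a = 1, b = 2)
```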
$$
E(X(t) \mid \lambda) = \frac{1}{k} \sum_{r=1}^{\infty} kr\, P_P(kr)
+ \sum_{j=1}^{k-1} \underbrace{\left( \frac{j}{k^2} \sum_{r=1}^{\infty} kr\, P_P(kr-k+j) + \frac{k-j}{k^2} \sum_{r=1}^{\infty} kr\, P_P(kr+j) \right)}_{=: T_j}
$$

$T_j$ can be reduced to

$$
\begin{aligned}
T_j &= \frac{j}{k^2} \Big( \sum (kr-k+j)\, P_P(kr-k+j) + (k-j) \sum P_P(kr-k+j) \Big) \\
&\quad + \frac{k-j}{k^2} \Big( \sum (kr+j)\, P_P(kr+j) - j \sum P_P(kr+j) \Big) \\
&= \frac{j}{k^2} \Big( \sum (kr-k+j)\, P_P(kr-k+j) + (k-j) \sum P_P(kr-k+j) \Big) \\
&\quad + \frac{k-j}{k^2} \Big( \sum (kr-k+j)\, P_P(kr-k+j) - j P_P(j) - j \sum P_P(kr-k+j) + j P_P(j) \Big) \\
&= \frac{1}{k} \sum (kr-k+j)\, P_P(kr-k+j),
\end{aligned}
$$

and we receive our result for the unconditional expected number for asynchronous counting:

$$
E(X(t) \mid \lambda) = \frac{1}{k} \sum_{r=1}^{\infty} kr\, P_P(kr) + \sum_{j=1}^{k-1} \frac{1}{k} \sum_{r=1}^{\infty} (kr-k+j)\, P_P(kr-k+j)
= \frac{1}{k} \sum_{r=1}^{\infty} r\, P_P(r) = \frac{\lambda t}{k} \tag{A.29}
$$
For a synchronous counting process with Erlang-k waiting times the derivation of the expectation unfortunately becomes a bit trickier. Using result (A.2) we can deduce

$$
\begin{aligned}
E(X(t) \mid \lambda) &= \sum_{r=1}^{\infty} r\, P_G(r) = \sum_{r=1}^{\infty} r \sum_{j=0}^{k-1} P_P(rk + j) \\
&= \frac{1}{k} \sum_{j=0}^{k-1} \Big( \sum_{r=1}^{\infty} rk\, P_P(rk + j) \Big) \\
&= \frac{1}{k} \sum_{j=0}^{k-1} \Big( \sum_{r=1}^{\infty} (rk + j)\, P_P(rk + j) - j \underbrace{\sum_{r=1}^{\infty} P_P(rk + j)}_{= \sum_{r=0}^{\infty} P_P(rk+j) - P_P(j)} \Big) \\
&= \frac{1}{k} \Big( \sum_{r=0}^{\infty} r\, P_P(r) - \sum_{r=0}^{k-1} r\, P_P(r) - \sum_{j=0}^{k-1} j \Big( \sum_{r=0}^{\infty} P_P(rk + j) - P_P(j) \Big) \Big) \\
&= \frac{1}{k} \Big( \lambda t - \sum_{j=1}^{k-1} j \sum_{r=0}^{\infty} P_P(rk + j) \Big).
\end{aligned}
$$

For k = 2 this becomes

$$
= \frac{1}{2} \Big( \lambda t - \sum_{r=0}^{\infty} e^{-\lambda t}\, \frac{(\lambda t)^{2r+1}}{(2r+1)!} \Big)
= \frac{\lambda t}{2} - \frac{1}{2}\, e^{-\lambda t} \sinh(\lambda t) \tag{A.30}
$$
The result for the synchronous counting process (A.30) differs from the asynchronous result (A.29) only by an additional subtraction term, which for Erlang-2 converges to 1/4 as t → ∞. That is, for a long time horizon we can assess the error that we make if we use the simpler formula (A.29).
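Result (A.30) can be checked numerically against the counting distribution (A.2) for k = 2 (a sketch with arbitrary assumed parameter values, not thesis code):

```r
# E(X(t)) for synchronous Erlang-2 counting, computed two ways.
lambda <- 1; t <- 3; k <- 2
pk <- function(x) sum(dpois(k * x + 0:(k - 1), lambda * t))   # from (A.2)
E.direct  <- sum(sapply(1:200, function(x) x * pk(x)))        # direct expectation
E.formula <- lambda * t / 2 - exp(-lambda * t) * sinh(lambda * t) / 2
```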
Unfortunately the author did not succeed in deriving a closed form for the expression E(X(t) | r, α, a, b). We could derive a (rather complex) expression for E(X(t) | λ, p) for k = 2, but subsequently incorporating heterogeneity would have required solving double integrals of the form

$$
\int_0^1 \int_0^\infty p^{v_4} (1-p)^{v_5}\, \lambda^{v_1}\, e^{-\lambda (v_3 + v_2 \sqrt{1-p})\, t} \, d\lambda \, dp. \tag{A.33}
$$

for the unconditional expected number of transactions until time t for their CBG/NBD model.
Recalling our findings that the expectation for asynchronous counting is simply 1/k of the corresponding Poisson process (see equation (A.29)), and that the synchronous counting only differs by a term that becomes constant for long time horizons, we simply approximate the expected number of transactions for the CBG/CNBD-k model with

$$
\hat E(X(t) \mid r, \alpha, a, b) = \frac{1}{k} \cdot \frac{b}{a-1} \cdot G(r, b, b, \alpha \mid \alpha, t). \tag{A.36}
$$
But even if we come up with a proper solution for the unconditional expectation, the next hurdle is to calculate the expected number of future transactions based on a given purchase history. Due to the fact that, as opposed to the exponential distribution, the Erlang-k distribution is not memoryless, we cannot use the corresponding relation for the CBG/NBD model. In their erratum (Wagner and Hoppe, 2008) to Batislam et al. (2007) they note that it is possible to come up with the forecast result by updating the parameters (r, α, a, b) to (r + x, α + T, a, b + x). We use our exact derivation (A.27) for P(τ > T | x, tx, T, r, α, a, b), and combine it with our approximation for the expectation from the previous section. Additionally we update the parameters from (r, α, a, b) to (r + kx, α + T, a, b + x) (since we encountered kx uncensored events within (0, T]):

$$
\hat E(Y(T, T+t) \mid x, t_x, T, r, \alpha, a, b) = \frac{1}{k} \cdot \frac{a+b+x}{a-1}
\cdot G(r + kx,\, b + x,\, b + x,\, \alpha + T \mid \alpha, t)
\cdot P(\tau > T \mid x, t_x, T, r, \alpha, a, b) \tag{A.39}
$$
Appendix B

R Source Code

All calculations for this thesis have been carried out with the statistical software R (R Development Core Team, 2008). The author releases the following code bits under the liberal open-source Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0), and as such does not accept any responsibility whatsoever. Nevertheless, the correctness of the implementations of the Pareto/NBD, the BG/NBD as well as the CBG/NBD model could be verified by comparing calculated figures regarding the CDNOW data set with those from the referenced articles.
Listing B.1: Gaussian Hypergeometric Function

h2f1 <- function(a, b, c, z) {
  # evaluate 2F1(a, b; c; z) by summing the series term by term until all
  # components of y no longer change (vectorized over the arguments)
  j <- 0
  uj <- 1
  y <- uj
  lteps <- 0
  while (lteps < 1) {
    lasty <- y
    j <- j + 1
    uj <- uj * ((a + j - 1) * (b + j - 1) * z) / ((c + j - 1) * j)
    y <- y + uj
    lteps <- sum(y == lasty) / length(y)
  }
  return(y)
}
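As a self-contained sanity check (not part of the thesis), the series implementation can be compared against the known closed form 2F1(1, 1; 2; z) = −log(1 − z)/z:

```r
# Check the 2F1 series implementation against a closed form, |z| < 1.
h2f1 <- function(a, b, c, z) {
  j <- 0; uj <- 1; y <- uj; lteps <- 0
  while (lteps < 1) {
    lasty <- y
    j <- j + 1
    uj <- uj * ((a + j - 1) * (b + j - 1) * z) / ((c + j - 1) * j)
    y <- y + uj
    lteps <- sum(y == lasty) / length(y)
  }
  y
}
z <- 0.5
h2f1(1, 1, 2, z)      # ~1.386294
-log(1 - z) / z       # identical closed form
```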
B.0.2 EDA
B.1 NBD
B.2 Pareto/NBD
B.3 BG/NBD
B.4 CBG/CNBD-k
Bibliography

P. Fader, B. Hardie, and K.L. Lee. Counting Your Customers the Easy Way: An Alternative to the Pareto/NBD Model. Marketing Science, 24:275–284, 2005a.

P. Fader, B. Hardie, and K.L. Lee. RFM and CLV: Using Iso-Value Curves for Customer Base Analysis. Journal of Marketing Research, 42:415–430, 2005b.

D. Hoppe and U. Wagner. Customer Base Analysis: The Case for a Central Variant of the Betageometric/NBD Model. Marketing - Journal of Research and Management, 2:75–90, 2007.

D.R. Mani, J. Drew, A. Betz, and P. Datta. Statistics and data mining techniques for lifetime value modeling. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 94–103. ACM New York, NY, USA, 1999.

D.G. Morrison and D.C. Schmittlein. Generalizing the NBD Model for Customer Purchases: What Are the Implications and Is It Worth the Effort? Reply. Journal of Business and Economic Statistics, 6(2):165–66, 1988.

R.D. Wheat and D.G. Morrison. Estimating Purchase Regularity with Two Interpurchase Times. Journal of Marketing Research, 27(1):87–93, 1990.