
Applied Mathematics and Computation 296 (2017) 277–288


Extracting clusters from aggregate panel data: A market segmentation study☆

Graça Trindade a, José G. Dias a,∗, Jorge Ambrósio b

a Business Research Unit, Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
b LAETA, IDMEC, Instituto Superior Técnico, University of Lisbon, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal

Keywords: Sequential quadratic programming; Cluster analysis; Panel data; Market segmentation

Abstract

This paper introduces a new application of the Sequential Quadratic Programming (SQP) algorithm to the context of clustering aggregate panel data. The optimization applies the SQP method in parameter estimation. The method is illustrated on synthetic and empirical data sets. Distinct models are estimated and compared with varying numbers of clusters, explanatory variables, and data aggregation.
Results show a good performance of the SQP algorithm for synthetic and empirical data sets. Synthetic data sets were simulated assuming two segments and two covariates, and the correlation between the two covariates was controlled in three scenarios: ρ = 0.00 (no correlation), ρ = 0.25 (weak correlation), and ρ = 0.50 (moderate correlation). The SQP algorithm identifies the correct number of segments for these three scenarios based on all information criteria (AIC, AIC3, and BIC) and retrieves the unobserved heterogeneity in preferences. The empirical case study applies the SQP algorithm to consumer purchase data to find market segments. Results for the empirical data set can provide insights for retail category managers because they are able to compute the impact on the marginal shares caused by a change in the average price of one brand or product.

© 2016 Elsevier Inc. All rights reserved.

1. Introduction

Cluster analysis or clustering is the research field that deals with the definition of groups of objects (called clusters or segments) in such a way that members of the same cluster are more similar to each other than to those in other groups. Clustering techniques have been developed in many fields of research, such as machine learning, statistics, bioinformatics, and marketing (e.g., market segmentation). In recent years, widespread collection of longitudinal and stream data has created the need for the identification of unique groups or trajectories in panel data. This type of observation tends to be challenging for any clustering process given data dependency [1]. Despite having attracted much attention in statistics and machine learning, most of the proposals have adapted the clustering of cross-sectional data to longitudinal data [2,3]. Heuristic cluster analysis for time series data may operate directly on the correlation matrix, a common practice in financial econometrics (e.g., [4,5]). More recently, hybrid algorithms have been introduced combining filtering processes that control

☆ The authors would like to thank the editor-in-chief, an associate editor, and four anonymous reviewers for their constructive comments, which helped us to improve the manuscript.
∗ Correspondence to: Department of Quantitative Methods for Management and Economics, Edifício ISCTE, Av. Forças Armadas, 1649-026 Lisboa, Portugal. Fax: +351 217964710.
E-mail addresses: jose.dias@iscte.pt, jose.g.dias@gmail.com (J.G. Dias).

http://dx.doi.org/10.1016/j.amc.2016.10.012
0096-3003/© 2016 Elsevier Inc. All rights reserved.

the longitudinal structure with heuristic clustering. For example, Sáfadi [6] proposes filtering the time series using independent component analysis and then, based on the coefficients of correlation obtained, clustering the time series by complete linkage. An alternative filtering process using hidden Markov models prior to heuristic clustering has been suggested to cluster time series [7,8].

Model-based clustering, also known as finite mixture or latent class modeling, has proven to be a powerful paradigm in many scientific fields as a parametric alternative to heuristic clustering [9,10]. Many applications have been developed with different purposes, ranging from outlier detection to density estimation. Nevertheless, cluster analysis has been its main objective, by assuming that each component of the mixture is a distinct cluster. In the context of survival or reliability analysis, Razali and Al-Wakeel [11] and Elmahdy [12] model survival data using mixtures of Weibull distributions, whereas Alves and Dias [13] compare distinct specifications of mixtures in the context of behavioral credit scoring analysis. Proposals that accommodate serial dependencies in clustering can have different definitions given the data structure. For instance, a mixture of Markov chains [14] and a mixture of hidden Markov models [15] have been applied to clustering time series data.
Market segmentation is one of the most important applications of clustering. It simplifies a complex market structure by dividing the market into submarkets and provides the foundations for developing specific strategies for each segment. It uses demographic, geographic, or other segmentation criteria that can help uncover behavioral differences associated with specific groups of consumers. As a result of a lack of information about demand, the market analysis is often based on supply-side data. For over a decade, mixture models have been the standard technique for market analysis [16]. The Latent Segment Logit (LSL) model of Zenor and Srivastava [17], which is a generalization of the multinomial logit (MNL) model of McFadden [18], allows heterogeneity through segment-varying parameters. In other words, it retrieves the structure of market segments that is lost due to data aggregation when price is the variable to explain the choice between different products/brands in the same market.

This study proposes a mixture model for clustering panel data that takes unobserved heterogeneity due to aggregation into account. It extends the model proposed in Zenor and Srivastava [17] by taking multiple covariates. Apart from the generalized specification of the model, a deterministic algorithm for model estimation is discussed and applied to a market segmentation case study.

In the next section (Section 2), the model is defined by introducing a general clustering framework that can be used in other contexts besides market segmentation. Section 3 addresses the algorithm for estimating the model. Section 4 discusses inference and model selection. Section 5 explores the model using synthetic data. Section 6 illustrates the use of the model in the context of market segmentation. Two levels of aggregation are discussed: brand- and product-level analyses. The paper ends with a discussion of the main contributions, limitations, and suggestions for further extensions.

2. Definition of the model

Following the model of Zenor and Srivastava [17], there are two sets of latent variables that are not directly observed in the input data, which are modeled as (1) the multinomial logit choice of option i, at time t, within cluster s (m_ist), and (2) the expected frequencies within each cluster, where the observed choice frequencies of each option are combined in a multinomial logit specification to generate n_ist. These variables are given by

\[ M_{ist} = \frac{\exp\left(\beta_{0is} + \sum_{v} \beta_{vs} X_{ivt}\right)}{\sum_{j} \exp\left(\beta_{0js} + \sum_{v} \beta_{vs} X_{jvt}\right)} \]

\[ M_{it} = \sum_{s=1}^{S} g_s M_{ist}, \qquad m_{ist} = E\left[M_{ist} \mid \theta_s\right] \]

\[ N_{ist} = N_{it}\, \frac{g_s m_{ist}}{\sum_{r} g_r m_{irt}}, \qquad n_{ist} = E\left[N_{ist} \mid \theta_s, g_s\right] \]

where M_ist is the proportion of option i in cluster s at time t; M_it is the proportion of option i at time t; thus, m_ist is the expected share of option i in cluster s at time t. The intercept parameters in cluster s are β_{0is} and the slope parameters are β_{vs}; g_s is the size of cluster s; X_ivt is the explanatory variable v for option i observed at time t; N_it is the total number of counts of option i observed at time t; and V is the number of explanatory variables in the model. The expected number of counts of option i in cluster s at time t is given by n_ist. To keep the model statistically identified, one of the intercepts in each cluster needs to be fixed at zero.
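These share equations can be written compactly in array form. The following is a minimal illustrative sketch (not the authors' code), assuming the parameters are stored as g (S cluster sizes), beta0 (P × S intercepts, with the reference row fixed at zero), beta (V × S slopes), and the covariates as X (P × V × T):

```python
import numpy as np

def cluster_shares(g, beta0, beta, X):
    """Within-cluster multinomial logit shares M_ist, shape (P, S, T)."""
    # Linear predictor beta_0is + sum_v beta_vs * X_ivt for every (i, s, t).
    eta = beta0[:, :, None] + np.einsum('vs,pvt->pst', beta, X)
    expeta = np.exp(eta)
    return expeta / expeta.sum(axis=0, keepdims=True)  # normalize over options i

def aggregate_shares(g, M):
    """Aggregate shares M_it = sum_s g_s M_ist, shape (P, T)."""
    return np.einsum('s,pst->pt', g, M)

def expected_counts(g, M, N_total):
    """Expected counts n_ist, given the observed totals N_it of shape (P, T)."""
    w = g[None, :, None] * M                      # g_s * m_ist
    return N_total[:, None, :] * w / w.sum(axis=1, keepdims=True)
```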
Assuming that the expected number of counts, n_ist, is the product of independent multinomials, the likelihood function is defined by

\[ L(\beta, g) = \prod_{t} \left[ \frac{\left(\sum_{i}\sum_{s} n_{ist}\right)!}{\prod_{i}\prod_{s} n_{ist}!} \prod_{i}\prod_{s} \left(g_s m_{ist}\right)^{n_{ist}} \right]. \]

Maximizing the logarithm of the likelihood function with respect to the model parameters (β, g) is accomplished by solving a minimization problem with objective F = −ln L(β, g), formulated as [19]:

\[ \min F(\beta, g) \]
\[ \text{s.t.} \quad \sum_{s=1}^{S} g_s = 1 \]
\[ 0 < g_s \le 1, \quad s = 1, \ldots, S \]
\[ \beta_{0m} < \beta_{0is} < \beta_{0M}, \quad s = 1, \ldots, S, \; i = 1, \ldots, P \]
\[ \beta_{m} < \beta_{vs} < \beta_{M}, \quad s = 1, \ldots, S, \; v = 1, \ldots, V, \]

where S is the number of clusters, P is the number of products/brands, and V is the number of explanatory variables considered in the model.
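Building on the sketch above, the objective F = −ln L(β, g) can be evaluated as follows. This is an illustrative implementation, with gammaln(n + 1) standing in for ln(n!) so that the non-integer expected counts are handled smoothly; it reuses cluster_shares and expected_counts from the previous sketch:

```python
import numpy as np
from scipy.special import gammaln

def neg_log_likelihood(g, beta0, beta, X, N_total):
    """F = -ln L(beta, g)."""
    M = cluster_shares(g, beta0, beta, X)        # m_ist, shape (P, S, T)
    n = expected_counts(g, M, N_total)           # n_ist, shape (P, S, T)
    per_t = (gammaln(n.sum(axis=(0, 1)) + 1)     # ln( (sum_{i,s} n_ist)! )
             - gammaln(n + 1).sum(axis=(0, 1))   # - sum_{i,s} ln(n_ist!)
             + (n * np.log(g[None, :, None] * M)).sum(axis=(0, 1)))
    return -per_t.sum()
```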

3. The optimization algorithm

The maximization of the log-likelihood function needs to be solved by an iterative method. Although it is possible to apply either deterministic or non-deterministic gradient-based optimization methods to the maximization problem, deterministic optimization methods require a small number of objective function evaluations and are faster than non-deterministic methods like genetic algorithms [20,21]. Moreover, gradient-based optimization methods are suitable for this model given that the space of the values of the coefficients is convex [17].

Sequential Quadratic Programming (SQP) is the deterministic optimization algorithm selected for this work due to evidence found by Trindade and Ambrósio [19] that it provides a better computational guarantee of reliability and precision than other deterministic methods such as the Modified Admissible Directions (MAD) and the Sequential Linear Programming (SLP) methods. Additionally, Shen et al. [22] suggest that the main advantage of the SQP algorithm is that it is globally convergent without requiring strong constraint qualifications, and that this convergence is quadratic near a stationary point.
In short, the main intent of the Sequential Quadratic Programming (SQP) method is to use the linear and quadratic terms of the Taylor series expansion of the objective function, designated here as F(Y), while still using only the linear part of the Taylor series expansion of the constraints, defined here generically as G(Y). Both the objective function and the set of constraints depend on the model parameters, which are grouped in the vector Y = [g_1 … g_S β_{011} … β_{0P1} … β_{0PS} β_{11} … β_{V1} … β_{VS}]^T. The Taylor series expansions of F(Y) and G(Y) are written as

\[ F(Y + \delta Y) \approx F(Y) + \nabla F(Y)^{T}\, \delta Y + \tfrac{1}{2}\, \delta Y^{T} B\, \delta Y \quad (1) \]

\[ G(Y + \delta Y) \approx G(Y) + \nabla G(Y)^{T}\, \delta Y \quad (2) \]
where B is a numerical approximation of the Hessian matrix. In the first step, i.e., when the counter of the iterations of the optimization procedure (j) is 1, B is approximated by the identity matrix. ∇F(Y)^T and ∇G(Y)^T are the gradients of the objective function and constraints, respectively, which are calculated using forward finite differences; δY is the increment for the variables in the search direction. The computer implementation of SQP builds the Hessian matrix using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) formula [23], the objective function and constraint gradients using forward finite differences, and the search direction and increment step using the Fletcher–Reeves method [24]. Note that BFGS maintains the symmetry and positive definiteness of the Hessian matrix.
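For reference, the BFGS update of the Hessian approximation B takes the standard rank-two form. A minimal sketch, assuming s = Y_{j+1} − Y_j and y = ∇F(Y_{j+1}) − ∇F(Y_j), with y^T s > 0 so that positive definiteness is preserved:

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update of a symmetric positive-definite Hessian approximation."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
```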
The iterative process of the minimization problem starts with an estimate of the vector of parameters Y^(j) = [g_1^(j) … g_S^(j) β_{011}^(j) … β_{0P1}^(j) … β_{0PS}^(j) β_{11}^(j) … β_{V1}^(j) … β_{VS}^(j)]^T. For the initial iteration, i.e., for j = 0, the model parameters are estimated either by trial and error or by any suitable methodology used for the design of experiments, such as the Latin hypercube, central composite design, random numbers, etc. [25,26]. Afterwards, the algorithm proceeds with the following iterative steps:

(a) Update the counter: j ← j + 1;
(b) Compute the objective function F(Y^(j)) and the constraints G(Y^(j));
(c) Compute the gradients of the objective function ∇F(Y^(j))^T and of the constraints ∇G(Y^(j))^T;
(d) Determine the search direction to be adopted in this iteration, h^(j), which plays the role of the direction of δY in Eqs. (1) and (2) and is defined based on the gradients using the specific implementation of the SQP algorithm [24];
(e) Determine the dimension of the step for searching the minimum, α^(j), which plays the role of the magnitude of δY in Eqs. (1) and (2), i.e., δY ≡ α^(j) h^(j), being defined according to the specific implementation of the SQP algorithm [24];
(f) Compute the new vector of model estimates as Y^(j+1) ← Y^(j) + α^(j) h^(j);
(g) The iterative process stops if the minimum of the objective function is reached; otherwise, it goes back to step (a).

Fig. 1 depicts the flowchart of the algorithm.
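The authors use the SQP routine of the DOT library (see below). As an open-source stand-in, scipy's SLSQP method is also a sequential quadratic programming implementation with forward finite-difference gradients, so the loop above can be sketched roughly as follows; the unpack helper and the packing of g into the first S entries of Y (which follows the definition of Y above) are assumptions of this sketch:

```python
import numpy as np
from scipy.optimize import minimize

def fit_sqp(Y0, S, unpack, X, N_total, bounds):
    """Minimize F(Y) = -ln L under sum_s g_s = 1 and box constraints.

    Y0: starting vector; unpack: Y -> (g, beta0, beta); bounds: list of
    (lower, upper) pairs, one per entry of Y.
    """
    def objective(Y):
        g, beta0, beta = unpack(Y)
        return neg_log_likelihood(g, beta0, beta, X, N_total)

    cons = [{'type': 'eq', 'fun': lambda Y: Y[:S].sum() - 1.0}]  # sum_s g_s = 1
    return minimize(objective, Y0, method='SLSQP', bounds=bounds,
                    constraints=cons, options={'maxiter': 500, 'ftol': 1e-9})
```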



In this iterative process, G(Y^(j)) ≡ Σ_{s=1}^{S} g_s = 1 is the only equality constraint of the optimization problem. All other Y^(j) parameter restrictions are inequality constraints, corresponding to the limits of the range of variation of the parameters, which are not included in vector G(Y^(j)).

Fig. 1. Flowchart of the SQP optimization algorithm.

These side constraints are handled differently by the optimization algorithm and are generally substituted by two inequality constraints [27]. The definition of the parameters contained in vector Y^(j) is driven only by the lower and upper limits of each coefficient, which are set at β_{0m} = 0, β_{0M} = 10, β_m = −10, and β_M = 10. The search direction is usually the unit vector associated with the gradient of the objective function, although it may vary for different optimization methods [28]. Here the search direction is calculated using the conjugate direction method proposed by Fletcher and Reeves [24]. Similarly, the step in each iteration is set in the optimization method according to the Fletcher–Reeves method [24].
In this work, the SQP method available in the DOT library of mathematical functions for optimization is used [29]. The search strategy for directions in the optimization problem is based on the calculation of the sensitivities to the variables, obtained internally by the SQP computer implementation using numerical sensitivities given by forward finite differences. As the problems addressed here do not have a closed-form solution, the derivation and numerical implementation of analytical sensitivities is not considered in this work. The use of finite differences for computing the sensitivities means that each model coefficient must be independently perturbed, thus allowing the optimization algorithm to define the search direction towards the optimum without violating the constraints.
The calculation of the expected values of the latent variables under the optimization problem requires that a set of starting values for the model parameters, or coefficients, is supplied. Because optimization methods based on the use of gradients do not guarantee a global maximum, care must be taken to generate the initial estimates so that the space of solutions is efficiently covered. Aird and Rice [25] provide an algorithm to design the computational experiments required, i.e., to determine the initial vector of estimates that optimally covers a parameter space. In order to generate the initial sets, the only information needed is the range of variation of each of the coefficients in the model and the number of estimates to be generated. Therefore, the number of initial sets used to solve the problem, which equals the number of optimization processes, is a user's decision. Finally, the implementation is not limited by the number of segments allowed in each model, but the number of segments used in the present case study goes from one to six.¹
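A simple way to generate such space-covering starting sets is a Latin hypercube over the coefficient ranges. A minimal sketch using scipy's quasi-Monte Carlo module; the choice of this particular sampler is an assumption, not the design algorithm of Aird and Rice [25]:

```python
import numpy as np
from scipy.stats import qmc

def initial_estimates(lower, upper, n_starts, seed=0):
    """Return n_starts starting vectors covering the box [lower, upper]."""
    sampler = qmc.LatinHypercube(d=len(lower), seed=seed)
    return qmc.scale(sampler.random(n=n_starts), lower, upper)

# Example: 400 starts for the synthetic study of Section 5 (S = 2 sizes,
# 8 free intercepts in (0, 10), 4 slopes in (-10, 10)).
lower = np.concatenate([np.zeros(2), np.zeros(8), -10.0 * np.ones(4)])
upper = np.concatenate([np.ones(2), 10.0 * np.ones(8), 10.0 * np.ones(4)])
starts = initial_estimates(lower, upper, n_starts=400)
```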

4. Selection of the number of clusters and inference

Determination of the best number of clusters was based on the information criterion (AIC) introduced by Akaike [30], the Akaike information criterion with penalization 3 instead of 2 (AIC3), and the Bayesian information criterion (BIC) of Schwarz [31]. Simulation studies show that AIC3 and BIC perform well in retrieving the right model (avoiding underfitting and overfitting), where lower values indicate a better fit [32]. The BIC solution is chosen in case of disagreement between criteria. Thus, various models were estimated, each with a different number of clusters, and the one with the minimum values of the criteria was selected. Within the same number of clusters, model comparison is based on the Pseudo-R² [17], given by

\[ \text{Pseudo-}R^2 = \frac{\ln L_m - \ln L_0}{\ln L_p - \ln L_0}, \]

where ln L_m is the logarithm of the likelihood function when the M_it are the estimates of the respective model; ln L_p is the logarithm of the likelihood function for a perfect (saturated) model, that is, when the M_it are the observed proportions; and ln L_0 is the logarithm of the likelihood function when the M_it are the marginal proportions, i.e., when M_it = M_i.
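These selection quantities are straightforward to compute once the log-likelihoods are available. An illustrative sketch, where k is the number of free parameters and n is the sample size used in the BIC penalty (what to count as n, e.g., the total number of observed counts, is an assumption here):

```python
import numpy as np

def information_criteria(lnL, k, n):
    """AIC, AIC3, and BIC; lower values indicate a better fit."""
    return {'AIC':  -2.0 * lnL + 2.0 * k,
            'AIC3': -2.0 * lnL + 3.0 * k,
            'BIC':  -2.0 * lnL + k * np.log(n)}

def pseudo_r2(lnL_m, lnL_0, lnL_p):
    """(ln L_m - ln L_0) / (ln L_p - ln L_0); equals 1 for the saturated model."""
    return (lnL_m - lnL_0) / (lnL_p - lnL_0)
```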
The Wald test is used to calculate the p-values of each estimated coefficient, since the test statistic asymptotically converges to a chi-square distribution with one degree of freedom under very general conditions. In the inference on the parameters of the model, the standard errors associated with the estimated coefficients can be computed if the inverse of the Hessian matrix is defined. The Hessian matrix in these models is often badly conditioned, because small perturbations in its coefficients lead to large changes in the solution of the system of equations. Furthermore, the need to use finite differences to calculate the Hessian matrix inevitably introduces numerical truncation and approximation errors. In general, for cases in which a matrix is not positive definite or is badly conditioned, a Moore–Penrose generalized inverse [33] can be defined. Here, the pseudo-inverse of the Hessian matrix is computed using the method proposed by Dennis and Schnabel [34] and implemented in routine DFDHES of the IMSL mathematical library [35].
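A hedged sketch of this inference step, with numpy's Moore–Penrose pseudo-inverse standing in for the Dennis–Schnabel routine and the finite-difference Hessian of F = −ln L assumed to be given:

```python
import numpy as np
from scipy.stats import chi2

def wald_inference(theta_hat, hessian):
    """Standard errors and Wald p-values from the (pseudo-)inverted Hessian."""
    cov = np.linalg.pinv(hessian)        # Moore-Penrose: robust to bad conditioning
    se = np.sqrt(np.diag(cov))
    wald = (theta_hat / se) ** 2         # asymptotically chi-square with 1 d.f.
    return se, chi2.sf(wald, df=1)
```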

5. Synthetic data sets

This section examines the performance of the model using synthetic data sets. Five brands and 500 time points are assumed. Table 1 gives the true values of the parameters. A structure with two segments (S = 2), with sizes 60% and 40%, is considered. Intercepts and slopes are shown in the same table (Brand 1 is the reference). Sales and market shares are obtained using the definition of the model in Section 2. Two covariates are included in the model: X1 and X2. To mimic the case study, it is assumed that the former is the price (the true values of the slopes are negative) and the latter is availability, varying between 0 and 1. Thus, X2 values were sampled from a beta distribution. Heterogeneity between brands is introduced by setting the expected values of the beta distribution to 0.9, 0.8, 0.7, 0.6, and 0.5 for brands 1, 2, 3, 4, and 5, respectively. The sampling of X1 was conditional on availability (X2) to reveal the potential impact of the correlation between covariates (multicollinearity) on the results. Thus, prices (X1) are sampled from the conditional normal distribution given by
\[ X_1 \mid X_2 \sim N\left(\mu_{X_1|X_2},\, \sigma^2_{X_1|X_2}\right) \]

with

\[ \mu_{X_1|X_2} = \mu_{X_1} + \rho\, \frac{\sigma_{X_1}}{\sigma_{X_2}}\left(X_2 - \mu_{X_2}\right) \]

and

\[ \sigma^2_{X_1|X_2} = \sigma^2_{X_1}\left(1 - \rho^2\right). \]

¹ The lower bound on the number of clusters is the aggregate model, and the maximum number of clusters is set to six, i.e., a partition of the data set that is still interpretable. After computing the information criteria (AIC, AIC3, and BIC), the best solution for uncovering the unobserved heterogeneity can be identified. In the case that the best solution is K = 1, there is no unobserved heterogeneity; the covariates are able to capture the variability in the data (the multinomial logit model of McFadden). At the other extreme, if the best solution is K = 6, then it is necessary to continue and check whether a model with K = 7 is worse than K = 6. The process goes on until the minimum of the information criteria is found.

Table 1
Synthetic data.

                    True values       ρ = 0.00           ρ = 0.25           ρ = 0.50
Segments            #1      #2        #1       #2        #1       #2        #1       #2
Segment sizes       0.6     0.4       0.559    0.442     0.536    0.464     0.530    0.471
Intercepts
  Brand1            0       0
  Brand2            3       1         3.428    3.096     3.800    3.600     3.880    2.856
  Brand3            4       2         3.986    5.000     4.900    5.900     3.910    3.100
  Brand4            1       5         1.927    5.650     3.092    3.900     3.660    3.794
  Brand5            2       4         2.556    6.000     3.300    4.012     3.759    3.470
Covariates
  X1                -3      -2.6      -3.437   -2.148    -3.690   -1.806    -3.780   -1.193
  X2                2.5     0         2.315    0.040     3.090    0.070     3.350    0.099
Std. errors
  X1                                  0.024    0.149     0.028    0.222     0.044    0.240
  X2                                  0.106    0.150     0.129    0.187     0.148    0.199

Heterogeneous mean prices (μ_X1) were set for each brand: 1.00 (Brand 1), 1.75 (Brand 2), 1.50 (Brand 3), 2.00 (Brand 4), and 1.50 (Brand 5). The standard deviation (σ_X1) is 0.1 for all brands. The ranges of these values are in accordance with the empirical data set (Section 6). Interestingly, the structure of these simulations allows the correlation between the two covariates to be controlled. More specifically, three scenarios are defined: ρ = 0.00 (no correlation), ρ = 0.25 (weak correlation), and ρ = 0.50 (moderate correlation).
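A minimal sketch of this simulation design: X2 is drawn from brand-specific beta distributions and X1 from the conditional normal above. The beta shape parameters (here a + b = 10) are an assumption of the sketch; the text fixes only the means:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 500
mean_avail = np.array([0.9, 0.8, 0.7, 0.6, 0.5])      # brands 1..5
mean_price = np.array([1.00, 1.75, 1.50, 2.00, 1.50])
sd_price, rho = 0.1, 0.25                             # weak-correlation scenario

# Availability: Beta(a, b) has mean a / (a + b); a + b = 10 is an assumption.
a, b = 10 * mean_avail, 10 * (1 - mean_avail)
X2 = rng.beta(a[:, None], b[:, None], size=(5, T))

# Price: conditional normal X1 | X2 producing correlation rho with X2.
sd_avail = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))   # sd of each beta
mu_cond = (mean_price[:, None]
           + rho * sd_price / sd_avail[:, None] * (X2 - mean_avail[:, None]))
X1 = rng.normal(mu_cond, sd_price * np.sqrt(1 - rho ** 2))
```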
Models in which the number of segments varies from 1 to 4 were estimated using the SQP algorithm with 400 distinct random starting values. The algorithm identifies the correct number of segments for all three scenarios based on all information criteria (AIC, AIC3, and BIC). Thus, multicollinearity does not affect model selection.
Regarding model estimation, Table 1 shows the true parameter values, the parameter estimates, and the standard errors of the slope parameter estimates. Segment sizes tend to be close to the true values, with the largest estimated segment size being 0.559 (ρ = 0) against a true value of 0.60. Results also show that part of the information is lost due to data aggregation and is difficult to retrieve. However, intercept and slope estimates are close to the true values in most cases. Results tend to worsen for the smallest segment, and the bias grows slightly with increasing correlation. Finally, the estimates of the standard errors of the slopes (needed to compute the p-values) do not seem to be strongly affected by the level of correlation between covariates; for these three synthetic data sets, there is a slight growth with increasing correlation. Thus, the SQP algorithm does not seem to be affected by multicollinearity, as the Hessian matrix is still stable and able to provide estimates of the standard errors.

6. An empirical application

6.1. Market segmentation

Market segmentation is an important step in strategic marketing. Segmentation models divide the market into submarkets (clusters of homogeneous consumers) and provide the foundations for developing specific strategies within each one of them. They use demographic, geographic, or other segmentation criteria that can help uncover behavioral differences associated with specific groups of consumers. The seminal definition of market segmentation is given by Smith [36]: "Market segmentation involves viewing a heterogeneous market as a number of smaller homogeneous markets, in response to differing preferences, attributable to the desires of customers for more precise satisfactions of their varying wants." This case study focuses on the unobserved heterogeneity in the choice modeling literature using choice-based market segmentation with aggregate scanner data [37].

6.2. Data set

The scanner data for products in the non-food category of powder detergent for laundry machines were supplied daily by the store, a hypermarket belonging to an important chain of hypermarkets in the Lisbon area, during the time period under analysis. The panel duration is 70 weeks, and it covers the sales of 23 products and 8 brands. The data set has the format shown in Table 2. There are 4 products that differ in weight per package from brands F and G (F1, F2, F3, F4; and G1, G2, G3, G4), 3 products from brands A, C, D, and H (A1, A2, A3; C1, C2, C3; D1, D2, D3; and H1, H2, H3), and two products from brand B (B1, B2); brand E only sold one product in this market during the time period under analysis.
The price variable (X_ij1) of product i of brand j is measured in euros per kilo, and price promotion (X_ij2) is measured as the difference between the price that consumers would have paid had the price promotion not been implemented and the price they actually pay. Promotion is therefore measured as the residual, in euros per kilo, that consumers are in fact saving because a price promotion is being implemented.

Table 2
Weekly data and brands.

Time   Brand 1: Product 11                      …   Brand P: Product PV
       M11    N11    X111   X112   X113        …   MPV    NPV    XPV1   XPV2   XPV3
1      M111   N111   X1111  X1121  X1131       …   MPV1   NPV1   XPV11  XPV21  XPV31
…      …      …      …      …      …           …   …      …      …      …      …
t      M11t   N11t   X111t  X112t  X113t       …   MPVt   NPVt   XPV1t  XPV2t  XPV3t
…      …      …      …      …      …           …   …      …      …      …      …
T      M11T   N11T   X111T  X112T  X113T       …   MPVT   NPVT   XPV1T  XPV2T  XPV3T

Table 3
Model selection (brand level).

Models    ln L        AIC        AIC3       BIC        Pseudo-R²   Estimated segment sizes
Model 1   -65762.8    131577.6   131576.6   131597.8   0.176       0.409, 0.590, 0.001
Model 2   -36849.0    73714.0    73722.0    73732.0    0.544       1
Model 3   -49911.6    99861.2    99880.2    99903.9    0.391       0.342, 0.658
Model 4   -24917.9    49873.8    49892.8    49916.5    0.717       0.259, 0.741
Model 5   -36824.9    73667.8    73676.8    73688.0    0.544       1
Model 6   -25485.9    51013.8    51034.8    51061.0    0.716       0.288, 0.712

Model 1: Promotion; Model 2: Price; Model 3: Promotion + Availability; Model 4: Price + Availability; Model 5: Price + Promotion; Model 6: Price + Promotion + Availability.

Table 4
Model 4 estimates (brand level).

                 Seg. #1    Seg. #2
Intercepts
  A              5.801      1.900
  B (reference)
  C              2.066      1.026
  D              4.168      1.930
  E              0.335      <0.001
  F              3.995      4.327
  G              5.095      1.660
  H              2.023      0.058
Coefficients
  Price          -3.979     -3.620
  Availability   2.916      0.480
Segment size     0.259      0.741

Note: p < 0.001.

This definition of price promotion avoids the potential multicollinearity problem between price and price promotion. Availability (X_ij3) is a dummy variable that takes (1) the value of one if the product is in the store at time t, and a value of zero otherwise, or (2) the value of the fraction between the number of the brand's products that are available in the store during a particular week and the total number of products of the same brand. This definition of the price and price promotion variables minimizes the potential problem of multicollinearity because price promotion is measured as the residual that consumers are in fact saving. At the same time, the endogeneity problem associated with the correlation between the exogenous variable and the error term is also minimized, or even cancelled out, by adding explanatory variables besides price. Each specified model was initialized from 100 different starting values, which is enough to mitigate the chance of convergence to a local maximum.
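A hedged sketch of how these covariates could be assembled from raw weekly store records; the input column names (shelf_price, paid_price, on_shelf) are hypothetical, not from the paper:

```python
import pandas as pd

def build_covariates(df: pd.DataFrame) -> pd.DataFrame:
    """Price, promotion (residual saving), and availability per product-week."""
    out = pd.DataFrame(index=df.index)
    out['price'] = df['paid_price']                        # euros per kilo
    # Promotion as the residual saving: the price that would have been paid
    # without the promotion minus the price actually paid (zero otherwise).
    out['promotion'] = df['shelf_price'] - df['paid_price']
    out['availability'] = df['on_shelf'].astype(float)     # 0/1 or a fraction
    return out
```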

6.3. Aggregation at brand level

Table 3 shows that only two segments were identified for Model 1, given that the size of the third segment converges to zero. The estimation procedure could not detect more than one segment for Models 2 and 5; in these cases, the multinomial logit model of McFadden [18] is recovered.
At brand-level aggregation, Model 4 (Price + Availability) is the best in terms of Pseudo-R², AIC, AIC3, and BIC, because it has the highest value of the Pseudo-R² and the lowest values of AIC, AIC3, and BIC. Therefore, the best model identifies two segments, with sizes of 0.741 and 0.259.
Based on the results depicted in Table 4, the conclusion is that both segments are sensitive to permanent price changes and to the availability of the brand in the store. These two segments may be distinguished by the relative positions that each brand occupies in each segment.

Table 5
Market share (MS) estimates for Model 4 (brand level).

         MS at average prices               MS at observed weekly prices
Brands   Seg. #1   Seg. #2   Marginal       Seg. #1   Seg. #2   Marginal
A        0.162     0.023     0.059          0.167     0.024     0.061
B        0.011     0.037     0.030          0.011     0.037     0.030
C        0.019     0.039     0.033          0.026     0.046     0.041
D        0.138     0.056     0.078          0.144     0.061     0.083
E        0.030     0.047     0.043          0.030     0.046     0.042
F        0.171     0.662     0.535          0.176     0.645     0.523
G        0.292     0.060     0.120          0.279     0.065     0.121
H        0.177     0.075     0.102          0.167     0.076     0.099

[Fig. 2 comprises six surface plots, arranged as two rows (Segment 1, Segment 2) by three columns (Brand A, Brand F, Other), showing market share as a function of Availability and Price.]
Fig. 2. Variations of the market shares of brands A and F in each segment.

Brands A, G, and D have the best image/market power (given by the magnitude of the intercepts) in segment 1, the smaller segment in terms of size (25.9%), compared with brand B, the baseline brand. Brand F has the best image (4.327) in the larger segment (74.1%), segment 2. Brands B and E have the weakest brand image in both segments.
Table 5 shows the within-segment and the marginal shares across the estimated segments, using the model parameters at average prices and at observed weekly prices, to understand the purchase sensitivity within each segment. Based on these results, brand G (29.2%) is the dominant brand for the entire period in terms of market share in segment 1 at average prices, followed by brands H (17.7%) and F (17.1%), in decreasing order of importance; however, at observed weekly prices, both brand G, which remains the dominant brand (27.9%), and brand H (16.7%) have slightly lower estimated shares in favor of brand F (17.6%). In segment 2, brand F is dominant, with a 66.2% market share at average prices for the entire period and 64.5% at observed weekly prices, which means that brand F is safe from competition from other brands. In marginal terms, brand F is dominant in both cases, i.e., brand F is the leading brand in this market, followed at a great distance by brands G and H.
Fig. 2 shows graphically that the market share of brand A is dominant in segment 1 (the smallest segment), followed by brand G, while brand F is dominant in segment 2 (the largest).

Table 6
Model selection (product level).

Models    ln L         AIC        AIC3       BIC        Pseudo-R²   Estimated segment sizes
Model 1   -256841.0    513774.0   513823.0   513881.7   0.155       0.386, 0.614
Model 2   -203032.0    406206.0   406133.0   406161.7   0.345       0.396, 0.209
Model 3   -107952.0    216052.0   216051.0   216112.2   0.667       0.468, 0.299, 0.234
Model 4   -86609.7     173367.4   173366.4   173427.6   0.730       0.426, 0.395, 0.180
Model 5   -196765.0    393678.0   393602.0   393632.0   0.366       0.395, 0.366, 0.239
Model 6   -88731.3     177616.6   177615.6   177679.3   0.726       0.536, 0.268, 0.196

Model 1: Promotion; Model 2: Price; Model 3: Promotion + Availability; Model 4: Price + Availability; Model 5: Price + Promotion; Model 6: Price + Promotion + Availability.

Table 7
Model 4 estimates (product level).

                  Seg. #1   Seg. #2   Seg. #3
Intercepts
  A1              4.897     1.951     3.931
  A2              7.041     3.061     5.070
  A3              5.817     3.586     6.664
  B1              1.572     2.553     2.141
  B2              1.465     3.121     2.226
  C1              2.599     1.593     3.762
  C2              2.859     2.39      2.908
  C3              3.479     2.343     3.670
  D1 (reference)
  D2              6.352     2.083     4.600
  D3              5.639     1.404     4.254
  E               2.019     2.834     2.444
  F1              6.339     2.697     4.512
  F2              7.751     3.716     5.252
  F3              6.997     3.625     8.163
  F4              6.168     1.590     9.682
  G1              4.021     2.243     3.374
  G2              6.371     2.173     4.124
  G3              5.501     1.661     7.058
  G4              4.697     3.832     3.457
  H1              1.663     2.609     1.837
  H2              1.283     2.652     1.780
  H3              1.845     2.581     2.300
Coefficients
  Price           -7.709    0.642     -3.594
  Availability    3.645     4.564     4.952
Segment size      0.426     0.395     0.180

Note: p < 0.001.

6.4. Aggregation to product level

The results for aggregation at the product level are summarized in Tables 6–8.
Table 6 suggests that Model 4 (Price + Availability) continues to be the best model in terms of the Pseudo-R², AIC, AIC3, and BIC, but three segments are now revealed, with sizes of 0.426, 0.395, and 0.180. Moreover, two and three segments are now extracted in Models 2 and 5, respectively.
Table 7 shows that segment 1, the largest one, is the most sensitive to permanent price changes; segment 3, the smallest one, is also sensitive to permanent price changes. However, segment 2 is not sensitive to price changes, which means that this segment may be composed of consumers who are loyal to a certain product and therefore not sensitive to price changes. It can also be seen that all three segments are sensitive to the availability of the products in the store.
Finally, segments 1 and 3 also differ in relation to product image, i.e., with product D1 as the baseline product: products F4 and F3 have better images in segment 3 than in segment 1, while products F2 and A2 have a better image in segment 1. In segment 2, the least sensitive to permanent price changes, products G4, F2, F3, B2, and A3 have the best image in their relative positions against the reference product (D1) and the other products.
Table 8 shows the within-segment and the marginal estimated shares across segments, using the calibrated model at average period prices and at observed weekly prices. This table gives the product purchase sensitivity within each segment, namely:

• in segment 1, the largest one, F4 and F3 are the dominant products in terms of market shares at average prices, and these products increase their dominance at observed weekly prices (F4: 0.220 vs. 0.334; F3: 0.177 vs. 0.358);

Table 8
Market share (MS) estimates for Model 4 (product level).

          MS at average prices                        MS at observed weekly prices
Products  Seg. #1  Seg. #2  Seg. #3  Marginal        Seg. #1  Seg. #2  Seg. #3  Marginal
A1        0.001    0.022    0.001    0.010           0.001    0.025    0.001    0.011
A2        0.017    0.065    0.018    0.037           0.025    0.077    0.023    0.046
A3        0.034    0.096    0.032    0.059           0.007    0.015    0.006    0.010
B1        0.025    0.020    0.017    0.018           0.032    0.035    0.020    0.026
B2        0.016    0.042    0.012    0.024           0.006    0.025    0.004    0.013
C1        0.005    0.011    0.006    0.008           0.025    0.011    0.026    0.019
C2        0.005    0.026    0.005    0.014           0.007    0.044    0.006    0.022
C3        0.004    0.024    0.004    0.013           0.001    0.007    0.001    0.004
D1        <0.001   0.003    <0.001   0.001           <0.001   0.004    <0.001   0.002
D2        0.025    0.019    0.028    0.024           0.047    0.032    0.051    0.043
D3        0.071    0.012    0.065    0.043           0.063    0.010    0.057    0.038
E         0.021    0.036    0.016    0.025           0.026    0.061    0.018    0.036
F1        0.005    0.044    0.008    0.023           0.008    0.080    0.012    0.040
F2        0.034    0.110    0.056    0.078           0.056    0.235    0.088    0.149
F3        0.177    0.052    0.211    0.145           0.358    0.099    0.409    0.281
F4        0.220    0.058    0.256    0.174           0.334    0.082    0.370    0.251
G1        0.002    0.027    0.002    0.012           0.002    0.031    0.002    0.014
G2        0.030    0.027    0.032    0.030           0.039    0.033    0.038    0.036
G3        0.099    0.012    0.104    0.066           0.120    0.015    0.118    0.075
G4        0.095    0.184    0.055    0.109           0.017    0.028    0.010    0.018
H1        0.029    0.038    0.018    0.026           0.036    0.067    0.021    0.040
H2        0.033    0.034    0.018    0.025           0.041    0.059    0.021    0.037
H3        0.050    0.040    0.036    0.038           0.035    0.032    0.024    0.027

• in contrast, products G4, A3, and D3 lose some of their weight (0.095 vs. 0.017; 0.034 vs. 0.007; and 0.071 vs. 0.063, respectively), which shows some vulnerability to price competition with the dominant products in this segment (products F4 and F3);
• in segment 2, the segment most inelastic to permanent price changes, G4 and F2 are the dominant products at average prices (0.184 and 0.110, respectively) and are followed by products A3, A2, and F4; but at observed weekly prices the dominant products change in favor of F2 (which increases from 0.110 to 0.235), showing that there is some competition between them;
• in segment 3, the smallest one, F4 and F3 are the dominant products at average prices (0.256 and 0.211, respectively), while F3 and F4 continue to be the dominant ones at observed weekly prices (0.409 and 0.370, respectively), followed by product G3 (0.118);
• in terms of marginal shares at average prices, products F4, F3, and G4 dominate the market (0.174, 0.145, and 0.109, respectively); on the other hand, at observed weekly prices, products from brand F dominate the market as the brand leader (F2: 0.149; F3: 0.281; F4: 0.251);
• the most vulnerable products are D1, A1, A3, B2, and C3 in terms of marginal market shares.

7. Conclusion

This work proposes and illustrates an optimization-based methodology for the identification of clusters from aggregate panel data. The likelihood-based model for the market analysis was defined and implemented as an optimization problem. A deterministic optimization algorithm, the Sequential Quadratic Programming method, was selected to solve the problem due to its reliability in handling the type of models involved. In the selection process of the optimal model for identifying the clusters from aggregate panel data, special care was taken by developing different sets of initial solutions defined by experimental design procedures. Consequently, several optimization problems were solved to identify the best model, which corresponds to the maximum likelihood estimate, deemed here the global optimum.
The case study shows that this method can retrieve the heterogeneity in the market. But which is the best level of aggregation: brand- or product-level data? Are there two or three segments? The least aggregate data level should be chosen because aggregation always fades out data characteristics. Therefore, the solution with three segments seems to be better. The way the models were estimated leads us to rule out the overfitting problem, as the inclusion of more coefficients for estimation does not violate parsimony. Wedel et al. [38] argue that the choice of the most adequate data should be made according to financial and/or marketing criteria rather than statistical criteria. This is an issue that needs further research. However, for both marketers and retailers the problem of identifying two or three segments has different consequences. That is, it is not the same to implement a price strategy on the basis that there are two or three different segments of clients. In fact, segment 2, which is made up of more loyal clients than the others, may partially absorb the impact of a price strategy. Therefore, money can be saved if marketers know that their price strategy would benefit only 60.6% of the clients, because loyal clients would always buy the product, with or without a price reduction.

Multicollinearity is a problem in most models with regression components. Results with synthetic data sets show that this may not be a severe problem in this model. The potential multicollinearity problem between the price and price promotion variables is reduced by defining promotion as the residual that consumers are in fact saving; at the same time, this improves the efficiency of the parameter estimation, i.e., by reducing multicollinearity, the variance of the estimates becomes smaller [39]. Therefore, heterogeneity in preferences as well as endogeneity are taken into account. The multicollinearity between covariates can be analyzed by traditional indicators such as tolerance, the Variance Inflation Factor (VIF), or condition numbers. However, these results give evidence that a moderate level of correlation does not affect the computation of the Hessian matrix or the performance of the SQP algorithm. Further research should explore the characteristics of the model in depth by running extensive Monte Carlo simulations. More specifically, in line with expectations, the loss of information due to data aggregation is not entirely retrieved by the estimation and should be further explored.
It must be reiterated that the proposed algorithm does not ensure that a global maximum is reached. However, multiple starting points mitigate the risk of converging to a local maximum. Therefore, this methodology can be used with aggregate data, the most commonly available and cheapest in the market, in order to identify heterogeneous segments. Moreover, this approach can be applied to any panel data of observed counts or proportions. The methodology can accommodate data from more stores from the same or different chains and can be extended to different product categories of frequently purchased goods in large supermarkets or hypermarkets. Further variables can be added to the model. For example, advertising (such as flyers and/or discount coupons) and special displays are typically under a category manager's control. The use of more variables in the model is straightforward, as it only affects the computation of M_ist. For instance, adding two additional covariates adds new slopes, and the number of covariates becomes V = 5. It is the same as moving the multinomial logistic model from 3 to 5 covariates.
Therefore, this can be a fruitful approach for retail and product category management, as it computes the impact caused on the marginal shares by a change in the average price of one brand (or product), taking market heterogeneity into account. In particular, it allows the definition of distinct trade-offs and elasticities of changes in price on market shares, as illustrated in Fig. 2. Given the solution of the problem, a what-if scenario spreadsheet can be created to define the optimal combination of independent variables in such a way that specific impacts are optimized. For instance, in this case study, some firms have more than one brand/product. In such a case, product/brand managers may want to run what-if scenarios under multicriteria conditions that minimize the cannibalization of their own brands/products, i.e., they may want to avoid consumer switching between their own brands/products and maximize their market share by capturing consumers from other firms' brands/products.
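A minimal sketch of such a what-if computation, reusing the share functions sketched in Section 2 and assuming price is stored as covariate index 0 in X; this illustrates the mechanics, not the authors' spreadsheet:

```python
import numpy as np

def price_what_if(g, beta0, beta, X, brand, delta):
    """Change in average marginal shares when one brand's price shifts by delta."""
    base = aggregate_shares(g, cluster_shares(g, beta0, beta, X)).mean(axis=1)
    X_new = X.copy()
    X_new[brand, 0, :] += delta                 # perturb the brand's price series
    new = aggregate_shares(g, cluster_shares(g, beta0, beta, X_new)).mean(axis=1)
    return new - base
```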
The contribution of this paper builds on the seminal proposal of Zenor and Srivastava [17]. This model is set within a latent variable framework for longitudinal data [40]. Other modeling paradigms, such as Generalized Estimating Equations (GEE) [41], and alternatives to the multinomial specification may provide further possible extensions to this model. Finally, this procedure can be extended to a dynamic setting where consumers can switch between different clusters.

References

[1] Y. Kakizawa, R. Shumway, M. Taniguchi, Discrimination and clustering for multivariate time series, J. Am. Stat. Assoc. 93 (1998) 328–340.
[2] T.W. Liao, Clustering of time series data – a survey, Pattern Recognit. 38 (11) (2005) 1857–1874.
[3] P. Esling, C. Agon, Time-series data mining, ACM Comput. Surv. 45 (1) (2012) 12:1–12:34.
[4] R.N. Mantegna, Hierarchical structure in financial markets, Eur. Phys. J. B 11 (1999) 193–197.
[5] N. Basalto, R. Bellotti, F. De Carlo, P. Facchi, E. Pantaleo, S. Pascazio, Hausdorff clustering of financial time series, Phys. A Stat. Mech. Appl. 379 (2) (2007) 635–644.
[6] T. Sáfadi, Using independent component for clustering of time series data, Appl. Math. Comput. 243 (2014) 522–527.
[7] L. de Angelis, J.G. Dias, Mining categorical sequences from data using a hybrid clustering method, Eur. J. Oper. Res. 234 (3) (2014) 720–730.
[8] J.G. Dias, S.B. Ramos, The aftermath of the subprime crisis: a clustering analysis of world banking sector, Rev. Quant. Financ. Account. 42 (2) (2014) 293–308.
[9] C.C. Clogg, Latent class models, in: G. Arminger, C. Clogg, M. Sobel (Eds.), Handbook of Statistical Modeling for the Social and Behavioral Sciences, Plenum, New York, 1995, pp. 311–359.
[10] J.G. Dias, J.K. Vermunt, Latent class modeling of website users' search patterns: implications for online market segmentation, J. Retail. Consum. Serv. 14 (6) (2007) 359–368.
[11] A.M. Razali, A.A. Al-Wakeel, Mixture Weibull distributions for fitting failure times data, Appl. Math. Comput. 219 (24) (2013) 11358–11364.
[12] E.E. Elmahdy, A new approach for Weibull modeling for reliability life data analysis, Appl. Math. Comput. 250 (1) (2015) 708–720.
[13] B.C. Alves, J.G. Dias, Survival mixture models in behavioral scoring, Expert Syst. Appl. 42 (8) (2015) 3902–3910.
[14] J.G. Dias, Model selection criteria for model-based clustering of categorical time series data: a Monte Carlo study, in: R. Decker, H.-J. Lenz (Eds.), Advances in Data Analysis, Springer, Berlin, 2007, pp. 23–30.
[15] J.G. Dias, J.K. Vermunt, S.B. Ramos, Clustering financial time series: new insights from an extended hidden Markov model, Eur. J. Oper. Res. 243 (3) (2015) 852–864.
[16] M. Wedel, W.A. Kamakura, Market Segmentation: Conceptual and Methodological Foundations, Kluwer Academic Publishers, Dordrecht, 2000.
[17] M. Zenor, R. Srivastava, Inferring market structure with aggregate data: a latent segment logit approach, J. Mark. Res. 30 (1993) 369–379.
[18] D. McFadden, Conditional logit analysis of qualitative choice behavior, in: P. Zarembka (Ed.), Frontiers in Econometrics, Academic Press, New York, 1974, pp. 105–142.
[19] G. Trindade, J. Ambrósio, An optimization method to estimate models with store-level data: a case study, Eur. J. Oper. Res. 217 (3) (2012) 483–678.
[20] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Massachusetts, 1989.
[21] R. Östermark, Solving irregular econometric and mathematical optimization problems with a genetic hybrid algorithm, Comput. Econ. 13 (2) (1999) 103–111.
[22] C. Shen, W. Xue, X. Chen, Global convergence of a robust filter SQP algorithm, Eur. J. Oper. Res. 206 (1) (2010) 34–45.
[23] S.S. Rao, Engineering Optimization: Theory and Practice, John Wiley & Sons, Hoboken, New Jersey, 2009.
[24] R. Fletcher, C.M. Reeves, Function minimization by conjugate gradients, Comput. J. 7 (2) (1964) 149–154.
[25] T. Aird, J. Rice, Systematic search in high dimensional sets, SIAM J. Numer. Anal. 14 (1977) 293–312.
[26] R.H. Myers, D.C. Montgomery, C.M. Anderson-Cook, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, John Wiley & Sons, Hoboken, New Jersey, 2009.
[27] V. Tavares, N. Correia, Optimização Linear e Não-Linear: Conceitos, Métodos e Algoritmos (Linear and Nonlinear Optimization: Concepts, Methods and Algorithms), Fundação Calouste Gulbenkian, Lisboa, Portugal, 1999.
[28] D. Luenberger, Introduction to Linear and Non-linear Programming, Addison-Wesley, Reading, Massachusetts, 1984.
[29] Vanderplaats Research & Development, DOT - Design Optimization Tools, User's Manual, version 5.0, Vanderplaats Research & Development, Colorado Springs, Colorado, 1999.
[30] H. Akaike, A new look at the statistical model identification, IEEE Trans. Autom. Control 19 (6) (1974) 716–723.
[31] G. Schwarz, Estimating the dimension of a model, Ann. Stat. 6 (2) (1978) 461–464.
[32] J.G. Dias, Performance evaluation of information criteria for the naive Bayes model in the case of latent class analysis: a Monte Carlo study, J. Korean Stat. Soc. 36 (3) (2007) 435–445.
[33] C.R. Rao, S.K. Mitra, Generalized Inverse of Matrices and Its Applications, John Wiley & Sons, New York, 1971.
[34] J.E. Dennis, R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, New Jersey, 1983.
[35] Visual Numerics, IMSL Fortran Numerical Libraries, version 5.0, Visual Numerics, Houston, Texas, 1995.
[36] W.R. Smith, Product differentiation and market segmentation as alternative marketing strategies, J. Mark. 21 (1956) 3–8.
[37] G.M. Allenby, P.E. Rossi, Marketing models of consumer heterogeneity, J. Econ. 89 (1–2) (1999) 57–78.
[38] M. Wedel, W. Kamakura, U. Böckenholt, Marketing data, models and decisions, Int. J. Res. Mark. 17 (2–3) (2000) 203–208.
[39] W.E. Griffiths, R. Hill, G. Judge, Learning and Practicing Econometrics, Wiley, New York, 1993.
[40] A. Skrondal, S. Rabe-Hesketh, Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models, Chapman & Hall/CRC, Boca Raton, FL, 2004.
[41] A. Ziegler, Generalized Estimating Equations, Springer, New York, 2011.
