
Product Offerings and Product Line Length Dynamics

Xing Li
October 6, 2014

Abstract
This paper provides a model that uses preference heterogeneity to rationalize the cross-sectional and intertemporal variation in a firm's product proliferation strategies. Product-line dynamics arise from shocks to preference heterogeneity. For example, in the potato chip category I study, consumer concerns over fat levels in foods created two desirable alternatives (low fat and zero fat) for each flavor. On the supply side, firms learn about these changing tastes and adapt their product lines accordingly. For tractability, preference heterogeneity is captured by the nesting parameter of an aggregate nested logit demand model. I find greater preference heterogeneity for smaller packages of chips and for markets with more demographic diversity. The dominant firm in the market bases its decisions primarily on past experience in the market, with the latest preference shocks accounting for only 30% of the influence on product-line decisions. Gross margins increase by 5% if firms have perfect information about preference diversity. Product-line maintenance costs constitute about 2% of total revenue. Sunk costs incurred when expanding the product line are estimated to be four times the per-product fixed cost, which limits the flexibility of product-line adjustment. Under a smooth cost structure, the probability of line length adjustment grows from 70% to 90%.

I am grateful to my advisors, Tim Bresnahan, Wes Hartmann, and Petra Moser, for their invaluable guidance,
discussion, and encouragement. I would also like to thank Chris Colon, Chen Cheng, Oystein Daljord, Michael
Dickstein, Liran Einav, Pedro Gardete, Daniel Grodzicki, Han Hong, Mike Kruger, Brad Larsen, James Lattin, Anqi
Li, Harikesh Nair, Sridhar Narayanan, Joe Orsini, Qiusha Peng, Peter Reiss, Gregory Rosston, Navdeep Sahni, Stephan
Seiler, Stephen Teng Sun, Paul Wong, Yiqing Xing, Constantine Yannelis, Pai-Ling Yin, and seminar participants at
the Stanford Department of Economics, Stanford Marketing WIP, and the 2014 Marketing Science Conference in Atlanta for their
helpful comments. The usual disclaimer applies.

Department of Economics, Stanford University, 579 Serra Mall, Stanford, CA 94305-6072. xingli@stanford.edu

Electronic copy available at: http://ssrn.com/abstract=2462577

1 Introduction

One of the central decisions firms make is the level of their product proliferation. Product proliferation can be exercised in two dimensions: vertically or horizontally. Vertical proliferation means
providing an upgraded or downgraded model and charging a different price. Examples include
Apple iPhone5S and iPhone5C, Toyota Corolla and Camry, and Canon Digital Camera DSLR 5D
and DSLR 50D. Within the same model, firms can differentiate horizontally by offering different features such as colors, flavors, or designs. Apple offers the iPhone5S in three colors; Danone produces 6oz yogurt in different flavors. Both companies practice horizontal product proliferation within the same model.
Vertical proliferation is mainly driven by leaps in R&D success (e.g., Goettler and Gordon,
2011), whereas horizontal proliferation is largely initiated by consumer tastes (e.g., Draganska and
Jain, 2005) that vary across different markets and over time. Furthermore, vertical proliferation can
also involve higher fixed costs of adapting production processes, whereas horizontal proliferation
typically utilizes the same process as existing products. For both reasons, horizontal proliferation is
more flexible and therefore creates more variation in a firm's decisions. This variation in horizontal proliferation is what motivates my investigation into firms' extensions and contractions of their product lines.
Because consumers' preference heterogeneity on the demand side is the primary driver of horizontal product-line decisions, I propose the following framework to rationalize both cross-sectional and intertemporal variation in product-line decisions. The extent of preference heterogeneity varies across markets for reasons such as the concentration of different demographic groups.1 Firms will provide a richer set of (horizontally differentiated) products in markets with more heterogeneous preferences, to serve a larger proportion of consumers and earn more profit. Within each market, firms can also adjust their product lines over time. When the heterogeneity of preference changes,2 firms can respond by adjusting their product-line decisions. When the level of preference heterogeneity increases, firms are more likely to expand their product lines; when preferences become more homogeneous, firms are more likely to contract their product lines.
The main mechanism behind this argument is that preference heterogeneity affects the tradeoff between cannibalization and new sales creation when a firm expands its product line. For multi-product firms, a newly launched product may bring in additional consumers but, at the same time, eat into the market shares of existing products. When consumers' preferences are quite homogeneous, it is difficult to generate new sales by launching new products; the cannibalization effect dominates the new-sales-creation effect, and firms may not maintain a long product line. On the other hand, when consumers' preferences are quite heterogeneous, the new-sales-creation effect dominates and expanding the product line is more profitable.
1
In this paper, preference heterogeneity is an aggregate statistic for both variety seeking within individuals and preference heterogeneity among individuals, as I demonstrate later.
2
For example, potato chip manufacturers will consider the immigration of Asian and Hispanic populations. They will also be aware of consumers' growing concern for their own health.
To formalize and quantify the above argument, I model the demand side using a nested logit. Different products (features) from the same brand are clustered in the same nest (line) in the choice structure. The nesting parameter has the same behavioral interpretation as the heterogeneity of preference, which is an aggregate measure of both variety seeking within an individual and preference heterogeneity across individuals.3 When products are more nested within the line, they are closer substitutes; consumers agree on the preference ranking among these products, and preference is more homogeneous.4 On the other hand, when products are less nested within the line, they are weaker substitutes; consumers have more varied views on their favorite products, and preference is more heterogeneous.
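To make the cannibalization-versus-new-sales tradeoff concrete, the Python sketch below computes shares in a nested logit in which all of a firm's products form one nest and the outside good has mean utility normalized to zero. It is an illustration under simplifying assumptions that are not the paper's: all n products are symmetric with a common mean utility `delta`, and `lam` (a hypothetical name) denotes within-line preference heterogeneity, so that the nesting parameter is 1 - lam.

```python
import math


def line_shares(n, delta, lam):
    """Return (total line share, share per product) for n symmetric products.

    Illustrative nested logit: one nest holds the n products, each with mean
    utility `delta`; the outside good has mean utility 0. `lam` in [0, 1] is
    within-line preference heterogeneity, so the nesting parameter is 1 - lam.
    """
    if n == 0:
        return 0.0, 0.0
    # For lam > 0, the nest term D ** lam with D = n * exp(delta / lam)
    # simplifies to n ** lam * exp(delta); the simplified form also covers lam = 0.
    nest_value = n ** lam * math.exp(delta)
    line_share = nest_value / (1.0 + nest_value)
    return line_share, line_share / n


for lam in (0.0, 0.5, 1.0):
    row = [f"n={n}: line {line_shares(n, -1.0, lam)[0]:.3f}" for n in (1, 5, 20)]
    print(f"lam={lam}: " + ", ".join(row))
```

With lam = 0 the line's total share does not move as products are added (every new sale is cannibalized), while with lam = 1 (a plain logit) each added product expands the line's share; intermediate values trace out the tradeoff described above.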
On the supply side, multi-product firms decide on the number and content of the products offered. This is a complicated problem, because multi-product firms cannot simply display all products in front of consumers. Instead, they face many constraints, including shelf space, distribution and storage costs, and advertising capacity. The problem is even more challenging for modelers, for two reasons. First, if I model the launch decision for every product, the model becomes exponentially complicated as the product line gets longer.5 Second, it is difficult to write down a model that predicts consumers' tastes for new products.6 For both reasons, I focus on product line length and abstract away from product content in the supply-side model.
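Because the supply side reduces to a choice of line length, a stripped-down static version of that choice can be sketched as picking n to balance variable profit against per-product costs. The Python sketch below is hypothetical: it uses a symmetric nested logit as a stand-in for demand, and the market size, margin, mean utility, and per-product cost are made-up numbers, not estimates. It ignores learning, sunk costs, and product content, and only illustrates that the profit-maximizing line length rises with within-line heterogeneity `lam` (nesting parameter 1 - lam).

```python
import math


def line_share(n, delta, lam):
    """Total share of n symmetric products in a one-nest nested logit (illustrative)."""
    if n == 0:
        return 0.0
    nest_value = n ** lam * math.exp(delta)
    return nest_value / (1.0 + nest_value)


def optimal_line_length(lam, market_size, margin, delta, cost_per_product,
                        max_n=100):
    """Grid-search the n in 0..max_n that maximizes variable profit minus line costs."""
    def profit(n):
        return (market_size * margin * line_share(n, delta, lam)
                - cost_per_product * n)
    return max(range(max_n + 1), key=profit)


# Hypothetical parameters: 1M households, $1 margin per purchase,
# $5,000 cost per product per period.
for lam in (0.2, 0.4, 0.6, 0.8):
    n_star = optimal_line_length(lam, market_size=1e6, margin=1.0,
                                 delta=-3.0, cost_per_product=5000.0)
    print(f"lam={lam}: optimal line length {n_star}")
```

Under these illustrative parameters the optimal length increases steadily as lam moves from 0.2 to 0.8; a grid search is used simply because n is a small integer.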
Firms chase time-varying preference heterogeneity by adjusting product line length on the supply side,7 which I describe with an empirical learning model similar to Hitsch (2006). Because firms must decide on product proliferation before the market is realized, I assume they do not know the true preference heterogeneity when they design product lines. I model the uncertainty about true preference heterogeneity as a belief on which firms base their proliferation decisions. After the market is realized, the belief can be updated in a Bayesian way. The representation of the nested logit model that is linear in the nesting parameter makes the supply-side learning framework tractable.
3
This modeling idea is rooted in the early motivation for the nested logit model, namely, to use unobserved heterogeneity to capture substitution patterns.
4
Intuitively, consider two products that are equally favored and split the market. Suppose the price of one product rises. When these two products are closer substitutes, the market share of the second product increases more, which means more people agree on which product is their favorite.
5
The existing literature simplifies this problem by focusing on only a subset of products. For example, in Draganska et al. (2009), yogurt firms decide whether to launch each of six vanilla-flavored yogurts. Even if they disregard the existence of other flavored yogurts, their action space is $\{\text{Entry}, \text{NotEntry}\}^6$, with 64 possible actions.
6
In the standard BLP model, consumers' utility is derived from characteristics, so it is possible to predict consumers' valuations of a new product (e.g., Petrin (2002); Berry et al. (2004)). This approach applies to some industries, such as automobiles, but not to others, including the potato chip industry that I study, because no vector of characteristics captures consumers' preferences over potato chips with different flavors.
7
As mentioned before, in the potato chip industry that I study, people's preference heterogeneity changes over time for reasons such as increasing concern over health issues and shifting taste trends within groups of people.

I apply the model to the potato chip market, which has a leading firm (hereafter Company A) with a market share of 60%. The average preference heterogeneity is estimated to be 0.41 in the market for small-package chips and 0.67 in the market for large-package chips, which means that preference for small packages is more heterogeneous. This is consistent with consumers' greater willingness to try new flavors when buying small packages of potato chips. In addition, local markets with more diverse populations tend to exhibit more preference heterogeneity, which is confirmed by estimation with a series of measures of population diversity in each market, including the dispersion of the income and age distributions and the size of ethnic groups. On the supply side, Company A applies in-market learning about preference heterogeneity to adjust its product line length. I find that Company A bases its decisions primarily on past experience in the market, with the latest preference shocks representing 30% of the influence. The marginal cost of offering one additional product is estimated to be $3,560 per million households per quarter; the total maintenance cost is estimated to be 2% of total revenue for an average line with a length of 22. I also estimate the sunk cost incurred when expanding the product line to be three times the usual maintenance cost, which may limit the flexibility of product-line adjustments.
Two counterfactual exercises based on these estimates evaluate the firm's optimal line length decisions under different line-length-specific policy experiments. In the first exercise, I simulate the optimal line length decisions without the extra cost of line length expansion. This removes the restriction on Company A's flexibility to adjust its line length, and the probability of line length changes grows from 70% to 90% under a smooth cost structure. In the second exercise, I consider the situation where Company A knows the precise value of preference heterogeneity at the time of its product line length decisions. It can then make a better decision based on the true value instead of a guess, and its gross margin increases by 5%. A byproduct of the second counterfactual is a test of whether the firm learns or knows preference heterogeneity when making line length decisions. I construct a test based on gross margin, and the test result supports the assumption of learning rather than knowing. Both simulations shed light on firms' potential gains from product-line-related improvements. The first relates to more efficient product line maintenance, for example a more flexible shelf-space contract and a better distribution system, and the second relates to better knowledge of consumers, for example from consumer research.
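The Bayesian learning about an evolving heterogeneity parameter can be illustrated with a standard Kalman filter. This is a sketch under assumptions that are not taken from the paper: the heterogeneity parameter is assumed to follow a Gaussian random walk, and after each quarter the firm is assumed to observe a noisy signal of it. The class and function names and all variances are hypothetical. The gain is the weight placed on the latest signal, which is loosely analogous to the share of influence attributed to the latest preference shock above.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """Gaussian belief about the heterogeneity parameter (hypothetical)."""
    mean: float
    var: float


def update_belief(prior, signal, drift_var, noise_var):
    """One Kalman-filter step for a parameter that follows a random walk.

    drift_var: assumed variance of the quarter-to-quarter change in the parameter.
    noise_var: assumed variance of the noise in the observed signal.
    Returns the posterior belief and the gain, i.e. the weight on the new signal.
    """
    # Predict: the parameter may have drifted since the last quarter.
    pred_var = prior.var + drift_var
    # Update: weight the signal by its precision relative to the prediction.
    gain = pred_var / (pred_var + noise_var)
    mean = prior.mean + gain * (signal - prior.mean)
    return Belief(mean, (1.0 - gain) * pred_var), gain


belief = Belief(mean=0.5, var=0.05)
for quarter, signal in enumerate([0.62, 0.58, 0.66, 0.60], start=1):
    belief, gain = update_belief(belief, signal, drift_var=0.002, noise_var=0.01)
    print(f"quarter {quarter}: mean {belief.mean:.3f}, gain {gain:.2f}")
```

With these assumed variances the gain settles near 0.36, so the newest signal receives roughly a third of the weight and accumulated past experience the rest; because the parameter keeps drifting, the gain never falls to zero, unlike learning about a constant.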
This paper is related to several strands of literature. First, there is a growing literature, both theoretical and empirical, on firms' product proliferation and product-line design. Theoretical work has identified various factors that determine product proliferation, including communication costs to consumers (Villas-Boas, 2004), quality signaling (Kamenica, 2008), the vertical structure of distribution (Liu and Cui, 2010), consumers' deliberation on their preferences (Guo and Zhang, 2012), variety preference and purchase costs (Bronnenberg, 2014), and other rational interpretations, as well as behavioral explanations such as cognitive overload (Iyengar and Lepper, 2000), articulated preferences (Chernev, 2003a,b), and contextual effects (Simonson and Tversky, 1992; Orhun, 2009). However, there are few empirical papers in this field. Hui (2004) takes product proliferation as given in a nested logit demand estimation. Draganska and Jain (2006) further explore different nesting assumptions in their demand analysis. Draganska et al. (2009) offer a supply-side model of product line design, but they restrict attention to a small subset of products and mainly use the supply-side competitive environment to explain product line design. This paper proposes a supply-side model of the length of the whole product line, where the driving force for variation in product-line length is preference heterogeneity on the demand side. The second strand of related literature is on variety seeking. Models of variety seeking find negative state dependence on past choices (Chintagunta, 1998, 1999; Seetharaman et al., 2005; Dubé et al., 2009, 2010). These papers estimate individual-level variety-seeking behavior across intertemporal purchases, whereas I incorporate the variety-seeking effect in an aggregate measure of preference heterogeneity. Both approaches should provide similar inference on the effect of variety seeking on product-line design. Beyond this static inference, my model focuses on dynamic proliferation decisions over time.
This paper is most closely related to the statistical learning literature. Numerous papers study consumers' learning about the quality of new products (Roberts and Urban, 1988; Erdem and Keane, 1996; Ching et al., 2013; Lin et al., 2014). On the supply side, Urban and Katz (1983) and Urban and Hauser (1993) address firms' market experimentation in designing new products. Hitsch (2006) studies firms' learning about the quality of new products when making exit decisions. Another series of papers (Crawford and Shum, 2005; Narayanan and Manchanda, 2009; Dickstein, 2014) considers physicians' and patients' learning about the effectiveness of drugs when making prescription decisions. This paper differs from these empirical learning papers in two respects. First, the learning object is preference heterogeneity rather than the mean preference studied in the existing literature. Second, the learning object evolves over time, whereas in the standard learning framework it is constant.8
This paper is also related to research on empirical entry and product positioning.9 Early research on empirical entry infers firms' profitability from their entry decisions (Reiss and Spiller, 1989; Bresnahan and Reiss, 1990, 1991; Berry, 1992). Later research treats elements of the marketing mix other than price as endogenous variables (Berry and Waldfogel, 2001; Mazzeo, 2002; Berry et al., 2004; Seim, 2006; Einav, 2010; Sweeting, 2010; Crawford et al., 2011; Ryan and Tucker, 2012; Fan, 2013). This paper contributes to this strand of literature by proposing a tractable model of product line length dynamics for multi-product firms.
The rest of the paper is organized as follows. Section 2 introduces the data and some reduced-form evidence on line length dynamics. Section 3 provides an empirical model that quantifies firms' optimal line length decisions driven by preference heterogeneity. Section 4 describes the full specification and identification. Section 5 presents the results, and Section 6 concludes.

2 Product Offerings in US Potato Chip Market

In this section, I provide an overview of the potato chip industry and describe the IRI Academic Dataset (Bronnenberg et al., 2008) that I use.10 The last part of the section presents reduced-form evidence on product line length dynamics.

2.1 The Potato Chip Market

Potato chips can be found in most American households. An average US household spends $80 a year on salty snacks. Potato chips have a 30% dollar share of the salty snack industry, which means an average household spends around $24 each year on potato chips (First-Research, 2011).
Chip manufacturers anticipate and respond to changes in consumer preferences. First of all, in the potato chip industry, the ability to innovate and differentiate a product is key to competition. As a result, manufacturers offer potato chips with different flavors, fat contents, and cut types. Furthermore, consumers' tastes vary by region and over time. For example, Joon (2013) states that consumers in the Midwestern region prefer thick cuts and consumers in the southwestern states prefer bold and spicy flavors. At the same time, many exogenous factors drive the evolution of tastes over time. Population migration is one such factor (Bronnenberg et al., 2012). Manufacturers are creating new spicy flavors catering to the growing Hispanic and Asian populations (First-Research, 2011). Consumers' awareness of the health cost of eating potato chips high in trans fat and salt is another factor. To capitalize on this shift, leading manufacturers have introduced a number of new products with reduced fat and low salt content (Joon, 2013). A third factor is the change in taste for (new) flavors. Firms can elicit this change by inviting consumers to submit newly designed flavors.11 Given the existence of die-hard fans of classically flavored potato chips, the regional and temporal variations in taste imply changes in preference heterogeneity, with corresponding implications for product proliferation decisions.
8
Lovett et al. (2009) also model time-evolving learning parameters.
9
Dubé et al. (2005) provide an excellent summary of these papers.
10
All estimates and analyses in this paper based on Information Resources Inc. data are by the author and not by Information Resources Inc.
A second feature of the potato chip industry is that it is highly concentrated, with a leading player (Company A) holding a market share of 60%. The second-largest player has a market share of only 5.2% (Joon, 2013). Company A does not worry much about potential entrants. First, consumers have strong brand preferences when picking potato chips and are willing to pay extra for branded chips. In addition, firms operating in this industry need good relations with upstream suppliers and downstream retailers. They use long-term contracts to hedge against volatile prices for potatoes, sugars, oils, and fats from their suppliers, and they compete for the best shelf space in grocery stores.

2.2 Data

I use the IRI Academic Dataset from 2001 to 2007 to estimate the model.12 The IRI Academic Dataset provides scanned sales data from a sample of grocery stores at the UPC-store-week level. I restrict the analysis in this paper to the Salty Snack - Potato Chip category. I aggregate the data to the feature-city-quarter level, as detailed in Appendix A and described briefly as follows.
In the product (UPC) dimension, I restrict the sample to 8-13 serving sizes because they account for the main sales in grocery stores.13 Furthermore, I treat all non-Company A chips as a homogeneous outside good and aggregate their sales across the different non-Company A brands. For Company A chips, I aggregate sales into features, where each feature is a unique triplet of distinguishable characteristics: flavor, fat content, and cut type. I observe 41 different flavors, three fat contents (regular, fat free, reduced fat), and three cut types (flat, ruffle, wavy). Not all combinations are ever produced and sold. In the data, only 63 features (flavor-fat-cut combinations) have ever been sold in grocery stores.
11
For example, Frito Lay holds a contest called "Do us a flavor" each year that invites consumers to submit newly designed flavors, and it launches the winners. The winning flavor is awarded 1 million dollars.
12
Although the IRI Academic Dataset is available from 2001 to 2011, I use only seven years, for the following reasons. First, the 2008 financial crisis heavily drove up potato chip prices, which makes pricing decisions non-trivial and complicates the model. Second, Company A carried out a national launch of zero-trans-fat products in 2008. The reasons for the timing and scale of such a big event are beyond the scope of this project. Moreover, the concurrence of the two events further complicates the analysis.
13
I choose the boundary for the serving-size split at the natural discontinuity point of the serving-size density, as shown in Appendix A.
Along the geographic dimension of store location, I restrict the analysis to a balanced panel of stores to prevent artifactual variation in product lines due to changes in sampling criteria. I further aggregate to the level of the 50 markets defined by IRI. Three reasons justify this aggregation. First, grocery stores differ. Product lines displayed in grocery stores in neighborhoods where the majority of the population is white differ from those in stores operating in more diversely populated communities. Even within one diversely populated community, we might find some Asian stores and some Hispanic ones, where product lines in both stores are short but the aggregate diversity of preference is high. Second, I observe only sales data, not feature-launch data. In other words, I do not know which features are displayed. A feature with zero sales in a store would be artifactually tagged as withdrawn even if it is actually on the shelf. If all features with zero sales in a store were tagged as unlaunched, I would observe quite frequent changes in store-level line length, some of which would be misclassified. Both issues are largely aggregated out at the level of the geographic market. Third, it is easier to link demographic data at the market level, which provides further information about consumers.
In the time dimension, I aggregate weeks to quarters to avoid artifactual product assortments identified by observed sales instead of actual launches, as described above. Some features have non-zero sales in only part of the quarter because they are launched in the middle of it. Disregarding this effect would bias the estimates.14 For simplicity, I drop all features that are only partially observed within the quarter (those with non-zero sales in fewer than 12 weeks). For all 63 features in 28 quarters across 7 years, most of the features (96%) have positive sales in either all or no weeks within serving-size-city-quarters. Market shares of these transient features are also negligible, as shown in Appendix A. After dropping them, 58 features remain in 28 quarters across 50 markets.
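The aggregation and filtering steps above can be sketched as follows. This is a hypothetical illustration in plain Python, not the paper's code: the record layout and field names (`WeeklySale`, `market`, `quarter`, `flavor`, `fat`, `cut`, `week`, `units`) are assumed, and the 12-week rule is applied to each feature-market-quarter cell, which is one reading of the text. A feature is a (flavor, fat content, cut type) triplet, and line length is the number of features that survive in a market-quarter.

```python
from collections import defaultdict
from typing import NamedTuple


class WeeklySale(NamedTuple):
    """Hypothetical store-week record after store and serving-size filters."""
    market: str
    quarter: str
    flavor: str
    fat: str
    cut: str
    week: int
    units: float


def feature_market_quarter(records, min_weeks=12):
    """Aggregate weekly sales to (market, quarter, feature) cells.

    Cells with positive sales in fewer than `min_weeks` distinct weeks are
    treated as transient (e.g. launched mid-quarter) and dropped.
    """
    units = defaultdict(float)
    weeks = defaultdict(set)
    for r in records:
        key = (r.market, r.quarter, (r.flavor, r.fat, r.cut))
        units[key] += r.units
        if r.units > 0:
            weeks[key].add(r.week)
    return {k: v for k, v in units.items() if len(weeks[k]) >= min_weeks}


def line_lengths(cells):
    """Count the distinct features sold in each market-quarter."""
    counts = defaultdict(int)
    for market, quarter, _feature in cells:
        counts[(market, quarter)] += 1
    return dict(counts)


# Usage with made-up data: one feature sold all 13 weeks, one launched in week 9.
sales = [WeeklySale("Chicago", "2002q2", "BBQ", "regular", "flat", w, 10.0)
         for w in range(1, 14)]
sales += [WeeklySale("Chicago", "2002q2", "Lime", "regular", "wavy", w, 3.0)
          for w in range(9, 14)]
cells = feature_market_quarter(sales)
print(line_lengths(cells))  # {('Chicago', '2002q2'): 1}
```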
Company A has wide variation in the length of its product line, defined as the count of features sold in one city-quarter. Table 1 and Figure 1 present the distribution of line length. The average line length is 22.09, with a standard deviation of 3.86. The shortest line is in Raleigh/Durham in 2001q1, with a length of 8, whereas the longest line is in Chicago in 2002q2, with 30 different features. Variation in line length derives from two sources, cross-sectional and intertemporal, and line length varies widely along both. Cross-sectionally, Pittsfield has the shortest line, with an average length of 16.89, whereas Houston has the longest line, with an average length of 25.05. Line length also changes over time, as shown in Table 1 and the right panel of Figure 1. Line length is quite sticky: in about 30% of cases there is no change, and in 85% of cases the change is within two features. Still, there are cases where Company A adjusts its line length quite aggressively.
14
As I discuss later, the concentration of market shares among Company A chips is a key variable in the analysis. Ignoring marginal features that may be launched in the middle of the quarter, and naively treating their small market shares as low sales, would bias inference on the concentration measure and contaminate the estimates.
I supplement the IRI dataset with demographics by merging it with the IPUMS CPS data. Among the 302 metropolitan areas in the CPS, I identify 98 that can be merged with IRI markets. In terms of population, the 50 IRI markets in the data set cover half of the total population nationwide. I calculate the total market size as the number of households in 2007, whereas other demographics that may correlate with preference heterogeneity are calculated at the city-quarter level.

2.3 Reduced-form Evidence on Dynamic Product Offering

Before turning to the structural estimation, I present some reduced-form evidence that firms change their proliferation decisions based on market outcomes over time. When preference is homogeneous, consumers tend to agree on the preference ranking of all features within a line, and in-line market shares for features are concentrated. In the extreme case, consumers fully agree on the preference ranking, and the in-line market share is 1 for the most preferred feature and 0 for the others. In these cases, the model predicts that firms will contract the product line. On the contrary, when preference heterogeneity becomes high, consumers have various opinions about the most favorable feature, and in-line market shares become less concentrated. In this case, the model predicts that firms will expand the product line.
To illustrate the argument above, I run the following regression:
$$\text{LineLength}_{mt} = \beta_0 + \beta_1 \text{HHI}_{m,t-1} + \beta_2 \text{LineLength}_{m,t-1} + c_m + c_t + \epsilon_{mt}$$
where $m$ indexes markets, $t$ denotes quarters, $\text{LineLength}_{mt}$ is the length of the product line, and $\text{HHI}_{mt}$ is the Herfindahl index of in-line market shares, that is,
$$\text{HHI}_{mt} = \sum_f s_{f|l,mt}^2$$
where $s_{f|l,mt}$ is the in-line market share of feature $f$ in market $m$ and quarter $t$. $c_m$ and $c_t$ are market and time fixed effects that control for geographic unobservables and seasonal effects. The baseline regression confirms the model prediction (Table 2, Column 1): the higher the market concentration, the shorter the product line in the following quarter. A one-standard-deviation change in HHI (0.03) leads to a change in line length of 0.2. Compared with the average change in line length of 0.19 (Table 1), this estimate is not small.
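The baseline specification can be sketched in plain Python for a balanced market-quarter panel. In a balanced panel, market and quarter fixed effects can be absorbed by double demeaning each variable (subtracting market and quarter means and adding back the grand mean), after which OLS on the two lagged regressors gives the same slopes as including the dummies. The data below are simulated and noise-free, purely to show that the routine recovers known coefficients; the function names are hypothetical, and the sketch omits standard errors and does not address the small-T bias that can arise when a lagged dependent variable is combined with market fixed effects.

```python
import random
from statistics import fmean


def two_way_demean(panel):
    """Remove market and quarter means from a balanced panel {(m, t): x}.

    For a balanced panel this equals the residual from regressing x on
    market and quarter dummies.
    """
    markets = sorted({m for m, _ in panel})
    quarters = sorted({t for _, t in panel})
    grand = fmean(panel.values())
    m_mean = {m: fmean(panel[(m, t)] for t in quarters) for m in markets}
    t_mean = {t: fmean(panel[(m, t)] for m in markets) for t in quarters}
    return {(m, t): x - m_mean[m] - t_mean[t] + grand
            for (m, t), x in panel.items()}


def fe_ols(y, x1, x2):
    """Slopes (b1, b2) of y on x1 and x2 with market and quarter fixed effects."""
    y, x1, x2 = (two_way_demean(p) for p in (y, x1, x2))
    keys = list(y)
    s11 = sum(x1[k] * x1[k] for k in keys)
    s22 = sum(x2[k] * x2[k] for k in keys)
    s12 = sum(x1[k] * x2[k] for k in keys)
    s1y = sum(x1[k] * y[k] for k in keys)
    s2y = sum(x2[k] * y[k] for k in keys)
    det = s11 * s22 - s12 * s12
    # Solve the 2x2 normal equations by Cramer's rule.
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Simulated, noise-free panel: 4 markets x 8 quarters (hypothetical values).
rng = random.Random(0)
markets, quarters = ["m1", "m2", "m3", "m4"], range(2, 10)
hhi_lag = {(m, t): rng.uniform(0.05, 0.15) for m in markets for t in quarters}
len_lag = {(m, t): rng.uniform(15, 30) for m in markets for t in quarters}
market_fe = {m: rng.gauss(0, 2) for m in markets}
line_len = {(m, t): 5.0 - 6.0 * hhi_lag[(m, t)] + 0.5 * len_lag[(m, t)]
            + market_fe[m] + 0.1 * t
            for (m, t) in hhi_lag}
b_hhi, b_len = fe_ols(line_len, hhi_lag, len_lag)
print(f"HHI slope {b_hhi:.4f}, lagged length slope {b_len:.4f}")
```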
One challenge in interpreting the estimate is that HHI mechanically decreases in line length, so the estimated correlation may be artifactual.15 This concern is partly valid, as shown in Appendix B, so I use another measure of market concentration: the standard deviation of log in-line market shares, defined as
$$\text{StdLnShareInLine}_{mt} = \text{Std}\left(\ln s_{f|l,mt}\right)$$
which is not mechanically correlated with line length (Appendix B). When in-line market shares are more concentrated, this standard deviation is high. Regression results still support the conjecture: a one-standard-deviation decrease in this concentration measure leads to a 0.36 increase in line length (Table 2, Column 2).
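The two concentration measures can be computed from feature-level sales as in the Python sketch below (function names are illustrative, not from the paper). The example also shows the mechanical concern noted above: with equal in-line shares, HHI equals 1/n and falls as the line lengthens, whereas the standard deviation of log shares is zero regardless of n. The sketch uses the population standard deviation; which variant the paper uses is not stated here, and the log measure requires strictly positive sales.

```python
import math
from statistics import pstdev


def inline_shares(sales):
    """In-line market shares s_{f|l}: each feature's share of the line's sales."""
    total = sum(sales)
    return [s / total for s in sales]


def hhi(sales):
    """Herfindahl index of in-line shares (sum of squared shares)."""
    return sum(s * s for s in inline_shares(sales))


def std_ln_share(sales):
    """Standard deviation of log in-line shares; all sales must be positive."""
    return pstdev([math.log(s) for s in inline_shares(sales)])


# Equal shares: HHI = 1/n falls with line length, the log-share dispersion does not.
for n in (10, 20, 30):
    equal = [1.0] * n
    print(n, round(hhi(equal), 4), std_ln_share(equal))

# Concentrated vs. dispersed sales for the same line length.
print(hhi([70, 10, 10, 10]), hhi([25, 25, 25, 25]))
print(std_ln_share([70, 10, 10, 10]), std_ln_share([25, 25, 25, 25]))
```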
One alternative interpretation of these findings is that firms automatically withdraw unpopular, losing features. To address this challenge, I change the dependent variable to an indicator of line expansion. Regression results again confirm the theory proposed above (Table 2, Columns 3 and 4): the higher the market concentration, the less likely the line is to be expanded. A one-standard-deviation decrease in HHI raises the probability of line expansion by 4.74% (Table 2, Column 3) and 4.60% (Table 2, Column 4). Compared with an average probability of line expansion of 34%, this increase is economically significant.
A final caveat is that all of this reduced-form evidence is correlational, not causal. The complete model allows firms to adjust their line length based on all past market realizations rather than just the last one. To quantify the above mechanism, I estimate a structural model with a richer set of specifications.

3 A Model of Product Line Length Dynamics

In this section, I propose a model that is structural in both demand and supply to capture the effect of preference heterogeneity on the tradeoff between cannibalization and new sales creation when firms make product line length decisions. For simplicity, I assume that there is a separate monopolist in each market $m$. Within each market, the monopolist provides a line of $n_t$ products indexed by $j \in \{1, 2, \ldots, n_t\}$ that compete with a single outside good $j = 0$ in each period $t$.
15
I regress line length on one-period lagged HHI. The mechanics of calculating HHI contaminate the inference only when firms have some inertia in adjusting line length.

3.1 Demand Side

For each market $m$ and period $t$ (suppressed temporarily), the utility for consumer $i$ from consuming Company A chip $j \in \{1, 2, \ldots, n\}$ and the outside good $j = 0$ is
$$
\begin{aligned}
u_{ij} &= a_i + c_{ij} - \alpha p_j \\
&= (a + \zeta_i) + (c_j + \lambda_j \varepsilon_{ij}) - \alpha p_j \\
&= \delta_j + (\zeta_i + \lambda_j \varepsilon_{ij}) \\
u_{i0} &= \varepsilon_{i0}
\end{aligned}
$$
where $a_i$ is consumer $i$'s brand preference for Company A, which can be decomposed into the average level $a$ and consumer heterogeneity $\zeta_i$; $c_{ij}$ is consumer $i$'s utility for product $j$, which likewise consists of the mean value $c_j$ and consumer heterogeneity $\lambda_j \varepsilon_{ij}$; and $p_j$ is the price of product $j$. After some rearrangement, the utility for consumer $i$ consuming Company A product $j$ equals the mean utility level $\delta_j = a + c_j - \alpha p_j$ plus consumer heterogeneity $(\zeta_i + \lambda_j \varepsilon_{ij})$. Following Berry (1994) and Cardell (1997), both $\varepsilon_{ij}$ and $(\zeta_i + \lambda_j \varepsilon_{ij})$ follow i.i.d. type I extreme value distributions.
From the representation $c_{ij} = c_j + \lambda_j \varepsilon_{ij}$, the value of $\lambda_j$ measures consumers' preference heterogeneity over product $j$. When $\lambda_j$ is high, $c_{ij}$ varies a lot across individuals $i$, and preference for product $j$ is heterogeneous. Conversely, a lower $\lambda_j$ implies more homogeneous preference. Unfortunately, with only market-level sales data available, I lack the statistical power to identify every $\lambda_j$. For tractability, I set all $\lambda_j$ equal to a common $\lambda$, a market-level measure that captures the overall level of preference heterogeneity in the market. Under this assumption, the model reduces to a nested logit model with nesting parameter $(1 - \lambda)$, with the representation from Berry (1994)
$$u_{ij} = \delta_j + (\zeta_i + \lambda \varepsilon_{ij})$$
The nesting parameter $(1 - \lambda)$ therefore has a behavioral interpretation in terms of preference heterogeneity. When $\lambda = 0$, the difference in utility from consuming products $j$ and $k$ is
$$u_{ij} - u_{ik} = \delta_j - \delta_k$$

which is the same for all i. Consumers agree on the preference ranking for all features within the
product line. When is small, different products in line are close substitutes and the preference
is more homogenous. On the contrary, when is large, consumers tend to have more divergent
opinions on the preference ranking for products in the line and the preference is heterogenous.
The nesting parameter is an aggregate statistic of both individual-level variety seeking and cross-individual preference heterogeneity. If we treat the repeated purchases of one individual as different purchase occasions, variety-seeking behavior can be rationalized as a low correlation of individual-specific demand shocks across products, which is captured as a high $\lambda$ in the current model. With market-level data, I cannot distinguish variety seeking within individuals from preference heterogeneity across individuals. But these two channels should have similar implications for product assortment decisions, as discussed later.
Another advantage of equalizing all $\lambda_j$ is that the model now reduces to a nested logit, which can be estimated by linear GMM. Within each market $m$,
$$\ln s_{1t} - \ln s_{0t} = \delta_{jt} + \lambda_t \ln s_{j|l,t} = a + c_j - \alpha p_{jt} + \lambda_t \ln s_{j|l,t} + \xi_{jt} \tag{1}$$
where $s_{1t}$ is the market share of all Company A products, $s_{0t}$ is the market share of all non-Company-A products, and $s_{j|l,t}$ is the in-line market share, which equals $s_{jt}/(1 - s_{0t})$. Following the standard model, I allow the taste for product $j$ to vary over time, with $c_{jt} = c_j + \xi_{jt}$, where $c_j$ is the product fixed effect and $\xi_{jt}$ is the unobserved demand shock, distributed as $N(0, 1/\tau_\xi)$.

3.2

Static Profit when $\lambda$ is Known

I assume that at the time of the product line length decision, Company A does not know the precise value of the mean utility $\delta_{jt}$, so she takes expectations over $\delta$ with respect to some distribution $F(\delta)$. There are three reasons to justify this assumption. First, product line length decisions are made prior to the realization of demand, so Company A does not know the demand shock $\xi_{jt}$. Second, retailers can observe the demand shock and adjust the retail price $p_{jt}$, so $p_{jt}$ is also unknown before the market is realized. Third, when Company A launches a new product, the value of $c_j$ is also unknown to her.¹⁶ By making these assumptions, I abstract away from the identity of each product in the line and focus mainly on the length of the product line.

16 This also assumes away product launches in the vertical sense, or a mass-market strategy. When a company decides to launch a new product, she can either play a mass-market strategy, so that the new product is attractive to all consumers (i.e., with a high value of $c_j$), or a niche-market strategy, in which the feature is attractive to a subset of consumers (i.e., similar $c_j$). In the potato chip industry, it is quite difficult to launch a chip that is favored by all consumers and play a mass-market strategy.

The total market share for Company A from offering a product line of length $n$ follows the nested logit representation

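$$s(n, \lambda) = E_F\left[\frac{\exp(\lambda I)}{1 + \exp(\lambda I)}\right], \qquad \text{where } I = \ln \sum_{j=1}^{n} \exp\left(\frac{\delta_j}{\lambda}\right)$$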

The total market share $s(n, \lambda)$ is increasing in $n$, increasing in $\lambda$, and super-modular in $n$ and $\lambda$ under some conditions imposed on $F$.¹⁷ In other words, when the line is expanded, the marginal gain in total share is larger when preference is more heterogeneous. If the cost of expanding the line by one product is constant, super-modularity implies that the optimal line length $n^*$ is increasing in $\lambda$.¹⁸ In general, let $C(n, l)$ denote the cost of launching a line of length $n$ when last period's line length is $l$. A myopic firm chooses $n$ to maximize
$$w M s(n, \lambda) - C(n, l)$$
where $w$ is the manufacturer margin and $M$ is the market size. Let $n^*(\lambda, m)$ be the optimal line length choice; by super-modularity, $n^*$ is increasing in $\lambda$.
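The properties of $s(n, \lambda)$ can be checked numerically. The sketch below is a minimal Monte Carlo illustration, not the paper's computation: it assumes $F$ is a normal distribution with placeholder mean $-2$ and standard deviation 1, reuses the same draws for every $(n, \lambda)$ (common random numbers), and evaluates $\lambda I$ as a numerically stable log-sum-exp. The helper names (`draw_deltas`, `line_share`, `myopic_line_length`) and the values of $wM$ and the per-product cost are hypothetical.

```python
import math
import random

def draw_deltas(n_max, draws, mean_delta, sd_delta, seed=0):
    """Draw mean utilities delta_j ~ N(mean_delta, sd_delta^2), one row of n_max per draw.
    Reusing the same rows for every (n, lam) gives common random numbers."""
    rng = random.Random(seed)
    return [[rng.gauss(mean_delta, sd_delta) for _ in range(n_max)] for _ in range(draws)]

def line_share(n, lam, deltas):
    """Simulated s(n, lam) = E_F[exp(lam*I) / (1 + exp(lam*I))], I = ln sum_{j<=n} exp(delta_j/lam)."""
    total = 0.0
    for row in deltas:
        d = row[:n]
        top = max(d)
        # lam * I as a stable log-sum-exp with "temperature" lam
        lam_i = top + lam * math.log(sum(math.exp((x - top) / lam) for x in d))
        total += 1.0 / (1.0 + math.exp(-lam_i))
    return total / len(deltas)

def myopic_line_length(lam, deltas, wm, cost_per_product):
    """argmax over n = 1..n_max of wM * s(n, lam) - c * n (constant per-product cost)."""
    n_max = len(deltas[0])
    return max(range(1, n_max + 1),
               key=lambda n: wm * line_share(n, lam, deltas) - cost_per_product * n)

deltas = draw_deltas(n_max=12, draws=2000, mean_delta=-2.0, sd_delta=1.0, seed=1)
for lam in (0.2, 0.4, 0.7):
    print(lam, round(line_share(6, lam, deltas), 4), myopic_line_length(lam, deltas, 100.0, 0.5))
```

With common random numbers, monotonicity in $n$ and in $\lambda$ holds draw by draw, because adding a product raises the log-sum-exp and $\lambda \ln \sum_j \exp(\delta_j/\lambda)$ is increasing in $\lambda$. Super-modularity depends on $F$, so the sketch only prints $n^*$ rather than asserting that it rises with $\lambda$.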

3.3

Dynamic Learning on Time-evolving $\lambda$

As mentioned earlier, preference heterogeneity evolves over time due to many exogenous factors, including population migration, health concerns, and evolving tastes for new flavors. I further assume that firms do not know the true value of preference heterogeneity when making line length decisions. Instead, they hold beliefs about this value and update them based on market realizations.¹⁹
17 Super-modularity means that $s(n + 1, \lambda) - s(n, \lambda)$ is increasing in $\lambda$. Proofs of these properties are provided in Appendix C.

18 This is consistent with the standard interpretation of price elasticity in the nested logit model. When products are more nested within the line, price elasticity is higher within nests than between nests. Lowering the price of one feature then has a larger cannibalization effect, which takes market share from other products within the line, than sales-creation effect, which increases the total share of all products in the line. The same logic applies to line expansion: the cannibalization effect of expanding the line dominates the business-stealing effect when features are more nested, and in this case firms are less likely to expand the line.

19 There is no direct test of the informational assumption that firms do not know the exact value of preference heterogeneity, because the stationary learning model (described below) and the complete-information model are not nested. However, I report an indirect test based on simulation in a later section.
3.3.1

Learning from Market Realizations

Suppose that at the beginning of period $t$, Company A has a prior belief about $\lambda_t$, modeled as a normal distribution with mean $\mu_t$ and precision $\rho_t$ truncated to the unit interval $(0, 1)$, denoted $TN(\mu_t, 1/\rho_t)$. After the market is realized, the market shares of all products are observed, and Company A observes one signal from each product $j$, derived from (1):
$$y_{jt} = \ln s_{1t} - \ln s_{0t} - a - c_j + \alpha p_{jt} = \lambda_t \ln s_{j|l,t} + \xi_{jt}$$
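Aggregating the signals from all products about the same $\lambda_t$ gives the aggregate signal²⁰
$$\theta_t = \frac{\sum_j \ln s_{j|l,t}\, y_{jt}}{\sum_j \ln^2 s_{j|l,t}} = \lambda_t + \frac{\sum_j \ln s_{j|l,t}\, \xi_{jt}}{\sum_j \ln^2 s_{j|l,t}}$$
with precision
$$h_t = \tau_\xi \sum_j \ln^2 s_{j|l,t}$$
where $\tau_\xi$ is the precision of the demand shock $\xi_{jt}$.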
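The aggregation above is a least-squares slope through the origin of $y_{jt}$ on $\ln s_{j|l,t}$. A minimal sketch, assuming the demand-shock precision $\tau_\xi$ (written `tau_xi`) is known and using hypothetical in-line shares; `aggregate_signal` is an illustrative name:

```python
import math

def aggregate_signal(y, inline_shares, tau_xi):
    """Pool per-product signals y_jt = lam_t * ln s_{j|l,t} + xi_jt into (theta_t, h_t).

    theta_t is the least-squares slope through the origin of y on x = ln s_{j|l,t};
    h_t = tau_xi * sum_j x_j^2 is its precision when xi_jt ~ N(0, 1/tau_xi)."""
    xs = [math.log(s) for s in inline_shares]
    sxx = sum(x * x for x in xs)
    theta = sum(x * yj for x, yj in zip(xs, y)) / sxx
    return theta, tau_xi * sxx

shares = [0.5, 0.3, 0.2]                              # hypothetical in-line shares
y_noiseless = [0.4 * math.log(s) for s in shares]     # signals with xi_jt = 0 and lam_t = 0.4
print(aggregate_signal(y_noiseless, shares, tau_xi=4.0))
```

Without demand shocks the pooled signal recovers $\lambda_t$ exactly; products with more extreme in-line shares carry more weight, which is why they also contribute more to $h_t$.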

A useful property of the truncated normal belief is that it is conjugate to a normal data-generating process, as shown in the next theorem.
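Theorem 1. Suppose the prior is truncated normal,
$$\lambda_t \sim TN\left(\mu_t, \sigma_t^2 = 1/\rho_t\right),$$
and an unbounded signal is observed with value $\theta_t$ and precision $h_t$. Then the posterior belief is also truncated normal,
$$\lambda_t \mid \theta_t, h_t \sim TN\left(\mu'_t, (\sigma'_t)^2 = 1/\rho'_t\right)$$
with
$$\mu'_t = \frac{\rho_t}{\rho_t + h_t}\mu_t + \frac{h_t}{\rho_t + h_t}\theta_t \tag{2}$$
$$\rho'_t = \rho_t + h_t \tag{3}$$
The proof is given in Appendix D.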
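Theorem 1 can be checked numerically. The sketch below applies (2) and (3) and compares the mean of the normalized product "truncated-normal prior times normal likelihood" on $(0, 1)$ with the mean of $TN(\mu'_t, 1/\rho'_t)$, both computed by a midpoint rule. The prior and signal values are illustrative, and `update_belief` and `mean_on_unit_interval` are hypothetical helper names.

```python
import math

def update_belief(mu, rho, theta, h):
    """Posterior mean and precision from (2)-(3)."""
    rho_post = rho + h
    mu_post = (rho * mu + h * theta) / rho_post
    return mu_post, rho_post

def mean_on_unit_interval(log_kernel, n=20000):
    """Mean of the density on (0, 1) proportional to exp(log_kernel(x)), by the midpoint rule."""
    xs = [(i + 0.5) / n for i in range(n)]
    logs = [log_kernel(x) for x in xs]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    return sum(x * w for x, w in zip(xs, weights)) / sum(weights)

mu, rho, theta, h = 0.4, 25.0, 0.6, 10.0      # illustrative prior and signal
mu_post, rho_post = update_belief(mu, rho, theta, h)

# Truncated-normal prior times normal likelihood, normalized on (0, 1) ...
direct = mean_on_unit_interval(lambda x: -0.5 * rho * (x - mu) ** 2 - 0.5 * h * (theta - x) ** 2)
# ... has the same mean as TN(mu_post, 1/rho_post), as Theorem 1 states.
conjugate = mean_on_unit_interval(lambda x: -0.5 * rho_post * (x - mu_post) ** 2)
print(mu_post, rho_post, direct, conjugate)
```

The two log-kernels differ only by a constant (completing the square in $x$), so truncation to $(0, 1)$ affects both in the same way, which is the reason the truncated normal stays conjugate.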


20 For convenience, the notation $\ln^2 s_{j|l,t}$ means $(\ln s_{j|l,t})^2$.

3.3.2

Evolution of $\lambda_t$

The next step is to model the time evolution of preference heterogeneity $\lambda_t$. There are two reasons for allowing $\lambda_t$ to evolve over time. First, in the potato chip industry, we do observe preference heterogeneity changing over time, and chip manufacturers respond by adjusting their product line strategies. Second, from a modeling perspective, if preference heterogeneity were constant over time, Company A, as an experienced firm operating in a mature market, would be sophisticated enough to know its true value, and no intertemporal variation in the product line should be observed. The large intertemporal variation in product line length motivates the assumption of time-evolving preference heterogeneity.
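If $\lambda_t$ is not truncated, a natural candidate model is a random walk,
$$\lambda_{t+1} = \lambda_t + \eta_t$$
where $\eta_t \sim N(0, 1/\tau_\eta)$ is the evolution error, or equivalently,
$$\lambda_{t+1} \mid \lambda_t \sim N(\lambda_t, \sigma_\eta^2), \qquad \sigma_\eta^2 = 1/\tau_\eta$$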

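In the truncated case, I propose the following quasi random walk,
$$\lambda_{t+1} \mid \lambda_t \sim f(\cdot \mid \lambda_t; \tau_\eta)$$
which is similar to the random walk in the unbounded case, with acceptance-rejection on the unit interval. Convolving this with the truncated normal belief on $\lambda_t$, we can approximate the prior belief on $\lambda_{t+1}$ as $TN(\mu_{t+1}, \sigma_{t+1}^2 = 1/\rho_{t+1})$ with
$$\mu_{t+1} = \mu'_t \tag{4}$$
$$\sigma_{t+1}^2 = (\sigma'_t)^2 + \sigma_\eta^2 \tag{5}$$
where $\sigma_\eta^2 = 1/\tau_\eta$ is the variance of the evolution error. Details are included in Appendix E.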
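A minimal sketch of the two ingredients above, under stated assumptions: the quasi random walk is implemented literally as "draw from $N(\lambda_t, \sigma_\eta^2)$ and reject draws outside $(0, 1)$", and the prediction step applies the approximation (4)-(5). The step size 0.05, the 28-period horizon, and the helper names are illustrative.

```python
import random

def quasi_random_walk_step(lam, sigma_eta, rng):
    """lambda_{t+1} | lambda_t: draw from N(lambda_t, sigma_eta^2), rejecting draws outside (0, 1)."""
    while True:
        draw = rng.gauss(lam, sigma_eta)
        if 0.0 < draw < 1.0:
            return draw

def predict_belief(mu_post, rho_post, sigma_eta):
    """Approximate next-period prior, (4)-(5): keep the mean, add sigma_eta^2 to the variance."""
    var_next = 1.0 / rho_post + sigma_eta ** 2
    return mu_post, 1.0 / var_next

rng = random.Random(0)
path = [0.41]
for _ in range(28):                        # e.g. 28 quarters
    path.append(quasi_random_walk_step(path[-1], 0.05, rng))
print(min(path), max(path))
print(predict_belief(0.45, 140.0, 0.05))   # precision falls from 140 to 1/(1/140 + 0.0025)
```

The prediction step is what keeps belief precision from growing without bound: every period the evolution variance is added back, so learning never fully settles.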

3.4

Line Length Dynamics Chasing Time-evolving Preference Heterogeneity

Combining the two pieces above, learning and evolution, gives a full description of the firm's dynamic problem. The action-specific flow profit is
$$\pi_n(\mu_t, \rho_t, l_t) = w M\, E\left(s(n, \lambda_t) \mid \mu_t, \rho_t\right) - C(n, l_t)$$
and the value function is
$$V_n(\mu_t, \rho_t, l_t) = \pi_n(\mu_t, \rho_t, l_t) + \beta E\left(V(\mu_{t+1}, \rho_{t+1}, l_{t+1}) \mid \mu_t, \rho_t, l_t, n\right)$$
$$V(\mu, \rho, l) = E \max_n \left(V_n(\mu, \rho, l) + \sigma_\varepsilon \varepsilon_n\right)$$
where the state variables are the belief mean, the belief precision, and last period's line length; $\beta$ is the discount factor, and $\varepsilon_n$ is a random fixed-cost shock with scale $\sigma_\varepsilon$. The transition probabilities are defined by (2), (3), (4), and (5), with an additional one, $l_{t+1} = n$.
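To make the Bellman equation concrete, the sketch below solves a deliberately reduced version by value iteration. It is not the paper's estimator: the belief state $(\mu, \rho)$ is frozen, so the only state is last period's line length $l$; `revenue[n - 1]` is a hypothetical stand-in for $wM\,E(s(n, \lambda))$; $\beta = 0.95$ and the kink-cost parameters are placeholders; and the shocks $\varepsilon_n$ are assumed i.i.d. standard type I extreme value, which gives $E\max_n(v_n + \sigma_\varepsilon\varepsilon_n) = \sigma_\varepsilon(\ln\sum_n e^{v_n/\sigma_\varepsilon} + \gamma)$ with Euler's constant $\gamma$.

```python
import math

EULER_GAMMA = 0.5772156649015329   # mean of a standard type I extreme value draw

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def solve_toy_line_dp(revenue, c1, c2, beta=0.95, scale=1.0, tol=1e-10, max_iter=10_000):
    """Value iteration for V(l) = E max_n [pi(n, l) + beta * V(n) + scale * eps_n].

    revenue[n - 1] stands in for w*M*E(s(n, lam)) with the belief held fixed, and the
    cost has the kink form (c1 + c2 * 1(n > l)) * n.  Returns V and P(n | l)."""
    lengths = range(1, len(revenue) + 1)

    def choice_values(l, V):
        return [revenue[n - 1] - (c1 + (c2 if n > l else 0.0)) * n + beta * V[n - 1]
                for n in lengths]

    V = [0.0] * len(revenue)          # V[l - 1]: value when last period's length is l
    for _ in range(max_iter):
        new_V = [scale * (logsumexp([v / scale for v in choice_values(l, V)]) + EULER_GAMMA)
                 for l in lengths]
        gap = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if gap < tol:
            break

    ccp = {}                          # logit choice probabilities implied by the shocks
    for l in lengths:
        vals = [v / scale for v in choice_values(l, V)]
        lse = logsumexp(vals)
        ccp[l] = [math.exp(v - lse) for v in vals]
    return V, ccp

revenue = [10.0 * math.log(1.0 + n) for n in range(1, 9)]   # hypothetical concave revenue
V, ccp = solve_toy_line_dp(revenue, c1=0.5, c2=1.0)
print([round(p, 3) for p in ccp[4]])   # P(n | l = 4) for n = 1..8
```

Setting `c2=0.0` makes the choice probabilities identical for every $l$, because neither the flow payoff nor the continuation value then depends on last period's length; the kink is what makes the past line length a payoff-relevant state.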

Empirical Specification and Identification

In this section, I present the full empirical specification and identification of the model. Similar to Hitsch (2006), I apply two-step estimation: the demand side is estimated by linear GMM, and its parameters are plugged into the supply side. I estimate the dynamic supply model by maximum likelihood. This section ends with a discussion of the identification of the model.

4.1

Demand Side

The demand side is modeled as a nested logit with two nests: all Company A chips of different features are nested in one line, and all non-Company-A chips are treated as a homogeneous outside product. Based on (1), for each market $m$,
$$\ln s_{1mt} - \ln s_{0mt} = a_m + c_j - \alpha p_{jmt} + \lambda_{mt} \ln s_{j|l,mt} + \xi_{jmt} \tag{6}$$

Both $p_{jmt}$ and $\ln s_{j|l,mt}$ are endogenous, because they are correlated with the unobserved demand shock $\xi_{jmt}$. I employ the following sets of instruments for the two endogenous variables:
• The sum of the characteristics (flavors, fat content, and cut type) of the other Company A chips sold in the same market-time, $\sum_{j' \neq j} x_{j'mt}$
• The average price of the same feature sold in other geographical markets at the same time, $\frac{1}{\#\{m' \neq m\}} \sum_{m' \neq m} p_{jm't}$
• Other costs of raw materials, including potatoes, sugar, soybean oil, edible butter, and edible tallow
• The number of competitor brands and the number of competitor UPCs other than Company A chips within the same market-time
The first set of instruments is widely known as BLP instruments, following Berry et al. (1995). The underlying assumption is that the characteristics are exogenous to demand shocks. In the current model, the upstream wholesalers make product assortment decisions, whereas downstream retailers make pricing decisions. In reality, grocery stores and manufacturers jointly decide in advance what to display. If some of the features do not sell well, grocery stores lower prices to sell off inventory. In this case, it is natural to assume that the assortment decision is made prior to the realization of the local demand shock.
The second set of instruments is known as Hausman instruments, as used in demand estimation by Nevo (2001). The underlying assumption is that demand shocks are independent across markets, but some factors affect pricing in all markets. These factors include, but are not restricted to, common cost shifters and nationwide advertising campaigns. In the potato chip industry, prices across all markets are subject to common manufacturing costs from Company A as well as common nationwide campaigns, which supports the use of Hausman instruments.
The last set of instruments captures the competitive environment, as in Bresnahan et al. (1997). The argument is that the competitive environment affects firms' pricing decisions but is orthogonal to demand shocks. In this project, I can also exploit the large variation in the competitive environment across markets, measured by the number of competitor brands and UPCs.
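The first two instrument sets are simple transformations of the panel. A minimal sketch, assuming prices are stored in a dictionary keyed by (product, market, quarter) and characteristics as numeric vectors per product; the function names, keys, and the two-element characteristic coding are hypothetical:

```python
from collections import defaultdict

def hausman_iv(prices):
    """prices: dict (j, m, t) -> p_jmt.  Returns (j, m, t) -> mean price of j in markets m' != m at t."""
    by_jt = defaultdict(dict)
    for (j, m, t), p in prices.items():
        by_jt[(j, t)][m] = p
    iv = {}
    for (j, m, t) in prices:
        others = [q for m2, q in by_jt[(j, t)].items() if m2 != m]
        iv[(j, m, t)] = sum(others) / len(others) if others else None  # None: no other market
    return iv

def blp_iv(offered, chars):
    """offered: dict (m, t) -> products on offer; chars: dict j -> numeric characteristics.
    Returns (j, m, t) -> elementwise sum of characteristics of the other products at (m, t)."""
    iv = {}
    for (m, t), products in offered.items():
        total = [sum(col) for col in zip(*(chars[j] for j in products))]
        for j in products:
            iv[(j, m, t)] = [a - b for a, b in zip(total, chars[j])]
    return iv

prices = {("bbq", "A", 1): 2.0, ("bbq", "B", 1): 3.0, ("bbq", "C", 1): 4.0}
print(hausman_iv(prices)[("bbq", "A", 1)])        # average over markets B and C
chars = {"bbq": [1, 0], "salt": [0, 1], "lite": [1, 1]}   # hypothetical [flavored, reduced fat]
offered = {("A", 1): ["bbq", "salt", "lite"]}
print(blp_iv(offered, chars)[("bbq", "A", 1)])
```

Computing the BLP sum as "total minus own" avoids a quadratic loop over products within each market-time.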


4.2

Supply Side - Flow Profit

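In each market $m$, the action-specific flow payoff of Company A is
$$\pi_{n,m}(\mu, \rho, l) = w_m M_m E\left(s_m(n, \lambda) \mid \mu, \rho\right) - H_m c(n, l)$$
$$s_m(n, \lambda) = E_{F_m}\left[\frac{\exp(\lambda I)}{1 + \exp(\lambda I)}\right], \qquad I = \ln \sum_{j=1}^{n} \exp\left(\frac{\delta_j}{\lambda}\right)$$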
In other words, I allow a market-specific flow profit function and calibrate its parameters as follows:
• $w_m$: manufacturer's margin, calibrated from the average price $p_m$ in that market, adjusted by the retailer's markup (15%), the distributor's markup (25%), and the manufacturer's gross margin (30%), i.e., $w_m = p_m \times 0.85 \times 0.75 \times 0.3$
• $M_m$: market size, calibrated from the total number of households $H_m$, assuming that an average household spends $X$ dollars per quarter on potato chips, where $X$ is calculated from the 24 dollars an average household spends per year on potato chips, adjusted by quarter and by the market share of large-package chips, i.e., $M_m = H_m \times 6 \times \mathrm{ShareLarge}_m / p_m$
• Cost of line length maintenance: a per-capita cost is assumed, i.e., $C_m(n, l) = H_m c(n, l)$. In the estimation, I try two specifications of the per-capita cost, linear and kink. In the linear specification, $c(n, l) = c\, n$. In the kink specification, $c(n, l) = (c_1 + c_2 \mathbf{1}(n > l))\, n$
• $F_m$: distribution of the mean utility $\delta$, assumed normal, with mean and variance calibrated from the empirical distribution of $\{\delta_{jmt}\}_{j,t}$
The only parameters to estimate in the flow profit are the cost parameters $\{c_1, c_2\}$.
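The calibration rules are direct arithmetic. A minimal sketch with hypothetical market inputs (average price 2.0, one million households, a 50% large-package share) and placeholder cost parameters; `calibrate_market` and `per_capita_cost` are illustrative names:

```python
def calibrate_market(avg_price, households, share_large,
                     retail_markup=0.15, distributor_markup=0.25, gross_margin=0.30,
                     annual_spend=24.0):
    """Return (w_m, M_m) following the calibration rules listed above.

    w_m = p_m * (1 - retail) * (1 - distributor) * gross margin, and
    M_m = H_m * (annual_spend / 4) * ShareLarge_m / p_m (quarterly spend converted to units)."""
    w_m = avg_price * (1.0 - retail_markup) * (1.0 - distributor_markup) * gross_margin
    market_size = households * (annual_spend / 4.0) * share_large / avg_price
    return w_m, market_size

def per_capita_cost(n, l, c1, c2=0.0):
    """Kink specification c(n, l) = (c1 + c2 * 1(n > l)) * n; c2 = 0 gives the linear case."""
    return (c1 + (c2 if n > l else 0.0)) * n

w_m, M_m = calibrate_market(avg_price=2.0, households=1.0e6, share_large=0.5)  # hypothetical inputs
print(w_m, M_m, per_capita_cost(5, 4, c1=1.0, c2=2.0), per_capita_cost(5, 5, c1=1.0, c2=2.0))
```

The combined factor $0.85 \times 0.75 \times 0.3 \approx 0.19$ means the manufacturer keeps roughly a fifth of the shelf price as margin under these calibration rules.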

4.3

Supply Side - Dynamics

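The firm's dynamic problem is
$$V_{n,m}(\mu_t, \rho_t, l_t) = \pi_{n,m}(\mu_t, \rho_t, l_t) + \beta E_m\left(V_m(\mu_{t+1}, \rho_{t+1}, l_{t+1}) \mid \mu_t, \rho_t, l_t, n\right)$$
$$V_m(\mu, \rho, l) = E \max_n \left(V_{n,m}(\mu, \rho, l) + \sigma_\varepsilon \varepsilon_n\right)$$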


The unspecified parameters are the initial belief $(\mu_{1,m}, \rho_{1,m})$, the evolution rate $\tau_{\eta,m}$, and the scale of the random fixed cost $\sigma_\varepsilon$. All parameters are identified, as shown below, but I still impose the following cross-market restrictions to simplify the calculation.
• $\rho_{1m}$: the initial prior precision is assumed to be proportional to the precision of the signal. This is justified by the stationarity assumption in the learning process. For markets with a more precise signal, learning is expected to be fast, but this holds only if belief precision is the same across markets. I equalize the learning speed across all markets by assuming that the prior precision is proportional to the signal precision, i.e., $\rho_{1m} = k \bar h_m$, where $\bar h_m = \frac{1}{\#t} \sum_t h_{mt}$ is the average precision.
• $\tau_{\eta,m}$: evolution rate of preference heterogeneity. Combining stationarity with (5),
$$\frac{1}{\rho_{1m}} = \frac{1}{\rho_{1m} + \bar h_m} + \frac{1}{\tau_{\eta,m}},$$
which gives $\tau_{\eta,m} = k(k + 1) \bar h_m$.
• $\mu_{1m}$: initial prior mean, integrated out over a calibrated normal distribution whose mean and variance are estimated from $\{\theta_{mt}\}_t$.²¹
So the dynamic parameters to identify are $\{k, \sigma_\varepsilon\}$.
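The restriction $\tau_{\eta,m} = k(k+1)\bar h_m$ and the resulting weight on the signal in (2) can be verified in a few lines. A minimal sketch; $k = 2.5$ is chosen to match the precision ratio reported in Section 5.2, $\bar h_m = 40$ is illustrative, and the function names are hypothetical:

```python
def stationary_evolution_precision(k, h_bar):
    """tau_eta = k (k + 1) h_bar, from rho_1 = k h_bar and 1/rho_1 = 1/(rho_1 + h_bar) + 1/tau_eta."""
    return k * (k + 1.0) * h_bar

def signal_weight(k):
    """Weight h / (rho + h) placed on the new signal in (2) when rho = k h."""
    return 1.0 / (k + 1.0)

k, h_bar = 2.5, 40.0
tau = stationary_evolution_precision(k, h_bar)
rho1 = k * h_bar
print(tau, 1 / rho1, 1 / (rho1 + h_bar) + 1 / tau, signal_weight(k))
```

With $k = 2.5$ the signal weight is $1/3.5 \approx 0.29$, which is the source of the roughly 30% weight on in-market signals discussed with the supply-side estimates.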

4.4

Identification

This section briefly shows the identification of the supply-side parameters without imposing any cross-market restrictions, i.e., market-specific parameters are separately identified. In the current version, we assume that the initial prior mean $\mu_1$ is known (and integrated out in the estimation). However, identification does not rely on this assumption; a stronger identification result without knowledge of the prior mean is described in Appendix F.
In our data, we observe actual line length decisions, signal values and precisions, as well as the prior mean,
$$\{n_t, \theta_t, h_t, \mu_1\}$$
Based on this information, I show the nonparametric identification of the preference evolution rate, the initial belief precision, and the line length maintenance cost,
$$\{\tau_\eta, \rho_1, c\},$$
while the scale of the fixed cost for launching is treated separately.²²

21 Note that the initial prior mean is also identifiable, as shown in Appendix F. However, I follow the convention of the learning literature and integrate out this value.
4.4.1

Preference evolution rate $\tau_\eta$ and prior precision $\rho_1$

The evolution rate $\tau_\eta$ measures how fast $\lambda$ evolves over time. Intuitively, $\lambda_t$ can be estimated from demand, so this rate is identified from the demand-side estimates of $\lambda_t$. Equivalently, the signal value $\theta_t$ is calculated from the demand estimation, and $\tau_\eta$ is identified from $\mathrm{Var}(\theta_{t+1} \mid \theta_t)$, because $\theta_{t+1}$ deviates from $\theta_t$ for three reasons: the signal error in period $t$, the signal error in period $t + 1$, and the deviation of $\lambda_{t+1}$ from $\lambda_t$. The precisions of the first two errors are known, so the rate of evolution is identified.
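The identification argument amounts to subtracting the known signal-noise variances from the variance of signal changes. A minimal simulation sketch, assuming (unlike the model) an untruncated random walk and a constant signal precision, and using the variance of first differences $\theta_{t+1} - \theta_t$ in place of $\mathrm{Var}(\theta_{t+1} \mid \theta_t)$; parameter values and function names are hypothetical:

```python
import random
import statistics

def simulate_signals(T, sigma_eta, signal_sd, lam0=0.5, seed=0):
    """theta_t = lambda_t + signal error, with lambda_t an untruncated random walk (a simplification)."""
    rng = random.Random(seed)
    lam, thetas = lam0, []
    for _ in range(T):
        thetas.append(lam + rng.gauss(0.0, signal_sd))
        lam += rng.gauss(0.0, sigma_eta)
    return thetas

def estimate_evolution_variance(thetas, signal_sd):
    """sigma_eta^2 = Var(theta_{t+1} - theta_t) - 2 * signal_sd^2 when signal precision is constant."""
    diffs = [b - a for a, b in zip(thetas, thetas[1:])]
    return statistics.pvariance(diffs) - 2.0 * signal_sd ** 2

thetas = simulate_signals(T=40_000, sigma_eta=0.05, signal_sd=0.05, seed=1)
print(estimate_evolution_variance(thetas, signal_sd=0.05))   # close to 0.05 ** 2 = 0.0025
```

The two subtracted terms are the variances of the period-$t$ and period-$(t+1)$ signal errors ($1/h_t$ and $1/h_{t+1}$); with time-varying precision one would subtract those period-specific values instead.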
The initial precision $\rho_1$ is identified by the stationarity assumption that belief precision does not explode. From the equation
$$\frac{1}{\rho_1} = \frac{1}{\rho_1 + h} + \frac{1}{\tau_\eta}$$
we can pin down $\rho_1$. The intuition is that when making line length decisions, Company A cannot rely too much on the market signal, because the signal is noisy, as measured by $h$. Nor can she rely too much on her prior belief, because $\lambda$ evolves over time, as measured by $\tau_\eta$. The optimal balance between these two sources pins down the belief precision at its stationary level.
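Rearranging the stationarity equation gives $\rho_1^2 + h\rho_1 - h\tau_\eta = 0$, whose positive root is the stationary precision. A minimal sketch with illustrative values of $h$ and $\tau_\eta$ (`stationary_prior_precision` is a hypothetical name):

```python
import math

def stationary_prior_precision(h, tau_eta):
    """Positive root rho of 1/rho = 1/(rho + h) + 1/tau_eta, i.e. rho^2 + h*rho - h*tau_eta = 0."""
    return (-h + math.sqrt(h * h + 4.0 * h * tau_eta)) / 2.0

rho1 = stationary_prior_precision(h=40.0, tau_eta=350.0)
print(rho1, 1 / rho1 - (1 / (rho1 + 40.0) + 1 / 350.0))   # residual should be about 0
```

A slower-evolving $\lambda$ (larger $\tau_\eta$) yields a more precise stationary prior, matching the intuition that past experience is more informative when tastes change slowly.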
4.4.2

Cost of line length maintenance c

From the last part, I have shown the identification of $\rho_1$ and $\tau_\eta$. With knowledge of $\rho_1$, I can calculate the whole belief process $\{\mu_t, \rho_t\}$, so the state variables are known. The cost parameter is identified by the standard conditional choice probability argument of Magnac and Thesmar (2002), based on $E(n_t \mid \mu_t, \rho_t, l_t)$. Intuitively, fixing belief precision, when the cost is low, the optimal line length is more responsive to changes in the belief mean, as shown in Figure 2. The cost is identified by regressing the actual line length $n_t$ on the belief mean $\mu_t$, controlling for $\rho_t$.
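Recovering the belief process from the signals is a recursive application of (2)-(5). A minimal sketch, assuming the untruncated approximation used in (4)-(5); the signal values and parameters are hypothetical, and `belief_path` is an illustrative name. Each recorded pair is the prior held when $n_t$ is chosen, i.e. before the period-$t$ signal arrives.

```python
def belief_path(thetas, hs, mu1, rho1, tau_eta):
    """Prior (mu_t, rho_t) held when n_t is chosen, from the updating rules (2)-(5)."""
    mu, rho, path = mu1, rho1, []
    for theta, h in zip(thetas, hs):
        path.append((mu, rho))
        rho_post = rho + h                                            # (3)
        mu_post = (rho * mu + h * theta) / rho_post                   # (2)
        mu, rho = mu_post, 1.0 / (1.0 / rho_post + 1.0 / tau_eta)     # (4)-(5)
    return path

thetas = [0.45, 0.38, 0.50, 0.47]          # hypothetical signals theta_t
path = belief_path(thetas, [40.0] * 4, mu1=0.41, rho1=100.0, tau_eta=350.0)
for mu, rho in path:
    print(round(mu, 4), round(rho, 4))      # rho stays at its stationary value of 100
```

Starting from the stationary precision, $\rho_t$ stays constant, so all period-to-period variation in the state comes through the belief mean $\mu_t$, which is the variation the cost regression exploits.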

Results

This section presents the model estimates and simulation results based on them.

22 The final supply-side parameter $\sigma_\varepsilon$ is a nuisance parameter that is not nonparametrically identified. But since functional-form assumptions are imposed on the value function, including this parameter in the estimation improves the model fit considerably.

5.1

Demand Estimation

On the demand side, I estimate the nested logit model specified in (6). In this part, I report the average estimate of preference heterogeneity by imposing $\lambda_{mt} = \lambda$; on the supply side, I allow preference heterogeneity to vary by market and time.
Table 3 reports the demand-side estimation results. Column (1) ignores the endogeneity problem and estimates the equation directly by OLS. Column (2) addresses this problem with the instruments described above. Comparing columns (1) and (2), I find that the instrumental variables work as expected: both preference heterogeneity and price elasticity are underestimated without controlling for endogeneity, and the characteristic vectors become significant only in the 2SLS specification.
Note that the first two columns of Table 3 use characteristic vectors (flavor fixed effects, cut types, fat contents) to describe a product. In column (3), I replace them with a more precise control, product fixed effects. The estimated price elasticity does not change much (-2.38 in Column 3 compared to -2.53 in Column 2), but the estimated preference heterogeneity almost doubles. As mentioned, the characteristic vectors cannot fully capture consumers' preferences, so I take the product fixed-effects estimates as the benchmark, in which preference heterogeneity is estimated to be 0.41 (standard error 0.02; Column 3, Table 3). In Column (4), I allow price elasticity to vary with demographics. I find that price is less elastic in markets with a richer population, measured by median income, or an older population, measured by median age, which is consistent with most previous findings.
The main parameter of interest in this paper is preference heterogeneity, so in Table 4 I explore its sources by interacting it with different observables. Column (1) reproduces Column (3) of Table 3 as a benchmark. In Column (2), I estimate the same model on data for small-package potato chips. I find that preference is more heterogeneous (0.67 in Column 2 compared to 0.41 in Column 1) and price is more elastic (-2.74 in Column 2 compared to -2.38 in Column 1). This extra heterogeneity may come from consumers being more willing to try new flavors when buying small packages. The preference heterogeneity estimated in this paper has two sources: heterogeneity between consumers, and heterogeneity within a consumer across purchase occasions. I cannot separately identify these two sources with only market-level data, but I believe the second source is more important in markets for small-package chips. The difference in estimated heterogeneity supports the existence of within-consumer heterogeneity across purchase occasions, which is related to variety-seeking behavior.
Another source of preference heterogeneity is population diversity. In Columns (3) to (7) of Table 4, I explore to what extent population diversity can explain preference heterogeneity; the results are robust to a range of diversity measures. In Column (3), I use the interquartile range of the income distribution to measure population diversity. I find that in markets with a more dispersed income distribution, preference heterogeneity is significantly higher. To quantify this estimate, I compare the two markets with the minimum (0.04) and maximum (0.10) diversity measure: the implied difference in heterogeneity is 0.09,²³ or about 22% of the baseline heterogeneity of 0.41. In Column (4), the diversity measure is the dispersion of the age distribution, and the implied difference in heterogeneity is 0.07, or 17% of the baseline value. Beyond these two dispersion measures, preference heterogeneity is also explained by the diversity of ethnic groups. In Column (5), I use the Asian population ratio in the market and find that in markets with a 10% higher Asian population ratio, preference is more heterogeneous by 0.047, relative to the baseline value of 0.41. In Column (6), I use the Hispanic population ratio, and the interaction term is not significant. This may be because the Hispanic population measure ranges widely, from 0 to 53%: if the true functional form is nonlinear, a linear approximation may not yield a significant result. Instead, I discretize the measure using an above-median dummy, and the estimate is reported in Column (7). In markets with an above-median Hispanic population ratio, preference is more heterogeneous by 0.12, relative to the baseline value of 0.41.

5.2

Supply Estimation

I plug in the demand coefficients and estimate the supply side by maximum likelihood. Solving the original problem by brute force is difficult, because calculating the line share $s_m(n, \lambda)$, the flow payoff $\pi_{n,m}(\mu, \rho, l)$, and the state transition $f(\mu_{t+1}, \rho_{t+1} \mid \mu_t, \rho_t, n)$ all require simulation. However, I can employ numerical methods to simplify these calculations.
For $s_m(n, \lambda)$, I use a power-polynomial approximation. Because it does not contain any parameters to estimate, the approximation needs to be calculated only once. Polynomials make it easy to preserve monotonicity and super-modularity in the approximated function, which is key for identification.²⁴ To calculate $\pi_{n,m}(\mu, \rho, l)$, I use quadrature to compute the expectation with respect to $\lambda$, even though $\lambda$ follows a truncated normal rather than a normal distribution. When the precision is high and the mean is far from the boundary, the truncated normal can be approximated by a normal distribution, because the probability of lying outside the boundary is low. For the state transition probability, because the line length stays at a high level (for the large package size, line length ranges from 8 to 30, with an average of 22) and the precision does not explode because of the time-varying $\lambda$, I simply assume that the state transition probability does not depend on the action $n$, which reduces the computational burden. Finally, I use Chebyshev polynomials to approximate the value function and estimate the single-agent dynamic model with unobservable and time-varying state variables.²⁵

23 This is calculated as $(0.10 - 0.04) \times 1.48 \approx 0.09$.

24 I use CVX, a package for disciplined convex programming (Grant et al., 2008), to obtain the approximation. See Appendix C for details.
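The quadrature step described above can be sketched with a fixed Gauss-Hermite rule. The block below is an illustration under stated assumptions, not the paper's implementation: it uses a 5-point rule (the node count is my choice), treats $\lambda \sim N(\mu, 1/\rho)$ as untruncated, and separately reports the probability mass that this approximation places outside $(0, 1)$. Function names are hypothetical.

```python
import math

# 5-point Gauss-Hermite rule for integrals of the form: integral of g(x) * exp(-x^2) dx
GH_NODES = (-2.020182870456086, -0.958572464613819, 0.0,
            0.958572464613819, 2.020182870456086)
GH_WEIGHTS = (0.019953242059046, 0.393619323152241, 0.945308720482942,
              0.393619323152241, 0.019953242059046)

def normal_expectation(f, mu, rho):
    """E[f(lam)] for lam ~ N(mu, 1/rho), i.e. ignoring the truncation to (0, 1)."""
    sd = 1.0 / math.sqrt(rho)
    total = sum(w * f(mu + math.sqrt(2.0) * sd * x) for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mass_outside_unit_interval(mu, rho):
    """P(lam <= 0) + P(lam >= 1) under N(mu, 1/rho): the mass the normal approximation misplaces."""
    sd = 1.0 / math.sqrt(rho)
    return normal_cdf(-mu / sd) + 1.0 - normal_cdf((1.0 - mu) / sd)

print(normal_expectation(lambda lam: lam ** 2, mu=0.4, rho=100.0))  # exact value: 0.4**2 + 1/100
print(mass_outside_unit_interval(mu=0.4, rho=100.0))                # roughly 3e-5
```

A 5-point rule integrates polynomials up to degree 9 exactly, so smooth integrands such as the polynomial approximation of $s_m(n, \lambda)$ are handled with very few evaluations; the second function quantifies when ignoring truncation is harmless.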
Table 5 reports the estimation results. I estimate the model under two specifications. In the first specification, I assume the maintenance cost per capita (per million households) is linear in line length, whereas in the second specification, the marginal cost is higher when manufacturers are expanding their lines. In the first specification, the marginal cost of expanding a line by one product is $3,560 per million households. For an average line length of 22, the total (variable) cost of maintaining the line in an average-size city with 2.63 million households is approximately $0.2 million.²⁶ By comparison, industry revenue in an average-size city with the average line length selling at the average price is $8.96 million,²⁷ so product-line-related cost constitutes about 2% of total revenue.
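The back-of-the-envelope comparison can be reproduced directly from the figures quoted in the text and in footnotes 26 and 27. The sketch below only multiplies those quoted inputs; it does not reinterpret the four revenue factors of footnote 27.

```python
# Back-of-the-envelope check of footnotes 26 and 27 (inputs as quoted there, from Table 1).
cost_per_product = 3_560        # dollars per product per million households (linear specification)
line_length = 22                # average line length
households_millions = 2.63      # households in an average-size city, in millions
revenue_factors = (0.25, 0.03, 22, 54.31e6)   # the four factors multiplied in footnote 27

line_cost = cost_per_product * line_length * households_millions
revenue = 1.0
for factor in revenue_factors:
    revenue *= factor
cost_share = line_cost / revenue
print(f"line cost ~ ${line_cost:,.0f}; revenue ~ ${revenue / 1e6:.2f}M; cost share ~ {cost_share:.1%}")
```

The exact ratio is a little above 2% (about 2.3%), consistent with the rounded "about 2% of total revenue" in the text.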
In the second specification, the cost is nonlinear, and I find an extra cost of expanding the product line ($6.14K compared to $2.08K). This extra cost likely reflects the inflexibility of displaying, distributing, storing, or advertising additional products. It limits the flexibility of line length adjustment in two ways. First, it restricts line expansion, because expanding the product line may incur this extra cost. Second, it also restricts line contraction, because when Company A considers withdrawing some products, she might worry about the future cost of bringing them back. Counterfactual analysis in the next subsection quantifies this inflexibility caused by the nonlinear cost structure.
The precision ratio between belief and signal is estimated to be around 2.5 in both specifications. This ratio determines the linear weights on the prior and the signal when updating the belief. From the estimates, Company A places 30% of the decision weight on the in-market signal and 70% on past experience, summarized by the prior belief. Even as an experienced player in a mature market, Company A still relies substantially on in-market learning, because of the evolving nature of preference heterogeneity. The market signal is noisy, however, so Company A cannot rely completely on it. Counterfactual analysis in the next subsection shows the gross margin Company A could achieve if she knew the true value of heterogeneity in advance.
25 The recent development of MPEC (Dubé et al., 2013) is also applicable to this model.
26 $3,560 × 22 × 2.63 ≈ $0.2M; all numbers are taken from Table 1.
27 $0.25 × 0.03 × 22 × 54.31M ≈ $8.96M; all numbers are taken from Table 1.

5.3

Model Fit

To evaluate how well the model fits the data, I simulate line length decisions in all 50 markets. Within each market, the prior mean $\mu_1$ is drawn from a known distribution, the prior precision $\rho_1$ is known from the estimation, and the initial line length $n_1 = l_2$ is taken as given. After specifying the initial condition, beliefs are updated with the signals $(\theta_1, h_1)$ to obtain the period-2 belief $(\mu_2, \rho_2)$, the optimal line length $n_2$ is simulated, and the process continues to the end of the data period.
I run two simulations to check the model fit. In the first simulation, the signals $(\theta_t, h_t)$ are taken from the data; in the second, I simulate these signals. Figure 3 compares actual and simulated line length in two markets, and Figure 4 compares the whole distribution of line length and line length changes for actual and simulated data. Both simulations fit the data quite well in most markets. The first simulation fits the data almost perfectly, because it uses most of the information in the data. The second simulation also fits well. In the model, three factors determine the optimal line length choices: the evolution of preference heterogeneity, signaling error caused by demand shocks, and the random fixed cost of product line adjustment. The first simulation averages out only the random fixed cost, and the result confirms that this cost is not the driving force behind actual line length patterns. The second simulation averages out both the random fixed cost and the signaling error, so the only remaining force that determines the line length pattern is the evolution of preference heterogeneity, which is the main mechanism in this paper. In the remainder of this paper, I always use the second simulation.

5.4

Counterfactuals

I run two sets of counterfactual simulations to evaluate Company A's optimal line length responses to product-line-related policy changes. In the first counterfactual exercise, I evaluate the firm's optimal line length decisions under a smooth cost structure; in the second, I estimate the firm's improvement in gross margin under complete information about preference heterogeneity when making line length decisions. A byproduct of the second exercise is an indirect test of the firm's information assumption: does she know or learn?
5.4.1

Smooth cost structure

The nonlinearity of the cost structure restricts the firm's flexibility to adjust its product line, and this simulation quantifies by how much. I take the cost structure to be linear, as in the first supply-side specification, and simulate market signals as well as the firm's optimal responses. The results are illustrated in Figure 5. The distribution of line length does not change much, as shown in the left panel, whereas the distribution of line length changes becomes more dispersed in the right panel, which means that Company A is more likely to adjust line length aggressively under the smooth cost structure. Quantitatively, the probability of line length adjustment grows from 70% in the raw data to 90% in the simulation.
The effect is quite symmetric between line length expansion and contraction, as shown in the right panel. Under a smooth cost structure, the probabilities of line length expansion and contraction both increase significantly. As mentioned before, the increase in line length expansion reflects the static concern that expanding the product line incurs extra cost, whereas the increase in line length contraction reflects the dynamic concern that the firm is more cautious about withdrawing a flavor because it might worry about the future cost of bringing it back. The simulation results confirm the existence of both effects that restrict the flexibility of line length adjustment.
5.4.2

Perfect information on preference heterogeneity

Figure 6 shows the simulation results under complete information on preference heterogeneity when making line length decisions. Line length decisions under complete information deviate substantially from the baseline case, in which the firm learns about heterogeneity. This is because Company A adapts instantly to the time-evolving heterogeneity rather than chasing it, as in the learning model. The resulting gross margin increases by 5% under complete information. On the other hand, the frequency of line length adjustment does not change much.
Based on this simulation result, I can indirectly test the information hypothesis that Company A learns rather than knows the true value of preference heterogeneity. First, note that the two hypotheses are not nested in the model of stationary learning,²⁸ so there is no direct test based on particular parameters. Motivated by the fact that Company A would enjoy a higher gross margin with complete information, I propose the following test based on gross margin.
In the data, we can calculate the gross margin across 50 cities over 28 quarters, which gives a vector $gm$ of length 1,400. Let $F_K$ denote the distribution of $gm$ generated by the model in which the firm knows heterogeneity, and $F_L$ the distribution generated by the model in which the firm learns heterogeneity. Testing the learning assumption is then equivalent to testing
$$H_0: gm \sim F_K, \qquad H_1: gm \sim F_L$$
28 In the standard learning framework, the two hypotheses are nested: testing whether the agent knows the true value is equivalent to testing whether the initial belief precision is infinite (Hitsch, 2006).

It is difficult to compute a test statistic for a high-dimensional vector, but we can sacrifice
some power and focus on summary statistics. Figure 7 reports the test result for the median
level of gross margin. The two simulated distributions are well separated, and the median
observed in the data falls within $F_L$. We can therefore reject the null and conclude in favor of
the information assumption that Company A learns about preference heterogeneity when making
product line decisions.
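The median-based test above can be sketched as a Monte Carlo comparison: given simulated draws of the median gross margin under each information assumption, locate the observed median in each simulated distribution. This is a minimal sketch, not the paper's code; the helper names `mc_pvalue` and `median_margin_test` are hypothetical, and the inputs in the demo are placeholder draws rather than the paper's simulations or data.

```python
import numpy as np


def mc_pvalue(observed: float, draws: np.ndarray) -> float:
    """Two-sided Monte Carlo p-value of `observed` under simulated draws."""
    draws = np.asarray(draws, dtype=float)
    lower = np.mean(draws <= observed)  # share of draws at or below the statistic
    upper = np.mean(draws >= observed)  # share of draws at or above the statistic
    return float(min(1.0, 2.0 * min(lower, upper)))


def median_margin_test(observed_gm, draws_know, draws_learn) -> dict:
    """Locate the observed median gross margin in the simulated distributions
    of the median under F_K (firm knows heterogeneity) and F_L (firm learns)."""
    stat = float(np.median(observed_gm))
    return {
        "statistic": stat,
        "p_value_know": mc_pvalue(stat, draws_know),
        "p_value_learn": mc_pvalue(stat, draws_learn),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder inputs (1M $): NOT the paper's simulations or data.
    draws_know = rng.normal(2.27, 0.01, size=1000)   # medians simulated under F_K
    draws_learn = rng.normal(2.17, 0.01, size=1000)  # medians simulated under F_L
    observed_gm = rng.normal(2.17, 0.05, size=1400)  # 50 cities x 28 quarters
    print(median_margin_test(observed_gm, draws_know, draws_learn))
```

A small `p_value_know` together with a large `p_value_learn` would be read as rejecting $H_0$ in favor of learning. Reducing the 1,400-dimensional vector to its median is exactly the loss of power the text accepts.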

Conclusion

This paper links product line length decisions with preference heterogeneity and rationalizes their
cross-sectional and intertemporal variation. Preference heterogeneity in this paper is an aggregate
measure of both heterogeneity across individuals and variety seeking within individuals, and it is measured by the nesting parameter in the standard nested logit model. Cross-sectional
variation in preference heterogeneity, which is partly driven by the diversity of population demographics, explains differences in line length across cities. Within a city, the firm's
in-market learning about preference heterogeneity drives line length adjustment.
I apply the model to the potato chip industry, where Company A is the lead player. Preference heterogeneity is estimated to be 0.41 for large-package chips and 0.67 for small-package
chips, which means that preferences for small packages are more heterogeneous. This is driven by
more intensive variety seeking for small-package chips. I also find that preferences are more heterogeneous in markets with higher Hispanic population ratios or greater dispersion in the age distribution.
On the supply side, Company A, as an experienced firm in a mature market, also relies on in-market learning about preference heterogeneity to adjust its proliferation decisions. I find that Company
A bases its decisions primarily on past experience in the market, with the most recent market
realization representing only about one-third of the influence on product-line decisions. The cost of
maintaining an average-length line constitutes about 2% of total revenue. I estimate the sunk cost
incurred when expanding the product line to be about three times the usual maintenance cost, which
may limit the flexibility of product-line adjustment.
Counterfactual analyses based on the estimates evaluate the firm's optimal line length decisions
under a smooth cost structure and under complete information about, rather than learning of, preference
heterogeneity. In the first case, Company A is more aggressive in line length adjustment under a smooth cost structure; in the second case, Company A's gross margin increases by
5% when it knows the true value of preference heterogeneity. The second counterfactual also helps to test the information assumption that firms learn, rather than know, the preference heterogeneity at the time of line length decisions. The test result supports the information
assumption of learning.
The model is readily applicable to other industries in which product proliferation is a key
decision. One example is two MP3 players produced by Apple: the iPod Classic and the iPod Nano.
The iPod Classic offers a limited choice of colors (always black or white), but the iPod Nano offers
a longer line of colors. The length of the Nano line also varies over time, from two in the first
generation to nine in the fourth generation and back to six in the most recent one. The mechanism
in this paper can explain the difference between the two players: most consumers of the iPod
Classic are professional music lovers who care more about sound quality, control convenience,
and storage and less about colors, whereas consumers of the iPod Nano are younger on average,
care more about colors, and hold more diverse views on their favorite one. The time-varying
line length of the Nano can be attributed to Apple's gradual learning about preference
diversity.
The model simplifies the measure of preference heterogeneity. I use the nesting parameter of the
nested logit model for two primary reasons. First, the nested logit is simple and clean. If I allowed an
arbitrary substitution pattern for individual-specific demand shocks, I could estimate a mixed logit,
but summarizing the diversity of preferences in a single statistic would be difficult. Second, the linear representation of the nested logit model makes supply-side learning tractable. Further research
should look for better ways to model preference diversity and link it to product
proliferation decisions.
Another shortcoming of the paper is the measure of product proliferation. I
use line length as a highly abstract measure and ignore the actual content of the products. I
make this assumption primarily to simplify the state space: if more detailed feature-level
information were incorporated, the state space could grow exponentially. One possibility is to
apply a heuristic rule that incorporates summary statistics of existing features, which
also calls for future work.
A third limitation is the monopoly assumption. This assumption is justifiable in the
potato chip market, but in markets with competition, the supply-side learning model would need
to be modified.

References
Berry, S., J. Levinsohn, and A. Pakes (1995). Automobile prices in market equilibrium. Econometrica: Journal of the Econometric Society, 841-890.
Berry, S., J. Levinsohn, and A. Pakes (2004). Differentiated products demand systems from a combination of micro and macro data: The new car market. Journal of Political Economy 112(1), 68-105.
Berry, S. T. (1992). Estimation of a model of entry in the airline industry. Econometrica: Journal of the Econometric Society, 889-917.
Berry, S. T. (1994). Estimating discrete-choice models of product differentiation. The RAND Journal of Economics, 242-262.
Berry, S. T. and J. Waldfogel (2001). Do mergers increase product variety? Evidence from radio broadcasting. The Quarterly Journal of Economics 116(3), 1009-1025.
Bresnahan, T. F. and P. C. Reiss (1990). Entry in monopoly markets. The Review of Economic Studies 57(4), 531-553.
Bresnahan, T. F. and P. C. Reiss (1991). Entry and competition in concentrated markets. Journal of Political Economy, 977-1009.
Bresnahan, T. F., S. Stern, and M. Trajtenberg (1997). Market segmentation and the sources of rents from innovation: Personal computers in the late 1980s. RAND Journal of Economics, S17-S44.
Bronnenberg, B. J. (2014). The provision of convenience and variety by the market. Available at SSRN.
Bronnenberg, B. J., J.-P. H. Dubé, and M. Gentzkow (2012). The evolution of brand preferences: Evidence from consumer migration. American Economic Review 102(6), 2472-2508.
Bronnenberg, B. J., M. W. Kruger, and C. F. Mela (2008). Database paper: The IRI marketing data set. Marketing Science 27(4), 745-748.
Cardell, N. S. (1997). Variance components structures for the extreme-value and logistic distributions with application to models of heterogeneity. Econometric Theory 13(02), 185-213.
Chernev, A. (2003a). Product assortment and individual decision processes. Journal of Personality and Social Psychology 85(1), 151.
Chernev, A. (2003b). When more is less and less is more: The role of ideal point availability and assortment in consumer choice. Journal of Consumer Research 30(2), 170-183.
Ching, A. T., T. Erdem, and M. P. Keane (2013). Invited paper: Learning models: An assessment of progress, challenges, and new developments. Marketing Science 32(6), 913-938.
Chintagunta, P. K. (1998). Inertia and variety seeking in a model of brand-purchase timing. Marketing Science 17(3), 253-270.
Chintagunta, P. K. (1999). Variety seeking, purchase timing, and the "lightning bolt" brand choice model. Management Science 45(4), 486-498.
Crawford, G., A. Shcherbakov, and M. Shum (2011). The welfare effects of endogenous quality choice: Evidence from cable television markets. Technical report, mimeo, University of Warwick.
Crawford, G. S. and M. Shum (2005). Uncertainty and learning in pharmaceutical demand. Econometrica 73(4), 1137-1173.
Dickstein, M. J. (2014). Efficient provision of experience goods: Evidence from antidepressant choice. Working Paper.
Draganska, M. and D. C. Jain (2005). Product-line length as a competitive tool. Journal of Economics & Management Strategy 14(1), 1-28.
Draganska, M. and D. C. Jain (2006). Consumer preferences and product-line pricing strategies: An empirical analysis. Marketing Science 25(2), 164-174.
Draganska, M., M. Mazzeo, and K. Seim (2009). Beyond plain vanilla: Modeling joint product assortment and pricing decisions. QME 7(2), 105-146.
Dubé, J., J. T. Fox, and C. Su (2013). Improving the numerical performance of BLP static and dynamic discrete choice random coefficients demand estimation. Forthcoming in Econometrica.
Dubé, J.-P., G. J. Hitsch, and P. E. Rossi (2009). Do switching costs make markets less competitive? Journal of Marketing Research 46(4), 435-445.
Dubé, J.-P., G. J. Hitsch, and P. E. Rossi (2010). State dependence and alternative explanations for consumer inertia. The RAND Journal of Economics 41(3), 417-445.
Dubé, J.-P., K. Sudhir, A. Ching, G. S. Crawford, M. Draganska, J. T. Fox, W. Hartmann, G. J. Hitsch, V. B. Viard, M. Villas-Boas, et al. (2005). Recent advances in structural econometric modeling: Dynamics, product positioning and entry. Marketing Letters 16(3-4), 209-224.
Einav, L. (2010). Not all rivals look alike: Estimating an equilibrium model of the release date timing game. Economic Inquiry 48(2), 369-390.
Erdem, T. and M. P. Keane (1996). Decision-making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Science 15(1), 1-20.
Fan, Y. (2013). Ownership consolidation and product characteristics: A study of the US daily newspaper market. The American Economic Review 103(5), 1598-1628.
First-Research (2011). Industry profile - snack foods manufacturing. Technical report.
Goettler, R. L. and B. R. Gordon (2011). Does AMD spur Intel to innovate more? Journal of Political Economy 119(6), 1141-1200.
Grant, M., S. Boyd, and Y. Ye (2008). CVX: Matlab software for disciplined convex programming.

Griliches, Z. and J. A. Hausman (1986). Errors in variables in panel data. Journal of Econometrics 31(1), 93-118.
Guo, L. and J. Zhang (2012). Consumer deliberation and product line design. Marketing Science 31(6), 995-1007.
Hitsch, G. J. (2006). An empirical model of optimal dynamic product launch and exit under demand uncertainty. Marketing Science 25(1), 25-50.
Hu, Y. and S. M. Schennach (2008). Instrumental variable treatment of nonclassical measurement error models. Econometrica 76(1), 195-216.
Hu, Y. and M. Shum (2012). Nonparametric identification of dynamic models with unobserved state variables. Journal of Econometrics 171(1), 32-44.
Hui, K.-L. (2004). Product variety under brand influence: An empirical investigation of personal computer demand. Management Science 50(5), 686-700.
Iyengar, S. S. and M. R. Lepper (2000). When choice is demotivating: Can one desire too much of a good thing? Journal of Personality and Social Psychology 79(6), 995.
Joon, H. (2013). Snack food production in the US. Technical report, IBISWorld.
Judd, K. L. (1998). Numerical methods in economics. MIT Press.
Kamenica, E. (2008). Contextual inference in markets: On the informational content of product lines. The American Economic Review 98(5), 2127-2149.
Lin, S., J. Zhang, and J. R. Hauser (2014). Learning from experience, simply. Marketing Science.
Liu, Y. and T. H. Cui (2010). The length of product line in distribution channels. Marketing Science 29(3), 474-482.
Lovett, M., W. Bolding, and R. Staelin (2009). Consumer learning models for perceived and actual product instability. Working Paper.
Magnac, T. and D. Thesmar (2002). Identifying dynamic discrete decision processes. Econometrica 70(2), 801-816.
Mazzeo, M. J. (2002). Product choice and oligopoly market structure. RAND Journal of Economics, 221-242.
Narayanan, S. and P. Manchanda (2009). Heterogeneous learning and the targeting of marketing communication for new products. Marketing Science 28(3), 424-441.
Nevo, A. (2001). Measuring market power in the ready-to-eat cereal industry. Econometrica 69(2), 307-342.
Orhun, A. Y. (2009). Optimal product line design when consumers exhibit choice set-dependent preferences. Marketing Science 28(5), 868-886.
Petrin, A. (2002). Quantifying the benefits of new products: The case of the minivan. Journal of Political Economy 110(4), 705-729.
Reiss, P. C. and P. T. Spiller (1989). Competition and entry in small airline markets. Journal of Law and Economics 32(2), S179-S202.
Roberts, J. H. and G. L. Urban (1988). Modeling multiattribute utility, risk, and belief dynamics for new consumer durable brand choice. Management Science 34(2), 167-185.
Ryan, S. P. and C. Tucker (2012). Heterogeneity and the dynamics of technology adoption. Quantitative Marketing and Economics 10(1), 63-109.
Seetharaman, P., S. Chib, A. Ainslie, P. Boatwright, T. Chan, S. Gupta, N. Mehta, V. Rao, and A. Strijnev (2005). Models of multi-category choice behavior. Marketing Letters 16(3-4), 239-254.
Seim, K. (2006). An empirical model of firm entry with endogenous product-type choices. The RAND Journal of Economics 37(3), 619-640.
Simonson, I. and A. Tversky (1992). Choice in context: Tradeoff contrast and extremeness aversion. Journal of Marketing Research.
Sørensen, M. (2007). How smart is smart money? A two-sided matching model of venture capital. The Journal of Finance 62(6), 2725-2762.
Sweeting, A. (2010). The effects of mergers on product positioning: Evidence from the music radio industry. The RAND Journal of Economics 41(2), 372-397.
Urban, G. L. and J. R. Hauser (1993). Design and marketing of new products, Volume 2. Prentice Hall, Englewood Cliffs, NJ.
Urban, G. L. and G. M. Katz (1983). Pre-test-market models: Validation and managerial implications. Journal of Marketing Research (JMR) 20(3).
Villas-Boas, J. M. (2004). Communication strategies and product line design. Marketing Science 23(3), 304-316.

Table 1: Summary Statistics


Variable                              Obs     Mean    Std. Dev.    Min      Max
Sales and prices
  Line Length (# of features)         1400    22.09    3.86         8.00    30.00
  Change in Line Length               1350     0.19    2.11        -8.00     9.00
  Line Expansion                      1350     0.39    0.49         0.00     1.00
  HHI for In-line Market Share        1400     0.13    0.03         0.07     0.36
  Std. for Log In-line Market Share   1400     1.28    0.23         0.72     2.06
  Number of competitor firms          1400     7.41    2.77         3.00    20.00
  Number of competitor UPC            1400    51.77   24.71        12.00   166.00
  Market Share                        30930    0.03    0.05         0.00     0.38
  Market Share In Line                30930    0.05    0.06         0.00     0.53
  Price ($/oz)                        30930    0.25    0.07         0.12     0.43
  Fat Free                            30930    0.09    0.28         0.00     1.00
  Reduced Fat                         30930    0.15    0.35         0.00     1.00
  Ruffle Cut                          30930    0.28    0.45         0.00     1.00
  Wavy Cut                            30930    0.13    0.33         0.00     1.00
  Market size (Million Oz)            50      54.31   57.10         6.06   278.27
Demographics
  Median Income (1K $)                1400    56.81    8.98        23.10    89.09
  Median Age                          1400    35.08    2.89        26.00    48.33
  Interquartile Income (1M $)         1400     0.06    0.01         0.04     0.10
  Interquartile Age (10 yrs)          1400     3.40    0.20         2.75     4.47
  Asian %                             1400     0.04    0.04         0.00     0.31
  Hispanic %                          1400     0.10    0.11         0.00     0.53
  Number of Households (Million)      50       2.63    3.10         0.26    17.10
Cost shifters
  Potato Price ($/100lb)              28      12.37    4.29         7.42    21.90
  Refined Sugar Price (cent/lb)       28      45.20    3.58        41.93    51.93
  Soy Bean Oil Price (cent/lb)        7       28.28   11.58        16.46    52.03
  Edible Butter Price ($/lb)          7        1.41    0.27         1.11     1.82
  Edible Tallow Price (cent/lb)       7       19.60    5.53        13.71    30.76

Note: Sales and price data for 58 Company-A features (unique combinations of 36 flavors; 3 fat contents: regular,
reduced fat, fat free; and 3 cut types: flat, ruffle, wavy) across 50 markets over 28 quarters in 7 years
(2001-2007) are aggregated from the IRI Academic dataset. Features with positive sales in fewer than 12 weeks
are dropped from the sample, and their market shares are allocated proportionally to the other features within the same
serving-size-city-quarter. Demographic data for 50 cities and 28 quarters are merged from the IPUMS CPS dataset. Cost
shifters, for 28 quarters or 7 years depending on data availability, are collected from various yearbooks
published by the Bureau of Labor Statistics and the Department of Agriculture.

Table 2: Reduced Form Evidence on Dynamic Line Length Decisions


FE, Dependent variable is
Line Length, t+1
(1)
Concentration measure
HHI

-6.78***
(1.41)

Sdev Ln Share In Line


Line Length
City fe
Quarter fe
Observations
Adjusted R-squared
Mean Dependent Variable

(2)

0.86***
(0.01)
Yes
Yes
1350
0.86
22.34

1(Line Expansion, t+1)


(3)

(4)

-1.58***
(0.41)
-1.57***
(0.25)
0.79***
(0.02)
Yes
Yes
1350
0.73
22.34

-0.03***
(0.00)
Yes
Yes
1350
0.34
0.39

-0.20***
(0.07)
-0.05***
(0.01)
Yes
Yes
1350
0.34
0.39

Mean
(Std)
0.13
(0.03)
1.28
(0.23)
22.15
(3.89)

Note: This table presents reduced-form evidence on line length adjustment in response to time-evolving preference
heterogeneity. Preference heterogeneity is inversely related to the concentration of in-line market shares, i.e.,
concentrated in-line market shares indicate homogeneous preferences. All columns are panel data regressions with market fixed
effects. The dependent variable is next-quarter line length in columns (1) and (2) and a next-quarter dummy for line length
expansion in columns (3) and (4). Line length is the count of features (flavor-cut-fat) within each market-quarter after
dropping transient ones with fewer than 12 weeks of positive sales.
All data come from the IRI Academic Dataset.

Table 3: Demand Estimation


Dependent variable: Ln(Share1) - Ln(Share0)

                             OLS         2SLS        2SLS        2SLS         Mean
                             (1)         (2)         (3)         (4)          (Std)
Preference Heterogeneity     0.02***     0.23***     0.41***     0.49***
                             (0.00)      (0.01)      (0.02)      (0.02)
Price                        -0.13**     -2.53***    -2.38***    -49.50***    0.25
                             (0.05)      (0.19)      (0.22)      (2.92)       (0.07)
Ln(Median Income)                                                3.08***      10.94
                                                                 (0.30)       (0.16)
Ln(Median Age)                                                   3.33***      3.55
                                                                 (0.60)       (0.08)
Ruffle cut                   -0.01***    -0.11***                             0.28
                             (0.00)      (0.01)                               (0.45)
Wavy cut                     0.01        0.03***                              0.13
                             (0.01)      (0.01)                               (0.33)
Fat free                     0.01        -0.06***                             0.09
                             (0.01)      (0.01)                               (0.28)
Reduced fat                  -0.01       -0.15***                             0.15
                             (0.01)      (0.01)                               (0.35)
Flavor fe                    Yes         Yes         No          No
Product fe                   No          No          Yes         Yes
Market fe                    Yes         Yes         Yes         Yes
Observations                 30930       30930       30930       30930

Note: This table shows the demand estimates implied by the nested logit model. The dependent variable in all
columns is the difference between the logarithm of the total Frito Lay share and the logarithm of the outside-good share. Column
(1) uses OLS; columns (2)-(4) use 2SLS with three sets of instrumental variables: BLP instruments
(sums of the flavor, cut and fat dummies of other features in the same serving-city-quarter), Hausman instruments
(the average price of the same feature in other cities within the same serving-quarter, and prices of materials including potatoes,
sugar, soybean oil, edible butter and edible tallow), and the competitive environment (the number of competitor firms and the
number of competitor UPCs other than Company-A chips within the serving-city-quarter).
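The BLP instruments described in the note sum each characteristic over the other features in the same serving-city-quarter. A minimal sketch of that construction, assuming the characteristics are stored as a product-by-characteristic matrix of 0/1 dummies and each row carries a market label; the function name `blp_other_sums` and the toy data are hypothetical:

```python
import numpy as np


def blp_other_sums(market_ids, X):
    """Sum of each characteristic over all OTHER products in the same market.

    market_ids: length-J array of market labels (e.g. serving-city-quarter codes)
    X: J x K array of product characteristics (e.g. 0/1 cut and fat dummies)
    """
    market_ids = np.asarray(market_ids)
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    for m in np.unique(market_ids):
        rows = market_ids == m
        total = X[rows].sum(axis=0)   # market-level total of each characteristic
        out[rows] = total - X[rows]   # remove the product's own contribution
    return out


if __name__ == "__main__":
    markets = np.array([0, 0, 0, 1, 1])
    # Columns: hypothetical ruffle-cut and reduced-fat dummies.
    X = np.array([[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]])
    print(blp_other_sums(markets, X))
```

Subtracting the product's own row from the market total avoids a second loop over products; flavor dummies would enter as additional columns of `X` in the same way.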

Table 4: Demand Estimation with Varieties of Preference Heterogeneity

2SLS, Dependent variable: Ln(Share1) - Ln(Share0)

                           Baseline   Small      Interquartile  Interquartile  Asian      Hispanic   Above p50
                                      Packaged   Income (1M)    Age (10 yr)                          Hispanic
                           (1)        (2)        (3)            (4)            (5)        (6)        (7)
Preference Heterogeneity   0.41***    0.67***    0.36***        0.30***        0.41***    0.42***    0.36***
                           (0.02)     (0.04)     (0.02)         (0.04)         (0.02)     (0.02)     (0.02)
Diversity Measure                                1.48***        0.04***        0.47***    -0.09      0.12***
                                                 (0.19)         (0.01)         (0.15)     (0.06)     (0.01)
Price                      -2.38***   -2.74***   -2.99***       -2.62***       -2.57***   -2.30***   -2.87***
                           (0.22)     (0.29)     (0.24)         (0.24)         (0.23)     (0.22)     (0.23)
Product fe                 Yes        Yes        Yes            Yes            Yes        Yes        Yes
Observations               30930      9155       30930          30930          30930      30930      30930
Adjusted R-squared         0.81       0.41       0.79           0.80           0.80       0.81       0.78

Summary statistics of population diversity measure
Mean                                             0.06           3.39           0.04       0.10       0.50
Min                                              0.04           2.75           0.00       0.00       0.00
Max                                              0.10           4.47           0.31       0.53       1.00

Note: This table shows nested logit demand estimates that allow preference heterogeneity to vary with observables. The
dependent variable in all columns is the difference between the logarithm of the total Frito Lay share and the logarithm of the outside-good share.
All columns are estimated by 2SLS with three sets of instruments: BLP instruments, Hausman instruments, and the competitive
environment. Column (1) reports the baseline estimates for large-package potato chips and is identical to Column (3) in Table 3.
Column (2) reports estimates with the identical specification for small-package chips (1-4 serving sizes). Columns (3)-(7)
allow preference heterogeneity to vary with different measures of population diversity: Column (3) uses the interquartile range of income,
Column (4) the interquartile range of age, Column (5) the Asian population ratio, Column (6) the Hispanic population ratio, and Column
(7) the discretized Hispanic population ratio, i.e., a dummy for an above-median Hispanic population ratio.

Table 5: Supply Side Parameter Estimation


                                          Linear Cost            Nonlinear Cost
                                          Estimate   s.e.        Estimate   s.e.
Maintenance cost (1K $ / 1M HH)           3.56       (0.86)      2.08       (1.38)
Sunk cost of expansion (1K $ / 1M HH)                            6.14       (0.65)
Precision Ratio                           2.55       (0.02)      2.42       (0.02)
Scale of fixed cost                       0.14       (0.00)      0.02       (0.00)
Prior mean                                Integrated             Integrated
Log Likelihood                            -83.37                 -63.61

Note: The cost function for the linear specification is proportional to line length, $C(n_t) = c\,n_t$, while the
nonlinear specification, $C(n_t, n_{t-1})$, adds a sunk cost that is incurred only when the line expands ($n_t > n_{t-1}$).
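One way to read the precision ratio, offered here as an assumption and not as the paper's stated definition, is as the ratio of prior precision to signal precision in a normal-normal (Kalman-type) update held at a steady state. Under that reading, each new market signal receives weight $1/(1+r)$, which would connect the estimates above to the roughly 30% influence of the latest preference shock. A minimal sketch with hypothetical helper names:

```python
def signal_weight(precision_ratio: float) -> float:
    """Weight on the newest signal when prior precision = precision_ratio x signal precision.
    ASSUMPTION: this reading of the 'precision ratio' is illustrative, not the paper's definition."""
    return 1.0 / (1.0 + precision_ratio)


def belief_path(prior_mean: float, signals, precision_ratio: float) -> list:
    """Posterior means under a normal-normal update with a constant precision ratio
    (a steady-state, constant-gain version of Bayesian learning)."""
    w = signal_weight(precision_ratio)
    beliefs, m = [], prior_mean
    for s in signals:
        m = (1.0 - w) * m + w * s   # precision-weighted average of belief and signal
        beliefs.append(m)
    return beliefs


if __name__ == "__main__":
    for ratio in (2.55, 2.42):  # Table 5 point estimates
        print(ratio, round(signal_weight(ratio), 3))
```

With the Table 5 point estimates this gives weights of about 0.28 and 0.29; if the paper's ratio is defined differently, the mapping changes accordingly.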

Figure 1: Distribution of line length and line length changes


[Two panels: distribution of Line Length (left) and of Change in Line Length (right)]

Note: The left panel plots the distribution of line length across 50 markets over 28 quarters, and the right
panel plots the distribution of the change in line length, i.e., the first difference of line length over
two consecutive quarters within a market. Line length is defined as the count of products (unique
flavor-fat-cut combinations) within a city-quarter. Products with positive sales in fewer than
12 weeks within a city-quarter are not counted.

Figure 2: Identification of line length maintenance cost

[Two panels, High c and Low c; x-axis: Line length (n); y-axis: Total Share]

Note: This figure shows the identification of the line length maintenance cost. In each plot, the thick
curves are the total market share as a function of line length. I plot three curves with identical
variance but different mean values of preference heterogeneity. Total market share is increasing in line
length and in preference heterogeneity, and it is super-modular in the two. The straight lines are cost functions, whose slope represents the marginal cost of expanding the
line. The tangency point between a cost line and a market share curve gives the optimal line length.
The implied optimal line length is higher when preference heterogeneity is higher. The two plots differ in marginal cost: when cost is lower, line
length decisions are more responsive to changes in the mean of heterogeneity, which completes the
identification of cost.
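The comparative statics in Figure 2 can be reproduced in a simplified setting. A minimal sketch, assuming all feature-level deviations are zero, so the line share reduces to $h(\bar{\delta} + \lambda \ln n)$ with $h$ the logistic function. The market size, $\bar{\delta}$, the grid of $\lambda$ values and the two cost levels are illustrative choices, not estimates, and the function names are hypothetical. The firm picks the line length that maximizes market size times share minus $c\,n$.

```python
import numpy as np


def line_share(n, lam, delta_bar):
    """Line share in the special case eps_j = 0 for all features:
    s(n, lam) = h(delta_bar + lam * ln n), with h the logistic function."""
    z = delta_bar + lam * np.log(n)
    return 1.0 / (1.0 + np.exp(-z))


def optimal_line_length(lam, c, market_size=1000.0, delta_bar=-3.0, n_max=40):
    """Grid search for argmax_n market_size * s(n, lam) - c * n over n = 1..n_max."""
    n = np.arange(1, n_max + 1)
    profit = market_size * line_share(n, lam, delta_bar) - c * n
    return int(n[np.argmax(profit)])


if __name__ == "__main__":
    for c in (10.0, 5.0):  # "High c" and "Low c" (illustrative values)
        print(c, [optimal_line_length(lam, c) for lam in (0.3, 0.5, 0.7)])
```

In both rows the optimal line length rises with $\lambda$, and with these values the low-cost row spreads out more across $\lambda$. That second pattern depends on the chosen parameters, so it is printed rather than taken as a general result.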

Figure 3: Model fit - two markets

[Two panels, BOSTON and DETROIT: line length by quarter, 2001q3 to 2007q3; series: Actual, Signal from data, Signal from simulation]

Note: This figure shows how the model fits the data in two cities, Boston and Detroit. Solid lines
are actual line length decisions, and the two dashed lines are line length decisions from simulations.
In the first simulation (signal from data), market signals are taken from the data; in the second
(signal from simulation), market signals are also simulated from the model. First-period prior
means are drawn from the known distribution, and first-period prior precisions are estimated.

Figure 4: Model fit - distribution


[Two panels: Line Length and Change in Line Length; series: Actual, Signal from data, Signal from simulation]

Note: This figure shows how the model fits the distributions of line length and line length changes.
Solid bars are the distribution of actual line length decisions, and the two lines are kernel densities of simulated
line length. In the first simulation (signal from data), market signals are taken from the data; in
the second (signal from simulation), market signals are also simulated from the model.
First-period prior means are drawn from the known distribution, and first-period prior precisions
are estimated.

Figure 5: Counterfactual - smooth cost


[Two panels: Line Length and Change in Line Length; series: Actual; Simulated, step cost; Simulated, smooth cost]

Note: This figure evaluates optimal line length decisions under a smooth cost structure.
Solid bars are the distribution of actual line length decisions, and the two lines are kernel densities of simulated line length: the dashed line represents simulated line length in the original model with nonlinear
cost, whereas the solid line represents simulated line length under a linear cost structure. In both simulations, market signals are simulated from the model; first-period prior means are drawn from
the known distribution, and first-period prior precisions are estimated.

Figure 6: Counterfactual - known heterogeneity


[Two panels: Line Length and Change in Line Length; series: Actual; Simulated, learning; Simulated, knowing]

Note: This figure evaluates optimal line length decisions when firms know the precise value
of time-varying preference heterogeneity. Solid bars are the distribution of actual line length decisions, and the two lines are kernel densities of simulated line length: the dashed line represents simulated
line length in the original model with learning about heterogeneity, whereas the solid line represents simulated
line length assuming known heterogeneity. In both simulations, market signals are simulated
from the model; first-period prior means are drawn from the known distribution, and first-period prior precisions
are estimated.

Figure 7: Testing learning assumption based on gross margin

[Densities: Learning, Knowing; x-axis: Gross margin in median market (1M $)]

Note: This figure plots the distribution of the simulated median gross margin in two simulations:
learning preference heterogeneity and knowing heterogeneity. In both simulations, market signals
are simulated from the model; first-period prior means are drawn from the known distribution,
and first-period prior precisions are estimated. The vertical line is the observed median gross margin
in the data.

Appendix

A Data Description

A.1 Feature and Size Category

The IRI Academic dataset provides detailed sales data at the UPC-store-week level.
The UPC, or Universal Product Code, identifies a unique product sold in a store-week. Products
with different UPCs may come from different industries and brands. In this paper,
I focus on all products in the Salty Snack - Potato Chips category. Potato chips of the
same brand may also differ in flavor, cut type, fat content, salt content, package
size and other characteristics. Some of these characteristics are very different (e.g., classic flavor
and barbecue flavor), while others are indistinguishable (e.g., 40% reduced fat and 60% reduced
fat). In this analysis, I define a feature as a flavor-fat-cut combination; I have identified
41 flavors, 3 fat contents (regular, reduced, fat free) and 3 cut types (flat, wavy and ruffle) among
all potato chips from Company A. In total, 63 features were sold at some point between
2001q1 and 2007q4 in the dataset. All features defined above are visually distinguishable by the
look of the package.
Package size is another, continuous, product characteristic: for example, 1.04 oz
and 1.06 oz packages of the same feature are sold at the same time. We first discretize
the continuous ounce measure into serving sizes, a standard measure of package size
specified by the Nutrition Labeling and Education Act. According to the Act, one serving
generally equals 30 grams (1.05 oz),29 subject to detailed rounding rules.30 We further aggregate
serving sizes into three categories: small (1-4), medium (5-7) and large (8-13). This aggregation
is chosen at natural discontinuities in the density of market share by serving size for both
Company A and non-Company A chips (Figure 8). Company A is a dominant player in the small
and large size markets, and I use only large-package potato chips for most of the analysis in
the paper.
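The size discretization above has two steps: ounces to servings, then servings to size buckets. A minimal sketch of that mapping, assuming 1.05 oz per serving as stated in the text and a simple round-to-nearest rule. The detailed FDA rounding rules cited in the footnotes are not reproduced, and the function names are hypothetical:

```python
OZ_PER_SERVING = 1.05  # one serving is about 30 g (1.05 oz), per the text


def servings(package_oz: float) -> int:
    """Approximate number of servings in a package.
    ASSUMPTION: simple rounding to the nearest whole serving (at least 1),
    not the detailed FDA rounding rules cited in the footnotes."""
    return max(1, round(package_oz / OZ_PER_SERVING))


def size_category(package_oz: float) -> str:
    """Map a package size in ounces to the small / medium / large buckets."""
    k = servings(package_oz)
    if k <= 4:
        return "small"
    if k <= 7:
        return "medium"
    if k <= 13:
        return "large"
    return "outside range"


if __name__ == "__main__":
    for oz in (1.04, 1.06, 6.0, 10.0):
        print(oz, servings(oz), size_category(oz))
```

With this rule, the 1.04 oz and 1.06 oz packages mentioned above both map to one serving and the same "small" bucket.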

29 http://www.fda.gov/ICECI/Inspections/InspectionGuides/ucm114097.htm
30 http://www.fda.gov/ICECI/Inspections/InspectionGuides/ucm114098.htm

A.2 Sales in Partial Weeks within Quarters

The lack of data on consumers' in-store choice sets is a challenge for all research that uses only
sales data. Zero sales for a product have two interpretations: either the product is not
available on the shelf, or it is not attractive at all. Following convention, I use
aggregation to alleviate this issue. Geographically, I aggregate store-level sales to the city level.
Temporally, I aggregate weekly sales into quarters. Another reason for temporal
aggregation is the frequency of firms' product proliferation decisions: by aggregating, I
implicitly assume that manufacturers make proliferation decisions once per quarter.
A follow-up issue raised by quarterly aggregation is that some products
may have zero sales in some, but not all, weeks of the quarter. This can happen for
two reasons: either the product is unattractive, so that sales are de facto zero in some weeks,
or the product was launched or withdrawn in the middle of the quarter. In the second case,
ignoring partial availability would contaminate the estimates, because we would misinterpret the small market share
of these products; their actual market share, and their attractiveness, should be scaled up. To be
conservative, I drop all products that are sold in some but not all weeks within the
size-quarter-city.
Fortunately, this dropping does not significantly change the sales pattern. Of all 63 features in 28 quarters across 50 cities, 96% have positive sales in either all weeks or no weeks of
the quarter. After dropping these partial features from Company A, 58 features (36 flavors, 3 fat contents and 3 cut types) remain in the dataset. Company A's market share does
not change much as a result. To keep the estimates of brand value consistent, I
allocate the sales of the dropped features proportionally to the remaining features.
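The drop-and-reallocate rule can be sketched for one size-city-quarter. The sketch assumes weekly sales are available per feature. The threshold is a parameter because the text drops features sold in some but not all weeks (a full-quarter threshold), while the Table 1 note states a 12-week cutoff. The function name and toy data are hypothetical.

```python
from typing import Dict, List


def drop_partial_features(weekly_sales: Dict[str, List[float]], min_weeks: int) -> Dict[str, float]:
    """Within one size-city-quarter, keep features with positive sales in at
    least `min_weeks` weeks, drop the others, and reallocate the dropped
    quarterly sales to the kept features in proportion to their own sales."""
    totals = {f: sum(w) for f, w in weekly_sales.items()}
    active = {f: sum(1 for x in w if x > 0) for f, w in weekly_sales.items()}
    kept = {f: totals[f] for f in totals if active[f] >= min_weeks}
    # Only partially sold features carry sales to reallocate; unsold ones have zero.
    dropped_total = sum(totals[f] for f in totals if 0 < active[f] < min_weeks)
    kept_total = sum(kept.values())
    if kept_total == 0:
        return kept
    return {f: s + dropped_total * s / kept_total for f, s in kept.items()}


if __name__ == "__main__":
    sales = {
        "A": [1.0] * 13,               # sold in every week
        "B": [3.0] * 13,               # sold in every week
        "C": [0.0] * 9 + [2.0] * 4,    # launched mid-quarter: dropped
        "D": [0.0] * 13,               # never sold
    }
    print(drop_partial_features(sales, min_weeks=13))
```

Total line sales are preserved, so the Company A share of the market is unchanged and only its split across features moves.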

B Measure of Concentration

I propose two measures of preference heterogeneity in the main text: the HHI of in-line shares, and
the standard deviation of the logarithm of in-line shares. In this section, I show that the first measure
is mechanically decreasing in line length, while the second is not.
Suppose the in-line share of feature $f$ is

$$s_{f|l} = \frac{\exp(\delta_f/\lambda)}{\sum_k \exp(\delta_k/\lambda)},$$

and the two measures are defined as

$$\text{HHI} = \sum_f s_{f|l}^2, \qquad \text{StdLnInLineShare} = \operatorname{Std}\left(\ln s_{f|l}\right).$$

The mechanical decrease in HHI as line length grows is straightforward. Consider the case where
preference heterogeneity $\lambda$ does not change but line length grows from $n$ to $n+1$. When the features
split the market equally, the HHI decreases from $1/n$ to $1/(n+1)$. This effect is also confirmed by simulation,
as presented in Figure 9.
For the second measure, the standard deviation of the logarithm of in-line shares, note that

$$\text{StdLnInLineShare} = \operatorname{Std}\left(\frac{\delta_f}{\lambda} - \log n - \log\Big(\frac{1}{n}\sum_k \exp\frac{\delta_k}{\lambda}\Big)\right) = \operatorname{Std}\left(\frac{\delta_f}{\lambda} - \log\Big(\frac{1}{n}\sum_k \exp\frac{\delta_k}{\lambda}\Big)\right),$$

where the mechanical part of StdLnInLineShare that decreases in $n$ (the standard deviation
of an average decreases with the sample size) is partly offset by the covariance of the two
terms and partly by the logarithm operator. Simulation reveals that this measure is,
in fact, slightly increasing in $n$ (Figure 9).
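A simulation in the spirit of Figure 9 can be sketched as follows. It assumes $\delta_f \sim N(0, \sigma^2)$ i.i.d. across features and takes the standard deviation across the $n$ features of one line. Both are illustrative assumptions, and `concentration_measures` is a hypothetical name.

```python
import numpy as np


def concentration_measures(n, lam=1.0, sigma=1.0, n_sims=2000, seed=0):
    """Average HHI and within-line std of log in-line shares across simulated lines.

    ASSUMPTIONS (illustrative): delta_f ~ N(0, sigma^2) i.i.d. across features,
    and Std is taken across the n features of one line (one market-quarter).
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(0.0, sigma, size=(n_sims, n)) / lam      # delta_f / lambda
    # log s_{f|l} = v_f - log sum_k exp(v_k), computed stably via the row max
    v_max = v.max(axis=1, keepdims=True)
    log_share = v - v_max - np.log(np.exp(v - v_max).sum(axis=1, keepdims=True))
    hhi = (np.exp(log_share) ** 2).sum(axis=1)
    std_ln = log_share.std(axis=1)  # population std (ddof=0) across features
    return float(hhi.mean()), float(std_ln.mean())


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        hhi, std_ln = concentration_measures(n)
        print(f"n={n:2d}  HHI={hhi:.3f}  StdLnInLineShare={std_ln:.3f}")
```

Under these assumptions the log-sum term is common to every feature in a line, so the measure equals the standard deviation of $\delta_f/\lambda$ across features. Its slight rise with $n$ here reflects the small-sample downward bias of the sample standard deviation. HHI, by contrast, falls as $n$ grows.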

Function of Line Share

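Recall that the share of all Company A products is
$$s(n,\lambda) = E_\xi\left[\frac{\exp(I)}{1+\exp(I)}\right], \qquad I = \lambda \ln \sum_{j=1}^{n} \exp\left(\delta_j/\lambda\right).$$
For simplicity, write $\delta_f = \delta + \xi_f$ with $E(\xi_f) = 0$, and $h(z) = \exp(z)/(1+\exp(z))$. The share function can then be represented as
$$\tilde s\left(n,\lambda,\xi^n = (\xi_1,\dots,\xi_n)\right) = h\left(\delta + \lambda \ln \sum_{j=1}^{n} \exp\left(\xi_j/\lambda\right)\right), \qquad s(n,\lambda) = E_\xi(\tilde s).$$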
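The per-draw share can be coded directly. The sketch below is illustrative: it assumes the nested-logit inclusive value takes the form $\lambda \ln \sum_j \exp(\delta_j/\lambda)$, and the function names are hypothetical. It uses a max-shifted log-sum-exp so that small values of $\lambda$ do not overflow, and it checks numerically that pulling the common $\delta$ out of the inclusive value leaves the share unchanged.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def h(z):
    """Logistic h(z) = exp(z) / (1 + exp(z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def line_share_draw(delta, lam, xi):
    """Per-draw share h(delta + lam * log sum_j exp(xi_j / lam))."""
    return h(delta + lam * logsumexp([x / lam for x in xi]))


def line_share_from_utilities(lam, deltas):
    """Same share from I = lam * log sum_j exp(delta_j / lam), with delta_j = delta + xi_j."""
    return h(lam * logsumexp([d / lam for d in deltas]))
```

Adding an element to `xi` always raises `line_share_draw`, which is the per-draw version of the first property below.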

C.1

Desired Properties of $s(n,\lambda)$

As mentioned in Section 3, the line share function should have the following three properties:

1. $s(n,\lambda)$ is increasing and concave in $n$.
2. $s(n,\lambda)$ is increasing in $\lambda$.
3. $s(n,\lambda)$ is super-modular in $(n,\lambda)$, i.e., $\dfrac{\partial^2 s}{\partial n\,\partial\lambda} > 0$.
All three properties are natural. The first means that a longer line captures a larger share for the whole line, with diminishing marginal gains as the line grows. Since product proliferation is a competitive tool for acquiring customers, this property is the foundation of such a strategy. The second means that more preference heterogeneity implies a larger market share. In the previous literature on nested logit models (Hui, 2004; Draganska and Jain, 2006), $\lambda$ is related to the inclusive value of variety ($I$ in the representation above), and this property justifies their calculation: when $\lambda$ is higher, the inclusive value of variety is higher, and so is the share of the whole line. The third property is a necessary and sufficient condition for the intuitive claim that the optimal product line length increases with preference heterogeneity: when preferences are more heterogeneous, the line is expected to be longer in order to serve more consumers. This super-modularity condition must be checked, much like the single-crossing condition in contract design. Fortunately, for the share function induced by the nested logit, it holds in most cases, as shown below.
All three properties are checked in simulation for a wide range of parameters. An intuitive, but not rigorous, proof follows.
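A simulation check of this kind can be sketched as follows. It is a hypothetical illustration, not the author's procedure: $\xi_{jr} \sim N(0,1)$, $\delta = -2$, and the grid are assumptions, and the properties are tested with finite differences on one fixed set of draws (common random numbers), so that differences across $(n,\lambda)$ are not swamped by simulation noise.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def h(z):
    """Logistic function exp(z) / (1 + exp(z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_draws(max_n, reps, seed=0):
    """One fixed set of draws xi_{jr} ~ N(0, 1) (assumed), reused for every (n, lam)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(max_n)] for _ in range(reps)]


def line_share(n, lam, delta, draws):
    """Simulated s(n, lam) = (1/R) sum_r h(delta + lam * log sum_{j<=n} exp(xi_jr / lam))."""
    return sum(h(delta + lam * logsumexp([x / lam for x in xi[:n]])) for xi in draws) / len(draws)


def check_properties(ns, lams, delta, draws):
    """Finite-difference checks of the three properties on a grid; ns must be consecutive integers."""
    s = {(n, lam): line_share(n, lam, delta, draws) for n in ns for lam in lams}
    lam_pairs = list(zip(lams, lams[1:]))
    return {
        "increasing_in_n": all(s[n + 1, l] > s[n, l] for n in ns[:-1] for l in lams),
        "concave_in_n": all(s[n + 1, l] - s[n, l] < s[n, l] - s[n - 1, l]
                            for n in ns[1:-1] for l in lams),
        "increasing_in_lambda": all(s[n, b] > s[n, a] for n in ns for a, b in lam_pairs),
        "supermodular": all(s[n + 1, b] - s[n + 1, a] > s[n, b] - s[n, a]
                            for n in ns[:-1] for a, b in lam_pairs),
    }
```

Under these assumptions, the increasing-in-$n$ and increasing-in-$\lambda$ checks (for $n \ge 2$) hold draw by draw, whereas concavity and super-modularity hold only on average and can fail on a coarse grid or with few draws, which is why the number of draws matters.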
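To show the first property, let
$$\tilde s\left(n,\lambda,\xi^n = (\xi_1,\dots,\xi_n)\right) = h\left(\delta + \lambda \log \sum_{j=1}^{n} \exp\left(\xi_j/\lambda\right)\right).$$
Adding a product adds the positive term $\exp(\xi_{n+1}/\lambda)$ inside the sum, and $h$ is increasing, so
$$\tilde s\left(n+1,\lambda,(\xi^n,\xi_{n+1})\right) > \tilde s\left(n,\lambda,\xi^n\right)$$
for every draw, and therefore
$$s(n+1,\lambda) > s(n,\lambda).$$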


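To show the second property, note that
$$\frac{\partial \tilde s}{\partial \lambda} = h(1-h)\left[\log \sum_j \exp(\xi_j/\lambda) - \sum_j \frac{\exp(\xi_j/\lambda)}{\sum_k \exp(\xi_k/\lambda)}\,\lambda^{-1}\xi_j\right] = h(1-h)(A-B),$$
where the first equality uses $h'(z) = h(z)(1-h(z))$. By Jensen's inequality,
$$A = \log n + \log\Big(\frac{1}{n}\sum_j \exp(\xi_j/\lambda)\Big) \ge \log n + \frac{1}{n}\sum_j \lambda^{-1}\xi_j.$$
$B$ is a weighted average of $\lambda^{-1}\xi_j$, with
$$B = \sum_j \frac{\exp(\xi_j/\lambda)}{\sum_k \exp(\xi_k/\lambda)}\,\lambda^{-1}\xi_j \le \lambda^{-1}\xi_{\max} = \max_j \lambda^{-1}\xi_j,$$
so
$$A - B \ge \log n + \lambda^{-1}\Big(\frac{1}{n}\sum_j \xi_j - \xi_{\max}\Big) > 0$$
for large enough $n$, or after taking expectations with respect to $\xi$. It is therefore natural to expect
$$\frac{\partial s}{\partial \lambda} = E\left[\frac{\partial \tilde s}{\partial \lambda}\right] > 0.$$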
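The derivative formula can be checked numerically. In the sketch below (function names are hypothetical), a central finite difference of $\tilde s$ in $\lambda$ is compared with $h(1-h)(A-B)$. It also checks an identity that is not stated in the text but follows directly from the definitions: with in-line weights $w_j = \exp(\xi_j/\lambda)/\sum_k \exp(\xi_k/\lambda)$, we have $A - B = -\sum_j w_j \log w_j$, the entropy of the in-line shares, which is non-negative for every draw and strictly positive when $n \ge 2$.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def h(z):
    """Logistic h(z) = exp(z) / (1 + exp(z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def in_line_weights(xi, lam):
    """w_j = exp(xi_j / lam) / sum_k exp(xi_k / lam)."""
    lse = logsumexp([x / lam for x in xi])
    return [math.exp(x / lam - lse) for x in xi]


def s_tilde(lam, delta, xi):
    """Per-draw line share h(delta + lam * A), with A = log sum_j exp(xi_j / lam)."""
    return h(delta + lam * logsumexp([x / lam for x in xi]))


def a_minus_b(lam, xi):
    """A - B, with B = sum_j w_j * xi_j / lam (a weighted average of xi_j / lam)."""
    a = logsumexp([x / lam for x in xi])
    b = sum(w * x / lam for w, x in zip(in_line_weights(xi, lam), xi))
    return a - b


def ds_dlam(lam, delta, xi):
    """Analytic derivative h(1 - h)(A - B)."""
    share = s_tilde(lam, delta, xi)
    return share * (1.0 - share) * a_minus_b(lam, xi)


def entropy(weights):
    """Shannon entropy -sum_j w_j log w_j of the in-line shares."""
    return -sum(w * math.log(w) for w in weights if w > 0)
```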

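To show the third property, first define $\epsilon_j = \exp(\xi_j/\lambda)$, $\Sigma_n = \sum_{j=1}^{n} \epsilon_j$, and
$$A_n = \log \sum_j \exp(\xi_j/\lambda) = \log \Sigma_n, \qquad B_n = \sum_j \frac{\exp(\xi_j/\lambda)}{\sum_k \exp(\xi_k/\lambda)}\,\frac{\xi_j}{\lambda} = \sum_{j=1}^{n} \frac{\epsilon_j}{\Sigma_n} \log \epsilon_j,$$
so
$$A_{n+1} - A_n = \log\left(1 + \frac{\epsilon_{n+1}}{\Sigma_n}\right), \qquad B_{n+1} - B_n = \frac{\epsilon_{n+1}}{\Sigma_{n+1}}\left(\log \epsilon_{n+1} - \sum_{j=1}^{n} \frac{\epsilon_j}{\Sigma_n} \log \epsilon_j\right),$$
and, using $\log(1+u) \approx u$,
$$(A_{n+1} - B_{n+1}) - (A_n - B_n) \approx \frac{\epsilon_{n+1}}{\Sigma_n}\left(1 - \frac{\Sigma_n}{\Sigma_{n+1}}\left(\log \epsilon_{n+1} - \sum_{j=1}^{n} \frac{\epsilon_j}{\Sigma_n} \log \epsilon_j\right)\right),$$
which is non-negative in expectation because the last term is approximately the difference between $\xi_{n+1}/\lambda$ and the average of $\xi_j/\lambda$. Since $\partial \tilde s/\partial\lambda = h(1-h)(A-B)$ and $h$ increases with $n$, the factor $h(1-h)$ also increases with $n$ when $h < 1/2$. Thus, when $h < 1/2$,
$$\frac{\partial}{\partial\lambda}\left[\tilde s(n+1,\lambda,\xi^{n+1}) - \tilde s(n,\lambda,\xi^n)\right] > 0.$$
For the values of $\lambda$ and $n$ in this paper, the super-modularity condition holds in most cases.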
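The increment formulas for $A_n$ and $B_n$ can be verified against direct computation. The sketch below is illustrative; the helper names are hypothetical, and the test uses arbitrary values of $\xi_j$ with $\lambda = 0.5$.

```python
import math


def a_and_b(eps):
    """A_n = log Sigma_n and B_n = sum_j (eps_j / Sigma_n) log eps_j for eps = (eps_1, ..., eps_n)."""
    total = sum(eps)
    return math.log(total), sum(e / total * math.log(e) for e in eps)


def closed_form_increments(eps):
    """A_{n+1} - A_n and B_{n+1} - B_n from the closed-form expressions; eps holds n + 1 entries."""
    old, new = eps[:-1], eps[-1]
    sigma_n = sum(old)
    sigma_n1 = sigma_n + new
    _, b_n = a_and_b(old)
    delta_a = math.log1p(new / sigma_n)  # log(1 + eps_{n+1} / Sigma_n)
    delta_b = new / sigma_n1 * (math.log(new) - b_n)
    return delta_a, delta_b
```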

C.2

Approximating $s(n,\lambda)$

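The standard way to calculate $s(n,\lambda)$ is by simulation. A fixed set of simulation draws $\xi_{jr}$ is generated in advance, and for each value of $(n,\lambda)$ the line share is calculated as
$$s(n,\lambda) = \frac{1}{R}\sum_{r} h\left(\delta + \lambda \log \sum_{j} \exp\left(\xi_{jr}/\lambda\right)\right).$$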

This is computationally expensive on the supply side: for a fixed parameter value and initial condition, a sequence of $\lambda$ values arises from different beliefs, and the function must be evaluated many times. The number of simulation draws $R$ cannot be too small, because the second-order property (super-modularity) needs to be preserved in the calculation. This is the main computational challenge in this paper, because the line share is a component of the flow profit, whereas in most dynamic papers the flow profit has a very simple reduced-form representation.
I apply numerical approximation to the function $s(n,\lambda)$ (Judd, 1998). For each line length $n$, I estimate a separate functional approximation of $s(n,\lambda)$ using power polynomials on a uniform grid of $\lambda$. Suppose $n \in \{n_1, n_2, \dots, n_L\}$, where $n_{l+1} = n_l + 1$, and $\vec\lambda = (\lambda_1, \lambda_2, \dots, \lambda_T)^\top$, where $\lambda_{t+1} - \lambda_t$ is constant, and
$$s_{lt} = s(n_l, \lambda_t)$$
can be calculated by simulation. Let $s_l = (s_{l1}, \dots, s_{lT})^\top$ and $y_l(\vec\lambda) = (y_l(\lambda_1), \dots, y_l(\lambda_T))^\top$. The functional approximation finds functions $y_l(\cdot)$ that solve the quadratic program
$$\min_{y_1,\dots,y_L} \sum_l \left(s_l - y_l(\vec\lambda)\right)^\top \left(s_l - y_l(\vec\lambda)\right)$$
such that
$$y_l(\cdot) - y_{l-1}(\cdot) \ge 0, \qquad y_{l+1}(\cdot) + y_{l-1}(\cdot) - 2y_l(\cdot) < 0, \qquad y_l'(\cdot) > 0, \qquad \left(y_l(\cdot) - y_{l-1}(\cdot)\right)' > 0,$$
where the first two inequalities correspond to the first desired property (the line share is increasing and concave in line length), the third corresponds to the second property (the line share is increasing in the nesting parameter), and the fourth is the super-modularity condition.


Using power polynomials to approximate, let $X = (1, \vec\lambda, \vec\lambda^2, \dots, \vec\lambda^{P-1})$ be the T-by-P matrix of polynomials, $X' = (0, 1, 2\vec\lambda, \dots, (P-1)\vec\lambda^{P-2})$ the matrix of first-order derivatives, and $S = (s_1, \dots, s_L)$ the T-by-L matrix of line shares calculated by simulation. The approximating functions can be written as
$$y_l(\vec\lambda) = XA_l, \qquad y_l'(\vec\lambda) = X'A_l,$$
where $A = (A_1, A_2, \dots, A_L)$ is a P-by-L matrix of coefficients. The quadratic program is then equivalent to
$$\min_A \ \mathrm{tr}\left[(S - XA)^\top (S - XA)\right]$$

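such that
$$\mathrm{vec}(XAQ_0) \ge 0, \qquad \mathrm{vec}(XAQ_0Q_1) \le 0, \qquad \mathrm{vec}(X'A) \ge 0, \qquad \mathrm{vec}(X'AQ_0) \ge 0,$$
where the matrices $Q_0$ (L-by-(L-1)) and $Q_1$ ((L-1)-by-(L-2)) are first-difference matrices with the common structure
$$Q = \begin{pmatrix} -1 & & \\ 1 & \ddots & \\ & \ddots & -1 \\ & & 1 \end{pmatrix}.$$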
The above program can be solved quickly with CVX, a modeling system for disciplined convex programming (Grant et al., 2008). I solve the program separately for each market $m$, obtain a polynomial approximation of the line share function, and use this approximation in the supply-side estimation.
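The fitting step can be sketched without a convex solver if the shape restrictions are checked after the fit rather than imposed. This is a simplification, not the CVX program described above: each column of $S$ is fitted by unconstrained least squares on power polynomials (via the normal equations), and the four restrictions are then checked on the grid, with first and second differences across columns playing the role of $Q_0$ and $Q_0Q_1$. The draw distribution, $\delta$, the grid, and all names are assumptions.

```python
import math
import random


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def h(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_S(ns, lams, delta, reps, seed=0):
    """T-by-L list S[t][l] = s(n_l, lam_t), using one fixed draw set xi_{jr} ~ N(0, 1) (assumed)."""
    rng = random.Random(seed)
    draws = [[rng.gauss(0.0, 1.0) for _ in range(max(ns))] for _ in range(reps)]
    return [[sum(h(delta + lam * logsumexp([x / lam for x in xi[:n]])) for xi in draws) / reps
             for n in ns] for lam in lams]


def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    coef = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][c] * coef[c] for c in range(r + 1, size))
        coef[r] = (aug[r][size] - tail) / aug[r][r]
    return coef


def fit_columns(lams, S, P):
    """Unconstrained least squares of each column of S on X = (1, lam, ..., lam^(P-1))."""
    X = [[lam ** p for p in range(P)] for lam in lams]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(P)] for i in range(P)]
    A = []
    for l in range(len(S[0])):
        Xty = [sum(X[t][i] * S[t][l] for t in range(len(lams))) for i in range(P)]
        A.append(solve(XtX, Xty))
    return A  # A[l] holds the P coefficients A_l


def shape_report(lams, A):
    """Check the four restrictions on the grid; column differences mimic Q_0 and Q_0 Q_1."""
    Y = [[sum(c * lam ** p for p, c in enumerate(a)) for a in A] for lam in lams]
    dY = [[sum(p * c * lam ** (p - 1) for p, c in enumerate(a) if p > 0) for a in A]
          for lam in lams]
    L = len(A)
    return {
        "increasing_in_n": all(r[l + 1] - r[l] >= 0 for r in Y for l in range(L - 1)),
        "concave_in_n": all(r[l + 2] - 2 * r[l + 1] + r[l] <= 0 for r in Y for l in range(L - 2)),
        "increasing_in_lambda": all(v >= 0 for r in dY for v in r),
        "supermodular": all(r[l + 1] - r[l] >= 0 for r in dY for l in range(L - 1)),
    }
```

If any restriction fails on the grid, the constrained program above is needed: an unconstrained fit offers no guarantee that the shape restrictions hold.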

Conjugate Prior in Truncated Normal Distribution

In this section, I prove the following theorem to justify the learning process in the paper: if the prior is truncated normal and the signal is (untruncated) normal, the posterior is also truncated normal. In other words, the truncated normal distribution is a conjugate prior for a normal likelihood of signal generation.³¹

Theorem 2. Suppose the parameter of interest $\lambda$ follows a normal distribution truncated at 0 and 1, i.e., $\lambda \sim TN(\mu_0, 1/\tau_0, 0, 1)$, and the signal is generated by
$$x = \lambda + \epsilon, \qquad \epsilon \sim N(0, 1/\tau_\epsilon).$$
Then the posterior distribution is
$$\lambda \mid x \sim TN\left(\mu_1, \frac{1}{\tau_1}, 0, 1\right),$$
with
$$\mu_1 = \frac{\tau_0}{\tau_0 + \tau_\epsilon}\,\mu_0 + \frac{\tau_\epsilon}{\tau_0 + \tau_\epsilon}\,x, \qquad \tau_1 = \tau_0 + \tau_\epsilon.$$

³¹ Sørensen (2007) shows similar results to justify his MCMC approach.

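Proof. Let $\phi(t, \mu, \sigma^2)$ be the normal pdf with mean $\mu$ and variance $\sigma^2$, and $\Phi(t, \mu, \sigma^2) = \int_{-\infty}^{t} \phi(s, \mu, \sigma^2)\, ds$ the CDF. We know that
$$f(\lambda) = \frac{\phi(\lambda, \mu_0, 1/\tau_0)}{\Phi(1, \mu_0, 1/\tau_0) - \Phi(0, \mu_0, 1/\tau_0)}, \qquad f(x \mid \lambda) = \phi(x, \lambda, 1/\tau_\epsilon),$$
so
$$f(x) = \int_0^1 f(x \mid \lambda)\, f(\lambda)\, d\lambda = \frac{\int_0^1 \phi(x, \lambda, 1/\tau_\epsilon)\, \phi(\lambda, \mu_0, 1/\tau_0)\, d\lambda}{\Phi(1, \mu_0, 1/\tau_0) - \Phi(0, \mu_0, 1/\tau_0)}$$
and
$$f(\lambda \mid x) = \frac{f(\lambda)\, f(x \mid \lambda)}{f(x)} = \frac{\phi(x, \lambda, 1/\tau_\epsilon)\, \phi(\lambda, \mu_0, 1/\tau_0)}{\int_0^1 \phi(x, \lambda, 1/\tau_\epsilon)\, \phi(\lambda, \mu_0, 1/\tau_0)\, d\lambda}$$
$$= \frac{\phi(x, \lambda, 1/\tau_\epsilon)\, \phi(\lambda, \mu_0, 1/\tau_0) \Big/ \int_{-\infty}^{+\infty} \phi(x, \lambda, 1/\tau_\epsilon)\, \phi(\lambda, \mu_0, 1/\tau_0)\, d\lambda}{\int_0^1 \phi(x, \lambda, 1/\tau_\epsilon)\, \phi(\lambda, \mu_0, 1/\tau_0)\, d\lambda \Big/ \int_{-\infty}^{+\infty} \phi(x, \lambda, 1/\tau_\epsilon)\, \phi(\lambda, \mu_0, 1/\tau_0)\, d\lambda}$$
$$= \frac{\phi(\lambda, \mu_1, 1/\tau_1)}{\int_0^1 \phi(\lambda, \mu_1, 1/\tau_1)\, d\lambda} = \frac{\phi(\lambda, \mu_1, 1/\tau_1)}{\Phi(1, \mu_1, 1/\tau_1) - \Phi(0, \mu_1, 1/\tau_1)},$$
where the second-to-last equality follows from the standard conjugate prior for the normal distribution:
$$\lambda \sim N(\mu_0, 1/\tau_0), \quad x \mid \lambda \sim N(\lambda, 1/\tau_\epsilon) \implies \lambda \mid x \sim N(\mu_1, 1/\tau_1).$$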
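Theorem 2 can be checked numerically by applying Bayes' rule on $(0,1)$ with a quadrature rule and comparing the result with the truncated normal implied by $(\mu_1, \tau_1)$. In the sketch below, the parameter values and function names are hypothetical, and the normalizing integral uses a midpoint rule.

```python
import math


def normal_pdf(t, mean, var):
    """phi(t, mean, var): normal density with the given mean and variance."""
    return math.exp(-((t - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(t, mean, var):
    """Phi(t, mean, var): normal CDF."""
    return 0.5 * (1.0 + math.erf((t - mean) / math.sqrt(2.0 * var)))


def truncnorm_pdf(t, mean, var, lo=0.0, hi=1.0):
    """Density of N(mean, var) truncated to (lo, hi)."""
    if not lo < t < hi:
        return 0.0
    return normal_pdf(t, mean, var) / (normal_cdf(hi, mean, var) - normal_cdf(lo, mean, var))


def posterior_params(mu0, tau0, tau_eps, x):
    """(mu_1, tau_1) from Theorem 2."""
    tau1 = tau0 + tau_eps
    mu1 = (tau0 * mu0 + tau_eps * x) / tau1
    return mu1, tau1


def numeric_posterior(lam, mu0, tau0, tau_eps, x, grid=20000):
    """Bayes' rule f(lam | x), with the normalizing integral over (0, 1) by the midpoint rule."""
    def kernel(v):
        return truncnorm_pdf(v, mu0, 1.0 / tau0) * normal_pdf(x, v, 1.0 / tau_eps)

    step = 1.0 / grid
    evidence = step * sum(kernel((k + 0.5) * step) for k in range(grid))
    return kernel(lam) / evidence
```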

Time-Evolving $\lambda_t$

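Define the transition from $\lambda_t \in (0,1)$ to $\lambda_{t+1} \in (0,1)$ by the following acceptance-rejection process. Let $\lambda^{(1)} = \lambda_t + \sigma\nu^{(1)}$, where $\nu^{(1)} \sim N(0,1)$. If $\lambda^{(1)} \in (0,1)$, then $\lambda_{t+1} = \lambda^{(1)}$. Otherwise, try $\lambda^{(2)} = \lambda_t + \sigma\nu^{(2)}$ and accept it if $\lambda^{(2)} \in (0,1)$. Continue until some $\lambda^{(n)} \in (0,1)$ is obtained. This process defines a transition
$$\lambda_{t+1} \mid \lambda_t \sim f(\cdot \mid \lambda_t; \sigma).$$
Each step is thus a draw from $N(\lambda_t, \sigma^2)$ truncated to $(0,1)$, but once $\lambda_t$ is itself uncertain, the implied distribution of $\lambda_{t+1}$ has no explicit-form representation. To simplify the model, one may hope that $f(\cdot)$ convolved with a truncated normal again returns a truncated normal, i.e., that
$$\lambda_t \sim TN(\mu_t, \sigma_t^2), \qquad \lambda_{t+1} \mid \lambda_t \sim f(\cdot \mid \lambda_t; \sigma)$$
lead to
$$g_{t+1}(\lambda_{t+1}) = \int f(\lambda_{t+1} \mid \lambda_t; \sigma)\, g_t(\lambda_t)\, d\lambda_t$$
also being a truncated normal $TN(\mu_{t+1}, \sigma_{t+1}^2)$.³² There is no guarantee that this is true, but I use a truncated normal to approximate $g_{t+1}(\cdot)$ to keep the model tractable. The approximation is quite precise, as shown in Figure 10.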
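The acceptance-rejection step and the truncated normal approximation can be compared by simulation, in the spirit of Figure 10. The sketch uses the values in the figure note ($\lambda_t \sim TN(0.2, 0.3^2)$, $\sigma = 0.3$); the function names are hypothetical. `truncnorm_moments` returns the mean and variance of a truncated normal, so the simulated moments of $\lambda_{t+1}$ can be set against those of the approximating $TN(0.2, 0.3^2 + 0.3^2)$.

```python
import math
import random


def draw_truncated(rng, mean, sd, lo=0.0, hi=1.0):
    """Redraw mean + sd * nu, nu ~ N(0, 1), until the value lands in (lo, hi)."""
    while True:
        value = mean + sd * rng.gauss(0.0, 1.0)
        if lo < value < hi:
            return value


def truncnorm_moments(mean, sd, lo=0.0, hi=1.0):
    """Mean and variance of N(mean, sd^2) truncated to (lo, hi)."""
    def pdf(u):
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

    def cdf(u):
        return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

    a, b = (lo - mean) / sd, (hi - mean) / sd
    mass = cdf(b) - cdf(a)
    ratio = (pdf(a) - pdf(b)) / mass
    var_factor = 1.0 + (a * pdf(a) - b * pdf(b)) / mass - ratio ** 2
    return mean + sd * ratio, sd ** 2 * var_factor


def simulate_step(mu_t, sd_t, sigma, size, seed=0):
    """Draw lam_t ~ TN(mu_t, sd_t^2) on (0, 1), then one acceptance-rejection step with rate sigma."""
    rng = random.Random(seed)
    lam_t = [draw_truncated(rng, mu_t, sd_t) for _ in range(size)]
    lam_next = [draw_truncated(rng, lam, sigma) for lam in lam_t]
    return lam_t, lam_next
```

Comparing the sample mean of `lam_next` from `simulate_step(0.2, 0.3, 0.3, 20000)` with `truncnorm_moments(0.2, math.sqrt(0.3 ** 2 + 0.3 ** 2))[0]` gives a one-number analogue of the visual comparison in Figure 10; the sketch makes no claim about the size of the gap.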

Identification Conditions and Identification Moments

In the main text, I show identification of the supply-side model assuming the prior mean is known, so that the state variables of the dynamic game are known to the econometrician once the prior precision is identified. However, identification does not actually require information about the prior mean. Without it, the model becomes a dynamic model with unobserved, time-varying state variables. In this section, I borrow recent identification results to show the identification of my model.
I use recent results on identifying dynamic models with non-classical measurement error (Hu and Schennach, 2008; Hu and Shum, 2012). Similar to Griliches and Hausman (1986) and Hu and Shum (2012), the current signal serves as a measure of the unobserved state variable with measurement error, and the past signal serves as an instrument. I first show the correspondence between the non-classical measurement error model and the dynamic model in this paper, and then review the identification argument. Next, I verify the assumptions necessary for identification. The section ends with some reduced-form results on moments that are key to identification.

³² This property holds in the non-truncated normal case.
The only unobserved state variable is the mean of the belief, $\mu_t$. The observed data are
$$\{n_t, x_t, h_t\}_{t=1}^{T},$$
where $n_t$ is the line length and $x_t$ is the observed signal. The state variable unobserved to the econometrician is $\mu_t$; the remaining state variables, including $l_t$, are observed.

F.1

Mapping from Non-classical Measurement Error Model to Dynamic Supply Model

The identification argument borrows from the non-classical measurement error model. In that model, let $y$ be the outcome variable, $x^*$ the true independent variable, $x$ a measure of $x^*$ with error, and $z$ an instrument. The general assumptions of the model are
$$f(y \mid x^*, x, z) = f(y \mid x^*) \qquad (7)$$
$$f(x \mid x^*, z) = f(x \mid x^*) \qquad (8)$$
and the linear form of the model is
$$y = \beta x^* + \varepsilon, \qquad x = x^* + \eta, \qquad \mathrm{Cov}(z, \varepsilon) = 0, \qquad \mathrm{Cov}(z, \eta) = 0.$$
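In the linear special case, the instrument recovers $\beta$ through the familiar ratio $\mathrm{Cov}(z,y)/\mathrm{Cov}(z,x)$, while least squares of $y$ on $x$ is attenuated. The sketch below simulates this case with assumed Gaussian variables and an assumed relevance condition ($z$ shifts $x^*$); names and parameter values are hypothetical. Here the error $\eta$ is classical; the non-classical case drops the linear structure and relies on (7) and (8) instead.

```python
import random


def sample_cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def simulate_linear_model(n, beta, seed=0):
    """Assumed DGP: z ~ N(0,1); x* = z + N(0,1); x = x* + eta; y = beta * x* + eps."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x_star = [zi + rng.gauss(0.0, 1.0) for zi in z]
    x = [xs + rng.gauss(0.0, 1.0) for xs in x_star]          # measurement error eta
    y = [beta * xs + rng.gauss(0.0, 1.0) for xs in x_star]   # outcome error eps
    return y, x, z


def ols_and_iv(y, x, z):
    """Least-squares slope of y on x (attenuated) and IV slope Cov(z, y) / Cov(z, x)."""
    return sample_cov(x, y) / sample_cov(x, x), sample_cov(z, y) / sample_cov(z, x)
```

With $\mathrm{Var}(x^*) = 2$ and $\mathrm{Var}(\eta) = 1$ in this design, least squares converges to $2\beta/3$ while the IV slope converges to $\beta$.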
In the dynamic supply model of this paper, the outcome variable is the line length, $y = n_t$; the unobserved true independent variable is the state variable, $x^* = \mu_t$; its error-ridden measure is the signal, $x = x_t$; and the instrument is $z = x_{t-1}$. The two assumptions above are easy to verify. From now on, I suppress the observed state variables and focus only on the unobserved state variable $\mu_t$. Condition (7) becomes
$$f(n_t \mid \mu_t, x_t, x_{t-1}) = f(n_t \mid \mu_t),$$
which holds by the Markovian assumption of the model: the action is determined purely by the belief. Condition (8) is equivalent to
$$f(x_t \mid \mu_t, x_{t-1}) = \int f(x_t \mid \lambda_t, \mu_t, x_{t-1})\, f(\lambda_t \mid \mu_t, x_{t-1})\, d\lambda_t = \int f(x_t \mid \lambda_t)\, f(\lambda_t \mid \mu_t)\, d\lambda_t = f(x_t \mid \mu_t),$$
where the second equality follows from the properties of Bayesian learning: the state variable is a sufficient statistic for the truth.

F.2

Intuitive Proof of Non-classical Measurement Error Model

Hu and Schennach (2008) prove the non-classical measurement error result using linear operators and functional analysis. Here I give an intuitive proof for the discrete case.

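Let $x \in \mathcal{X} = \{x_1, x_2, \dots, x_{d_x}\}$, $x^* \in \mathcal{X}^* = \{x_1^*, x_2^*, \dots, x_{d_x}^*\}$, and $z \in \mathcal{Z} = \{z_1, z_2, \dots, z_{d_z}\}$. Fixing $y$, let
$$m_y(x \mid z) = f(y, x \mid z) = \int_{\mathcal{X}^*} f(y \mid x^*)\, f(x \mid x^*)\, f(x^* \mid z)\, dx^* = \sum_i f(x \mid x_i^*)\, m_y(x_i^*)\, f(x_i^* \mid z),$$
where $m_y(x_i^*) = f(y \mid x_i^*)$. In addition,
$$f(x \mid z) = \int_{\mathcal{X}^*} f(x \mid x^*)\, f(x^* \mid z)\, dx^* = \sum_i f(x \mid x_i^*)\, f(x_i^* \mid z).$$
Let $M_y(X \mid Z)$ be the $d_x$-by-$d_z$ matrix $\{m_y(x_i \mid z_j)\}_{i,j}$, $m_y(X^*) = (m_y(x_1^*), m_y(x_2^*), \dots, m_y(x_{d_x}^*))^\top$, and $F(X \mid Z)$ the $d_x$-by-$d_z$ matrix $\{f(x_i \mid z_j)\}_{i,j}$, with $F(X^* \mid Z)$ and $F(X \mid X^*)$ defined similarly. The two equations above can be written as
$$M_y(X \mid Z) = F(X \mid X^*)\, \mathrm{diag}(m_y(X^*))\, F(X^* \mid Z), \qquad F(X \mid Z) = F(X \mid X^*)\, F(X^* \mid Z),$$
so, taking $d_z = d_x$ so that the matrices are square,
$$M_y(X \mid Z)\, F(X \mid Z)^{-1} = F(X \mid X^*)\, \mathrm{diag}(m_y(X^*))\, F(X \mid X^*)^{-1}.$$
The left-hand side is observed in the data, while the right-hand side is unknown. Under the assumptions presented below, the right-hand side can be obtained from the eigendecomposition (spectral decomposition) of this matrix.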
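The eigendecomposition step can be illustrated with a 2-by-2 example. The primitives below are hypothetical numbers, not estimates from the paper; each column of $F(X \mid X^*)$ and $F(X^* \mid Z)$ is a conditional distribution. The observed matrix $M_y(X \mid Z) F(X \mid Z)^{-1}$ has eigenvalues $m_y(X^*)$, and its eigenvectors, rescaled to sum to one, recover the columns of $F(X \mid X^*)$. Which eigenvalue belongs to which $x^*$ is pinned down only by an additional ordering assumption such as the monotonicity discussed in F.3.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def eigen_2x2(m):
    """Eigenvalues (largest first) and sum-to-one eigenvectors of a 2x2 matrix with real, distinct eigenvalues."""
    (a, b), (c, d) = m
    half_trace = (a + d) / 2.0
    disc = math.sqrt(half_trace ** 2 - (a * d - b * c))
    values = [half_trace + disc, half_trace - disc]
    vectors = []
    for lam in values:
        v = [b, lam - a]  # solves (a - lam) v1 + b v2 = 0; assumes b != 0
        total = v[0] + v[1]
        vectors.append([v[0] / total, v[1] / total])
    return values, vectors


# Hypothetical primitives; columns are conditional distributions.
F_x_given_xstar = [[0.8, 0.3], [0.2, 0.7]]    # f(x_i | x*_j)
F_xstar_given_z = [[0.6, 0.25], [0.4, 0.75]]  # f(x*_i | z_j)
m_y = [0.9, 0.2]                              # f(y | x*_i)

diag_m = [[m_y[0], 0.0], [0.0, m_y[1]]]
M_y = matmul(matmul(F_x_given_xstar, diag_m), F_xstar_given_z)  # observable
F_x_given_z = matmul(F_x_given_xstar, F_xstar_given_z)           # observable
observed = matmul(M_y, inverse_2x2(F_x_given_z))
eigenvalues, eigenvectors = eigen_2x2(observed)
```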

F.3

Formal Identification Argument

Theorem 3. The following three elements of the supply model are identified:
1. The conditional choice probability $f(n_t \mid \mu_t)$
2. The initial condition $f(\mu_t)$
3. The state transition $f(\mu_t \mid \mu_{t-1}, x_{t-1})$
Proof. To use the identification result of Hu and Schennach (2008) for the CCP, I need to check three additional assumptions, which ensure the monotonicity of $m_y(x^*)$ and the invertibility of $F(X \mid Z)$ and $F(X \mid X^*)$. In the current context, they correspond to the monotonicity of $f(n_t \mid \mu_t)$ and the non-degeneracy of the conditional densities $f(x_t \mid x_{t-1})$ and $f(x_t \mid \mu_t)$.
The monotonicity of $f(n_t \mid \mu_t)$ comes from the sub-modularity of the line share function $s(n, \lambda)$. If the preference heterogeneity is known, there is a unique line length $n$ that maximizes the payoff. $f(n_t \mid \mu_t)$ can be decomposed into $\int f(n_t \mid \lambda_t)\, f(\lambda_t \mid \mu_t)\, d\lambda_t$, where both terms are monotonic, so $f(n_t \mid \mu_t)$ is monotonic, i.e., a higher $\mu_t$ is more likely to induce a lower $n_t$.
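The non-degeneracy of the conditional densities can be verified as follows:
$$f(x_t \mid x_{t-1}) = \iint f(x_t \mid \mu_t, \mu_{t-1}, x_{t-1})\, f(\mu_t \mid \mu_{t-1}, x_{t-1})\, f(\mu_{t-1} \mid x_{t-1})\, d\mu_t\, d\mu_{t-1} = \iint f(x_t \mid \mu_t)\, f(\mu_t \mid \mu_{t-1}, x_{t-1})\, f(\mu_{t-1} \mid x_{t-1})\, d\mu_t\, d\mu_{t-1},$$
where all three terms are monotonically increasing, and hence so is $f(x_t \mid x_{t-1})$. Similarly,
$$f(x_t \mid \mu_t) = \int f(x_t \mid \lambda_t, \mu_t)\, f(\lambda_t \mid \mu_t)\, d\lambda_t = \int f(x_t \mid \lambda_t)\, f(\lambda_t \mid \mu_t)\, d\lambda_t,$$
and both terms are monotonically increasing.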


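The identification of the initial condition and the state transition is similar to Hu and Shum (2012):
$$f(n_t) = \int f(n_t \mid \mu_t)\, f(\mu_t)\, d\mu_t,$$
so $f(\mu_t)$ is identified by deconvolution. Similarly,
$$f(\mu_t \mid x_{t-1}) = \int f(\mu_t \mid \mu_{t-1}, x_{t-1})\, f(\mu_{t-1} \mid x_{t-1})\, d\mu_{t-1},$$
where $f(\mu_t \mid x_{t-1})$ and $f(\mu_{t-1} \mid x_{t-1})$ are both identified from the spectral decomposition above, so the state transition $f(\mu_t \mid \mu_{t-1}, x_{t-1})$ is identified as well.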
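In the discrete case, the deconvolution for the initial condition reduces to a linear system: stacking $f(n_t)$ over line lengths gives $f(N) = F(N \mid M)\, f(M)$, which can be solved for $f(M)$ when the identified CCP matrix $F(N \mid M)$ is invertible. The two-state numbers and names below are hypothetical.

```python
def solve_2x2(m, rhs):
    """Cramer's rule for a 2x2 system m @ v = rhs."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]


# Hypothetical identified CCP matrix: ccp[i][k] = f(n_i | mu_k); each column sums to one.
ccp = [[0.7, 0.2], [0.3, 0.8]]
true_initial = [0.35, 0.65]  # hypothetical f(mu_k), used only to generate f(n)

# Observed marginal of line lengths: f(n_i) = sum_k f(n_i | mu_k) f(mu_k).
f_n = [sum(ccp[i][k] * true_initial[k] for k in range(2)) for i in range(2)]
recovered_initial = solve_2x2(ccp, f_n)
```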


Figure 8: Identification line length maintenance cost (panels: Frito Lay, Non Frito Lay; x-axis: serving size)

Note: This figure shows the volume shares of potato chips by serving size within the Frito Lay and non-Frito Lay chip categories. Vertical lines are placed at natural discontinuities of the serving-size distributions. Frito Lay mainly sells potato chips with 8-13 serving sizes, which are the size categories I use in the analysis.

Figure 9: Counterfactual - known heterogeneity

(a) HHI; (b) Standard Deviation of In-line Market Share. Each panel plots curves for $\lambda = 0.3$ and $\lambda = 0.7$ against line length (12 to 32).

Note: This figure evaluates the two concentration-based measures of preference heterogeneity. In both panels, the preference heterogeneity is held constant across line lengths. The x-axis is the line length and the y-axis is the simulated measure; each point is an average over 300 simulations. In each simulation $r$,
$$s^r_{j|l} = \frac{\exp(\xi^r_j/\lambda)}{\sum_k \exp(\xi^r_k/\lambda)}, \qquad \mathrm{HHI}^r = \sum_j \left(s^r_{j|l}\right)^2, \qquad \mathrm{StdLnShareInLine}^r = \mathrm{Std}\left(\log s^r_{j|l}\right).$$

Figure 10: Identification line length maintenance cost (densities of $\lambda_t$, simulated $\lambda_{t+1}$, and approximated $\lambda_{t+1}$)

Note: This figure shows the true densities of $\lambda_t$ and $\lambda_{t+1}$ and the approximated density of $\lambda_{t+1}$. The true density of $\lambda_t$ is $TN(0.2, 0.3^2)$, and the evolution rate is $\sigma = 0.3$. The true density of $\lambda_{t+1}$ comes from the acceptance-rejection process $\lambda^{(n)} = \lambda_t + \sigma\nu^{(n)}$ with $\lambda_{t+1} = \lambda^{(\min\{n:\ \lambda^{(n)} \in (0,1)\})}$. The approximated density is $TN(0.2, 0.3^2 + 0.3^2)$.
