
A Framework for Modeling How Consumers Form Online Search Queries


Jia Liu and Olivier Toubia
Columbia University
October 16, 2015

Abstract

We explore how consumers form online search queries, and in particular the link
between consumers' information needs and their search queries. Our goal is to provide
a framework for the development of search models that can infer consumers' information needs from their queries. The semantic relationships between queries and results
differentiate query formation from traditional, discrete-choice based search. Accordingly, our specific research questions are as follows: (i) Are consumers able to leverage semantic relationships between queries and results when forming online search
queries? (ii) How should researchers represent these semantic relationships? (iii)
What are consumers' beliefs on these semantic relationships? Using an experiment in
which information needs are manipulated exogenously, we find that consumers have
the ability to formulate queries that leverage semantic relationships. Consequently,
models of search query formation should capture consumers' beliefs on a set of semantic relationships, which capture the probability that any query will activate any set
of words. Fortunately, we show that these semantic relationships may be approximated
parsimoniously by functions of asymmetric activation probabilities at the word level.
We find that consumers' beliefs are biased upwards, and that they are not asymmetric
enough.
Keywords: online search engines, search models, information needs, preferences,
semantic relationships

Jia Liu is a Ph.D. Candidate in Marketing, Columbia Business School. Email: jl3631@columbia.edu.

Olivier Toubia is Glaubinger Professor of Business, Graduate School of Business, Columbia University. Email:
ot2107@columbia.edu

Electronic copy available at: http://ssrn.com/abstract=2675331

1 Introduction
Over the past decade, search engines like Google have become one of the primary tools consumers
use when searching for products, services, or content. According to a report by FleishmanHillard (2012), 89% of consumers visit Google, Bing or other search engines to find information
prior to making purchases. For major purchases like cars, 81% of consumers go online before
heading out to the store, and spend an average of 79 days gathering information (GE, 2013). That
is probably why search marketing spending is expected to reach $31.62 billion in the U.S. alone in
2015 (Statista, 2015a).
The relevance of search results and of search-related advertising and targeting is a function
of how well the content presented to a consumer matches their underlying information needs.
Therefore, it is essential for search engines and the firms that use them to be able to correctly
infer consumers' information needs based on their search queries. For example, the amount that
a marketer should be willing to bid on a particular keyword is a function of how well their content
matches with the information needs of consumers who usually type that keyword. More generally,
search queries contain valuable information about users' preferences (Pirolli, 2007), which users
tend to provide frequently, voluntarily and truthfully. As such, online search queries have the
potential to be leveraged in revealed preference frameworks.
A large literature in marketing (and economics) has linked preferences and utility to search,
in situations where search is performed by a series of discrete choices (e.g., purchases, clicks).
While text-based search may be viewed as a special case of discrete-choice search, some aspects
of text-based search are not captured by traditional search models. In particular, search queries and
online content are semantically related to each other. For example, consider a consumer typing the
following query: "affordable sedan made in America". Inferring this consumer's information needs
based on the query may not be as straightforward as concluding that they are simply interested in
affordable sedans made in America. For example, it might be possible that the most important
attributes for this consumer are in fact safety, comfort, and made in America, and that affordability
is of lesser importance. This consumer might have decided to type the query "affordable sedan
made in America" because they believe that cars made in America are generally safe (but that
the reverse is not necessarily true) and comfortable (again, the reverse may not be true), but not
necessarily affordable. In that case, the consumer anticipated that they would find relevant search
results (i.e., results that match their information needs) efficiently (i.e., with short queries) by only
including "made in America" and "affordable" in their queries, but not "safe" or "comfortable",
although these are important attributes. In other words, the consumer may have leveraged the
semantic relationships between queries and results when formulating their query.
Before proceeding further, we define information needs and semantic relationships more
precisely in our context. The Information Retrieval (IR) literature defines information needs as
topics about which the user desires to know more (Manning et al., 2008, p. 5). Such topics
could be constructed as functions of individual words, using for example natural language processing tools such as Latent Dirichlet Allocation (Blei et al., 2003; Tirunillai and Tellis, 2014). In
that case information needs would be expressed as functions of topics, which themselves would be
expressed as functions of words, i.e., information needs would be indirectly expressed as functions
of words. More generally, information needs may be captured by utility functions that are specified
over a dictionary of relevant words.
In this paper, we define semantic relationships based on word occurrence on the web, following the general approach in the information retrieval literature (Khoo and Na, 2006). Semantic
relationships in our context are objectively defined probabilities of finding specific sets of words
in the top results of a given search engine, given a specific search query.¹ Consumers hold beliefs
on these probabilities, which are approximations of the true, empirical probabilities. We note that
according to this definition, semantic relationships are specific to individual search engines. In this
paper we focus on Google, which as of April 2015 had an 88.44% share globally (Statista, 2015b).
We also note that this definition is different from the typical definition of semantic relationships
in cognitive psychology, which relates to spreading activation from memory (Collins and Loftus,
1975; Raaijmakers and Shiffrin, 1981).
¹ If information needs are represented by topics instead of words, then the relevant semantic relationships may
describe the link between the topic distribution in the search query and the topic distribution in the search results.


We make a distinction between two non-mutually exclusive ways of selecting words to form
a search query. Preference-based search consists in selecting the most valuable words for inclusion in the query, i.e., the words that are most strongly related to the consumer's information
needs. In our previous example, preference-based search would lead to the inclusion of "safe" and
"comfortable" in the query. However, semantic relationships may be leveraged to reach valuable
information with shorter queries. As argued above, this may lead a consumer to form the query
"affordable sedan made in America", with the anticipation that the search results will be
likely to contain information about cars that are also safe and comfortable. We label this approach
to forming search queries as semantic-based. Semantic-based search enables consumers to shorten
their queries by leveraging semantic relationships between queries and results. This is consistent
with empirical evidence that users tend to form short queries (Jansen et al., 2000; Spink et al., 2001;
Kamvar and Baluja, 2006; Jansen et al., 2009). This is also consistent with a boundedly rational
view of users, whereby forming longer queries is cognitively costly (Ruthven, 2003; Azzopardi
et al., 2013).
Our motivation for this paper is the question of whether and how consumers leverage semantic
relationships when forming online search queries. This question has clear implications for the development of models that leverage online search queries as a source of information on consumers'
preferences. Indeed, if consumers do not engage in semantic-based search, their information needs
may be learned directly and relatively easily from their queries. However, if semantic-based search
is relevant, it becomes necessary to understand how consumers translate their information needs
into search queries, in order to be able to infer information needs from queries. In that case, leveraging the information revealed by consumers in their search queries requires reverse engineering
queries to infer the underlying information needs.
Developing search models that are able to infer information needs from online queries is an
ambitious endeavor, which should probably be tackled by multiple researchers across multiple
papers. In this paper, we attempt to provide a framework for the development of such models,
by improving our basic understanding of the link between information needs and search queries.

Given this background, our specific research questions are as follows: (i) Are consumers able to
leverage semantic relationships between queries and results when forming online search queries?
(ii) How should researchers represent these semantic relationships? (iii) What are consumers'
beliefs on these semantic relationships?
The rest of the paper is organized as follows. Section 2 reviews relevant research. Section
3 introduces relevant definitions and notations. Section 4 presents the experimental design for
Study 1. Section 5 addresses our first research question. Section 6 addresses our second research
question. Sections 7 and 8 address our third research question: Section 7 compares the estimated
users' beliefs from Study 1 to the true activation probabilities; Section 8 describes Study 2 in
which we measure these beliefs directly and compare them to the truth. Section 9 concludes and
integrates our results into a framework for modeling query formation.

2 Relevant Literature

2.1 Search Models

Our paper is related to the large literature in marketing that has studied how preferences drive
search, and modeled the search behavior of utility-maximizing agents (Erdem et al., 2005; Hui
et al., 2009; Park and Chung, 2009; Dzyabura, 2013; Yang et al., 2015). Some of this literature
has even studied search in the context of search engines (Jeziorski and Segal, 2010; Kim et al.,
2010; Shi and Trusov, 2013). However, in this literature search is typically expressed by discrete
choices among items such as products or links. Consequently, the link between preferences (or
information needs) and text-based online search queries has largely been ignored in marketing. As
discussed above, text-based search is not a straightforward special case of discrete-choice search in
which consumers would select from a very large universe of queries. In particular, search words are
semantically related to each other and to the search results, which creates a rich set of dependencies
between queries and their results.
Such semantic relationships between words have been used in the cognitive science literature
to predict users' navigation paths on the web, under the framework of Information Foraging Theory
(Pirolli and Card, 1999; Fu and Pirolli, 2007; Wu et al., 2014). In these models, users develop
an updated assessment of a website's relevance after reviewing its content, and adjust their page-viewing strategies based on their ongoing evaluation of the website's utility and their own search
cost. The assessment is assumed to be a function of the association strength between different
words. Because users' actual beliefs on these association strengths are unobservable, these researchers usually assume users' beliefs are the same as the actual semantic relationships obtained
from online text corpora. Our research differs from these studies in two main ways. First, we focus
on text-based query formation behavior, whereas these models only study discrete search behavior
(e.g., clicking). Second, rather than assuming that users leverage semantic relationships and have
correct beliefs, we test whether users leverage semantic relationships in their queries, study the
accuracy of their beliefs, and identify systematic ways in which beliefs deviate from the truth.

2.2 Information Retrieval

A large body of research on online search queries comes from the Information Retrieval (IR)
literature, which has focused primarily on the problem of finding the most relevant documents
given a query (e.g., Salton and McGill, 1986; Manning et al., 2008). The IR literature was
developed well before web search engines existed, for situations in which professionals trained
in the art of phrasing queries searched over a collection of documents whose style and structure
they understood well. In such situations, queries tend to be well-formed and reflect information
needs accurately. Accordingly, this literature has typically focused more on optimizing information
retrieval systems using a query as input, than on understanding the process by which users form
their queries. However, with web search engines, the link between information needs and queries
may not be as direct, and it is not as well understood (Santos et al., 2015). In contrast to the
traditional approach in the IR literature, in this research we treat search engines as black boxes
and focus on the behavior of the end users. In particular, we attempt to understand how consumers
form online search queries based on their information needs and based on their beliefs on how
the search engine operates. Accordingly, we only review here a few areas within the IR literature
which are most relevant to our research.

2.2.1 Descriptive Studies on Online Queries

There is a considerable body of descriptive research on online queries in the IR literature. It has
been shown consistently that most queries are a list of one or more nouns; on average the length
of a query is two to three terms; and at least 80% of queries contain three terms or fewer (Kamvar
and Baluja, 2006; Jansen et al., 2000; Spink et al., 2001; Jansen et al., 2009). The average number
of queries per user session has been found to be between two and five, depending on the study and
the search task (Jansen et al., 2000; Spink et al., 2001; Wu et al., 2014). The underlying reason
could be that forming long queries is costly for users in terms of time, cognitive effort, physical
typing, and so on (Ruthven, 2003; Azzopardi et al., 2013). Importantly, these findings suggest that
it is reasonable for us to focus on short queries with nouns in this research.
Researchers have also proposed different ways to categorize the intent of queries. The first
and most popular categorization of online queries was proposed by Broder (2002) who defined
three broad classes: informational, navigational, and transactional. Informational search involves
looking for a specific fact or topic; navigational search seeks to locate a specific web site; and transactional search usually involves looking for information related to a particular product or service.
Jansen et al. (2008) found that about 80% of queries are informational, about 10% are navigational,
and under 10% are transactional. Rose and Levinson (2004) refine Broder's taxonomy by introducing the concept of Resource search, where the user's goal is to obtain a resource available
on web pages (e.g., files, movies, etc.). Although these empirical studies provide valuable insights
into what the intent is behind user queries, none of them has suggested a systematic framework for
modeling the query formation process so that more precise inferences could be derived.

2.2.2 IR Models

Models of information retrieval, in general, predict and explain which documents are most
relevant given a search query (Roelleke, 2013). Popular traditional IR models include models
based on the probability of relevance framework such as BM25 (Robertson and Zaragoza, 2009),
and language models such as Unigram (Zhai, 2008). Others have used topic modeling or other
language models to map queries and documents onto topics, and retrieve documents whose topic
distribution matches that in the query (Kurland and Lee, 2004; Wei and Croft, 2006).
We note a few simplifying assumptions about words in queries and documents often made by
IR models. One is the bag-of-words approach, which ignores the position or order of words in
queries and pages. This assumption is commonly used, because it is extremely hard to develop
a model of relevance with positional information without exploding the number of parameters,
and position information has been shown to have surprisingly little effect on retrieval accuracy
(Robertson and Zaragoza, 2009). Other approximations which relate to the independence of words
in queries and/or pages will be introduced more formally in Section 6. In the IR literature, these
approximations are usually assumed to be valid for convenience. In contrast, we will test the
validity of our proposed approximations in Section 6 using the actual semantic relationships on
Google.
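To make the bag-of-words assumption concrete, the following minimal Python sketch represents a query as word counts, so that word order is discarded. The function name `bag_of_words` and the whitespace tokenization are simplifying assumptions introduced here for illustration; they are not taken from any of the IR models cited above.

```python
from collections import Counter


def bag_of_words(text):
    """Represent a text by its word counts, ignoring word order (hypothetical helper)."""
    # Assumption: lowercase and split on whitespace; real systems tokenize more carefully.
    return Counter(text.lower().split())


# Two queries with the same words in a different order have the same representation.
same_bag = bag_of_words("chicken salad") == bag_of_words("salad chicken")
```

Under this representation the queries "chicken salad" and "salad chicken" are indistinguishable, which is exactly the information that the bag-of-words approach gives up.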
We also note that the IR literature has recognized that search queries may not be perfectly representative of the user's information needs, and has proposed solutions to this problem, for example
by increasing the diversity of the search results (Santos et al., 2015) or by combining information
across clusters of queries and documents (Kurland and Lee, 2004). However, as mentioned above,
IR models are typically focused on optimizing the search results given a query. In contrast, we attempt to understand how consumers translate their information needs into search queries, in order
to pave the way for the development of search models that can accommodate text-based search. To
the best of our knowledge, this type of research has not been conducted in the IR literature.

2.2.3 Semantic Relationships

Several researchers in IR have considered semantic relationships between queries and documents, typically in order to improve retrieval performance (Li and Xu, 2013). For instance,
Ruthven (2003) explores the use of semantic relationships for query expansion, by identifying new
words that are semantically related to the information the user is presumably seeking, and adding
these new words to existing queries. The effort to leverage semantic relationships took a new dimension with the advent of the "semantic web" (Berners-Lee et al., 2001), which is meant to identify
semantic relationships between web pages, people, organizations, places, etc. Guha et al. (2003)
use the term "semantic search" to refer to a system that leverages the semantic web to improve
traditional web searching. For example, their system leverages semantic relationships between
search terms to disambiguate queries by inferring the exact semantic meaning of each word in a
query. See Mangold (2007) for a review of semantic search in IR systems.

In sum, some descriptive research has been published in the IR literature that identifies various types of queries and reports statistics related to online search queries. Previous research has
leveraged semantic relationships to improve the efficiency of search engines, but has not focused
on exploring whether users also leverage such relationships when forming queries. In general,
the IR literature focuses primarily on improving the results of IR systems, taking queries as input. In contrast, we focus on the end users and attempt to understand how they form their search
queries, given their information needs and their beliefs on how search engines operate. As such, by
"semantic-based search" we refer to the strategies employed by users when forming their queries,
not to systems designed to improve the efficiency of search engines. Therefore, our definition of
"semantic-based search" should not be confused with the definition of "semantic search" in IR.

3 Definitions and Notations


We assume consumers derive value from consuming the content of webpages, based on how
well these pages satisfy their information needs. As argued earlier, the value of a webpage l for
a user may be written as a function of the words that are present in the page. In practice, such
a function is likely to be very complex and highly non-linear. For example, some words complement
each other, some words are substitutes, some words have different meanings or value based on
which other words are present, etc. In particular, words may be mapped onto topics using natural
language processing tools such as Latent Dirichlet Allocation (Blei et al., 2003; Tirunillai and
Tellis, 2014). That is, the value of a page to a user may be specified as a function of the topics on
the page, which may themselves be specified as functions of the words on the page.
In this paper, we exogenously select simple functions that link words to value, instead of relying
on richer approaches such as topic modeling. This allows us to have no uncertainty on the correct
specification of the value of a page as a function of its text. This also allows us to manipulate
information needs exogenously, in ways that are easy to explain to users. In particular, we choose
a very simple value function that is linear and additive in a set of dummy variables indicating which
words are present in the page. This assumption is not critical to our analysis, and our results and
conclusions are easily generalizable to alternative specifications.
Let \(g = \{t_1, t_2, \ldots, t_W\}\) denote the set of relevant words. We denote by \(\beta_j\) the value of word \(t_j\)
for a given user. In our experiment, this value is determined by us and communicated to the user.
The value of a webpage \(l\) for the user is as follows:

\[ v(l \mid \beta) = \sum_{t_j \in g} \beta_j \, I(t_j \in l) \qquad (1) \]

where \(I(t_j \in l)\) indicates whether webpage \(l\) contains word \(t_j\).
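As an illustration, the linear and additive value function in Equation (1) can be sketched in a few lines of Python. This is a minimal sketch rather than the code used in the study: the function name `page_value`, the dictionary representation of word values, and the example page are assumptions introduced here.

```python
def page_value(word_values, page_words):
    """Value of a page under Equation (1): sum the values of the relevant words it contains.

    word_values: dict mapping each relevant word to its value (hypothetical representation).
    page_words: iterable of the words found on the page.
    """
    present = set(page_words)
    # The indicator I(t_j in l) is 1 when the word appears on the page, 0 otherwise.
    return sum(value for word, value in word_values.items() if word in present)


# Illustrative values: "fruit" and "chicken" are worth 2, "salad" is worth 1.
word_values = {"fruit": 2, "salad": 1, "chicken": 2}
example_page = {"chicken", "salad", "recipe"}  # hypothetical page content
value = page_value(word_values, example_page)
```

Words on the page that are not in the set of relevant words (here, "recipe") contribute nothing to the value, and a word counts once no matter how often it appears.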


A search query \(q\) is defined as any ordered subset of the words in \(g\). A user who would only
use preference-based search would simply form queries by selecting the words with the highest
\(\beta_j\). In contrast, modeling semantic-based search requires capturing the probability that any given
query may retrieve webpages with any set of relevant words. In general, when evaluating search
results from a query, users tend to focus on the top results (Narayanan and Kalyanam, 2011; Shi
and Trusov, 2013; Yoganarasimhan, 2015). Accordingly, we focus on the top \(K = 10\) results from
query \(q\). For each possible subset of words \(s \subseteq g\), we define \(\pi_{qs}\) as the probability that a random
webpage \(l\) from the top \(K\) results of query \(q\) contains exactly all the words in set \(s\). We say that a
webpage contains a word when the word appears anywhere on the actual page itself, not just the
page title or snippet displayed on the search engine result page. We refer to \(\pi_{qs}\) as the probability
of activating the words in \(s\) using query \(q\). The expected value of each top search result \(l\) from
query \(q\) can be written as:

\[ E(v(l) \mid \beta, q) = \sum_{s \subseteq g} \pi_{qs} \sum_{t_j \in s} \beta_j \qquad (2) \]

Even if the value function above is linear and additive, the number of possible queries and the
number of subsets of \(g\) both grow exponentially with the number of words in \(g\). In particular, with
\(W\) words there are \(2^W\) possible subsets of \(g\). Therefore, in Section 6, we will introduce and test
some assumptions that allow approximating the relevant semantic relationships using functions
of a more parsimonious set of parameters. Before investing in building such representation, we
will first verify that it is indeed necessary to capture semantic relationships when modeling and
studying user queries, by providing model-free evidence for semantic-based search in Section 5.
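The expected value in Equation (2) and the exponential growth in the number of subsets can be illustrated with a short Python sketch. The activation probabilities below are hypothetical numbers chosen only for illustration (they are not estimates from the study), and the names `all_subsets` and `expected_value` are assumptions introduced here.

```python
from itertools import combinations


def all_subsets(words):
    """Yield every subset of the relevant words (2**W subsets, including the empty set)."""
    words = sorted(words)
    for size in range(len(words) + 1):
        for combo in combinations(words, size):
            yield frozenset(combo)


def expected_value(word_values, activation_probs):
    """Equation (2): sum over subsets s of P(page contains exactly s) times the value of s."""
    total = 0.0
    for subset in all_subsets(word_values):
        prob = activation_probs.get(subset, 0.0)  # subsets not listed have probability 0
        total += prob * sum(word_values[word] for word in subset)
    return total


word_values = {"milk": 1, "cheese": 2, "tea": 2}
# Hypothetical activation probabilities for the one-word query "milk".
hypothetical_probs = {
    frozenset({"milk"}): 0.2,
    frozenset({"milk", "cheese"}): 0.3,
    frozenset({"milk", "tea"}): 0.1,
    frozenset({"milk", "cheese", "tea"}): 0.4,
}
n_subsets = sum(1 for _ in all_subsets(word_values))  # 2**3 subsets for W = 3
ev = expected_value(word_values, hypothetical_probs)
```

With three words the enumeration is trivial, but the same loop visits \(2^W\) subsets for every candidate query, which is why a more parsimonious representation becomes necessary as the dictionary grows.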

4 Design of Study 1
Addressing our first research question requires distinguishing semantic-based search from preference-based search empirically. As mentioned above, doing so requires either a modeling framework that
links information needs to query formation, or an appropriate set of exogenous variations. In order
to inform the development of such a modeling framework, in Study 1 we opt for the second option
and explore the relevance and existence of semantic-based search using experimental data. In particular, we develop a "search query game", an experimental paradigm that allows us to manipulate
information needs exogenously.

4.1 Search Query Game

We designed our paradigm with the following specifications in mind: (i) the relevant words \(g\)
and their values \(\beta\) should be set exogenously and provided to participants; (ii) participants should
be asked to form queries based on \(g\) and \(\beta\); (iii) the game should reflect the fact that creating
longer search queries is costly to users; (iv) the game should be incentive-aligned, i.e., participants'
payment should be a function of the value (as defined based on \(\beta\)) of the results of their queries
and the length of their queries; (v) the value of a query should be independent of the particular
computer on which the game is played; (vi) in order to focus exclusively on query formation, any
other type of search behavior such as evaluating results and clicking on links should be excluded
from the game; (vii) the game should capture the essence of query formation on search engines;
(viii) the game should be easy to explain to participants.
Taking these into consideration, we developed a search query game that asks each participant
to form search queries on Google to win a cash bonus. This game is played in independent tasks.
In each task a participant is given a set of three words \(g = (t_1, t_2, t_3)\). Each word \(t_j\) is randomly
assigned a monetary value \(\beta_j\), which is either high ($2) or low ($1). The participant is asked
to form one search query based on the three given words, i.e., decide which word(s) to use and
in what order. For each query, we consider the pages associated with the top \(K = 10\) results. We
compute the value of each page based on the words it contains. For example, suppose the three
words in a task are "fruit", "salad", and "chicken", and their respective values are $2, $1, and $2. A
webpage that contains "chicken" and "salad" would have a value of $3. The score associated with
each query is the value of its best result (among the top 10 results), minus the cost of the query in
dollars. This cost simply equals the number of words in the query. In our previous example, the
query "salad" has a cost of $1, and the query "chicken fruit" has a cost of $2. This mimics the
various costs reviewed above that are associated with longer queries. Participants are informed that
their queries will be run automatically in the background and that the webpages associated with
the top 10 results will be scanned. The actual instructions of the game to participants are displayed
in Appendix 1. To ensure that participants understand the instructions, they are given a short quiz
after reading the instructions. Participants proceed to the game only after having answered all quiz
questions correctly.
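The scoring rule of the game can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name `query_score` is introduced here for illustration, pages are represented as sets of words, and the lists of search results are hypothetical rather than actual Google results.

```python
def query_score(query, word_values, top_results):
    """Score of a query: value of its best result minus $1 per word in the query.

    query: list of words typed by the participant.
    word_values: dict mapping each task word to its dollar value.
    top_results: list of sets, each holding the words found on one result page.
    """
    cost = len(query)  # the cost in dollars equals the number of words in the query
    best_value = max(
        (sum(v for w, v in word_values.items() if w in page) for page in top_results),
        default=0,  # assumption: a query with no results has a best value of 0
    )
    return best_value - cost


word_values = {"fruit": 2, "salad": 1, "chicken": 2}

# Hypothetical results for the one-word query "salad": the best page is worth $3.
score_salad = query_score(["salad"], word_values, [{"salad"}, {"chicken", "salad"}])

# Hypothetical results for "chicken fruit": the best page is worth $4, the query costs $2.
score_chicken_fruit = query_score(["chicken", "fruit"], word_values, [{"chicken", "fruit"}])
```

In this made-up example both queries score $2, which shows how a shorter query can match a longer one when its results happen to contain valuable words that were not typed.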
One obvious approach for participants is to form a query with all three words, which costs $3
and has the highest value (assuming all three words will be found on at least one page). However,
participants may be able to reduce the cost of the query and maintain its value by leveraging the
semantic relationships among these words. Moreover, picking a word with low value has zero
marginal return to a participant using preference-based search only (both its value and its cost are
equal to $1). However, if a low-value word has a positive probability of activating high-value
words, then picking a low-value word may have a positive marginal return to a user engaging in
semantic-based search. Figure 1 shows an example in which a participant is asked to search for
"milk", "cheese", and "tea". In Figure 1(a), the participant is forming their query by deciding
which words to use and in which order. In this case, although "cheese" and "tea" are worth more
than "milk", only "milk" has a strong association with both of the other two words. This implies
that forming the one-word query "milk" may achieve the highest score.
After submitting a search query, on the next page the participant is shown the URL of the link
with the highest score, the list of words that are found on that page, and the score for this task.²
For example, Figure 1(b) is the result after a participant submits the query "tea cheese", in which
they pick the two words with a higher value, consistent with preference-based search. Figure 1(c)
displays the result after submitting the query "milk", consistent with semantic-based search.
[Insert Figure 1 Here]
Recall that we count whether the word appears anywhere on the actual webpage associated
with the search result, not just the title and snippet provided by Google. Also, it does not matter
how many times each word appears on the webpage associated with a result, as long as it appears
at least once. Participants are not allowed to use any other website while playing the game. We
enforced this by running the study in a lab in which we could observe and control the sites accessed
by participants.
² Sometimes there are multiple links that give the same maximum score for a query. In this case, we only present
one of them.


By manipulating information needs and costs exogenously, our query formation game allows
us to distinguish preference-based search from semantic-based search empirically. However, the setting
of the game is somewhat artificial. Therefore, we view this game as an instrument for testing the
existence of semantic-based search by consumers, but not for measuring the extent to which consumers engage in semantic-based search in real life. An analogy may be made to the experimental
economics literature. Games such as the dictator game are used in this literature to show that
individuals have the potential to behave in ways that are inconsistent with maximizing their own
economic well-being, although these games do not quantify the extent of such behavior in real life.

4.2 Methods

We chose nouns as our words. We selected these words mainly from the food domain, because
this is a very common domain on which we expect all participants to have at least some knowledge.
Each participant completed 10 tasks in a random order, i.e., 10 rounds of the game. We formed 10
overlapping sets of three words using the following 14 unique words: caffeine, cake, candy, cheese,
drink, Easter, egg, fish, ketchup, milk, pizza, sugar, tea, tomato. In the study, we randomized
the order in which the words were displayed to participants in each task, in order to avoid any
potential ordering effect. We also varied the word values \((\beta_1, \beta_2, \beta_3)\) by selecting randomly (with
equal probabilities) one of the four sets for each of the 10 tasks and each participant: ($2, $2, $2),
($1, $2, $2), ($2, $1, $2), and ($2, $2, $1).
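The randomization described above can be sketched in Python as follows. This is an illustrative sketch only: the function name `draw_task`, the use of a seeded generator, and the choice to attach values to words before shuffling the display order are assumptions made here, not details reported for the study.

```python
import random

# The four sets of word values (in dollars) used in the study.
VALUE_SETS = [(2, 2, 2), (1, 2, 2), (2, 1, 2), (2, 2, 1)]


def draw_task(words, rng):
    """Draw one value set with equal probability and shuffle the display order of the words."""
    values = rng.choice(VALUE_SETS)  # each of the four sets has probability 1/4
    word_values = dict(zip(words, values))
    display_order = list(words)
    rng.shuffle(display_order)  # randomized order avoids ordering effects
    return word_values, display_order


rng = random.Random(0)  # fixed seed so the sketch is reproducible
task_words = ("milk", "cheese", "tea")
word_values, display_order = draw_task(task_words, rng)
```

Calling `draw_task` once per task and per participant reproduces the structure of the design, in which at most one of the three words carries the low value.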
We formed these 10 sets of words so that different types of queries would be optimal across
tasks. In Table 1, we present the 10 sets, along with their corresponding optimal queries. There are
seven tasks in which there exists one "trigger" word that can activate both of the other two words in the
search result. For these cases, forming a query using the trigger word alone is the only optimal
query, irrespective of the set of word values. The words in the remaining three tasks have weaker
semantic relationships with each other, and forming queries using two words is optimal in these
cases. In these three tasks, different queries may be optimal based on the particular set of word
values, and more than one query may be optimal for a given set of values. Note that the same word
may be a trigger word in one task and a non-trigger word in another task.
[Insert Table 1 Here]
Before running the study, we ran on Google all possible queries that may be formed in each
task, and downloaded the source code of the web pages related to the top 10 results associated with
each query. We scanned all the pages to identify which of the target words were included in each
page. We ran all queries on a single computer to ensure that the results given to participants during
the game would not be dependent on the computer on which the query was run. We used these
results during the game, i.e., we did not actually run any query during the game.³
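The page-scanning step described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the study: the helper name `words_in_page`, the regular-expression tag stripping, and the whole-word, case-insensitive matching rule are all hypothetical choices, and the three-word set is merely drawn from the 14 words listed above.

```python
import re

# Illustrative three-word set drawn from the 14 study words (not a documented task).
TARGET_WORDS = ["cake", "candy", "sugar"]


def words_in_page(html, target_words):
    """Return the set of target words found in a page's text (hypothetical helper)."""
    # Crude tag stripping; an assumption, real pages may need a proper HTML parser.
    text = re.sub(r"<[^>]+>", " ", html).lower()
    found = set()
    for word in target_words:
        # Whole-word, case-insensitive match (assumed matching rule).
        if re.search(r"\b" + re.escape(word.lower()) + r"\b", text):
            found.add(word)
    return found


page = "<html><body><h1>Candy recipes</h1><p>Add sugar slowly.</p></body></html>"
print(sorted(words_in_page(page, TARGET_WORDS)))
```

Applying such a function to each of the top 10 pages saved for a query gives, for every page, the set of target words it contains, which is the information needed to score a query.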
The score in each task in the study can range from $2 to $5. For each participant, we randomly
chose at the end of the game the score from one of the 10 tasks and paid that amount as a bonus
to the participant, in addition to a $3 show-up fee. After participants finished the game, we also
collected demographic variables and measures of domain knowledge and search experience. Prior
research has shown that these factors might influence how users form their queries on the web
(Hölscher and Strube, 2000; Hsieh-Yee, 2001). However, we did not find significant variation in
these measures, which may be because our participants all had similar levels of knowledge and
search experience. Therefore, we did not use these variables in our analysis.

³ We also re-ran these queries using different computers, and the optimal queries and results were mostly consistent.

5 Evidence for Semantic-Based Search

We obtained results from N=108 participants recruited at a large university in the northeast of the United States. We first calculate the total score across the 10 tasks for each participant, and compare it to the best achievable score for that participant. Figure 2 displays the histogram of participants' percentage deviation from the optimal score. The large variation suggests that there is heterogeneity in participants' query choices. We also report the average score across participants for each task (i.e., set of words) and each round (i.e., position of the task in the game) in Figure 3. Figure 3(a) indicates that performance varies across tasks. Figure 3(b) shows very stable performance over rounds, which suggests that participants did not learn over time.
[Insert Figures 2 and 3 Here]
We then analyze the actual queries formed by participants. Table 2 summarizes the distribution
of the length of participants' queries, crossed with whether the query is optimal. Participants were
most likely to form queries with two words (56%), followed by one word (30%) and three words (14%).
Overall, 24% of the queries were optimal. Queries were more likely to be optimal conditional on
having one word: 65% of the one-word queries were optimal. These observations suggest that at
least some participants were able to leverage the semantic relationships between words to increase
their score, and were able to recognize some cases in which a single-word query was optimal.
Additional evidence in support of semantic-based search may be found by looking specifically at
words that were valued at $1. Recall that with probability 0.25, all three words were valued at $2,
and with probability 0.75, two words were valued at $2 and one was valued at $1. We find that in
21% of the cases in which one word was assigned a value of $1, participants formed a one-word
query containing the $1 word. In these situations, the participants favored the $1 word over both
$2 words, which would not be optimal under pure preference-based search. Similarly, when one
word was assigned a value of $1, 32% of the two-word queries contained the $1 word, which was
favored over the third word valued at $2.
However, one may wonder whether this pattern of results may be the result of participants
forming queries randomly. We find that participants formed shorter queries in tasks in which the
optimal query had only one word. The average query length was 1.76 when the optimal query had
one word, vs. 2.01 when the optimal query had two words (p-value < 0.01). Such a pattern of results
is not consistent with participants forming queries completely randomly.
To further test for the existence of semantic-based search, we compare how frequently participants used each word when its value was $2 versus $1, depending on whether it was optimal to
use the word. We find that among all cases in which a word was valued at $1 and it was optimal
to use this word, the word was actually used in 65.19% of the queries. This proportion dropped
to 47.17% among cases in which a word was valued at $1 and it was NOT optimal to use it. A
chi-square test reveals that these two proportions are significantly different (p-value < 0.01), confirming that at least some consumers have the ability to leverage semantic relationships in search.
However, the fact that the proportions are quite far from 100% and 0% respectively also suggests
that participants did not leverage semantic relationships to their full potential. Among all cases in
which a word was valued at $2 and it was optimal to use this word, the word was used in 63.95%
of the queries. This proportion dropped only slightly to 61.42% when considering cases in which
a word was valued at $2 and it was NOT optimal to use the word. The difference in proportions is
not significant (p-value= 0.20). The fact that the use of $2 words was not significantly affected by
whether it was optimal to use them suggests that despite the existence of semantic-based search,
preference-based search played a large role in driving query formation in our data.
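The comparison of two usage proportions can be illustrated with a short sketch. The paper does not report the underlying counts or whether a continuity correction was applied, so the sketch below assumes a plain Pearson chi-square test on a 2x2 table and uses hypothetical counts; the function name is also an assumption.

```python
import math


def chi_square_two_proportions(used_a, total_a, used_b, total_b):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table."""
    table = [[used_a, total_a - used_a], [used_b, total_b - used_b]]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [used_a + used_b, n - used_a - used_b]
    stat = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / n
            stat += (table[r][c] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts (NOT the study's data): the word is used in 130 of 200
# cases where using it is optimal, and in 94 of 200 cases where it is not.
stat, p = chi_square_two_proportions(130, 200, 94, 200)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

The same function applies to both comparisons in this paragraph (words valued at $1 and words valued at $2); only the counts change.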
Finally, we can compare how likely the same word was to be used when it was the trigger
(i.e., when it could activate the other two words) vs. not. In our design, five words were used in
two different tasks, and were triggers in only one of these tasks. For two out of these five words,
we observe a significant increase in the probability of being included in the query when the word
is a trigger vs. not (candy: 73% vs. 37%, p-value< 0.01; Easter: 94% vs. 69%, p-value< 0.01).
However, there was no significant difference for two words (sugar: 43% vs. 37%, p-value = 0.40;
tomato: 35% vs. 35%, p-value = 1.00), and one word was actually significantly less likely to
be used when it was a trigger (cake: 57% vs. 67%, p-value < 0.05). This further suggests that
although semantic-based search exists, it may not be as prominent as would be optimal. This also suggests
that consumers may have erroneous beliefs on semantic relationships, which we will address in
Sections 7 and 8.
[Insert Table 2 Here]
To sum up, we find that performance varies across participants and tasks, but not over time. The
behavior we observe suggests that participants are able to leverage semantic relationships between
words, at least to some extent. However, the choice of whether a given word should be included
in a query seems to have been largely driven by the value of this word, which is consistent with
preference-based search being dominant in our data. Because our study uses a somewhat artificial
lab setting, we do not claim that the extent to which consumers use semantic-based search in
the real world is the same as in our study. Instead, we view our results as proof of existence
that semantic-based search by consumers is relevant. We believe this evidence should be enough
to convince researchers and practitioners that they should consider semantic-based search when
building models that link queries to information needs.

6 Parsimonious Representations of Semantic Relationships

Our results so far suggest that it would be unreasonable for researchers or practitioners to build models of query formation, or to attempt to learn consumers' information needs from their queries, without modeling how these queries are influenced by consumers' beliefs on the relevant semantic relationships.
These semantic relationships are captured by the parameters \(\omega_{qs}\) in the framework introduced in Section 3, where \(\omega_{qs}\) is the probability that a randomly selected result from the top K results from query q contains exactly the words in set \(s \subseteq g\). For a search task with three words, we have 24 unknown parameters, which are displayed in Table 3. Each row contains the probability of each possible outcome for a given query, and the sum within each row is one. Because we find that the top K results always contain the words in the search query (at least in our data), some outcomes happen with probability zero or one for certain queries. Moreover, the empty set (i.e., no word is found on the webpage) is an outcome that happens with probability zero, and it is therefore omitted.
Unfortunately, the number of \(\omega_{qs}\) parameters grows exponentially with the number of relevant words. Moreover, these parameters are unique to each domain. Therefore, measuring them systematically would be very computationally costly, and estimating consumers' beliefs on these probabilities on a large scale would be practically infeasible. In this section, we address this problem by discussing three possible approximations of these semantic relationships, which are based on the IR literature and which we test on actual data from Google. Note that in this section we focus on approximating the true semantic relationships. We will study consumers' beliefs on these semantic relationships in Sections 7 and 8.
[Insert Table 3 Here]

6.1 Independence in Pages, Independence in Queries, and Symmetry Approximations

The first approximation, independence in pages, simplifies the semantic relationships from being from queries to sets of words to being from queries to individual words. Let \(a_{q \to j}\) denote the activation probability of query q on word \(t_j\). It is defined as the probability of observing word \(t_j\) on the webpage of a random top result when submitting q. The number of possible \(a_{q \to j}\)'s is much smaller than the number of possible \(\omega_{qs}\)'s. The independence in pages approximation assumes that the occurrence of a word in a page is independent of the other words on that same page. In that case, \(\omega_{qs}\) may be approximated as the product of the probabilities that each word \(t_j \in s\) appears in the page and that each word in \(g \setminus s\) does not appear in the page, i.e.,

\[ \omega_{qs} \approx \prod_{j \in s} a_{q \to j} \prod_{k \in g \setminus s} \left(1 - a_{q \to k}\right). \tag{3} \]

A similar assumption has been used in traditional IR models. For example, the probability of relevance framework computes the relevance between a query and a document by assuming that the document is represented as a vector of different features which are independent events (Robertson and Zaragoza, 2009).
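As an illustration of the independence in pages approximation in Equation (3), the following sketch multiplies word-level activation probabilities to approximate the probability that a page contains exactly a given set of words. The function name and the numerical activation probabilities are hypothetical and chosen only for the example.

```python
from itertools import combinations


def approx_set_probability(activation, target_words, s):
    """Equation (3): approximate probability that a page contains exactly the words in s."""
    prob = 1.0
    for word in target_words:
        if word in s:
            prob *= activation[word]        # word must appear in the page
        else:
            prob *= 1.0 - activation[word]  # word must be absent from the page
    return prob


# Hypothetical activation probabilities a_{q -> j} for a one-word query "candy".
words = ["candy", "sugar", "cake"]
activation = {"candy": 1.0, "sugar": 0.8, "cake": 0.5}

p = approx_set_probability(activation, words, {"candy", "sugar"})

# The approximated probabilities over all subsets of the target words sum to one.
subsets = [set(c) for r in range(len(words) + 1) for c in combinations(words, r)]
total = sum(approx_set_probability(activation, words, s) for s in subsets)
print(p, total)
```

Because the query word has activation probability one in this example, every subset that excludes it receives probability zero, in line with the observation that top results always contain the query words.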
The second approximation, independence in queries, simplifies the activation probabilities further from being from queries to individual words, to being from individual words to individual words. It assumes that the activation probabilities from different words to the same target word are independent from each other, and that the order of these words in a query does not matter. This is similar to the bag-of-words approach that is commonly used in IR models (see Section 2.2.2), combined with the assumption that different terms within a query are independent of each other, which for example has been assumed in the probability of relevance framework (Robertson and Zaragoza, 2009) and the Unigram language model (Zhai, 2008). Mathematically, the independence in queries approximation implies:

\[ a_{q \to j} \approx 1 - \prod_{t_k \in q} \left(1 - a_{k \to j}\right), \tag{4} \]

where the activation probability \(a_{k \to j}\) is defined from word \(t_k\) to word \(t_j\), i.e., it is the probability that a top result contains word \(t_j\) given the one-word query \(t_k\). The intuition behind Equation (4) is that the probability of activating a target word equals the probability that at least one of the words in the query activates this word.
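Equation (4) can be sketched in a few lines. The word-to-word activation probabilities below are hypothetical, and the function name is an assumption; the sketch also applies the convention that a word activates itself with probability one.

```python
def query_activation(word_activation, query, target):
    """Equation (4): probability that at least one query word activates the target word."""
    miss = 1.0
    for k in query:
        # A word is assumed to activate itself with probability one.
        a_kj = 1.0 if k == target else word_activation[(k, target)]
        miss *= 1.0 - a_kj
    return 1.0 - miss


# Hypothetical word-level activation probabilities a_{k -> j}.
word_activation = {("egg", "fish"): 0.3, ("milk", "fish"): 0.2}

a = query_activation(word_activation, ["egg", "milk"], "fish")
print(a)
```

With these values, the two-word query activates the target with probability 1 - 0.7 x 0.8, which is higher than either word achieves alone.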
Given both the independence in pages and independence in queries approximations, the semantic relationships relevant to query formation can be specified based on a directed graph where
nodes represent words, and edges represent asymmetric activation probabilities between pairs of
words. These approximations are extremely convenient, as they reduce the number of parameters
dramatically. With W words, the number of relevant semantic relationships is in the order of \(2^{2W}\) (the number of possible queries is in the order of \(2^W\) and the number of possible sets of words is \(2^W\)), i.e., it grows exponentially with W. With the two independence assumptions, these semantic relationships may be expressed as functions of only \(W(W-1)\) asymmetric activation probabilities. That is, the number of asymmetric activation probabilities at the word level grows only polynomially with W. For example, for a domain with three words, we only need six parameters (\(a_{1 \to 2}\), \(a_{2 \to 1}\), \(a_{1 \to 3}\), \(a_{3 \to 1}\), \(a_{2 \to 3}\), \(a_{3 \to 2}\)). Here we assume that \(a_{k \to k} = 1\), i.e., all top K results from a single-word query contain that word (which is true in our data). This assumption may be relaxed, in which case the number of asymmetric activation probabilities would be \(W^2\).
The third and last approximation, symmetry, simplifies the activation probabilities further by assuming that they are symmetric, i.e., \(a_{j \to j'} \approx a_{j' \to j}\). If all three approximations were valid, the relevant semantic relationships could be approximated by functions of only \(W(W-1)/2\) parameters. Note that as far as we know, this assumption has not been explicitly used in traditional IR models.

6.2 Empirical Test of Approximations

We test the three approximations described above using the true semantic relationships on Google from Study 1. Specifically, we compute \(\omega_{qs}\), the actual proportion of the top 10 results from query q that contain exactly the words in set s.
To test the independence in pages approximation, we compute \(a_{q \to j}\), the proportion of the top 10 results from query q that contain word \(t_j\), for all query-word combinations in Study 1. We estimate a linear regression model where the dependent variable is the true \(\omega_{qs}\), and the regressor is the approximation \(\hat{\omega}_{qs}(\text{ind pages}) = \prod_{j \in s} a_{q \to j} \prod_{k \in g \setminus s} (1 - a_{q \to k})\), including an intercept. Because the dependent variable is constrained to be between 0 and 1, we constrain the coefficients to be between 0 and 1, and their sum to be no greater than 1. None of these constraints are binding, and we ignore these constraints when computing confidence intervals on the coefficients. To compute confidence intervals, we take into account the fact that observations from the same task are not independent from each other. For instance, \(\omega_{\{1,2\}3}\) is likely to be correlated with \(\omega_{\{1\}3}\). This kind of correlation between observations within a group is also called an intraclass correlation, which will cause the standard errors of the estimates from regular ordinary least squares to be biased. We correct for this using clustered robust standard errors (Rogers, 1994). We find that the estimated model has an intercept of 0.0131 (p-value < 0.02) and a slope of 0.9552 (p-value < 0.01). The \(R^2\) of the linear model is 0.907. Therefore, we can conclude that at least in this dataset, the independence in pages approximation seems to be valid.
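The core of this test, regressing the true proportions on the approximated ones, can be sketched as below. This is a simplified illustration with hypothetical data: it fits an unconstrained simple regression and reports the intercept, slope, and R-squared, and it does not implement the coefficient constraints or the clustered robust standard errors described above.

```python
def ols_fit(x, y):
    """Simple least-squares fit of y on x with an intercept; returns (intercept, slope, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Hypothetical approximated vs. true proportions (NOT the study's data).
approx = [0.0, 0.1, 0.4, 0.5, 0.9]
true = [0.0, 0.2, 0.3, 0.6, 0.9]
intercept, slope, r2 = ols_fit(approx, true)
print(intercept, slope, r2)
```

An intercept near zero, a slope near one, and a high R-squared would indicate that the approximation tracks the true proportions closely.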
Given that the independence in pages approximation appears to be valid, next we consider the approximated probabilities based on both the independence in pages and the independence in queries approximations. The approximated semantic relationships become \(\hat{\omega}_{qs}(\text{ind pages + ind queries}) = \prod_{j \in s} \hat{a}_{q \to j} \prod_{k \in g \setminus s} (1 - \hat{a}_{q \to k})\), where \(\hat{a}_{q \to j} = 1 - \prod_{t_k \in q} (1 - a_{k \to j})\) and \(a_{k \to j}\) is the proportion of the top 10 results from query \(t_k\) that contain word \(t_j\). We then repeat the same regression analysis as above, using \(\hat{\omega}_{qs}(\text{ind pages + ind queries})\) instead of \(\hat{\omega}_{qs}(\text{ind pages})\). We find that the fitted regression line has an intercept of 0.0181 (p-value = 0.40) and a slope of 0.9418 (p-value < 0.01). The \(R^2\) of the linear model is 0.782. These results suggest that the approximation \(\hat{\omega}_{qs}(\text{ind pages + ind queries})\) also fits the truth well. Therefore, at least in this dataset, the independence in pages and independence in queries approximations appear to be jointly valid.
Therefore, the parameters needed to capture the semantic relationships may be reduced to a much smaller set of pairwise activation probabilities at the word level. For these simple relationships, we finally explore the symmetry approximation. For each pair of words \(t_j\) and \(t_{j'}\), we compare \(a_{j \to j'}\) to \(a_{j' \to j}\). With three pairs per task, we have 30 pairs to compare in total. A paired two-sample t-test would not be appropriate here, because the labeling of j vs. j' is arbitrary, i.e., observations are not naturally split into two samples. Instead, we compare the maximum activation probability \(\max\{a_{j \to j'}, a_{j' \to j}\}\) to an activation probability that is randomly selected between \(a_{j \to j'}\) and \(a_{j' \to j}\). In other words, we consider a hypothetical user who would need to select one word in order to activate both, and compare the performance achieved when using the optimal word vs. choosing one word randomly (where each word has an equal probability of being chosen). We use a bootstrapping approach, where at each iteration we randomly draw 30 pairs of words with replacement. We compute the average activation probability for the two samples (\(\max\{a_{j \to j'}, a_{j' \to j}\}\) vs. the randomly selected one) at each iteration. Figure 4 displays the sample distributions with 1,000 bootstrapping iterations. We see a large difference between the optimal sample (solid line) and the random sample (dashed line). This suggests that it would be incorrect to model semantic relationships based on symmetric activation probabilities.
[Insert Figure 4 Here]
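The bootstrap comparison used for the symmetry test can be sketched as follows. The pairs of activation probabilities are hypothetical (the study uses 30 pairs), and the function name, seed, and data layout are assumptions made for illustration.

```python
import random


def bootstrap_max_vs_random(pairs, n_iter=1000, seed=0):
    """Bootstrap the mean of max(a, a') and of a randomly chosen direction over word pairs."""
    rng = random.Random(seed)
    max_means, random_means = [], []
    for _ in range(n_iter):
        # Resample the word pairs with replacement.
        sample = [rng.choice(pairs) for _ in pairs]
        max_means.append(sum(max(p) for p in sample) / len(sample))
        # For each resampled pair, pick one of the two directions with equal probability.
        random_means.append(sum(rng.choice(p) for p in sample) / len(sample))
    return max_means, random_means


# Hypothetical pairs (a_{j -> j'}, a_{j' -> j}); NOT the study's data.
pairs = [(0.9, 0.1), (0.6, 0.5), (0.3, 0.0), (1.0, 0.4)]
max_means, random_means = bootstrap_max_vs_random(pairs)
print(sum(max_means) / len(max_means), sum(random_means) / len(random_means))
```

If activation probabilities were symmetric, the two bootstrap distributions would coincide; a clear separation between them is what indicates asymmetry.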
In sum, we find that the independence in pages and independence in queries approximations seem to be valid, but not the symmetry approximation. That is, the set of semantic relationships may be approximated as functions of asymmetric activation probabilities between individual words. Imposing such structure on the semantic relationships enables us to parameterize them more parsimoniously, reducing the dimensionality from being in the order of \(2^{2W}\) to \(W(W-1)\). This will help us estimate consumers' beliefs on these relationships in the next section. We note that while we focused on the top K = 10 search results per query in this analysis, we also tested these approximations based on the top 30 and 50 results, and reached the same conclusions. Details are available from the authors. We also note that we focus on short queries in this paper (which have been shown to be more common, see Section 2.2), and these approximations may need to be tested again on longer queries.

7 Consumers' Beliefs on Activation Probabilities

The previous section suggested that the complex set of semantic relationships relevant to semantic-based search may be parameterized parsimoniously based on asymmetric activation probabilities at the level of individual words, rather than sets of words. The last key step toward being able to build models of query formation is to specify consumers' beliefs on these asymmetric activation probabilities. In particular, if we find that consumers' beliefs are close to the truth, it may be possible to build models that simply assume consumers' beliefs are correct, which would address the issue of empirically identifying beliefs from information needs. Alternatively, if we find that beliefs deviate from the truth in some systematic ways, it may be possible to build models that express consumers' beliefs as parsimonious functions of the truth, and that jointly estimate consumers' information needs and the (small) set of parameters on which their beliefs depend.
In this section we explore this issue by estimating users' beliefs based on their query choices in Study 1. Empirically identifying beliefs from information needs is not an issue in this particular study, because we manipulated information needs exogenously. In the next section, we report the results of another experiment in which we measured these beliefs directly.

7.1 Estimating Users' Beliefs Based on Their Queries: Empirical Approach

In order to estimate participants' beliefs on activation probabilities in Study 1, we specify a choice model that captures query formation as an outcome of the participant maximizing their expected payoff from each task, given their beliefs on activation probabilities. In this choice model, the utility of a query, \(U(q \mid \beta_i, \{\tilde{a}_{j \to j'}\}, \gamma)\), is the expected value of the best score among the top K = 10 results retrieved by the query, given the (known) preference vector \(\beta_i\), the user's beliefs on the set of activation probabilities \(\{\tilde{a}_{j \to j'}\}\), and a risk parameter \(\gamma > 0\) (\(\gamma = 1\) implies risk neutrality, \(\gamma < 1\) risk aversion, and \(\gamma > 1\) risk seeking). This utility is derived based on Equation (2), which specified the expected value of one random top result retrieved by a query. That is, we derive the closed-form expression for the expected value of the best result, given the expected value of each result. Details are provided in Appendix 2.
Based on this utility function, we model the formation of a search query in our study using a multinomial logit model. The probability of choosing query q among Q possible queries can then be expressed as:

\[ \Pr\left(q \mid \beta_i, \{\tilde{a}_{j \to j'}\}, \gamma, \lambda\right) = \frac{\exp\left(\lambda\, U(q \mid \beta_i, \{\tilde{a}_{j \to j'}\}, \gamma)\right)}{\sum_{q'=1}^{Q} \exp\left(\lambda\, U(q' \mid \beta_i, \{\tilde{a}_{j \to j'}\}, \gamma)\right)} \tag{5} \]

where \(\lambda\) is a logit scale parameter. In our data, the parameters \(\beta_i\) are known as they were selected by us and communicated to the participants. The only parameters to estimate are the logit scale parameter \(\lambda\), the risk parameter \(\gamma\), and the beliefs \(\{\tilde{a}_{j \to j'}\}\).
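The multinomial logit in Equation (5) can be illustrated with a short sketch. The utilities below are hypothetical placeholders; computing them from beliefs, word values, and the risk parameter follows Appendix 2 and is not reproduced here. The function and argument names are assumptions.

```python
import math


def query_choice_probabilities(utilities, scale):
    """Equation (5): logit choice probabilities given each candidate query's utility."""
    # Subtract the maximum scaled utility for numerical stability (does not change the result).
    top = max(scale * u for u in utilities.values())
    weights = {q: math.exp(scale * u - top) for q, u in utilities.items()}
    total = sum(weights.values())
    return {q: w / total for q, w in weights.items()}


# Hypothetical utilities for three candidate queries (NOT estimates from the study).
utilities = {"candy": 4.2, "candy sugar": 3.9, "cake": 3.1}
probs = query_choice_probabilities(utilities, scale=2.0)
print(probs)
```

A larger scale parameter makes choices more deterministic, concentrating probability on the highest-utility query, whereas a scale of zero yields uniformly random query choices.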
Each pair of words appeared in only one task. Therefore, the likelihood function is separable in the parameters \(\{\tilde{a}_{j \to j'}\}\), i.e., each parameter enters into the likelihood corresponding to one task only. Accordingly, we estimate the model for each of the 10 tasks separately. Because each participant played each task only once, we impose some structure on the heterogeneity across participants in order to estimate the beliefs at the individual level. We capture heterogeneity across participants using a latent class approach. For a given task, the likelihood function with S segments of beliefs can be expressed as:

\[ L(\Theta) = \prod_{i=1}^{I} \sum_{s=1}^{S} \pi_s \prod_{q=1}^{Q} \Pr\left(q \mid \beta_i, \{\tilde{a}^{s}_{j \to j'}\}, \gamma, \lambda\right)^{d_{iq}} \tag{6} \]

where \(\Theta\) denotes the set of parameters to estimate, I is the number of participants, \(\pi_s\) is the share of segment s (for identification purposes we assume \(\pi_s\) is decreasing in s, i.e., the first segment has the largest share), \(d_{iq} \in \{0, 1\}\) denotes whether participant i formed query q, and \(\{\tilde{a}^{s}_{j \to j'}\}\) are the beliefs in segment s. For each task, the beliefs are captured by six parameters \(\tilde{a}_{j \to j'}\) for \(j \neq j' \in \{1, 2, 3\}\). We estimate the parameters by maximizing the above likelihood function, while constraining all the parameters \(\tilde{a} \in [0, 1]\).
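The latent class likelihood in Equation (6) can be evaluated as in the sketch below, which works on the log scale. The data layout, function name, and numerical values are assumptions for illustration; the segment-specific choice probabilities are taken as given here, whereas in the estimation they would come from Equation (5).

```python
import math


def latent_class_log_likelihood(choices, segment_shares, segment_probs):
    """Log of Equation (6) for one task.

    choices[i]             -- the query formed by participant i
    segment_shares[s]      -- share of segment s
    segment_probs[s][i][q] -- probability of query q for participant i under segment s beliefs
    """
    log_lik = 0.0
    for i, chosen in enumerate(choices):
        # The indicator d_iq selects the chosen query, so the inner product over
        # queries reduces to the probability of that single query.
        mixture = sum(
            share * probs[i][chosen]
            for share, probs in zip(segment_shares, segment_probs)
        )
        log_lik += math.log(mixture)
    return log_lik


# Hypothetical example with two participants and two segments (NOT the study's data).
choices = ["candy", "cake"]
segment_shares = [0.7, 0.3]
segment_probs = [
    [{"candy": 0.6, "cake": 0.4}, {"candy": 0.6, "cake": 0.4}],
    [{"candy": 0.2, "cake": 0.8}, {"candy": 0.2, "cake": 0.8}],
]
log_lik = latent_class_log_likelihood(choices, segment_shares, segment_probs)
print(log_lik)
```

Maximizing this quantity over the beliefs, segment shares, scale, and risk parameter, subject to the beliefs lying in [0, 1], would require a numerical optimizer, which is outside the scope of this sketch.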

7.2 Estimation Results

We use the AIC to select the optimal number of segments for each task. In Table 4, we report the
AIC for up to three segments. We can see that S = 1 is optimal for Tasks 4, 5, and 10, whereas S = 2
is optimal for all the remaining tasks. We present the parameter estimates from the best model for
each task in Table 5. Overall, participants tend to be slightly risk-averse. For the seven
tasks with two segments, Segment 1 (the larger segment) has relatively weaker beliefs (i.e., the
belief parameters have lower values) and a share of around 65-85%, while Segment 2 has stronger beliefs. We
also compare the estimated beliefs on activation probabilities for the same pair of words in both
directions. It seems that for some pairs of words (e.g., Task 4 Segment 1) the beliefs are highly
asymmetric, while for others (e.g., Task 1 Segment 1) the beliefs are closer to being symmetric.
[Insert Tables 4 and 5 Here]

7.3 Accuracy of Consumers' Beliefs and Systematic Biases

We now turn to the question of whether consumers' beliefs are accurate, and whether there exist systematic biases in these beliefs. Based on the estimates in Table 5, we calculate the posterior estimate of each participant's belief on each activation probability in each task. This gives us 6,480 observations: 6 activation probabilities per task for 10 tasks from a total of 108 participants. The distribution across all observations of the difference between the estimated belief and the truth is presented in Figure 5. Positive values mean that participants' beliefs are larger than the truth. The distribution is slightly right-tailed, with a mean of 0.0753 and a standard deviation of 0.2450.
[Insert Figure 5 Here]
Next we explore systematic biases in consumers' beliefs. Aitchison (2012) suggests that humans form semantic relationships from their mental lexicons, which are developed through education, experience, and context, based on word similarity in meaning or sound. Hence, if two words share strong similarity, consumers will form strong associations between them, no matter which one is the target word in a directional relationship. This suggests that consumers' beliefs on activation probabilities may not be asymmetric enough, which is consistent with mental distance approaches in cognitive psychology (Shepard, 1962). Mental distance approaches assume that concepts may be represented as points within a mental space. Similarity between concepts is related to the distance between them in that space, which is symmetric. For example, if a consumer feels that Coke is similar to Pepsi, then it follows that they must feel Pepsi to be equally similar to Coke. However, other theories, such as featural approaches (Tversky, 1977), suggest that consumers' representations are asymmetric. For example, Johnson (1986) found that a consumer may find Coke to be very similar to Pepsi and at the same time Pepsi to be less similar to Coke. In sum, we may expect consumers' beliefs to have some level of asymmetry, but to be less asymmetric than the truth.
To test and quantify such bias, we estimate a model that specifies a user's belief \(\tilde{a}_{j \to j'}\) as a function of the truth \(a_{j \to j'}\) and the relationship in the opposite direction \(a_{j' \to j}\). This model is developed based on the truth and bias (T&B) model of judgment proposed by West and Kenny (2011). In addition to the potential bias toward \(a_{j' \to j}\), we allow for bias toward the ends of the scale. Accordingly, we build the following constrained linear model, where consumers are indexed by i:

\[ \tilde{a}_{i, j \to j'} = b_0 \cdot 0 + b_1 \cdot 1 + b_{\text{truth}}\, a_{j \to j'} + b_{\text{opposite}}\, a_{j' \to j} + \varepsilon_{ijj'}. \tag{7} \]

In this model, the parameters \(b_0\) and \(b_1\) capture directional bias toward 0 and 1 respectively; the parameter \(b_{\text{truth}}\) denotes what West and Kenny (2011) refer to as the truth force; and the parameter \(b_{\text{opposite}}\) denotes what they refer to as a bias force. The random error term \(\varepsilon_{ijj'}\) is assumed to have a mean of zero. We impose the following constraints: \(b_0 + b_1 + b_{\text{truth}} + b_{\text{opposite}} = 1\), and all the coefficients are non-negative. These constraints guarantee that the fitted beliefs will fall into [0,1] in expectation.
We estimate this model by minimizing the sum of squared residuals subject to the above constraints on the parameters, based on the beliefs estimated in Study 1. We treat the estimation problem as a constrained quadratic programming problem. As quadratic programming can only give us a point estimate, we use bootstrapping to estimate the standard errors of the coefficients. One issue here again is that observations are not independent. Specifically, all observations related to the same pair of words are correlated with each other. In order to preserve such nested correlation structure in resampling, we adopt block bootstrapping for clustered data (Field and Welsh, 2007; Ren et al., 2010). We sample among the 30 pairs of words with replacement, and for each pair consider all observations across participants in both directions. We then obtain the solution to the quadratic programming problem for this resampled data. Our inference is based on the parameter estimates from 1,000 bootstrap iterations. We find that

\[ \tilde{a}_{i, j \to j'} = 0.1051 + 0.7442\, a_{j \to j'} + 0.1005\, a_{j' \to j} + \varepsilon_{ijj'}, \]

and \(b_0 = 0.0502\). All the coefficients are statistically significant (p-value < 0.01). These results indicate that participants' beliefs have significant directional bias toward both ends of the scale, especially toward 1, which is consistent with the observations in Figure 5. While the truth force plays the dominant role in the model, the bias force toward the activation probability in the opposite direction also seems to have a significant impact on consumers' beliefs. The residual of the fitted model has a mean of -0.0012 with a standard deviation of 0.2669. The mean square error is 0.0712, and the mean absolute value of the residuals is 0.1881.
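To illustrate what the Study 1 estimates imply, the sketch below plugs the reported coefficients into the fitted truth and bias model and evaluates it for a hypothetical, highly asymmetric pair of words. The coefficient values are those reported above; the pair of true activation probabilities and the function name are assumptions.

```python
# Coefficients reported above for Study 1.
STUDY1_COEFS = {"b0": 0.0502, "b1": 0.1051, "b_truth": 0.7442, "b_opposite": 0.1005}


def predicted_belief(a_truth, a_opposite, coefs):
    """Expected belief under the T&B model: b0*0 + b1*1 + b_truth*a + b_opposite*a_opposite."""
    return (
        coefs["b0"] * 0.0
        + coefs["b1"] * 1.0
        + coefs["b_truth"] * a_truth
        + coefs["b_opposite"] * a_opposite
    )


# Hypothetical pair with true activation probabilities 0.9 in one direction and 0.1 in the other.
forward = predicted_belief(0.9, 0.1, STUDY1_COEFS)
backward = predicted_belief(0.1, 0.9, STUDY1_COEFS)
print(forward, backward, forward - backward)
```

For this hypothetical pair, the true gap between the two directions is 0.8, while the predicted gap in beliefs is smaller, which is the sense in which beliefs are not asymmetric enough.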

8 Study 2

Our findings so far may be summarized as follows. Consumers leverage semantic relationships between words at least to some extent when forming search queries. Moreover, the relevant semantic relationships may be simplified as functions of asymmetric activation probabilities between individual words. When compared to the true activation probabilities, consumers' beliefs appear to be slightly biased upward, and to be not asymmetric enough. In Study 2 we test this last result further by measuring consumers' beliefs directly (in an incentive-aligned manner), instead of inferring them from query choices as in Study 1.

8.1 Design

We measured participants' beliefs on activation probabilities directly, for example by asking them to estimate how many of the top 10 search results from the query "egg" on Google contain the word "fish." Again, we told participants that a search result contains the word if it appears on the actual page, and not just in the description provided by Google. Participants chose a number between 0 and 10 as their answer, i.e., they entered their best guess of the number of results containing the target word. We formed 30 pairs of words using the same words as in Study 1, giving us 60 possible questions. We chose the pairs of words based on the true activation probabilities, to have a large range of \(a_{j \to j'}\) and a large range of \(|a_{j \to j'} - a_{j' \to j}|\) across pairs. Each participant answered 30 questions that were randomly selected from the pool. After participants answered all 30 questions, we presented them with the correct answers for all these questions (all at once). The correct answers were derived using the same approach as in Study 1. At the end of the survey, we collected demographics,
and also asked participants to describe how they made decisions for these questions. This study
was incentive-aligned. In addition to a $3 show-up fee, each participant won a $0.20 cash bonus
for each correct answer.

8.2 Results

We conducted this study in a lab experiment with N=206 participants. We first analyze participants' introspective statements about how they selected their answers. The most commonly used words in these statements include "relationship," "correlations," "related," "association," "similarity," and "match." Very few participants mentioned directionality as a factor. Hence, it seems that participants made choices by relying mostly on the general association and similarity between words, rather than the directional relationship. This finding is consistent with our previous finding that consumers' beliefs on activation probabilities are not asymmetric enough.
Among the 60 questions we selected, 0 is the correct answer for about one third. Hence, choosing 0 could be an attractive strategy for participants in this study. However, we found that all participants gave at least three different answers across questions. Therefore, no participant blindly selected 0 as their answer to all questions.


We now compare participants' beliefs to the true activation probabilities. The distribution (across participants and pairs) of the difference between participants' beliefs and the truth is displayed in Figure 6. The distribution has a mean of 0.1297 and a standard deviation of 0.3005. Note that the difference has a larger positive mean than was found in Study 1, i.e., the upward bias is more severe. We again estimate the T&B model (7) based on participants' answers in Study 2, using quadratic programming and block bootstrapping as we did in Study 1. The estimation results below are based on 1,000 bootstrap iterations:

$$\hat{a}_{i,jj'} = 0.1549 + 0.5481\,a_{jj'} + 0.2960\,a_{j'j} + \epsilon_{ijj'}$$


and $b_0 = 0.0010$. All the coefficient estimates are statistically significant (p-value < 0.01). The fitted error term has a mean of -0.0076 with a standard deviation of 0.2818. The mean squared error is 0.0795, and the mean absolute value of the residuals is 0.2338. We can see that there is a larger bias toward 1 compared to Study 1, which is consistent with what we observed in Figure 6. Compared to Study 1, we also find an even stronger effect of the opposite relationship $a_{j'j}$ on participants' beliefs, which is also consistent with participants' self-statements on how they made decisions. Therefore, the results of Study 2 are directionally similar to those in Study 1, although the bias was more severe in Study 2.
[Insert Figure 6 Here]
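As a rough illustration of how the fitted equation can be read, the sketch below plugs the Study 2 point estimates reported above into the linear truth-and-bias form, with the error term set to zero. The helper name `predicted_belief` and the example probabilities are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
# Point estimates reported for Study 2: intercept, weight on the true
# activation probability, and weight on the reverse-direction probability.
INTERCEPT = 0.1549
W_TRUTH = 0.5481
W_REVERSE = 0.2960


def predicted_belief(a_forward: float, a_reverse: float) -> float:
    """Predicted belief about the activation probability from word j to j'.

    a_forward is the true probability a_{jj'}; a_reverse is a_{j'j}.
    Hypothetical helper: it ignores the error term of the fitted model.
    """
    for p in (a_forward, a_reverse):
        if not 0.0 <= p <= 1.0:
            raise ValueError("activation probabilities must lie in [0, 1]")
    return INTERCEPT + W_TRUTH * a_forward + W_REVERSE * a_reverse


# Made-up asymmetric pair: j strongly activates j', but not the reverse.
strong = predicted_belief(0.9, 0.1)  # belief about the strong direction
weak = predicted_belief(0.1, 0.9)    # belief about the weak direction
print(round(strong, 4), round(weak, 4))
```

With these made-up inputs the true gap between the two directions is 0.8, while the two predicted beliefs differ by only about 0.2, which is one way to read the statement that beliefs are not asymmetric enough.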

Conclusions
To the best of our knowledge, our study is the first to document semantic-based search by consumers on online search engines. We find that consumers do not necessarily form queries that simply reflect their information needs. Rather, they have the ability to leverage semantic relationships in order to improve the efficiency of their queries. We show that the relevant semantic relationships may be approximated parsimoniously as functions of asymmetric activation probabilities at the word level. We further show that consumers' beliefs on these activation probabilities tend to be biased upward, and that they are not asymmetric enough.
The combination and integration of these results suggest a framework for building models of query formation. In particular:
• Models of query formation should capture semantic-based search by consumers.
• In order to capture semantic-based search, it is necessary to capture consumers' beliefs on a set of relevant semantic relationships. The set of relevant semantic relationships grows exponentially with the number of relevant words.
• These relevant semantic relationships may be approximated as functions of asymmetric activation probabilities at the level of individual words, based on the "independence in pages" and "independence in queries" approximations. This set of asymmetric activation probabilities grows only polynomially with the number of relevant words.
• Consumers' beliefs on the asymmetric activation probabilities may be specified as a function of an intercept, the true activation probabilities, and the activation probabilities in the other direction.
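The contrast between exponential and polynomial growth can be made concrete with a small counting sketch. It assumes, as in the note to Table 3, that a query is a non-empty ordered selection of distinct words and that an outcome is a non-empty unordered set of words; the function names are hypothetical.

```python
from math import factorial


def count_ordered_queries(n_words: int) -> int:
    """Number of non-empty ordered queries using each word at most once."""
    return sum(factorial(n_words) // factorial(n_words - k)
               for k in range(1, n_words + 1))


def count_result_sets(n_words: int) -> int:
    """Number of non-empty unordered sets of words a result page can contain."""
    return 2 ** n_words - 1


def count_pairwise_probabilities(n_words: int) -> int:
    """Number of directed word pairs (j, j'), one activation probability each."""
    return n_words * (n_words - 1)


for n in (3, 5, 10):
    print(n, count_ordered_queries(n), count_result_sets(n),
          count_pairwise_probabilities(n))
```

For three words this gives 15 queries and 7 outcome sets, matching the note to Table 3, against only 6 word-level activation probabilities. The number of outcome sets doubles with each added word, whereas the number of directed pairs grows quadratically.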
This proposed framework, by parameterizing beliefs very parsimoniously, allows building models that empirically identify information needs from beliefs, at least parametrically. In this paper we modeled consumers' beliefs based on the truth and bias (T&B) model (West and Kenny, 2011). Future research may explore alternative modeling frameworks and identify additional covariates that influence consumers' beliefs.
More generally, we hope our research will facilitate the development of marketing search models that go beyond discrete choices to incorporate text-based search. In particular, in today's environment search is primarily text-based, and marketing models of search should be adapted to capture this reality. It is important to keep in mind that despite being ubiquitous, online search queries are often only one type of behavior over the whole search path. Therefore, future search models in marketing may combine text-based search with discrete-choice based search.

Future research may also explore the extent to which models of query formation need to be
specific to each search engine. We focused on Google as it is by far the most common search
engine with a worldwide market share of 88.44% (Statista, 2015b). However, if consumers adjust
their query formation strategies from one search engine to the other, semantic-based search may
be more or less relevant across search engines. In addition, the approximations of the semantic
relationships that we tested on Google would need to be verified on alternative search engines, and
the bias in consumers' beliefs on activation probabilities may vary across search engines. Similarly,
our research could be extended to other online websites that have a search function, e.g., YouTube
and Amazon.
Finally, future research may combine our approach with more complex language models. In
our analysis, for practical reasons we assumed that the value of a webpage was a linear and additive
function of dummy variables capturing the presence of individual words. The framework outlined
above is compatible with any alternative specification. However, one challenge with field data is
that the number of relevant words may be very large, giving rise to an unmanageable number of
parameters to estimate. This dimensionality may be reduced, using for example topic modeling
(Blei et al., 2003; Tirunillai and Tellis, 2014). In that case, semantic relationships and activation
probabilities may be specified at the level of topics rather than individual words, using a similar
approach as the one outlined above. In particular, the relevant semantic relationships would describe the link between the topic distribution in the search query and the topic distribution in the
search results; and activation probabilities may be replaced with parameters that capture how the
weight on a given topic in a query influences the weight of another topic in the search results.
Going back to the IR literature, such ideas can also be used to extend the existing semantic
matching algorithms (Kurland and Lee, 2004; Wei and Croft, 2006), by relaxing the assumption
that the topic distribution in a query is a direct representation of the user's information needs.
Instead, the topic distribution that captures the user's true underlying information needs may be
inferred from the query, and documents may be found that match that topic distribution rather than
the query's topic distribution.

References
Aitchison, J. (2012). Words in the mind: An introduction to the mental lexicon. John Wiley & Sons.
Azzopardi, L., D. Kelly, and K. Brennan (2013). How query cost affects search behavior. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, 23–32.
Berners-Lee, T., J. Hendler, O. Lassila, et al. (2001). The semantic web. Scientific American 284(5), 28–37.
Blei, D. M., A. Y. Ng, and M. I. Jordan (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.
Broder, A. (2002). A taxonomy of web search. In ACM SIGIR Forum, Volume 36, pp. 3–10. ACM.
Collins, A. M. and E. F. Loftus (1975). A spreading-activation theory of semantic processing. Psychological Review 82(6), 407.
Dzyabura, D. (2013). The role of changing utility in product search. Available at SSRN 2202904.
Erdem, T., M. P. Keane, T. S. Öncü, and J. Strebel (2005). Learning about computers: An analysis of information search and technology choice. Quantitative Marketing and Economics 3(3), 207–247.
Field, C. A. and A. H. Welsh (2007). Bootstrapping clustered data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69(3), 369–390.
Fleishman-Hillard (2012). 2012 Fleishman-Hillard digital influence index. Available at http://www.harrisinteractive.com.
Fu, W.-T. and P. Pirolli (2007). SNIF-ACT: A cognitive model of user navigation on the world wide web. Human–Computer Interaction 22(4), 355–412.

GE (2013). GE Capital Retail Bank major purchase shopper study. Available from http://www.businesswire.com/.
Guha, R., R. McCool, and E. Miller (2003). Semantic search. In Proceedings of the 12th international conference on World Wide Web, pp. 700–709. ACM.
Hölscher, C. and G. Strube (2000). Web search behavior of internet experts and newbies. Computer Networks 33(1), 337–346.
Hsieh-Yee, I. (2001). Research on web search behavior. Library & Information Science Research 23(2), 167–185.
Hui, S. K., E. T. Bradlow, and P. S. Fader (2009). Testing behavioral hypotheses using an integrated model of grocery store shopping path and purchase behavior. Journal of Consumer Research 36(3), 478–493.
Jansen, B. J., D. Booth, and B. Smith (2009). Using the taxonomy of cognitive learning to model online searching. Information Processing & Management 45(6), 643–663.
Jansen, B. J., D. L. Booth, and A. Spink (2008). Determining the informational, navigational, and transactional intent of web queries. Information Processing & Management 44(3), 1251–1266.
Jansen, B. J., A. Spink, A. Pfaff, and A. Goodrum (2000). Web query structure: Implications for IR system design. In Proceedings of the 4th World Multiconference on Systemics, Cybernetics and Informatics (SCI 2000), pp. 169–176.
Jeziorski, P. and I. Segal (2010). What makes them click: Empirical analysis of consumer demand for search advertising. Technical report, Working papers//the Johns Hopkins University, Department of Economics.
Johnson, M. D. (1986). Consumer similarity judgments: A test of the contrast model. Psychology & Marketing 3(1), 47–60.

Kamvar, M. and S. Baluja (2006). A large scale study of wireless search behavior: Google mobile search. In Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 701–709. ACM.
Khoo, C. S. and J.-C. Na (2006). Semantic relations in information science. Annual Review of Information Science and Technology (40), 157–228.
Kim, J. B., P. Albuquerque, and B. J. Bronnenberg (2010). Online demand under limited consumer search. Marketing Science 29(6), 1001–1023.
Kurland, O. and L. Lee (2004). Corpus structure, language models, and ad hoc information retrieval. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 194–201. ACM.
Li, H. and J. Xu (2013). Semantic matching in search. Foundations and Trends in Information Retrieval 7(5), 343–469.
Mangold, C. (2007). A survey and classification of semantic search approaches. International Journal of Metadata, Semantics and Ontologies 2(1), 23–34.
Manning, C. D., P. Raghavan, and H. Schütze (2008). Introduction to information retrieval, Volume 1. Cambridge University Press.
Narayanan, S. and K. Kalyanam (2011). Measuring position effects in search advertising: A regression discontinuity approach. Technical report, Working Paper.
Park, J. and H. Chung (2009). Consumers' travel website transferring behaviour: analysis using clickstream data-time, frequency, and spending. The Service Industries Journal 29(10), 1451–1463.
Pirolli, P. and S. Card (1999). Information foraging. Psychological Review 106(4), 643.
Pirolli, P. L. (2007). Information foraging theory: Adaptive interaction with information. Oxford University Press.

Raaijmakers, J. G. and R. M. Shiffrin (1981). Search of associative memory. Psychological Review 88(2), 93.
Ren, S., H. Lai, W. Tong, M. Aminzadeh, X. Hou, and S. Lai (2010). Nonparametric bootstrapping for hierarchical data. Journal of Applied Statistics 37(9), 1487–1498.
Robertson, S. and H. Zaragoza (2009). The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.
Roelleke, T. (2013). Information retrieval models: Foundations and relationships. Synthesis Lectures on Information Concepts, Retrieval, and Services 5(3), 1–163.
Rogers, W. (1994). Regression standard errors in clustered samples. Stata Technical Bulletin 3(13).
Rose, D. E. and D. Levinson (2004). Understanding user goals in web search. In Proceedings of the 13th international conference on World Wide Web, pp. 13–19. ACM.
Ruthven, I. (2003). Re-examining the potential effectiveness of interactive query expansion. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 213–220. ACM.
Salton, G. and M. J. McGill (1986). Introduction to modern information retrieval.
Santos, R. L., C. Macdonald, and I. Ounis (2015). Search result diversification. Foundations and Trends in Information Retrieval 9(1), 1–90.
Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an unknown distance function. I. Psychometrika 27(2), 125–140.
Shi, S. W. and M. Trusov (2013). The path to click: Are you on it? Working paper.
Spink, A., D. Wolfram, M. B. Jansen, and T. Saracevic (2001). Searching the web: The public and their queries. Journal of the American Society for Information Science and Technology 52(3), 226–234.

Statista (2015a). Digital marketing spending in the United States from 2014 to 2019. Available from http://www.statista.com/statistics/275230/us-interactive-marketing-spending-growthfrom-2011-to-2016-by-segment.
Statista (2015b). Global market share of search engines 2010-2015. Available from http://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/.
Tirunillai, S. and G. J. Tellis (2014). Mining marketing meaning from online chatter: Strategic brand analysis of big data using latent Dirichlet allocation. Journal of Marketing Research 51(4), 463–479.
Tversky, A. (1977). Features of similarity. Psychological Review 84(4), 327–352.
Wei, X. and W. B. Croft (2006). LDA-based document models for ad-hoc retrieval. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 178–185. ACM.
West, T. V. and D. A. Kenny (2011). The truth and bias model of judgment. Psychological Review 118(2), 357.
Wu, W.-C., D. Kelly, and A. Sud (2014). Using information scent and need for cognition to understand online search behavior. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, pp. 557–566. ACM.
Yang, L., O. Toubia, and M. G. De Jong (2015). A bounded rationality model of information search and choice in preference measurement. Journal of Marketing Research 52(2), 166–183.
Yoganarasimhan, H. (2015). Search personalization using machine learning. Available at SSRN 2590020.
Zhai, C. (2008). Statistical language models for information retrieval. Synthesis Lectures on Human Language Technologies 1(1), 1–141.

Tables
Table 1: Search Tasks in Study 1 and Optimal Queries

| Task | $t_1$ | $t_2$ | $t_3$ | Optimal Query |
|------|-------|-------|-------|---------------|
| 1 | candy | caffeine | sugar | candy |
| 2 | fish | tea | tomato | two words* |
| 3 | milk | cheese | tea | milk |
| 4 | Easter | candy | egg | Easter |
| 5 | tomato | drink | pizza | tomato |
| 6 | Easter | caffeine | ketchup | two words* |
| 7 | sugar | cake | pizza | sugar |
| 8 | egg | candy | drink | two words* |
| 9 | cake | cheese | Easter | cake |
| 10 | ketchup | cake | tomato | ketchup |

Note: For the seven tasks in which the optimal query has a single word, this trigger word is labeled as $t_1$. In the study, words were always shown to participants in a random order. * indicates that the optimal query depends on the value assigned to each word.

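Table 2: Number of Queries with Different Lengths and Optimality in Study 1

| Query Length | Not Optimal | Optimal | Row Total | Percentage |
|--------------|-------------|---------|-----------|------------|
| 1 | 180 | 149 | 329 | 30% |
| 2 | 498 | 106 | 604 | 56% |
| 3 | 147 | 0 | 147 | 14% |
| Column Total | 825 | 255 | 1,080 | 100% |
| Percentage | 76% | 24% | | |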

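Table 3: Original Parameterization of Semantic Relationships for a Task $g = (t_1, t_2, t_3)$

| Query \ Possible Outcomes | $\{t_1\}$ | $\{t_2\}$ | $\{t_3\}$ | $\{t_1,t_2\}$ | $\{t_1,t_3\}$ | $\{t_2,t_3\}$ | $\{t_1,t_2,t_3\}$ |
|---|---|---|---|---|---|---|---|
| $t_1$ | $\pi^{1}_{\{1\}}$ | 0 | 0 | $\pi^{1}_{\{1,2\}}$ | $\pi^{1}_{\{1,3\}}$ | 0 | $\pi^{1}_{\{1,2,3\}}$ |
| $t_2$ | 0 | $\pi^{2}_{\{2\}}$ | 0 | $\pi^{2}_{\{1,2\}}$ | 0 | $\pi^{2}_{\{2,3\}}$ | $\pi^{2}_{\{1,2,3\}}$ |
| $t_3$ | 0 | 0 | $\pi^{3}_{\{3\}}$ | 0 | $\pi^{3}_{\{1,3\}}$ | $\pi^{3}_{\{2,3\}}$ | $\pi^{3}_{\{1,2,3\}}$ |
| $t_1 t_2$ | 0 | 0 | 0 | $\pi^{(1,2)}_{\{1,2\}}$ | 0 | 0 | $\pi^{(1,2)}_{\{1,2,3\}}$ |
| $t_2 t_1$ | 0 | 0 | 0 | $\pi^{(2,1)}_{\{1,2\}}$ | 0 | 0 | $\pi^{(2,1)}_{\{1,2,3\}}$ |
| $t_1 t_3$ | 0 | 0 | 0 | 0 | $\pi^{(1,3)}_{\{1,3\}}$ | 0 | $\pi^{(1,3)}_{\{1,2,3\}}$ |
| $t_3 t_1$ | 0 | 0 | 0 | 0 | $\pi^{(3,1)}_{\{1,3\}}$ | 0 | $\pi^{(3,1)}_{\{1,2,3\}}$ |
| $t_2 t_3$ | 0 | 0 | 0 | 0 | 0 | $\pi^{(2,3)}_{\{2,3\}}$ | $\pi^{(2,3)}_{\{1,2,3\}}$ |
| $t_3 t_2$ | 0 | 0 | 0 | 0 | 0 | $\pi^{(3,2)}_{\{2,3\}}$ | $\pi^{(3,2)}_{\{1,2,3\}}$ |
| $t_1 t_2 t_3$ | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| $t_1 t_3 t_2$ | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| $t_2 t_1 t_3$ | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| $t_2 t_3 t_1$ | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| $t_3 t_1 t_2$ | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| $t_3 t_2 t_1$ | 0 | 0 | 0 | 0 | 0 | 0 | 1 |

Note: with three words, there are 15 possible queries (ordered sets of words) and 7 possible unordered sets of words to be found on the result pages. $\pi^{q}_{s}$ is the probability of activating exactly the words in s with query q.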

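Table 4: AIC for Different Tasks and Numbers of Segments in Study 1

| # Seg. | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Task 6 | Task 7 | Task 8 | Task 9 | Task 10 |
|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|---------|
| 1 | 510 | 555 | 532 | 384 | 469 | 565 | 487 | 531 | 550 | 454 |
| 2 | 493 | 515 | 515 | 419 | 471 | 504 | 472 | 509 | 542 | 464 |
| 3 | 502 | 526 | 529 | 423 | 497 | 517 | 487 | 559 | 555 | 470 |
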

0.019
0.115
0.084
0.088
0.130

a21
a13
a31
a23
a32

0.399
0.774
1.035
4.629

0.123
0.985
0.178
0.989
0.544
0.668
0.896
6.930

a21
a13
a31
a23
a32
1

0.079

0.283

0.436

0.998

0.130

0.269

0.013

0.017

0.042

0.027

0.010

0.056

Task 2

a12

Segment 2

0.000

a12

Segment 1

Task 1

5.793

0.904

0.715

0.103

0.097

0.995

0.201

0.997

0.694

0.067

0.006

0.081

0.092

0.095

0.047

Task 3

3.113

0.838

0.000

0.144

0.075

0.879

0.020

0.870

Task 4

3.168

0.863

0.178

0.117

0.821

0.019

0.077

0.999

Task 5

6.671

0.938

0.762

0.101

0.163

0.999

0.428

0.642

0.278

0.001

0.025

0.000

0.049

0.073

0.070

Task 6

6.384

0.934

0.843

0.999

0.842

0.215

0.909

0.915

0.964

0.061

0.028

0.000

0.086

0.152

0.023

Task 7

5.586

0.985

0.788

0.122

0.244

0.999

0.961

0.430

0.148

0.016

0.071

0.041

0.007

0.034

0.055

Task 8

Table 5: Estimates of Participants' Beliefs for Each Task in Study 1

6.385

0.882

0.783

0.097

0.105

0.994

0.960

0.992

0.111

0.042

0.085

0.128

0.036

0.023

0.086

Task 9

2.860

1.014

0.078

0.099

0.127

0.997

0.003

0.061

Task 10

Figures

Figure 1: Search Query Game Interface in Study 1

Figure (a) is the game interface where a participant forms a search query given the set of words, their values ($1 or $2 per word) and costs ($1 per word). The participant decides which word(s) to use and in which order. Figures (b) and (c) show the screens the participant will see after submitting the queries "tea cheese" (b) and "milk" (c). The participant is shown the search result with the highest score (score = value - cost), the list of relevant words found on its webpage, and the corresponding score.

Figure 2: Distribution across Participants of Deviation from Optimal Total Score - Study 1

We calculate the total score across the 10 tasks for each participant, and compare it to the best achievable score for that participant.

Figure 3: Average Performance across Tasks and Rounds in Study 1

We compute the average score across participants for each task (i.e., set of words) and each round (i.e., position of the task). Figure (a) indicates that performance varies across tasks. Figure (b) shows very stable performance over rounds, which suggests that participants did not learn over time.

Figure 4: Testing the Symmetry of Activation Probabilities in Study 1

For each pair of words, we compare the maximum activation probability $\max\{a_{jj'}, a_{j'j}\}$ to an activation probability that is randomly selected between $a_{jj'}$ and $a_{j'j}$. We use a bootstrapping approach (with 1,000 iterations), where at each iteration we randomly draw 30 pairs of words with replacement. We compute the average activation probability for the two samples ($\max\{a_{jj'}, a_{j'j}\}$ vs. the randomly-selected one) at each iteration. We see a large difference between the optimal sample (solid line) and the random sample (dashed line). This suggests that it would be incorrect to model semantic relationships based on symmetric activation probabilities.
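The resampling procedure described in this caption can be sketched as follows. This is only an illustration under stated assumptions: the function name `bootstrap_max_vs_random`, the seed, and the toy activation probabilities are hypothetical and are not the study's data.

```python
import random


def bootstrap_max_vs_random(pairs, n_iter=1000, sample_size=30, seed=0):
    """Bootstrap the mean of max(a, b) and of a randomly chosen direction.

    pairs: list of (a_forward, a_reverse) tuples, one per pair of words.
    Returns two lists holding one average per bootstrap iteration.
    """
    rng = random.Random(seed)
    max_means, random_means = [], []
    for _ in range(n_iter):
        # Draw pairs of words with replacement.
        sample = [rng.choice(pairs) for _ in range(sample_size)]
        max_means.append(sum(max(a, b) for a, b in sample) / sample_size)
        # For each drawn pair, pick one of the two directions at random.
        random_means.append(sum(rng.choice(p) for p in sample) / sample_size)
    return max_means, random_means


# Made-up activation probabilities, for illustration only.
toy_pairs = [(0.9, 0.1), (0.8, 0.3), (0.2, 0.7), (0.5, 0.5), (0.05, 0.6)]
max_means, random_means = bootstrap_max_vs_random(toy_pairs)
print(sum(max_means) / len(max_means), sum(random_means) / len(random_means))
```

If activation probabilities were symmetric, the two bootstrap distributions would coincide; the further apart they are, the stronger the asymmetry in the underlying pairs.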


Figure 5: Distribution across Participants and Pairs of Words of Deviation from Truth - Study 1

Based on the estimates in Table 5, we calculate the posterior estimates of each participant's beliefs on each activation probability in each task, and compare the estimated beliefs to the truth.

Figure 6: Distribution across Participants and Pairs of Words of Deviation from Truth - Study 2

We compare participants' beliefs (measured directly) to the true activation probabilities.

Appendix 1: Instruction Page for Search Query Game


Appendix 2: Choice Model Derivations


Note that the following analysis is specific to a given task g.
Let $Y^q = \{y_1, ..., y_K\}$ be random variables that denote the value of the top K results retrieved by query q. Each result contains one of the seven possible sets of words shown in Table 3. Given the (known) preference vector $\theta_i$ for a consumer i, we can calculate the value $v_s(\theta_i)$ corresponding to each possible set s of words, and order these values from the smallest to the largest as $\{v_{[1]}(\theta_i), v_{[2]}(\theta_i), ..., v_{[N]}(\theta_i)\}$, where N is the number of unique values among the seven cases (N is less than 7 if some sets of words have the same value). The expected utility from query q is therefore written as a function of the expected score of its best result, i.e.,
$$U(q \mid \theta_i, \{\hat{a}_{jj'}\}, \rho) = \sum_{n=1}^{N} \Pr\big(\max\{y_1, ..., y_K\} = v_{[n]}(\theta_i) \mid q, \{\hat{a}_{jj'}\}\big)\, v_{[n]}(\theta_i)^{\rho} - c(q) \tag{8}$$


where $\hat{a}_{jj'}$ is the consumer's belief on the activation probability from word $t_j$ to word $t_{j'}$, $\rho > 0$ is a risk parameter ($\rho = 1$ implies risk neutrality, $\rho < 1$ risk aversion, and $\rho > 1$ risk seeking), and the cost c(q) equals the length of query q (recall that in our game the cost of a query was equal to its number of words).

In order to compute the expected utility from query q, we need to compute $\Pr\big(\max\{y_1, ..., y_K\} = v_{[n]}(\theta_i) \mid q, \{\hat{a}_{jj'}\}\big)$ for each possible value $v_{[n]}(\theta_i)$. That is, we need to compute the probability distribution of the score of the best result from query q.
First, based on the "independence in pages" and "independence in queries" approximations, we express consumers' beliefs on the semantic relationships at the level of sets of words $\{\pi^{q}_{s}\}$, as a function of the beliefs at the word level $\{\hat{a}_{jj'}\}$. This approximation is given by Equations (3) and (4). Let $f_n$ denote the probability that a random top result from query q has value $v_{[n]}(\theta_i)$ based on these beliefs (let $f_0 = 0$). We compute this probability by simply summing $\pi^{q}_{s}$ over the sets s whose value $v_s(\theta_i)$ equals $v_{[n]}(\theta_i)$, i.e.:

$$f_n = \sum_{s} \pi^{q}_{s}\, I\big(v_s(\theta_i) = v_{[n]}(\theta_i)\big) \tag{9}$$


We can then write the cumulative distribution function of $\max\{y_1, ..., y_K\}$ as:

$$\Pr\big(\max\{y_1, ..., y_K\} \le v_{[n]}(\theta_i) \mid q, \{\hat{a}_{jj'}\}\big) = \Big(\sum_{m \le n} f_m\Big)^{K} \tag{10}$$


That is, the probability that the best search result has value less than or equal to $v_{[n]}(\theta_i)$ is the probability that all K search results have value less than or equal to $v_{[n]}(\theta_i)$. Given this cumulative distribution function, we can now derive the probability that the best result has exactly a value of $v_{[n]}(\theta_i)$ as:


$$\Pr\big(\max\{y_1, ..., y_K\} = v_{[n]}(\theta_i) \mid q, \{\hat{a}_{jj'}\}\big) = \Big(\sum_{m \le n} f_m\Big)^{K} - \Big(\sum_{m \le n-1} f_m\Big)^{K} \tag{11}$$


Plugging Equation (11) into Equation (8) provides a closed-form expression of the expected
utility from each possible query.
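The derivation above can be sketched in a few lines of code. This is a minimal illustration, assuming the K results are independent draws, that the risk parameter enters as an exponent on the value (named `rho` here as a placeholder), and that set-level probabilities are already available; all names and numbers below are hypothetical.

```python
from collections import defaultdict


def best_result_distribution(set_probs, set_values, k):
    """Distribution of the value of the best of k results (Equations 9 to 11).

    set_probs: {word_set: probability that a top result contains exactly it}.
    set_values: {word_set: value of a result containing exactly that set}.
    """
    # Equation (9): pool probability mass over sets that share a value.
    mass = defaultdict(float)
    for word_set, prob in set_probs.items():
        mass[set_values[word_set]] += prob

    dist, cum_prev = {}, 0.0
    for value in sorted(mass):
        cum = cum_prev + mass[value]
        # Equations (10) and (11): difference of consecutive CDF values.
        dist[value] = cum ** k - cum_prev ** k
        cum_prev = cum
    return dist


def expected_utility(set_probs, set_values, k, rho, cost):
    """Equation (8): expected power-transformed best value minus query cost."""
    dist = best_result_distribution(set_probs, set_values, k)
    return sum(p * value ** rho for value, p in dist.items()) - cost


# Made-up beliefs for a one-word query "t1" (cost 1) with K = 2 results.
probs = {frozenset({"t1"}): 0.5,
         frozenset({"t1", "t2"}): 0.3,
         frozenset({"t1", "t2", "t3"}): 0.2}
values = {frozenset({"t1"}): 1.0,
          frozenset({"t1", "t2"}): 3.0,
          frozenset({"t1", "t2", "t3"}): 4.0}
dist = best_result_distribution(probs, values, k=2)
utility = expected_utility(probs, values, k=2, rho=1.0, cost=1.0)
print(dist, utility)
```

In this toy case the best of two results has value 1, 3, or 4 with probabilities 0.25, 0.39, and 0.36, so under risk neutrality the expected utility of the query is about 1.86. Comparing this quantity across candidate queries is what the choice model requires.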
