Vous êtes sur la page 1sur 14

Knowledge-Based Systems 26 (2012) 225–238

Contents lists available at SciVerse ScienceDirect

Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

A collaborative filtering approach to mitigate the new user cold start problem
Jesús Bobadilla ⇑, Fernando Ortega, Antonio Hernando, Jesús Bernal
Universidad Politecnica de Madrid, FilmAffinity.com Research Team, Spain

a r t i c l e i n f o a b s t r a c t

Article history: The new user cold start issue represents a serious problem in recommender systems as it can lead to the
Received 23 November 2010 loss of new users who decide to stop using the system due to the lack of accuracy in the recommenda-
Received in revised form 29 July 2011 tions received in that first stage in which they have not yet cast a significant number of votes with which
Accepted 29 July 2011
to feed the recommender system’s collaborative filtering core. For this reason it is particularly important
Available online 30 August 2011
to design new similarity metrics which provide greater precision in the results offered to users who have
cast few votes. This paper presents a new similarity measure perfected using optimization based on neu-
Keywords:
ral learning, which exceeds the best results obtained with current metrics. The metric has been tested on
Cold start
Recommender systems
the Netflix and Movielens databases, obtaining important improvements in the measures of accuracy,
Collaborative filtering precision and recall when applied to new user cold start situations. The paper includes the mathematical
Neural learning formalization describing how to obtain the main quality measures of a recommender system using leave-
Similarity measures one-out cross validation.
Leave-one-out-cross validation Ó 2011 Elsevier B.V. All rights reserved.

1. Introduction instance, before throwing ourselves down a steep slope on a small


snowboard, we listen to all of our friends’ opinions, but we regard
Recommender systems (RS) [48] enable recommendations to be much highly those that we consider to have more in common with
made to users of a system in reference to the items or elements on our level in that sport, with our liking for risk, etc.
which this system is based (books, electrical appliances, films, e- There are also some so-called hybrid RS: they are systems
learning material, etc.). The core of a RS lie in its filtering algo- which combine different filtering approaches to exploit merits of
rithms: demographic filtering [26] and content-based filtering each one of these techniques, such as a combination of CF with
[28,47] are two well known filtering techniques. Content-based demographic filtering or CF with content based filtering [2]. Among
RS base the recommendations made to a user on the choices this the natural fields of action of these hybrid models we can highlight
user has made in the past (e.g. in a web-based e-commerce RS, if some bio-inspired models which are used in the filtering stage
the user purchased computer science books in the past, the RS will [14].
probably recommend a recent computer science book that he has CF based RS allow users to give ratings about a set of elements
not yet purchased on this website); demographic filtering based (e.g. hotels, restaurants, tourist destinations, etc. in a CF based
RS are based on the assumption that individuals sharing certain website), in such a way that when enough information is stored
common personal features (sex, age, country, etc.) will also share on the system we can make recommendations to each user based
common preferences. on information provided by those users we consider to have the
Currently, collaborative filtering (CF) is the most commonly most in common with them. Movie recommendation websites
used and studied technology [1,17]. CF RS are based on the way are probably the best-known cases to users and are without a
in which humans have made decisions throughout history: in addi- doubt the most thoroughly studied by researchers [25,2], although
tion to our own personal experience, we also base our decisions on there are many other fields in which RS have great and increasing
the experiences and knowledge coming from a relatively large importance, such as e-commerce [23,57,52,36], e-learning
group of acquaintances. We take this set of knowledge and we con- [12,4,21], music [15,31], digital libraries [40], playing games [32],
sider it ‘‘in a critical way’’, to obtain the decision we think will best social networks based collaborative filtering [13,35], etc.
suit our goal. ‘‘In a critical way’’ means that we are more inclined to A key factor in the quality of the recommendations obtained in
take into consideration those suggestions made by people with a CF based RS lies in its capacity to determine which users have the
whom we have more in common regarding the pursued goal; for most in common (are the most similar) to a given user. A series of
algorithms [19] and metrics [1,5,55,7–9] of similarity between
⇑ Corresponding author. Tel.: +34 670711147; fax: +34 913367522. users are currently available, enabling this important function to
E-mail address: jesus.bobadilla@upm.es (J. Bobadilla). be performed in the CF core of this type of RS.

0950-7051/$ - see front matter Ó 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.knosys.2011.07.021
226 J. Bobadilla et al. / Knowledge-Based Systems 26 (2012) 225–238

In order to measure the quality of the results of a RS, there is a via other means or not making CF-based recommendations until
wide range of metrics which are used to evaluate both the predic- there are enough users and votes.
tion and recommendation quality of these systems [17,1,18,6]. The new item problem [38,39] arises due to the fact that the
CF based RS estimate the value of an item not voted by a user new items entered in RS do not usually have initial votes, and
via the ratings made on that item by a set of similar users. The therefore, they are not likely to be recommended. In turn, an item
overall quality in the prediction is called accuracy [3] and the mean that is not recommended goes unnoticed by a large part of the
absolute error (MAE) is normally used to obtain it [17]. The sys- users community, and as they are unaware of it they do not rate
tem’s ability to make estimations is called coverage and it indicates it; in this way, we can enter a vicious circle in which a set of items
the percentage of prediction which we can make using the set of of the RS are left out of the votes/recommendations process. The
similar users selected (usually, the more similar users we select new item problem has less of an impact on RS in which the items
and the more votes the selected users have cast, the better the cov- can be discovered via other means (e.g. movies) than in RS where
erage we achieve). In RS, besides aiming to improve the quality this is not the case (e.g. e-commerce, blogs, photos, videos, etc.). A
measures of the predictions (accuracy and coverage), there are common solution to this problem is to have a set of motivated
other issues that need be taken into account [56,41,51]: avoiding users who are responsible for rating each new item in the system.
overspecialization phenomena, finding good items, credibility of The new user problem [42,43,46] is among the great difficulties
recommendations, precision and recall measures, etc. faced by the RS in operation. When users register they have not
The rest of the paper is divided into the following sections (with cast any votes yet and, therefore, they cannot receive any person-
the same numbering shown here): alized recommendations based on CF; when the users enter their
firsts ratings they expect the RS to offer them personalized recom-
2. State of the art, in which a review is made of the most relevant mendations, but the number of votes entered is usually not suffi-
contributions that exist in the CF aspects covered in the paper: cient yet to provide reliable CF-based recommendations, and,
cold-start and application of neural networks to the RS. therefore, new users may feel that the RS does not offer the service
3. General hypothesis and motivations: what we aim to contribute they expected and they may stop using it.
and the indications that lead us to believe that carrying out The common strategy to tackle the new user problem consists
research into this subject will provide satisfactory results that of turning to additional information to the set of votes in order
support the hypothesis set out. to be able to make recommendations based on the data available
4. Design of the user cold-start similarity measure: explanation for each user; this approach has provided a line of research papers
and formalization of the design of the similarity measure pro- based on hybrid systems (usually CF-content based RS and CF-
posed as a linear combination of simple similarity measures, demographic based RS). Next we analyze some hybrid approaches;
by adjusting the weights using optimization techniques based [30] propose a new content-based hybrid approach that makes use
on neural networks. of cross-level association rules to integrate content information
5. Collaborative filtering specifications: formalization of the CF about domains items. Kim et al. [24] use collaborative tagging em-
methodology which specifies the way to predict and recom- ployed as an approach in order to grasp and filter users’ prefer-
mend, as well as to obtain the quality values of the predictions ences for items and they explore the advantages of the
and recommendations. This is the formalization that supports collaborative tagging for data sparseness and a cold-start user
the design of experiments carried out in the paper. The method- (they collected the dataset by crawling the collaborative tagging
ology is provided which describes the use of leave-one-out del.icio.us site). Weng et al. [53] combine the implicit relations be-
cross validation applied to obtaining the MAE, coverage, preci- tween users’ items preferences and the additional taxonomic pref-
sion and recall. erences so as to make better quality recommendations as well as
6. Design of the experiments with which the quality results are alleviate the cold-start problem. Loh et al. [33] represent user’s
obtained provided by the user cold-star similarity measure pro- profiles with information extracted from their scientific publica-
posed and by a set of current similarity measures for which we tions. Martinez et al. [34] present a hybrid RS which combines a
aim to improve the results. We use the Netflix (http:// CF algorithm with a knowledge-based one. [10] propose a number
www.netflixprize.com) and Movielens (http://www.movie- of common terms/term frequency (NCT/TF) CF algorithm based on
lens.org) databases. demographic vector. Saranya and Atsuhiro [50] propose a hybrid
7. Graphical results obtained in the experiments, complemented RS that makes use of latent features extracted from items repre-
with explanations of the behavior of each quality measure. sented by a multi-attributed record using a probabilistic model.
8. Most relevant conclusions obtained. Park et al. [37] propose a new approach: they use filterbots, and
surrogate users that rate items based only on user or item
2. State of the art attributes.
All the former approaches base their strategies on the presence
2.1. The cold-start issue of additional data to the actual votes (user’s profiles, user’s tags,
user’s publications, etc.). The main problem is that not all RS dat-
The cold-start problem [48,1] occurs when it is not possible to abases possess this information, or else it is not considered suffi-
make reliable recommendations due to an initial lack of ratings. ciently reliable, complete or representative.
We can distinguish three kinds of cold-start problems: new com- There are so far two research papers which deal with the cold-
munity, new item and new user. The last kind is the most impor- start problem through the users’ ratings information: Hyung [22]
tant in RS that are already in operation and it is the one covered presents a heuristic similarity measure named PIP, that outper-
in this paper. forms the traditional statistical similarity measures (Pearson corre-
The new community problem [49,27] refers to the difficulty in lation, cosine, etc.); and Heung et al. [16] proposes a method that
obtaining, when starting up a RS, a sufficient amount of data (rat- first predicts actual ratings and subsequently identifies prediction
ings) which enable reliable recommendations to be made. When errors for each user; taking into account this error information,
there are not enough users in particular and votes in general, it some specific ‘‘error-reflected’’ models are designed.
is difficult to maintain new users, which come across a RS with The strength of the approach presented in this paper (and in the
contents but no precise recommendations. The most common Hyung and Heung works) lies in its ability to mitigate the new user
ways of tackling the problem are encouraging votes to be made cold-start problem from the actual core of the CF stage, providing a
J. Bobadilla et al. / Knowledge-Based Systems 26 (2012) 225–238 227

similarity metric between users specially designed for this purpose voted for a similar number of items than users for whom the num-
and which can be applied to new users of any RS; i.e. it has a uni- ber of items voted is very different. By way of example, it is more
versal scope as it does not require additional data to the actual convincing to determine as similar users two people who have only
votes cast. The main problem of this approach is that with it, it is voted for between 8 and 14 movies, all of which are science fiction,
more complex and risky to carry out an information retrieval from than to determine as similar users a user who has only voted for 10
the votes than to directly take the additional information provided movies, all of which are science fiction, and another who has voted
by the user’s profiles, user’s tags, etc., held by some RS. for 2400 movies of all genres. Whilst in the first case, the recom-
mendations will tend to be restricted to the movies of common
genre, in the second, the new cold-start user will be able to receive
2.2. Neural networks applied to recommender systems
thousands of recommendations of all types of movies of genres in
which they are not interested and which are very unsuitable for
Neural networks (NN) is a model inspired by biological neurons.
recommending based only on a maximum of 10 recommendations
This model, intended to simulate the way the brain processes
in common.
information, enables the computer to ‘‘learn’’ to a certain degree.
In the previous example, and under the restriction of recom-
A neural network typically consists of a number of interconnected
mendations made to new cold-start users, we can see the positive
nodes. Each node handles a designated sphere of knowledge, and
aspect of being recommended by a person who has cast a similar
has several inputs from the network. Based on the inputs it gets,
number of votes to yours: it is quite probable that the items that
a node can ‘‘learn’’ about the relationships between sets of data,
they have voted for and that you have not, will be related to those
pattern, and, based upon operational feedback, are molded into
you have rated, and, therefore, it is very possible that you are
the pattern required to generate the required results.
interested in them. We can also see the negative aspect: the capac-
In this paper, we make novel use of NN, by using them to opti-
ity for recommendation (coverage) of the other user will not be as
mize the results provided by the similarity measure designed. This
high.
approach enables NN techniques to be applied in the same kernel
As regards the distribution (or structure) of the votes of the
of the CF stage. The most relevant research available in which NN
users compared with the similarity measure we wish to design,
are used in some aspect of the operation of RS usually focuses on
there are two significant aspects for determining their similarity:
hybrid RS in which NN are used for learn users profiles; NN have
also been used in the clustering processes of some RS.
 It is positive a large number of common items that they have
The hybrid approaches enable neural networks to act on the
both voted for.
additional information to the votes. In [44] a hybrid recommender
 It is negative a large number of uncommon items that they have
approach is proposed using Widrow–Hoff [54] algorithm to learn
both voted for.
each user’s profile from the contents of rated items, to improve
the granularity of the user profiling. In [11] a combination of con-
If a user u1 has voted for 10 movies, a user u2 has voted for 14
tent-based and CF is used in order to construct a system providing
and a user u3 has voted for 70, it will be more convincing to classify
more precise recommendations concerning movies. In [29] first, all
u1 and u2 as similar if they have 6 movies in common than if they
users are segmented by demographic characteristics and users in
have only one. It will also be more convincing to classify u1 and u2
each segment are clustered according to the preference of items
as similar with 6 movies in common than u1 and u3 with 7 movies
using the Self-Organizing Map (SOM) NN. Kohonon’s SOMs are a
in common. In this case, we must also be cautious with the cover-
type of unsupervised learning; their goal is to discover some
age reached by the similarity measure designed.
underlying structure of the data.
The assumptions on which the paper’s motivation is based will
Two alternative NN uses are presented in [20,45]. In the first
be confirmed or refuted to a great extent depending on whether
case the strategy is based on training a back-propagation NN with
the non-numerical information of the votes used (proportions in
association rules that are mined from a transactional database; in
the number of votes and their structure) is a suitable indicator of
the second case they propose a model that combines a CF algo-
similarity between each pair of users compared. In order to clarify
rithm with two machine learning processes: SOM and Case Based
this situation we have applied basic statistical functions to all votes
Reasoning (CBR) by changing an unsupervised clustering problem
that are usually cast by the users.
into a supervised user preference reasoning problem.
Fig. 1 displays the arithmetic average and standard deviation
distributions of the votes cast by users of Movielens 1 M and Netf-
3. Hypothesis and motivation lix [5]. As we can see, most of the users’ votes are between 3 and 4
stars (arithmetic average), with a variation of approximately 1 star
The paper’s hypothesis deals with the possibility of establishing (standard deviation).
a similarity measure especially adapted to mitigate the new user By analyzing Fig. 1, we can assume that the 3–4 star interval
cold-start problem which occurs in CF-based RS; furthermore, dur- marks the division between the votes that positively rate the items
ing the recommendation process, the similarity measure designed from those that rate them in a non-positive way. In general, a po-
must only use the local information available: the votes cast by sitive vote will be placed at value 4 and in exceptional cases at 5,
each pair of users to be compared. whilst a non-positive vote will be placed at value 3 and in excep-
The main idea of our paper considers that it is possible to obtain tional cases at 2 or 1.
additional information to that used by the traditional similarity Taking into account the very low number of especially negative
measures of statistical origin (Pearson correlation, cosine, Spear- votes (2 and 1 stars) shown in Graphs 1a and 1b, we can assume
man rank correlation, etc.). Whilst the traditional similarity mea- that the users have a tendency to not rate items they consider in
sures only use the numerical information of the votes, we will a non-positive way, whilst to a lesser extent the opposite also
make use of both the numerical information of the votes and infor- occurs: when they cast a vote there is a great probability that it
mation based on the distribution and on the number of votes cast implies a positive rating (above 3.5 stars in Graphs 1a and 1b).
by each pair of users to be compared. Therefore, it seems that the users tend to simplify their ratings
As regards the number of votes of the users compared by the into positive/non-positive and then transfer their psychological
similarity measure that we want to design, the key aspect is that choice to the numerical plane. In order to check this hypothesis
it is more reasonable to assign greater similarity to users who have the following experiment has been carried out [5] on the
228 J. Bobadilla et al. / Knowledge-Based Systems 26 (2012) 225–238

Fig. 1. Arithmetic average and standard deviation on the MovieLens 1 M and NetFlix ratings of the items. (A) Movielens arithmetic average, (B) Netflix arithmetic average, (C)
Movielens standard deviation, (D) Netflix standard deviation [5].

Movielens 1 M database: we transformed all 4 and 5 votes into P


votes (Positive) and all of 1, 2 and 3 votes into N votes (Non-posi-
tive), in such a way that we aim to measure the impact made on
the recommendations by doing without the detailed information
provided by the numerical values of the votes.
In the experiment we compare the precision/recall obtained in a
regular way (using the numerical values of the votes) with that ob-
tained using only the discretized values P and N. Fig. 2 displays the
results, which show how the ‘‘positive/non-positive’’ discretization
not only does not worsen the precision/recall measurements, but
rather it improves them both, particularly the precision when the
number of recommendations (N) is high.
The reasoning shown and the experimental results obtained
Fig. 2. Precision/Recall obtained by transforming all 4 and 5 votes into P votes
encourage the use of the non-numerical information of the users’ (Positive) and all 1, 2 and 3 votes into N votes (Non-positive), compared to the
votes as a means of attempting to obtain a cold-start similarity results obtained using the numerical values. 20% of test users, 20% of test items,
measure which, by making use of this additional information, pro- K = 150, Pearson correlation, relevant threshold = 4. MovieLens 1 M (Bobadilla et al.,
vides better results than the traditional metrics. 2010).

4. Design of the proposed user cold-start similarity measure


Table 1
The cold-start similarity measure proposed is formed by carry- Parameters.
ing out a linear combination of a group of simple similarity mea- Name Parameters descriptions
sures. The scalar values with which each individual similarity
L # Users
measure is weighted are obtained in a process of optimization M # Items
based on neural learning; this way, after a stage to determine the min # Min rating value
weights of the linear combination, the cold-start similarity mea- max # Max rating value
sure can be used to obtain the k-neighbors of each cold-start user k # Neighborhoods
N # Recommendations
who request recommendations.
h Recommendation threshold
One of the simple similarity measures is Jaccard, which pro-
cesses the non-numerical information of the votes; the rest base
their operation on the simplest information with which two votes
can be compared: their difference.
U ¼ fu 2 NaturalNumberju 2 f1::Lgg; set of users ð1Þ
I ¼ fi 2 NaturalNumberji 2 f1::Mgg; set of items ð2Þ
4.1. Formalization
V ¼ fv 2 NaturalNumberj min 6 v 6 maxg [ fg; set of possible votes ð3Þ
Ru ¼ fði; v Þji 2 I; v 2 V g; ratings of user u ð4Þ
Given an RS with a database of L users and M items rated in the
range [min..max], where the absence of ratings will be represented We define vote v of user u on item i as r u;i ¼ v ð5Þ
by the symbol . We define the average of the valid votes of user u as r u ð6Þ
J. Bobadilla et al. / Knowledge-Based Systems 26 (2012) 225–238 229

We define the cardinality of a set C as its number of valid 4.2. Set of basic similarity measures
elements
In order to find the similitude between two users x and y, we
#C ¼ #fx 2 Cjx – g ð7Þ first take all of the information regarding the value of the vote
(or lack of vote) from these users in each of the items of the RS.
In this way: We will assess the following basic similarity measures between
two users x and y:
#Ru ¼ #fi 2 Ijr u;i – g ð8Þ
 Measures based on the numerical values of the votes
Below we present the tables of parameters (Table 1), measures
1. v0, v1, v2, v3, v4 (Eqs. (9)–(11)).
(Table 2) and sets (Table 3) used in the formalizations made in the
2. Mean squared differences (l), (Eqs. (12) and (13)).
paper.
3. Standard deviation of the squared differences (r), (Eqs. (12)
and (14)).
Table 2  Measure based on the arrangement of the votes
Measures. 4. Jaccard, (Eqs. (15) and (16)).
Name Measures descriptions

v 0x;y # Items with the same value in user x and user y (normalized)
v0 represents the number of cases in which the two users have
voted with exactly the same score, indicating a high degree of sim-
v 1x;y # Items with a difference of 1 stars in user x and user y
(normalized) ilarity between them; it contributes to increasing the number of
v 3x;y # Items with a difference of 3 stars in user x and user y cases of particularly accurate predictions.
(normalized) v4 represents the opposite case: the number of times that they
v 4x;y # Items with a difference of 4 stars in user x and user y
have voted in a completely opposite way; the aim is to minimize
(normalized)
lx,y Mean squared differences (user x, user y)
the importance given to the fact that they have voted for the same
rx,y Standard deviation item, as they have done so by indicating very different preferences.
Jaccardx,y Jaccard similarity measure It also contributes to reducing the number of cases of particularly
{w1, . . . , w6} Similarity measure weights incorrect predictions.
pu,i Prediction to the user on the item
mu,i Prediction error on user u, item i
v1, v2 and v3 represents the intermediate cases (v1 the number
mu User u mean absolute error of cases in which users have voted with a difference of one score,
m RS mean absolute error v2 with a difference of two scores, . . . ).
cu,i User u, item i, coverage l provides the simplest and most intuitive measure of simili-
cu User u coverage tude between users, but it could be a good idea to complement it
c RS coverage
qu,i Is i recommended to the user u?
with the importance held by the extreme cases in this measure,
tu,i Is i recommended relevant to the user u? using r.
tu Precision of the user u The Jaccard measure rewards situations in which the two users
t Precision of the RS have voted for similar sets of items, taking into account the propor-
nu,i Is i not recommended relevant to the user u?
tion regarding the total number of voted items by both.
xu Recall of the user u
x Recall of the RS 
Let V dx;y ¼ i 2 Ijr x;i –  ^ry;i –  ^jrx;i  r y;i j ¼ d where
d 2 f0; . . . ; max  mingg ð9Þ
d
Table 3 We define 8d 2 f0; . . . ; max  ming; bx;y ¼ #V dx;y ð10Þ
Sets.

Name Sets descriptions Parameters finally, we perform the normalization:


U Users L b
d
x;y
I Items M v dx;y ¼ max  min ; v dx;y 2 ½0; 1 ð11Þ
P d
V Rating values min, max bx;y
d¼0
Ru User ratings user  
Let Gx;y ¼ i 2 Ijrx;i –  ^r y;i –  ; the set of items voted simultaneously by both users
V dx;y Items rated with a difference of d stars user x, user
ð12Þ
y, d
Gx,y Items rated simultaneously by users x and y user x, user y 1 X  rx;i  ry;i 2
lx;y ¼ 1  () Gx;y – ;; lx;y 2 ½0; 1 ð13Þ
Ru Items voted by user u user #Gx;y i2G max  min
x;y
vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Ku Neighborhoods of the user user, k u  2
u 1 X  rx;i  ry;i 2
Pu Predictions to the user user, k rx;y ¼ t  ð1  lx;y Þ () Gx;y – ;; rx;y 2 ½0; 1 ð14Þ
#Gx;y i2G max  min
Hu,i User’s neighborhoods which have rated the item user, i, k x;y
 
Xu Top recommended items to the user user, k, h Lets Ru ¼ i 2 Ijru;i –  ; the set of items voted by user u ð15Þ
Zu Top N recommended items to the user user, k, N, h #ðRx \ Ry Þ
Ut Training users Jaccardx;y ¼ ; Jaccardx;y 2 ½0; 1 ð16Þ
#ðRx [ Ry Þ
Uv Validation users
It Training items
Iv Validation items
Mu Items rated by user u where a prediction can be user 4.3. Similarity measures selected
determined
O Validation users with assigned MAE The total set of basic similarity measures to be applied to the
Cu Items not rated by user u where a prediction can be user proposed metric is as follows:
determined
O⁄ Validation users with assigned coverage value
fv 0 ; v 1 ; v 2 ; v 3 ; v 4 ; r; l; Jaccardg ð17Þ
S Validation users with assigned precision value
Yu Set of recommended relevant items (true-positives) user With the aim of reducing the number of basic similarity mea-
Nu set of not recommended relevant items user
sures, using Netflix and Movielens we have calculated various
S⁄ Validation users with assigned recall value
quality results (MAE, coverage, precision and recall) using the full
230 J. Bobadilla et al. / Knowledge-Based Systems 26 (2012) 225–238

set of basic similarity measures; subsequently, we repeated the problem, to make the adjustment of the weights using the gradi-
process eight times, eliminating one of the eight basic similarity ent descent method:
measures in each repetition (and conserving the other seven). After
concluding these experiments, we have rejected the similarity wi ðt þ 1Þ ¼ wi ðtÞ þ a  xi ðMAEðtÞMJDðtÞ  MAEJMSD Þ i 2 f1; . . . ; 6g
measures which on elimination caused a very slight worsening in where x1 ¼ v 0u1;u2 ; x2 ¼ v 1u1;u2 ; x3 ¼ v 3u1;u2 ; x4 ¼ v 4u1;u2 ;
the quality results.
x5 ¼ lu1;u2 ; x6 ¼ Jaccardu1;u2
Finally, we have seen that the quality results offered by all of
the similarity measures selected are only slightly worse than the ð20Þ
original ones, and therefore, the significant information is pro- The learning of the neuronal network is carried out using a set
vided by the set of similarity measures which have not been of pairs of users (u1, u2) where u1 represents a cold-start user (who
rejected. The empirical results have led us to select the following has rated between 2 and 20 items) and u2 represents any user in
subset of (17): the database. This pair of users are taken into account for updating
the weights wi.
fv 0 ; v 1 ; v 3 ; v 4 ; l; Jaccardg ð18Þ
MAE(t)MJD(t) stands for the MAE of the recommender system,
which is calculated using the proposed metric MJD based on the
4.4. Formulating the metric set of weights wi(t). This measure considers that only the cold-start
users are test users, and measures the accuracy of the recom-
The MJD proposed metric (Mean-Jaccard-Differences) is based mender system based on the MJD in the instant t.
on the hypothesis that by combining the six individual similarity MAEJMSD stands for the MAE of the recommender system
measures presented in (18), we will be able to obtain a global sim- calculated taking into account all test users (not only cold-start
ilarity measure between pairs of users. As each individual similar- users) and the similarity measure JMSD. This measure represents
ity measure presents its relative degrees of importance, it is the accuracy of the recommender system which we try to
necessary to assign a weighting (wi) to each of them. This way, reach.
the proposed metric is formulated as: The measure MAEJMSD is an upper bound of the accuracy ob-
tained in each instant t (MAE(t)MJD(t)) since we can reach better
1 
MJDx;y ¼ w1 v 0x;y þ w2 v 1x;y þ w3 v 3x;y þ w4 v 4x;y þ w5 lx;y þ w6 Jaccardx;y accuracy using all users as test users than using only cold-start
6
users. The metric JMSD has been selected as a reference since it
ð19Þ
provides good results [5].
At this point, it is necessary to determine the weights wi For the adjustment of the weights it is necessary to develop a
which enable us to obtain a metric that improves the results of training set in which the following are specified:
those commonly used; for this purpose we look for a potential
solution to an optimization problem based on neural learning.  The input data to the system for every pair of users, in our case
As an illustration, using Netflix database, the final weights after the values presented in (18).
the adjustment are those shown in Table 4. The values obtained  The desired output for each pair of users of the system. To
indicate the importance of each individual measure metric in the implement the error measure, we use the system MAEJMSD,
final result of the NN measure metric. Note how w4v4 (number of making use of a very small set of test users with the aim of
cases in which the votes of the two users are totally different) achieving reasonable execution times. These calculations are
has a very negative impact on the similarity result between the obtained using parallel processing through a cluster of
users considered. computers.
By analyzing the weights (wi) obtained in the neural learning
process, we can determine that the proposed similarity measure Fig. 3 shows the main modules involved in the whole neural
mainly uses the votes’ numerical information, and that this infor- learning process, where the time t + 1 weights are adjusted in
mation is complemented and modulated by the arrangement of accordance with the MAE(t)MJD(t) obtained.
the votes provided by Jaccard. The numerical information is based
on the measure of l; furthermore, the similarity between users is
reinforced with the results of v0 and v1 and is reduced with the re- 5. Collaborative filtering specifications
sults of v3 and v4.
In this section we specify the CF methods proposed to make rec-
ommendations. We also formalize the CF methodology used in the
4.5. Neural network learning
experiments; through this methodology, we calculate the quality
results of the predictions and recommendations for the similarity
Eq. (19) has a very similar form to the input (NET) of an arti-
measures studied.
ficial neural network, and more specifically to an ADALINE
Due to the scarce number of items voted for by the cold-start
network. As it is a problem of adjustment, and not of classifica-
users (which we have determined in the interval {2, . . . , 20}), we
tion, we can use a continuous activation function, e.g. linear acti-
have decided to use leave-one-out cross validation to ensure the
vation, instead of the sigmoid function used in the traditional
greatest possible number of training items in each validation pro-
perceptron. A system of this sort would technically be a percep-
cess. The proposed methodology includes the formalization of the
tron with linear activation function. This parallelism between our
processes to obtain the MAE, coverage, precision and recall using
metric and the propagation of the signal in a perceptron enables
leave-one-out cross validation.
the Widrow–Hoff method [54] to be used, adapted to our

5.1. Obtaining prediction and recommendations


Table 4
Weights obtained using the gradient descent method (Netflix RS). 5.1.1. Users’s k-neighbors
w1 w2 w3 w4 w5 w6 We define Ku as the set of k neighbors of the user u and we use
the desired user similarity measure: simx,y, where x and y are users.
0.66 0.35 0.21 0.43 1.07 0.31
The following must hold:
J. Bobadilla et al. / Knowledge-Based Systems 26 (2012) 225–238 231

Table 5
RS running example (users, items and ratings).

ru,i i1 i2 i3 i4 i5 i6 i7 i8 i9 i10 i11 i12


u1  2   3 5   3   
u2 1   2 5    4  5 
u3   3   4  5 1  2 5
u4 2 3   4  5 4  2 3 1
u5  3 4 3  5 4  2 2 3 4
u6 5 4 2 3  3 2 3 5 3  3

Table 6
Fig. 3. Neural learning process. Similarity measures between users using MJD.

MJD u2 u3 u4 u5 u6

K u  U ^ #K u ¼ k ^ u R K u ð21Þ u1 1.166 1.155 1.415 1.572 0.887

8x 2 K u ; 8y 2 ðU  K u Þ; simu;x P simu;y ð22Þ

Table 7
5.1.2. Prediction of the value of an item Predictions that u1 can receive using MJD, k = 2 and arithmetic average as aggregation
Based on the information provided by the k-neighbors of a user approach.

u, the CF process enables the value of an item to be predicted as Pu1,i i1 i3 i4 i7 i8 i10 i11 i12
follows: u1 2 4 3 4,5 4 2 3 2,5

Let Pu ¼ fði; pÞji 2 I; p 2 RealNumberg;


set of prediction to the user u ð23Þ
If we want to make N = 3 recommendations: Zu = {i7, i3, i8}.
We will assign the value of the prediction p
made to user u on item i as pu;i ¼ p ð24Þ
5.2. Obtaining the collaborative filtering quality measures
Once the set of k users (neighbors) similar to active u has been
calculated (Ku), in order to obtain the prediction of item i on user u In this section we specify the way in which we will obtain the re-
(24), we use the aggregation approach deviation-from-mean (Eqs. sults of the quality offered by the proposed similarity measure. We
(25)–(27)). provide equations which formalize the process of obtaining the
Lets Hu;i ¼ fn 2 K u jrn;i – g ð25Þ selected prediction quality measures: MAE and coverage, and the
X selected recommendation quality measures: precision and recall.
1
pu;i ¼ r u þ P simu;n ðr n;i  r n Þ () Hu;i – ; ð26Þ We have determined that a user will be considered as a cold-
n2Hu;i simu;n n2H start user when they have between 2 and 20 items voted. This very
u;i

pu;i ¼  () Hu;i ¼ ; ð27Þ limited number of items involves that the use of the most common
cross validation (random sub-sampling and k-fold cross validation)
is not appropriated either in the prediction quality measures or in
5.1.3. Top N recommendations the recommendation quality measures; there are not enough items
We define Xu as the set of predictions to user u, and Zu as the set to feed suitably the training and validation stages.
of N recommendations to user u. The cross validation method chosen to carry out the experi-
The following must hold: ments is leave-one-out cross validation; this method involves
using a single observation from the original sample as the valida-
X u  I 8i 2 X u ; ru;i ¼ ; pu;i – ; ð28Þ
tion data, and the remaining observations as the training data. Each
Z u # X u ; #Z u ¼ N; 8x 2 Z u ; 8y 2 X u pu;x P pu;y ð29Þ observation in the sample is used once as the validation data. This
method is computationally expensive, but it allows us to use larger
If we want to impose a minimum recommendation value: h 2 Real-
sets of training data.
Number, we add pu,i P h
Starting from the sets of users and items defined in RS (Eqs. (1)
and (2)), we establish the sets of training and validation users and
5.1.4. Running example
training and validation items which will be used in each individual
We define a micro RS example with 6 users, 12 items and a
validation process of an item using leave-one-out cross validation.
range of votes from 1 to 5:

U ¼ fu1 ; u2 ; u3 ; u4 ; u5 ; u6 g; U t  U; set of training users ð30Þ


I ¼ fi1 ; i2 ; i3 ; i4 ; i5 ; i6 ; i7 ; i8 ; i9 ; i10 ; i11 ; i12 g; V ¼ f1; 2; 3; 4; 5; g U v  U; set of validation users ð31Þ
It  I; set of training items ð32Þ
Table 5 shows the votes (ru,i) cast by each user.
In order to make recommendations for user 1 (by way of exam- Iv  I; set containing the validation item ð33Þ
ple), using MJD and the weights of Table 4, we calculate their sim-
Using leave-one-out cross validation, the following holds:
ilarity with the rest of the users (Table 6).
v
Taking k = 2 neighbors, we obtain the set of neighbors of U [ U t ¼ U; U v \ U t ¼ ;; #Iv ¼ 1; Iv [ It ¼ I; Iv \ It ¼ ;
u1 : K u1 ¼ fu5 ; u4 g.
ð34Þ
To calculate the possible predictions that can be made to u1, we
use the arithmetic average as an aggregation approach instead of In the process to obtain the RS quality measures we use valida-
the equation proposed in (27). The predictions obtained are sum- tion items, reserving the training items to determine the k-neigh-
marized in Table 7. bors. Therefore, the similarity measures make their calculations
232 J. Bobadilla et al. / Knowledge-Based Systems 26 (2012) 225–238

using training items; Eq. (35) replaces Eq. (12). In the same way, We define the coverage of a user u on an item i as cu,i; this value
Eq. (22) must reflect that the predictions are made on the valida- indicates that we can make a prediction of item i to user u.
tion items (Eq. (36)). )
  cu;i –  () u 2 U v ; i 2 Iv ; pu;i –  ^ru;i ¼ 
Gx;y ¼ i 2 It jr x;i –  ^ry;i –  ð35Þ ð50Þ
  cu;i ¼  () u 2 U v ; i 2 Iv ; pu;i ¼  _ r u;i – 
Pu ¼ ði; pÞji 2 Iv ; p 2 RealNumber u 2 U v ð36Þ
Let Cu be the set of items on which a prediction can be made to
We modify the equations to obtain k-neighborhoods (21) and user u:
(22), obtaining Eqs. (37) and (38).

u 2 Uv K u  Ut #K u ¼ k u R K u ð37Þ C u ¼ fi 2 Ijcu;i –  where u 2 U v g ð51Þ


t
8x 2 K u ; 8y 2 ðU  K u Þ; simu;x P simu;y ð38Þ The coverage of the user u (cu) is obtained as the proportion be-
tween the number of items not voted for by the user which can be
predicted and the total items not voted for by the user.
5.2.1. Quality of the prediction: mean absolute error/accuracy
In order to measure the accuracy of the results of a RS, it is usual #C u
cu ¼ 100  () Ru – I; cu 2 ½0; 100 ð52Þ
to use the calculation of some of the most common error metrics, #I  #Ru
amongst which the mean absolute error (MAE) and its related met- cu ¼  () Ru ¼ I ð53Þ
rics: mean squared error, root mean squared error, and normalized
The coverage of the RS: (c) is obtained as the average of the
mean absolute error stand out. The MAE indicates the average er-
user’s coverage:
ror made in the predictions; therefore, the lower this value, the
better the accuracy of the system.
Lets O ¼ fu 2 U v jcu – g ð54Þ
Using leave-one-out cross validation, we carry out a validation
process for each item of each validation user: We define the system’s coverage as:
v
Let u 2 U ^ i 2 Ij#Ru 2 f2; . . . ; 20g ^ r u;i –  ð39Þ
1 X
Iv ¼ fig; It ¼ fj 2 Ijr u;i –  ^j – ig ð40Þ c¼ cu () O – ;; c 2 ½0; 100 ð55Þ
#O u2O
Through It we can calculate the k-neighbors (Ku). Through Ku and Iv, c ¼  () O ¼ ; ð56Þ
the prediction pu,i can be calculated.
We define the absolute error of a user u on an item i(mu,i) as:
5.2.3. Quality of the recommendation: precision
mu;i ¼ jpu;i  r u;i j () u 2 U v ; i 2 Iv ; pu;i –  ^r u;i – ; The precision refers to the capacity to obtain relevant recom-
mu;i 2 ½0; max  min ð41Þ mendations regarding the total number of recommendations
made. We define h as the minimum value of a vote to be considered
mu;i ¼  () u 2 U v ; i 2 Iv ; pu;i ¼  _ r u;i ¼  ð42Þ
relevant.
The MAE of the user u(mu) is obtained as the average of its mu,i: Let u 2 Uv ^ i 2 Ij#Ru 2 {2, . . . , 20} ^ ru,i – 
 
Let Mu ¼ i 2 Ijmu;i –  ; u 2 U v ð43Þ Iv ¼ fig; It ¼ fj 2 Ijr u;i –  ^j – ig ð57Þ
1 X
mu ¼
#M u i2M
mu;i ; mu 2 ½0; max  min ð44Þ Through It we obtain k-neighbors (Ku), and through Ku and Iv we
u
make the prediction pu,i.
The MAE of the RS: (m) is obtained as the average of the user’s MAE: Each qu,i term indicates whether item i has been recommended
to user u.
Let O ¼ fu 2 U v jmu – g ð45Þ
)
qu;i –  () u 2 U v ; i 2 Iv ðpu;i –  ^pu;i P hÞ ^ ru;i – ; recommended item
We define the system’s MAE as: v v
qu;i ¼  () u 2 U ; i 2 I ðpu;i ¼  _ pu;i < hÞ ^ r u;i – ; not recommended item
1 X ð58Þ
m¼ mu () O – ;; m 2 ½0; max  min ð46Þ
#O u2O
m ¼  () O ¼ ; ð47Þ Each tu,i term indicates whether item i recommended to user u has
been relevant.
The accuracy is defined as: )
m t u;i –  () u 2 U v ; i 2 Iv qu;i –  ^ru;i P h; recommended and relevant
accuracy ¼ 1  ; accuracy 2 ½0; 1 ð48Þ t u;i ¼  () u 2 U v ; i 2 Iv qu;i –  ^r u;i < h; recommended and not relevant
max  min
ð59Þ

5.2.2. Quality of the prediction: coverage The precision of the user u(tu) is obtained as the proportion be-
The coverage could be defined as the capacity of predicting from tween the number of recommended relevant items to the user and
a metric applied to a specific RS. In short, it calculates the percent- the total items recommended to the user.
age of situations in which at least one k-neighbors of each active
#fi 2 Ijtu;i – g
user can rate an item that has not been rated by that active user. tu ¼ () fi 2 Ijqu;i – g – ;; tu 2 ½0; 1 ð60Þ
Once again, using leave-one-out cross validation we carry out a #fi 2 Ijqu;i – g
validation process for each item of each validation user: tu ¼  () fi 2 Ijqu;i – g ¼ ; ð61Þ
Lets u 2 Uv ^ i 2 Ij#Ru 2 {2, . . . , 20} ^ ru,i – 
The precision of the RS: (t) is obtained as the average of the
Iv ¼ fig; It ¼ fj 2 Ijr u;i –  ^j – ig ð49Þ user’s precision:
With It we obtain k-neighbors (Ku) and with Ku and Iv we make
the prediction pu,i. Let S ¼ fu 2 U v jt u – g ð62Þ
J. Bobadilla et al. / Knowledge-Based Systems 26 (2012) 225–238 233

We define the system’s precision as: and training items. The third column (MJDu1,ui) informs about the
similarity between u1 and each validation user. The fourth column
1 X
t¼ t u () S – ;; t 2 ½0; 1 ð63Þ (Ku1) describes the k-neighbors of u1. The fifth column (Pu1,Iv)
#S u2S
shows the prediction of item i on user u1 using the arithmetic
t ¼  () S ¼ ; ð64Þ mean as aggregation approach instead of Eq. (27). The sixth col-
umn (ru1,Iv) informs about the vote of U1 for the validation items:
the column ‘MAE’ describes the mean absolute difference between
5.2.4. Quality of the recommendation: recall
prediction and vote (41); the column ‘Precision’ indicates when an
The recall refers to the capacity to obtain relevant recommenda-
item has been recommended (q!=) and when it has been recom-
tions regarding the total number of relevant items.
mended and it is relevant (t!=); the column ‘Recall’ shows when
We define h as the minimum value of a vote to be considered as
a relevant item has been recommended (t!=), and when it has
relevant.
not (n!=).
Lets u 2 Uv ^ i 2 Ij#Ru 2 {2, . . . , 20} ^ ru,i – 
In the example, we make the following calculations which
Iv ¼ fig; It ¼ fj 2 Ijr u;i –  ^j – ig ð65Þ determine the similarity between users u1 and u3 : Gu1;u3 ¼
fi6 ; i9 g; v 0u1;u3 ¼ 0=2 ¼ 0; v 1u1;u3 ¼ 1=2 ¼ 0:5; v 3u1;u3 ¼
With It we obtain k-neighborhoods (Ku) and with Ku and Iv we make 0=2 ¼ 0; v 4u1;u3 ¼ 0=2 ¼ 0; Jaccardu1;u3 ¼ 2=7 ¼ 0:286,
prediction pu,i.
54 2
31 2
Each term nu,i indicates whether item i not recommended to the 51
þ 51
lu1;u3 ¼ 1  ¼ 0:844
2
user u is relevant 1 
) MJDu1;u3 ¼ w1 v 0u1;u3 þ w2 v 1u1;u3 þ w3 v 3u1;u3 þ w4 v 4u1;u3 þ w5 lu1;u3 þ w6 Jaccardu1;u3 ¼ 1:166
6
nu;i –  () u 2 U v ; i 2 Iv pu;i –  ^ru;i –  ^pu;i < h ^ ru;i P h; not rec: &rel:
nu;i ¼  () u 2 U v ; i 2 Iv pu;i –  ^ru;i –  ^pu;i < h ^ r u;i < h; not rec: &not rel: Table 9 specifies the way to obtain the coverage of u1 with each
ð66Þ of their not voted items. Based on the similarity results set out in
Table 6 and the predictions expressed in Table 7 we determine
We define Yu as set of recommended items to user u which have
100% coverage for user u1.
been relevant.
Table 10 summarizes the results provided by the quality mea-
Y u ¼ fi 2 Ijt u;i – g; u 2 U v ð67Þ sures applied to the running example, using MJD.
We define Nu as the set of not recommended items to user u which
have been relevant. 6. Design of the experiments

Nu ¼ fi 2 Ijnu;i – g; u 2 U v ð68Þ The experiments have been carried out using the Netflix and
The recall of the user u(xu) is obtained as the proportion be- Movielens databases, which contains the cold-start users filtered
tween the number of recommended relevant items to the user
Table 9
and the total relevant items for the user (recommended and not User u1 coverage.
recommended).
Iv It MJDu1,ui Ku1 Pu1,{i} Quality
#Y u measures
xu ¼ () Y u [ Nu – ;; xu 2 ½0; 1 ð69Þ u3 u4 u5 u6
coverage
#ðY u [ Nu Þ
xu ¼  () Y u [ Nu ¼ ; ð70Þ {i1} {i2, i5, i6, i9} 1.166 1.415 1.572 0.887 {u5, u4} 2 c–
{i3} {i2, i5, i6, i9} 4 c–
The recall of the RS: (x) is obtained as the average of the user’s {i4} {i2, i5, i6, i9} 3 c–
{i7} {i2, i5, i6, i9} 4,5 c–
recall:
{i8} {i2, i5, i6,i9} 4 c–
Lets S ¼ fu 2 U v jxu – g ð71Þ {i10} {i2, i5, i6, i9} 2 c–
{i11} {i2, i5, i6, i9} 3 c–
We define the system’s recall as: {i12} {i2, i5, i6, i9} 2,5 c–
Total 100
1 X
x¼ xu () S – ;; x 2 ½0; 1 ð72Þ
#S u2S
x ¼  () S ¼ ; ð73Þ Table 10
Quality measures results.

Quality measures
5.2.5. Running example
MAE Coverage Precision Recall
We establish Uv = {u1, u2}, Ut = {u3, u4, u5, u6}, k = 2.
u1 0.75 100 0.5 1
Table 8 shows an outline of the process with which we obtain
u2 1.7 100 1 0.33
the quality measures MAE, precision and recall of user u1. The Total 1.225 100 0.75 0.66
two first columns (Iv and It) describe respectively the validation

Table 8
User u1 MAE, precision and recall.

Iv It MJDu1,ui Ku1 Pu1,Iv ru1,Iv Quality measures


u3 u4 u5 u6 MAE Precision Recall
{i2} {i5, i6, i9} 1.166 1.387 1.610 0.864 {u5, u4} 3 2 1 q = , t =  t = , n = 
{i5} {i2, i6, i9} 1.166 1.387 1.582 0.895 {u5, u4} 4 3 1 q – , t =  t = , n = 
{i6} {i2, i5, i9} 0.846 1.422 1.422 0,864 {u4, u5} 5 5 0 q – , t –  t – , n = 
{i9} {i2, i5, i6} 1.397 1.422 1.610 0.864 {u5, u4} 2 3 1 q = , t =  t = , n = 
Total 0.75 0.5 1
234 J. Bobadilla et al. / Knowledge-Based Systems 26 (2012) 225–238

Table 11
Experiments performed.

Databases # Neighbors on x axis


Test users K (MAE, coverage) Precision, recall
Range step K N h Figures
Movielens 1 M 20% {100, . . . , 2000} 100 700 {2, . . . , 20} 5 Fig. 4
Netflix 20% {100, . . . , 2000} 100 700 {2, . . . , 20} 5 Fig. 6
Databases # Ratings on x axis
Test users (MAE, coverage, precision, recall)
#Ratings K N h Figures
Movielens 1 M 20% {2, . . . , 20} step 1 700 10 5 Fig. 5
Netflix 20% {2, . . . , 20} step 1 700 10 5 Fig. 7

in other databases (users with less than 20 votes). Movielens is the precision and recall, using leave-one-out cross validation for the
RS research database reference and Netflix offers us a large data- items, 20% of validation users, 80% of training users (only those
base on which metrics, algorithms, programming and systems who have voted for a maximum of 20 items are processed into
are put to the test. The main parameters of Netflix are: 480189 the validation users set). Section 5.2 sets out the formalization
users, 17770 items, 100,480,507 ratings, 1–5 stars; the main which gives a detailed description of how to obtain each quality
parameters of Movielens are: 6040 users, 3706 items, 1,480,507 measure result.
ratings, 1 to 5 stars. Each quality measure (MAE, coverage, precision and recall) will
Since the database Movielens does not take into account cold- be calculated, using Movielens and Netflix, in two experiments:
start users (users with less than 20 votes), we have removed votes
of this database in order to achieve cold-start users. Indeed, we
Experiment 1. Evolution of the results of MJD, PIP, UError,
have removed randomly between 5 and 20 votes of those users
correlation, constrained correlation, cosine and JMSD. Experiment
who have rated between 20 and 30 items. In this way, those users
1.1: MAE and coverage throughout the range of neighborhoods
who now result to rate between 2 and 20 items are regarded as
k 2 {100, . . . , 2000}, step 100. Experiment 1.2: precision and recall
cold-start users. We recover the removing votes of those users with
throughout the range of number of recommendations
greater than 20 votes despite of removing some their votes (in this
N 2 {2, . . . , 20}. In this experiment we make use of all the cold-
way, these users keep immutable in the database).
start users (no more than 20 votes) belonging to the users
With the aim of checking the correct operation of the cold-
validation set (Uv).
start similarity measure proposed in the paper, it is compared
with part of the traditional similarity measures most commonly
used in the field of CF: Pearson correlation, cosine and con- Experiment 2. Evolution of the results of MJD, PIP, UError, corre-
strained Pearson correlation; with the new metric JMSD [5] and lation, constrained correlation, cosine and JMSD throughout the
with current user cold-start metrics working just on the users’ range of votes cast by the cold-start users #Ru 2 {2, . . . , 20}. We
ratings matrix: PIP [22] and UError [16]. The quality measures use the fixed value of k-neighbors k = 700 and number of recom-
to which all of the metrics will be subjected are MAE, coverage, mendations N = 10.

Fig. 4. (a) MAE, (b) coverage, (c) precision and (d) recall obtained using: Movielens database. Individual experiments (x-axis): k 2 {100, . . . , 2000}, step 100, 20% of validation
users, leave-one-out cross validation applied to the items, precision and recall threshold h = 5, all the validation cold-start users: #Ru 2 {2, . . . , 20}.
J. Bobadilla et al. / Knowledge-Based Systems 26 (2012) 225–238 235

Fig. 5. (a) MAE, (b) coverage, (c) precision and (d) recall obtained using: Movielens database. Individual experiments (x-axis): #Ru 2 {2, . . . , 20}, 20% of validation users, leave-
one-out cross validation applied to the items, precision and recall threshold h = 5, k = 700, N = 10.

Fig. 6. (a) MAE, (b) coverage, (c) precision and (d) recall obtained using: Netflix database. Individual experiments (x-axis): k 2 {100, . . . , 2000}, step 100, 20% of validation
users, leave-one-out cross validation applied to the items, precision and recall threshold h = 5, all the validation cold-start users: #Ru 2 {2, . . . , 20}.

Table 11 summarizes the experiments performed (used dat- that the similarity measure designed (MJD) improves the predic-
abases, selected parameters values and figures where the results tion quality of the traditional similarity measures when they are
are shown). applied to cold-start users as well as it improves the new user
cold-start error-reflected (UError) metric. The PIP metric works
better only for a number of neighbors fewer than 500.
7. Results Fig. 4(b) displays the negative aspect of the similarity measure
proposed which had been highlighted in section 2 (motivation): by
Fig. 4 shows the results corresponding to Experiment 1 using selecting neighbors who have a similar number of votes to the ac-
Movielens. Graph 1a confirms the paper’s hypothesis in the sense tive user (using Jaccard), the coverage is weakened. As we can see
236 J. Bobadilla et al. / Knowledge-Based Systems 26 (2012) 225–238

Fig. 7. (a) MAE, (b) coverage, (c) precision and (d) recall obtained using: Netflix database. Individual experiments (x-axis): #Ru 2 {2, . . . , 20}, 20% of validation users, leave-one-
out cross validation applied to the items, precision and recall threshold h = 5, k = 700, N = 10.

As we can see in Fig. 4(b), the MJD coverage is worse than that obtained using the other metrics (except JMSD). Figs. 4(a) and (b), together, enable the administrator of the RS to make a decision on the number of neighbors (k) to be used, depending on the desired balance between the quality and the variety of the predictions that we wish to offer the cold-start users.
The quality of the recommendations, measured with the precision quality measure (the proportion of relevant recommendations as regards the total number of recommendations made), improves using the proposed similarity measure (MJD) as regards the traditional and cold-start similarity measures (Fig. 4(c)). The improvement is obtained through the whole range of neighbors, which means that the improvement achieved in the predictions is transferred to the recommendations (using k = 700).
By analyzing Fig. 4(d) we can determine that the quality of the recommendations measured with the recall quality measure (the proportion of relevant recommendations as regards the total number of relevant items) also improves, using MJD, for the entire range of neighbors considered.
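As a concrete reading of these two definitions, the following minimal Python sketch (illustrative only, not the paper's code; the data layout and names are assumed) computes precision and recall for a single user, with θ = 5 as the relevance threshold used in the experiments:

THETA = 5  # relevance threshold used in the experiments

def precision_recall(recommended, test_ratings, theta=THETA):
    # test_ratings maps held-out item ids to their true ratings;
    # an item is relevant when its rating reaches the threshold
    relevant = {item for item, rating in test_ratings.items() if rating >= theta}
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 2 of the 4 recommended items are relevant, out of 3 relevant items
print(precision_recall([10, 22, 31, 47], {10: 5, 22: 3, 31: 5, 55: 4, 60: 5}))
# -> (0.5, 0.666...)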
Fig. 5 (Experiment 2) enables us to discover the quality of the predictions and the recommendations made to the cold-start users according to the number of items they have voted for. In Fig. 5(a) we can see a generalized improvement in the accuracy obtained using the proposed similarity measure (MJD) as regards the traditional and the cold-start ones.
As is to be expected, the cold-start users who have voted for very few items (two or three) generate greater prediction errors. These cold-start users, which we could call extreme cold-start users, do not present significant improvements in the MAE using MJD; the basic problem with these users is that, in their case, the number of items with which the similarity measures can work is so small that the improvement margin is practically zero. The rest of the cold-start users considered present a reduction in the MAE using MJD as regards the other similarity measures used.
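A hedged sketch of how these per-vote-count MAE curves can be produced follows; the prediction step, predict, stands in for the k-nearest-neighbors estimator and is an assumption, not code taken from the paper. Leave-one-out cross validation applied to the items hides each vote in turn and re-predicts it:

from collections import defaultdict

def loo_mae_by_votes(ratings, predict):
    # ratings: {user: {item: vote}}; predict(user, item, ratings) returns an
    # estimated vote, or None when no neighbor has rated the item
    errors = defaultdict(list)
    for user, votes in ratings.items():
        for item, true_vote in list(votes.items()):
            del votes[item]                      # leave this single vote out
            estimate = predict(user, item, ratings)
            votes[item] = true_vote              # restore it afterwards
            if estimate is not None:
                errors[len(votes)].append(abs(estimate - true_vote))
    # MAE grouped by the number of votes each validation user has cast (#Ru)
    return {n: sum(e) / len(e) for n, e in errors.items()}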
Fig. 5(b) shows that MJD obtains worse coverage than most metrics (especially PIP); using MJD, the parameter that determines the adjustment in the coverage is k (the number of neighbors).
Fig. 5(c) shows the precision values obtained with MJD which, as with the MAE, are moderate for the extreme cold-start users and better for the rest of the cold-start users. Fig. 5(d) shows positive recall margins of improvement similar to those obtained when dealing with the precision measures; this indicates a good capacity of MJD to reduce the number of false negatives (relevant items that are not recommended). Both measures show an improvement with the proposed metric.
Fig. 6 shows the results corresponding to Experiment 1 using Netflix. As may be seen, these results confirm the conclusions derived from the results obtained with the Movielens database. In Fig. 6(a) we can see that MJD provides much better results in relation to PIP when working with Netflix than when working with Movielens. As may be seen in Fig. 4(b) and Fig. 6(b), the coverage results obtained for Netflix are very similar to the ones obtained for Movielens. Figs. 6(c) and (d) show outstanding results in the recommendation quality measures for the Netflix database, in an analogous way to those obtained for the Movielens database (Figs. 4(c) and (d)).
Fig. 7 (Experiment 2) shows the excellent general behavior of the proposed metric for the Netflix database, in a similar way to the results we obtained for Movielens.
Besides, we have compared the processing time required by the proposed metric MJD and by the metric PIP. Since the calculation of our metric is very simple, it provides much faster recommendations. Using Movielens as the recommender system, we have taken all of the cold-start users and calculated their similarity with the rest of the users in the database. Repeating this experiment 100 times, the average processing time of a cold-start user is 9.11 ms for the MJD similarity metric, while for the PIP similarity metric it is 66.42 ms (a performance improvement of 729%).
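A sketch of how this kind of timing comparison can be reproduced is shown below; mjd_similarity, pip_similarity and users are placeholders for implementations of the two metrics and for the Movielens rating data, and none of these names come from the paper:

import time

def mean_time_ms(similarity, cold_start_user, users, repetitions=100):
    # average wall-clock time, in ms, of comparing one cold-start user
    # against every user in the database
    start = time.perf_counter()
    for _ in range(repetitions):
        for other in users:
            similarity(cold_start_user, other)
    return 1000 * (time.perf_counter() - start) / repetitions

# e.g. mean_time_ms(mjd_similarity, new_user, users) versus
#      mean_time_ms(pip_similarity, new_user, users)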
8. Conclusions

The new user cold start issue represents a serious problem in RS as it can lead to the loss of new users due to the lack of accuracy in their recommendations, a consequence of their not yet having cast enough votes in the RS. For this reason, it is particularly important to design new similarity metrics which give greater precision to the results offered to users who have cast few votes.
The combination of Jaccard's similarity measure, the arithmetic average of the squared differences in votes and the values of the differences in the votes provides us with the basic elements with which to design a metric that obtains good results in new user cold start situations. These basic elements have been weighted via a linear combination for which the weights are obtained in a process of optimization based on neural learning.
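The following Python sketch illustrates this construction under stated assumptions: the component definitions below (Jaccard over the rated-item sets, a normalized mean squared difference and the proportions of vote differences) paraphrase the description above rather than reproduce the paper's exact formulas, and the Widrow-Hoff (LMS) update is one standard way of fitting the linear weights by neural learning:

def components(u, v, max_diff=4):
    # u, v: {item: vote} dictionaries; votes assumed to lie in 1..5
    common = set(u) & set(v)
    union = set(u) | set(v)
    jaccard = len(common) / len(union) if union else 0.0
    if not common:
        return [jaccard] + [0.0] * (max_diff + 2)
    msd = sum((u[i] - v[i]) ** 2 for i in common) / len(common)
    # proportion of common items whose votes differ by exactly d
    diffs = [sum(1 for i in common if abs(u[i] - v[i]) == d) / len(common)
             for d in range(max_diff + 1)]
    return [jaccard, 1.0 - msd / max_diff ** 2] + diffs

def lms_step(weights, x, target, learning_rate=0.01):
    # Widrow-Hoff (delta) rule: nudge the weights against the current error;
    # the target is assumed to encode how well v actually predicts u's votes
    error = target - sum(w * xi for w, xi in zip(weights, x))
    return [w + learning_rate * error * xi for w, xi in zip(weights, x)]

def similarity(u, v, weights):
    return sum(w * xi for w, xi in zip(weights, components(u, v)))

Casting the metric as a dot product over a handful of components keeps each per-pair computation linear in the number of co-voted items, which is consistent with the running-time advantage reported in Section 7.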
The Jaccard similarity measure makes use of information based on the distribution and on the number of votes cast by each pair of users to be compared. Its use has been a determining factor in achieving the quality of the results obtained, which confirms that it is appropriate to combine this information with the traditional information, based on the numerical values of the votes, when we wish to design a cold-start similarity measure.
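A tiny numeric illustration (hypothetical data) of why this component also narrows the neighborhood: with Jaccard, a user who has cast three votes looks far more similar to another light voter than to a heavy voter, even when every common vote agrees, which is the source of the coverage trade-off discussed in Section 7:

def jaccard(a, b):
    # a, b: sets of item ids rated by each user
    return len(a & b) / len(a | b)

new_user = {1, 2, 3}
light_voter = {1, 2, 4}           # 3 votes, 2 items in common -> 2/4 = 0.50
heavy_voter = set(range(1, 301))  # 300 votes, 3 items in common -> 3/300 = 0.01
print(jaccard(new_user, light_voter), jaccard(new_user, heavy_voter))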
The proposed metric and a complete set of similarity measures have been tested on the Netflix and Movielens databases. The proposed cold-start similarity measure provides results that improve the prediction quality measure MAE and the recommendation quality measures precision and recall. Coverage is the only quality measure that displays inferior results when it is evaluated with the proposed measure; this is due to the fact that the Jaccard component gives priority as neighbors to users with a similar number of votes to the active user.
The proposed similarity measure runs seven times faster than the PIP one, and it also improves the MAE, precision and recall quality results.
In RS, in general, it is feasible to use different metrics on different users; in particular, it is possible to use one similarity measure with those users who have cast few votes and a different one with the rest of the users, which enables an improvement in the new users' recommendations without affecting the correct global operation of the RS.
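A minimal sketch of such a per-user dispatch follows; the 20-vote cut-off merely mirrors the cold-start range studied here and is an assumption, not a recommendation from the paper:

def pick_similarity(user_votes, cold_start_metric, standard_metric, threshold=20):
    # route users with few votes to the cold-start metric (e.g. MJD) and the
    # rest to the system's usual metric (e.g. Pearson correlation)
    return cold_start_metric if len(user_votes) < threshold else standard_metric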
Acknowledgement

Our acknowledgement to the FilmAffinity.com & Netflix companies, to the Movielens group and to the Elsevier Knowledge Based Systems journal.
References

[1] E. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, IEEE Transactions on Knowledge and Data Engineering 17 (6) (2005) 734–749.
[2] N. Antonopoulus, J. Salter, CinemaScreen recommender agent: combining collaborative and content-based filtering, IEEE Intelligent Systems (2006) 35–41.
[3] J.S. Breese, D. Heckerman, C. Kadie, Empirical analysis of predictive algorithms for collaborative filtering, in: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1998, pp. 43–52.
[4] J. Bobadilla, F. Serradilla, A. Hernando, Collaborative filtering adapted to recommender systems of e-learning, Knowledge-Based Systems 22 (2009) 261–265, doi:10.1016/j.knosys.2009.01.008.
[5] J. Bobadilla, F. Serradilla, J. Bernal, A new collaborative filtering metric that improves the behavior of recommender systems, Knowledge-Based Systems 23 (6) (2010) 520–528, doi:10.1016/j.knosys.2010.03.009.
[6] J. Bobadilla, A. Hernando, F. Ortega, J. Bernal, A framework for collaborative filtering recommender systems, Expert Systems with Applications, in press, doi:10.1016/j.eswa.2011.05.021.
[7] J. Bobadilla, F. Ortega, A. Hernando, A collaborative filtering similarity measure based on singularities, Information Processing and Management, in press, doi:10.1016/j.ipm.2011.03.007.
[8] J. Bobadilla, F. Ortega, A. Hernando, J. Alcalá, Improving collaborative filtering recommender systems results and performance using genetic algorithms, Knowledge-Based Systems 24 (8) (2011) 1310–1316.
[9] J. Bobadilla, F. Ortega, A. Hernando, J. Bernal, Generalization of recommender systems: collaborative filtering extended to group of users and restricted to group of items, Expert Systems with Applications, in press, doi:10.1016/j.eswa.2011.07.005.
[10] T. Chen, L. He, Collaborative filtering based on demographic attribute vector, in: Proceedings of the International Conference on Future Computer and Communication, 2009, pp. 225–229, doi:10.1109/FCC.2009.68.
[11] C. Christakou, A. Stafylopatis, A hybrid movie recommender system based on neural networks, in: International Conference on Intelligent Systems Design and Applications, 2005, pp. 500–505, doi:10.1109/ISDA.2005.9.
[12] H. Denis, Managing collaborative learning processes, e-learning applications, in: 29th International Conference on Information Technology Interfaces, 2007, pp. 345–350.
[13] L. Ding, D. Steil, B. Dixon, A. Parrish, D. Brown, A relation context oriented approach to identify strong ties in social networks, Knowledge-Based Systems 24 (8) (2011) 1187–1195.
[14] L.Q. Gao, C. Li, Hybrid personalized recommended model based on genetic algorithm, in: International Conference on Wireless Communications, Networking and Mobile Computing, 2008, pp. 9215–9218.
[15] C. Hayes, P. Cunningham, Context boosting collaborative recommendations, Knowledge-Based Systems 17 (2–4) (2004) 131–138.
[16] N.M. Heung, E.S. Abdulmotaleb, S.J. Geun, Collaborative error-reflected models for cold-start recommender systems, Decision Support Systems, in press, doi:10.1016/j.dss.2011.02.015.
[17] J.L. Herlocker, J.A. Konstan, J.T. Riedl, L.G. Terveen, Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems 22 (1) (2004) 5–53.
[18] F. Hernandez, E. Gaudioso, Evaluation of recommender systems: a new approach, Expert Systems with Applications (2007) 790–804, doi:10.1016/j.eswa.2007.07.047.
[19] Z. Huang, D. Zeng, H. Chen, A comparison of collaborative-filtering recommendation algorithms for e-commerce, IEEE Intelligent Systems (2007) 68–78.
[20] Y.P. Huang, W.P. Chuang, Y.H. Ke, F.E. Sandnes, Using back-propagation to learn association rules for service personalization, Expert Systems with Applications 35 (2008) 245–253, doi:10.1016/j.eswa.2007.06.035.
[21] M.H. Hsu, Proposing an ESL recommender teaching and learning system, Expert Systems with Applications 34 (3) (2008) 2102–2110.
[22] J.A. Hyung, A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem, Information Sciences 178 (2008) 37–51, doi:10.1016/j.ins.2007.07.024.
[23] H. Jinghua, W. Kangning, F. Shaohong, A survey of e-commerce recommender systems, in: International Conference on Service Systems and Service Management, 2007, pp. 1–5, doi:10.1109/ICSSSM.2007.4280214.
[24] H.N. Kim, A.T. Ji, I. Ha, G.S. Jo, Collaborative filtering based on collaborative tagging for enhancing the quality of recommendations, Electronic Commerce Research and Applications 9 (1) (2010) 73–83, doi:10.1016/j.elerap.2009.08.004.
[25] J.A. Konstan, B.N. Miller, J. Riedl, PocketLens: toward a personal recommender system, ACM Transactions on Information Systems 22 (3) (2004) 437–476.
[26] B. Krulwich, Lifestyle finder: intelligent user profiling using large-scale demographic data, Artificial Intelligence Magazine 18 (2) (1997) 37–45.
[27] X.N. Lam, T. Vu, T.D. Le, A.D. Duong, Addressing cold-start problem in recommendation systems, in: Conference on Ubiquitous Information Management and Communication, 2008, pp. 208–211, doi:10.1145/1352793.1352837.
[28] K. Lang, NewsWeeder: learning to filter netnews, in: Proceedings of the 12th International Conference on Machine Learning, 1995, pp. 331–339.
[29] M. Lee, Y. Woo, A hybrid recommender system combining collaborative filtering with neural network, Lecture Notes on Computer Sciences 2347 (2002) 531–534.
[30] C.W. Leung, S.C. Chan, F.L. Chung, An empirical study of a cross-level association rule mining approach to cold-start recommendations, Knowledge-Based Systems 21 (7) (2008) 515–529.
[31] Q. Li, S.H. Myaeng, B.M. Kim, A probabilistic music recommender considering user opinions and audio features, Information Processing & Management 43 (2) (2007) 473–487.
[32] S.G. Li, L. Shi, L. Wang, The agile improvement of MMORPGs based on the enhanced chaotic neural network, Knowledge-Based Systems 24 (5) (2011) 642–651.
[33] S. Loh, F. Lorenzi, R. Granada, D. Lichtnow, L.K. Wives, J.P. Oliveira, Identifying similar users by their scientific publications to reduce cold start in recommender systems, in: Proceedings of the 5th International Conference on Web Information Systems and Technologies (WEBIST2009), 2009, pp. 593–600.
[34] L. Martinez, L.G. Perez, M.J. Barranco, Incomplete preference relations to smooth out the cold-start in collaborative recommender systems, in: Proceedings of the 28th North American Fuzzy Information Processing Society Annual Conference (NAFIPS2009), 2009, pp. 1–6, doi:10.1109/NAFIPS.2009.5156454.
[35] A. Nocera, D. Ursino, An approach to providing a user of a "social folksonomy" with recommendations of similar users and potentially interesting resources, Knowledge-Based Systems 24 (8) (2011) 1277–1296.
[36] M.P. O'Mahony, B. Smyth, A classification-based review recommender, Knowledge-Based Systems 23 (4) (2010) 323–329.
[37] S.T. Park, D.M. Pennock, O. Madani, N. Good, D. Coste, Naïve filterbots for robust cold-start recommendations, in: Proceedings of Knowledge Discovery and Data Mining (KDD2006), 2006, pp. 699–705.
[38] Y.J. Park, A. Tuzhilin, The long tail of recommender systems and how to leverage it, in: ACM Conference on Recommender Systems, 2008, pp. 11–18, doi:10.1145/1454008.1454012.
[39] S.T. Park, W. Chu, Pairwise preference regression for cold-start recommendation, in: ACM Conference on Recommender Systems, 2009, pp. 21–28, doi:10.1145/1639714.1639720.
[40] C. Porcel, E. Herrera-Viedma, Dealing with incomplete information in a fuzzy linguistic recommender system to disseminate information in university digital libraries, Knowledge-Based Systems 23 (1) (2010) 32–39.
[41] P. Pu, L. Chen, Trust-inspiring explanation interfaces for recommender systems, Knowledge-Based Systems 20 (6) (2007) 542–556.
[42] A.M. Rashid, I. Albert, D. Cosley, S.K. Lam, S.M. McNee, J.A. Konstan, J. Riedl, Getting to know you: learning new users preferences in recommender systems, in: International Conference on Intelligent Users Interfaces (IUI2002), 2002, pp. 127–134.
[43] A.M. Rashid, G. Karypis, J. Riedl, Learning preferences of new users in recommender systems: an information theoretic approach, Knowledge Discovery and Data Mining (KDD2008) 10 (2) (2008) 90–100.
[44] L. Ren, L. He, J. Gu, W. Xia, F. Wu, A hybrid recommender approach based on Widrow–Hoff learning, in: International Conference on Future Generation Communication and Networking, 2008, pp. 40–45, doi:10.1109/FGCN.2008.48.
[45] T.H. Roh, K.J. Oh, I. Han, The collaborative filtering recommendation based on SOM cluster-indexing CBR, Expert Systems with Applications 25 (2003) 413–423, doi:10.1016/S0957-4174(03)00067-8.
[46] P.B. Ryan, D. Bridge, Collaborative recommending using formal concept analysis, Knowledge-Based Systems 19 (5) (2006) 309–315.
[47] J. Salter, N. Antonopoulus, CinemaScreen recommender agent: combining collaborative and content-based filtering, IEEE Intelligent Systems 21 (2006) 35–41.
[48] J.B. Schafer, D. Frankowski, J. Herlocker, S. Sen, Collaborative filtering recommender systems, The Adaptive Web, LNCS 4321 (2007) 291–324.
[49] A.I. Schein, A. Popescul, L.H. Ungar, D.M. Pennock, Methods and metrics for cold-start recommendations, SIGIR (2002) 253–260.
[50] M. Saranya, T. Atsuhiro, Hybrid recommender systems using latent features, in: Proceedings of the International Conference on Advanced Information Networking and Applications Workshops, 2009, pp. 661–666, doi:10.1109/WAINA.2009.122.
[51] P. Symeonidis, A. Nanopoulos, Y. Manolopoulos, Providing justifications in recommender systems, IEEE Transactions on Systems, Man and Cybernetics 38 (6) (2008) 1262–1272.
[52] H.F. Wang, Ch.T. Wu, A strategy-oriented operation module for recommender systems in e-commerce, Computers & Operations Research, in press, doi:10.1016/j.cor.2010.03.011.
[53] L.T. Weng, Y. Xu, Y. Li, R. Nayak, Exploiting item taxonomy for solving cold-start problem in recommendation making, in: Proceedings of the 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI2008), Dayton, USA, 2008, pp. 113–120.
[54] B. Widrow, M.E. Hoff, Adaptive switching circuits, in: IRE WESCON Convention Record, New York, 1960, pp. 96–104.
[55] J.M. Yang, K.F. Li, Recommendation based on rational inferences in collaborative filtering, Knowledge-Based Systems 22 (1) (2009) 105–114.
[56] W. Yuan, D. Guan, Y.K. Lee, S. Lee, S.J. Hur, Improved trust-aware recommender system using small-worldness of trust networks, Knowledge-Based Systems 23 (3) (2010) 232–238.
[57] M.L. Yung, P.K. Chien, TREPPS: a trust-based recommender system for peer production services, Expert Systems with Applications 36 (2) (2009) 3263–3277.
