
Improving regularized singular value decomposition for collaborative filtering

Arkadiusz Paterek

Institute of Informatics, Warsaw University

ul. Banacha 2, 02-097 Warsaw, Poland

paterek@mimuw.edu.pl

ABSTRACT

A key part of a recommender system is a collaborative filtering algorithm predicting users' preferences for items. In this paper we describe different efficient collaborative filtering techniques and a framework for combining them to obtain a good prediction.

The methods described in this paper are the most important parts of a solution predicting users' preferences for movies with an error rate 7.04% better on the Netflix Prize dataset than the reference algorithm Netflix Cinematch.

The set of predictors used includes algorithms suggested by Netflix Prize contestants: regularized singular value decomposition of data with missing values, K-means, and postprocessing SVD with KNN. We propose extending the set of predictors with the following methods: addition of biases to the regularized SVD, postprocessing SVD with kernel ridge regression, using a separate linear model for each movie, and using methods similar to the regularized SVD but with fewer parameters.

All predictors and selected 2-way interactions between them are combined using linear regression on a holdout set.

Categories and Subject Descriptors

I.2.6 [Artificial Intelligence]: Learning; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Information filtering

General Terms

Algorithms, Experimentation, Performance

Keywords

prediction, collaborative filtering, recommender systems, Netflix Prize

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDDCup.07, August 12, 2007, San Jose, California, USA.
Copyright 2007 ACM 978-1-59593-834-3/07/0008 ...$5.00.

1. INTRODUCTION

Recommender systems are very important for e-commerce. If a company offers many products to many clients, it can benefit substantially from presenting personalized recommendations. For example, Greg Linden, developer of Amazon's recommendation engine [6], reported that in 2002 over 20% of Amazon's sales resulted from personalized recommendations. There exist many commercial applications of recommender systems for products like books, movies and music. Many applications that are not directly commercial have also emerged: personalized recommendations for websites, jokes [4], Wikipedia articles, etc.

A difficult part of building a recommender system is, knowing the preferences of users for some items, to accurately predict which other items they will like. This task is called collaborative filtering. Most approaches to this task described so far in the literature are variations of K-nearest neighbors (like TiVo [1]) or singular value decomposition (like Eigentaste [4]). Another approach is using graphical models [7, 8]. Articles [3, 7] are examples of comparisons of different collaborative filtering techniques.

In October 2006, the Netflix Prize contest was announced. The goal of the contest is to produce a good prediction of users' preferences for movies. Netflix released a database of over 100 million movie ratings made by 480,189 users. The contest ends when someone submits a solution with a prediction error RMSE (root mean squared error) 10% better than the Netflix Cinematch algorithm. For an introduction to the Netflix Prize competition and a description of Netflix Cinematch we direct the reader to the article [2].

This paper describes various collaborative filtering algorithms that work well on the Netflix Prize dataset. Using the approach of combining the results of many methods with linear regression, we obtained 7.04% better RMSE than Netflix Cinematch on the Netflix Prize competition evaluation set.

In section 2 we describe our framework for combining predictions with linear regression. The most effective predictors from our ensemble are described in section 3, including approaches proposed by Netflix Prize contestants: regularized SVD of data with missing values, K-means, and postprocessing the results of regularized SVD with K-NN. In that section we also describe approaches that are, to our knowledge, new: regularized SVD with biases, postprocessing the results of SVD with kernel ridge regression, building a separate linear model for each movie, and two methods inspired by regularized SVD but with a lower number of parameters. In section 4 experimental results are presented, which show that combining the proposed predictors leads to a significantly better prediction than using pure regularized SVD. In section 5 we summarize our experiments and discuss possible further improvements.

2. COMBINING PREDICTORS

In this section we describe how, in the proposed solution, the training and test sets are chosen and how different prediction methods are combined with linear regression.

The Netflix Prize data consists of three files:

• training.txt contains R = 100,480,507 ratings on a scale of 1 to 5, for M = 17,770 movies, made by N = 480,189 customers,

• probe.txt contains 1,408,395 user-movie pairs for which the ratings are provided in training.txt,

• qualifying.txt contains 2,817,131 user-movie pairs for which we do not know the ratings, but the RMSE of a prediction is computed by the Netflix Prize evaluation system. We can assume that probe.txt and qualifying.txt come from the same population of newest ratings.

Summarizing, the user-item matrix for this data has N · M = 8,532,958,530 elements – ca. 98.9% of the values are missing.

Besides the ratings, the dataset contains other information, like the dates of the ratings, but we do not use any information for prediction besides the rating data mentioned above.

Our framework for combining predictions is simple: we draw a random 1.5%–15% of probe.txt as a test set (holdout set). Our training set contains the remaining ratings from training.txt. We train all algorithms on the training set (some methods also occasionally observe the test set error to decide when to stop optimizing weights). Then the predictions made by each algorithm for the test set are combined with linear regression on the test set. Adding selected two-way interactions between predictors to the regression gives a small improvement.
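This blending step amounts to ordinary least squares on the holdout predictions. A minimal sketch in Python with NumPy – the data here is synthetic, and the predictor count, the choice of interaction and all sizes are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical holdout data: each column of P holds one predictor's
# predictions for the same 1000 held-out ratings; y holds the true ratings.
P = rng.uniform(1, 5, size=(1000, 3))            # 3 base predictors
y = rng.integers(1, 6, size=1000).astype(float)  # true ratings, 1-5

# Add one selected two-way interaction (between predictors 0 and 1)
# and an intercept column.
X = np.column_stack([P, P[:, 0] * P[:, 1], np.ones(len(y))])

# Least-squares blending weights, then the combined prediction,
# clipped to the valid rating range 1-5.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
blend = np.clip(X @ w, 1, 5)
rmse = np.sqrt(np.mean((blend - y) ** 2))
```

In the paper's setup, the columns of P would be the outputs of the individual predictors on the holdout ratings drawn from probe.txt.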

There is also the possibility of using data without ratings (qualifying.txt), which carries some information. The article [8] suggests that using this additional data can significantly improve prediction in the Netflix Prize task.

Because the linear regression is fitted on a small set, the weights obtained are inaccurate. Also, using the test set for linear regression, feature selection and other purposes causes small overfitting. We can improve the prediction using a cross-validation-like method: draw a random part of probe.txt as the test set, repeat the training and the linear regression, do this a few times, and average the results. However, each repetition means running all algorithms again on a massive dataset. Because training each of our algorithms takes much time (0.5–20 h), we did not perform cross-validation in the experiments described in section 4.

The 7.04% submission to the Netflix Prize is the result of partial cross-validation. We ran part of our methods a second time on a different test set and confirmed an improvement after merging the results of the two linear regressions.

In the next sections we describe the most effective predictors from our ensemble.

3. PREDICTORS

3.1 Simple predictors

In this section we describe six predictors which are used by the methods from subsections 3.2 (RSVD) and 3.5 (SVD KNN) and also in all experiments described in section 4.

For a given movie j rated by user i, the first five predictors are the empirical probabilities of each rating 1–5 for user i. The sixth predictor is the mean rating of movie j, after subtracting the mean rating of each member.

We will refer to this set of six simple predictors as "BASIC".

3.2 Regularized SVD

Regularized SVD, a technique inspired by effective methods from the domain of natural language processing [5], was proposed for collaborative filtering by Simon Funk (Brandyn Webb) [9]. Simon Funk's description [9] includes proposed learning rate and regularization constants, and a method of clipping predictions.

In regularized SVD, predictions for user i and movie j are made in the following way:

ŷ_ij = u_i^T v_j    (1)

where u_i and v_j are K-dimensional vectors of parameters. The layer of the k-th parameters of all vectors u_i, v_j is called the k-th feature.

The parameters are estimated by minimizing the sum of squared residuals, one feature at a time, using gradient descent with regularization and early stopping. Before training, a simple baseline prediction is subtracted from each rating – a combination of the six predictors described in section 3.1, with weights chosen by linear regression.

r_ij = y_ij − ŷ_ij
u_ik += lrate · (r_ij · v_jk − λ · u_ik)
v_jk += lrate · (r_ij · u_ik − λ · v_jk)

where y_ij is the rating given by user i for movie j.

We stop training a feature when the error rate on the test set increases. After learning each feature, the predictions are clipped to the <1, 5> range.

The parameters proposed by Simon Funk are difficult to improve, so we leave them unchanged: lrate = .001, λ = .02. We choose the number of features K = 96.

We will refer to this method as "RSVD".
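The update rules above can be sketched as a stochastic-gradient loop. A toy illustration in Python/NumPy – the tiny synthetic data, a larger learning rate and a fixed epoch count stand in for the paper's real dataset, baseline subtraction and test-set early stopping:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 50, 40, 4              # toy sizes; the paper uses K = 96
lrate, lam = 0.01, 0.02          # paper values: lrate = .001, lambda = .02

# Synthetic observed ratings as (user, movie, rating) triples.
ratings = [(int(rng.integers(N)), int(rng.integers(M)), float(rng.integers(1, 6)))
           for _ in range(2000)]

U = rng.normal(0, 0.1, (N, K))   # user features u_i
V = rng.normal(0, 0.1, (M, K))   # movie features v_j

def rmse():
    return np.sqrt(np.mean([(y - U[i] @ V[j]) ** 2 for i, j, y in ratings]))

base_rmse = rmse()
for k in range(K):                           # train one feature at a time
    for _ in range(30):                      # fixed epochs stand in for early stopping
        for i, j, y in ratings:
            r = y - U[i] @ V[j]              # residual r_ij
            uik, vjk = U[i, k], V[j, k]
            U[i, k] += lrate * (r * vjk - lam * uik)
            V[j, k] += lrate * (r * uik - lam * vjk)
train_rmse = rmse()

pred = np.clip(U @ V.T, 1, 5)                # clip predictions to the <1, 5> range
```

With random ratings the factors mostly learn the global scale, but the training RMSE still drops well below its starting value, which is the point of the sketch.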

3.3 Improved regularized SVD

We add biases to the regularized SVD model, one parameter c_i for each user and one d_j for each movie:

ŷ_ij = c_i + d_j + u_i^T v_j    (2)

The weights c_i, d_j are trained simultaneously with u_ik and v_jk:

c_i += lrate · (r_ij − λ2 · (c_i + d_j − global_mean))
d_j += lrate · (r_ij − λ2 · (c_i + d_j − global_mean))

Values of the parameters: lrate = .001, λ2 = .05, global_mean = 3.6033.

We will refer to this method as "RSVD2".
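The bias updates can be illustrated in isolation. The sketch below trains only the c_i and d_j terms (the u·v factors are omitted for brevity, so this is a bias-only simplification, not the full RSVD2), on synthetic data and with a larger learning rate than the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 30, 20
lrate, lam2, global_mean = 0.01, 0.05, 3.6033   # paper uses lrate = .001

# Synthetic observed ratings as (user, movie, rating) triples.
ratings = [(int(rng.integers(N)), int(rng.integers(M)), float(rng.integers(1, 6)))
           for _ in range(1500)]

c = np.zeros(N)        # user biases c_i
d = np.zeros(M)        # movie biases d_j

# Bias-only version of the RSVD2 updates; the regularizer pulls
# c_i + d_j toward the global mean rating.
for _ in range(50):
    for i, j, y in ratings:
        r = y - (c[i] + d[j])                        # residual without the u.v term
        step = lrate * (r - lam2 * (c[i] + d[j] - global_mean))
        c[i] += step
        d[j] += step

pred = np.clip(c[:, None] + d[None, :], 1, 5)
```

After training, c_i + d_j settles near the mean rating for typical users and movies, which is exactly the role the biases play inside the full model.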

3.4 K-means

K-means and K-medians were proposed for collaborative filtering in [7].

Before applying K-means we subtract from each rating the user's mean rating. The K-means algorithm is used to divide the users into K clusters C_k, minimizing the intra-cluster variance

Σ_{k=1..K} Σ_{i∈C_k} ||y_i − µ_k||²    (3)

where

||y_i − µ_k||² = Σ_{j∈J_i} (y_ij − µ_kj)²    (4)

and J_i is the set of movies rated by user i.

For each user belonging to cluster C_k the prediction for movie j is µ_kj.

Our predictor is the mean prediction of an ensemble of 10 runs of K-means with K ranging from 4 to 24.

We will refer to this method as "KMEANS".
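A compact sketch of the core of this predictor – mean-centering followed by Lloyd's K-means – on a dense toy matrix. The real Netflix matrix is ~98.9% missing, so sums (3)–(4) run only over rated entries; this small dense demo glosses over that, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 60, 25, 4

# Toy dense rating matrix (a real implementation restricts sums to rated entries).
Y = rng.integers(1, 6, size=(N, M)).astype(float)

# Subtract each user's mean rating before clustering.
Yc = Y - Y.mean(axis=1, keepdims=True)

# Plain Lloyd's algorithm on the centered user rows.
centers = Yc[rng.choice(N, K, replace=False)]
for _ in range(20):
    labels = np.argmin(((Yc[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    for k in range(K):
        if np.any(labels == k):              # skip empty clusters
            centers[k] = Yc[labels == k].mean(axis=0)

# Prediction for user i, movie j: cluster centroid value plus the user's mean.
pred = centers[labels] + Y.mean(axis=1, keepdims=True)
```

The paper's KMEANS predictor would then average such predictions over 10 runs with K between 4 and 24.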

3.5 Postprocessing SVD with KNN

The following prediction method was proposed by an anonymous Netflix Prize contestant.

Let us define the similarity between movies j and j2 as the cosine similarity between the vectors v_j and v_j2 obtained from regularized SVD:

s(v_j, v_j2) = v_j^T v_j2 / (||v_j|| · ||v_j2||)    (5)

Now we can use k-nearest-neighbor prediction with the similarity s. We use prediction by the single nearest neighbor under s and refer to this method as "SVD KNN".

We also obtained a good-quality clustering of items using single-linkage hierarchical clustering with the similarity s. Though it was not useful for improving prediction, we mention it because clustering of items can be useful in recommender system applications, for example to avoid filling recommendation slots with very similar items.

recommendation slots with very similar items.

ei = (|Ji | + 1)−1/2 , like in the previous section. This model

3.6 Postprocessing SVD with kernel ridge re- has O(M K) parameters.

gression K

X X

One idea to improve SVD is to discard all weights uik ŷij = ci + dj + ei vjk wj2 k (11)

after training and try to predict yij for each user i using vjk k=1 j2 ∈Ji

as predictors, for example using ridge regression. where Ji is the set of movies rated by user i.

Let’s redefine y in this section as a vector: i-th row of The second proposed model is following:

matrix y, with missing values omitted (now y is vector of

movies rated by user i). Let X be a matrix of observations - K
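Equation (9) can be sketched for a single user with the Gaussian kernel above and λ = .5 as in the paper; the SVD features and the user's ratings are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rated, K, lam = 40, 8, 0.5

V = rng.normal(size=(n_rated + 1, K))               # features: rated movies + 1 target
X = V / np.linalg.norm(V, axis=1, keepdims=True)    # normalized rows v_j / ||v_j||
y = rng.uniform(1, 5, size=n_rated)                 # this user's ratings

def gaussian_kernel(A, B):
    # K(x_i, x_j) = exp(2 (x_i . x_j - 1)), for unit-norm rows of A and B
    return np.exp(2 * (A @ B.T - 1))

Xr, x_new = X[:n_rated], X[n_rated:]
# Dual-form prediction, eq. (9): y_hat = K(x_new, X)(K(X, X) + lam*I)^{-1} y
alpha = np.linalg.solve(gaussian_kernel(Xr, Xr) + lam * np.eye(n_rated), y)
y_hat = gaussian_kernel(x_new, Xr) @ alpha
```

Solving the n×n dual system instead of inverting in feature space is what keeps this feasible with at most 500 observations per user.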

3.7 Linear model for each item

For a given item (movie) j we build a weighted linear model, using as predictors, for each user i, a binary vector indicating which movies the user rated:

ŷ_ij = m_j + e_i · Σ_{j2∈J_i} w_j2    (10)

where J_i is the set of movies rated by user i, the constant m_j is the mean rating of movie j, and the constant weights are e_i = (|J_i| + 1)^(−1/2). The model parameters are learned using gradient descent with early stopping.

We name this method "LM".

3.8 Decreasing the number of parameters

The regularized SVD model has O(NK + MK) parameters, where N is the number of users, M is the number of movies and K is the number of features. One idea to decrease the number of parameters is, instead of fitting u_i for each user separately, to model u_i as a function of a binary vector indicating which movies the user rated: for example u_ik ≈ e_i · Σ_{j∈J_i} w_jk, where J_i is the set of movies rated by user i (possibly including movies for which we do not know the ratings, e.g. those in qualifying.txt) and the constant weights are e_i = (|J_i| + 1)^(−1/2), as in the previous section. This model has O(MK) parameters:

ŷ_ij = c_i + d_j + e_i · Σ_{k=1..K} v_jk · Σ_{j2∈J_i} w_j2,k    (11)

The second proposed model is the following:

ŷ_ij = c_i + d_j + Σ_{k=1..K} v_jk · Σ_{j2∈J_i} v_j2,k    (12)

Here the parameters v_jk and w_jk are merged and there are no constant weights e_i.

In both models the parameters are learned using gradient descent with regularization and early stopping, similarly to the regularized SVD.

We name the first method "NSVD1" and the second "NSVD2".
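The NSVD1 prediction (11) can be sketched as follows: the user factor is assembled on the fly from the rated-movie weight vectors instead of being stored per user. All arrays are synthetic stand-ins for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 20, 15, 5

V = rng.normal(0.0, 0.1, (M, K))   # movie output features v_j
W = rng.normal(0.0, 0.1, (M, K))   # movie input weights w_j (merged with V in NSVD2)
c = np.zeros(N)                    # user biases c_i
d = np.zeros(M)                    # movie biases d_j

# J[i] = set of movies rated by user i (random toy sets here).
J = [rng.choice(M, size=int(rng.integers(1, M)), replace=False) for _ in range(N)]

def predict_nsvd1(i, j):
    # u_i is implied: e_i times the sum of w_{j2} over movies rated by user i (eq. 11)
    e_i = (len(J[i]) + 1) ** -0.5
    u_i = e_i * W[J[i]].sum(axis=0)
    return c[i] + d[j] + u_i @ V[j]

p = predict_nsvd1(0, 3)
```

Only the M·K entries of W (plus biases) are learned, which is the parameter saving over storing a separate u_i for each of the N users.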


4. EXPERIMENTAL RESULTS

Table 1 summarizes the results of experiments with the methods described in the previous sections.

Predictor         Test RMSE    Test RMSE with     Cumulative
                  with BASIC   BASIC and RSVD2    test RMSE
BASIC             .9826        .9039              .9826
RSVD              .9094        .9018              .9094
RSVD2             .9039        .9039              .9018
KMEANS            .9410        .9029              .9010
SVD KNN           .9525        .9013              .8988
SVD KRR           .9006        .8959              .8933
LM                .9506        .8995              .8902
NSVD1             .9312        .8986              .8887
NSVD2             .9590        .9032              .8879
SVD KRR * NSVD1   —            —                  .8879
SVD KRR * NSVD2   —            —                  .8877

Table 1: Linear regression results – RMSE on the test set

Combining the results of the RSVD2 method with the six simple predictors called BASIC gives RMSE .9039 on the test set and .9070 (a 4.67% improvement over Netflix Cinematch) on qualifying.txt, as reported by the Netflix Prize evaluation system.

Linear regression with all predictors from the table gives RMSE .8877 on the test set and .8911 (a 6.34% improvement) on qualifying.txt.

The predictors described in this paper are parts of a solution which scores .8844 on the qualifying dataset – a 7.04% improvement over Netflix Cinematch. The solution submitted to the Netflix Prize is the result of merging, in proportion 85/15, two linear regressions trained on different training-test partitions: one linear regression with 56 predictors (most of them different variations of regularized SVD and of postprocessing with KNN) and 63 two-way interactions, and a second one with 16 predictors (a subset of the predictors from the first regression) and 5 two-way interactions. In the first regression the test set is a random 15% of probe.txt, and in the second – 1.5% of probe.txt.

All experiments were done on a PC with a 2 GHz processor and 1.2 GB RAM. Running times varied from 45 min for SVD KNN to around 20 h for RSVD2.

5. SUMMARY

We described a framework for combining predictions and described methods that, combined together, give a good prediction for the Netflix Prize dataset.

Possible further improvements of the presented solution:

• full cross-validation, as described in chapter 2 – repeat the calculations on different training-test partitions and merge the results,

• extending the ensemble with new methods – candidates are methods already applied with success to collaborative filtering: Restricted Boltzmann Machines [8] and other graphical models [7].

6. ACKNOWLEDGMENTS

Thanks to Netflix for releasing their data and for organizing the Netflix Prize. Thanks to Simon Funk for sharing his approach of using regularized singular value decomposition. Also, I would like to thank Piotr Pokarowski for the course Statistics II.

7. REFERENCES

[1] K. Ali and W. van Stam. TiVo: making show recommendations using a distributed collaborative filtering architecture. In W. Kim, R. Kohavi, J. Gehrke, and W. DuMouchel, editors, KDD, pages 394–401. ACM, 2004.
[2] J. Bennett and S. Lanning. The Netflix Prize. Proceedings of KDD Cup and Workshop, 2007.
[3] J. S. Breese, D. Heckerman, and C. M. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In G. F. Cooper and S. Moral, editors, UAI, pages 43–52. Morgan Kaufmann, 1998.
[4] K. Y. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: a constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151, 2001.
[5] G. Gorrell and B. Webb. Generalized Hebbian algorithm for incremental latent semantic analysis. Proceedings of Interspeech, 2006.
[6] G. Linden, B. Smith, and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.
[7] B. Marlin. Collaborative filtering: a machine learning perspective. M.Sc. thesis, 2004.
[8] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. Proceedings of the 24th International Conference on Machine Learning, 2007.
[9] B. Webb. Netflix update: try this at home. http://sifter.org/~simon/journal/20061211.html, 2006.
