# Method 3: Matrix factorization

Previous section:
1. Collaborative filtering and co-occurrence matrix: used for product recommendations (Amazon)
2. Limitations of collaborative filtering: no context, does not use features, etc.

Matrix factorization
Movie recommendation by Netflix
- Use features of users and items (classification model)
- Use interactions of users and items (collaborative filtering)

## www.subhrajitroy.com | facebook.com/sroy.subhrajitroy | @sroy_subhrajit

## Movie recommendation

[Table: observed ratings with columns User, Movie, Rating.]

Each user watches only a few of the movies.

## Movie recommendation

Rating($u$, $v$): rating given by user $u$ for movie $v$.

[Figure: ratings matrix with users as rows and movies as columns. Rating($u$, $v$) is known for white cells and unknown ("?") for blue cells.]
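To make the ratings matrix concrete, here is a minimal sketch that builds one from (user, movie, rating) records. The records, the names `ann`, `bob`, `m1` to `m3`, and the choice of `NaN` to mark unknown cells are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical (user, movie, rating) records; names and values are made up.
records = [("ann", "m1", 5), ("ann", "m3", 2), ("bob", "m2", 4)]

users = sorted({u for u, _, _ in records})
movies = sorted({m for _, m, _ in records})
user_idx = {u: i for i, u in enumerate(users)}
movie_idx = {m: j for j, m in enumerate(movies)}

# Start with every cell unknown (NaN), then fill in the known ratings.
ratings = np.full((len(users), len(movies)), np.nan)
for u, m, r in records:
    ratings[user_idx[u], movie_idx[m]] = r

# Fraction of cells with a known rating; small in practice because
# each user watches only a few of the movies.
known_fraction = np.count_nonzero(~np.isnan(ratings)) / ratings.size
```

Rows follow the sorted user names and columns the sorted movie names, so `ratings[0, 0]` is the rating given by `ann` to `m1`.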

## Recommendations from known features

Movie recommendation and ratings matrix

Describe movie $v$ by a vector $R_v$: how much is it action, romance, drama, ...

$$R_v = [\,0.2,\ 0.8,\ 1.3,\ \dots\,]$$

Describe user $u$ by a vector $L_u$: how much he/she likes action, romance, drama, ...

$$L_u = [\,0.7,\ 0,\ 2.1,\ \dots\,]$$

How much do the movie vector ($R_v$) and the user vector ($L_u$) agree? Find $\widehat{\text{Rating}}(u, v)$ by using $R_v$ and $L_u$.

## Recommendations from known features

For user $u$, the movie and user vectors are multiplied entry by entry (⨯):

$$R_v = [\,0.2,\ 0.8,\ 1.3,\ \dots\,], \qquad L_u = [\,0.7,\ 0,\ 2.1,\ \dots\,]$$

For user $u'$:

$$R_v = [\,0.2,\ 0.8,\ 1.3,\ \dots\,], \qquad L_{u'} = [\,2.9,\ 0.01,\ 0.02,\ \dots\,]$$

Recommendations: sort the movies the user hasn't watched by $\widehat{\text{Rating}}(u, v)$.
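As a rough worked example, the sketch below scores the movie for both users and then ranks unwatched movies. It uses only the three vector entries printed on the slide and leaves out the elided "..." entries, so the numbers are illustrative. The second movie `movie_w` and the helper names are hypothetical.

```python
# Only the three components shown on the slide are used here.
R_v = [0.2, 0.8, 1.3]          # movie v: action, romance, drama
L_u = [0.7, 0.0, 2.1]          # user u
L_u_prime = [2.9, 0.01, 0.02]  # user u'


def predicted_rating(user_vec, movie_vec):
    """Multiply matching entries and add them up (inner product)."""
    return sum(l * r for l, r in zip(user_vec, movie_vec))


rating_u = predicted_rating(L_u, R_v)              # 0.14 + 0.0 + 2.73
rating_u_prime = predicted_rating(L_u_prime, R_v)  # 0.58 + 0.008 + 0.026


def recommend(user_vec, unwatched):
    """Sort unwatched movies (name -> vector) by predicted rating, best first."""
    return sorted(
        unwatched,
        key=lambda name: predicted_rating(user_vec, unwatched[name]),
        reverse=True,
    )


# movie_w is a made-up, action-heavy movie added so there is something to rank.
unwatched = {"movie_v": R_v, "movie_w": [1.0, 0.1, 0.0]}
ranking_u = recommend(L_u, unwatched)
ranking_u_prime = recommend(L_u_prime, unwatched)
```

With these numbers, user $u$ (who likes drama) gets a much higher predicted rating for movie $v$ than user $u'$ (who mostly likes action), so the two users receive different rankings.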

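## Predictions in the matrix form

$$\widehat{\text{Rating}}(u, v) = \langle L_u, R_v \rangle$$

$$\text{Rating} \approx \widehat{\text{Rating}} = L\,R$$

[Figure: the users × movies ratings matrix is approximated by the product of $L$ (row $u$ is $L_u$) and $R$ (column $v$ is $R_v$).]

- $R_v$ = [action, romance, drama, ...]
- $L_u$ = [action, romance, drama, ...]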
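The matrix form can be checked numerically with a small sketch. It assumes a layout in which row $u$ of `L` holds $L_u$ and column $v$ of `R` holds $R_v$. The sizes (3 users, 4 movies, 2 topics) and all values are made up.

```python
import numpy as np

# Hypothetical toy values: 3 users, 4 movies, 2 topics.
L = np.array([[0.7, 2.1],
              [2.9, 0.02],
              [1.0, 1.0]])             # row u is L_u
R = np.array([[0.2, 1.0, 0.5, 0.0],
              [1.3, 0.0, 0.5, 2.0]])   # column v is R_v

# One matrix product gives the predicted rating for every (user, movie) pair.
rating_hat = L @ R                     # shape (3, 4)

# A single entry is the inner product <L_u, R_v>.
u, v = 0, 0
single_prediction = np.dot(L[u], R[:, v])
```

Entry `rating_hat[u, v]` equals `single_prediction`, so the matrix product is a compact way to write all the inner products at once.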

But you don’t know the topics of users and movies.

## Matrix Factorization: Discovering topics from data

[Figure: ratings matrix (users × movies); white squares = data.]

$$\text{Rating} \approx L\,R$$

$L$ and $R$ are the parameters of the model.

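## Matrix Factorization: Discovering topics from data

Residual sum of squares (RSS):

$$\text{RSS}(L_u, R_v) = \big(\text{Rating}(u, v) - \langle L_u, R_v \rangle\big)^2$$

$$\text{RSS}(L, R) = \sum_{(u, v)} \big(\text{Rating}(u, v) - \langle L_u, R_v \rangle\big)^2$$

The sum runs over all $u$ and $v$ for which Rating($u$, $v$) is known (white squares = data).

[Figure: ratings matrix (users × movies).]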
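A minimal sketch of the residual sum of squares, restricted to the known ratings. It assumes unknown cells are stored as `NaN` and that row $u$ of `L` is $L_u$ and column $v$ of `R` is $R_v$. The small matrices are made up so the result can be checked by hand.

```python
import numpy as np


def rss(ratings, L, R):
    """Sum of squared residuals over the known ratings only.

    ratings: 2-D array with np.nan marking unknown cells
             (an encoding chosen for this sketch).
    """
    known = ~np.isnan(ratings)
    residuals = ratings - L @ R          # NaN wherever the rating is unknown
    return float(np.sum(residuals[known] ** 2))


# Hypothetical 2 x 2 example with two known ratings.
ratings = np.array([[5.0, np.nan],
                    [np.nan, 1.0]])
L = np.array([[1.0, 0.0],
              [0.0, 1.0]])
R = np.array([[4.0, 0.0],
              [0.0, 3.0]])

# Known residuals are 5 - 4 = 1 and 1 - 3 = -2, so the RSS is 1 + 4.
example_rss = rss(ratings, L, R)
```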
## Matrix Factorization: Discovering topics from data

[Figure: the ratings matrix (users × movies) is factorized into $L$ and $R$.]

$$\text{Rating} \approx L\,R$$

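## Matrix Factorization and Limitations

Factorization produces estimates of the parameters: $L_u \rightarrow \hat{L}_u$ and $R_v \rightarrow \hat{R}_v$.

Many efficient algorithms exist for factorization. Example: stochastic gradient descent (refer to this part of the e-book).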
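One way stochastic gradient descent could be applied to this factorization is sketched below. It is not taken from the e-book. The number of topics, learning rate, number of epochs, random initialization, `NaN` encoding of unknown cells, and the toy ratings are all assumptions chosen for illustration.

```python
import numpy as np


def rss(ratings, L, R):
    """Residual sum of squares over the known (non-NaN) ratings."""
    known = ~np.isnan(ratings)
    return float(np.sum((ratings - L @ R)[known] ** 2))


def factorize_sgd(ratings, n_topics=2, lr=0.02, n_epochs=500, seed=0):
    """Fit Rating ~ L @ R by stochastic gradient descent on the known cells.

    n_topics, lr, n_epochs and seed are illustrative choices, not tuned values.
    """
    rng = np.random.default_rng(seed)
    n_users, n_movies = ratings.shape
    L = rng.normal(scale=0.1, size=(n_users, n_topics))    # row u is L_u
    R = rng.normal(scale=0.1, size=(n_topics, n_movies))   # column v is R_v
    known = np.argwhere(~np.isnan(ratings))                # (u, v) index pairs
    for _ in range(n_epochs):
        rng.shuffle(known)                                 # visit cells in random order
        for u, v in known:
            err = ratings[u, v] - L[u] @ R[:, v]
            L_u_old = L[u].copy()
            # Gradient step on (Rating(u, v) - <L_u, R_v>)^2;
            # the constant factor 2 is folded into lr.
            L[u] += lr * err * R[:, v]
            R[:, v] += lr * err * L_u_old
    return L, R


# Hypothetical ratings: two groups of users with opposite tastes.
nan = np.nan
ratings = np.array([[5.0, 4.0, nan, 1.0],
                    [4.0, nan, 1.0, 1.0],
                    [1.0, 1.0, nan, 5.0],
                    [nan, 1.0, 5.0, 4.0]])

L_hat, R_hat = factorize_sgd(ratings)
rss_before = rss(ratings, np.zeros((4, 2)), np.zeros((2, 4)))  # all predictions zero
rss_after = rss(ratings, L_hat, R_hat)
```

After fitting, `L_hat @ R_hat` also contains values for the cells that were `NaN`. Those are the predicted ratings that would be used for recommendations.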

## Combining features and discovered topics

How to solve the cold start problem?
- Features capture context: time of the day, user information, etc.
- Discovered topics from matrix factorization capture groups of users that behave similarly.

Combining the two:
1. Ratings for a new user come from features only.
2. Matrix factorization topics become more important as more information about the user is discovered.

Ensemble methods
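The two points above could be realized in many ways. The sketch below is one hypothetical weighting scheme, not the method used in the slides: a weighted average whose weight on the matrix factorization prediction grows with the number of ratings the user has given. The function name, the parameter `half_weight_at`, and the weighting formula are assumptions.

```python
def blended_rating(feature_pred, mf_pred, n_ratings_by_user, half_weight_at=10):
    """Weighted average of a feature-based and a matrix-factorization prediction.

    The weight on the matrix-factorization prediction is 0 for a brand-new
    user and approaches 1 as the user's ratings accumulate. half_weight_at is
    the (hypothetical) number of ratings at which both predictions count equally.
    """
    w = n_ratings_by_user / (n_ratings_by_user + half_weight_at)
    return (1 - w) * feature_pred + w * mf_pred


new_user = blended_rating(4.0, 2.0, n_ratings_by_user=0)       # features only
some_history = blended_rating(4.0, 2.0, n_ratings_by_user=10)  # equal weights
long_history = blended_rating(4.0, 2.0, n_ratings_by_user=1000)
```

A new user gets the feature-based prediction unchanged, which addresses the cold start case. A user with a long history gets a value close to the matrix factorization prediction.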

## Blending models

Netflix Prize 2006-2009: the winning entry blended over 100 models.

- Data: 100M ratings, 17,770 movies and 480,189 users
- Goal: predict 3M ratings to the highest accuracy
- Prize: 1 million USD