Vous êtes sur la page 1sur 4

Recommender systems

Recommender systems are information filtering systems that present relevant item
suggestions to users from a large collection of possible items, for this purpose
recommender systems traditionally rely on the historic interaction of the user with
the system to build and accurate representation of the user's interests. We
understand relevance defined in as the: "utility or usefulness of the information in
regard to the user's task and needs".

There are two types of information recommender systems use in order to predict
the relevance of an item for a user:

Explicit feedback consists on the direct feedback of the user on an item. Ratings
are usual representation of the user opinion of a user for an item, for example a
numerical rating (1 to 5), a rating on like a scale (Strongly disagree, Disagree,
Neither agree nor disagree, Agree, Strongly agree) or binary ratings (like, dislike).

Implicit feedback on the other hand is the collection of actions users exert on
items, these actions indirectly reflect the opinion of the user on the item, and for
example a user can view, click or buy an item.

Most recommendation systems work with explicitly provided ratings as feedback


for the recommendation process, which is known as explicit feedback. This works
well for systems that are based on these kinds of ratings such as Netflix (where
users can rate movies) and Amazon (where users can rate items).
Some of the recommendation systems are not used explicit ratings.
For Example Twitter, where users can show that they like a tweet by retweeting,
and news sites, where the number of views of a news item indicates its popularity,
are examples of systems that do rely on implicit feedback instead of explicit
feedback.
One advantage of this implicit feedback is that a separate user interaction is often
not required: in the example of the news site only opening a news item is enough;
no rating interaction is required.
Another advantage is that such implicit feedback can often be collected in large
quantities: views and clicks occur much more often than explicitly rating an item.

These views and clicks, combined with purchase events, are the most common
examples of implicit feedback used in recommender systems.
It is shown that using such implicit feedback in recommender systems produces
results that are consistent with results achieved when using explicit feedback.

According to recommender systems can be divided into two general categories


according on how the systems employ the user information to calculate the
relevance of an item:

Content based filtering (CB) which uses the features or characteristics of the
items to find out the relevance for the user.

Collaborative filtering (CF) which uses only the opinions of the users on the
items.

Other categories such as knowledge based recommender systems and demographic


based recommender systems but since they rely heavily on user and item features.

Content based filtering (CB)

Content based (CB) filtering systems operate under the assumption that the user
will like similar items to the ones she has liked in the past. CB systems extract the
characteristics or features of the data items and use these characteristics to
represent the item and the user under a common knowledge model.

Data item representation: The data analyzer component creates a data item
representation under a knowledge model that reflects the relevant characteristics of
the data item that are useful for the filtering component.
User representation: The user model component creates a user profile that
reflects the current user's situation and desires and represents it under a knowledge
model. To properly represent the user's situation and interests explicit information
from the user can be used.
Matching: The filtering component is in charge of using the item and user
representation to predict the relevance of the item for the said user.
Learning: The learning component keeps up to date the user model representation
based on the feedback received by the user.

Although CB systems are simple, easy to implement and its easy to explain why an
item has been classified as relevant, several researchers such as have noted the
limitations of these kind of systems in their filtering performance.
We Identified the following Limitations:
(1) Limited content analysis: In order to have a good representation of data items,
data items must be characterized by features. If this feature extraction of data items
is difficult or impractical the representation of the data items will not be accurate
and a conceptual gap between the representation and the real features will occur,
this inaccurate classification of items will affect the accuracy of the filtering
system.
(2)Overspecialization: A CB system will only classify as relevant data items that
are similar to the ones that the user has classified as relevant in the past. This
means that the system cannot predict the relevance of a data item that is unlike the
ones the user has seen.
(3)New user problem: A user has to express her opinion on a sufficient number of
data items in order to build an useful user profile.

Collaborative filtering (CF)


Collaborative filtering (CF) systems are IF systems that operate under the
assumption a user will like the same data items that other users have liked in the
past. Generally speaking, CF systems try to predict the relevance of a data item for
a user by taking into account the opinion that other similar users have manifested
about that item instead of taking into account the features of the data item. CF
solves some of the problems of CB approaches: It doesn't need to know the
features of the data items, therefore is not prone to the limited content analysis.
Also it can detect the relevance of a item that is very different from what the user
has seen, reducing the impact of the overspecialization problem.

We have noted the limitations of these kind of systems in their predictive accuracy,
in particular we remark the following problems:
(1) Sparsity: In some systems data items are rated rarely, making it difficult to find
similarities between users or between items. For model based systems there is also
a problem since
(2)New item problem: When a new item is added to the system is difficult to
predict its relevance because the user and item profile has little or no information,
this problem is critical in systems where items appear and disappear frequently.
(3)Scalability: As the number of items and users
increase, the neighborhood formation process is more demanding computationally,
also as vector representations increase their dimensionality, distance metrics to
detect similarities become less significant. On model based systems, the
computational cost of learning the model parameters as the number of users and
items increase is not negligible
(4)Complexity and explainability: Although model based CF gives better
predictive performance than memory based CF, most of the times developer prefer
using memory based CF for two reasons: Model based CF is more complex;
adjusting parameters can be and time consuming and difficult to maintain. On the
other hand it's difficult to find out what the latent dimensions in the model are
representing and it's difficult to explain to the user how the relevance of an item
has been calculated.

Sparse Linear Method Algorithms


The Sparse Linear Method (SLIM) algorithm is focused specifically on top-N
recommendations, the type of recommendations in the Sanoma use-case as
presented in the beginning of this chapter. Besides this, the results show major
improvements over both nearest neighbor and matrix factorization algorithms for
the top-N recommendation task.

Vous aimerez peut-être aussi