Abstract
Recommender systems have become extremely common in recent years, and are utilized in a variety
of areas: some popular applications include movies, music, news, books, research articles, search
queries, social tags, and products in general. Traditional recommender systems mainly use
content-based or collaborative filtering techniques. These systems use only the product ratings
given by users to predict or recommend new products or items; they do not consider other
attributes while generating recommendations for a user.
This article describes a new recommendation system that uses a genetic algorithm to learn the
preferences of the users and provides recommendations based on these preferences. This research
uses the MovieLens (http://www.movielens.umn.edu) database, and the genetic algorithm combines
22 features from different files present in the dataset. These features are then used to train the
system. The 22 features are movie rating, age, sex, occupation and 18 movie genres: action,
adventure, animation, children, comedy, crime, documentary, drama, fantasy, film-noir, horror,
musical, mystery, romance, sci-fi, thriller, war and western.
Introduction
In everyday life it is often necessary to make a decision without personal experience of the various
alternatives. When there are many alternatives it is difficult for users to make appropriate decisions, so people rely on
recommendations drawn from other people's knowledge, or on advertisements and reviews of the products, either offline or
online.
Recommender systems are thus especially useful in the current age of the internet, where people buy all sorts of
products online, including daily essentials such as groceries. Many of the largest e-commerce and social media companies
use recommender systems to help their customers find items they would like to purchase. These
systems provide search results tailored to the user's own preferences.1
Recommender systems generally use content-based, collaborative or hybrid techniques for
recommendations. In this article, a new recommendation system is proposed that uses an elitist genetic algorithm
together with some features of collaborative filtering and trains it on 22 features to generate
recommendations.
1 Dr A.P.J. Abdul Kalam Technical University, Lucknow.
E-mail Id: jyotijoshi1222@gmail.com
This article is organized as follows. Section II reviews related work and describes the structure of the proposed recommender system. Section III explains the genetic algorithm used. Section IV presents the experimental results and analysis, and Section V concludes the article.
Related Work
Recommender Systems
The main issue for a recommender system is how to recommend items from the available resources that are tailored to the user's preferences. The recommender system must also recognize and provide items that correspond to the users' favorites. There are 2 main approaches to this problem: collaborative filtering and content-based filtering.2
In the collaborative filtering approach, the recommender system provides recommendations by collecting users' profiles and discovering relations between them. After identifying the correlations between profiles, the system groups users whose profiles are similar. The system then recommends items derived from other profiles in the same group. The advantage of this approach is that it has a high probability of recommending items that match the user's preferences, because it provides an environment in which users can share their own profiles.3,4
In the alternative approach, content-based filtering, the recommender system examines the descriptions of the items that users have rated higher than others. The system then analyzes the similarity between the examined items and all of the remaining items, and recommends new items ordered by their similarity to the selected items.3,4 However, this approach has the limitation that it focuses only on items that have already been accessed.
We combine collaborative filtering with an elitist genetic algorithm and use not only the rating of each movie but also other features, such as age, gender and movie genres, to train the system and generate recommendations for the user.
Generating Profiles
Before recommendations can be made, the movie data is processed into separate profiles, one for each person, defining that person's movie preferences. Profile (j, i) is defined to mean the profile for user j on movie item i, see Fig. 1. The profile of j, profile (j), is therefore the collection of profile (j, i) for all the movies i that j has seen.
Rating  Age  Gender  Occupation  18 genre frequencies (features 5 to 22)
4       35   0       20          000000100010001100
Figure 1. Profile (j, i): Profile for User j with Rating on Movie Item i, if i has a Rating of 4
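As an illustration of the 22-feature record in Figure 1, the following minimal sketch assembles profile (j, i) from lines in the style of the MovieLens 100k files u.user, u.item and u.data. The field layouts, the gender codes, the occupation code table and all function names are assumptions made for this example and are not taken from the article. The sketch keeps the last 18 genre flags of each u.item line, on the assumption that a leading "unknown" flag is not one of the 18 genres.

```python
# Sketch only: builds 22-feature profile(j, i) records from MovieLens-style lines.
# Field layouts and integer encodings below are assumptions, not the article's.

GENDER_CODES = {"M": 0, "F": 1}  # assumed encoding


def parse_users(lines):
    """Assumed u.user layout: user id | age | gender | occupation | zip code."""
    users = {}
    for line in lines:
        user_id, age, gender, occupation, _zip_code = line.strip().split("|")
        users[int(user_id)] = (int(age), GENDER_CODES[gender], occupation)
    return users


def parse_items(lines):
    """Assumed u.item layout: movie id | title | ... | genre flags.

    Only the movie id and the last 18 genre flags are kept.
    """
    items = {}
    for line in lines:
        fields = line.strip().split("|")
        items[int(fields[0])] = [int(flag) for flag in fields[-18:]]
    return items


def build_profiles(data_lines, users, items, occupation_codes):
    """Assumed u.data layout: user id, movie id, rating, timestamp.

    Returns {user_id: {movie_id: [rating, age, gender, occupation, 18 genres]}}.
    occupation_codes is a hypothetical mapping from occupation name to integer.
    """
    profiles = {}
    for line in data_lines:
        user_id, movie_id, rating = (int(x) for x in line.split()[:3])
        age, gender, occupation = users[user_id]
        features = [rating, age, gender, occupation_codes[occupation]]
        features += items[movie_id]
        profiles.setdefault(user_id, {})[movie_id] = features
    return profiles
```

Each inner list has 4 + 18 = 22 entries, in the same order as the columns of Figure 1.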
Joshi J J. Adv. Res. Appl. Arti. Intel. Neural Netw. 2017; 4(1&2)
Once profiles are built, the process of recommendation can begin. Given an active user A, a set of neighborhood profiles similar to profile (A) must be found.
From the MovieLens database, the ml100k data is used. The u.item, u.data and u.user files from this data are used to create the user profiles. The u.item file contains the movie Id and movie name together with 18 bits corresponding to the movie genres; the movie Id and genres are used from this file. Each entry in the u.data file has a user Id, a movie Id and the corresponding rating, so each user has multiple (movie Id, rating) entries.
The data collected from these 2 files is combined with the u.user file to create profile (j, i). The u.user file contains the user Id, age, gender and occupation fields for each user. For each user Id and movie Id pair, an entry with the corresponding rating, age, gender, occupation and genres is created.
Selecting Neighboring Profiles
The success of a collaborative filtering system depends heavily on finding the neighborhood of profiles that are most similar to that of the active user. Only the best or closest profiles should be chosen and used to generate new recommendations for the user.
In an ideal world, the entire database of profiles would be used to select the best possible profiles, but this is not feasible when the dataset is very large. Most systems therefore opt for random sampling, and this is what is done in this algorithm.
Once a set of profiles is selected, the distance or similarity between the selected profiles and the current user's profile must be computed. Most current recommender systems use standard algorithms that compare 2 profiles on movie ratings only. In real life, however, two people are said to be similar not only on the basis of having similar opinions on a particular subject but also on other factors such as their background and preferences.
We can apply the same idea here and consider demographic information such as the user's age and gender, together with preferences for movie genres. Each user places a different importance or priority on each feature. The current approach shows how weights defining a user's priorities can be evolved by a genetic algorithm.
profile value for feature j between users A and B on movie item i.
Before this calculation is made, the profile values are normalized to ensure that they lie between 0 and 1. When the weight for any feature is zero, that feature is ignored. This way feature selection is made adaptive to each user's preferences. The difference in the profile values for occupation is either 0, if the 2 users have the same occupation, or 1 otherwise.
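One way to read the description above is as a weighted Euclidean distance over the features of the movies that both users have rated. The sketch below assumes that form; the function names, the min-max normalization ranges and the position of occupation in the feature list are hypothetical choices made for illustration, not the article's definitions.

```python
import math

OCCUPATION_INDEX = 3  # assumed position of occupation in the feature list


def normalize(profile, lows, highs):
    """Min-max scale each feature into [0, 1]; lows/highs are assumed ranges."""
    return [
        (value - low) / (high - low) if high > low else 0.0
        for value, low, high in zip(profile, lows, highs)
    ]


def feature_diff(a, b, index):
    """Occupation differs by 0 (same) or 1 (different); other features subtract."""
    if index == OCCUPATION_INDEX:
        return 0.0 if a == b else 1.0
    return a - b


def weighted_distance(profiles_a, profiles_b, weights):
    """Assumed weighted Euclidean distance between two users.

    profiles_a and profiles_b map movie id -> normalized feature list.
    Features with weight zero are skipped, so they do not affect the result.
    """
    common = set(profiles_a) & set(profiles_b)
    if not common:
        return math.inf  # no shared movies, so the users cannot be compared
    total = 0.0
    for movie in common:
        for index, weight in enumerate(weights):
            if weight == 0:
                continue
            diff = feature_diff(profiles_a[movie][index], profiles_b[movie][index], index)
            total += weight * diff * diff
    return math.sqrt(total)
```

A smaller distance means a closer profile, so the neighbors of an active user would be the sampled users with the smallest values.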
computed as the average of differences between actual and predicted ratings of all movies in the training set. This score is used to guide the future generations of weight evolution, see Fig. 3.
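The score described above can be sketched as a mean difference between actual and predicted ratings. Taking the absolute value of each difference is an assumption made here so that over- and under-predictions do not cancel; the function name and the dictionary inputs are also hypothetical.

```python
def fitness_score(actual, predicted):
    """Mean absolute difference between actual and predicted ratings.

    actual and predicted map movie id -> rating for the training set.
    Lower scores mean the feature weights predict the active user better.
    """
    movies = [movie for movie in actual if movie in predicted]
    if not movies:
        raise ValueError("no movies with both an actual and a predicted rating")
    return sum(abs(actual[m] - predicted[m]) for m in movies) / len(movies)
```

For example, predicting 5 and 3 for movies actually rated 4 and 3 gives a score of (1 + 0) / 2 = 0.5.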
Profile Selection and Matching
Figure 3. Finding the Fitness Score of an Individual (The Active User's Feature Weights)
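To make the idea of evolving feature weights with an elitist genetic algorithm concrete, here is a minimal sketch. It assumes a fitness function where lower is better and uses tournament selection, uniform crossover and random-reset mutation; these operators, the parameter values and the name evolve_weights are illustrative assumptions rather than the configuration used in the article. Elitism means the best individuals are copied unchanged into the next generation, so the best score never gets worse.

```python
import random


def evolve_weights(fitness, n_features=22, pop_size=20, generations=30,
                   n_elite=2, mutation_rate=0.1, seed=0):
    """Evolve a weight vector in [0, 1]^n_features that minimizes fitness.

    Returns the best individual and the best score seen at each generation.
    """
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness)  # lower score ranks first
        history.append(fitness(ranked[0]))
        # Elitism: carry the best individuals over unchanged.
        next_generation = [individual[:] for individual in ranked[:n_elite]]
        while len(next_generation) < pop_size:
            # Tournament selection of two parents (assumed operator).
            parent_a = min(rng.sample(ranked, 3), key=fitness)
            parent_b = min(rng.sample(ranked, 3), key=fitness)
            # Uniform crossover followed by random-reset mutation.
            child = [a if rng.random() < 0.5 else b
                     for a, b in zip(parent_a, parent_b)]
            child = [rng.random() if rng.random() < mutation_rate else gene
                     for gene in child]
            next_generation.append(child)
        population = next_generation
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

In the recommender setting, the fitness argument would score a candidate weight vector by predicting the active user's training ratings with it; any callable that maps a weight list to a number can be used to try the sketch.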
Experiments and Result Analysis
Experiments
Four sets of experiments were designed to observe the difference in performance between the evolutionary recommender system and a standard, non-adaptive recommender system based on the Pearson algorithm.7 In each set of experiments, the predicted votes of all the movie items in the test set (the items that the active user has rated but that were not used in weight evolution) were computed using the final feature weights for that run. These votes were then compared against those produced by the simple Pearson algorithm.
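For reference, the baseline's similarity measure can be sketched as a Pearson correlation over the movies that two users have both rated. This version takes each user's mean over the co-rated movies only, which is one common variant and an assumption here; the function name and the dictionary inputs are hypothetical.

```python
import math


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation of two users' ratings on their co-rated movies.

    ratings_a and ratings_b map movie id -> rating. Returns 0.0 when fewer
    than two movies are shared or when either user's ratings do not vary.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[m] for m in common) / len(common)
    mean_b = sum(ratings_b[m] for m in common) / len(common)
    numerator = sum((ratings_a[m] - mean_a) * (ratings_b[m] - mean_b)
                    for m in common)
    spread_a = sum((ratings_a[m] - mean_a) ** 2 for m in common)
    spread_b = sum((ratings_b[m] - mean_b) ** 2 for m in common)
    denominator = math.sqrt(spread_a * spread_b)
    return numerator / denominator if denominator else 0.0
```

Unlike the weighted profile distance, this measure uses ratings only and has no per-user weights to adapt, which is why it gives the same predictions on every run.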
Experiment 1: Each of the first 10 users was picked as the active user in turn, and the first 10 users (fixed) were used to provide recommendations.
Experiment 2: Each of the first 10 users was picked as the active user in turn, and 10 users were picked randomly and used to provide recommendations.
Experiment 3: Each of the first 50 users was picked as the active user in turn, and the first 50 users (fixed) were used to provide recommendations.
Experiment 4: Each of the first 50 users was picked as the active user in turn, and 50 users were picked randomly and used to provide recommendations.
Each graph above shows the percentage of ratings that the system predicted correctly out of the total number of ratings available for the current active user. Whilst the predictions computed with the Pearson algorithm always remain the same given the same parameter values, those obtained from the GA vary with the feature weights of that run. Out of the 10 runs for each active user in each experiment, the run with the best feature weights (those that gave the highest percentage of correct predictions) was chosen and plotted against the result from the Pearson algorithm.
In the first experiment, the GA recommender performed as well as or better than the Pearson algorithm on 7 active users out of 10. In the third experiment, the accuracy of the GA recommender fell below that of the Pearson algorithm for 17 of the 50 active users. For the rest of the active users, the accuracy of the GA recommender was better; in some cases (user 16) the difference was as great as 32%. The random sampling in experiment 2 showed great improvement in the prediction accuracy of the GA recommender: for all 10 active users it performed better than the Pearson algorithm.
The results for the last experiment show that the accuracy of the GA recommender was significantly better for all but 15 active users.
Analysis of Results
Experiment 1 indicates that the prediction accuracy for active users 6, 8 and 9 on the GA recommender was worse than that obtained using the Pearson algorithm. But when the number of users was increased to 50 in experiment 3, the accuracy for these three active users rose and outperformed the other algorithm. This was expected: as the number of users goes up, the probability of finding a better matched profile should be higher, and hence the accuracy of the predictions should also increase.
The results suggest that random sampling is a good choice for the profile selection task of retrieving profiles from the database. Random sampling was expected to be better than fixing which users to select because it allowed the search to consider a greater variety of profiles (potentially 10 * 10 runs = 100 users in experiment 2 and 50 * 10 = 500 users in experiment 4) and hence find a better set of well matched profiles.
As mentioned earlier, only the run(s) with the best feature weights for each active user were considered for this analysis.
Looking at the final feature weights obtained for each active user, many interesting observations can be made.
Let's focus on a couple of active users: 4 and 27.
The weights for features 5-22 would be lower because of the scaling factor applied.
Active user 4 is a 24 year old male who is a technician by occupation. This user gives maximum preference to the 2nd feature, which is age. So it is likely that other users of a similar age group would be found in this user's neighborhood. From the feature weights it can be seen that he gives more preference to war, thriller and horror movies, which you would expect from a 24 year old man.
Another active user analyzed is user 27, a 40 year old female who is a librarian by profession.
This user gives more weightage to age and gender. She has interests in the western, sci-fi, romance, drama, crime and children's genres. She is a 40 year old female and so might have small children, which may be why she has interests in the sci-fi and children's genres. She is a woman and so would like movies with romance and drama like most other women her age and given her profession.
Conclusion
This work has shown how evolutionary search can be employed to fine-tune a profile-matching algorithm within a recommender system, tailoring it to the preferences of individual users.
This was achieved by reformulating the problem of making recommendations into a supervised learning task, enabling fitness scores to be computed by comparing predicted votes with actual votes. Experiments demonstrated that, compared to a non-adaptive approach, the evolutionary recommender system was able to successfully fine-tune the profile matching algorithm. This enabled the recommender system to make more accurate predictions, and hence better recommendations to users.
References
1. Schafer J, Konstan J, Riedl J. Recommender Systems in E-commerce. ACM conference on Electronic Commerce, USA. 1999. pp. 158-166.
2. Balabanovic M, Shoham Y. FAB: content-based, collaborative recommendation. Communications of the ACM 1997; 40(3): 66-72.
3. Burke R. Hybrid web recommender systems. The Adaptive Web - Lecture Notes in Computer Science, 2007. pp. 377-408.
4. Pazzani MJ. A Framework for Collaborative, Content-based and Demographic Filtering. Artificial Intelligence Review 1999; 13(5-6): 394-408.
5. Mitchell M. An Introduction to Genetic Algorithms. MIT Press, 1998.
6. Goldberg DE, Holland JH. Genetic algorithms and machine learning. Machine Learning 1988; 3(2-3): 95-9.
7. Breese JS, Heckerman D, Kadie C. Empirical analysis of predictive algorithms for collaborative filtering. Conference on Uncertainty in Artificial Intelligence, 1998. pp. 43-52.