
International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 0882

Volume 4, Issue 11, November 2015

Privacy Preserving Recommender Systems


Suriyaakumar G M1, Balasubramanian P2
1Department of Database Systems, Indian Institute of Information Technology Srirangam, Tiruchirapalli, India
2Department of Information Technology, Indian Institute of Information Technology Srirangam, Tiruchirapalli, India

ABSTRACT
Recommender systems are widely used in online applications since they enable personalized service to the users. The underlying collaborative filtering techniques work on users' data, which are often privacy sensitive and can be misused by the service provider. To protect the privacy of the users, we propose to encrypt the privacy-sensitive data and to generate recommendations by processing them under encryption. With this approach, the service provider learns no information on any user's preferences or on the recommendations made. The proposed method is based on homomorphic encryption schemes and secure multiparty computation techniques. The overhead of working in the encrypted domain is minimized by packing data, as shown in the complexity analysis.
Keywords - Recommender systems, secure multiparty computation, user privacy, homomorphic encryption, data packing.

I. INTRODUCTION

In the previous years, we have experienced remarkable progress in information and communication technologies. Cheaper, more powerful, less power-consuming devices and higher-bandwidth communication lines have helped us create a new virtual world in which people imitate activities from their daily lives without the restrictions imposed by the physical world. Online shopping, communication, banking and much more have become common for millions of people [1].
Personalization is a common approach to draw even more people to online services. Instead of making general suggestions to the users of the system, the system can suggest personalized services targeting a particular user based on his preferences [2]. Since the personalization of services offers high profits to the service providers and poses interesting research challenges, research on generating recommendations, also known as collaborative filtering, attracts attention both from academia and industry. The techniques for generating recommendations for users strongly rely on the information gathered from the user. This information can be provided by the user himself, as in profiles, or the service provider can observe users' actions, such as click logs. On one hand, more information on the user helps the system to improve the accuracy of the recommendations. On the other hand, the information on the users creates a severe privacy risk since there is no solid guarantee that the service provider will not misuse the users' data. It is often seen that whenever a user enters the system, the service provider claims ownership of the information provided by the user and authorizes itself to distribute the data to third parties for its own benefit [13].
In this paper, we propose a cryptographic solution for
preserving the privacy of users in a recommender
system. In particular, the privacy-sensitive data of the
users are kept encrypted and the service provider
generates recommendations by processing encrypted
data. The cryptographic protocol developed for this
purpose is based on homomorphic encryption [3] and
secure multiparty computation (MPC) techniques [14].
While the homomorphic property is used for realizing
linear operations, protocols based on MPC techniques
are developed for non-linear operations (e.g. finding the
most similar users). The overhead introduced by working in the encrypted domain is reduced considerably by data packing, as shown in the complexity analysis.

II. RELATED WORK

Previous work suggests a system where the private user data is encrypted and recommendations are made by applying an iterative technique based on the conjugate gradient algorithm [4]. The algorithm estimates a classification matrix of the users in a subspace and creates recommendations by computing re-projections in the encrypted domain. Since the algorithm is iterative, it takes several rounds to converge, and in each round the users need to participate in a costly decryption procedure, which is built on a threshold system where a significant portion of the users are expected to be online.
The outcome of each iteration, which is the classification matrix, is available. Another work proposes a technique to safeguard the privacy of users based on a probabilistic factor analysis model [5], using a similar approach. While this work


is carried out on encrypted user data, other authors propose to protect the privacy of users by using randomization techniques [11, 12]. In their work, they mask the user data with a known random distribution, assuming that in the aggregated data this randomization cancels out and the outcome is a good estimate of the true result.
The success of this technique strongly depends on the number of users participating in the computation; for the system to work, the number of users needs to be very large. This creates a trade-off between the accuracy of the recommendations and the number of users in the system. Moreover, the result of the algorithm is also accessible to the server, which constitutes a privacy threat to the users. Lastly, the randomization methods are considered highly questionable [15].
A central system for creating recommendations is a common approach in e-commerce applications. To produce recommendations for user P, the server follows a two-step procedure. In the first step, the server searches for users similar to user P. Each user in the system is represented by a preference vector, which is typically composed of ratings for each item within a certain range. Finding similar users is based on computing similarity measures between users' preference vectors. The Pearson correlation (Eq. 1) is such a similarity measure for two users with preference vectors VP = (v_(P,0), . . . , v_(P,M-1))^T and VQ = (v_(Q,0), . . . , v_(Q,M-1))^T, where M is the number of items and v̄ denotes the average value of the vector v.
sim_{P,Q} = \frac{\sum_{i=0}^{M-1} (v_{(P,i)} - \bar{v}_P)\,(v_{(Q,i)} - \bar{v}_Q)}{\sqrt{\sum_{i=0}^{M-1} (v_{(P,i)} - \bar{v}_P)^2} \cdot \sqrt{\sum_{i=0}^{M-1} (v_{(Q,i)} - \bar{v}_Q)^2}}        (1)
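As a concrete plaintext illustration (our own sketch, not part of the paper), the following Python snippet evaluates Eq. (1) with NumPy; the function name and the sample rating vectors are hypothetical.

import numpy as np

def pearson_similarity(v_p: np.ndarray, v_q: np.ndarray) -> float:
    """Pearson correlation of two rating vectors of equal length M, as in Eq. (1)."""
    dp = v_p - v_p.mean()                      # centre user P's ratings
    dq = v_q - v_q.mean()                      # centre user Q's ratings
    denom = np.sqrt((dp ** 2).sum()) * np.sqrt((dq ** 2).sum())
    return float((dp * dq).sum() / denom)

# Two users rating M = 5 items on a 1-5 scale.
print(pearson_similarity(np.array([5, 3, 4, 4, 1]), np.array([4, 2, 5, 4, 2])))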

Once the similarity measure for each user is calculated, the server moves to the second step. The server selects the first N users with similarity values above a threshold and averages their ratings. These average ratings are offered as recommendations to user P. In e-commerce applications the number of items offered to users is typically in the hundreds or thousands. Given the huge number of items and users' rating behavior, the data matrix is generally extremely sparse, meaning that most of the items are not rated. Finding similar users in a sparse dataset can easily lead the server to produce inaccurate recommendations. To cope with this issue, one method is to present a small set of items that are rated by most users.
Such a base set can be explicitly given to the users or implicitly chosen by the server from the most commonly rated items. Having a small set of items that is rated by most users, the server can compute similarities between users more reliably, resulting in more precise recommendations. Therefore, we assume that the user preference vector V is split into two parts: the first part consists of X elements that are rated by most of the users, and the second part contains M - X partly rated items that the user would like to get recommendations on [2].
We use encryption to protect user data from the service provider and other users. A special class of cryptosystems, homomorphic cryptosystems, allows us to process the data in encrypted form. We picked the Paillier cryptosystem as it is additively homomorphic, meaning that the product of two encrypted values [a] and [b], where [·] denotes the encryption function, corresponds to a new encrypted message whose decryption yields the sum of a and b: [a] · [b] = [a + b]. As a consequence of the additive homomorphism, a ciphertext [m] raised to the power of a public value c corresponds to the multiplication of m and c in the encrypted domain: [m]^c = [m · c]. In addition to the homomorphism, the Paillier cryptosystem is semantically secure, meaning that each encryption contains a random element that results in different ciphertexts for the same plaintext.
The DGK cryptosystem is substituted for the Paillier cryptosystem in a sub-protocol for efficiency reasons. Due to its much smaller message space, its encryption and decryption operations are more efficient than those of the Paillier cryptosystem. We use the semi-honest security model, which assumes that all parties follow the protocol steps but are curious, and thus keep all messages from earlier and current steps in order to extract more information than they are allowed to have. Our protocol can be adapted to the active attacker model by using the ideas in [9] with extra overhead.
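To illustrate these homomorphic properties, the snippet below is a minimal sketch of our own, assuming the third-party python-paillier package (phe) rather than any library prescribed by the paper; the package overloads + and * so that the ciphertext-level multiplication [a]·[b] and exponentiation [m]^c appear as addition and scalar multiplication.

from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

enc_a = public_key.encrypt(17)      # [a]
enc_b = public_key.encrypt(25)      # [b]

enc_sum = enc_a + enc_b             # ciphertext product [a]·[b]  ->  [a + b]
enc_scaled = enc_a * 3              # ciphertext power [a]^c      ->  [a · c]

assert private_key.decrypt(enc_sum) == 42
assert private_key.decrypt(enc_scaled) == 51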

III. PROPOSED WORK

We propose a protocol built on additively homomorphic encryption schemes and MPC methods. The service provider, i.e. the server, receives the encrypted rating vector of user P and forwards it to the other users in the system so that they can compute the similarity values on their own by exploiting the homomorphic property of the encryption scheme. Once the users have computed the similarity values, these are sent to the server. Next, the server and user P run a protocol to determine which similarity values are above a threshold δ. The server, while remaining unaware of the number of users with a similarity value above the threshold and of their identities, aggregates the ratings of all such users in the encrypted domain. Then, the encrypted sum is sent to user P along with the encrypted number of similarities above the threshold, L. User P decrypts the sum and L and,


calculates the average values, obtaining the recommendations. Each step of the proposed protocol is detailed below.
PREPROCESSING
Any user in the system who wants to receive recommendations generates personal key pairs for the Paillier and the DGK cryptosystems. We assume that the public keys of the users are publicly available. The Pearson correlation given in (1) for users P and Q can also be written as
sim_{P,Q} = \sum_{i=0}^{X-1} \underbrace{\frac{v_{(P,i)} - \bar{v}_P}{\sqrt{\sum_{j=0}^{X-1} (v_{(P,j)} - \bar{v}_P)^2}}}_{R_1} \cdot \underbrace{\frac{v_{(Q,i)} - \bar{v}_Q}{\sqrt{\sum_{j=0}^{X-1} (v_{(Q,j)} - \bar{v}_Q)^2}}}_{R_2}        (2)

The terms R1 and R2 can be computed locally by users P and Q, respectively. Each user thus computes a vector from which the mean is subtracted and which is then normalized. Since the components of this vector are real numbers and cryptosystems are only defined on integer values, they are all scaled by a parameter f and rounded to the nearest integer, resulting in a new vector V'_i = (v'_(i,0), . . . , v'_(i,X-1))^T whose components are k-bit positive integers. Note that the threshold value δ should also be adjusted accordingly.
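A minimal preprocessing sketch under our own assumptions is given below: each user centres and normalizes his first X ratings as in Eq. (2), scales by a precision parameter f and rounds. The paper states that the result consists of k-bit positive integers; how negative components are represented (for example modulo the Paillier modulus) is not detailed, so this sketch simply returns signed integers. The function name and parameters are illustrative.

import numpy as np

def preprocess(ratings_x: np.ndarray, f: int = 1000) -> np.ndarray:
    """Centre, normalize, scale by f and round the first X ratings (the R term of Eq. (2))."""
    centred = ratings_x - ratings_x.mean()
    normalised = centred / np.sqrt((centred ** 2).sum())
    return np.rint(normalised * f).astype(int)   # integers, ready for encryption

print(preprocess(np.array([5.0, 3.0, 4.0, 4.0, 1.0])))   # e.g. [ 528 -132  198  198 -791]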

COMPUTING SIMILARITY
The similarity value between user P and any other user Q is computed over the rating vectors of size X. The elements of the user vector V'_P = (v'_(P,0), . . . , v'_(P,X-1)) are encrypted separately using the public key of user P. The encrypted vector [V'_P]_pkP is then sent to the server, which forwards it to the other users in the system. Any user Q who receives the encrypted vector [V'_P]_pkP can compute the encrypted similarity as

[sim_{P,Q}] = \left[\sum_{i=0}^{X-1} v'_{(P,i)} \cdot v'_{(Q,i)}\right]        (3)

Note that we omit the encryption key pk_P above and in the remainder of the paper for readability. The computed similarity value is then sent back to the server in encrypted form.
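The following sketch (ours, again assuming python-paillier) shows how user Q could evaluate Eq. (3) on user P's encrypted vector: each ciphertext is multiplied by Q's plaintext rating and the results are added, which corresponds to the product of powers of ciphertexts in the notation above. All vector and key names are illustrative.

from functools import reduce
from operator import add
from phe import paillier

pk_p, sk_p = paillier.generate_paillier_keypair()

v_p = [3, -1, 2, 0, 4]                      # user P's preprocessed ratings (X = 5)
v_q = [1, 2, 0, 5, 2]                       # user Q's preprocessed ratings

enc_v_p = [pk_p.encrypt(x) for x in v_p]    # [V'_P], forwarded by the server

# [sim_{P,Q}] = prod_i [v'_{(P,i)}]^{v'_{(Q,i)}} = [sum_i v'_{(P,i)} · v'_{(Q,i)}]
enc_sim = reduce(add, (c * w for c, w in zip(enc_v_p, v_q)))

assert sk_p.decrypt(enc_sim) == sum(a * b for a, b in zip(v_p, v_q))   # = 9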
FINDING SIMILAR USERS
Upon receiving the similarity values from the users, the server initiates a cryptographic protocol with user P to determine the most similar users, namely those whose similarity values are above a public threshold δ. The protocol receives N encrypted similarity values and outputs an encrypted vector [Γ_P] = ([γ_(P,0)], [γ_(P,1)], . . . , [γ_(P,N-1)]). Each component γ_(P,i) of this vector is an encryption of 1 if the similarity value between user P and user i is above the threshold δ, and an encryption of 0 otherwise.

RECOMMENDATION GENERATION
After obtaining the vector [Γ_P], the server can generate the recommendation for user P. For this purpose, the server sends [γ_(P,i)] to the i-th user in the system. User i, referred to as user Q, raises [γ_(P,Q)] to the power of each rating in his rating vector to obtain another encrypted vector [Φ_(P,Q)] = ([φ_(P,X)], [φ_(P,X+1)], . . . , [φ_(P,M-1)]), where [φ_(P,j)] = [γ_(P,Q) · v'_(Q,j)] = [γ_(P,Q)]^{v'_(Q,j)} for j = X to M - 1. Note that user Q does not learn the content of γ_(P,Q). The resulting vector [Φ_(P,Q)] is either the encrypted rating vector of user Q or a vector of encrypted 0s. The vector [Φ_(P,Q)] is then sent to the server to be aggregated with the vectors of the other users. The above procedure can be improved to reduce the computational and communication overhead: instead of raising [γ_(P,Q)] to the power of each rating separately, the ratings can be represented in a compact form and used as a single exponent,

v'_{(Q,X)} \,|\, v'_{(Q,X+1)} \,|\, \ldots \,|\, v'_{(Q,M-1)},        (4)

where | denotes the concatenation operation. Assuming that each v'_(Q,j) is k bits long and that N such vectors are to be aggregated by the server, where N is the number of users participating in the protocol, each section should have a bit size of k + log(N). Thus, packing is achieved in the following form:

v''_Q = \sum_{j=0}^{M-X-1} 2^{\,j(k+\log(N))} \cdot v'_{(Q,\,j+X)}        (5)
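A small sketch of the packing in Eq. (5), under our own assumptions about the slot layout: each k-bit rating occupies a slot of k + ceil(log2 N) bits so that N packed values can later be summed without the slots overflowing. The helper names and parameters are illustrative.

import math

def pack(ratings, k: int, n_users: int) -> int:
    """Eq. (5): place each k-bit rating in a slot of k + log2(N) bits."""
    slot = k + math.ceil(math.log2(n_users))
    packed = 0
    for j, r in enumerate(ratings):            # ratings are v'_(Q,X), ..., v'_(Q,M-1)
        assert 0 <= r < 2 ** k
        packed += r << (j * slot)              # 2^{j(k+log N)} · v'_(Q,j+X)
    return packed

def unpack(packed: int, k: int, n_users: int, count: int):
    slot = k + math.ceil(math.log2(n_users))
    return [(packed >> (j * slot)) & ((1 << slot) - 1) for j in range(count)]

p = pack([5, 3, 0, 4], k=4, n_users=8)
print(unpack(p, k=4, n_users=8, count=4))      # -> [5, 3, 0, 4]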

By packing values, the communication cost decreases significantly, as we obtain a single packed encryption rather than a vector of encrypted values. Packing also decreases the number of exponentiations, which are expensive operations in the encrypted domain, yielding a gain in computation. However, depending on the message space of the encryption scheme and the number of ratings, M - X, it may not be possible to pack all values into one encryption. The number of values that can fit into one encryption is T = n/(k + log(N)); therefore, we may need S = (M - X)/T encryptions.


Once user Q packs his ratings to obtain v''_Q, he can compute [Φ_(P,Q)] as follows,
[\Phi_{(P,Q)}] = [\gamma_{(P,Q)}]^{v''_Q} = \begin{cases} [v''_Q] & \text{if } \gamma_{(P,Q)} = 1 \\ [0] & \text{if } \gamma_{(P,Q)} = 0, \end{cases}        (6)
and sends [Φ_(P,Q)] to the server. Upon receiving the [Φ_(P,i)] values from all users, the server aggregates them,

[\Phi_P] = \left[\sum_{i=0}^{N-1} \Phi_{(P,i)}\right] = \prod_{i=0}^{N-1} [\Phi_{(P,i)}]        (7)

Notice that the outcome is equal to the sum of the ratings of the users whose similarity values are above the threshold δ. The server also aggregates the [γ_(P,i)] values to find the number of users above the threshold,

[L] = \left[\sum_{i=0}^{N-1} \gamma_{(P,i)}\right] = \prod_{i=0}^{N-1} [\gamma_{(P,i)}]        (8)


These two values, [Φ_P] and [L], are then sent to user P. After decrypting, user P unpacks Φ_P and divides each extracted value by L, obtaining the average ratings of the L similar users. This completes our protocol. An important remark at this point concerns the value of L. If L = 0, the user can ask the server to repeat the second step of the protocol with a new threshold. If L = 1, the user obtains exactly the rating vector of some user, but he does not learn the identity of that particular user.
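The following end-to-end sketch of Eqs. (6)-(8) is our own illustration (again assuming python-paillier) for a toy slot layout with k = 4 and N = 8; gamma stands in for γ_(P,i), and all names are hypothetical rather than the paper's notation or API.

from functools import reduce
from operator import add
from phe import paillier

SLOT = 7                                            # k + log2(N) bits per item

def pack(ratings):                                  # Eq. (5)
    return sum(r << (j * SLOT) for j, r in enumerate(ratings))

def unpack(packed, count):
    return [(packed >> (j * SLOT)) & ((1 << SLOT) - 1) for j in range(count)]

pk_p, sk_p = paillier.generate_paillier_keypair()

users = [(1, pack([5, 3, 0, 4])),                   # (gamma, packed ratings v''_Q)
         (0, pack([1, 1, 1, 1])),
         (1, pack([2, 4, 4, 0]))]

enc_gammas = [pk_p.encrypt(g) for g, _ in users]    # outcomes of the comparison protocol
enc_phis = [eg * v for eg, (_, v) in zip(enc_gammas, users)]   # Eq. (6): [gamma]^{v''_Q}

enc_phi_sum = reduce(add, enc_phis)                 # Eq. (7), computed by the server
enc_L = reduce(add, enc_gammas)                     # Eq. (8), computed by the server

L = sk_p.decrypt(enc_L)                             # user P decrypts: number of similar users
sums = unpack(sk_p.decrypt(enc_phi_sum), count=4)
print([s / L for s in sums])                        # average ratings of the L similar users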
CRYPTOGRAPHIC PROTOCOL
Finding the similar users is based on comparing the similarity value between users P and Q, sim_{P,Q}, to a public threshold δ. As the similarity value is privacy sensitive and should be kept secret both from the server and from the user, we compare it in the encrypted domain. For this purpose, we use a comparison protocol that has been presented in [8]. The cryptographic protocol in [8] takes two encrypted values, [p] and [q], and outputs the result again in encrypted form: [γ = 1] if p > q, and [γ = 0] otherwise. For completeness, we give a brief description of the protocol. Given the similarity value sim_{P,Q} and the public threshold δ, both of which are l bits long, the most significant bit of the value z = 2^l + sim_{P,Q} - δ is the result of the comparison.
However, we need to obtain the most significant bit of z in the encrypted domain. While the encrypted value [z] can be computed by the server, obtaining the most significant bit of [z] requires running a protocol between the server and user P, who has the decryption key. Note that the similarity value cannot be entrusted to the user as it leaks information about the other users in the system. Therefore, the server adds a random value r to z, [c] = [z + r], and sends it to user P, who then decrypts it. Notice that the most significant bit can now be computed as

[\gamma_{(P,i)}] = [2^{-l}(c \bmod 2^l - r \bmod 2^l) + \lambda \cdot 2^l]        (9)

where the last term is needed depending on the relation between c and r. The variable λ is a single bit representing whether c > r or not. At this point, we have converted the problem of comparing [sim_{P,i}] and δ into the problem of comparing c and r, which are held by the user and the server, respectively. Comparing c and r requires another cryptographic protocol in which the server and user P evaluate the following formula for each of the l bits:

[e_i] = \left[1 - c_i + r_i + 3\sum_{j=i+1}^{l-1} (c_j \oplus r_j)\right]        (10)

where c_i and r_i are the i-th bits of c and r, respectively. The value of e_i can be 0 if and only if c_i = 1, r_i = 0 and the more significant bits of c and r are identical, that is, when c > r. After these computations, the server sends the randomized and shuffled [e_i] values to user P. User P decrypts them and checks whether there is a zero among the e_i values. The presence of a 0 value indicates that c > r. However, this leaks information about the comparison of sim_{P,Q} and δ; thus, the server randomizes the direction of the comparison by replacing 1 - c_i + r_i in Eq. 10 with 1 + c_i - r_i at random. User P then returns [λ], which is either [1] or [0] depending on the existence of a 0 among the e_i values. The server can correct the direction of the comparison and obtain [γ_(P,i)] by substituting λ in Eq. 9. By using this comparison protocol, each similarity value is compared to the threshold δ simultaneously. The outcomes of the comparisons, [Γ_P] = ([γ_(P,0)], [γ_(P,1)], . . . , [γ_(P,N-1)]), are then used in the subsequent steps.
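To make the comparison logic easier to follow, here is a plaintext walk-through of our own (the actual protocol performs every step below on encrypted values and with a large random r): the most significant bit of z = 2^l + sim - δ is recovered from the blinded value c = z + r, using Eq. (10) to obtain the correction bit λ of Eq. (9). All names are illustrative, and the tie case c mod 2^l = r mod 2^l is ignored here.

l = 8                                              # bit length of sim and delta

def e_values(c_bits, r_bits):
    """Eq. (10): e_i = 1 - c_i + r_i + 3 * sum_{j>i} (c_j XOR r_j)."""
    return [1 - c_bits[i] + r_bits[i]
            + 3 * sum(c_bits[j] ^ r_bits[j] for j in range(i + 1, l))
            for i in range(l)]

def compare(sim, delta, r):
    z = 2 ** l + sim - delta                       # bit l of z encodes sim >= delta
    c = z + r                                      # blinded value, decrypted by user P
    c_low, r_low = c % 2 ** l, r % 2 ** l
    c_bits = [(c_low >> i) & 1 for i in range(l)]
    r_bits = [(r_low >> i) & 1 for i in range(l)]
    lam = 0 if 0 in e_values(c_bits, r_bits) else 1   # a zero signals c_low > r_low
    z_mod = c_low - r_low + lam * 2 ** l           # reconstructs z mod 2^l
    assert z_mod == z % 2 ** l
    return (z - z_mod) >> l                        # done homomorphically on [z] in the protocol

print(compare(sim=90, delta=40, r=12345))          # 1: similarity above the threshold
print(compare(sim=10, delta=40, r=12345))          # 0: similarity below the threshold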

IV. RESULTS AND DISCUSSION

The performance of our method is mainly determined by the communication among the server, user P, who asks for a recommendation, and the other users in the system. In our construction, the server participates in the computation and relays messages among users. User P, on the other hand, only participates in the protocol in two phases: (i) when asking for a recommendation and uploading his encrypted ratings and (ii) when receiving the encrypted recommendation. The other users assist the server with the recommendation generation.
Round Complexity: our protocol consists of 5 rounds. The data transmission from the users to the server in the initialization phase takes 0.5 rounds. To determine the similar users and produce the recommendation, the server requires 4 rounds of interaction. Notice that


during the comparison protocol to obtain [Γ_P], all encrypted similarity values are compared to a public value δ and all comparisons can be done in parallel. In the last stage, the server sends the recommendation to user P, which requires another 0.5 round. This gives O(1) rounds.
Communication Complexity: the amount of data transferred during the protocol is mainly determined by the size of the encrypted data. For user P, the amount of encrypted data to be transferred is O(X + N). The server, on the other hand, has to receive and send O(N(R + S + l)) encrypted values, which is heavily influenced by the data transmitted during the evaluation of the N similarity values. The other users in the system need to receive and send data in the order of O(R + S).
Table 1: Computational complexity of the protocol for the server, user P and user Q, under the Paillier and the DGK cryptosystems, in terms of the number of encryptions, decryptions, multiplications and exponentiations; the entries range from O(1) and O(l) up to O(X), O(X + S), O(NS) and O(Nl).
The computational complexity depends on the cost of the operations in the encrypted domain and can be divided into four classes: encryptions, decryptions, multiplications and exponentiations. In Table 1, we provide the average counts of each operation for the Paillier and the DGK cryptosystems. One exception is the decryption operation of the DGK cryptosystem, which is actually a zero-check and is therefore a fast and less costly operation compared to a full decryption.

V. CONCLUSION AND FUTURE WORK

In this paper we proposed a cryptographic method for generating recommendations for the users of online applications. The proposed technique is built on homomorphic encryption schemes and MPC methods. As shown in the complexity analysis, the overhead introduced by working in the encrypted domain is reduced significantly by packing data and by using the DGK cryptosystem. We conclude that our proposal is based on a realistic scenario and that the required technology is not overly demanding compared to cryptographic tools, such as threshold systems, that other methods make use of [4]. Compared to randomization methods [11, 12], our proposal is provably secure and does not depend on the number of users in the system.

REFERENCES
[1] Dianshuang Wu, Guangquan Zhang, and Jie Lu, "A Fuzzy Preference Tree-Based Recommender System for Personalized Business-to-Business E-Services," IEEE Transactions on Fuzzy Systems, 2014.
[2] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions," IEEE Trans. on Knowl. and Data Eng., 17(6):734-749, 2005.
[3] N. Ahituv, Y. Lapid, and S. Neumann, "Processing encrypted data," Commun. ACM, 30(9):777-780, 1987.
[4] J. F. Canny, "Collaborative filtering with privacy," IEEE Symposium on Security and Privacy, pages 45-57, 2002.
[5] J. F. Canny, "Collaborative filtering with privacy via factor analysis," SIGIR, pages 238-245, New York, NY, USA, 2002.
[6] I. Damgård, M. Geisler, and M. Krøigaard, "Efficient and Secure Comparison for On-Line Auctions," Australasian Conference on Information Security and Privacy - ACISP, volume 4586 of LNCS, pages 416-430, Springer, July 2-4, 2007.
[7] I. Damgård and M. Jurik, "A Generalization, a Simplification and some Applications of Paillier's Probabilistic Public-Key System," Technical report, Department of Computer Science, University of Aarhus, 2000.
[8] Z. Erkin, M. Franz, J. Guajardo, S. Katzenbeisser, R. L. Lagendijk, and T. Toft, "Privacy-preserving face recognition," Proceedings of the Privacy Enhancing Technologies Symposium, pages 235-253, Seattle, USA, 2009.
[9] O. Goldreich, S. Micali, and A. Wigderson, "How to Play any Mental Game or A Completeness Theorem for Protocols with Honest Majority," ACM Symposium on Theory of Computing - STOC '87, pages 218-229, ACM, May 25-27, 1987.
[10] P. Paillier, "Public-Key Cryptosystems Based on Composite Degree Residuosity Classes," Advances in Cryptology - EUROCRYPT '99, volume 1592 of LNCS, pages 223-238, Springer, May 2-6, 1999.
[11] H. Polat and W. Du, "Privacy-preserving collaborative filtering using randomized perturbation techniques," ICDM, pages 625-628, 2003.
[12] H. Polat and W. Du, "SVD-based collaborative filtering with privacy," Proceedings of the 2005 ACM Symposium on Applied Computing, pages 791-795, New York, NY, USA, 2005, ACM Press.
[13] Shopzilla, Inc., "Privacy policy," 2009. http://www.bizrate.com/content/privacy.html.
[14] A. C. Yao, "Protocols for Secure Computations," Annual Symposium on Foundations of Computer Science - FOCS '82, pages 160-164, IEEE, November 3-5, 1982.

