
Accepted Manuscript
Secure and Flexible Cloud-Assisted Association Rule Mining over Horizontally Partitioned
Databases
Cheng Huang, Rongxing Lu, Kim-Kwang Raymond Choo
PII: S0022-0000(16)30133-7
DOI: http://dx.doi.org/10.1016/j.jcss.2016.12.005
Reference: YJCSS 3048
To appear in: Journal of Computer and System Sciences
Received date: 27 February 2016

Revised date: 20 November 2016
Accepted date: 16 December 2016
Please cite this article in press as: C. Huang et al., Secure and Flexible Cloud-Assisted Association Rule Mining over Horizontally
Partitioned Databases, J. Comput. Syst. Sci. (2016), http://dx.doi.org/10.1016/j.jcss.2016.12.005
Highlights
Privacy-preserving collaborative association rule mining in cloud
Collusion attack resilience
Secure association rule mining over horizontally partitioned databases
Secure and Flexible Cloud-Assisted Association Rule
Mining over Horizontally Partitioned Databases
Cheng Huang
Department of Electrical and Computer Engineering, University of Waterloo, Canada
(email: c225huan@uwaterloo.ca)
Rongxing Lu
Faculty of Computer Science, University of New Brunswick, Canada (email:
rlu1@unb.ca)
Kim-Kwang Raymond Choo

Department of Information Systems and Cyber Security, University of Texas at San
Antonio, USA
School of Information Technology and Mathematical Sciences, University of South
Australia, Australia (email: raymond.choo@fulbrightmail.org)
School of Computer Science, China University of Geosciences, Wuhan, China
Abstract
With recent trends in big data and cloud computing, data mining has also
attracted considerable interest due to its potential to deal with distributed
data in the cloud. However, existing data mining technologies may not be
directly deployed as we need to avoid accidental privacy disclosure when
data from different sources are mined. In this paper, we propose a secure
and flexible cloud-assisted association rule mining scheme over horizontally
partitioned databases. Using the proposed scheme, data owners can provide
their data and mine the association rules in the cloud flexibly, while being
assured of minimal risks of privacy leakage. We then show that our proposed
scheme achieves privacy-preserving mining of association rules, and provides
resilience against collusion attacks. A comparative summary demonstrates
that the proposed scheme is more efficient, in terms of computational costs,
relative to several existing homomorphic-encryption-based schemes.
Keywords: Big Data Mining, Cloud Privacy-Preserving, Association Rule
Mining, Resilience against Collusion Attacks
Preprint submitted to Journal of Computer and System Sciences December 28, 2016
1. Introduction
Big data is a relatively recent trend, and is partly due to the advancements
in consumer technologies (e.g. smart mobile devices and mobile apps),
widespread adoption of the Internet, and availability of data from multiple
sources (e.g. social network/media, online transactions, various sensors and
mobile devices). Unsurprisingly, big data has wide-ranging influences on our
daily life (e.g. shopping, traveling, and education), and multinational and
Fortune 500 organizations such as Amazon, Netflix, and Alibaba are actively
collecting, mining, and analyzing data from different sources, in order to
obtain in-depth insights about an individual and their profile. Such information
could be used to influence advertising, marketing and other business
strategies. One big data challenge is how to capture, store, manage, share, and
analyze big data effectively and efficiently; consequently, there has been
interest in data science and big data analytics [1, 2], in applications such
as forensic investigations [3, 4] and real-time detection of emergency events
[5, 6].
Data mining has been identified as a viable big data analytical solution.
Association rule mining, for example, allows one to locate potential
relationships between seemingly unrelated data in a large database or other
information repository. Typical applications of association rule mining include
the analysis and prediction of customer behavior [7], particularly in market
data analysis, product clustering, catalog design and store layout. A simple
example of an association rule is "If a customer buys potato chips and
peanuts, then the customer is also likely to purchase some beer."
In general, an association rule contains two parts, namely: an antecedent
(if) and a consequent (then). The antecedent is a combination of some data
items and the consequent is usually an item that can be found in combination
with the antecedent. Association rule mining is described in detail later in
the paper.
Deploying data mining techniques, such as association rule mining, over
the cloud allows one to leverage the cloud's advanced computing capabilities
and unlimited storage space. However, there are associated privacy
concerns, such as the need for the hosting cloud to access data owners' data
and the potential leakage of sensitive information during data mining. In
the setting of a healthcare system, for example, a number of participating
healthcare providers store their patients' health information in the cloud.
These healthcare providers wish to apply an association rule mining algorithm
to collaboratively discover new knowledge about diseases or epidemics. As
sensitive patient data are uploaded to the cloud by the respective healthcare
providers, patients' privacy may be compromised if the hosting cloud is able
to access these data. This may also result in a violation of industry-specified
regulations, such as the Health Insurance Portability and Accountability Act
for U.S. healthcare providers. Thus, privacy preservation in cloud association
rule mining has been the subject of recent research (see [8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18]). This is also the focus of this paper.
In this paper, we study the use of privacy-preserving association rule mining
by collaborative data owners in a cloud, where some of the databases are
horizontally partitioned. Several privacy-preserving association rule mining
schemes have been proposed in the literature [19, 20, 21, 22, 23], which
generally use homomorphic encryption to guarantee security and privacy. A key
limitation of partially homomorphic encryption (PHE) [24] based schemes is
that they only work for pre-determined association rules. Once the association
rule changes, each data owner needs to re-encrypt the databases prior
to sending the encrypted data to the cloud. This is clearly not practical,
particularly considering the computational costs and bandwidth required to
send a large dataset to the cloud. Another possible solution is to use fully
homomorphic encryption (FHE) [25]. However, bootstrapping of FHE is
time-consuming, thus adversely affecting the efficiency of the entire system.
Techniques such as differential privacy [26], Bloom filters and secure
multi-party computation (SMC) have also been used to achieve privacy-preserving
association rule mining. As summarized in Table 1, there are limitations in
the various approaches.
Building on our previously proposed fast scale product technique [27],
we present a novel secure and flexible cloud-assisted association rule mining
scheme in this paper. The scheme is designed to allow data owners to
collaboratively achieve association rule mining in the cloud, without
compromising on efficiency and flexibility. In addition, even though some data
owners may collude with the cloud, it is not possible for the cloud to
compromise other data owners' data. Specifically, the main contributions of
this paper are two-fold.
1. We present a secure and flexible cloud-assisted association rule mining
scheme over horizontally partitioned databases. Unlike most existing
works, our proposed scheme supports distributed data owners to
achieve association rule mining without compromising on the privacy of
data owners and the mined results, and is resilient to collusion attacks.
2. The proposed scheme provides a flexible approach for cloud-assisted
association rule mining, and assures the accuracy of the mining results
(i.e. independent of the association rule used, the cloud can flexibly
achieve data mining with different data owners' encrypted data through
fine-grained access control).
To demonstrate the utility of our proposed scheme, we implement the scheme
in Java. We then generate a dataset and run extensive experiments to evaluate
its efficiency, in terms of computational costs.
The remainder of this paper is organized as follows. In Section 2, we
formalize the system model and security model used, and describe the design
goals. Section 3 presents the preliminaries. In Section 4, we present the
proposed secure and flexible association rule mining scheme over horizontally
partitioned databases. The security analysis and performance evaluations
are presented in Sections 5 and 6, respectively. Related work is discussed in
Section 7. Finally, we conclude the paper in Section 8.
Table 1: A comparative summary of privacy-preserving association rule mining schemes

Solution                    Flexibility   Efficiency   Security   Accuracy
Differential privacy [9]    Yes           High         Yes        No
Bloom filter [13]           Yes           High         Yes        No
SMC [15]                    Yes           Low          Yes        Yes
PHE [20]                    No            Medium       Yes        Yes
FHE [23]                    Yes           Low          Yes        Yes
Our proposed scheme         Yes           High         Yes        Yes
2. Models and Design Goals

In this section, we formalize our system and security models, and
identify our design goals.
2.1. System Model
The system model (see Fig. 1) comprises four main entities, namely: data
owners (DOs), a cloud server (CS), a data center (DC), and a trusted au-
thority (TA). The detailed description of each entity is as follows.

Figure 1: System model under consideration
Trusted authority (TA): TA, a fully trusted entity, is responsible for
initializing the system and distributing key materials to other entities in the
system. Note that TA only runs once and should be offline after the system
has been initialized, to reduce security risks.
Data owner ($DOs = \{DO_1, DO_2, ..., DO_n\}$): Each data owner
$DO_i \in DOs$ has a private database, and encrypts the private database before
outsourcing it to the cloud server.
Cloud server (CS): CS has significant computational and storage
capabilities, stores the private databases received from DOs, and helps the data
center to mine quantitative association rules jointly.
Data center (DC): With the assistance of CS, DC can easily and efficiently
decrypt and obtain the results of association rule mining.
2.2. Security Model

In our security model, we consider the CS to be honest-but-curious. That
is, although the CS is not malicious, it may be motivated to disclose the raw
data of data owners for financial benefits (e.g. selling data to a third party
for financial gains), and it seeks to learn the mining results. To ensure
a more realistic setting, we also assume that the CS could collude with any
data owner (i.e. a number of data owners and the CS can launch a collusion
attack and try to reveal other data owners' private data), but the DC is
trusted to store the final mining results (i.e. DC will not collude with CS or
data owners to share the final mining results).
In a typical system setup for association rule mining over horizontally
partitioned databases, many data owners will participate in the process of
collaborative data mining in order to obtain more detailed and accurate
results. For example, some healthcare providers located in the same region
(e.g. in the state of Texas) may share information on diagnosis, medical
studies, and clinical trials, so that they can mine disease risk rules together
and better identify frequent diseases or epidemics. In general, the following
security requirements should be satisfied:
Confidentiality: The data owners' private databases and raw
transaction data must be protected from other data owners and the CS.
Thus, even if some data owners collude with the CS, they cannot identify
the contents of the private databases belonging to non-colluding data
owners. In addition, even if the CS stores all the private databases
uploaded by the data owners, the CS cannot identify each data owner's data.
This ensures the confidentiality of the data owners' private databases.
The final mining results should also be confidential (i.e. only the DC
has access to the results).
Authentication and data integrity: The communication channel is
authenticated; thus, if the data is forged, modified and/or replayed by
any entity, this malicious activity should be detected. Moreover, when
mining association rules, only the authenticated data owners can provide
and contribute their private databases so that the DC can achieve
association rule mining.
2.3. Design Goals

In this paper, we propose a secure and flexible cloud-assisted association
rule mining scheme over horizontally partitioned databases, which satisfies
the following design goals.
Security: Without the security goal, private databases and raw transactions
will be leaked to an adversary, and the privacy of each data owner
will be breached. Therefore, the proposed scheme should achieve the data
confidentiality, data authentication, and data integrity requirements.
Flexibility: The cloud-assisted system should be flexible. In other
words, independent of the association rules used, data owners only
need to encrypt the databases and upload them to the CS once, and the
designed system can flexibly achieve association rule mining.
Efficiency: Cipher-based association rule mining usually affects
performance; therefore, any trade-off needs to be balanced and realistic. In
other words, the mining latency should be acceptable compared with
the latency of secure association rule mining algorithms that use
homomorphic encryption techniques. We achieve this by shifting heavy
computations to the CS so that the computational costs to data owners
remain low.
3. Preliminaries
In this section, we revisit the basic association rule mining algorithm, and
the bilinear pairing technique [28], as well as the access structure and linear
secret-sharing schemes, which serve as the basis of the proposed scheme.
3.1. Association Rule Mining over Horizontally Partitioned Databases

Association rule mining [29] is a data mining algorithm for discovering
interesting association relationships between data items in large transaction
databases. Given a set of transactions, this mining algorithm finds rules
that predict the occurrence of an item based on the occurrences of
other items in the transaction. The basic idea of association rule mining is
to calculate the support and confidence of every rule, and to decide whether
the rule is strong by comparing both values with minimum thresholds.
To formalize the basic idea, let $T = \{t_1, t_2, ..., t_N\}$ be the set of all
transactions and $U = \{u_1, u_2, ..., u_L\}$ be the set of all data items. Any
subset of $U$ can be represented as $U_{sub}$. The support count (SC) of
$U_{sub}$ is defined as the number of transactions in $T$ that contain
$U_{sub}$, i.e., $\sigma(U_{sub}) = |\{t \mid t \in T, U_{sub} \subseteq t\}|$.
An association rule is an implication of the form $U_x \Rightarrow U_y$, where
$U_x$ and $U_y$ are two subsets of $U$ and $U_x \cap U_y = \emptyset$ (e.g.,
$U_x = \{u_1, u_3\}$, $U_y = \{u_2\}$, $\{u_1, u_3\} \Rightarrow \{u_2\}$).
The strength of this rule can be measured by two values,
namely: support ($SP$) and confidence ($CF$). Support is an indication of how
frequently the items appear in the database. The equation for computing $SP$
is defined as follows.

$$SP(U_x \Rightarrow U_y) = \frac{\sigma(U_x \cup U_y)}{|T|}. \tag{1}$$

Confidence is an estimate of the conditional probability of finding the
items of $U_y$ in transactions that contain $U_x$, which can be calculated as
follows.

$$CF(U_x \Rightarrow U_y) = \frac{\sigma(U_x \cup U_y)}{\sigma(U_x)}. \tag{2}$$

If a minimum support threshold $SP_{min}$ and a minimum confidence threshold
$CF_{min}$ are specified and provided, we can decide whether a rule is strong
by comparing $SP(U_x \Rightarrow U_y)$ with $SP_{min}$ and
$CF(U_x \Rightarrow U_y)$ with $CF_{min}$. A rule is strong iff
$SP(U_x \Rightarrow U_y) \geq SP_{min}$ and $CF(U_x \Rightarrow U_y) \geq CF_{min}$.

Table 2: Market-basket transactions

TID   Bread   Coke   Milk   Beer   Diaper
(1)   1       1      1      0      0
(2)   1       0      0      1      0
(3)   0       1      1      1      1
(4)   1       0      1      1      1
(5)   0       1      1      0      1

Here, we take the market-basket transactions in Table 2 as an example. The
support for the rule $\{Diaper, Milk\} \Rightarrow \{Beer\}$ is
$\sigma(Diaper, Milk, Beer)/5 = 2/5 = 40\%$, and its confidence is
$\sigma(Diaper, Milk, Beer)/\sigma(Diaper, Milk) = 2/3 \approx 67\%$.
In this paper, we also consider association rule mining over horizontally
partitioned databases, where each table contains the same columns
(items) but different rows (transactions). As shown in Table 3, the original
database in Table 2 can be divided horizontally into 2 databases
with the same items (5) and different transactions (3 and 2).
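
The worked example above can be checked with a short plaintext computation. The following is a minimal Python sketch, not part of the proposed scheme: the helper names (`support_count`, `support_and_confidence`) are hypothetical, and the data are the rows of Table 2. It also illustrates why horizontal partitioning is benign for support counts: the counts of the partitions simply add up.

```python
from fractions import Fraction

ITEMS = ["Bread", "Coke", "Milk", "Beer", "Diaper"]

# Rows of Table 2 (TID 1..5), one 0/1 flag per item.
TABLE_2 = [
    [1, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 1, 0, 1],
]


def support_count(rows, itemset):
    """Number of rows that contain every item of itemset."""
    cols = [ITEMS.index(name) for name in itemset]
    return sum(all(row[c] == 1 for c in cols) for row in rows)


def support_and_confidence(rows, antecedent, consequent):
    """Return (SP, CF) of the rule antecedent => consequent, as in Eqs. (1)-(2)."""
    both = support_count(rows, antecedent | consequent)
    return Fraction(both, len(rows)), Fraction(both, support_count(rows, antecedent))


sp, cf = support_and_confidence(TABLE_2, {"Diaper", "Milk"}, {"Beer"})

# Horizontal partitioning: the first 3 and the last 2 transactions.
db1, db2 = TABLE_2[:3], TABLE_2[3:]
rule_items = {"Diaper", "Milk", "Beer"}
split_count = support_count(db1, rule_items) + support_count(db2, rule_items)
```

Here `sp` and `cf` are exact fractions, so they can be compared against the thresholds $SP_{min}$ and $CF_{min}$ without rounding.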
3.2. Bilinear Pairing

Let $G$ and $G_T$ be two multiplicative cyclic groups of prime order $q$, and
let $g$ and $h$ be two random generators of $G$. Let
$e : G \times G \rightarrow G_T$ be a bilinear map with the following properties:
i) Bilinearity: for all $u, v \in G$ and $a, b \in Z_q$, we have
$e(u^a, v^b) = e(u, v)^{ab}$;
ii) Non-degeneracy: $e(g, g) \neq 1$; and iii) Computability: there is an
efficient algorithm to compute the bilinear map $e : G \times G \rightarrow G_T$.
In group $G$, the Computational Diffie-Hellman (CDH) problem is hard:
given $g, g^a, g^b$ for $g \in G$ and unknown $a, b \in Z_q$, it is intractable
to compute $g^{ab}$ in polynomial time. However, the Decisional Diffie-Hellman
(DDH) problem is easy: given $g, g^a, g^b, g^c$ for $g \in G$ and unknown
$a, b, c \in Z_q$, it is easy to judge whether $c = ab \bmod q$ by checking
$e(g^a, g^b) \stackrel{?}{=} e(g^c, g)$.
We refer the interested reader to [28] for a detailed description of the pairing
techniques and their complexity assumptions.
Definition 1. A bilinear pairing generator algorithm $Gen()$ takes a security
parameter $\kappa$ as input, and outputs a 5-tuple parameter $(q, g, G, G_T, e)$.

Table 3: Market-basket transactions (horizontally partitioned databases)

(a) Database 1
TID   Bread   Coke   Milk   Beer   Diaper
(1)   1       1      1      0      0
(2)   1       0      0      1      0
(3)   0       1      1      1      1

(b) Database 2
TID   Bread   Coke   Milk   Beer   Diaper
(1)   1       0      1      1      1
(2)   0       1      1      0      1
3.3. Access Structure and Linear Secret-Sharing Schemes

Definition 2. (Access Structure [30]) Let $\{P_1, P_2, ..., P_m\}$ be a set
of parties. A collection $A \subseteq 2^{\{P_1, P_2, ..., P_m\}}$ is monotone if
$\forall B, C$: if $B \in A$ and $B \subseteq C$ then $C \in A$. An access
structure (respectively, monotone access structure) is a collection
(respectively, monotone collection) $A$ of non-empty subsets of
$\{P_1, P_2, ..., P_m\}$, i.e.,
$A \subseteq 2^{\{P_1, P_2, ..., P_m\}} \setminus \{\emptyset\}$. The sets in $A$
are known as the authorized sets, and the sets not in $A$ are the unauthorized
sets.
Definition 3. (Linear Secret-Sharing Schemes (LSSS) [30]) A
secret-sharing scheme $\Pi$ over a set of parties $P$ is called linear (over
$Z_q$) if i) the shares for each party form a vector over $Z_q$; and ii) there
exists a matrix $W$ with $l$ rows and $n$ columns called the share-generating
matrix for $\Pi$. For all $i = 1, 2, ..., l$, the $i$-th row of $W$ is labeled
by a party $\rho(i)$, where $\rho$ is the function that maps row $i$ to its
party. When we consider the column vector
$\vec{v} = (\mu, r_2, ..., r_n)$, where $\mu \in Z_q$ is the secret to be
shared, and $r_2, r_3, ..., r_n \in Z_q$ are randomly chosen,
then $W\vec{v}$ is the vector of $l$ shares of the secret $\mu$ according to
$\Pi$. The share $\lambda_i = (W\vec{v})_i$ belongs to party $\rho(i)$.
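
To make Definition 3 concrete, the following minimal Python sketch generates and recombines shares for a small policy. Everything specific in it is an assumption chosen for illustration: the modulus `Q` (a known Mersenne prime standing in for the group order), the three-row matrix `W` encoding the policy "(A AND B) OR C", the party labels, and the helper names. An authorized set recovers the secret with coefficients whose combination of the selected rows equals $(1, 0, ..., 0)$.

```python
import secrets

Q = 2**61 - 1  # a known prime, standing in for the group order q (assumption)

# Hypothetical share-generating matrix for the policy (A AND B) OR C.
# Row i is labeled with party RHO[i].
W = [(1, 1), (0, -1), (1, 0)]
RHO = ["A", "B", "C"]


def make_shares(secret, matrix, q=Q):
    """Compute the share vector W * v for v = (secret, r_2, ..., r_n)."""
    n_cols = len(matrix[0])
    v = [secret % q] + [secrets.randbelow(q) for _ in range(n_cols - 1)]
    return [sum(w * x for w, x in zip(row, v)) % q for row in matrix]


def reconstruct(shares, coefficients, q=Q):
    """Linear recombination: sum of coefficient * share over the chosen rows."""
    return sum(c * shares[i] for i, c in coefficients.items()) % q


secret = 123456789
shares = make_shares(secret, W)

# {A, B}: 1*(1, 1) + 1*(0, -1) = (1, 0), so the coefficients are (1, 1).
via_a_and_b = reconstruct(shares, {0: 1, 1: 1})
# {C}: the row (1, 0) already equals the target vector.
via_c = reconstruct(shares, {2: 1})
```

Party A alone holds only the secret plus a uniformly random mask, which is why {A} is an unauthorized set under this matrix.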
4. Proposed Scheme
In this section, we present the proposed secure and flexible cloud-assisted
association rule mining scheme over horizontally partitioned databases. The
scheme comprises three phases, namely: system initialization, encryption
of data owners' databases, and joint association rule mining. Compared with
state-of-the-art secure association rule mining, our scheme is flexible when
working with large datasets for mining arbitrary association rules, and strikes
a balanced trade-off between security and computational efficiency.
4.1. Overview
During system initialization, TA initializes the whole system by generating
and distributing the corresponding key materials for other entities (i.e.
DOs, CS, and DC). Then, TA will be offline and will not participate in
subsequent mining processes. In order to construct a more flexible and scalable
system, DOs can be classified according to their respective attributes. This
allows the DC to control which private databases are used to achieve
association rule mining.
During the encryption of DOs' databases, each DO needs to encrypt
his/her database and upload the encrypted database to CS. CS will
authenticate and store all received encrypted databases before merging them
together to generate a larger database. When this stage is finished, it is not
necessary for any DO to perform this phase again.
During joint association rule mining, according to different access
control policies, some authenticated DOs will be chosen and their private
databases are used to mine the association rule together. Next, CS will
perform the corresponding association rule mining tasks. Then, CS will
send the encrypted mining results to the DC. Meanwhile, DC needs to store
all received encrypted mining results and cooperate with the chosen DOs
to decrypt the encrypted mining results. Finally, DC can obtain the final
mining results and terminate the process of association rule mining.
4.2. System Initialization

A single trusted authority (TA) is responsible for initializing the entire
system and generating the public/private key pairs for DOs, CS and DC.
Specifically, TA executes the bootstrap as follows:
Based on security parameters $k_1$, $k_2$, and $k_3$, TA chooses five large
primes $p$, $s$, $\beta_x$, $\beta_y$, and $\alpha$, such that
$|p| = |s| = |\beta_x| = |\beta_y| = k_1$, and $|\alpha| = k_2$.
Given the security parameter $\kappa$, TA first runs the bilinear pairing
generator algorithm to generate the parameters $(q, g, G, G_T, e)$.
TA chooses two secure cryptographic hash functions $H()$ and $H'()$,
where $H : \{0, 1\}^* \rightarrow Z_q$ and $H' : \{0, 1\}^* \rightarrow Z_p$,
and a secure symmetric encryption/decryption algorithm $E()/D()$, such as the
AES algorithm.
Assuming that there are a total of $|U|$ attributes for all DOs, TA selects
a random number $a \in Z_q$ and random elements $(h_1, h_2, ..., h_{|U|}) \in G$,
and computes $e(g, g)^a$. Then, the master key can be represented as
$MK = \{PK, a\}$, where $PK = \{q, g, G, e(g, g)^a, h_1, h_2, ..., h_{|U|}\}$,
and TA shares $PK$ with the DC as an access control key.
For each data owner $DO_i \in DOs = \{DO_1, DO_2, ..., DO_n\}$, in order to
realize fine-grained access control when mining association rules, TA first
generates the LSSS access structure according to the different DOs'
characteristics, which can be represented as $(W, \rho)$. Let $W$ be an
$l \times m$ matrix. The function $\rho$ maps rows of $W$ to attributes. Let
$\Gamma$ denote the set of distinct attributes which are included in $W$
(i.e. $\Gamma = \{d : \exists i \in [1, l], \rho(i) = d\}$). Then,
TA chooses a random vector
$\vec{v} = (a, \theta_2, \theta_3, ..., \theta_m) \in Z_q^m$, and calculates
$\lambda_i = \vec{v} \cdot W_i$, where $W_i$ is the $i$-th row of matrix $W$.
Afterwards, TA chooses $l$ random numbers $r_1, r_2, ..., r_l \in Z_q$, and
generates the pre-shared access key
$$PREK = \{PK, RK_i = (D_i = g^{\lambda_i} h_{\rho(i)}^{r_i},\; R_i = g^{r_i},\; \forall d \in \Gamma/\rho(i),\; Q_{i,d} = h_d^{r_i})\}$$
for all $i \in [1, l]$ by using master key $MK$. $\Gamma/\rho(i)$ means the set
$\Gamma$ with the element $\rho(i)$ removed if present. Finally, TA shares the
corresponding generated $PREK || (W, \rho)$ with the DOs.
TA chooses $h$, which is a random generator of $G$, and performs the
following steps to distribute secret keys to the DOs, the CS and the DC:
- For each data owner $DO_i \in DOs$, TA chooses a random number
$s_i \in Z_q$ as $DO_i$'s private key, computes $S_i = h^{s_i}$ as $DO_i$'s
public key, and denotes $ID_i$ as $DO_i$'s identity; then, TA sends
$(s_i, S_i, ID_i)$ to $DO_i$.
- For CS, TA chooses a random number $s_c \in Z_q$ as CS's private key,
computes $S_c = h^{s_c}$ as CS's public key, and denotes $ID_c$ as CS's
identity.
- For DC, TA chooses a random number $s_d \in Z_q$ as DC's private key,
computes $S_d = h^{s_d}$ as DC's public key, and denotes $ID_d$ as DC's
identity.
$\langle q, g, G, G_T, e, e(g, g)^a, p, \beta_x, \beta_y, \alpha, s, H(), E(), D(), ID_c, ID_d, S_c, S_d \rangle$
and each $\langle S_i, ID_i \rangle$ for $DO_i \in DOs$ are published as the
system-wide public information.
4.3. Encryption of DOs databases

Assuming that there are $n$ data owners $DOs = \{DO_1, DO_2, ..., DO_n\}$ in
the system, the private database of $DO_i \in DOs$ involves $n_i$ transactions,
and each transaction contains $L$ items. For all DOs, the total number of
transactions is $N$ (i.e. $\sum_{i=1}^{n} n_i = N$). For each transaction,
$u_k$ denotes the $k$-th item of a transaction, where $k \in [1, L]$. By the
following procedure, $DO_i$ can securely outsource the private database to CS
for mining association rules without revealing his/her private data.
Table 4: $DO_i$'s databases

(a) $DO_i$'s private database
TID       $u_1$           $u_2$           ...   $u_L$
(1)       $m_{i11}$       $m_{i12}$       ...   $m_{i1L}$
(2)       $m_{i21}$       $m_{i22}$       ...   $m_{i2L}$
...       ...             ...             ...   ...
($n_i$)   $m_{in_i1}$     $m_{in_i2}$     ...   $m_{in_iL}$

(b) $DO_i$'s encrypted database ($ED_i$)
TID       $u_1$           $u_2$           ...   $u_L$
(1)       $c_{i11}$       $c_{i12}$       ...   $c_{i1L}$
(2)       $c_{i21}$       $c_{i22}$       ...   $c_{i2L}$
...       ...             ...             ...   ...
($n_i$)   $c_{in_i1}$     $c_{in_i2}$     ...   $c_{in_iL}$
Step-1: For each item $k \in [1, L]$, $DO_i$ first selects a random element
$r_{ik} \in Z_q$ as the private key, and stores all private keys in a local
database.
Step-2: To encrypt the $j$-th transaction in his/her private database, as
shown in Table 4 (a), $DO_i$ generates a random number $r_{ijk}$ whose length
is $k_3$ for each item. Then, $DO_i$ traverses all $L$ items and encrypts each
item in this transaction one by one. If $m_{ijk} = 1$, then $DO_i$ computes
$c_{ijk} = (\alpha + r_{ijk}) s^{r_{ik}} \bmod p$; otherwise, $DO_i$ computes
$c_{ijk} = r_{ijk} s^{r_{ik}} \bmod p$.
Finally, $DO_i$ encrypts the transaction as $C_j = \{c_{ij1}, c_{ij2}, ..., c_{ijL}\}$.
Step-3: For $j = 1, 2, ..., n_i$, $DO_i$ repeats Step-2 to encrypt all $n_i$
transactions, and the encrypted database is $ED_i = \{C_1, C_2, ..., C_{n_i}\}$,
as shown in Table 4 (b).
Step-4: $DO_i$ computes the session key for secure communication as
$k_{ic} = H'(S_c^{s_i} || ID_c || ID_i || TS)$, where $TS$ is the current
timestamp.
Step-5: $DO_i$ performs AES encryption using $k_{ic}$ to encrypt $ED_i$ as
$E_i = E_{k_{ic}}(ED_i)$ and sends $ID_i || E_i || TS$ to CS.
Step-6: After receiving the encrypted data $E_i$ at time $TS'$, CS first
checks the validity of the time interval between $TS'$ and $TS$ in order to
prevent a replay attack. If $TS' - TS \leq \Delta T$, where $\Delta T$ denotes
the expected valid time interval for transmission delay, CS accepts and
processes $E_i || TS$, and rejects it otherwise. Once $E_i || TS$ is accepted,
CS computes the session key as $k_{ci} = H'(S_i^{s_c} || ID_c || ID_i || TS)$
to decrypt $E_i$ as $D_{k_{ci}}(E_i)$. Finally, CS stores
the $ED_i$ in its (large) database.
Step-7: After storing all DOs' encrypted databases, CS joins all databases
together and obtains the final outsourced encrypted database
$(ID_1, ED_1) || (ID_2, ED_2) || ... || (ID_n, ED_n)$.
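
The session-key agreement and freshness check of Step-4 and Step-6 can be sketched as follows. This is only an illustration under stated assumptions: the group is the multiplicative group modulo a known Mersenne prime rather than the pairing group $G$, the base `H_GEN` is an arbitrary element standing in for $h$, SHA-256 stands in for the hash function, the window `DELTA_T` and all function names are hypothetical, and the AES step is omitted. The check additionally rejects timestamps from the future, which goes slightly beyond the condition stated in Step-6.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus (assumption; the scheme uses the group G)
H_GEN = 5        # toy base element standing in for the generator h
DELTA_T = 30     # hypothetical freshness window, in seconds


def keygen():
    """Return (private key, public key = H_GEN^private mod P)."""
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(H_GEN, sk, P)


def session_key(peer_public, own_secret, id_c, id_i, ts):
    """Hash the Diffie-Hellman value together with both identities and TS."""
    shared = pow(peer_public, own_secret, P)
    material = f"{shared}||{id_c}||{id_i}||{ts}".encode()
    return hashlib.sha256(material).digest()


def is_fresh(ts_sent, ts_received, window=DELTA_T):
    """Accept only if the message arrived within the window."""
    return 0 <= ts_received - ts_sent <= window


s_i, S_i = keygen()          # data owner DO_i
s_c, S_c = keygen()          # cloud server CS
ts = 1_700_000_000           # timestamp TS chosen by DO_i

k_ic = session_key(S_c, s_i, "CS", "DO-1", ts)   # computed by DO_i
k_ci = session_key(S_i, s_c, "CS", "DO-1", ts)   # computed by CS
```

Both sides obtain the same 32-byte key because each raises the other party's public key to its own private key; binding $TS$ into the hash makes every session key distinct.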
4.4. Jointly Association Rule Mining

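In this phase, DOs, CS, and DC perform the following steps to achieve
association rule mining jointly.
Step-1: According to different requirements, DC sets different access
policies to select some authenticated DOs' databases. Specifically, the access
policies can be represented as a set of attributes $S$. DC then chooses a random
$\gamma \in Z_q$, and generates the ciphertext using $PK$ as
$CT = (S, C, \{C_z\})$, where $C = g^{\gamma}$ and
$\{C_z = h_z^{\gamma}\}_{z \in S}$. Finally, DC publishes $CT || TS_c$, where
$TS_c$ is the current timestamp, and computes the shared secret as
$v = e(g, g)^{a\gamma}$.
Step-2: DC also publishes the rule which needs to be mined privately,
such as $Rule = \{u_1, u_3\} \Rightarrow \{u_2\}$.
Step-3: Each data owner $DO_i \in DOs$ fetches $CT$, and checks whether
the set $S$ satisfies the access structure $(W, \rho)$. If yes, the data owner
decrypts the $CT$; otherwise, the data owner outputs $\perp$. To decrypt $CT$,
$DO_i$ first locates the index set $I$, which satisfies
$\forall i \in I, \rho(i) \in S$. Then, according to $I$, $DO_i$ calculates
the coefficients $\omega_i$, which satisfy
$\sum_{i \in I} \omega_i W_i = (1, 0, 0, ..., 0)$. Next, $DO_i$ computes
$\hat{D}_i = D_i \prod_{z \in \Delta/\rho(i)} Q_{i,z} = g^{\lambda_i} \prod_{z \in \Delta} h_z^{r_i}$
and $F = \prod_{z \in \Delta} C_z = \prod_{z \in \Delta} h_z^{\gamma}$, where
$\Delta = \{z : \exists i \in I, \rho(i) = z\}$. Finally, $DO_i$ computes the
shared secret $v$ as follows.

$$\frac{e(C, \prod_{i \in I} \hat{D}_i^{\omega_i})}{e(\prod_{i \in I} R_i^{\omega_i}, F)}
= \frac{e(g^{\gamma}, \prod_{i \in I} g^{\lambda_i \omega_i} (\prod_{z \in \Delta} h_z^{r_i \omega_i}))}{e(\prod_{i \in I} g^{r_i \omega_i}, \prod_{z \in \Delta} h_z^{\gamma})}
= \frac{e(g, g)^{a\gamma} \, e(g, \prod_{z \in \Delta} h_z^{\gamma \sum_{i \in I} r_i \omega_i})}{e(g, \prod_{z \in \Delta} h_z^{\gamma \sum_{i \in I} r_i \omega_i})}
= e(g, g)^{a\gamma}. \tag{3}$$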

Step-4: Based on the local data $r_{i1}, r_{i2}, ..., r_{iL}$ and the published
rule (e.g. $\{u_1, u_3\} \Rightarrow \{u_2\}$), $DO_i$ calculates
$O_{ix} = H'(e(g, g)^{a\gamma} || \beta_x || TS_c) - r_{i1} - r_{i3} \bmod (p - 1)$
and
$O_{iy} = H'(e(g, g)^{a\gamma} || \beta_y || TS_c) - r_{i1} - r_{i3} - r_{i2} \bmod (p - 1)$.
Step-5: Each data owner $DO_i$ computes the session key as
$k_{id} = H'(S_d^{s_i} || ID_d || ID_i || TS)$, where $TS$ is the current
timestamp, and uses AES encryption to encrypt $(O_{ix}, O_{iy})$ as
$O = E_{k_{id}}(O_{ix} || O_{iy})$. Finally, $DO_i$ forwards $O || TS$ to DC.
Step-6: After receiving $O || TS$, DC first checks the timestamp, and
then computes the session key $k_{di} = H'(S_i^{s_d} || ID_d || ID_i || TS)$ to
decrypt $O$ as $D_{k_{di}}(O) = (O_{ix}, O_{iy})$. Next, DC sends the identities
of all authenticated DOs to CS.
Step-7: Based on the published rule (e.g.
$\{u_{x_1}, u_{x_2}, ..., u_{x_{k'}}\} \Rightarrow \{u_{y_1}, u_{y_2}, ..., u_{y_{k''}}\}$)
and DOs' identities, for $DO_i$'s $j$-th transaction, CS computes
$cc_{ijx} = c_{ijx_1} c_{ijx_2} \cdots c_{ijx_{k'}}$ and
$cc_{ijy} = cc_{ijx} \, c_{ijy_1} c_{ijy_2} \cdots c_{ijy_{k''}}$. Assuming that
$DO_i$ has a total of $n_i$ transactions, CS computes
$CC_{i1} = \sum_{j=1}^{n_i} cc_{ijx}$ and $CC_{i2} = \sum_{j=1}^{n_i} cc_{ijy}$.
Finally, CS sends all $CC_{i1} || CC_{i2} || ID_i$ to DC.
Step-8: Assuming that the total number of authenticated DOs' transactions
is $N$, DC computes $SC_1'' = \sum_{i=1}^{n} CC_{i1} s^{O_{ix}}$ and
$SC_2'' = \sum_{i=1}^{n} CC_{i2} s^{O_{iy}}$. To
decrypt $SC_1''$ and $SC_2''$, DC also computes the decryption key
$(ss_x, ss_y)$ as $ss_x = s^{H'(e(g, g)^{a\gamma} || \beta_x || TS_c)}$ and
$ss_y = s^{H'(e(g, g)^{a\gamma} || \beta_y || TS_c)}$. Then,
$SC_1' = ss_x^{-1} SC_1'' \bmod p$ and $SC_2' = ss_y^{-1} SC_2'' \bmod p$.
Step-9: Finally, DC calculates the support counts as follows.

$$SC_1 = \frac{SC_1' - SC_1' \pmod{\alpha^{k'}}}{\alpha^{k'}}. \tag{4}$$

$$SC_2 = \frac{SC_2' - SC_2' \pmod{\alpha^{k' + k''}}}{\alpha^{k' + k''}}. \tag{5}$$

The support of this rule is $SP = SC_2 / N$ and the confidence can be
calculated as $CF = \frac{SC_2}{SC_1}$.
Remark. All rules can be formalized as $U_x = \{u_{x_1}, u_{x_2}, ..., u_{x_{k'}}\}$,
$U_y = \{u_{y_1}, u_{y_2}, ..., u_{y_{k''}}\}$ and $U_x \Rightarrow U_y$ (e.g.
$\{u_2, u_3, u_4\} \Rightarrow \{u_5\}$), where $k', k'' \in [1, L]$ and
$U_x \cap U_y = \emptyset$. In our scheme, there exists a constraint: when
DOs, CS and DC try to perform association rule mining together, we need
k1 k2 (k + k ) and k2 k3 .
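
The item encryption of Section 4.3 and Steps 4 and 7 to 9 above can be traced end to end with toy numbers. The sketch below is an illustration under loudly stated assumptions, not the authors' Java implementation: known Mersenne primes replace the TA-generated primes (`P` for the modulus, `S` for the base, `ALPHA` for the $k_2$-bit prime), SHA-256 reduced modulo `P` stands in for the second hash function, the byte string `SECRET` stands in for the pairing-derived shared secret, the labels `"x"` and `"y"` stand in for the two public primes subscripted $x$ and $y$, and session-key encryption between the parties is omitted. All function names are hypothetical. It needs Python 3.8 or later for `pow(value, -1, modulus)`.

```python
import hashlib
import secrets

P = 2**521 - 1      # toy modulus p (known Mersenne prime, assumption)
S = 2**127 - 1      # toy base s (known Mersenne prime, assumption)
ALPHA = 2**89 - 1   # toy k2-bit prime (known Mersenne prime, assumption)
K3 = 16             # bit length of the per-item randomness


def h_prime(secret: bytes, label: str, ts: int) -> int:
    """Stand-in hash into Z_p, applied to secret || label || timestamp."""
    data = secret + b"||" + label.encode() + b"||" + str(ts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % P


def encrypt_db(rows, item_keys):
    """Encrypt each 0/1 item m as (ALPHA*m + random) * S^(item key) mod P."""
    return [[(ALPHA * m + secrets.randbits(K3)) * pow(S, r_k, P) % P
             for m, r_k in zip(row, item_keys)] for row in rows]


def cs_aggregate(enc_rows, x_cols, y_cols):
    """Step-7: per-transaction products, summed over one owner's rows."""
    cc1 = cc2 = 0
    for row in enc_rows:
        ccx = 1
        for c in x_cols:
            ccx = ccx * row[c] % P
        ccy = ccx
        for c in y_cols:
            ccy = ccy * row[c] % P
        cc1, cc2 = (cc1 + ccx) % P, (cc2 + ccy) % P
    return cc1, cc2


def owner_tokens(item_keys, x_cols, y_cols, secret, ts):
    """Step-4: hash value minus the item keys used by the rule, mod p - 1."""
    rx = sum(item_keys[c] for c in x_cols)
    ry = sum(item_keys[c] for c in y_cols)
    return ((h_prime(secret, "x", ts) - rx) % (P - 1),
            (h_prime(secret, "y", ts) - rx - ry) % (P - 1))


def dc_decrypt(per_owner, x_cols, y_cols, secret, ts):
    """Steps 8-9: aggregate, strip S^hash, then read off the top coefficient."""
    sc1 = sum(cc1 * pow(S, o_x, P) for cc1, _, o_x, _ in per_owner) % P
    sc2 = sum(cc2 * pow(S, o_y, P) for _, cc2, _, o_y in per_owner) % P
    sc1 = sc1 * pow(pow(S, h_prime(secret, "x", ts), P), -1, P) % P
    sc2 = sc2 * pow(pow(S, h_prime(secret, "y", ts), P), -1, P) % P
    a1 = ALPHA ** len(x_cols)
    a2 = ALPHA ** (len(x_cols) + len(y_cols))
    return (sc1 - sc1 % a1) // a1, (sc2 - sc2 % a2) // a2


# Two horizontally partitioned databases over (Bread, Coke, Milk, Beer, Diaper).
DB1 = [[1, 1, 1, 0, 0], [1, 0, 0, 1, 0], [0, 1, 1, 1, 1]]
DB2 = [[1, 0, 1, 1, 1], [0, 1, 1, 0, 1]]
X_COLS, Y_COLS = [2, 4], [3]          # rule {Milk, Diaper} => {Beer}
SECRET, TS = b"stand-in shared secret", 1_700_000_000

per_owner = []
for db in (DB1, DB2):
    keys = [secrets.randbelow(2**64) for _ in range(5)]   # one key per item
    cc1, cc2 = cs_aggregate(encrypt_db(db, keys), X_COLS, Y_COLS)
    per_owner.append((cc1, cc2, *owner_tokens(keys, X_COLS, Y_COLS, SECRET, TS)))

sc1, sc2 = dc_decrypt(per_owner, X_COLS, Y_COLS, SECRET, TS)
n_total = len(DB1) + len(DB2)
```

With these toy sizes the random cross terms stay far below the leading power of `ALPHA` and the whole value stays below `P`, which is the role of the constraint on $k_1$, $k_2$ and $k_3$ stated in the Remark. The support and confidence then follow as `sc2 / n_total` and `sc2 / sc1`.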
Correctness. The correctness of the authentication in our scheme is obvious.
If $DO_i$ is not a valid data owner, he/she cannot obtain his/her private
key and cannot generate the shared secret $v = e(g, g)^{a\gamma}$ to produce a
valid Auth to pass DC's authentication. In addition, the timestamp technique
ensures the uniqueness of each session key for secure communication.
Settings. To demonstrate the correctness of our mining algorithm clearly,
we first define the settings. There are $n$ data owners
$DOs = \{DO_1, DO_2, ..., DO_n\}$ and $L$ data items $U = \{u_1, u_2, ..., u_L\}$,
and each data owner $DO_i$ has $n_i$ transactions. For data owner $DO_i$,
his/her transactions can be denoted as $T_i = \{t_1, t_2, ..., t_{n_i}\}$, and
each transaction $t_j \in T_i$ involves $L$ data items
$\{m_{ij1}, m_{ij2}, ..., m_{ijL}\}$. Each data item $m_{ijk}$ takes the value 0
or 1. The mining rule considered in our proof is defined as
$U_x \Rightarrow U_y$, where $U_x = \{u_{x_1}, u_{x_2}, ..., u_{x_{k'}}\}$ and
$U_y = \{u_{y_1}, u_{y_2}, ..., u_{y_{k''}}\}$ are two subsets of $U$ and
$U_x \cap U_y = \emptyset$.
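Correct Results. According to the settings above and the definition of
association rule mining in Section 3, we can easily obtain the correct support
$SP_{correct}$ and confidence $CF_{correct}$ by running any existing association
rule mining algorithm as follows. Note that $|T|$ and $\sigma(U_x)$ cannot be 0
based on the definition.

$$SP_{correct} = \frac{\sigma(U_x \cup U_y)}{|T|}
= \frac{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{k=1}^{k'} m_{ijx_k} \prod_{k=1}^{k''} m_{ijy_k} \right)}{\sum_{i=1}^{n} n_i} \tag{6}$$

$$CF_{correct} = \frac{\sigma(U_x \cup U_y)}{\sigma(U_x)}
= \frac{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{k=1}^{k'} m_{ijx_k} \prod_{k=1}^{k''} m_{ijy_k} \right)}{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \prod_{k=1}^{k'} m_{ijx_k}} \tag{7}$$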
Proof. We will prove that our proposed privacy-preserving mining al-

gorithm can always output the same mining results (i.e. correct support
and condence) as the mining results of any existing association rule mining
algorithm. That is, under the same settings, our proposed secure mining
algorithm can always output the correct support SP and condence CF .
As described in our mining algorithm, each data mijk of DOi is rstly en-
crypted as cijk according to its value (0 or 1) at DOi s side, and the encrypted
15
databases of DOi are outsourced to CS as shown in Tab 4.

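\[
c_{ijk} =
\begin{cases}
(\alpha + r_{ijk}) \cdot s^{r_{ik}} \bmod p, & \text{when } m_{ijk} = 1 \\
r_{ijk} \cdot s^{r_{ik}} \bmod p, & \text{when } m_{ijk} = 0
\end{cases} \tag{8}
\]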
In addition, $DO_i$ calculates $O_{ix}$ and $O_{iy}$ according to the mining
rule $U_x \Rightarrow U_y$, and shares them with DC.

\[
O_{ix} = H(e(g, g)^a \| \beta_x \| TS_c) - \sum_{l=1}^{k} r_{ix_l} \bmod (p - 1) \tag{9}
\]

\[
O_{iy} = H(e(g, g)^a \| \beta_y \| TS_c) - \sum_{l=1}^{k} r_{ix_l} - \sum_{l=1}^{k'} r_{iy_l} \bmod (p - 1) \tag{10}
\]
After all encrypted databases of the $n$ data owners have been outsourced to
CS, CS calculates $CC_{i1}$ and $CC_{i2}$ for each data owner $DO_i \in DOs$
based on the rule $U_x \Rightarrow U_y$ and sends the results to DC for
decryption. In the following equations, $\Omega_{i1}$ and $\Omega_{i2}$
represent polynomials in $\alpha$ whose highest orders are smaller than $k$
and $k + k'$ (e.g. $\Omega_{i1} = 5\alpha^{k-1} + 3\alpha^{k-2}$ and
$\Omega_{i2} = 7\alpha^{k+k'-1} + 2\alpha^{k+k'-2}$), respectively.

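\[
CC_{i1} = \sum_{j=1}^{n_i} \prod_{l=1}^{k} c_{ijx_l}
= s^{\sum_{l=1}^{k} r_{ix_l}} \left( \sum_{j=1}^{n_i} \prod_{l=1}^{k} m_{ijx_l} \cdot \alpha^{k} + \Omega_{i1} \right) \bmod p \tag{11}
\]

\[
\begin{aligned}
CC_{i2} &= \sum_{j=1}^{n_i} \prod_{l=1}^{k} c_{ijx_l} \prod_{l=1}^{k'} c_{ijy_l} \\
&= s^{\sum_{l=1}^{k} r_{ix_l} + \sum_{l=1}^{k'} r_{iy_l}} \left( \sum_{j=1}^{n_i} \left( \prod_{l=1}^{k} m_{ijx_l} \prod_{l=1}^{k'} m_{ijy_l} \right) \alpha^{k+k'} + \Omega_{i2} \right) \bmod p
\end{aligned} \tag{12}
\]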
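To see where the residual polynomial in Eq. (11) comes from, it may help to expand a single transaction for $k = 2$. The following is a worked illustration only; it writes $\alpha$ for the public multiplier of Eq. (8), treats both cases of Eq. (8) uniformly as $c_{ijk} = (m_{ijk}\alpha + r_{ijk}) s^{r_{ik}} \bmod p$, and abbreviates $m_l = m_{ijx_l}$ and $r_l = r_{ijx_l}$:

\[
c_{ijx_1} c_{ijx_2} = s^{r_{ix_1} + r_{ix_2}} \left( m_1 m_2 \alpha^{2} + (m_1 r_2 + m_2 r_1)\alpha + r_1 r_2 \right) \bmod p
\]

The coefficient of $\alpha^2$ equals 1 exactly when the transaction contains both items, and the remaining terms, whose degree in $\alpha$ is lower than 2, are this transaction's contribution to the residual polynomial.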
Then, DC computes SC1 and SC2 in our mining algorithm as follows.

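\[
\begin{aligned}
SC_1 &= \left( s^{H(e(g,g)^a \| \beta_x \| TS_c)} \right)^{-1} \sum_{i=1}^{n} CC_{i1} \cdot s^{O_{ix}} \bmod p \\
&= \left( \sum_{i=1}^{n} \sum_{j=1}^{n_i} \prod_{l=1}^{k} m_{ijx_l} \right) \alpha^{k} + \sum_{i=1}^{n} \Omega_{i1} \bmod p
\end{aligned} \tag{13}
\]

\[
\begin{aligned}
SC_2 &= \left( s^{H(e(g,g)^a \| \beta_y \| TS_c)} \right)^{-1} \sum_{i=1}^{n} CC_{i2} \cdot s^{O_{iy}} \bmod p \\
&= \left( \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{l=1}^{k} m_{ijx_l} \prod_{l=1}^{k'} m_{ijy_l} \right) \right) \alpha^{k+k'} + \sum_{i=1}^{n} \Omega_{i2} \bmod p
\end{aligned} \tag{14}
\]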
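In the above equations, the random exponents cancel out because exponents of
$s$ are reduced modulo $p - 1$, and the remaining coefficient
$s^{H(e(g,g)^a \| \beta_x \| TS_c)}$ is eliminated by its modular inverse, since
$(s^{H(e(g,g)^a \| \beta_x \| TS_c)})^{-1} \cdot s^{H(e(g,g)^a \| \beta_x \| TS_c)} = 1 \bmod p$
(and likewise with $\beta_y$ for $SC_2$).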

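\[
\begin{aligned}
CC_{i1} \cdot s^{O_{ix}} &= s^{\sum_{l=1}^{k} r_{ix_l} + H(e(g,g)^a \| \beta_x \| TS_c) - \sum_{l=1}^{k} r_{ix_l}} \left( \sum_{j=1}^{n_i} \prod_{l=1}^{k} m_{ijx_l} \cdot \alpha^{k} + \Omega_{i1} \right) \bmod p \\
&= s^{H(e(g,g)^a \| \beta_x \| TS_c)} \left( \sum_{j=1}^{n_i} \prod_{l=1}^{k} m_{ijx_l} \cdot \alpha^{k} + \Omega_{i1} \right) \bmod p
\end{aligned} \tag{15}
\]

\[
\begin{aligned}
CC_{i2} \cdot s^{O_{iy}} &= s^{\sum_{l=1}^{k} r_{ix_l} + \sum_{l=1}^{k'} r_{iy_l} + H(e(g,g)^a \| \beta_y \| TS_c) - \sum_{l=1}^{k} r_{ix_l} - \sum_{l=1}^{k'} r_{iy_l}} \\
&\quad \cdot \left( \sum_{j=1}^{n_i} \left( \prod_{l=1}^{k} m_{ijx_l} \prod_{l=1}^{k'} m_{ijy_l} \right) \alpha^{k+k'} + \Omega_{i2} \right) \bmod p \\
&= s^{H(e(g,g)^a \| \beta_y \| TS_c)} \left( \sum_{j=1}^{n_i} \left( \prod_{l=1}^{k} m_{ijx_l} \prod_{l=1}^{k'} m_{ijy_l} \right) \alpha^{k+k'} + \Omega_{i2} \right) \bmod p
\end{aligned} \tag{16}
\]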
Next, $SC_1$ and $SC_2$ can be calculated by regular modular operations in
our mining algorithm. Since $SC_1 \bmod \alpha^{k} = \sum_{i=1}^{n} \Omega_{i1}$
and $SC_2 \bmod \alpha^{k+k'} = \sum_{i=1}^{n} \Omega_{i2}$, the polynomials
$\sum_{i=1}^{n} \Omega_{i1}$ and $\sum_{i=1}^{n} \Omega_{i2}$ can be eliminated
simply by modular subtraction. Similarly, $\alpha^{k}$ and $\alpha^{k+k'}$ can
be eliminated by modular division.
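\[
SC_1 = \frac{SC_1 - (SC_1 \bmod \alpha^{k})}{\alpha^{k}} \bmod p
= \sum_{i=1}^{n} \sum_{j=1}^{n_i} \prod_{l=1}^{k} m_{ijx_l} \bmod p \tag{17}
\]

\[
SC_2 = \frac{SC_2 - (SC_2 \bmod \alpha^{k+k'})}{\alpha^{k+k'}} \bmod p
= \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{l=1}^{k} m_{ijx_l} \prod_{l=1}^{k'} m_{ijy_l} \right) \bmod p \tag{18}
\]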
Since $p$ is a large prime satisfying the conditions $k_1 > k_2 (k + k')$ and
$k_2 > k_3$, the modular arithmetic (subtraction and division) in our algorithm
does not affect the ordinary integer arithmetic, which is why the following
equations hold.

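\[
\sum_{i=1}^{n} \sum_{j=1}^{n_i} \prod_{l=1}^{k} m_{ijx_l} \bmod p
= \sum_{i=1}^{n} \sum_{j=1}^{n_i} \prod_{l=1}^{k} m_{ijx_l} \tag{19}
\]

\[
\sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{l=1}^{k} m_{ijx_l} \prod_{l=1}^{k'} m_{ijy_l} \right) \bmod p
= \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{l=1}^{k} m_{ijx_l} \prod_{l=1}^{k'} m_{ijy_l} \right) \tag{20}
\]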
Consequently, DC computes the support $SP$ and confidence $CF$ as follows.
The results show that $SP = SP_{correct}$ and $CF = CF_{correct}$. That is,
the proposed mining algorithm is always correct.

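\[
SP = \frac{SC_2}{\sum_{i=1}^{n} n_i}
= \frac{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{l=1}^{k} m_{ijx_l} \prod_{l=1}^{k'} m_{ijy_l} \right)}{\sum_{i=1}^{n} n_i}
= SP_{correct} \tag{21}
\]

\[
CF = \frac{SC_2}{SC_1}
= \frac{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{l=1}^{k} m_{ijx_l} \prod_{l=1}^{k'} m_{ijy_l} \right)}{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \prod_{l=1}^{k} m_{ijx_l}}
= CF_{correct} \tag{22}
\]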
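The whole chain from Eq. (8) to Eq. (22) can be exercised numerically. The sketch below is a toy simulation under loudly stated assumptions, not the authors' implementation: the Mersenne prime $2^{1279} - 1$ stands in for the $k_1$-bit prime $p$, a SHA-256 digest of a fixed tag stands in for $H(e(g,g)^a \| \cdot \| TS_c)$ (no pairing or attribute-based encryption is performed), and all class and function names are hypothetical. The 200-bit multiplier and 128-bit random masks echo the bit lengths in Table 5. It needs Python 3.8 or later for `pow(x, -1, p)`.

```python
import hashlib
import secrets

P = 2**1279 - 1                              # Mersenne prime, stand-in for the prime p
ALPHA = secrets.randbits(200) | (1 << 199)   # 200-bit public multiplier (assumed k2 = 200)
S = 2 + secrets.randbelow(P - 3)             # shared secret s, 2 <= s <= p - 2
K3 = 128                                     # bit length of the random masks

def stand_in_hash(tag: bytes) -> int:
    """Placeholder for H(e(g,g)^a || . || TS_c); no pairing is evaluated here."""
    return int.from_bytes(hashlib.sha256(tag).digest(), "big")

class DataOwner:
    """Encrypts a 0/1 database once; later only supplies small offsets to DC."""

    def __init__(self, db):
        self.r_item = [1 + secrets.randbits(K3) for _ in db[0]]   # r_ik per item
        masks = [pow(S, r, P) for r in self.r_item]               # s^{r_ik} mod p
        # Eq. (8): c_ijk = (m_ijk * alpha + r_ijk) * s^{r_ik} mod p
        self.encrypted = [
            [(m * ALPHA + 1 + secrets.randbits(K3)) * masks[k] % P
             for k, m in enumerate(t)]
            for t in db
        ]

    def offset(self, items, hv):
        """Eqs. (9)-(10): hash value minus the r_ik of the queried items, mod p - 1."""
        return (hv - sum(self.r_item[k] for k in items)) % (P - 1)

def cloud_aggregate(encrypted_db, items):
    """CS side, Eqs. (11)-(12): sum over transactions of products of ciphertexts."""
    total = 0
    for row in encrypted_db:
        prod = 1
        for k in items:
            prod = prod * row[k] % P
        total = (total + prod) % P
    return total

def dc_count(owners, items, tag):
    """DC side, Eqs. (13)-(18): unmask, then strip the low-order polynomial."""
    hv = stand_in_hash(tag)
    acc = 0
    for owner in owners:
        cc = cloud_aggregate(owner.encrypted, items)          # computed by CS
        acc = (acc + cc * pow(S, owner.offset(items, hv), P)) % P
    sc = acc * pow(pow(S, hv, P), -1, P) % P                  # times (s^H)^{-1}
    top = ALPHA ** len(items)
    return (sc - sc % top) // top

dbs = [
    [[1, 1, 0], [1, 1, 1]],
    [[1, 0, 1], [0, 1, 1], [1, 1, 1]],
]
owners = [DataOwner(db) for db in dbs]        # databases are encrypted exactly once
ux, uy = [0, 1], [2]
sc1 = dc_count(owners, ux, b"rule-x")
sc2 = dc_count(owners, ux + uy, b"rule-y")
support = sc2 / sum(len(db) for db in dbs)    # Eq. (21)
confidence = sc2 / sc1                        # Eq. (22)
print(sc1, sc2, support, confidence)
```

Note that the two queries reuse the same ciphertexts; only the offsets change, which mirrors the claim that a new rule does not force data owners to re-encrypt. The extraction step works here because the masks are far shorter than the multiplier and the prime is far longer than the multiplier raised to the rule size.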
5. Security Analysis
In this section, we analyze the security properties of the proposed scheme.
Specifically, we demonstrate that the proposed scheme achieves all the
security requirements defined earlier, as well as collusion attack resilience.
DOs' database is privacy-preserving. To preserve the privacy of the DOs' data,
transaction $j$ of each data owner $DO_i$ is outsourced as
$C_i = \{c_{ij1}, c_{ij2}, ..., c_{ijL}\}$, and each data item $c_{ijk}$ of a
transaction is encrypted as $c_{ijk} = (\alpha + r_{ijk}) s^{r_{ik}} \bmod p$
(if the transaction has this data item) or $r_{ijk} s^{r_{ik}} \bmod p$ (if the
transaction does not have this data item). Each $c_{ijk}$ is therefore one-time
masked with the random numbers $r_{ijk}$ and $r_{ik}$, whose bit lengths are
larger than 100 bits. In addition, given the ciphertexts
$(\alpha + r_{ijk}) s^{r_{ik}} \bmod p$ and $r_{ijk} s^{r_{ik}} \bmod p$, it is
impossible for others to differentiate the two because of the mask
$s^{r_{ik}}$, which is a large number larger than $p$. Therefore, no one except
the entity that performed the encryption can tell whether a private
transaction contains a data item. We can thus ensure that each encrypted
transaction is privacy-preserving and that only the DO can decrypt his/her
private database. Although all DOs use the same shared $s$ to encrypt their
databases, they encrypt the data as $c_{ijk} = (\alpha + r_{ijk}) s^{r_{ik}}$.
That is, the different random numbers $r_{ik}$ chosen by different data owners
cause the same transaction to be encrypted to different values. Therefore, the
databases remain confidential when other DOs try to guess the private
databases belonging to another DO.
We now demonstrate that our scheme is able to resist collusion attacks. Each
data item of a transaction is encrypted as
$c_{ijk} = (\alpha + r_{ijk}) s^{r_{ik}} \bmod p$. Although all DOs share $s$
and $\alpha$, the random number $r_{ik}$ chosen by a data owner $DO_i$
generates a new mask $s^{r_{ik}}$, so the resulting ciphertext is random and
differs from those of other DOs. Even if a DO, say $DO_j$, shares his/her
random number $r_{jk}$ with CS, it will not be possible to recover the private
database of $DO_i$.
Achieving both authentication and data integrity. All communications in our
scheme are encrypted using the session key shared between the two
communicating entities. Specifically, each data owner $DO_i$ can calculate
his/her own session key $k_{ic}$ with CS and $k_{id}$ with DC; every session
key differs from those of other DOs and depends on the timestamp $TS$ (i.e.
the session key changes with a different $TS$). When $DO_i$ outsources the
database to CS, $k_{ic} = H(S_c^{s_i} \| ID_c \| ID_i \| TS)$ can be used as the
session key to encrypt the communication packets using the AES algorithm.
Similarly, $k_{id} = H(S_d^{s_i} \| ID_d \| ID_i \| TS)$ can be used as the
session key when communicating with DC. In addition, the encrypted
communication data is $E_k(data \| TS)$, and each entity can verify the
decrypted data and compare $TS$ with the current timestamp for data integrity,
thus resisting replay attacks. Furthermore, the proposed scheme achieves
fine-grained access control by employing the key-policy attribute-based
encryption technique [31]. Since $e(g, g)^a$ can only be recovered by the
authenticated data owner $DO_i$ using his/her access key $PREK$, the scheme
achieves both authentication and data integrity.
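The session-key derivation and the timestamp check described above can be sketched as follows. This is a minimal illustration under assumptions: SHA-256 stands in for $H$, the shared value is a placeholder byte string rather than a real pairing-group element, the length-prefixed encoding and the 60-second freshness window are choices made for this example, and the AES step is omitted because the Python standard library does not provide AES.

```python
import hashlib

def session_key(shared: bytes, id_peer: bytes, id_owner: bytes, ts: int) -> bytes:
    """k = H(shared || ID_peer || ID_owner || TS), with SHA-256 standing in for H."""
    h = hashlib.sha256()
    for part in (shared, id_peer, id_owner, str(ts).encode()):
        h.update(len(part).to_bytes(4, "big"))   # length prefix keeps the concatenation unambiguous
        h.update(part)
    return h.digest()

def is_fresh(ts: int, now: int, window_s: int = 60) -> bool:
    """Accept a message only if its timestamp is recent (hypothetical 60 s window)."""
    return 0 <= now - ts <= window_s

shared = b"placeholder for the shared value derived from S_c and s_i"
k_a = session_key(shared, b"ID_c", b"ID_1", 1700000000)
k_b = session_key(shared, b"ID_c", b"ID_1", 1700000001)
print(k_a != k_b, is_fresh(1700000000, now=1700000030))
```

Because the timestamp is hashed into the key, a packet replayed under an old timestamp is rejected by the freshness check, while the same packet under a new timestamp would be encrypted with a different key.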
Mining results are confidential. Given $O_{ix}$ and $O_{iy}$, DC is the only
entity who can compute the decryption key $(ss_x, ss_y)$ as
$ss_x = s^{H(e(g,g)^a \| \beta_x \| TS_c)}$ and
$ss_y = s^{H(e(g,g)^a \| \beta_y \| TS_c)}$. Without the decryption key, CS or
any data owner cannot calculate the support and confidence via
$SC_1 = \frac{SC_1 - (SC_1 \bmod \alpha^{k})}{\alpha^{k}}$ and
$SC_2 = \frac{SC_2 - (SC_2 \bmod \alpha^{k+k'})}{\alpha^{k+k'}}$. Thus, the
final mining results cannot be obtained by CS or any data owner, but only by
DC.

Table 5: Parameter settings
Parameter   $\kappa$   $k_1$   $k_2$   $k_3$   $k$   $k'$
Setting     1024       1024    200     128     2     1
6. Performance Evaluation
In this section, we evaluate the performance of our proposed scheme in
terms of computational costs. More specifically, the performance metrics
used in the evaluation are 1) transaction encryption time (an indication of
computational costs for DOs), 2) time of association rule mining over
horizontally partitioned databases (an indication of computational costs for
CS), and 3) decryption time (an indication of computational costs for DC). In
addition, we compare our proposed scheme with EMHS [20], a state-of-the-art
scheme for privacy-preserving association rule mining over horizontally
partitioned databases. EMHS, which is primarily based on the Paillier
homomorphic encryption technique, works as follows. Each DO first encrypts
his/her databases using the public key of DC and outsources the encrypted
database to the cloud based on pre-defined rules. Then, CS merges all
databases together, performs the homomorphic addition operations to obtain the
encrypted sums (encrypted support counts), and sends the results to DC.
Finally, DC decrypts the results and calculates the final mining results, such
as the support and the confidence of the rules.
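The additive homomorphism that a Paillier-based baseline relies on can be illustrated with a textbook toy version of the cryptosystem. This is a generic sketch and not the EMHS implementation: the two small primes, the choice $g = n + 1$, and all function names are assumptions made for this example, and the key size is far too small for real use. It needs Python 3.8 or later for `pow(x, -1, n)`.

```python
import math
import secrets

def keygen(p: int, q: int):
    """Toy Paillier key generation from two given primes, using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                                # valid because g = n + 1
    return n, (lam, mu)

def encrypt(n: int, m: int) -> int:
    """c = (n + 1)^m * r^n mod n^2, using (n + 1)^m = 1 + m*n (mod n^2)."""
    n2 = n * n
    r = 0
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (1 + m * n) * pow(r, n, n2) % n2

def add(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)

def decrypt(n: int, sk, c: int) -> int:
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

# DC's key pair; 104729 and 1299709 are small known primes used only for the demo.
n, sk = keygen(104729, 1299709)

# Each bit says whether one transaction contains the itemset of the rule.
indicators = [1, 0, 1, 1, 0]
enc_sum = encrypt(n, 0)
for bit in indicators:                    # CS adds under encryption
    enc_sum = add(n, enc_sum, encrypt(n, bit))
count = decrypt(n, sk, enc_sum)           # DC recovers the support count
print(count)
```

Every encryption in this sketch costs a modular exponentiation modulo $n^2$, and the indicator bits depend on the rule being mined, which is the cost profile the comparison in this section is concerned with.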
In our experiment, we used a Windows 7 laptop with a 3.1 GHz processor
and 8 GB RAM. We implemented both our scheme and the EMHS scheme in Java,
using the parameter settings shown in Tab 5. Note that $\kappa$, $k_1$, $k_2$,
and $k_3$ are the security parameters defined in our scheme, and $k$ and $k'$
mean that a specific rule (i.e. $U_x \Rightarrow U_y$, where $U_x$ contains $k$
data items and $U_y$ contains $k'$ data items) is considered in our simulation.
To simulate the process of association rule mining, we generated 1 million
transactions for 100 DOs, and each transaction contains 20 data items. For
simplicity, each entity of our proposed scheme (i.e. DO, CS, and DC) is
evaluated separately, and the computational costs of these entities in our
scheme are compared with those in the EMHS scheme under the same conditions,
including the computational cost of encryption (with different numbers of
transactions and data items), the computational cost of mining one rule in 1
million transactions from 100 DOs, and the computational cost of decryption
at the DC.
[Figure 2: two plots comparing EMHS and our scheme. Panel (a) shows time in
seconds against the number of transactions (up to 10000) and the number of
data items (10 to 20); panel (b) shows time in seconds against the number of
rules (1 to 10).]
(a) Encryption time with different transactions and data items at DOs
(b) Encryption time with different rules at DOs
Figure 2: Computational cost of our scheme and EMHS at DOs
Computational costs for DOs. To evaluate the computational efficiency of
DOs, we focus on the two factors that have the most impact on performance,
namely the number of transactions together with the number of data items in
each transaction, and the number of rules queued to be mined. In our
simulator, the number of transactions was set between 1000 and 10000, and the
number of data items was set between 10 and 20. Intuitively, as the number of
transactions and data items increases, the computation and running time also
increase, since each data item in each transaction needs to be encrypted.
However, since, unlike the Paillier encryption operation, our encryption
operation involves no power operation, our scheme is clearly more efficient.
As shown in Fig. 2 (a), to encrypt 10000 transactions (10 data items), the
running time of EMHS was almost 180 seconds while our scheme required less
than 50 seconds. That is, to encrypt each data item in one transaction, the
Paillier encryption costs almost 1.8 ms while our scheme costs less than
0.5 ms. In addition, because our scheme is flexible, when the mining rule
changes, our scheme does not require all DOs to re-encrypt their databases
(unlike EMHS). Specifically, to re-encrypt 10000 transactions, EMHS spent
almost 17 seconds for each new rule, whereas the computational cost of a DO in
our scheme was almost zero (see Fig. 2 (b)). Therefore, we can conclude that
the DOs' computational costs in our scheme are significantly lower. We also
remark that we omitted the computational costs involved in the authentication
and the computation of the session key, since these functions are not part of
our scheme.
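The per-item figures quoted above follow directly from the totals read off Fig. 2 (a). The short calculation below only restates that arithmetic; the helper name is hypothetical, and the 50-second value is the stated upper bound for our scheme rather than an exact measurement.

```python
def per_item_ms(total_seconds: float, transactions: int, items: int) -> float:
    """Average encryption time per data item, in milliseconds."""
    return total_seconds * 1000.0 / (transactions * items)

emhs_ms = per_item_ms(180, 10_000, 10)   # "almost 180 seconds" for EMHS
ours_ms = per_item_ms(50, 10_000, 10)    # "less than 50 seconds" for our scheme
ratio = emhs_ms / ours_ms                # lower bound on the per-item speed-up
print(emhs_ms, ours_ms, ratio)
```

Since 50 seconds is an upper bound on our scheme's running time, the resulting ratio is a conservative estimate of the encryption speed-up at the DO side.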
[Figure 3: bar chart. Running time at CS side (ms): EMHS 29744, our scheme
13450. Decryption time at DC side (ms): EMHS 23, our scheme 19.]
Figure 3: Comparative summary of computational costs in our scheme and EMHS
for CS and DC
Computational costs for CS. In our simulator, to mine the association
rules, we assumed that there are 1 million transactions generated by 100 DOs.
In other words, we horizontally split the generated database at random and
assigned the parts to 100 DOs, so each DO may hold different transactions.
The computational costs for CS are shown in Fig. 3: in our scheme, CS required
only about 13 seconds to mine one association rule from 1 million
transactions, whereas EMHS required about 29 seconds.
Computational costs for DC. Apart from computing the pre-shared secret key
and the session key, DC needs to decrypt the encrypted mining results from CS.
In our simulator, when the encrypted mining results were received from CS, the
average decryption time was 19 ms in our scheme and 23 ms in EMHS (see
Fig. 3). Thus, the computational cost for DC in our scheme is slightly lower
than that of EMHS.
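The relative gains at CS and DC can be read off the values reported in Fig. 3. The snippet below merely computes the two ratios from those reported numbers; the variable names are illustrative.

```python
# Values in milliseconds, as reported in Fig. 3.
cs_ms = {"EMHS": 29744, "ours": 13450}   # mining one rule over 1 million transactions
dc_ms = {"EMHS": 23, "ours": 19}         # decrypting the mining result

cs_speedup = cs_ms["EMHS"] / cs_ms["ours"]
dc_speedup = dc_ms["EMHS"] / dc_ms["ours"]
print(round(cs_speedup, 2), round(dc_speedup, 2))
```

In other words, the cloud-side mining is a little over twice as fast, while the decryption at DC improves by roughly one fifth.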
7. Related Work
In this section, we introduce related literature on privacy-preserving
association rule mining over horizontally and vertically distributed
databases. Specifically, this paper addresses privacy-preserving association
rule mining over horizontally partitioned databases, while other papers
generally focus on vertically partitioned databases. Despite the different
focuses, techniques such as homomorphic encryption, differential privacy, and
secure multiparty computation can be employed for both horizontally and
vertically partitioned databases.
Solutions based on homomorphic encryption. In solutions based on homomorphic
encryption [25], all DOs' databases are encrypted using the public key of DC
and are stored in the cloud. When the DC wishes to mine association rules over
all distributed databases, CS performs the homomorphic computations and
obtains the encrypted support and confidence of each rule. Then, by using the
private key, DC can easily decrypt the encrypted mining results to obtain the
support and the confidence. There are two kinds of homomorphic encryption,
namely partially homomorphic encryption (PHE) [24] and fully homomorphic
encryption (FHE) [25]. PHE-based solutions [19, 20, 21] are more efficient
than FHE-based solutions [22, 23], but the former are unable to support
flexible association rule mining.
Solutions based on differential privacy and Bloom filters. Solutions based
on differential privacy [26] and Bloom filters [12, 13] are designed without
the use of cryptographic techniques. By adding noise or using
non-cryptographic hash functions, these solutions [8, 9] can achieve
association rule mining very efficiently. That is, all computations are
performed without the need for encryption, which is generally computationally
expensive. A key limitation of such solutions is that the mining results are
not accurate and can deviate significantly from the real results, due to the
characteristics of differential privacy and Bloom filters.
Solutions based on secure multiparty computation. By combining several
secure protocols, solutions such as those described in [32, 15] can achieve
secure and flexible association rule mining at the cost of efficiency (due to
the complex protocols). In addition, most of them do not consider collusion
attack resilience, and assume that no DO will collude with the cloud to game
the system (an unrealistic assumption).
Different from the above solutions, and building on our previous fast scale
product technique [27], our proposed scheme supports flexible
privacy-preserving association rule mining with multiple DOs in the cloud.
Thus, it can perform the data mining algorithm accurately and efficiently. In
addition, our proposed scheme is designed to resist collusion attacks.
8. Conclusion
The big data trend is unlikely to go away anytime soon, particularly with the
increasing pervasiveness of consumer technologies and storage capabilities.
As noted in a recent survey [33], data analytical technologies have failed to
keep pace with the big data challenges. Privacy-preserving association rule
mining by collaborative data owners in a cloud environment is one of the
current research challenges.
In this paper, we proposed a secure and flexible cloud-assisted association
rule mining scheme over horizontally partitioned databases, which is designed
to achieve privacy-preserving association rule mining with distributed
databases while reducing privacy disclosure. We then presented a detailed
security analysis to demonstrate that the proposed scheme achieves secure and
flexible association rule mining. In addition, based on our extensive
performance evaluation, we demonstrated the efficiency of the scheme.
Future work will include integrating differential privacy techniques with
our proposed scheme to achieve a more secure and robust scheme for association
rule mining, as well as designing the scheme to also work with vertically
partitioned databases (see [34]).
Acknowledgment
The authors would also like to thank the associate editor and the three
anonymous reviewers for their constructive feedback.
[1] Y. Hu, J. Yan, K.-K. R. Choo, Pedal: a dynamic analysis tool for
efficient concurrency bug reproduction in big data environment, Cluster
Computing 19 (1) (2016) 153-166.
[2] L. Zhao, L. Chen, R. Ranjan, K.-K. R. Choo, J. He, Geographical
information system parallelization for spatial big data processing: a review,
Cluster Computing 19 (1) (2016) 139-152.
[3] D. Quick, K.-K. R. Choo, Data reduction and data mining framework
for digital forensic evidence: Storage, intelligence, review, and archive,
Trends & Issues in Crime and Criminal Justice 480 (2011) 1-11.
[4] D. Quick, K.-K. R. Choo, Big forensic data reduction: Digital forensic
images and electronic evidence, Cluster Computing.
[5] Z. Xu, H. Zhang, C. Hu, L. Mei, J. Xuan, K.-K. R. Choo, V. Sugumaran,
Y. Zhu, Building knowledge base of urban emergency events based on
crowdsourcing of social media, Concurrency and Computation: Practice
and Experience.
[6] Z. Xu, H. Zhang, V. Sugumaran, K.-K. R. Choo, L. Mei, Y. Zhu,
Participatory sensing-based semantic and spatial analysis of urban emergency
events using mobile social media, EURASIP Journal on Wireless
Communications and Networking 2016 (2016) 44.
[7] P. Giudici, S. Figini, Applied data mining for business and industry,
Applied Data Mining for Business and Industry, Second Edition (2009)
i-viii.
[8] F. Giannotti, L. V. S. Lakshmanan, A. Monreale, D. Pedreschi, W. H.
Wang, Privacy-preserving mining of association rules from outsourced
transaction databases, in: Twentieth Italian Symposium on Advanced
Database Systems, SEBD 2012, Venice, Italy, June 24-27, 2012,
Proceedings, 2012, pp. 233-242.
[9] X. Yi, F. Rao, E. Bertino, A. Bouguettaya, Privacy-preserving
association rule mining in cloud computing, in: Proceedings of the 10th ACM
Symposium on Information, Computer and Communications Security,
ASIA CCS '15, Singapore, April 14-17, 2015, 2015, pp. 439-450.
[10] O. A. Wahab, M. O. Hachami, A. Zaffari, M. Vivas, G. G. Dagher,
DARM: a privacy-preserving approach for distributed association rules
mining on horizontally-partitioned data, in: 18th International Database
Engineering & Applications Symposium, IDEAS 2014, Porto, Portugal,
July 7-9, 2014, 2014, pp. 1-8.
[11] Y. Duan, J. F. Canny, J. Z. Zhan, Efficient privacy-preserving
association rule mining: P4P style, in: Proceedings of the IEEE Symposium on
Computational Intelligence and Data Mining, CIDM 2007, part of the
IEEE Symposium Series on Computational Intelligence 2007, Honolulu,
Hawaii, USA, 1-5 April 2007, 2007, pp. 654-660.
[12] L. Qiu, Y. Li, X. Wu, Preserving privacy in association rule mining with
bloom filters, J. Intell. Inf. Syst. 29 (3) (2007) 253-278.
[13] M. Kantarcioglu, R. Nix, J. Vaidya, An efficient approximate protocol
for privacy-preserving association rule mining, in: Advances in Knowledge
Discovery and Data Mining, 13th Pacific-Asia Conference, PAKDD
2009, Bangkok, Thailand, April 27-30, 2009, Proceedings, 2009, pp. 515-524.
[14] A. V. Evfimievski, R. Srikant, R. Agrawal, J. Gehrke, Privacy preserving
mining of association rules, Inf. Syst. 29 (4) (2004) 343-364.
[15] T. Tassa, Secure mining of association rules in horizontally distributed
databases, IEEE Trans. Knowl. Data Eng. 26 (4) (2014) 970-983.
[16] Y. Saygin, V. S. Verykios, A. K. Elmagarmid, Privacy preserving
association rule mining, in: 12th International Workshop on Research
Issues in Data Engineering: Engineering E-Commerce/E-Business Systems,
RIDE'02, San Jose, California, USA, February 24-25, 2002, 2002,
pp. 151-158.
[17] C. N. Modi, A. R. Patil, Proceedings of 3rd International Conference
on Advanced Computing, Networking and Informatics: ICACNI 2015,
Volume 2, Springer India, New Delhi, 2016, Ch. Privacy Preserving
Association Rule Mining in Horizontally Partitioned Databases Without
Involving Trusted Third Party (TTP), pp. 549-555.
[18] C. Huang, R. Lu, EFPA: efficient and flexible privacy-preserving mining
of association rule in cloud, in: 2015 IEEE/CIC International Conference
on Communications in China, ICCC 2015, Shenzhen, China, November
2-4, 2015, 2015.
[19] M. Hussein, A. El-Sisi, N. A. Ismail, Fast cryptographic privacy
preserving association rules mining on distributed homogenous data base,
in: Proceedings of The 2008 International Conference on Data Mining,
DMIN 2008, July 14-17, 2008, Las Vegas, USA, 2 Volumes, 2008, pp.
513-519.
[20] X. C. Nguyen, H. B. Le, T. A. Cao, An enhanced scheme for
privacy-preserving association rules mining on horizontally distributed
databases, in: 2012 IEEE RIVF International Conference on Computing
& Communication Technologies, Research, Innovation, and Vision
for the Future (RIVF), Ho Chi Minh City, Vietnam, February 27 - March
1, 2012, 2012, pp. 1-4.
[21] S. Rana, P. S. Thilagam, Hierarchical homomorphic encryption based
privacy preserving distributed association rule mining, in: 2014
International Conference on Information Technology, ICIT 2014, Bhubaneswar,
India, December 22-24, 2014, 2014, pp. 379-385.
[22] M. G. Kaosar, R. Paulet, X. Yi, Fully homomorphic encryption based
two-party association rule mining, Data Knowl. Eng. 76 (2012) 1-15.
[23] J. Liu, J. Li, S. Xu, B. C. M. Fung, Secure outsourced frequent pattern
mining by fully homomorphic encryption, in: Big Data Analytics and
Knowledge Discovery - 17th International Conference, DaWaK 2015,
Valencia, Spain, September 1-4, 2015, Proceedings, 2015, pp. 70-81.
[24] P. Paillier, Paillier encryption and signature schemes, in: Encyclopedia
of Cryptography and Security, 2nd Ed., 2011, pp. 902-903.
[25] C. Gentry, A fully homomorphic encryption scheme, Ph.D. thesis,
Stanford University, crypto.stanford.edu/craig (2009).
[26] C. Dwork, Differential privacy, in: Automata, languages and
programming, Springer, 2006, pp. 1-12.
[27] R. Lu, H. Zhu, X. Liu, J. K. Liu, J. Shao, Toward efficient and
privacy-preserving computing in big data era, IEEE Network 28 (4) (2014)
46-50.
[28] D. Boneh, M. K. Franklin, Identity-based encryption from the Weil
pairing, in: Advances in Cryptology - CRYPTO 2001, 21st Annual International
Cryptology Conference, Santa Barbara, California, USA, August
19-23, 2001, Proceedings, 2001, pp. 213-229.
[29] R. Agrawal, T. Imielinski, A. N. Swami, Mining association rules
between sets of items in large databases, in: ACM SIGMOD'93, 1993, pp.
207-216.
[30] V. Goyal, O. Pandey, A. Sahai, B. Waters, Attribute-based encryption
for fine-grained access control of encrypted data, in: Proceedings of the
13th ACM Conference on Computer and Communications Security, CCS
2006, Alexandria, VA, USA, October 30 - November 3, 2006, 2006, pp.
89-98.
[31] S. Hohenberger, B. Waters, Attribute-based encryption with fast
decryption, in: Public-Key Cryptography - PKC 2013 - 16th International
Conference on Practice and Theory in Public-Key Cryptography, Nara,
Japan, February 26 - March 1, 2013. Proceedings, 2013, pp. 162-179.
[32] M. Kantarcioglu, C. Clifton, Privacy-preserving distributed mining of
association rules on horizontally partitioned data, IEEE Trans. Knowl.
Data Eng. 16 (9) (2004) 1026-1037.
[33] D. Quick, K.-K. R. Choo, Impacts of increasing volume of digital forensic
data: A survey and future research challenges, Digital Investigation
11 (4) (2014) 273-294.
[34] L. Li, R. Lu, K.-K. R. Choo, A. Datta, J. Shao, Privacy-preserving
outsourced association rule mining on vertically partitioned databases,
IEEE Transactions on Information Forensics and Security.