Secure and Flexible Cloud-Assisted Association Rule Mining over Horizontally Partitioned
Databases
PII: S0022-0000(16)30133-7
DOI: http://dx.doi.org/10.1016/j.jcss.2016.12.005
Reference: YJCSS 3048
Please cite this article in press as: C. Huang et al., Secure and Flexible Cloud-Assisted Association Rule Mining over Horizontally
Partitioned Databases, J. Comput. Syst. Sci. (2016), http://dx.doi.org/10.1016/j.jcss.2016.12.005
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing
this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is
published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.
Highlights
Privacy-preserving collaborative association rule mining in cloud
Collusion attack resilience
Secure association rule mining over horizontally partitioned databases
Secure and Flexible Cloud-Assisted Association Rule
Mining over Horizontally Partitioned Databases
Cheng Huang
Department of Electrical and Computer Engineering, University of Waterloo, Canada
(email: c225huan@uwaterloo.ca)
Rongxing Lu
Faculty of Computer Science, University of New Brunswick, Canada (email:
rlu1@unb.ca)
Abstract
With recent trends in big data and cloud computing, data mining has also attracted considerable interest due to its potential to deal with distributed data in the cloud. However, existing data mining technologies may not be directly deployed, as we need to avoid accidental privacy disclosure when data from different sources are mined. In this paper, we propose a secure and flexible cloud-assisted association rule mining scheme over horizontally partitioned databases. Using the proposed scheme, data owners can provide their data and flexibly mine association rules in the cloud, while being assured of minimal risk of privacy leakage. We then show that our proposed scheme achieves privacy-preserving mining of association rules and provides resilience against collusion attacks. A comparative summary demonstrates that the proposed scheme is more efficient, in terms of computational costs, than several existing homomorphic-encryption-based schemes.
Keywords: Big Data Mining, Cloud Privacy-Preserving, Association Rule Mining, Resilience against Collusion Attacks
Preprint submitted to Journal of Computer and System Sciences, December 28, 2016
1. Introduction
Big data is a relatively recent trend, driven partly by advancements in consumer technologies (e.g., smart mobile devices and mobile apps), widespread adoption of the Internet, and the availability of data from multiple sources (e.g., social networks/media, online transactions, various sensors, and mobile devices). Unsurprisingly, big data has wide-ranging influences on our daily life (e.g., shopping, traveling, and education), and multinational and Fortune 500 organizations such as Amazon, Netflix, and Alibaba are actively collecting, mining, and analyzing data from different sources in order to obtain in-depth insights about individuals and their profiles. Such information could be used to influence advertising, marketing, and other business strategies. One big data challenge is how to capture, store, manage, share, and analyze big data effectively and efficiently; consequently, there has been interest in data science and big data analytics [1, 2], in applications such as forensic investigations [3, 4] and real-time detection of emergency events [5, 6].
Data mining has been identified as a viable big data analytical solution. Association rule mining, for example, allows one to locate potential relationships between seemingly unrelated data in a large database or other information repository. Typical applications of association rule mining include the analysis and prediction of customer behavior [7], particularly in market data analysis, product clustering, catalog design, and store layout. A simple example of an association rule is "If a customer buys potato chips and peanuts, then the customer is also likely to purchase some beer." In general, an association rule contains two parts, namely an antecedent (if) and a consequent (then). The antecedent is a combination of data items, and the consequent is usually an item that is found in combination with the antecedent. Association rule mining is described in detail later in the paper.
Deploying data mining techniques, such as association rule mining, over the cloud allows one to leverage the cloud's advanced computing capabilities and virtually unlimited storage space. However, there are associated privacy concerns, such as the need for the hosting cloud to access data owners' data and the potential leakage of sensitive information during data mining. In a healthcare system, for example, a number of participating healthcare providers store their patients' health information in the cloud. These healthcare providers wish to apply association rule mining algorithms to collaboratively discover new knowledge about diseases or epidemics. As sensitive patient data are uploaded to the cloud by the respective healthcare providers, patients' privacy may be compromised if the hosting cloud is able to access these data. This may also result in a violation of industry-specific regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) for U.S. healthcare providers. Thus, privacy-preserving association rule mining in the cloud has been the subject of recent research (see [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]). This is also the focus of this paper.
In this paper, we study privacy-preserving association rule mining by collaborating data owners in a cloud, where some of the databases are horizontally partitioned. Several privacy-preserving association rule mining schemes have been proposed in the literature [19, 20, 21, 22, 23], which generally use homomorphic encryption to guarantee security and privacy. A key limitation of schemes based on partially homomorphic encryption (PHE) [24] is that they only work for pre-determined association rules. Once the association rule changes, each data owner needs to re-encrypt its database before sending the encrypted data to the cloud. This is clearly not practical, particularly considering the computational costs and bandwidth required to send a large dataset to the cloud. Another possible solution is to use fully homomorphic encryption (FHE) [25]. However, bootstrapping in FHE is time-consuming, which adversely affects the efficiency of the entire system. Techniques such as differential privacy [26], Bloom filters, and secure multi-party computation (SMC) have also been used to achieve privacy-preserving association rule mining. As summarized in Table 1, these approaches have various limitations.
Building on our previously proposed fast scale product technique [27], we present a novel secure and flexible cloud-assisted association rule mining scheme in this paper. The scheme is designed to allow data owners to collaboratively perform association rule mining in the cloud without compromising efficiency or flexibility. In addition, even if some data owners collude with the cloud, the cloud cannot compromise other data owners' data. Specifically, the main contributions of this paper are two-fold.
ing scheme over horizontally partitioned databases. Unlike most existing works, our proposed scheme enables distributed data owners to perform association rule mining without compromising the privacy of the data owners or of the mined results, and it is resilient to collusion attacks.
2.1. System Model
The system model (see Fig. 1) comprises four main entities, namely: data
owners (DOs), a cloud server (CS), a data center (DC), and a trusted au-
thority (TA). The detailed description of each entity is as follows.
Trusted authority (TA): TA, a fully trusted entity, is responsible for initializing the system and distributing key materials to the other entities in the system. Note that TA runs only once and should go offline after the system has been initialized, to reduce security risks.
Data owners (DOs = {DO1, DO2, ..., DOn}): Each data owner DOi ∈ DOs has a private database, which it encrypts before outsourcing it to the cloud server.
Cloud server (CS): CS has significant computational and storage capabilities, stores the encrypted private databases received from DOs, and helps the data center jointly mine quantitative association rules.
Data center (DC): With the assistance of CS, DC can easily and efficiently decrypt and obtain the results of association rule mining.
data owner (i.e., a number of data owners and the CS can launch a collusion attack and try to reveal other data owners' private data), but the DC is trusted to store the final mining results (i.e., DC will not collude with CS or data owners to share the final mining results).
In a typical system setup for association rule mining over horizontally partitioned databases, many data owners participate in collaborative data mining in order to obtain more detailed and accurate results. For example, healthcare providers located in the same region (e.g., the state of Texas) may share information on diagnoses, medical studies, and clinical trials, so that they can mine disease risk rules together and better identify frequent diseases or epidemics. In general, the following security requirements should be satisfied:
Flexibility: The cloud-assisted system should be flexible. In other words, independent of the association rules used, data owners need to encrypt their databases and upload them to the CS only once, after which the system can flexibly perform association rule mining.
Efficiency: Ciphertext-based association rule mining usually degrades performance; therefore, any trade-off needs to be balanced and realistic. In other words, the mining latency should be acceptable compared with that of secure association rule mining algorithms based on homomorphic encryption techniques. We achieve this by shifting heavy computations to the CS, so that computational costs for data owners remain low.
3. Preliminaries
In this section, we revisit the basic association rule mining algorithm, and
the bilinear pairing technique [28], as well as the access structure and linear
secret-sharing schemes, which serve as the basis of the proposed scheme.
Confidence indicates an estimate of the conditional probability of finding items of $U_y$ in transactions that contain $U_x$, which can be calculated as follows.

$$
CF(U_x \Rightarrow U_y) = \frac{\sigma(U_x \cup U_y)}{\sigma(U_x)}. \tag{2}
$$

If a minimum support threshold $SP_{min}$ and a minimum confidence threshold $CF_{min}$ are specified, we can decide whether a rule is strong by comparing $SP(U_x \Rightarrow U_y)$ with $SP_{min}$ and $CF(U_x \Rightarrow U_y)$ with $CF_{min}$. A rule is strong iff $SP(U_x \Rightarrow U_y) \ge SP_{min}$ and $CF(U_x \Rightarrow U_y) \ge CF_{min}$. Here,
Table 3: Market-basket transactions (horizontally partitioned databases)
(a) Database 1
TID Bread Coke Milk Beer Diaper
(1) 1 1 1 0 0
(2) 1 0 0 1 0
(3) 0 1 1 1 1
(b) Database 2
TID Bread Coke Milk Beer Diaper
(1) 1 0 1 1 1
(2) 0 1 1 0 1
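To make the definitions of support and confidence concrete, the following minimal Python sketch computes them in plaintext over the two horizontally partitioned databases of Table 3. Because both partitions share the same item columns, the support count of an itemset is simply the sum of the per-partition counts. The rule {Milk, Diaper} ⇒ {Beer} and the helper names `sigma` and `support_confidence` are our own illustrative choices, not taken from the paper.

```python
# Plaintext support/confidence over the horizontally partitioned databases of Table 3.
ITEMS = ["Bread", "Coke", "Milk", "Beer", "Diaper"]
DB1 = [[1, 1, 1, 0, 0], [1, 0, 0, 1, 0], [0, 1, 1, 1, 1]]  # Table 3(a)
DB2 = [[1, 0, 1, 1, 1], [0, 1, 1, 0, 1]]                  # Table 3(b)

def sigma(partitions, itemset):
    """Support count: transactions (across all partitions) containing every item."""
    cols = [ITEMS.index(name) for name in itemset]
    return sum(all(t[c] == 1 for c in cols) for db in partitions for t in db)

def support_confidence(partitions, ux, uy):
    """Return (SP, CF) of the rule ux => uy."""
    total = sum(len(db) for db in partitions)           # |T|
    both = sigma(partitions, list(ux) + list(uy))      # sigma(Ux u Uy)
    return both / total, both / sigma(partitions, ux)  # SP, CF as in Eq. (2)

sp, cf = support_confidence([DB1, DB2], ["Milk", "Diaper"], ["Beer"])
```

Here σ({Milk, Diaper}) = 3 and σ({Milk, Diaper, Beer}) = 2 over |T| = 5 transactions, so the rule has support 2/5 and confidence 2/3.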
4. Proposed Scheme
In this section, we present the proposed secure and flexible cloud-assisted association rule mining scheme over horizontally partitioned databases. The scheme comprises three phases, namely system initialization, encryption of the data owners' databases, and joint association rule mining. Compared with state-of-the-art secure association rule mining schemes, our scheme can flexibly mine arbitrary association rules over large datasets and strikes a balanced trade-off between security and computational efficiency.
4.1. Overview
During system initialization, TA initializes the whole system by generating and distributing the corresponding key materials to the other entities (i.e., DOs, CS, and DC). TA then goes offline and does not participate in subsequent mining processes. In order to construct a more flexible and scalable system, DOs can be classified according to their respective attributes. This allows the DC to control which private databases are used in association rule mining.
During the encryption of the DOs' databases, each DO encrypts his/her database and uploads the encrypted database to CS. CS authenticates and stores all received encrypted databases before merging them into a larger database. Once this phase is finished, no DO needs to perform it again.
During joint association rule mining, some authenticated DOs are chosen according to different access control policies, and their private databases are used to mine the association rule together. Next, CS performs the corresponding association rule mining tasks and sends the encrypted mining results to DC. DC stores all received encrypted mining results and cooperates with the chosen DOs to decrypt them. Finally, DC obtains the final mining results and terminates the process of association rule mining.
Given the security parameter, TA first runs the bilinear pairing generator algorithm to generate the parameters $(q, g, \mathbb{G}, \mathbb{G}_T, e)$.
TA chooses two secure cryptographic hash functions $H(\cdot)$ and $H'(\cdot)$, where $H: \{0,1\}^* \to \mathbb{Z}_q$ and $H': \{0,1\}^* \to \mathbb{Z}_p$, and a secure symmetric encryption/decryption algorithm $E(\cdot)/D(\cdot)$, such as AES.
Assuming that there are a total of $|U|$ attributes for all DOs, TA selects a random number $a \in \mathbb{Z}_q$ and random elements $h_1, h_2, \ldots, h_{|U|} \in \mathbb{G}$, and computes $e(g,g)^a$. The master key is then $MK = \{PK, a\}$, where $PK = \{q, g, \mathbb{G}, e(g,g)^a, h_1, h_2, \ldots, h_{|U|}\}$, and TA shares $PK$ with the DC as an access control key.
For each data owner $DO_i \in DOs = \{DO_1, DO_2, \ldots, DO_n\}$, in order to realize fine-grained access control when mining association rules, TA first generates the LSSS access structure $(W, \rho)$ according to the DOs' attributes. Let $W$ be an $l \times m$ matrix, and let the function $\rho$ map the rows of $W$ to attributes. Let $\Omega$ denote the set of distinct attributes included in $W$ (i.e., $\Omega = \{d : \exists i \in [1, l], \rho(i) = d\}$). Then, TA chooses a random vector $v = (a, y_2, y_3, \ldots, y_m) \in \mathbb{Z}_q^m$ and calculates $\lambda_i = v \cdot W_i$, where $W_i$ is the $i$-th row of $W$. Afterwards, TA chooses $l$ random numbers $r_1, r_2, \ldots, r_l \in \mathbb{Z}_q$ and uses the master key $MK$ to generate the pre-shared access key

$$
PREK = \{PK,\ RK_i = (D_i = g^{\lambda_i} h_{\rho(i)}^{r_i},\ R_i = g^{r_i},\ \forall d \in \Omega \setminus \rho(i): Q_{i,d} = h_d^{r_i})\}
$$

for all $i \in [1, l]$, where $\Omega \setminus \rho(i)$ denotes $\Omega$ with the element $\rho(i)$ removed if present. Finally, TA shares the generated $PREK \| (W, \rho)$ with the DOs.
TA chooses h, which is a random generator of G, and performs the
following steps to distribute secret keys to the DOs, the CS and the DC:
and each $\langle S_i, ID_i \rangle$ for $DO_i \in DOs$, are published as system-wide public information.
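The LSSS share computation $\lambda_i = v \cdot W_i$ from the initialization can be illustrated with a toy access structure. The minimal Python sketch below assumes a hypothetical two-row matrix $W$ for the policy "A AND B" (a standard textbook construction, not a matrix from the paper) and uses the Mersenne prime $2^{127} - 1$ as a stand-in for $q$. In the actual scheme the secret $a$ is only recovered in the exponent, as $e(g,g)^a$ via pairings, never in the clear; the sketch shows only the share arithmetic.

```python
import secrets

Q = 2**127 - 1  # stand-in prime for the group order q (assumption; M127 is prime)

# Hypothetical LSSS for the policy "A AND B": l = 2 rows, m = 2 columns.
W = [[1, 1],    # row 1, rho(1) = "A"
     [0, -1]]   # row 2, rho(2) = "B"
RHO = ["A", "B"]

def make_shares(a):
    """lambda_i = v . W_i with v = (a, y_2) and y_2 chosen at random."""
    v = [a, secrets.randbelow(Q)]
    return [sum(vj * wj for vj, wj in zip(v, row)) % Q for row in W]

def reconstruct(shares, attributes):
    """Recover a from the shares whose attributes rho(i) the holder possesses."""
    rows = [i for i, attr in enumerate(RHO) if attr in attributes]
    if rows != [0, 1]:
        raise ValueError("attributes do not satisfy the access structure")
    # Constants w = (1, 1) satisfy w . W = (1, 0), hence w_1*lambda_1 + w_2*lambda_2 = a.
    return (shares[0] + shares[1]) % Q
```

A holder of only attribute "A" sees $\lambda_1 = a + y_2$, which is uniformly random and reveals nothing about $a$ on its own.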
Step-1: For each item $k \in [1, L]$, DOi first selects a random element $r_{ik} \in \mathbb{Z}_q$ as a private key, and stores all private keys in a local database.
Step-2: To encrypt the $j$-th transaction in his/her private database, as shown in Tab 4(a), DOi generates a random number $r_{ijk}$ of bit length $k_3$ for each item. Then, DOi traverses all $L$ items and encrypts the items of this transaction one by one. If $m_{ijk} = 1$, DOi computes $c_{ijk} = (\alpha + r_{ijk}) s^{r_{ik}} \bmod p$; otherwise, DOi computes $c_{ijk} = r_{ijk} s^{r_{ik}} \bmod p$. Finally, DOi obtains the encrypted transaction $C_j = \{c_{ij1}, c_{ij2}, \ldots, c_{ijL}\}$.
Step-3: For $j = 1, 2, \ldots, n_i$, DOi repeats Step-2 to encrypt all $n_i$ transactions, and the encrypted database is $ED_i = \{C_1, C_2, \ldots, C_{n_i}\}$, as shown in Tab 4(b).
Step-4: DOi computes the session key for secure communication as $k_{ic} = H'(S_c^{s_i} \| ID_c \| ID_i \| TS)$, where $TS$ is the current timestamp.
Step-5: DOi uses AES with $k_{ic}$ to encrypt $ED_i$ as $E_i = E_{k_{ic}}(ED_i)$ and sends $ID_i \| E_i \| TS$ to CS.
Step-6: After receiving the encrypted data $E_i$ at time $TS'$, CS first checks the validity of the interval between $TS'$ and $TS$ in order to prevent replay attacks. If $TS' - TS \le \Delta T$, where $\Delta T$ denotes the expected valid time interval for transmission delay, CS accepts and processes $E_i \| TS$; otherwise, it rejects them. Once $E_i \| TS$ is accepted, CS computes the session key $k_{ci} = H'(S_i^{s_c} \| ID_c \| ID_i \| TS)$ and decrypts $E_i$ as $D_{k_{ci}}(E_i)$. Finally, CS stores $ED_i$ in its (large) database.
Step-7: After storing all DOs' encrypted databases, CS joins them together and obtains the final outsourced encrypted database $(ID_1, ED_1) \| (ID_2, ED_2) \| \cdots \| (ID_n, ED_n)$.
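The per-item encryption of Steps 1 to 3 can be sketched in a few lines of Python (3.8+ for negative exponents in `pow`). This is a minimal illustration under stated assumptions: the Mersenne prime $2^{521} - 1$ stands in for $p$, $\alpha$ and $s$ are hypothetical randomly chosen shared values, and $k_2 = 200$, $k_3 = 128$ follow our reading of Table 5. Writing both cases of Step-2 as $(m_{ijk}\alpha + r_{ijk}) s^{r_{ik}} \bmod p$ with $m_{ijk} \in \{0, 1\}$ makes the encryption a single expression. The `decrypt_transaction` helper is not a step of the scheme; it only illustrates the security analysis's claim that the holder of the $r_{ik}$ can recover his/her own data.

```python
import secrets

P = 2**521 - 1    # stand-in prime p (Mersenne prime M521); not a value from the paper
K2, K3 = 200, 128  # bit lengths of alpha and r_ijk, following Table 5
ALPHA = secrets.randbits(K2) | (1 << (K2 - 1))  # hypothetical shared alpha, exactly k2 bits
S = secrets.randbelow(P - 3) + 2                # hypothetical shared s in [2, p - 2]

def keygen(num_items):
    """Step-1: one private exponent r_ik per item, kept by DO_i."""
    return [secrets.randbelow(P - 1) for _ in range(num_items)]

def encrypt_transaction(bits, r):
    """Step-2: c_ijk = (m_ijk * alpha + r_ijk) * s^{r_ik} mod p, fresh k3-bit r_ijk per item."""
    return [(m * ALPHA + secrets.randbits(K3)) * pow(S, rk, P) % P
            for m, rk in zip(bits, r)]

def decrypt_transaction(cts, r):
    """Only the holder of r_ik can remove s^{r_ik}; since r_ijk < alpha, floor division yields m."""
    return [c * pow(S, -rk, P) % P // ALPHA for c, rk in zip(cts, r)]

r = keygen(5)
ct = encrypt_transaction([1, 0, 1, 1, 1], r)  # transaction (1) of Table 3(b)
```

Because a fresh $r_{ijk}$ is drawn for every item, encrypting the same transaction twice gives different ciphertexts, while the per-item key $r_{ik}$ is reused across all of DOi's transactions.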
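Steps 4 to 6 of the encryption phase rely on DOi and CS deriving the same session key and on a timestamp window. The Python sketch below assumes a Diffie-Hellman-style reading of the notation, namely $S_i = h^{s_i}$ and $S_c^{s_i} = S_i^{s_c}$, which the published $\langle S_i, ID_i \rangle$ pairs suggest but the visible text does not spell out. It further substitutes the multiplicative group modulo $2^{521} - 1$ with base 3 for $\mathbb{G}$ and $h$, SHA-256 for $H'$, and a hypothetical window of 5 seconds for $\Delta T$. The 32-byte key would suit AES-256; the AES step itself is omitted.

```python
import hashlib
import secrets

P = 2**521 - 1  # stand-in prime modulus (assumption): a group mod p replaces G
H_GEN = 3       # stand-in for the generator h chosen by TA (assumption)
DELTA_T = 5     # hypothetical replay window Delta T, in seconds

def keypair():
    """Private s_i and public S_i = h^{s_i}, published as <S_i, ID_i>."""
    s = secrets.randbelow(P - 2) + 1
    return s, pow(H_GEN, s, P)

def session_key(shared: int, id_c: str, id_i: str, ts: int) -> bytes:
    """k = H'(shared || ID_c || ID_i || TS); SHA-256 stands in for H' (assumption)."""
    msg = b"||".join([shared.to_bytes(66, "big"), id_c.encode(), id_i.encode(), str(ts).encode()])
    return hashlib.sha256(msg).digest()

def is_fresh(ts: int, received_at: int) -> bool:
    """Step-6 replay check: accept only if TS' - TS <= Delta T."""
    return received_at - ts <= DELTA_T

s_i, S_i = keypair()   # DO_i
s_c, S_c = keypair()   # CS
ts = 1_482_883_200     # example timestamp TS
k_ic = session_key(pow(S_c, s_i, P), "CS", "DO1", ts)  # computed by DO_i
k_ci = session_key(pow(S_i, s_c, P), "CS", "DO1", ts)  # computed by CS
```

Both parties hash the same group element $h^{s_i s_c}$, so $k_{ic} = k_{ci}$, and including $TS$ in the hash yields a different key for every session.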
Step-4: Based on the local data $r_{i1}, r_{i2}, \ldots, r_{iL}$ and the published rule (e.g., $\{u_1, u_3\} \Rightarrow \{u_2\}$), DOi calculates $O_{ix} = H'(e(g,g)^a \| x \| TS_c) - r_{i1} - r_{i3} \bmod (p-1)$ and $O_{iy} = H'(e(g,g)^a \| y \| TS_c) - r_{i1} - r_{i3} - r_{i2} \bmod (p-1)$.
Step-5: Each data owner DOi computes the session key $k_{id} = H'(S_d^{s_i} \| ID_d \| ID_i \| TS)$, where $TS$ is the current timestamp, and uses AES to encrypt $(O_{ix}, O_{iy})$ as $O = E_{k_{id}}(O_{ix} \| O_{iy})$. Finally, DOi forwards the shared secret $O \| TS$ to DC.
Step-6: After receiving $O \| TS$, DC first checks the timestamp and then computes the session key $k_{di} = H'(S_i^{s_d} \| ID_d \| ID_i \| TS)$ to decrypt $O$ as $D_{k_{di}}(O) = (O_{ix}, O_{iy})$. Next, DC sends the identities of all authenticated DOs to CS.
Step-7: Based on the published rule (e.g., $\{u_{x_1}, u_{x_2}, \ldots, u_{x_{k_x}}\} \Rightarrow \{u_{y_1}, u_{y_2}, \ldots, u_{y_{k_y}}\}$) and the DOs' identities, for DOi's $j$-th transaction, CS computes $cc_{ijx} = c_{ijx_1} c_{ijx_2} \cdots c_{ijx_{k_x}}$ and $cc_{ijy} = cc_{ijx} \cdot c_{ijy_1} c_{ijy_2} \cdots c_{ijy_{k_y}}$. Assuming that DOi has a total of $n_i$ transactions, CS computes $CC_{i1} = \sum_{j=1}^{n_i} cc_{ijx}$ and $CC_{i2} = \sum_{j=1}^{n_i} cc_{ijy}$. Finally, CS sends all $CC_{i1} \| CC_{i2} \| ID_i$ to DC.
Step-8: Assuming that the authenticated DOs hold $N$ transactions in total, DC computes $SC_1 = \sum_{i=1}^{n} CC_{i1} s^{O_{ix}}$ and $SC_2 = \sum_{i=1}^{n} CC_{i2} s^{O_{iy}}$, where the sums run over the $n$ authenticated DOs. To decrypt $SC_1$ and $SC_2$, DC also computes the decryption keys $(ss_x, ss_y)$ as $ss_x = s^{H'(e(g,g)^a \| x \| TS_c)}$ and $ss_y = s^{H'(e(g,g)^a \| y \| TS_c)}$. Then, $SC_1' = ss_x^{-1} SC_1 \bmod p$ and $SC_2' = ss_y^{-1} SC_2 \bmod p$.
Step-9: Finally, DC calculates the support counts as follows.

$$
SC_1'' = \frac{SC_1' - (SC_1' \bmod \alpha^{k_x})}{\alpha^{k_x}}. \tag{4}
$$

$$
SC_2'' = \frac{SC_2' - (SC_2' \bmod \alpha^{k_x + k_y})}{\alpha^{k_x + k_y}}. \tag{5}
$$

The support of this rule is $SP = SC_2''/N$, and the confidence is $CF = SC_2''/SC_1''$.
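Putting Steps 4 to 9 together, the following Python (3.8+) sketch runs the mining arithmetic end to end on the Table 3 data. It is a toy under stated assumptions: $p$ is the Mersenne prime $2^{521} - 1$; $k_2 = 128$ and $k_3 = 64$ are scaled down from Table 5 so that $k_1 > k_2(k_x + k_y)$ still holds for a three-item rule; SHA-256 over a fixed byte string stands in for $H'(e(g,g)^a \| \cdot \| TS_c)$; no pairing, access control, or AES layer is modelled; and the DO, CS, and DC roles are collapsed into one function, with comments marking who computes what. The rule {Milk, Diaper} ⇒ {Beer} is our own example.

```python
import hashlib
import secrets
from math import prod

P = 2**521 - 1    # stand-in prime p (Mersenne prime M521), so k1 = 521 (assumption)
K2, K3 = 128, 64  # toy bit lengths; k1 > k2*(kx+ky) holds for rules of up to 4 items
ALPHA = secrets.randbits(K2) | (1 << (K2 - 1))  # hypothetical shared alpha
S = secrets.randbelow(P - 3) + 2                # hypothetical shared s

def encrypt_db(db):
    """DO side, run once: private exponents r_ik and the encrypted database ED_i."""
    r = [secrets.randbelow(P - 1) for _ in db[0]]
    ed = [[(m * ALPHA + secrets.randbits(K3)) * pow(S, r[k], P) % P
           for k, m in enumerate(t)] for t in db]
    return r, ed

def h_prime(label: bytes) -> int:
    """Hypothetical stand-in for H'(e(g,g)^a || label || TS_c), using SHA-256."""
    return int.from_bytes(hashlib.sha256(b"e(g,g)^a|" + label + b"|TS_c").digest(), "big")

def support_count(outsourced, items, label):
    """Support count of `items` (column indices) across all partitions."""
    h = h_prime(label)
    sc = 0
    for r, ed in outsourced:
        cc = sum(prod(c[k] for k in items) for c in ed) % P  # CS, Step-7: CC_i
        o = (h - sum(r[k] for k in items)) % (P - 1)        # DO, Step-4: O_i
        sc = (sc + cc * pow(S, o, P)) % P                   # DC, Step-8: SC
    sc_prime = pow(S, -h, P) * sc % P                       # DC, Step-8: SC' = ss^{-1} SC
    return sc_prime // ALPHA ** len(items)                  # Step-9, Eq. (4)/(5)

DB1 = [[1, 1, 1, 0, 0], [1, 0, 0, 1, 0], [0, 1, 1, 1, 1]]  # Table 3(a)
DB2 = [[1, 0, 1, 1, 1], [0, 1, 1, 0, 1]]                   # Table 3(b)
outsourced = [encrypt_db(DB1), encrypt_db(DB2)]            # encrypted once, reused per rule
MILK, BEER, DIAPER = 2, 3, 4
sc1 = support_count(outsourced, [MILK, DIAPER], b"x")        # sigma(Ux)
sc2 = support_count(outsourced, [MILK, DIAPER, BEER], b"y")  # sigma(Ux u Uy)
N = len(DB1) + len(DB2)
sp, cf = sc2 / N, sc2 / sc1
```

Floor division by $\alpha^{k}$ is equivalent to Eq. (4)'s subtraction of $SC' \bmod \alpha^{k}$ followed by exact division. Because the $r_{ijk}$ masks are $k_3$ bits while $\alpha$ has $k_2$ bits, the aggregated mask terms stay below $\alpha^{k}$ in this small example, so the decoded counts are $\sigma(U_x) = 3$ and $\sigma(U_x \cup U_y) = 2$. Note that the same `outsourced` data serves both rules, which is the flexibility property.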
Remark. All rules can be formalized as $U_x \Rightarrow U_y$ with $U_x = \{u_{x_1}, u_{x_2}, \ldots, u_{x_{k_x}}\}$ and $U_y = \{u_{y_1}, u_{y_2}, \ldots, u_{y_{k_y}}\}$ (e.g., $\{u_2, u_3, u_4\} \Rightarrow \{u_5\}$), where $k_x, k_y \in [1, L]$ and $U_x \cap U_y = \emptyset$. Our scheme imposes a constraint: when DOs, CS, and DC perform association rule mining together, we need $k_1 > k_2 (k_x + k_y)$ and $k_2 > k_3$.
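The Remark's constraint also bounds how large a rule can be for fixed parameters. The short Python sketch below checks it; the function names are ours, and the values $k_1 = 1024$, $k_2 = 200$, $k_3 = 128$, $k_x = 2$, $k_y = 1$ are our reading of the flattened Table 5.

```python
def params_ok(k1: int, k2: int, k3: int, kx: int, ky: int) -> bool:
    """Remark constraint: k1 > k2 * (kx + ky) and k2 > k3."""
    return k1 > k2 * (kx + ky) and k2 > k3

def max_rule_size(k1: int, k2: int) -> int:
    """Largest kx + ky satisfying k2 * (kx + ky) < k1."""
    return (k1 - 1) // k2

table5_ok = params_ok(1024, 200, 128, 2, 1)  # the rule size used in the evaluation
largest = max_rule_size(1024, 200)           # 5 items, since 200 * 5 = 1000 < 1024
```

With these settings a rule may involve at most five items in total ($k_x + k_y \le 5$); mining larger rules would require a larger $k_1$ or a smaller $k_2$.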
Correctness. The correctness of authentication in our scheme is straightforward. If DOi is not a valid data owner, he/she cannot obtain the corresponding private key, and thus cannot generate the shared secret $v = e(g,g)^a$ needed to produce a valid Auth and pass DC's authentication. In addition, the timestamp technique ensures the uniqueness of each session key for secure communication.
Settings. To demonstrate the correctness of our mining algorithm clearly, we first define the settings. There are $n$ data owners $DOs = \{DO_1, DO_2, \ldots, DO_n\}$ and $L$ data items $U = \{u_1, u_2, \ldots, u_L\}$, and data owner DOi has $n_i$ transactions, denoted $T_i = \{t_1, t_2, \ldots, t_{n_i}\}$. Each transaction $t_j \in T_i$ involves $L$ data items $\{m_{ij1}, m_{ij2}, \ldots, m_{ijL}\}$, where each $m_{ijk}$ takes the value 0 or 1. The mining rule considered in our proof is $U_x \Rightarrow U_y$, where $U_x = \{u_{x_1}, u_{x_2}, \ldots, u_{x_{k_x}}\}$ and $U_y = \{u_{y_1}, u_{y_2}, \ldots, u_{y_{k_y}}\}$ are two subsets of $U$ with $U_x \cap U_y = \emptyset$.
Correct Results. Given the settings above and the definition of association rule mining in Section 3, the correct support $SP_{correct}$ and confidence $CF_{correct}$, as obtained by running any existing association rule mining algorithm, are as follows. Note that $|T|$ and $\sigma(U_x)$ cannot be 0 by definition.
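$$
SP_{correct} = \frac{\sigma(U_x \cup U_y)}{|T|} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{k=1}^{k_x} m_{ijx_k} \prod_{k=1}^{k_y} m_{ijy_k} \right)}{\sum_{i=1}^{n} n_i} \tag{6}
$$

$$
CF_{correct} = \frac{\sigma(U_x \cup U_y)}{\sigma(U_x)} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{k=1}^{k_x} m_{ijx_k} \prod_{k=1}^{k_y} m_{ijy_k} \right)}{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \prod_{k=1}^{k_x} m_{ijx_k}} \tag{7}
$$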
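databases of DOi are outsourced to CS as shown in Tab 4.

$$
c_{ijk} = \begin{cases} (\alpha + r_{ijk})\, s^{r_{ik}} \bmod p, & \text{when } m_{ijk} = 1 \\ r_{ijk}\, s^{r_{ik}} \bmod p, & \text{when } m_{ijk} = 0 \end{cases} \tag{8}
$$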
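In addition, DOi calculates $O_{ix}$ and $O_{iy}$ according to the mining rule $U_x \Rightarrow U_y$ and shares them with DC.

$$
O_{ix} = H'(e(g,g)^a \| x \| TS_c) - \sum_{k=1}^{k_x} r_{ix_k} \bmod (p-1) \tag{9}
$$

$$
O_{iy} = H'(e(g,g)^a \| y \| TS_c) - \sum_{k=1}^{k_x} r_{ix_k} - \sum_{k=1}^{k_y} r_{iy_k} \bmod (p-1) \tag{10}
$$

$$
\begin{aligned}
SC_1' &= \left(s^{H'(e(g,g)^a \| x \| TS_c)}\right)^{-1} \sum_{i=1}^{n} CC_{i1}\, s^{O_{ix}} \bmod p \\
&= \left( \sum_{i=1}^{n} \sum_{j=1}^{n_i} \prod_{k=1}^{k_x} m_{ijx_k} \right) \alpha^{k_x} + \sum_{i=1}^{n} \varepsilon_{i1} \bmod p
\end{aligned}
\tag{13}
$$

where $\varepsilon_{i1}$ denotes the aggregate of the remaining terms of DOi's products, which involve the random numbers $r_{ijk}$.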
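$$
\begin{aligned}
SC_2' &= \left(s^{H'(e(g,g)^a \| y \| TS_c)}\right)^{-1} \sum_{i=1}^{n} CC_{i2}\, s^{O_{iy}} \bmod p \\
&= \left( \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{k=1}^{k_x} m_{ijx_k} \prod_{k=1}^{k_y} m_{ijy_k} \right) \right) \alpha^{k_x + k_y} + \sum_{i=1}^{n} \varepsilon_{i2} \bmod p
\end{aligned}
\tag{14}
$$

where $\varepsilon_{i2}$ denotes the aggregate of the remaining terms of DOi's products, which involve the random numbers $r_{ijk}$.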
In the above equations, the random numbers $r_{ik}$ cancel out, and the leading coefficient $s^{H'(e(g,g)^a \| x \| TS_c)}$ is eliminated by its modular inverse, since $\left(s^{H'(e(g,g)^a \| x \| TS_c)}\right)^{-1} \cdot s^{H'(e(g,g)^a \| x \| TS_c)} = 1 \bmod p$ (and likewise for $y$).
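$$
\begin{aligned}
CC_{i1}\, s^{O_{ix}} &= s^{\sum_{k=1}^{k_x} r_{ix_k} + H'(e(g,g)^a \| x \| TS_c) - \sum_{k=1}^{k_x} r_{ix_k}} \left( \sum_{j=1}^{n_i} \prod_{k=1}^{k_x} m_{ijx_k}\, \alpha^{k_x} + \varepsilon_{i1} \right) \\
&= s^{H'(e(g,g)^a \| x \| TS_c)} \left( \sum_{j=1}^{n_i} \prod_{k=1}^{k_x} m_{ijx_k}\, \alpha^{k_x} + \varepsilon_{i1} \right)
\end{aligned}
\tag{15}
$$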
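$$
\begin{aligned}
CC_{i2}\, s^{O_{iy}} &= s^{\sum_{k=1}^{k_x} r_{ix_k} + \sum_{k=1}^{k_y} r_{iy_k} + H'(e(g,g)^a \| y \| TS_c) - \sum_{k=1}^{k_x} r_{ix_k} - \sum_{k=1}^{k_y} r_{iy_k}} \left( \sum_{j=1}^{n_i} \left( \prod_{k=1}^{k_x} m_{ijx_k} \prod_{k=1}^{k_y} m_{ijy_k} \right) \alpha^{k_x + k_y} + \varepsilon_{i2} \right) \\
&= s^{H'(e(g,g)^a \| y \| TS_c)} \left( \sum_{j=1}^{n_i} \left( \prod_{k=1}^{k_x} m_{ijx_k} \prod_{k=1}^{k_y} m_{ijy_k} \right) \alpha^{k_x + k_y} + \varepsilon_{i2} \right)
\end{aligned}
\tag{16}
$$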
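$$
SC_1'' = \frac{SC_1' - (SC_1' \bmod \alpha^{k_x})}{\alpha^{k_x}} \bmod p = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \prod_{k=1}^{k_x} m_{ijx_k} \bmod p \tag{17}
$$

$$
SC_2'' = \frac{SC_2' - (SC_2' \bmod \alpha^{k_x + k_y})}{\alpha^{k_x + k_y}} \bmod p = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{k=1}^{k_x} m_{ijx_k} \prod_{k=1}^{k_y} m_{ijy_k} \right) \bmod p \tag{18}
$$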
Since $p$ is a large prime and the conditions $k_1 > k_2(k_x + k_y)$ and $k_2 > k_3$ hold, the modular arithmetic (subtraction and division) in our algorithm does not affect the normal arithmetic: values of magnitude $\alpha^{k_x + k_y}$ stay below $p$, so no reduction modulo $p$ occurs. This is why the following equations hold.
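$$
\sum_{i=1}^{n} \sum_{j=1}^{n_i} \prod_{k=1}^{k_x} m_{ijx_k} \bmod p = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \prod_{k=1}^{k_x} m_{ijx_k} \tag{19}
$$

$$
\sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{k=1}^{k_x} m_{ijx_k} \prod_{k=1}^{k_y} m_{ijy_k} \right) \bmod p = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{k=1}^{k_x} m_{ijx_k} \prod_{k=1}^{k_y} m_{ijy_k} \right) \tag{20}
$$

$$
SP = \frac{SC_2''}{\sum_{i=1}^{n} n_i} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{k=1}^{k_x} m_{ijx_k} \prod_{k=1}^{k_y} m_{ijy_k} \right)}{\sum_{i=1}^{n} n_i} = SP_{correct} \tag{21}
$$

$$
CF = \frac{SC_2''}{SC_1''} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( \prod_{k=1}^{k_x} m_{ijx_k} \prod_{k=1}^{k_y} m_{ijy_k} \right)}{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \prod_{k=1}^{k_x} m_{ijx_k}} = CF_{correct} \tag{22}
$$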
5. Security Analysis
In this section, we analyze the security properties of the proposed scheme. Specifically, we demonstrate that the proposed scheme achieves all the security requirements defined earlier, as well as collusion attack resilience.
DOs' databases are privacy-preserving. To preserve the privacy of DOs' data, each data owner DOi's transaction $j$ is outsourced as $C_j = \{c_{ij1}, c_{ij2}, \ldots, c_{ijL}\}$, and each data item of a transaction is encrypted as $c_{ijk} = (\alpha + r_{ijk}) s^{r_{ik}} \bmod p$ (if the transaction contains this data item) or $c_{ijk} = r_{ijk} s^{r_{ik}} \bmod p$ (if it does not). Clearly, $c_{ijk}$ is one-time masked with the random numbers $r_{ijk}$ and $r_{ik}$, whose bit lengths exceed 100 bits. In addition, given the ciphertexts $(\alpha + r_{ijk}) s^{r_{ik}} \bmod p$ and $r_{ijk} s^{r_{ik}} \bmod p$, it is infeasible for others to differentiate them, due to the mask $s^{r_{ik}}$, which is a large number larger than $p$. Therefore, no one except the entity who performed the encryption can tell whether a private transaction contains a data item. We can thus ensure that each encrypted transaction is privacy-preserving and that only the DO can decrypt his/her private database. Although all DOs use the same shared $s$ to encrypt their databases, they encrypt the data as $c_{ijk} = (\alpha + r_{ijk}) s^{r_{ik}}$; that is, the different random numbers $r_{ik}$ chosen by different data owners cause the same transaction to be encrypted to different values. Therefore, the databases remain confidential when DOs try to guess the private databases belonging to another DO.
We now demonstrate that our scheme is able to resist collusion attacks. Each data item of a transaction is encrypted as $c_{ijk} = (\alpha + r_{ijk}) s^{r_{ik}} \bmod p$. Although all DOs share $s$ and $\alpha$, the random number $r_{ik}$ chosen by data owner DOi generates a new mask $s^{r_{ik}}$, which makes the ciphertext random and different from those of other DOs. Even if a DO, say DOj, shares his/her random number $r_{jk}$ with CS, it is not possible to recover the private database of DOi.
Achieving both authentication and data integrity. All communications in our scheme are encrypted using a session key between the two communicating entities. Specifically, each data owner DOi can calculate his/her own session key $k_{ic}$ with CS and $k_{id}$ with DC; every session key differs from those of other DOs and depends on the timestamp $TS$ (i.e., the session key changes with $TS$). When DOi outsources the database to CS, $k_{ic} = H(S_c^{s_i} \| ID_c \| ID_i \| TS)$ is used as the session key to encrypt the communication packets with AES. Similarly, $k_{id} = H(S_d^{s_i} \| ID_d \| ID_i \| TS)$ is used as the session key when communicating with DC. In addition, the encrypted communication data is $E_k(data \| TS)$, and each entity can verify the decrypted data and compare $TS$ with the current timestamp to ensure data integrity, thus resisting replay attacks. Furthermore, the proposed scheme achieves fine-grained access control by employing the key-policy attribute-based encryption technique [31]. Since $e(g,g)^a$ can only be recovered by an authenticated data owner DOi using his/her access key $PREK$, the scheme achieves both authentication and data integrity.
Mining results are confidential. Given $O_{ix}$ and $O_{iy}$, DC is the only entity that can compute the decryption keys $(ss_x, ss_y)$ as $ss_x = s^{H'(e(g,g)^a \| x \| TS_c)}$ and $ss_y = s^{H'(e(g,g)^a \| y \| TS_c)}$. Without the decryption keys, neither CS nor any data owner can calculate the support and confidence via $SC_1'' = (SC_1' - (SC_1' \bmod \alpha^{k_x}))/\alpha^{k_x}$ and $SC_2'' = (SC_2' - (SC_2' \bmod \alpha^{k_x + k_y}))/\alpha^{k_x + k_y}$. Thus, the final mining results cannot be obtained by CS or any data owner, with the exception of DC.

Table 5: Parameter settings
Parameter   Security parameter   k1     k2    k3    kx   ky
Setting     1024                 1024   200   128   2    1
6. Performance Evaluation
In this section, we evaluate the performance of our proposed scheme in terms of computational costs. More specifically, the performance metrics used in the evaluation are 1) transaction encryption time (an indication of computational costs for DOs), 2) time of association rule mining over horizontally partitioned databases (an indication of computational costs for CS), and 3) decryption time (an indication of computational costs for DC). In addition, we compare our proposed scheme with EMHS [20], the state-of-the-art privacy-preserving association rule mining scheme over horizontally partitioned databases. EMHS, which is primarily based on the Paillier homomorphic encryption technique, works as follows. Each DO first encrypts his/her database using the public key of DC and outsources the encrypted database to the cloud based on pre-defined rules. Then, CS merges all databases, performs homomorphic addition operations to obtain the encrypted sums (encrypted support counts), and sends the results to DC. Finally, DC decrypts the results and calculates the final mining results, such as the support and the confidence of the rules.
In our experiment, we used a Windows 7 laptop with a 3.1 GHz processor and 8 GB RAM. We implemented both our scheme and the EMHS scheme in Java, using the parameter settings shown in Tab 5. Note that the first four entries of Tab 5 (the security parameter, k1, k2, and k3) are the security parameters defined in our scheme, and kx and ky indicate that a specific rule $U_x \Rightarrow U_y$, where $U_x$ contains $k_x$ data items and $U_y$ contains $k_y$ data items, is considered in our simulation. To simulate association rule mining, we generated 1 million transactions for 100 DOs, each containing 20 data items. For simplicity, each entity of our proposed scheme (i.e., DO, CS, and DC) is evaluated separately, and its computational cost is compared with that of the corresponding entity in the EMHS scheme under the same conditions, including the computational cost of encryption (with different numbers of transactions and data items), the computational cost of mining one rule over 1 million transactions from 100 DOs, and the computational cost of decryption at the DC.
[Figure: (a) Encryption time with different transactions and data items at DOs (axes: transactions, data items, time); (b) Encryption time with different rules at DOs (axes: rules, time). Both panels compare EMHS with our scheme.]
in our scheme is significantly lower. We also remark that we omitted the computational costs involved in authentication and session key computation, since these functions are not part of our scheme.
[Figure: bar chart of the running time at the CS side (ms) and the decryption time at the DC side (ms) for EMHS and our scheme; the bar labels read 29744, 13450, 23, and 19.]
Figure 3: Comparative summary of computational costs in our scheme and EMHS for CS
and DC
7. Related Work
In this section, we review related literature on privacy-preserving association rule mining over horizontally and vertically distributed databases. Specifically, this paper seeks to solve the challenge of privacy-preserving association rule mining over horizontally partitioned databases, while other papers generally focus on privacy-preserving association rule mining over vertically partitioned databases. Despite the different focuses, techniques such as homomorphic encryption, differential privacy, and secure multiparty computation can be employed for both horizontally and vertically partitioned databases.
Solutions based on homomorphic encryption. In solutions based on homomorphic encryption [25], all DOs' databases are encrypted using the public key of DC and stored in the cloud. When DC wishes to mine association rules over all distributed databases, CS performs the homomorphic computations and obtains the encrypted support and confidence of each rule. Then, using the private key, DC can easily decrypt the encrypted mining results to obtain the support and the confidence. There are two kinds of homomorphic encryption, namely partially homomorphic encryption (PHE) [24] and fully homomorphic encryption (FHE) [25]. PHE-based solutions [19, 20, 21] are more efficient than FHE-based solutions [22, 23], but the former cannot flexibly support association rule mining.
Solutions based on differential privacy and Bloom filters. Solutions based on differential privacy [26] and Bloom filters [12, 13] are designed without the use of cryptographic techniques. By adding noise or using non-cryptographic hash functions, these solutions [8, 9] can perform association rule mining very efficiently. That is, all computations are performed without encryption, which is generally computationally expensive. A key limitation of such solutions is that the mining results are not exact and can deviate significantly from the real results, due to the characteristics of differential privacy and Bloom filters.
Solutions based on secure multiparty computation. By combining several secure protocols, solutions such as those described in [32, 15] can achieve secure and flexible association rule mining at the cost of efficiency (due to the complex protocols). In addition, most of them do not consider collusion attack resilience, and assume that no DO will collude with the cloud to game the system (an unrealistic assumption).
Different from the above solutions, our proposed scheme, which builds on our previous fast scale product technique [27], supports flexible privacy-preserving association rule mining with multiple DOs in the cloud. Thus, it can perform the data mining algorithm accurately and efficiently. In addition, our proposed scheme is designed to resist collusion attacks.
8. Conclusion
The big data trend is unlikely to go away anytime soon, particularly with the increasing pervasiveness of consumer technologies and storage capabilities. As noted in a recent survey [33], data analytical technologies have failed to keep pace with big data challenges. Privacy-preserving association rule mining by collaborating data owners in a cloud environment is one of the current research challenges.
In this paper, we proposed a secure and flexible cloud-assisted association
rule mining scheme over horizontally partitioned databases. The scheme is designed to
achieve privacy-preserving association rule mining over distributed databases
while limiting privacy disclosure. We then presented a detailed security
analysis demonstrating that the proposed scheme achieves secure and flexible
association rule mining. In addition, our extensive performance
evaluation demonstrated the efficiency of the scheme.
Future work will include integrating differential privacy techniques into
our proposed scheme to achieve a more secure and robust scheme for association
rule mining. We also plan to extend the scheme to vertically
partitioned databases (see [34]).
Acknowledgment
The authors would also like to thank the associate editor and the three
anonymous reviewers for their constructive feedback.
[1] Y. Hu, J. Yan, K.-K. R. Choo, Pedal: a dynamic analysis tool for
efficient concurrency bug reproduction in big data environment, Cluster
Computing 19 (1) (2016) 153-166.
[3] D. Quick, K.-K. R. Choo, Data reduction and data mining framework
for digital forensic evidence: Storage, intelligence, review, and archive,
Trends & Issues in Crime and Criminal Justice 480 (2011) 1-11.
[4] D. Quick, K.-K. R. Choo, Big forensic data reduction: Digital forensic
images and electronic evidence, Cluster Computing.
[7] P. Giudici, S. Figini, Applied data mining for business and industry,
Applied Data Mining for Business and Industry, Second Edition (2009)
i-viii.
Computational Intelligence and Data Mining, CIDM 2007, part of the
IEEE Symposium Series on Computational Intelligence 2007, Honolulu,
Hawaii, USA, 1-5 April 2007, 2007, pp. 654-660.
[12] L. Qiu, Y. Li, X. Wu, Preserving privacy in association rule mining with
bloom filters, J. Intell. Inf. Syst. 29 (3) (2007) 253-278.
[13] M. Kantarcioglu, R. Nix, J. Vaidya, An efficient approximate protocol
for privacy-preserving association rule mining, in: Advances in Knowledge
Discovery and Data Mining, 13th Pacific-Asia Conference, PAKDD
2009, Bangkok, Thailand, April 27-30, 2009, Proceedings, 2009, pp. 515-524.
[14] A. V. Evfimievski, R. Srikant, R. Agrawal, J. Gehrke, Privacy preserving
mining of association rules, Inf. Syst. 29 (4) (2004) 343-364.
[15] T. Tassa, Secure mining of association rules in horizontally distributed
databases, IEEE Trans. Knowl. Data Eng. 26 (4) (2014) 970-983.
[16] Y. Saygin, V. S. Verykios, A. K. Elmagarmid, Privacy preserving
association rule mining, in: 12th International Workshop on Research
Issues in Data Engineering: Engineering E-Commerce/E-Business
Systems, RIDE'02, San Jose, California, USA, February 24-25, 2002, 2002,
pp. 151-158.
[17] C. N. Modi, A. R. Patil, Proceedings of 3rd International Conference
on Advanced Computing, Networking and Informatics: ICACNI 2015,
Volume 2, Springer India, New Delhi, 2016, Ch. Privacy Preserving
Association Rule Mining in Horizontally Partitioned Databases Without
Involving Trusted Third Party (TTP), pp. 549-555.
[18] C. Huang, R. Lu, Efpa: efficient and flexible privacy-preserving mining of
association rule in cloud, in: 2015 IEEE/CIC International Conference
on Communications in China, ICCC 2015, Shenzhen, China, November
2-4, 2015, 2015.
[19] M. Hussein, A. El-Sisi, N. A. Ismail, Fast cryptographic privacy preserving
association rules mining on distributed homogenous data base,
in: Proceedings of The 2008 International Conference on Data Mining,
DMIN 2008, July 14-17, 2008, Las Vegas, USA, 2 Volumes, 2008, pp.
513-519.
[20] X. C. Nguyen, H. B. Le, T. A. Cao, An enhanced scheme for
privacy-preserving association rules mining on horizontally distributed
databases, in: 2012 IEEE RIVF International Conference on Computing
& Communication Technologies, Research, Innovation, and Vision
for the Future (RIVF), Ho Chi Minh City, Vietnam, February 27 - March
1, 2012, 2012, pp. 1-4.
[27] R. Lu, H. Zhu, X. Liu, J. K. Liu, J. Shao, Toward efficient and
privacy-preserving computing in big data era, IEEE Network 28 (4) (2014)
46-50.
[30] V. Goyal, O. Pandey, A. Sahai, B. Waters, Attribute-based encryption
for fine-grained access control of encrypted data, in: Proceedings of the
13th ACM Conference on Computer and Communications Security, CCS
2006, Alexandria, VA, USA, October 30 - November 3, 2006, 2006, pp.
89-98.