$\lambda \ge 2m/\epsilon'$, and (1)
$\tau = 1$, and (2)
$\tau' \ge m\left(1 - \frac{\log(2\delta'/m)}{\epsilon'}\right)$. (3)
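As a rough illustration, the smallest parameter values allowed by conditions of the form $\lambda \ge 2m/\epsilon'$, $\tau = 1$, and $\tau' \ge m(1 - \log(2\delta'/m)/\epsilon')$ can be computed directly. The sketch below is not part of the original implementation: the function name is hypothetical, and it assumes that the logarithm in (3) is the natural logarithm.

```python
import math


def indistinguishability_params(m, eps_prime, delta_prime):
    """Smallest (lambda, tau, tau') satisfying conditions (1)-(3).

    Assumption: log in condition (3) is the natural logarithm.
    """
    lam = 2 * m / eps_prime                      # condition (1): noise scale
    tau = 1                                      # condition (2): first threshold
    # condition (3): lower bound on the second threshold tau'
    tau_prime = m * (1 - math.log(2 * delta_prime / m) / eps_prime)
    return lam, tau, tau_prime


# Illustrative values only: m = 7 as in the text, eps' and delta' chosen freely.
lam, tau, tau_prime = indistinguishability_params(m=7, eps_prime=10, delta_prime=1e-20)
print(f"lambda >= {lam:.2f}, tau = {tau}, tau' >= {tau_prime:.2f}")
```

A smaller $\delta'$ or a smaller $\epsilon'$ pushes the required threshold $\tau'$ up, which is the trade-off the conditions express.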
The following table compares $(\epsilon', \delta')$-indistinguishability with $(\epsilon, \delta)$-probabilistic differential privacy for around 200k users and $m = 7$.
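| Privacy Guarantee | $\tau' = 50$ | $\tau' = 100$ |
|---|---|---|
| $\lambda = 2$ ($\epsilon, \epsilon' = 10$) | $\delta = 1.3 \times 10^{-18}$ | $\delta = 4.7 \times 10^{-41}$ |
| | $\delta' = 1.4 \times 10^{-21}$ | $\delta' = 5.2 \times 10^{-43}$ |
| $\lambda = 5$ ($\epsilon, \epsilon' = 5$) | $\delta = 1.3 \times 10^{-18}$ | $\delta = 1.3 \times 10^{-18}$ |
| | $\delta' = 1.3 \times 10^{-18}$ | $\delta' = 1.3 \times 10^{-18}$ |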
We now discuss how to set the parameters $\lambda$ and $\tau'$ so that Zealous achieves $(\epsilon, \delta)$-probabilistic differential privacy.
For a search chunk S and positive numbers $m$, $\tau$, $\tau'$, and $\lambda$, Zealous achieves $(\epsilon, \delta)$-probabilistic differential privacy if
International Journal of Computer Trends and Technology (IJCTT) volume4Issue8August 2013
ISSN: 2231-2803 http://www.ijcttjournal.org Page 2813
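$\lambda \ge 2m/\epsilon$, and (4)
$\tau' - \tau \ge \max\left(-\lambda \ln\left(2 - 2e^{-1/\lambda}\right),\ -\lambda \ln\left(\frac{2\delta}{U \cdot m/\tau}\right)\right)$, (5)
where $U$ is the number of users in the search chunk.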
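To make the parameter setting concrete, the following minimal sketch evaluates the two conditions numerically, reading them as $\lambda \ge 2m/\epsilon$ and $\tau' - \tau \ge \max(-\lambda \ln(2 - 2e^{-1/\lambda}), -\lambda \ln(2\delta/(U \cdot m/\tau)))$, with $U$ the number of users. The function names and the example value of $\delta$ are assumptions made for illustration, not part of the original implementation.

```python
import math


def min_threshold_gap(lam, delta, num_users, m, tau):
    """Right-hand side of condition (5): the smallest allowed tau' - tau."""
    term_a = -lam * math.log(2 - 2 * math.exp(-1 / lam))
    term_b = -lam * math.log(2 * delta / (num_users * m / tau))
    return max(term_a, term_b)


def pdp_params(eps, delta, num_users, m, tau=1):
    """Smallest lambda from condition (4) and the matching smallest tau'."""
    lam = 2 * m / eps
    tau_prime = tau + min_threshold_gap(lam, delta, num_users, m, tau)
    return lam, tau_prime


# Illustrative values: 200k users and m = 7 as in the text; eps and delta are assumed.
lam, tau_prime = pdp_params(eps=10, delta=1e-18, num_users=200_000, m=7)
print(f"lambda >= {lam:.2f}, tau' >= {tau_prime:.2f}")
```

In this reading, the second term of the maximum dominates whenever $\delta$ is very small, so the required threshold grows with the number of users and shrinks as $\epsilon$ grows.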
In terms of accuracy, the utility guarantee of Zealous is that it is accurate for very frequent items and provides perfect accuracy for infrequent items, because it filters all of them out.
Our $(\epsilon, \delta)$-probabilistic differentially private algorithm
ZEALOUS is able to retain frequent items with probability at
least 1/2 while filtering out all infrequent items. On the other
hand, for any $\epsilon$-differentially private algorithm that can retain
frequent items with non-zero probability, its inaccuracy for large
item domains is larger than that of an algorithm that always outputs an
empty set.
IV-EXPERIMENTAL EVALUATION
In this section we describe our implementation for publishing a privacy-preserving search chunk. We then compare the privacy-preserving search chunk generated by Zealous with the search chunk generated by PDPP k-anonymity, using the original search chunk as the point of comparison. We compare not only the utility of the two algorithms but also their disclosure limitation guarantees.
We experimentally compare both algorithms by publishing the search chunk of a portal. The portal offers job search, study material details, and seminar details. The admin adds all details regarding jobs, study materials, and seminars, and users search for jobs, study materials, or seminar information according to their needs. The collection of each user's search details and personal details forms the search chunk. We publish this search chunk by implementing the Zealous algorithm, which offers high utility and good privacy guarantees.
There are two ways to measure the performance of an algorithm. First, we evaluate how well the output of the algorithm preserves selected statistics of the original search chunk. Second, we use our application to check the utility of Zealous. Here we discuss index caching as a representative application for search performance and query substitution as a representative application for search quality.
A. Utility Evaluation-
Fig 2-Diff between counts in k-anonymity and Zealous with original histogram
The graph above shows the utility of k-anonymity and Zealous, with the original search chunk as the point of comparison. As expected, the average difference decreases as $\epsilon$ increases, since the noise added to each count decreases. Similarly, accuracy increases as k decreases, because more queries pass the threshold. We also computed other metrics, such as the root-mean-square value of the differences and the total variation difference; they all reveal similar qualitative trends. Although ZEALOUS disregards many search chunk records, it preserves the overall distribution well.
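The three histogram metrics mentioned above can be sketched as follows. The exact definitions used in the experiments are not given in the text, so this is one plausible reading: the average absolute difference and the root-mean-square difference are taken over the union of items, and the total variation difference is computed on the normalized histograms. The function name and the toy counts are hypothetical.

```python
import math


def histogram_differences(original, sanitized):
    """Compare two item -> count histograms.

    Returns (average absolute difference, RMS difference, total variation).
    Items missing from one histogram are treated as having count 0.
    """
    items = set(original) | set(sanitized)
    diffs = [sanitized.get(k, 0) - original.get(k, 0) for k in items]
    avg_abs = sum(abs(d) for d in diffs) / len(items)
    rms = math.sqrt(sum(d * d for d in diffs) / len(items))

    def normalized(hist):
        total = sum(hist.values())
        # An empty histogram is mapped to all-zero probabilities.
        return {k: (hist.get(k, 0) / total if total else 0.0) for k in items}

    p, q = normalized(original), normalized(sanitized)
    total_variation = 0.5 * sum(abs(p[k] - q[k]) for k in items)
    return avg_abs, rms, total_variation


# Toy example: the infrequent item "c" is dropped and the other counts are noisy.
original = {"a": 10, "b": 6, "c": 4}
sanitized = {"a": 12, "b": 8}
print(histogram_differences(original, sanitized))
```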
B. Index Caching-In the index caching problem, we store in memory a set of posting lists that maximizes the hit probability over all keywords. Our algorithm decides which lists should be kept in memory. It first assigns each keyword a score, which equals its frequency in the search chunk divided by the number of documents that contain the keyword. Keywords are then chosen using a greedy bin-packing strategy: we sequentially add posting lists from the keywords with the highest score until the memory is filled. In our experiments we fixed the memory size at 200 MB and the size of each document posting at 4 bytes. Our index stores the document posting list for each keyword sorted by relevance, which allows documents to be retrieved in order of relevance. We truncate this list in memory to contain at most 10000 documents. For an incoming query, the search engine retrieves the posting list for each keyword in the query either from memory or from disk. If the intersection of the posting lists is empty, less relevant documents are retrieved from disk for those keywords for which only the truncated posting list is kept in memory. Publishing the search chunk with our Zealous algorithm achieves better utility than publishing it with PDPP k-anonymity for a
range of parameters. Utility suffers only marginally as the privacy parameter or the anonymity parameter increases. This is because only a few very frequent keywords are needed to achieve a high hit probability. Keywords with a big positive impact on the hit probability are less likely to be filtered out by ZEALOUS than keywords with a small positive impact, which explains the marginal decrease in utility for increased privacy.
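The greedy selection of posting lists described in this section can be sketched as below. This is an illustrative reading, not the original implementation: the function and variable names are hypothetical, 200 MB is interpreted as 200 * 1024 * 1024 bytes, and the loop stops at the first posting list that no longer fits, which is one way to read "until the memory is filled".

```python
def choose_cached_keywords(keyword_freq, doc_count,
                           memory_bytes=200 * 1024 * 1024,
                           posting_bytes=4, max_postings=10_000):
    """Pick keywords whose (truncated) posting lists are kept in memory.

    keyword_freq: keyword -> frequency in the (sanitized) search chunk
    doc_count:    keyword -> number of documents containing the keyword
    """
    # Score = frequency in the search chunk / number of documents with the keyword.
    scores = {kw: keyword_freq[kw] / doc_count[kw]
              for kw in keyword_freq if doc_count.get(kw, 0) > 0}
    cached, used = [], 0
    for kw in sorted(scores, key=scores.get, reverse=True):
        # Each in-memory list is truncated to at most max_postings entries.
        size = min(doc_count[kw], max_postings) * posting_bytes
        if used + size > memory_bytes:
            break  # assumption: stop as soon as the next list does not fit
        cached.append(kw)
        used += size
    return cached, used


# Toy example with a tiny memory budget so that the cut-off is visible.
freq = {"jobs": 50, "seminar": 30, "notes": 5}
docs = {"jobs": 10, "seminar": 3, "notes": 5}
print(choose_cached_keywords(freq, docs, memory_bytes=60))
```

Because the score divides by the posting-list length, short lists for frequently searched keywords are preferred, which matches the observation that a few very frequent keywords account for most of the hit probability.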
C. Query Substitution-In this section we discuss how a query substitution algorithm examines query pairs to learn how users rephrase queries. Our algorithm generates related queries for a given query in two phases. In the first phase, the original query is partitioned into subsets of keywords, called phrases, based on their mutual information. In the second phase, query substitutions are determined for each phrase based on the distribution of queries. We run this algorithm to generate ranked substitutions on the sanitized search chunks and compare these rankings with the rankings produced by the original search chunk, which serve as ground truth. To measure the quality of the query substitutions, we compute the precision/recall, MAP (mean average precision), and NDCG (normalized discounted cumulative gain) of the top suggestions for each query.
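Consider a query $q$ and its list of top-ranked substitutions $q'_0, \ldots, q'_{j-1}$ computed from a sanitized search chunk, and let $q_0, \ldots, q_{j-1}$ be the ground-truth ranking computed from the original search chunk. The precision, recall, and MAP of the query are as follows:

$\mathrm{Precision}(q) = \dfrac{|\{q_0, \ldots, q_{j-1}\} \cap \{q'_0, \ldots, q'_{j-1}\}|}{|\{q'_0, \ldots, q'_{j-1}\}|}$

$\mathrm{Recall}(q) = \dfrac{|\{q_0, \ldots, q_{j-1}\} \cap \{q'_0, \ldots, q'_{j-1}\}|}{|\{q_0, \ldots, q_{j-1}\}|}$

and

$\mathrm{MAP}(q) = \sum_{i=0}^{j-1} \dfrac{i+1}{\text{rank of } q_i \text{ in } [q'_0, \ldots, q'_{j-1}] + 1}$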
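The precision, recall, and MAP measures for a single query can be sketched as follows, where the ground-truth list comes from the original search chunk and the other list from the sanitized one. The function names are hypothetical, ranks are zero-based, and a ground-truth substitution that is missing from the sanitized list is assumed to contribute 0 to the MAP sum, a case the definition does not cover.

```python
def precision_recall(ground_truth, sanitized):
    """Set-overlap precision and recall of the sanitized top-j substitutions."""
    overlap = len(set(ground_truth) & set(sanitized))
    precision = overlap / len(set(sanitized)) if sanitized else 0.0
    recall = overlap / len(set(ground_truth)) if ground_truth else 0.0
    return precision, recall


def map_score(ground_truth, sanitized):
    """Sum over i of (i + 1) / (rank of q_i in the sanitized list + 1)."""
    rank = {}
    for position, query in enumerate(sanitized):
        rank.setdefault(query, position)  # keep the first (best) position
    total = 0.0
    for i, query in enumerate(ground_truth):
        if query in rank:  # assumption: missing substitutions add nothing
            total += (i + 1) / (rank[query] + 1)
    return total


# Toy rankings for one query.
truth = ["a", "b", "c"]
noisy = ["b", "a", "d"]
print(precision_recall(truth, noisy), map_score(truth, noisy))
```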
V- CONCLUSION
Publishing search chunks is very useful for researchers and scientists. However, search chunks contain sensitive information, and earlier techniques were not sufficient for publishing them. In this paper we introduced a technique for publishing search chunks using the Zealous algorithm and showed that it lets us publish a search chunk efficiently and safely, without disclosing sensitive information, with high utility and low disclosure probability. We compared the proposed technique with the previous technique; based on the comparison results, publishing search chunks with the proposed technique is more efficient and safer than with anonymity techniques.
AUTHORS PROFILE
Thappita Sumalatha is pursuing an M.Tech (CSE) at Vikas Group of Institutions (formerly known as Mother Theresa Educational Society Group of Institutions), Nunna, Vijayawada, affiliated to JNTU-Kakinada, A.P., India.
Ch. Sandhya Rani is working as an Assistant Professor in the CSE department at Vikas Group of Institutions, Nunna, Vijayawada, affiliated to JNTU-Kakinada, A.P., India.
Betam Suresh is working as HOD, Department of Computer Science Engineering, at Vikas Group of Institutions (formerly Mother Teresa Educational Society Group of Institutions), Nunna, Vijayawada, affiliated to JNTU-Kakinada, A.P., India.