Elaine Shi
Some slides adapted from Adam Smith's lecture and other talk slides
Roadmap
Defining Differential Privacy
Techniques for Achieving DP
Output perturbation
Input perturbation
Perturbation of intermediate values
Sample and aggregate
General Setting
Medical data
Query logs
Social network data
Data mining
Statistical queries
General Setting
publish
Data mining
Statistical queries
Blatant Non-Privacy
Leak individual records
Can link with public databases to re-identify individuals
Allow adversary to reconstruct database with significant probability
Attempt 2:
I am releasing research findings showing that people who smoke are very likely to get cancer.
An Impossibility Result
Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x′) ∈ S]
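As a concrete check of this guarantee, the inequality can be verified numerically for randomized response (an illustrative sketch; the mechanism, the 3/4 truth probability, and ε = ln 3 are my own choices, not from the slides):

```python
import math

def randomized_response_probs(true_bit):
    # Report the true bit with probability 3/4, the flipped bit otherwise.
    p_truth = 0.75
    p_one = p_truth if true_bit == 1 else 1 - p_truth
    return {1: p_one, 0: 1 - p_one}

# Neighboring databases x, x': a single individual's bit differs.
px = randomized_response_probs(1)
px_prime = randomized_response_probs(0)

# Verify Pr[A(x) = s] <= e^eps * Pr[A(x') = s] for every output s.
eps = math.log(3)  # privacy loss of this mechanism: ln(0.75 / 0.25)
for s in (0, 1):
    assert px[s] <= math.exp(eps) * px_prime[s] + 1e-12
    assert px_prime[s] <= math.exp(eps) * px[s] + 1e-12
print(f"randomized response is {eps:.3f}-DP")
```

Both directions of the inequality hold with equality here, which is why ln 3 is exactly this mechanism's ε.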
Since my mechanism is ε-DP, whether or not you participate, your privacy loss is bounded by ε.
Notable Properties of DP
Adversary knows arbitrary auxiliary information
No linkage attacks
Notable Properties of DP
DP Techniques
x, x′ neighbors
Intuition:
Average
Histograms and contingency tables
Covariance matrix
[BDMN] Many data-mining algorithms can be implemented through a sequence of low-sensitivity queries
Perceptron, some EM algorithms, SQ learning algorithms
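Output perturbation on such a low-sensitivity query can be sketched with the Laplace mechanism (illustrative; the toy data, predicate, and ε value are assumptions, not from the slides):

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling from the Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def noisy_count(records, predicate, eps):
    # A count changes by at most 1 when one record is added or removed,
    # so its global sensitivity is 1 and Laplace(1/eps) noise suffices.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / eps)

ages = [23, 35, 41, 29, 62, 57, 33]  # toy database (assumed)
random.seed(0)
answer = noisy_count(ages, lambda a: a >= 40, eps=0.5)
print(round(answer, 2))  # noisy estimate of the true count 3
```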
PINQ
PINQ
Language for writing differentially-private data analyses
Language extension to .NET framework
Provides a SQL-like interface for querying data
Goal: Hopefully, non-privacy experts can perform privacy-preserving data analytics
Scenario
[Figure: a data analyst issues queries through the PINQ interface to a trusted curator holding the data]
Example 1
Example 2: K-Means
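A hedged sketch of what differentially private k-means can look like, perturbing the intermediate per-cluster sums and counts each iteration (1-D for brevity, data assumed in [0, 1]; this mirrors the idea behind PINQ's k-means example but is my own illustration, not PINQ's actual code):

```python
import math
import random

def laplace(scale):
    # Inverse-CDF sampling from Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_kmeans_1d(points, k, iters, eps_per_iter):
    """K-means on 1-D data assumed to lie in [0, 1]; each iteration noises
    the intermediate per-cluster sums and counts, not the raw data."""
    lo, hi = min(points), max(points)
    centers = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    for _ in range(iters):
        sums, counts = [0.0] * k, [0.0] * k
        for p in points:
            j = min(range(k), key=lambda c: abs(p - centers[c]))
            sums[j] += p
            counts[j] += 1
        for j in range(k):
            # Each point falls in exactly one cluster, so noising every
            # cluster's count and sum costs only eps_per_iter per round
            # (parallel composition).
            n = counts[j] + laplace(1.0 / eps_per_iter)
            s = sums[j] + laplace(1.0 / eps_per_iter)  # sensitivity 1 if data in [0, 1]
            if n > 1:
                centers[j] = s / n
    return sorted(centers)

random.seed(7)
data = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
print([round(c, 3) for c in private_kmeans_1d(data, k=2, iters=5, eps_per_iter=0.5)])
```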
Partition
[Figure: Partition splits the dataset into disjoint parts P1, P2, …, Pk, each analyzed independently to produce outputs O1, O2, …, Ok]
Sequential composition
Parallel composition
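These two composition rules can be captured in a small privacy-budget accountant (an illustrative sketch, not PINQ's actual API):

```python
def sequential_budget(epsilons):
    # Sequential composition: queries run against the SAME dataset,
    # so the privacy losses add up.
    return sum(epsilons)

def parallel_budget(epsilons):
    # Parallel composition: queries run against DISJOINT partitions,
    # so the total loss is the worst single partition's loss.
    return max(epsilons)

print(round(sequential_budget([0.1, 0.2, 0.3]), 10))  # 0.6
print(parallel_budget([0.1, 0.2, 0.3]))               # 0.3
```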
No satisfactory solution yet!
Transformations
Where
Select
GroupBy
Join
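A minimal Python imitation of a PINQ-style interface over such transformations (the class and method names here are my own, not PINQ's actual C# API; Where is shown as a stability-preserving transformation):

```python
import math
import random

def laplace(scale):
    # Inverse-CDF sampling from Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

class PrivateData:
    def __init__(self, records):
        self._records = list(records)

    def where(self, pred):
        # Where is stable: changing one input record changes at most one
        # output record, so downstream sensitivity is preserved.
        return PrivateData(r for r in self._records if pred(r))

    def group_by(self, key):
        # GroupBy yields disjoint groups, enabling parallel composition.
        groups = {}
        for r in self._records:
            groups.setdefault(key(r), []).append(r)
        return {k: PrivateData(v) for k, v in groups.items()}

    def noisy_count(self, eps):
        # Counting has sensitivity 1, so Laplace(1/eps) noise suffices.
        return len(self._records) + laplace(1.0 / eps)

random.seed(1)
people = [("MD", 34), ("VA", 29), ("MD", 61), ("DC", 45)]  # toy data (assumed)
md = PrivateData(people).where(lambda r: r[0] == "MD")
print(round(md.noisy_count(eps=1.0), 2))  # noisy estimate of the true count 2
```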
Continual Setting
Comparison (Method vs. Error)
Output perturbation
Input perturbation
Perturbation of intermediate results
[Figure: dyadic interval decomposition over 8 time steps, with nodes [1, 4], [5, 8], [1, 2], … (repeated across two slides)]
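The dyadic intervals above are the heart of the binary-tree mechanism for continual counting: each prefix sum is assembled from O(log T) noisy interval sums. A hedged sketch (the stream, ε, and the even budget split across levels are illustrative assumptions; T is assumed a power of two):

```python
import math
import random

def laplace(scale):
    # Inverse-CDF sampling from Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def tree_mechanism(stream, eps):
    # Noisy partial sums over every dyadic interval. Each stream element
    # appears in log2(T)+1 intervals, so each level gets an equal share
    # of the budget; noise scale = levels / eps.
    T = len(stream)  # assumed a power of two
    levels = int(math.log2(T)) + 1
    scale = levels / eps
    noisy = {}
    for j in range(levels):
        width = 2 ** j
        for start in range(0, T, width):
            noisy[(start, width)] = sum(stream[start:start + width]) + laplace(scale)
    return noisy

def prefix_count(noisy, t):
    # Answer "sum of the first t elements" from O(log T) dyadic pieces,
    # greedily taking the largest aligned interval that still fits.
    total, start = 0.0, 0
    for bit in reversed(range(32)):
        width = 1 << bit
        if start + width <= t and (start, width) in noisy:
            total += noisy[(start, width)]
            start += width
    return total

random.seed(3)
stream = [1, 0, 1, 1, 0, 1, 1, 0]  # toy arrivals per time step
noisy = tree_mechanism(stream, eps=1.0)
print(round(prefix_count(noisy, 5), 2))  # noisy estimate of the true prefix sum 3
```

Only log T noisy terms enter each answer, which is why the error grows polylogarithmically in T rather than linearly.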
Key Observation
Theorem:
The sample and aggregate algorithm preserves ε-DP, and converges to the true value when the statistic f is asymptotically normal on a database consisting of i.i.d. values.
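The algorithm behind the theorem can be sketched as: split the database into disjoint blocks, evaluate f on each block, and release a noisy aggregate of the clamped block values (illustrative; the clamping range, block count, and use of a noisy average rather than another aggregator are my assumptions):

```python
import math
import random

def laplace(scale):
    # Inverse-CDF sampling from Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def sample_and_aggregate(data, f, num_blocks, eps, lo, hi):
    # Disjoint blocks: changing one record changes at most one block value.
    blocks = [data[i::num_blocks] for i in range(num_blocks)]
    values = [min(max(f(b), lo), hi) for b in blocks]  # clamp to [lo, hi]
    # Averaging num_blocks clamped values has sensitivity (hi - lo) / num_blocks.
    avg = sum(values) / num_blocks
    return avg + laplace((hi - lo) / (num_blocks * eps))

random.seed(5)
data = [random.gauss(10, 2) for _ in range(1000)]  # i.i.d. toy data (assumed)
est = sample_and_aggregate(data, f=lambda b: sum(b) / len(b),
                           num_blocks=20, eps=0.5, lo=0, hi=20)
print(round(est, 2))  # noisy estimate of the true mean (around 10)
```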
Asymptotically Normal
CLT: sum of h(Xi) where h(Xi) has finite expectation and variance
Common maximum likelihood estimators
Estimators for common regression problems
Other Notions
Noiseless privacy
Crowd-blending privacy
Homework
If I randomly sample one record from a large database consisting of many records, and publish that record, would this be differentially private? Prove or disprove this. (If you cannot give a formal proof, say why or why not.)
Suppose I have a very large database (e.g., containing the ages of all people living in Maryland), and I publish the average age of all people in the database. Intuitively, do you think this preserves users' privacy? Is this differentially private? Prove or disprove this. (If you cannot give a formal proof, say why or why not.)
What do you think are the pros and cons of differential privacy?
Analyze input perturbation (the second technique for achieving DP).
Reading list