Agenda
Use Case
PCSA
Sample Input
DC   SP (50)   APP (100)   MDN (100 M)
A    youtube   app1        123456789
A    google    app2        123456789
A    youtube   app1        938745695
A    google    app1        987694567
A    youtube   app3        123456789
A    google    app4        123456789
A    youtube   app1        938745695
A    google    app2        987694567
Objective is:
To achieve similar results with less space/memory utilization.
4KB or below for each dimension (for Insta)
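For reference, the exact approach that the probabilistic counters are meant to approximate can be sketched as below. This is a minimal Python sketch that assumes the use case is counting distinct MDN values per SP or APP value (the slides do not spell this out); `ROWS` and `exact_distinct` are hypothetical names. It keeps one set per dimension value, so memory grows with the number of distinct MDNs, which is the cost a fixed 4KB budget avoids.

```python
from collections import defaultdict

# The eight sample rows from the slide: DC, SP, APP, MDN.
ROWS = """\
A youtube app1 123456789
A google app2 123456789
A youtube app1 938745695
A google app1 987694567
A youtube app3 123456789
A google app4 123456789
A youtube app1 938745695
A google app2 987694567
""".splitlines()


def exact_distinct(rows, key_index, value_index=3):
    """Exact distinct count of the value column, grouped by the key column."""
    seen = defaultdict(set)
    for row in rows:
        fields = row.split()
        seen[fields[key_index]].add(fields[value_index])
    return {key: len(values) for key, values in seen.items()}


by_sp = exact_distinct(ROWS, key_index=1)   # distinct MDNs per SP
by_app = exact_distinct(ROWS, key_index=2)  # distinct MDNs per APP
print(by_sp)   # {'youtube': 2, 'google': 2}
print(by_app)  # {'app1': 3, 'app2': 2, 'app3': 1, 'app4': 1}
```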
Linear Probabilistic Counting
Algo Insert:
    bit[] buffer
    For i in stream:
        h = hash(i)
        p = h % (buffer.size)
        buffer[p] = 1
Algo count:
    m = buffer.size
    w = number of 1-bits in buffer
    return -m * ln((m - w) / m)
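The insert and count steps can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the BLAKE2b-based `hash64` helper, the `LinearCounter` class name, and the 32768-bit (4 KB) buffer size are assumptions chosen to match the stated memory budget.

```python
import hashlib
import math


def hash64(item):
    """Map any item to a 64-bit integer (BLAKE2b is an assumed hash choice)."""
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class LinearCounter:
    def __init__(self, num_bits=32768):  # 32768 bits = 4 KB (assumed budget)
        self.m = num_bits
        self.buffer = bytearray((num_bits + 7) // 8)  # packed bit array

    def insert(self, item):
        p = hash64(item) % self.m
        self.buffer[p >> 3] |= 1 << (p & 7)  # set bit p

    def count(self):
        w = sum(bin(byte).count("1") for byte in self.buffer)  # number of 1-bits
        if w == self.m:
            raise ValueError("buffer is saturated; use a larger buffer")
        # (m - w) / m is the fraction of bits that are still 0.
        return -self.m * math.log((self.m - w) / self.m)


counter = LinearCounter()
for i in range(12000):
    counter.insert(f"mdn-{i}")
    counter.insert(f"mdn-{i}")  # duplicates hit the same bit and change nothing
lpc_estimate = counter.count()
print(round(lpc_estimate))
```

Once every bit is set, the logarithm is undefined, which is why the buffer size bounds the cardinality this method can handle.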
LPC
Pros:
Very Low average error rate: 2%
Cons:
Handles only low cardinality, up to about 20,000
But we use it only for cardinality up to 12,000
PCSA (Probabilistic Counting with Stochastic
Averaging)
Algo:
For i in stream:
    h = hash(i)
    q, r = quotient and remainder of h / number of buffers (899)
    k = position of the first 1-bit in q
    choose buffer number r
    set bit k of that buffer to 1
Uniform Random Hash:
In a uniform random hash, each bit position is equally likely
to be 0 or 1.
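This property is what makes bit positions usable for counting: if each bit is 0 or 1 with probability 1/2, the first 1-bit falls at position k (counting from 0) with probability 2^-(k+1), so seeing position k set suggests that roughly 2^k distinct values were hashed. The Python sketch below checks this empirically; `hash64` and `first_one_bit` are hypothetical helpers, and BLAKE2b is an assumed stand-in for a uniform hash.

```python
import hashlib
from collections import Counter


def hash64(item):
    """Map any item to a 64-bit integer (BLAKE2b is an assumed hash choice)."""
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def first_one_bit(x):
    """0-based position of the lowest 1-bit of x (x must be non-zero)."""
    return (x & -x).bit_length() - 1


N = 20000
# Forcing the top bit to 1 only guards against an all-zero hash value.
positions = Counter(first_one_bit(hash64(i) | (1 << 63)) for i in range(N))
fractions = {k: positions[k] / N for k in range(4)}
for k, fraction in fractions.items():
    print(k, round(fraction, 3), "expected about", 2.0 ** -(k + 1))
```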
Count:
y = position of last 1-bit in buffer
function(2^y)
Count:
l = position of first 0-bit in buffer
function(2^l * n)
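Putting the insert and count steps together, a PCSA counter might look like the Python sketch below. The slides leave `function` unspecified; this sketch assumes the standard estimator from Flajolet and Martin's analysis, (n / 0.77351) * 2^(mean l), where n is the number of buffers and l is averaged over all buffers. The `PCSA` class, the `hash64` helper, and the 32-bit buffer width are assumptions; 899 buffers is taken from the slide.

```python
import hashlib

PHI = 0.77351  # correction constant from Flajolet and Martin's analysis


def hash64(item):
    """Map any item to a 64-bit integer (BLAKE2b is an assumed hash choice)."""
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class PCSA:
    def __init__(self, num_buffers=899, bits_per_buffer=32):
        self.n = num_buffers
        self.bits = bits_per_buffer  # assumed width; 899 * 32 bits is under 4 KB
        self.buffers = [0] * num_buffers

    def insert(self, item):
        q, r = divmod(hash64(item), self.n)
        if q == 0:
            k = self.bits - 1  # no 1-bit at all: use the top position
        else:
            k = min((q & -q).bit_length() - 1, self.bits - 1)  # first 1-bit
        self.buffers[r] |= 1 << k  # set bit k of buffer r

    def count(self):
        total = 0
        for buf in self.buffers:
            l = 0
            while buf & (1 << l):  # scan to the first 0-bit
                l += 1
            total += l
        return self.n / PHI * 2 ** (total / self.n)


sketch = PCSA()
for i in range(100000):
    sketch.insert(f"mdn-{i}")
pcsa_estimate = sketch.count()
print(round(pcsa_estimate))
```

Averaging l over many buffers is what keeps the error low: a single buffer can only report powers of two, while the mean over 899 buffers varies almost continuously. The estimator is known to overestimate when there are only a few distinct items per buffer, so it suits the large-cardinality case rather than the small one.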
PCSA
Pros:
Low average error rate: 5%
Can handle large cardinality.