
Indexing Techniques

Mei-Chen Yeh
Last week
Matching two sets of features
Strategy 1: Convert to a fixed-length feature vector (bag-of-words), then use a conventional proximity measure
Strategy 2: Build point correspondences
Last week: bag-of-words
[Figure: histogram of codeword frequencies over the visual vocabulary]
Matching local features: building patch correspondences
[Figure: candidate patch matches between Image 1 and Image 2]
To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD).
Slide credit: Prof. Kristen Grauman
Matching local features: building patch correspondences
[Figure: patch correspondences between Image 1 and Image 2]
Simplest approach: compare them all, take the closest (or the closest k, or all within a thresholded distance).
Slide credit: Prof. Kristen Grauman
Indexing local features
Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT).
[Figure: descriptors from the database images mapped into the feature space]
Indexing local features
When we see close points in feature space, we have similar descriptors, which indicates similar local content.
[Figure: query-image descriptors landing near database descriptors in the feature space]
Problem statement
With potentially thousands of features per image, and hundreds to millions of images to search, how do we efficiently find those that are relevant to a new image?
[Figure: a database of 50 thousand images; what about 110 million images?]
Scalability matters!
Slide credit: Nistér and Stewénius
The Nearest-Neighbor Search Problem
Given:
A set S of n points in d dimensions
A query point q
Which point in S is closest to q?
Time complexity of a linear scan: O(dn)
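A minimal sketch of the linear-scan baseline (Python/NumPy; the array sizes are illustrative):

```python
import numpy as np

def linear_scan_nn(S, q):
    """Exhaustive nearest-neighbor search: O(dn) per query.

    S: (n, d) array of database points
    q: (d,) query point
    Returns the index of the closest point and its distance.
    """
    dists = np.linalg.norm(S - q, axis=1)   # n distance computations, each O(d)
    i = int(np.argmin(dists))
    return i, float(dists[i])

# Illustrative data: 10,000 random 128-d points and one query.
S = np.random.rand(10_000, 128)
q = np.random.rand(128)
print(linear_scan_nn(S, q))
```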
The Nearest-Neighbor Search Problem
r-nearest neighbor
For any query q, returns a point p ∈ S s.t. ‖p − q‖ ≤ r
c-approximate r-nearest neighbor
For any query q, returns a point p′ ∈ S s.t. ‖p′ − q‖ ≤ cr
Today
Indexing local features
Inverted file
Vocabulary tree
Locality-sensitive hashing
Indexing local features: inverted file
For text documents, an efficient way to find all pages on which a word occurs is to use an index.
We want to find all images in which a feature occurs.
page → image
word → feature
To use this idea, we'll need to map our features to visual words.
Text retrieval vs. image search
What makes the problems similar? What makes them different?
Visual words
e.g., SIFT descriptor space: each point is 128-dimensional
Extract some local features from a number of images
Slide credit: D. Nister, CVPR 2006
Visual words
Each point is a local descriptor, e.g. a SIFT vector.
Example: quantize into 3 words
Visual words
Map high-dimensional descriptors to tokens/words by quantizing the feature space:
Cluster the descriptors, and let the cluster centers be the prototype words.
Determine which word to assign to each new image region by finding the closest cluster center.
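A minimal sketch of this quantization step (Python with scikit-learn's KMeans; the descriptor arrays and vocabulary size below are illustrative placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

# Assume `train_descriptors` is an (N, 128) array of SIFT descriptors
# pooled from many training images (random placeholder values here).
train_descriptors = np.random.rand(5000, 128).astype(np.float32)

# Build the visual vocabulary: cluster centers are the prototype "words".
n_words = 100                        # illustrative vocabulary size
kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=0)
kmeans.fit(train_descriptors)

# Assign each new image region to the closest cluster center (its visual word).
new_image_descriptors = np.random.rand(300, 128).astype(np.float32)
word_ids = kmeans.predict(new_image_descriptors)    # one word id per region

# Bag-of-words histogram for the new image.
bow = np.bincount(word_ids, minlength=n_words)
```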
Visual words
Each group of patches belongs to the same visual word!
Figure from Sivic & Zisserman, ICCV 2003
Visual vocabulary formation
Issues:
Sampling strategy: where to extract features? Fixed locations or interest points?
Clustering / quantization algorithm
What corpus provides features (universal vocabulary)?
Vocabulary size / number of words
Weight of each word?
Inverted file index
Why does the index give us a significant gain in efficiency?
The index maps each visual word to the IDs of the images in which it occurs.
Inverted file index
A query image is matched to database images that share visual words.
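A minimal sketch of an inverted file index (plain Python dictionaries; the word ids and image ids are illustrative):

```python
from collections import defaultdict

# Build: map each visual word id -> set of image ids that contain it.
inverted_file = defaultdict(set)

def add_image(image_id, word_ids):
    """Register all visual words of one database image."""
    for w in word_ids:
        inverted_file[w].add(image_id)

# Illustrative database of three images described by their visual words.
add_image("img_001", [3, 17, 42, 42, 99])
add_image("img_002", [5, 17, 23])
add_image("img_003", [42, 99, 7])

def query(word_ids):
    """Return candidate images ranked by the number of shared visual words."""
    votes = defaultdict(int)
    for w in set(word_ids):
        for image_id in inverted_file.get(w, ()):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)

print(query([17, 42, 99]))   # img_001 shares 3 words, img_003 shares 2, ...
```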
tf-idf weighting
Term frequency–inverse document frequency
Describes the frequency of each word within an image, and decreases the weights of words that appear often in the database.
Discriminative words: e.g., economic, trade
Common words: e.g., the, most, we
tf-idf weighting
Term frequency–inverse document frequency
Describes the frequency of each word within an image, and decreases the weights of words that appear often in the database:

t_i = (n_id / n_d) * log(N / n_i)

n_id: number of occurrences of word i in document d
n_d: number of words in document d
N: total number of documents in the database
n_i: number of documents in the whole database in which word i occurs
Bag-of-Words + Inverted file
Bag-of-words representation: http://people.cs.ubc.ca/~lowe/keypoints/
Inverted file: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
Slide credit: Xin Yang

D. Nistér and H. Stewénius. Scalable Recognition with a Vocabulary Tree, CVPR 2006.
Visualize as a tree
Vocabulary Tree
Training: Filling the tree
[Figure sequence: training descriptors are pushed down the hierarchical tree, level by level]
[Nister & Stewenius, CVPR06]
Slide credit: David Nister
Vocabulary Tree
Recognition
[Figure: query descriptors are pushed down the tree; the best-matching database images are retrieved, or geometric verification is performed]
[Nister & Stewenius, CVPR06]
Slide credit: David Nister
Think about the computational
advantage of the hierarchical tree vs. a
flat vocabulary!
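To make the advantage concrete, here is a sketch of a vocabulary tree built with hierarchical k-means and of the lookup step (Python with scikit-learn; the branching factor and depth are illustrative assumptions): with branching factor k and depth L, a lookup costs about k·L distance comparisons, instead of k^L comparisons against a flat vocabulary of the same size.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_tree(descriptors, k=10, depth=3):
    """Hierarchical k-means: recursively split the descriptors into k children."""
    if depth == 0 or len(descriptors) < k:
        return None
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(descriptors)
    children = [build_tree(descriptors[km.labels_ == c], k, depth - 1)
                for c in range(k)]
    return {"kmeans": km, "children": children}

def lookup(tree, descriptor):
    """Descend the tree; the path of child indices identifies the leaf word."""
    path = []
    node = tree
    while node is not None:
        c = int(node["kmeans"].predict(descriptor[None, :])[0])  # k comparisons
        path.append(c)
        node = node["children"][c]
    return tuple(path)    # e.g. (3, 0, 7) -> one leaf visual word

# Illustrative data: 20,000 random 128-d descriptors.
data = np.random.rand(20_000, 128).astype(np.float32)
tree = build_tree(data, k=10, depth=3)    # up to 10^3 = 1000 leaf words
print(lookup(tree, data[0]))              # ~10*3 comparisons per lookup
```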
Hashing
Direct addressing
Create a direct-address table with m slots.
[Figure: universe of keys U, actual keys K; each used slot stores its key and satellite data]
Direct addressing
Search operation: O(1)
Problem: the range of keys can be large!
64-bit numbers ⇒ 18,446,744,073,709,551,616 different keys
SIFT: 128 × 8 bits
Hashing
O(1) average-case time
Use a hash function h to compute the slot from the key k.
[Figure: hash table T with slots 0 to m−1; slot h(k1) need not store k1 itself, and h(k5) = h(k3), so two keys may share a bucket]
Hashing
A good hash function satisfies the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots.
How do we design a hash function for indexing high-dimensional data (e.g., 128-d SIFT descriptors)?
Locality-sensitive hashing

Indyk and Motwani. Approximate


nearest neighbors: towards removing
the curse of dimensionality, STOC
1998.
Locality-sensitive hashing (LSH)
Hash functions are locality-sensitive if, for any pair of points p, q, we have:
Pr[h(p) = h(q)] is high if p is close to q
Pr[h(p) = h(q)] is low if p is far from q
Pr_{h∈F}[h(x) = h(y)] = sim(x, y)
Locality Sensitive Hashing
A family H of functions h: R^d → U is called (r, cr, P1, P2)-sensitive if, for any p, q:
if ‖p − q‖ ≤ r, then Pr[h(p) = h(q)] > P1
if ‖p − q‖ ≥ cr, then Pr[h(p) = h(q)] < P2
LSH Function: Hamming Space
Consider binary vectors: points from {0, 1}^d
Hamming distance D(p, q) = number of positions on which p and q differ
Example (d = 3):
D(100, 011) = 3
D(010, 111) = 2
LSH Function: Hamming Space
Define the hash function as h_i(p) = p_i, where p_i is the i-th bit of p.
Example: select the 1st dimension
h(010) = 0
h(111) = 1
Over a random choice of the dimension i: Pr[h(010) ≠ h(111)] = D(p, q)/d = 2/3
In general, Pr[h(p) = h(q)] = 1 − D(p, q)/d
Clearly, h is locality-sensitive.
LSH Function: Hamming Space
A k-bit locality-sensitive hash function is defined as g(p) = [h1(p), h2(p), ..., hk(p)]^T
Each h_i(p) is chosen randomly and yields a single bit.
Over L independently constructed k-bit functions (one per hash table, as described below):
Pr(similar points collide in at least one table) ≥ 1 − (1 − P1^k)^L
Pr(dissimilar points collide in a given table) ≤ P2^k
Indyk and Motwani [1998]
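A minimal sketch of the k-bit bit-sampling hash for binary vectors (Python; d and k below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 64, 8                      # illustrative: 64-bit vectors, 8 sampled bits
sampled_bits = rng.choice(d, size=k, replace=False)   # random bit positions

def g(p):
    """k-bit bit-sampling hash: concatenate k randomly chosen bits of p."""
    bits = p[sampled_bits]                    # h_i(p) = p_i for each sampled i
    return int("".join(map(str, bits)), 2)    # pack the bits into a bucket id

p = rng.integers(0, 2, size=d)
q = p.copy()
q[:3] ^= 1                        # flip 3 bits: a "close" point
print(g(p), g(q))                 # likely to collide when few bits differ
```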


LSH Function: R^2 Space
Consider 2-d vectors.
The probability that a random hyperplane separates two unit vectors depends on the angle between them:
Pr[h(p) ≠ h(q)] = θ(p, q) / π
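A sketch of one such random-hyperplane hash bit for real-valued vectors (Python; one way to realize such an h, with illustrative dimensions): the bit is the sign of a dot product with a random Gaussian vector, so two vectors disagree with probability θ(p, q)/π.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 128                                  # illustrative descriptor dimension
a = rng.normal(size=d)                   # random hyperplane normal

def h(v):
    """One hash bit: which side of the random hyperplane v falls on."""
    return 1 if np.dot(a, v) >= 0 else 0

p = rng.normal(size=d)
q = p + 0.05 * rng.normal(size=d)        # a nearby point (small angle)
print(h(p), h(q))                        # usually equal for small angles
```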
LSH Pre-processing
Each image is entered into L hash tables indexed by independently constructed g1, g2, ..., gL.
Preprocessing space: O(LN)
LSH Querying
For each hash table, return the bin indexed by g_i(q), 1 ≤ i ≤ L.
Perform a linear search on the union of the bins.
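Putting pre-processing and querying together, a minimal multi-table LSH sketch (Python; uses random-hyperplane bits as the h_i, with illustrative k and L):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
d, k, L = 128, 12, 6                    # illustrative: 12-bit keys, 6 tables

# One (k, d) matrix of random hyperplanes per table defines g_i.
projections = [rng.normal(size=(k, d)) for _ in range(L)]
tables = [defaultdict(list) for _ in range(L)]

def g(i, v):
    """k-bit key for table i: signs of k random projections of v."""
    return tuple((projections[i] @ v >= 0).astype(int))

# Pre-processing: insert every database point into all L tables -> O(LN) space.
database = rng.normal(size=(5000, d))
for idx, v in enumerate(database):
    for i in range(L):
        tables[i][g(i, v)].append(idx)

def query(q):
    """Union of the L matching bins, then a linear search over that union."""
    candidates = set()
    for i in range(L):
        candidates.update(tables[i].get(g(i, q), []))
    if not candidates:
        return None
    cand = np.array(sorted(candidates))
    dists = np.linalg.norm(database[cand] - q, axis=1)
    return cand[int(np.argmin(dists))]

print(query(database[42] + 0.01 * rng.normal(size=d)))   # likely returns 42
```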
W. T Lee and H. T. Chen. Probing
the local-feature space of interest
points, ICIP 2010.
Hash family
The dot product a·v projects each vector v onto a line; in the standard p-stable LSH form this gives the hash h_{a,b}(v) = ⌊(a·v + b) / r⌋, where
a: a random vector sampled from a Gaussian distribution
b: a real value chosen uniformly from the range [0, r]
r: the segment width
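A sketch of one such projection hash (Python; assumes the standard p-stable form above, with an illustrative segment width r):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 128
r = 0.5                                  # illustrative segment width
a = rng.normal(size=d)                   # Gaussian random projection vector
b = rng.uniform(0, r)                    # random offset in [0, r]

def h(v):
    """Project v onto the line defined by a, then quantize into segments of width r."""
    return int(np.floor((np.dot(a, v) + b) / r))

v = rng.normal(size=d)
print(h(v), h(v + 0.01 * rng.normal(size=d)))   # nearby vectors often share a segment
```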
Building the hash table
r: segment width = (max − min) / t, so each random projection yields t buckets.
Generate K projections and combine them to get an index into the hash table.
How many buckets do we get? t^K
Building the hash table
Example: 5 projections (K = 5), 15 segments each (t = 15)
15^5 = 759,375 buckets in total!
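A sketch of combining K such projections into a single bucket index (Python; K, t, and the projection range bounds are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
d, K, t = 128, 5, 15                     # 5 projections, 15 segments each
A = rng.normal(size=(K, d))              # one Gaussian vector per projection
B = rng.uniform(0, 1, size=K)            # one offset per projection

def bucket_index(v, lo=-10.0, hi=10.0):
    """Quantize each projection into t segments and combine into a base-t index."""
    r = (hi - lo) / t                               # segment width (max - min) / t
    segs = np.floor((A @ v + B - lo) / r).astype(int)
    segs = np.clip(segs, 0, t - 1)                  # keep within the t segments
    index = 0
    for s in segs:                                  # base-t combination
        index = index * t + int(s)
    return index                                    # one of t**K = 759,375 buckets

v = rng.normal(size=d)
print(bucket_index(v))
```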
Sketching the Feature Space
Natural image patches (from the Berkeley segmentation database) vs. noise image patches (randomly generated).
Collect patches at three different sizes: 16×16, 32×32, 64×64.
Each set consists of 200,000 patches.
Patch distribution over buckets
Summary
Indexing techniques are essential for
organizing a database and for
enabling fast matching.
For indexing high-dimensional data
Inverted file
Vocabulary tree
Locality-sensitive hashing
Resources and extended readings
LSH Matlab Toolbox: http://www.cs.brown.edu/~gregory/download.html
Yeh et al., Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning, ICCV 2007.
