
Indexing Techniques

Mei-Chen Yeh
Last week
Matching two sets of features
Strategy 1: Convert to a fixed-length feature vector (bag-of-words), then use a conventional proximity measure
Strategy 2: Build point correspondences
Last week: bag-of-words
[Figure: histogram of codeword frequencies over the visual vocabulary]
Matching local features: building patch correspondences
[Figure: candidate patch matches between Image 1 and Image 2]
To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD).
Slide credit: Prof. Kristen Grauman
Matching local features: building patch correspondences
[Figure: patch correspondences between Image 1 and Image 2]
Simplest approach: compare them all, take the closest (or the closest k, or all within a thresholded distance).
Slide credit: Prof. Kristen Grauman
Indexing local features
Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT).
[Figure: descriptors from the database images mapped into the feature space]
Indexing local features
When we see close points in feature space, we have similar descriptors, which indicates similar local content.
[Figure: query-image descriptors landing near database descriptors in the feature space]
Problem statement
With potentially thousands of features per image, and hundreds to millions of images to search, how do we efficiently find those that are relevant to a new image?
[Figure: a database of 50 thousand images; what about 110 million images?]
Scalability matters!
Slide credit: Nistér and Stewénius
The Nearest-Neighbor Search Problem
Given:
A set S of n points in d dimensions
A query point q
Which point in S is closest to q?
Time complexity of a linear scan: O(dn)
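A minimal sketch of the linear-scan baseline (Python/NumPy; the array sizes are illustrative):

```python
import numpy as np

def linear_scan_nn(S, q):
    """Exhaustive nearest-neighbor search: O(dn) per query.

    S: (n, d) array of database points
    q: (d,) query point
    Returns the index of the closest point and its distance.
    """
    dists = np.linalg.norm(S - q, axis=1)   # n distance computations, each O(d)
    i = int(np.argmin(dists))
    return i, float(dists[i])

# Illustrative data: 10,000 random 128-d points and one query.
S = np.random.rand(10_000, 128)
q = np.random.rand(128)
print(linear_scan_nn(S, q))
```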
The Nearest-Neighbor Search Problem
r-nearest neighbor
For any query q, returns a point p ∈ S s.t. ‖p − q‖ ≤ r
c-approximate r-nearest neighbor
For any query q, returns a point p′ ∈ S s.t. ‖p′ − q‖ ≤ cr
Today
Indexing local features
Inverted file
Vocabulary tree
Locality-sensitive hashing
Indexing local features: inverted file
For text documents, an efficient way to find all pages on which a word occurs is to use an index.
We want to find all images in which a feature occurs.
page → image
word → feature
To use this idea, we'll need to map our features to visual words.
Text retrieval vs. image search
What makes the problems similar? What makes them different?
Visual words
e.g., SIFT descriptor space: each point is 128-dimensional
Extract some local features from a number of images
Slide credit: D. Nister, CVPR 2006
Visual words
Each point is a local descriptor, e.g. a SIFT vector.
Example: quantize into 3 words
Visual words
Map high-dimensional descriptors to tokens/words by quantizing the feature space:
Cluster the descriptors, and let the cluster centers be the prototype words.
Determine which word to assign to each new image region by finding the closest cluster center.
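A minimal sketch of this quantization step (Python with scikit-learn's KMeans; the descriptor arrays and vocabulary size below are illustrative placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

# Assume `train_descriptors` is an (N, 128) array of SIFT descriptors
# pooled from many training images (random placeholder values here).
train_descriptors = np.random.rand(5000, 128).astype(np.float32)

# Build the visual vocabulary: cluster centers are the prototype "words".
n_words = 100                        # illustrative vocabulary size
kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=0)
kmeans.fit(train_descriptors)

# Assign each new image region to the closest cluster center (its visual word).
new_image_descriptors = np.random.rand(300, 128).astype(np.float32)
word_ids = kmeans.predict(new_image_descriptors)    # one word id per region

# Bag-of-words histogram for the new image.
bow = np.bincount(word_ids, minlength=n_words)
```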
Visual words
Each group of patches belongs to the same visual word!
Figure from Sivic & Zisserman, ICCV 2003
Visual vocabulary formation
Issues:
Sampling strategy: where to extract features? Fixed locations or interest points?
Clustering / quantization algorithm
What corpus provides features (universal vocabulary)?
Vocabulary size / number of words
Weight of each word?
Inverted file index
Why does the index give us a significant gain in efficiency?
The index maps each visual word to the IDs of the images in which it occurs.
Inverted file index
A query image is matched to database images that share visual words.
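A minimal sketch of an inverted file index (plain Python dictionaries; the word ids and image ids are illustrative):

```python
from collections import defaultdict

# Build: map each visual word id -> set of image ids that contain it.
inverted_file = defaultdict(set)

def add_image(image_id, word_ids):
    """Register all visual words of one database image."""
    for w in word_ids:
        inverted_file[w].add(image_id)

# Illustrative database of three images described by their visual words.
add_image("img_001", [3, 17, 42, 42, 99])
add_image("img_002", [5, 17, 23])
add_image("img_003", [42, 99, 7])

def query(word_ids):
    """Return candidate images ranked by the number of shared visual words."""
    votes = defaultdict(int)
    for w in set(word_ids):
        for image_id in inverted_file.get(w, ()):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)

print(query([17, 42, 99]))   # img_001 shares 3 words, img_003 shares 2, ...
```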
tf-idf weighting
Term frequency–inverse document frequency
Describes the frequency of each word within an image, and decreases the weights of words that appear often in the database.
Discriminative words: e.g., economic, trade
Common words: e.g., the, most, we
tf-idf weighting
Term frequency–inverse document frequency
Describes the frequency of each word within an image, and decreases the weights of words that appear often in the database:

t_i = (n_id / n_d) * log(N / n_i)

n_id: number of occurrences of word i in document d
n_d: number of words in document d
N: total number of documents in the database
n_i: number of documents in the whole database in which word i occurs
Bag-of-Words + Inverted file
Bag-of-words representation: http://people.cs.ubc.ca/~lowe/keypoints/
Inverted file: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
Slide credit: Xin Yang

D. Nistér and H. Stewénius. Scalable Recognition with a Vocabulary Tree, CVPR 2006.
Visualize as a tree
Vocabulary Tree
Training: Filling the tree
[Figure sequence: training descriptors are pushed down the hierarchical tree, level by level]
[Nister & Stewenius, CVPR06]
Slide credit: David Nister
Vocabulary Tree
Recognition
[Figure: query descriptors are pushed down the tree; the best-matching database images are retrieved, or geometric verification is performed]
[Nister & Stewenius, CVPR06]
Slide credit: David Nister
Think about the computational
advantage of the hierarchical tree vs. a
flat vocabulary!
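To make the advantage concrete, here is a sketch of a vocabulary tree built with hierarchical k-means and of the lookup step (Python with scikit-learn; the branching factor and depth are illustrative assumptions): with branching factor k and depth L, a lookup costs about k·L distance comparisons, instead of k^L comparisons against a flat vocabulary of the same size.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_tree(descriptors, k=10, depth=3):
    """Hierarchical k-means: recursively split the descriptors into k children."""
    if depth == 0 or len(descriptors) < k:
        return None
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(descriptors)
    children = [build_tree(descriptors[km.labels_ == c], k, depth - 1)
                for c in range(k)]
    return {"kmeans": km, "children": children}

def lookup(tree, descriptor):
    """Descend the tree; the path of child indices identifies the leaf word."""
    path = []
    node = tree
    while node is not None:
        c = int(node["kmeans"].predict(descriptor[None, :])[0])  # k comparisons
        path.append(c)
        node = node["children"][c]
    return tuple(path)    # e.g. (3, 0, 7) -> one leaf visual word

# Illustrative data: 20,000 random 128-d descriptors.
data = np.random.rand(20_000, 128).astype(np.float32)
tree = build_tree(data, k=10, depth=3)    # up to 10^3 = 1000 leaf words
print(lookup(tree, data[0]))              # ~10*3 comparisons per lookup
```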
Hashing
Direct addressing
Create a direct-address table with m slots.
[Figure: universe of keys U, actual keys K; each used slot stores its key and satellite data]
Direct addressing
Search operation: O(1)
Problem: the range of keys can be large!
64-bit numbers ⇒ 18,446,744,073,709,551,616 different keys
SIFT: 128 × 8 bits
Hashing
O(1) average-case time
Use a hash function h to compute the slot from the key k.
[Figure: hash table T with slots 0 to m−1; slot h(k1) need not store k1 itself, and h(k5) = h(k3), so two keys may share a bucket]
Hashing
A good hash function satisfies the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots.
How do we design a hash function for indexing high-dimensional data (e.g., 128-d SIFT descriptors)?
Locality-sensitive hashing

Indyk and Motwani. Approximate


nearest neighbors: towards removing
the curse of dimensionality, STOC
1998.
Locality-sensitive hashing (LSH)
Hash functions are locality-sensitive if, for any pair of points p, q, we have:
Pr[h(p) = h(q)] is high if p is close to q
Pr[h(p) = h(q)] is low if p is far from q
Pr_{h∈F}[h(x) = h(y)] = sim(x, y)
Locality Sensitive Hashing
A family H of functions h: R^d → U is called (r, cr, P1, P2)-sensitive if, for any p, q:
if ‖p − q‖ ≤ r, then Pr[h(p) = h(q)] > P1
if ‖p − q‖ ≥ cr, then Pr[h(p) = h(q)] < P2
LSH Function: Hamming Space
Consider binary vectors: points from {0, 1}^d
Hamming distance D(p, q) = number of positions on which p and q differ
Example (d = 3):
D(100, 011) = 3
D(010, 111) = 2
LSH Function: Hamming Space
Define the hash function as h_i(p) = p_i, where p_i is the i-th bit of p.
Example: select the 1st dimension
h(010) = 0
h(111) = 1
Over a random choice of the dimension i: Pr[h(010) ≠ h(111)] = D(p, q)/d = 2/3
In general, Pr[h(p) = h(q)] = 1 − D(p, q)/d
Clearly, h is locality-sensitive.
LSH Function: Hamming Space
A k-bit locality-sensitive hash function is defined as g(p) = [h1(p), h2(p), ..., hk(p)]^T
Each h_i(p) is chosen randomly and yields a single bit.
Over L independently constructed k-bit functions (one per hash table, as described below):
Pr(similar points collide in at least one table) ≥ 1 − (1 − P1^k)^L
Pr(dissimilar points collide in a given table) ≤ P2^k
Indyk and Motwani [1998]
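A minimal sketch of the k-bit bit-sampling hash for binary vectors (Python; d and k below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 64, 8                      # illustrative: 64-bit vectors, 8 sampled bits
sampled_bits = rng.choice(d, size=k, replace=False)   # random bit positions

def g(p):
    """k-bit bit-sampling hash: concatenate k randomly chosen bits of p."""
    bits = p[sampled_bits]                    # h_i(p) = p_i for each sampled i
    return int("".join(map(str, bits)), 2)    # pack the bits into a bucket id

p = rng.integers(0, 2, size=d)
q = p.copy()
q[:3] ^= 1                        # flip 3 bits: a "close" point
print(g(p), g(q))                 # likely to collide when few bits differ
```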


LSH Function: R^2 Space
Consider 2-d vectors.
The probability that a random hyperplane separates two unit vectors depends on the angle between them:
Pr[h(p) ≠ h(q)] = θ(p, q) / π
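A sketch of one such random-hyperplane hash bit for real-valued vectors (Python; one way to realize such an h, with illustrative dimensions): the bit is the sign of a dot product with a random Gaussian vector, so two vectors disagree with probability θ(p, q)/π.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 128                                  # illustrative descriptor dimension
a = rng.normal(size=d)                   # random hyperplane normal

def h(v):
    """One hash bit: which side of the random hyperplane v falls on."""
    return 1 if np.dot(a, v) >= 0 else 0

p = rng.normal(size=d)
q = p + 0.05 * rng.normal(size=d)        # a nearby point (small angle)
print(h(p), h(q))                        # usually equal for small angles
```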
LSH Pre-processing
Each image is entered into L hash tables indexed by independently constructed g1, g2, ..., gL.
Preprocessing space: O(LN)
LSH Querying
For each hash table, return the bin indexed by g_i(q), 1 ≤ i ≤ L.
Perform a linear search on the union of the bins.
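Putting pre-processing and querying together, a minimal multi-table LSH sketch (Python; uses random-hyperplane bits as the h_i, with illustrative k and L):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
d, k, L = 128, 12, 6                    # illustrative: 12-bit keys, 6 tables

# One (k, d) matrix of random hyperplanes per table defines g_i.
projections = [rng.normal(size=(k, d)) for _ in range(L)]
tables = [defaultdict(list) for _ in range(L)]

def g(i, v):
    """k-bit key for table i: signs of k random projections of v."""
    return tuple((projections[i] @ v >= 0).astype(int))

# Pre-processing: insert every database point into all L tables -> O(LN) space.
database = rng.normal(size=(5000, d))
for idx, v in enumerate(database):
    for i in range(L):
        tables[i][g(i, v)].append(idx)

def query(q):
    """Union of the L matching bins, then a linear search over that union."""
    candidates = set()
    for i in range(L):
        candidates.update(tables[i].get(g(i, q), []))
    if not candidates:
        return None
    cand = np.array(sorted(candidates))
    dists = np.linalg.norm(database[cand] - q, axis=1)
    return cand[int(np.argmin(dists))]

print(query(database[42] + 0.01 * rng.normal(size=d)))   # likely returns 42
```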
W. T Lee and H. T. Chen. Probing
the local-feature space of interest
points, ICIP 2010.
Hash family
The dot product a·v projects each vector v onto a line; in the standard p-stable LSH form this gives the hash h_{a,b}(v) = ⌊(a·v + b) / r⌋, where
a: a random vector sampled from a Gaussian distribution
b: a real value chosen uniformly from the range [0, r]
r: the segment width
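A sketch of one such projection hash (Python; assumes the standard p-stable form above, with an illustrative segment width r):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 128
r = 0.5                                  # illustrative segment width
a = rng.normal(size=d)                   # Gaussian random projection vector
b = rng.uniform(0, r)                    # random offset in [0, r]

def h(v):
    """Project v onto the line defined by a, then quantize into segments of width r."""
    return int(np.floor((np.dot(a, v) + b) / r))

v = rng.normal(size=d)
print(h(v), h(v + 0.01 * rng.normal(size=d)))   # nearby vectors often share a segment
```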
Building the hash table
r: segment width = (max − min) / t, so each random projection yields t buckets.
Generate K projections and combine them to get an index into the hash table.
How many buckets do we get? t^K
Building the hash table
Example: 5 projections (K = 5), 15 segments each (t = 15)
15^5 = 759,375 buckets in total!
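A sketch of combining K such projections into a single bucket index (Python; K, t, and the projection range bounds are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
d, K, t = 128, 5, 15                     # 5 projections, 15 segments each
A = rng.normal(size=(K, d))              # one Gaussian vector per projection
B = rng.uniform(0, 1, size=K)            # one offset per projection

def bucket_index(v, lo=-10.0, hi=10.0):
    """Quantize each projection into t segments and combine into a base-t index."""
    r = (hi - lo) / t                               # segment width (max - min) / t
    segs = np.floor((A @ v + B - lo) / r).astype(int)
    segs = np.clip(segs, 0, t - 1)                  # keep within the t segments
    index = 0
    for s in segs:                                  # base-t combination
        index = index * t + int(s)
    return index                                    # one of t**K = 759,375 buckets

v = rng.normal(size=d)
print(bucket_index(v))
```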
Sketching the Feature Space
Natural image patches (from the Berkeley segmentation database) vs. noise image patches (randomly generated).
Collect patches at three different sizes: 16×16, 32×32, 64×64.
Each set consists of 200,000 patches.
Patch distribution over buckets
Summary
Indexing techniques are essential for
organizing a database and for
enabling fast matching.
For indexing high-dimensional data
Inverted file
Vocabulary tree
Locality-sensitive hashing
Resources and extended readings
LSH Matlab Toolbox: http://www.cs.brown.edu/~gregory/download.html
Yeh et al., Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning, ICCV 2007.
