Introduction
Background
Proposed method
Experimental results
References
Task
Learning a distance metric by embedding the vectors into a low-dimensional Euclidean space.
Motivation
Learning a distance metric for comparing multidimensional vectors is a fundamental problem in computer vision. [...] many important computer vision problems, such as image (object, scene, etc.) classification and retrieval, become trivial using nearest neighbor search.
Metric learning has achieved much success in computer vision tasks such as:
Image auto-annotation.
Face verification.
Visual tracking.
Person identification.
Nearest neighbor based image classification.
With equation (1), metric learning can be seen as learning a projection upon which the comparison is done using the Euclidean distance in the resulting space, i.e.

$$d^2(x_i, x_j) = \|L x_i - L x_j\|_2^2 \qquad (2)$$

The aim is to learn $L$ such that positive pairs are at distances less than $b - 1$ of each other, while negatives are at distances greater than $b + 1$, with $b$ being the bias parameter. No explicit regularization is used; instead, since often $d \ll D$, the rank $d$ of the learnt metric $M = L^\top L$ is fixed to be small.
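As a concreteness check, here is a minimal NumPy sketch of one stochastic gradient step on this pairwise hinge objective; the function name, learning rate, and the generic margin m are our assumptions, not the authors' exact procedure (the slides' b ± 1 corresponds to m = 1).

```python
import numpy as np

def sgd_step(L, xi, xj, same, b=1.0, m=1.0, lr=1e-3):
    """One SGD step on the pairwise hinge loss
    max(0, m + y * (d^2(xi, xj) - b)), with y = +1 for similar pairs
    (pushed below b - m) and y = -1 for dissimilar ones (above b + m)."""
    u = xi - xj
    diff = L @ u                  # projected difference, shape (d,)
    dist2 = diff @ diff           # squared Euclidean distance after projection
    y = 1.0 if same else -1.0
    if m + y * (dist2 - b) > 0:   # hinge active, otherwise gradient is zero
        L = L - lr * y * 2.0 * np.outer(diff, u)   # d(dist2)/dL = 2 (L u) u^T
    return L
```

L would be initialized as a small random d × D matrix; keeping d small plays the regularization role mentioned above.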
Invoke a representer-theorem-like condition and write the rows of L as linear combinations of the input vectors, i.e. $L = AX^\top$ (where $X$ is the matrix with all vectors $x_i$ as columns).
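A quick sketch (the shapes are made up) of why this parameterization matters: with L = AXᵀ, projecting a new vector only requires its inner products with the training vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n, d = 4096, 100, 64          # input dim, number of training vectors, rank
X = rng.standard_normal((D, n))  # training vectors x_i as columns
A = rng.standard_normal((d, n))  # the learnable coefficients

x = rng.standard_normal(D)
proj = (A @ X.T) @ x             # direct form L x, shape (d,)
proj_k = A @ (X.T @ x)           # same result using only inner products <x_i, x>,
assert np.allclose(proj, proj_k) # which is what makes the kernel substitution possible
```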
Mapping the vectors with a non-linear feature map $\varphi : \mathbb{R}^D \to \mathcal{F}$ and then using the kernel trick, i.e. $k(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle_{\mathcal{F}}$, we can proceed as follows, where $l_t \in \mathcal{F}$ are the rows of $L$.
$$\frac{\partial k(l, x)}{\partial l_d} = \frac{2\, x_d\, |x_d|}{(|x_d| + |l_d|)^2} \qquad (13)$$
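The numerator $2 x_d |x_d|$ is consistent with $k$ being a generalized (absolute-value) chi-square kernel, $k(l, x) = \sum_d 2 l_d x_d / (|l_d| + |x_d|)$; under that assumption, equation (13) is exactly its partial derivative with respect to $l_d$. A sketch (function names are ours; eps guards against division by zero):

```python
import numpy as np

def chi2_kernel(l, x, eps=1e-12):
    """Assumed kernel: k(l, x) = sum_d 2 l_d x_d / (|l_d| + |x_d|)."""
    return np.sum(2.0 * l * x / (np.abs(l) + np.abs(x) + eps))

def chi2_kernel_grad_l(l, x, eps=1e-12):
    """Equation (13): dk/dl_d = 2 x_d |x_d| / (|x_d| + |l_d|)^2."""
    return 2.0 * x * np.abs(x) / (np.abs(x) + np.abs(l) + eps) ** 2
```

A finite-difference check on random vectors is a cheap way to confirm the gradient matches the kernel.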
Figure: The proposed method seen as a kernel neural network with one (shared) hidden layer. The input features of the test pair are each passed through the network and then compared using the Euclidean distance [1].
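To make the network view concrete: each hidden unit t evaluates the kernel between the input and a learnt row l_t, and the test pair is compared in the resulting d-dimensional space. A sketch reusing chi2_kernel from above (the project/pair_distance decomposition is ours):

```python
import numpy as np

def project(L_rows, x):
    """Shared hidden layer: one kernel unit per learnt row l_t."""
    return np.array([chi2_kernel(l, x) for l in L_rows])   # shape (d,)

def pair_distance(L_rows, xi, xj):
    """Euclidean comparison of the two hidden-layer activations."""
    zi, zj = project(L_rows, xi), project(L_rows, xj)
    return float(np.sum((zi - zj) ** 2))
```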
Table: Statistics of the databases used for the experiments, along with the performance (¹ mean class accuracy or ² mean average precision) of the features used (with SVM) vs. existing methods. The performances serve to demonstrate that the features we use are competitive.
They used the outputs of the last fully connected layer, after linear rectification, of a CNN pre-trained on the ImageNet dataset. This gives 4096 non-negative features. To validate the baseline implementation, these features were used with a linear SVM to perform the classification task on the different datasets of Table 1.
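The slides do not name the network; a 4096-dimensional ReLU-rectified fully connected layer is consistent with an AlexNet/VGG-style architecture. A sketch using torchvision's AlexNet, which is our assumption (older torchvision versions use pretrained=True instead of the weights argument):

```python
import torch
from torchvision import models

# Assumption: AlexNet stands in for the unnamed ImageNet-pretrained CNN
# (any network with a 4096-d ReLU-rectified fc layer would do).
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier = model.classifier[:-1]  # drop the final 1000-way layer;
model.eval()                              # output is now the post-ReLU 4096-d fc

with torch.no_grad():
    batch = torch.rand(8, 3, 224, 224)    # stand-in for preprocessed images
    feats = model(batch)                  # shape (8, 4096), non-negative
```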
Five baselines were used; in all cases the final vectors are compared using the Euclidean distance (a retrieval sketch with the PCA baseline follows the list):
1 No projection.
2 PCA.
3 Neighborhood Component Analysis (NCA).
4 Large Margin Nearest Neighbor (LMNN), with two configurations.
5 Linear metric learning (ML).
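A minimal sketch of baseline 2 in the retrieval setting, using scikit-learn's PCA (the array names and sizes are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical feature matrices (rows are the 4096-d CNN features).
X_trainval = np.random.rand(200, 4096)
X_test = np.random.rand(50, 4096)

pca = PCA(n_components=64).fit(X_trainval)   # baseline 2: PCA projection
Z = pca.transform(X_test)

# Pairwise squared Euclidean distances: each test image queries the rest.
d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
```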
They report results for the retrieval setting. The train+val sets of the datasets were used for training the baselines and the proposed nonlinear method. Each test image was used as a query with the remaining test images as the gallery, and they report the mean precision@K (mprec@K), i.e. precision@K is averaged over the queries of a given class and then over the classes.
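A sketch of the metric as described, taking a precomputed test-by-test distance matrix d2 and a NumPy array of class labels (the names are ours):

```python
import numpy as np

def mean_prec_at_k(d2, labels, k):
    """mprec@K: precision@K per query, averaged over the queries of each
    class, then averaged over the classes."""
    n = len(labels)
    prec = np.empty(n)
    for q in range(n):
        order = np.argsort(d2[q])
        order = order[order != q][:k]        # gallery = all test images but q
        prec[q] = np.mean(labels[order] == labels[q])
    return np.mean([prec[labels == c].mean() for c in np.unique(labels)])
```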
The number of iterations was fixed to one million. Up to 500,000 pairs of similar and dissimilar vectors were sampled from the train+val sets, for both the baseline linear metric learning (ML) and the proposed nonlinear metric learning (NML).
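The slides do not specify the sampling scheme; a simple rejection-sampling sketch (assumes at least two classes are present):

```python
import numpy as np

def sample_pairs(labels, n_pairs, seed=0):
    """Sample up to n_pairs similar and n_pairs dissimilar index pairs."""
    rng = np.random.default_rng(seed)
    sim, dis = [], []
    n = len(labels)
    while len(sim) < n_pairs or len(dis) < n_pairs:
        i, j = rng.integers(n, size=2)
        if i == j:
            continue
        bucket = sim if labels[i] == labels[j] else dis
        if len(bucket) < n_pairs:     # keep only up to n_pairs per bucket
            bucket.append((i, j))
    return sim, dis
```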
The bias and margin for the linear baseline were fixed to b = 1 and m = 0.2, while those for the proposed nonlinear method were fixed to b = 0.1 and m = 0.2. These values were kept constant for all reported experiments, i.e. no dataset-specific tuning was done. The algorithm is not very sensitive to the choice of m and b.
Figure: Performance with varying margin m (fixed b = 0.1) and varying bias b (fixed m = 0.02), for the proposed method (d = 64).
The experimental results support the conclusion that the proposed method is consistently better than standard competitive baselines, especially in the case of low-dimensional projections. Supervised discriminative dimensionality reduction even improves over the reference system, which uses the full high-dimensional features without any compression.