
Scalable Nonlinear Embeddings for Semantic Category-based Image Retrieval

Presented by: Victor Hugo Contreras O.


vhcontreraso@unal.edu.co

Universidad Nacional de Colombia

March 13, 2017

Outline

1 About the paper

2 Task & Motivation

3 Introduction

4 Background

5 Proposed method

6 Experimental results

7 Discussion and conclusion

8 References

About the paper

Title: Scalable Nonlinear Embeddings for Semantic Category-based Image Retrieval.
Authors: Gaurav Sharma and Bernt Schiele.
Institute: Max Planck Institute for Informatics, Germany.
Year: 2015.
Venue: IEEE International Conference on Computer Vision (ICCV).
Available at: https://arxiv.org/abs/1509.08902

Task & Motivation

Task
Learning a distance metric by embedding the vectors into a low-dimensional Euclidean space.

Motivation
Learning a distance metric for comparing multidimensional vectors is a fundamental problem in computer vision. [...] Many important computer vision problems, such as image (object, scene, etc.) classification and retrieval, become trivial given nearest neighbor search.

Introduction

Metric learning has achieved much success in computer vision tasks such as:
Image auto-annotation.
Face verification.
Visual tracking.
Person identification.
Nearest-neighbor-based image classification.

Background: Supervised discriminative metric learning
with pairwise constraints I

Given a dataset $\mathcal{X}$ of positive and negative pairs of vectors, i.e. $\mathcal{X} = \{(x_i, x_j, y_{ij}) \mid (i, j) \in \mathcal{I} \subset \mathbb{N}^2\}$, with $x_i \in \mathbb{R}^D$ and $y_{ij} \in \{-1, +1\}$, the task is to learn a distance function on $\mathbb{R}^D$.

Learning a Mahalanobis-like Metric

Many metric learning approaches learn, from $\mathcal{X}$, a Mahalanobis-like metric parametrized by a matrix $M \in \mathbb{R}^{D \times D}$, i.e.

$$d^2(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j) \qquad (1)$$

$M$ is required to be a symmetric positive semi-definite matrix, and hence can be factorized as $M = L^T L$, with $L \in \mathbb{R}^{d \times D}$ and $d \leq D$.

Learning a projection

With equation (1), metric learning can be seen as learning a projection, after which the comparison is done using the Euclidean distance in the resulting space, i.e.

$$d^2(x_i, x_j) = \|L x_i - L x_j\|_2^2 \qquad (2)$$
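
As a quick sanity check, here is a minimal NumPy sketch (dimensions and names are illustrative, not from the paper) verifying that the Mahalanobis-like form (1) with $M = L^T L$ coincides with the projected Euclidean form (2):

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 128, 16                        # input and embedding dimensions (illustrative)
L = rng.normal(size=(d, D))           # projection matrix, d <= D
xi, xj = rng.normal(size=D), rng.normal(size=D)

# Mahalanobis-like form, eq. (1): (xi - xj)^T M (xi - xj) with M = L^T L
M = L.T @ L
d2_mahalanobis = (xi - xj) @ M @ (xi - xj)

# Projection form, eq. (2): ||L xi - L xj||_2^2 in the d-dimensional space
d2_projection = np.sum((L @ xi - L @ xj) ** 2)

assert np.isclose(d2_mahalanobis, d2_projection)
```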

The loss function

Learning the distance function in equation (2) can be achieved by minimizing an objective with the hinge loss:

$$\min_{L, b} F = \sum_{\mathcal{X}} \max(0,\ 1 - y_{ij}\{b - d^2(x_i, x_j)\}) \qquad (3)$$

which aims to learn $L$ such that positive pairs are at distances less than $b - 1$ from each other, while negative pairs are at distances greater than $b + 1$, with $b$ being a bias parameter. Explicit regularization is absent: since often $d \ll D$, the rank $d$ of the learnt metric $M = L^T L$ is fixed to be small.
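
The per-pair term of this objective is simple to express; a minimal sketch, where the function name and default margin are my own illustrative choices:

```python
import numpy as np

def pair_hinge_loss(L, b, xi, xj, yij, margin=1.0):
    """Per-pair term of eq. (3): max(0, margin - yij * (b - d^2(xi, xj))).

    yij = +1 pushes d^2 below b - margin; yij = -1 pushes it above b + margin.
    """
    d2 = np.sum((L @ xi - L @ xj) ** 2)
    return max(0.0, margin - yij * (b - d2))
```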

Nonlinear distance functions I

Invoke a representer-theorem-like condition and write the rows of $L$ as linear combinations of the input vectors, i.e. $L = A X^T$ (where $X$ is the matrix of all vectors $x_i$ as columns).
Mapping the vectors with a nonlinear feature map $\Phi : \mathbb{R}^D \rightarrow \mathcal{F}$ and then using the kernel trick, i.e. $k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle_{\mathcal{F}}$, we can proceed as follows:

$$d^2(x_i, x_j) = \|A X_\Phi^T \Phi(x_i) - A X_\Phi^T \Phi(x_j)\|^2 = \|A (k_i - k_j)\|^2 \qquad (4)$$

where $X_\Phi = [\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_n)]$ is the matrix of mapped vectors and

$$k_t = X_\Phi^T \Phi(x_t) = [k(x_1, x_t), k(x_2, x_t), \ldots, k(x_n, x_t)]^T \qquad (5)$$
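
A sketch of the kernelized distance of equations (4) and (5), assuming a generic scalar kernel is supplied (names are mine); note the $O(n)$ kernel evaluations per vector against all training points, which is the cost the proposed method later avoids:

```python
import numpy as np

def kernelized_distance(A, X, xi, xj, kernel):
    """d^2(xi, xj) = ||A (k_i - k_j)||^2, eqs. (4)-(5).

    A:      (d, n) coefficient matrix from L = A X^T.
    X:      (D, n) matrix with the n training vectors as columns.
    kernel: scalar kernel function k(u, v).
    """
    n = X.shape[1]
    ki = np.array([kernel(X[:, s], xi) for s in range(n)])  # eq. (5)
    kj = np.array([kernel(X[:, s], xj) for s in range(n)])
    return np.sum((A @ (ki - kj)) ** 2)
```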

Proposed method I

We can view the projection with matrix $L_\Phi$ in the mapped space $\mathcal{F}$ as

$$L_\Phi \Phi(x) = \left[ \langle l_1^\Phi, \Phi(x) \rangle_{\mathcal{F}}, \ldots, \langle l_d^\Phi, \Phi(x) \rangle_{\mathcal{F}} \right] \qquad (6)$$

where the $l_t^\Phi \in \mathcal{F}$ are the rows of $L_\Phi$.
We assume that there exists $l_t \in \mathbb{R}^D$ such that either $\Phi(l_t) = l_t^\Phi$, i.e. $l_t \in \mathbb{R}^D$ is the pre-image of $l_t^\Phi \in \mathcal{F}$, or, if such a pre-image does not exist, $\Phi(l_t) \approx l_t^\Phi$, i.e. $l_t \in \mathbb{R}^D$ is an approximation of the pre-image of $l_t^\Phi$.
Once we have $\Phi(l_t) = l_t^\Phi$, we can write

$$L_\Phi \Phi(x) = \left[ \langle \Phi(l_1), \Phi(x) \rangle_{\mathcal{F}}, \ldots, \langle \Phi(l_d), \Phi(x) \rangle_{\mathcal{F}} \right] = \left[ k(l_1, x), k(l_2, x), \ldots, k(l_d, x) \right] \qquad (7)$$

We now do a nonlinear projection parametrized by the rows of $L = [l_1^T, l_2^T, \ldots, l_d^T]$.
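
In code, the embedding of equation (7) is just one kernel evaluation per learned row $l_t$; a minimal sketch (the RBF kernel at the end is an illustrative stand-in, not the paper's choice):

```python
import numpy as np

def embed(L, x, kernel):
    """Nonlinear embedding of eq. (7): the t-th coordinate is k(l_t, x),
    where l_t is the t-th row of L (an exact or approximate pre-image)."""
    return np.array([kernel(lt, x) for lt in L])

# Example with an RBF kernel as an illustrative stand-in:
rbf = lambda u, v: np.exp(-np.sum((u - v) ** 2))
# z = embed(L, x, rbf)  ->  d-dimensional embedding of x
```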

Proposed method II
Following our assumption, the distance function computed in the feature space becomes

$$d_\Phi^2(x_i, x_j) = \|L_\Phi \Phi(x_i) - L_\Phi \Phi(x_j)\|^2 = \sum_{t=1}^{d} \left( k(l_t, x_i) - k(l_t, x_j) \right)^2 \qquad (8)$$

With this formulation, the parameters to be learned are the elements of the matrix $L$, i.e. $l_t \in \mathbb{R}^D$, $t = 1, \ldots, d$. Note that this matrix is different from the $L$ matrix in the linear distance function case. They propose to learn $L$ with a standard max-margin hinge-loss-based objective. The optimization problem thus takes the form

$$\min_{L} F = \sum_{\mathcal{X}} \max(0,\ m - y_{ij}\{b - d_\Phi^2(x_i, x_j)\}) \qquad (9)$$

where the fixed margin of 1 is replaced by a free parameter $m$, selected according to the range of the kernel.
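
Putting equations (8) and (9) together, a sketch of the per-pair objective term (names and the margin default are mine):

```python
import numpy as np

def nml_pair_loss(L, b, xi, xj, yij, kernel, m=0.2):
    """Per-pair hinge term of eq. (9) with the feature-space distance of eq. (8)."""
    ki = np.array([kernel(lt, xi) for lt in L])   # L_Phi Phi(xi), as in eq. (7)
    kj = np.array([kernel(lt, xj) for lt in L])
    d2 = np.sum((ki - kj) ** 2)                   # eq. (8)
    return max(0.0, m - yij * (b - d2))
```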
Proposed method III
The sub-gradients required for the SGD iterations in the algorithm are calculable analytically and are as follows:

$$\nabla_{l_t} F(\{(x_i, x_j, y_{ij})\}; L, b) = \begin{cases} 0 & \text{if } y_{ij}\{b - d_\Phi^2(x_i, x_j)\} \geq m \\ 2\, y_{ij} (k_i^t - k_j^t)(\nabla_{l_t} k_i^t - \nabla_{l_t} k_j^t) & \text{otherwise,} \end{cases} \qquad (10)$$

where $k_i^t = k(l_t, x_i)$, and

$$\nabla_b F(\{(x_i, x_j, y_{ij})\}; L, b) = \begin{cases} 0 & \text{if } y_{ij}\{b - d_\Phi^2(x_i, x_j)\} \geq m \\ -y_{ij} & \text{otherwise.} \end{cases} \qquad (11)$$

They used a shifted $\chi^2$ kernel, shown to be quite effective for image classification, given by

$$k(x, y) = \sum_{c=1}^{D} \frac{2\, x_c y_c}{|x_c| + |y_c|} \qquad (12)$$
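
A direct transcription of equation (12); the small epsilon guarding the $0/0$ case is my addition:

```python
import numpy as np

def shifted_chi2_kernel(x, y, eps=1e-12):
    """Shifted chi-squared kernel, eq. (12): sum_c 2 x_c y_c / (|x_c| + |y_c|)."""
    return np.sum(2.0 * x * y / (np.abs(x) + np.abs(y) + eps))
```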
Proposed method IV
The gradient of the $\chi^2$ kernel is also analytically calculable and is given (component-wise, with $c$ indexing dimensions as in equation (12)) by

$$\nabla_{l_c} k(l, x) = \frac{2\, x_c |x_c|}{(|x_c| + |l_c|)^2} \qquad (13)$$

The complexity of the stochastic updates is $O(dD)$ (kernel evaluations with $l_t$, $t = 1, \ldots, d$), while the complexity of similar updates with the traditional parametrization would be $O(ND)$, where $N$ is the number of distinct training vectors, since the $l_t^\Phi$ are linear combinations of the $\Phi(x_i)$. Since $d$ is fixed and small ($d \ll N$), the updates are relatively inexpensive and the algorithm is thus scalable to the order of millions of training points. With SGD training, the algorithm can also be applied in an online setting.
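
Combining equations (10), (11) and (13), one stochastic update on a single pair might look as follows; the learning rate, margin and epsilon guard are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def chi2(x, y, eps=1e-12):
    """Shifted chi-squared kernel, eq. (12)."""
    return np.sum(2.0 * x * y / (np.abs(x) + np.abs(y) + eps))

def sgd_step(L, b, xi, xj, yij, m=0.2, lr=0.01, eps=1e-12):
    """One stochastic subgradient step on (L, b) for a single labeled pair.

    L: (d, D) matrix of rows l_t; xi, xj: (D,) vectors; yij in {-1, +1}.
    """
    ki = np.array([chi2(lt, xi) for lt in L])
    kj = np.array([chi2(lt, xj) for lt in L])
    d2 = np.sum((ki - kj) ** 2)                    # eq. (8)
    if yij * (b - d2) >= m:                        # hinge inactive: zero subgradient
        return L, b
    # eq. (13) applied row-wise: grad of k(l_t, x) wrt l_t is 2 x |x| / (|x| + |l_t|)^2
    grad_ki = 2.0 * xi * np.abs(xi) / (np.abs(xi) + np.abs(L) + eps) ** 2    # (d, D)
    grad_kj = 2.0 * xj * np.abs(xj) / (np.abs(xj) + np.abs(L) + eps) ** 2
    grad_L = 2.0 * yij * (ki - kj)[:, None] * (grad_ki - grad_kj)            # eq. (10)
    return L - lr * grad_L, b - lr * (-yij)        # eq. (11): grad wrt b is -yij
```

Each call touches only the d x D matrix L, matching the O(dD) per-update cost stated above.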

Proposed method V

Learning the distance function as a neural network

Figure: The proposed method seen as a kernel neural network with one (shared) hidden layer. The input features of the test pair are passed through the network and then compared using the Euclidean distance [1].

Datasets

Table: Statistics of the databases used for the experiments, along with the performance (1: mean class accuracy, 2: mean average precision) of the features used (with SVM) vs. existing methods. The performances serve to demonstrate that the features used are competitive.

)
Image Features

They used the outputs of the last fully connected layer, after linear rectification, of a CNN pre-trained on the ImageNet dataset. This gives 4096 non-negative features. To validate the baseline implementation, these features were used with a linear SVM to perform the classification task on the different datasets in Table 1.

Baselines

Five baselines were used; in all cases the final vectors are compared using the Euclidean distance.
1 No projection.
2 PCA.
3 Neighborhood Component Analysis (NCA).
4 Large Margin Nearest Neighbor (LMNN), with two configurations.
5 Linear metric learning (ML).

Evaluation

They report results for the retrieval setting. The train+val sets of the datasets were used for training the baselines and the proposed nonlinear method. Each test image was used as a query, with the rest of the test images as the gallery, and they report the mean precision@K (mprec@K), i.e. the precision@K is computed for the queries of a given class and then averaged over the classes.
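
A sketch of this protocol as described (array layout and names are my assumptions):

```python
import numpy as np

def mean_prec_at_k(Z, labels, K=10):
    """mprec@K: each test image queries the rest of the test set; precision@K
    is averaged over the queries of each class, then over the classes.

    Z: (n, d) embedded test vectors; labels: (n,) class labels.
    """
    labels = np.asarray(labels)
    per_class = {}
    for q in range(len(labels)):
        d2 = np.sum((Z - Z[q]) ** 2, axis=1)   # squared Euclidean to all images
        d2[q] = np.inf                         # exclude the query itself
        topk = np.argsort(d2)[:K]
        per_class.setdefault(labels[q], []).append(
            float(np.mean(labels[topk] == labels[q])))
    return float(np.mean([np.mean(p) for p in per_class.values()]))
```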

Implementation details I

The number of iterations was fixed at one million. They sample (up to) 500,000 pairs of similar and dissimilar vectors from train+val for both the baseline linear metric learning and the proposed nonlinear metric learning (NML).
The bias and margin for the linear baseline were fixed to b = 1 and m = 0.2, while those for the proposed nonlinear method were fixed to b = 0.1 and m = 0.02; these values were kept constant for all reported experiments, i.e. no dataset-specific tuning was done. The algorithm is not very sensitive to the choice of m and b.

Implementation details II

Figure: Performance with varying margin m (fixed b = 0.1) and bias b (fixed m = 0.02), for the proposed method (d = 64).

Quantitative results

Quantitative results II

Qualitative results

Discussion and conclusion

The experimental results support the conclusion that the proposed method is consistently better than standard competitive baselines, especially in the case of low-dimensional projections.
Supervised discriminative dimensionality reduction also improves performance over the reference system, which uses the full high-dimensional features without any compression.

References I

[1] Gaurav Sharma and Bernt Schiele. Scalable nonlinear embeddings for semantic category-based image retrieval. In IEEE International Conference on Computer Vision (ICCV), 2015. Max Planck Institute for Informatics, Germany. https://arxiv.org/abs/1509.08902

