
Tag Annotation

Dzung Nguyen dung.nguyentri@pixta.co.jp

Abstract
This document presents details of major image tagging models.

1 Problem Formulation

2 Tag Propagation
Tag propagation consists of two steps:

• Finding nearest neighbors using visual information. Concretely, the $k$ nearest neighbors are found using VGG features.
• Propagating tags from those neighbors.

The propagation uses the following formula:
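A standard choice, assuming TagProp-style weighted voting over the $k$ neighbors (this draft does not give the exact weights, so the form below is an assumption), is

$$\hat{y}_q = \sum_{j \in \mathcal{N}_k(q)} \pi_{qj}\, y_j$$

where $\mathcal{N}_k(q)$ is the set of $k$ visual neighbors of query image $q$, $y_j$ is the binary tag vector of neighbor $j$, and the weights $\pi_{qj}$ sum to one.

A minimal NumPy sketch of both steps, using Euclidean distance over VGG features and exponential-decay weights (both illustrative assumptions):

```python
import numpy as np

def propagate_tags(query_feat, train_feats, train_tags, k=5):
    """Weighted k-NN tag propagation (illustrative sketch).

    query_feat:  (d,) visual feature of the query image (e.g. VGG, d=4096).
    train_feats: (n, d) features of the annotated training images.
    train_tags:  (n, t) binary tag matrix of the training images.
    Returns a (t,) vector of tag scores in [0, 1] for the query.
    """
    # Step 1: find the k nearest neighbors in visual feature space.
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nn = np.argsort(dists)[:k]
    # Step 2: propagate tags with weights that decay with distance
    # (the weighting scheme is an assumption, not given in the draft).
    w = np.exp(-dists[nn])
    w /= w.sum()
    return w @ train_tags[nn].astype(float)
```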

2.1 Details

2.1.1 Loss calculation


2.1.2 Derivative calculation

3 Fast Image Tagging


The FastTag approach is described in [1].

4 Canonical Correlation Analysis


4.1 Linear CCA

Assume that we have data from two different spaces, $X_1$ and $X_2$. We want to learn transformations $W_1$ and $W_2$ which map them to a joint embedding $E$ in which the two representations match each other. Alternative names for these spaces are different views or different modalities. In addition, the number of spaces can be generalized from 2 to $N$.
In our case, $X_1$ has dimension 4096 (VGG features), and $X_2$ has dimension 300 (tag embeddings).
The criterion for optimization is the canonical correlation, which is defined as follows:

$$\operatorname*{arg\,max}_{W_1, W_2} \frac{\langle W_1 X_1,\, W_2 X_2 \rangle}{\|W_1 X_1\|\,\|W_2 X_2\|}$$

Canonical correlation can be thought of as a normalized correlation. Writing $\Sigma_{11}$ and $\Sigma_{22}$ for the covariance matrices of the two views and $\Sigma_{12}$ for their cross-covariance, this is equivalent to the following:

$$\operatorname*{arg\,max}_{W_1, W_2} \frac{W_1^T \Sigma_{12} W_2}{\sqrt{W_1^T \Sigma_{11} W_1 \; W_2^T \Sigma_{22} W_2}}$$
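A quick NumPy check of this equivalence on arbitrary centered data and arbitrary directions (all names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2 = 500, 6, 4
X1 = rng.standard_normal((n, d1)); X1 -= X1.mean(axis=0)  # centered view 1
X2 = rng.standard_normal((n, d2)); X2 -= X2.mean(axis=0)  # centered view 2
w1, w2 = rng.standard_normal(d1), rng.standard_normal(d2)

# Form 1: normalized correlation of the projected samples.
z1, z2 = X1 @ w1, X2 @ w2
form1 = (z1 @ z2) / (np.linalg.norm(z1) * np.linalg.norm(z2))

# Form 2: the same quantity via (cross-)covariance matrices.
S11, S22, S12 = X1.T @ X1 / n, X2.T @ X2 / n, X1.T @ X2 / n
form2 = (w1 @ S12 @ w2) / np.sqrt((w1 @ S11 @ w1) * (w2 @ S22 @ w2))

print(np.isclose(form1, form2))  # True: the 1/n factors cancel
```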



Without loss of generality, this is equivalent to the following constrained optimization problem:

$$\begin{aligned}
\max_{W_1, W_2} \quad & W_1^T \Sigma_{12} W_2 \\
\text{subject to} \quad & W_1^T \Sigma_{11} W_1 = 1 \\
& W_2^T \Sigma_{22} W_2 = 1
\end{aligned} \tag{1}$$

The Lagrangian function of this constrained optimization problem is:

$$\mathcal{L}(W_1, W_2) = W_1^T \Sigma_{12} W_2 - \lambda_1 (W_1^T \Sigma_{11} W_1 - 1) - \lambda_2 (W_2^T \Sigma_{22} W_2 - 1)$$

The KKT conditions state that the optimal solution $W_1^*, W_2^*$ must satisfy the following (necessary) conditions, where the factor of 2 from differentiating the quadratic terms is absorbed into $\lambda_1$ and $\lambda_2$:

$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial W_1} &= \Sigma_{12} W_2 - \lambda_1 \Sigma_{11} W_1 = 0 \\
\frac{\partial \mathcal{L}}{\partial W_2} &= \Sigma_{21} W_1 - \lambda_2 \Sigma_{22} W_2 = 0
\end{aligned} \tag{2}$$
Equivalently, left-multiplying the first equation by $W_1^T$ and the second by $W_2^T$:

$$\begin{aligned}
W_1^T \Sigma_{12} W_2 - \lambda_1 W_1^T \Sigma_{11} W_1 &= 0 \\
W_2^T \Sigma_{21} W_1 - \lambda_2 W_2^T \Sigma_{22} W_2 &= 0
\end{aligned} \tag{3}$$

Subtracting these two equations (their first terms are equal, since the scalar $W_2^T \Sigma_{21} W_1$ is the transpose of $W_1^T \Sigma_{12} W_2$) and applying the constraints:

$$\lambda_1 W_1^T \Sigma_{11} W_1 = \lambda_2 W_2^T \Sigma_{22} W_2 \;\Rightarrow\; \lambda_1 = \lambda_2 = \lambda \tag{4}$$

Solving the second equation of (2) for $W_2$:

$$W_2 = \frac{\Sigma_{22}^{-1} \Sigma_{21} W_1}{\lambda} \tag{5}$$

Substituting (5) into the first equation of (2):

$$\Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} W_1 = \lambda^2 \Sigma_{11} W_1 \tag{6}$$

This is a generalized eigenvalue problem: the optimal $W_1$ is the eigenvector with the largest eigenvalue $\lambda^2$, and $\lambda$ is the top canonical correlation.
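Equation (6) can be solved directly with a generalized eigensolver. A small NumPy/SciPy sketch on synthetic data (all names are illustrative; in the setting above, $X_1$ would hold VGG features and $X_2$ tag embeddings):

```python
import numpy as np
from scipy.linalg import eigh, solve

rng = np.random.default_rng(0)
n, d1, d2 = 1000, 8, 5

# Two views sharing a latent signal so they are genuinely correlated.
z = rng.standard_normal((n, 2))
X1 = z @ rng.standard_normal((2, d1)) + 0.5 * rng.standard_normal((n, d1))
X2 = z @ rng.standard_normal((2, d2)) + 0.5 * rng.standard_normal((n, d2))
X1 -= X1.mean(axis=0); X2 -= X2.mean(axis=0)

S11, S22, S12 = X1.T @ X1 / n, X2.T @ X2 / n, X1.T @ X2 / n

# Generalized eigenproblem (6): S12 S22^{-1} S21 W1 = lambda^2 S11 W1.
A = S12 @ solve(S22, S12.T)        # S12 S22^{-1} S21 (symmetric)
vals, vecs = eigh(A, S11)          # eigenvalues in ascending order
lam = np.sqrt(vals[-1])            # top canonical correlation
W1 = vecs[:, -1]
W2 = solve(S22, S12.T @ W1) / lam  # W2 from equation (5)

# Sanity check: the correlation of the projections equals lambda.
print(lam, np.corrcoef(X1 @ W1, X2 @ W2)[0, 1])
```

In practice $\Sigma_{11}$ and $\Sigma_{22}$ are usually regularized (e.g. $\Sigma_{11} + rI$) when the feature dimension is large relative to the number of samples, as with 4096-dimensional VGG features.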

4.2 Kernel CCA

4.3 Deep CCA (backpropagation)

5 Ranking Loss

References
[1] Minmin Chen, Alice Zheng, and Kilian Weinberger. Fast image tagging. In International Conference on Machine Learning, pages 1274–1282, 2013.
