L(x, y, σ) = G(x, y, σ) ∗ I(x, y)    (1)

where ∗ is the convolution operator, G(x, y, σ) is a variable-scale Gaussian and I(x, y) is the input image.

Difference of Gaussian is a technique used to detect stable keypoint locations in scale space. The scale-space extremum D(x, y, σ) is located by computing the difference of two images, one at a scale k times the other:

D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)    (2)

In order to detect the local maxima and minima of D(x, y, σ), each point is compared with its 26 neighbours (8 neighbours at the same scale, 9 neighbours in the scale above and 9 neighbours in the scale below). If the value is a minimum or maximum amongst all 26 points, then it is an extremum.

The second step of the algorithm eliminates further points from the keypoints by finding those that are poorly localized on an edge or have low contrast. Brown has developed a method [5] which fits a 3D quadratic function to the keypoints in order to determine the interpolated location of the maximum; according to his experiments this substantially improves matching and stability. It uses the Taylor expansion of the scale-space function D(x, y, σ), shifted so that the keypoint is at the origin. The location of an extremum E is given by:

E = −(∂²D/∂x²)⁻¹ · (∂D/∂x)    (3)

The point is excluded if the value of the function at E is below a threshold; this removes extrema that have low contrast. For the elimination of extrema due to poor localisation, note that a poorly localised extremum has a large principal curvature across the edge but a small curvature in the perpendicular direction. The keypoint is rejected if the ratio of the largest to smallest eigenvalue of the Hessian matrix at that location and scale exceeds a threshold.

The third step of the algorithm assigns an orientation to each keypoint based on local image properties. For the orientation computation, the keypoint's scale is used to select the image L; the gradient magnitude m and orientation θ are computed as:

m(x, y) = √((L(x + 1, y) − L(x − 1, y))² + (L(x, y + 1) − L(x, y − 1))²)    (4)

θ(x, y) = tan⁻¹((L(x, y + 1) − L(x, y − 1)) / (L(x + 1, y) − L(x − 1, y)))    (5)

An orientation histogram is formed from the gradient orientations around each keypoint. The highest peak in the histogram is located; this peak, together with any other peak within 80% of its height, is used to create a keypoint with that orientation, so some points are assigned multiple orientations. A parabola is fit to the 3 histogram values closest to each peak in order to interpolate the peak's position.

The fourth step of the algorithm creates the keypoint descriptor. The gradient information is rotated to align with the orientation of the keypoint and then weighted by a Gaussian with variance of 1.5 × the keypoint scale. This data is used to create a set of histograms over a window centred on the keypoint: 16 histograms, aligned in a 4×4 grid, each with 8 orientation bins, one for each of the main compass directions and one for each of the midpoints between them. The resulting feature vector has 128 elements; it is normalised to unit length, and elements with small values are removed by thresholding. The SIFT descriptor is unique in the following ways: 1) it is carefully designed to avoid boundary problems, so smooth changes in location, scale and orientation do not fundamentally change the feature vector; 2) it is compact, as it represents a patch of pixels by a 128-element vector; 3) it is not explicitly invariant to affine transformations, yet it is tolerant to distortions caused by perspective effects. These characteristics can be seen when comparing it with competing algorithms [1].

3. Proposed Methodology

This section outlines and explains the proposed scheme. It is based on the SIFT and PCA-SIFT algorithms discussed earlier, modifying their steps and introducing steps that lead to better performance and results.

The modifications suggested to the SIFT algorithm are as follows. The cost of feature extraction is minimized by using a cascaded filtering approach, where the more expensive operations are performed only at locations that have been approved by the previous steps. The algorithm computes a large number of features over the image; the number of features that are detected and filtered after the first detection step is very important for recognizing objects. These features are extracted from a set of sample images. A query
image is matched by comparing these features, computing the Euclidean distance between them.

The keypoint descriptors are computed by using Kernel Principal Component Analysis (KPCA) [12]. They are highly distinctive, so the correct match for a feature can be found with high accuracy. Nevertheless, if the scene is heavily cluttered there may be instances where a correct match is not found, which leads to false matches. The set of correct matches can be sorted out from the total set of matches by using the scale, localization and descriptor computation technique (a modification of steps 2 and 4 of SIFT), while extrema detection according to the SIFT algorithm produces a peak at the start of a change in the intensity value and another at the end of the change.

The points that are obtained after the filtration in this step have a certain degree of repeatability. Hence, the number of points that are detected after the filtration process is sufficient for use in the recognition process.

[Flowchart of the proposed scheme: Image 1, Scale-Space extrema detection, Orientation Assignment]

Figure 2: Initial extrema detection (a), extrema after filtration (b)
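The scale-space extrema detection used in the first step (Eqs. 1-2 and the 26-neighbour comparison) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the scale grid (sigma, k, number of levels) and the helper names are assumptions for the sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma=1.6, k=2 ** 0.5, levels=5):
    """Blur the image at successive scales (Eq. 1) and take the
    difference of adjacent scales (Eq. 2)."""
    blurred = [gaussian_filter(image.astype(float), sigma * k ** i)
               for i in range(levels)]
    return [blurred[i + 1] - blurred[i] for i in range(levels - 1)]

def detect_extrema(dogs):
    """Keep points that are strictly larger or smaller than all 26
    neighbours: 8 at the same scale, 9 in the scale above, 9 below."""
    keypoints = []
    for s in range(1, len(dogs) - 1):
        below, same, above = dogs[s - 1], dogs[s], dogs[s + 1]
        for y in range(1, same.shape[0] - 1):
            for x in range(1, same.shape[1] - 1):
                cube = np.stack([below[y - 1:y + 2, x - 1:x + 2],
                                 same[y - 1:y + 2, x - 1:x + 2],
                                 above[y - 1:y + 2, x - 1:x + 2]])
                v = same[y, x]
                others = np.delete(cube.ravel(), 13)  # drop the centre value
                if v > others.max() or v < others.min():
                    keypoints.append((x, y, s))
    return keypoints
```

In the full algorithm these raw extrema would then pass through the contrast and edge-response filters of step 2 (Eq. 3 and the Hessian eigenvalue-ratio test) before any descriptor is computed.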
For the kernel PCA computation [13], the covariance matrix of the mapped data is:

C = (1/N) ∑_{i=1}^{N} φ(x_i) φ(x_i)^T = (1/N) X X^T    (6)

The mapped data is assumed to be centred, i.e., (1/N) ∑_{i=1}^{N} φ(x_i) = 0. The eigenvalues and eigenvectors of the covariance matrix C can be computed by solving the following eigenvalue problem:

λu = Ku    (7)

where α_i = [α_{1i}, ……, α_{mi}]^T (i = 1, ……, m) is a coefficient vector. For some test data x, its hth principal component y_h can be computed via the kernel function as follows:

y_h = v_h · φ(x) = ∑_{i=1}^{N} u_{ih} k(x_i, x)    (10)

Hence the φ-image of x can be reconstructed from its projections onto the first H (≥ P) principal components in F by using a projection operator P_H:

P_H φ(x) = ∑_{h=1}^{H} y_h v_h    (11)

4. Evaluation

We evaluate the algorithm on real images with different transformations (scaling, rotation, illumination change, change in viewpoint and addition of noise). We use the criterion proposed in [3], which
is based on the number of correct and false matches obtained from image comparison. Receiver Operating Characteristics (ROC) and recall-precision are popular metrics and may at times be used interchangeably. Both consider the fact that the number of correct positives has to be increased while decreasing the number of false positives. ROC is quite suitable for the evaluation of classifiers, as the rate of false detection is well defined [14], while recall-precision is suitable for the evaluation of detectors, as the number of false detections relative to the total number of detections is accurately given by 1-precision even though the total number of negatives cannot be determined.

Following the approach used in [2], the performance of SIFT on keypoint matching is assessed as follows: matches for a point in an image are sought in the whole data set. This is a detection task, since the total number of negatives is not well defined, and hence the suitable metric is recall-precision. We use recall vs. 1-precision graphs as in [3].

The keypoints in the images are identified using the modified SIFT algorithm. All pairs of keypoints are evaluated. If the Euclidean distance between the feature vectors of a pair of keypoints is below a threshold, the points are considered a match. When the two keypoints correspond to the same location, the match is termed a correct-positive; if the two keypoints come from different locations, it is termed a false-positive. The total number of positives for a dataset is known a priori, and from these counts we can formulate recall and 1-precision:

recall = (no. of correct positives) / (total no. of positives)    (13)

and

1 − precision = (no. of false positives) / (total no. of matches, correct and false)    (14)

The graphs of recall vs. 1-precision are generated.

5. Discussion of Results

This section outlines the experiments that have been conducted with the implementation proposed in the previous section, in order to present an evaluation as well as a comparison with already established techniques. The evaluation is carried out as presented in the previous section. The results from the different transformations are also presented. The transformations applied were change in illumination, rotation, change in viewpoint, blurring, addition of noise and scaling.

5.1. Image DataSet

The technique is evaluated on real images from a database provided by Ke and Sukthankar (http://www.cs.cmu.edu/~yke/pcasift/). The database contains images with different transformations, including change in viewpoint and scaling. We applied rotation to the images ourselves, as rotated versions were not available in the database.

The transformations are significant enough to evaluate the performance of the proposed technique and to provide a comparison with the established technique as well as the other alternatives that were proposed during the course of this work.

5.2. Experimental Results

For rotation testing, the images are recorded by rotating them in steps of three degrees (3°, 6°, …). It has been observed that the percentage of matching increased for 6° rotation. For change in scale, the images are recorded by scaling them by factors of 0.5, 0.75 and 1.5. For change in illumination, the images are taken by changing the illumination level by 50, 75 and 100 (Adobe Photoshop 7.0). For change in viewpoint, the images are taken by changing the viewing direction of the image (Ke's database). Gaussian noise has been added (MATLAB). Experimenting on a larger database would give a better analysis of the proposed technique. The results acquired are shown in Figure 3.

6. Comparison of Techniques

This section outlines and elaborates on a comparison between three techniques: two proposed by us, and the SIFT algorithm, which so far has been considered the most efficient algorithm, with substantial experimental results. One of the techniques has been outlined in the previous section. The other technique also follows the main steps of the SIFT algorithm, but applies the Haar wavelet for the keypoint localization.

Recent research has shown that the Haar wavelet is a good feature for use in object recognition [15]. The usefulness of Haar features for the recognition process has been studied by Leo et al. [15].

The Haar wavelet operates on data by calculating the sums and differences of adjacent elements. We have also used this differencing and averaging procedure for filtering out the extrema detected in step 1. Wavelets are good as far as scene matching is required, but to extract features that are repetitive, other techniques have to be used.
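The averaging-and-differencing operation described above can be sketched with a hypothetical one-dimensional helper; the normalisation by 2 (rather than the orthonormal √2) is a simplifying assumption of this sketch.

```python
def haar_step(data):
    """One Haar wavelet pass over a sequence: pairwise averages of
    adjacent elements give the smoothed signal, pairwise differences
    give the detail coefficients."""
    averages = [(data[i] + data[i + 1]) / 2 for i in range(0, len(data) - 1, 2)]
    details = [(data[i] - data[i + 1]) / 2 for i in range(0, len(data) - 1, 2)]
    return averages, details
```

For example, `haar_step([9, 7, 3, 5])` yields averages `[8.0, 4.0]` and details `[1.0, -1.0]`; repeating the step on the averages gives the multi-level decomposition.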
The wavelet filtering was subsequently replaced by the 2-D Laplacian filter; the points obtained this way are repetitive to a fair extent even for a transformed image.

7. Conclusion

This section initially highlights the results acquired by using the proposed technique, and in the latter half compares three techniques: 1) SIFT, 2) KPCA-SIFT (Haar) and 3) KPCA-SIFT (Laplace). It is evident from the results that in the cases of rotation and change of scale SIFT performs better, while KPCA-SIFT (Haar) also performs better than KPCA-SIFT (Laplace). For a change in viewpoint, KPCA-SIFT (Laplace) surpasses the performance of SIFT and KPCA-SIFT (Haar). For the change in illumination, in some cases there is an overlap between KPCA-SIFT (Laplace) and SIFT, while KPCA-SIFT (Haar) performs better than both of them. Similarly, for the addition of Gaussian noise to the image, KPCA-SIFT (Laplace) performs comparably to SIFT, although varying over a larger interval, while KPCA-SIFT (Haar) does not perform well. For blurred images, out of the three techniques only SIFT shows some results, and if the blurring increases it fails altogether. The graphs comparing the three techniques are shown in Figure 4.

8. Future work

Future work will focus on invariant methods for extrema detection and an efficient method for orientation assignment. The computational burden of the keypoint filtration step can be reduced further by carefully selecting the initial keypoints. Besides this, techniques such as kernel discriminant analysis (KDA), a non-linear discriminating technique based on the kernel trick, can be used for the descriptor computation to extract non-linear discriminating features. Coloured images can be used for the detection of objects, which would make the approach more extensive while also making objects easier to manage and identify. The work can also be extended to object recognition in videos.

9. Acknowledgement

The authors acknowledge the support and help provided by Asad Ali, Rubina Sultan, Nagina Hassan and Madiha Hussain Malik. Thanks to Scott Ettinger for the initial code of SIFT.

10. References

[1] K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, October 2005.
[2] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 2004.
[3] Y. Ke and R. Sukthankar, "PCA-SIFT: A More Distinctive Representation for Local Image Descriptors", Proc. Conference on Computer Vision and Pattern Recognition, 511-517, 2004.
[4] K. Mikolajczyk and C. Schmid, "Scale & Affine Invariant Interest Point Detectors", International Journal of Computer Vision, 60(1), 63-86, 2004.
[5] M. Brown and D. Lowe, "Invariant Features from Interest Point Groups", BMVC, 2002.
[6] C. Harris and M. Stephens, "A Combined Corner and Edge Detector", Proc. 4th Alvey Vision Conference, 147-151, Manchester, UK, 1988.
[7] K. Mikolajczyk and C. Schmid, "Indexing Based on Scale Invariant Interest Points", ICCV, Volume 1, 525-531, 2001.
[8] S. Belongie, J. Malik and J. Puzicha, "Shape Matching and Object Recognition Using Shape Contexts", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 4, 509-522, 2002.
[9] W. Freeman and E. Adelson, "The Design and Use of Steerable Filters", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 9, 891-906, 1991.
[10] F. Schaffalitzky and A. Zisserman, "Multi-View Matching for Unordered Image Sets", Proc. 7th European Conference on Computer Vision, 414-431, 2002.
[11] L. Van Gool, T. Moons and D. Ungureanu, "Affine/Photometric Invariants for Planar Intensity Patterns", Proc. 4th European Conference on Computer Vision, 642-651, 1996.
[12] V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[13] B. Schölkopf, A. Smola and K.-R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem", Neural Computation, Vol. 10, No. 5, 1299-1319, 1998.
[14] S. Agarwal and D. Roth, "Learning a Sparse Representation for Object Detection", Proc. European Conference on Computer Vision, 113-130, 2002.
[15] M. Leo, T. D'Orazio, P. Spagnolo and A. Distante, "Wavelet and ICA Preprocessing for Ball Recognition in Soccer Images".
Figure 3: Matching with 0° rotation (a); matching with −3° and 3° rotation (b); matching with illumination level set at 50 (c); matching with change in viewpoint (d); matching with added noise (e)

[Figure 4: recall vs. 1-precision curves comparing SIFT, KPCA-SIFT (Laplace) and KPCA-SIFT (Haar), panels (a)-(f)]

Figure 4 (left to right): SIFT, Proposed Technique and SIFT (Haar) on matching tasks where the images have been: rotated (a), scaled (b), Gaussian noise added (c), illumination changed (d), viewpoint changed (e), blurred (f)