
2008 Third International Workshop on Semantic Media Adaptation and Personalization

Object Recognition by Modified Scale Invariant Feature Transform


Gul-e-Saman, GIKI, Topi, NWFP, Pakistan. Email: Gulesaman@gmail.com
S. Asif M. Gilani, GIKI, Topi, NWFP, Pakistan. Email: asif@giki.edu.pk

Abstract

This paper presents a methodology for object recognition. It relies on the extraction of distinctive invariant image features that can be used to find the correspondence between different views of an object or a scene. These features are invariant to image rotation and scaling, and they have substantial robustness to changes in viewpoint and illumination and to the addition of noise. Mikolajczyk [1] has evaluated the SIFT [2] algorithm along with other approaches and identified it as the most resistant to image distortions. This paper improves on the SIFT algorithm by modifying its descriptor and keypoint localization steps. The proposed technique uses the salient aspects of the image gradient in a keypoint's neighbourhood. Moreover, instead of the smoothed weighted histograms of SIFT, Kernel Principal Component Analysis (KPCA) is applied in order to normalize the image patch. Comparative results show that KPCA-based descriptors are more distinctive, robust to distortions and compact. The evaluation of the technique is performed using recall-precision [3].

Keywords: 2D Laplacian filter, Haar wavelets, SIFT, KPCA, PCA and object recognition

1. Introduction

Object recognition is a fundamental part of many computer vision problems. Object recognition techniques using invariant features have become very popular and have been developed over the last few years [2, 4, 5]. Feature points corresponding to the same physical points are extracted from different views; these are invariant to image rotation, scaling, and change in illumination and viewpoint. The interest points are robust to occlusion and clutter and are photometrically distinctive. These extracted features are matched against reference features. Among the detectors that have been formulated are SIFT [2] and the Harris detector [6], which is based on the eigenvalues of the second moment matrix. The Harris corner detector is invariant to translation and rotation, but not to scale. The best performing detectors are Harris-Laplace and Hessian-Laplace by Mikolajczyk and Schmid, which are robust, scale-invariant feature detectors with high repeatability [7]. They use the determinant of the Hessian matrix to select the location and the Laplacian for scale selection.

For local descriptors, the interest points first have to be localized in scale and position. Generally, the interest points are local peaks in a scale-space search, filtered to preserve those most likely to remain stable under transformations. Then a description of the interest point has to be built; this description has to be unique, concise and invariant under various transformations. The localization and description steps of interest point algorithms are formulated together, but they are tackled separately [1]. Mikolajczyk and Schmid have presented a comparison of various local descriptors [1], including shape context [8], PCA-SIFT [3], steerable filters [9], complex filters [10], moment invariants [11] and cross-correlation of different types of interest points [6, 7]. However, after experimentation it is not clear which descriptors are more appropriate and how their performance depends on the interest region detectors.

This paper is organized as follows: Section 2 reviews the salient features of the SIFT algorithm, Section 3 explains KPCA-SIFT, Section 4 highlights the evaluation methodology used, and Section 5 gives results and comparisons of techniques, which is followed by the conclusion, acknowledgement and references.

2. Review of SIFT Algorithm

The SIFT algorithm [2] has four major steps, namely scale-space extrema detection, keypoint localisation, orientation assignment and keypoint descriptor computation.

The first step of the algorithm attempts to identify the scales and locations that can be identified from different views of an object or scene. This is done by using a scale-space Gaussian kernel.
978-0-7695-3444-2/08 $25.00 © 2008 IEEE 33


DOI 10.1109/SMAP.2008.12
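As an illustrative sketch only (not the paper's implementation), the scale-space and difference-of-Gaussians construction used in this first step could look as follows; the base scale σ₀, the factor k and the number of scales below are assumptions chosen for the example:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma0=1.6, k=2 ** 0.5, num_scales=5):
    """Build the smoothed images L(x, y, sigma) = G(sigma) * I and the
    difference-of-Gaussian stack D(x, y, sigma) = L(k*sigma) - L(sigma)."""
    blurred = [gaussian_filter(image, sigma0 * k ** i) for i in range(num_scales)]
    dogs = [blurred[i + 1] - blurred[i] for i in range(num_scales - 1)]
    return blurred, dogs

# Toy usage on a random "image"; a real pipeline would also downsample per octave.
img = np.random.rand(64, 64)
L, D = dog_pyramid(img)
```

Extrema would then be located in the stack `D` by comparing each sample against its 26 neighbours across position and scale.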
The scale space is defined as follows:

L(x, y, σ) = G(x, y, σ) ∗ I(x, y)    (1)

where ∗ is the convolution operator, G(x, y, σ) is a variable-scale Gaussian and I(x, y) is the input image.

The difference of Gaussians is used to detect stable keypoint locations in the scale space. The scale-space extrema of D(x, y, σ) are located by computing the difference of two images, one at a scale k times the other:

D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)    (2)

In order to detect the local maxima and minima of D(x, y, σ), each point is compared with its 26 neighbours (8 neighbours at the same scale, 9 neighbours at the scale above and 9 neighbours at the scale below). If the value is a minimum or maximum amongst all 26 points, then it is an extremum.

The second step of the algorithm eliminates keypoint candidates that are poorly localized on an edge or have low contrast. Brown has developed a method [5] which fits a 3D quadratic function to the keypoints in order to determine the interpolated location of the maximum; according to his experiments this substantially improves matching and stability. It uses the Taylor expansion of the scale-space function D(x, y, σ), shifted so that the keypoint is at the origin. The location of an extremum E is given by:

E = −(∂²D/∂x²)⁻¹ (∂D/∂x)    (3)

The point is excluded if the value of the function at E is below a threshold; this removes extrema with low contrast. To eliminate extrema due to poor localisation, note that such points have a large principal curvature across the edge but a small curvature in the perpendicular direction. The keypoint is rejected if the ratio of the largest to the smallest eigenvalue of the Hessian matrix, at that location and scale, exceeds a threshold.

The third step of the algorithm assigns an orientation to the keypoints based on local image properties. For the orientation computation, the keypoint's scale is used to select the image L; the gradient magnitude m is computed as:

m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)    (4)

The orientation θ is computed as:

θ(x, y) = tan⁻¹((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))    (5)

An orientation histogram is formed from the gradients around each keypoint. The highest peak in the histogram is located; this peak, and any other peak within 80% of its height, is used to create a keypoint with that orientation, so some points are assigned multiple orientations. A parabola is fit to the 3 histogram values closest to each peak in order to interpolate the peak's position.

The fourth step of the algorithm creates the keypoint descriptor. The gradient information is rotated to align with the orientation of the keypoint and then weighted by a Gaussian with variance of 1.5 × the keypoint scale. From this data a set of histograms is created over a window centred on the keypoint: 16 histograms, aligned in a 4x4 grid, each with 8 orientation bins, one for each of the main compass directions and the midpoints between them. The resulting feature vector has 128 elements; this vector is normalised to unit length and the elements with small values are removed by thresholding. The SIFT descriptor is unique in the following ways: 1) it is carefully designed to avoid boundary problems, so smooth changes in location, scale and orientation do not fundamentally change the feature vector; 2) it is compact, as it represents a patch of pixels by a 128-element vector; 3) it is not explicitly invariant to affine transformations, yet it is tolerant to distortions caused by perspective effects. These characteristics can be seen when comparing it with competing algorithms [1].

3. Proposed Methodology

This section outlines and explains the proposed scheme. It is based on the SIFT and PCA-SIFT algorithms discussed earlier, modifying their steps and introducing steps that lead to better performance and results.

The modifications suggested in the SIFT algorithm are as follows. The cost of feature extraction is minimized by using a cascaded filtering approach, where the more expensive operations are performed only at locations that have been approved by previous steps. A large number of features is computed over the image; the number of features that are detected and filtered after the first detection step is very important for recognizing objects. These features are extracted from a set of sample images. A query image is matched by comparing these features via the Euclidean distance.
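As a minimal illustration (not the reference implementation), the pixel-difference gradients of Eqs. (4) and (5) in Section 2 can be computed as below; `np.arctan2` is used so that the quadrant of θ is resolved, and the first array axis is taken as y:

```python
import numpy as np

def gradient_mag_ori(L):
    """Gradient magnitude and orientation from central differences,
    following Eqs. (4) and (5); border pixels are skipped for simplicity."""
    dx = L[1:-1, 2:] - L[1:-1, :-2]   # L(x+1, y) - L(x-1, y)
    dy = L[2:, 1:-1] - L[:-2, 1:-1]   # L(x, y+1) - L(x, y-1)
    m = np.sqrt(dx ** 2 + dy ** 2)
    theta = np.arctan2(dy, dx)        # orientation in radians
    return m, theta

# Toy image whose intensity increases by 1 per row: every interior pixel
# has magnitude 2 and orientation pi/2.
L = np.outer(np.arange(5.0), np.ones(5))
m, theta = gradient_mag_ori(L)
```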

The keypoint descriptors are computed using Kernel Principal Component Analysis (KPCA) [12]. They are highly distinctive, so the correct match for a feature can be found with high accuracy. However, if the scene is heavily cluttered there may be instances where a correct match is not found, which leads to false matches. The set of correct matches can be sorted out from the total set of matches by using the scale, location and orientation of the query image. The probability of obtaining correct matches is greater than that of false matches.

We propose a more accurate and robust keypoint localization and descriptor computation technique (modifying steps 2 and 4 of SIFT), while extrema detection and orientation assignment are carried out according to the SIFT algorithm. The proposed pipeline (Figure 1) is: input image → scale-space extrema detection → keypoint localization using the Laplacian filter → orientation assignment → keypoint descriptor using KPCA → output image.

Figure 1: Proposed Algorithm

3.1. Accurate Keypoint localization

The local extrema that have been detected in the earlier step are good candidates to become keypoints, but they have to be localized. Local extrema are found by using the 2-D Laplacian filter.

Laplacian filters are used to find areas of rapid change in images; they are basically derivative filters. As derivative filters are sensitive to noise, it is common practice to smooth the image (e.g., using a Gaussian filter) before applying the Laplacian, as is done in this case. The Laplacian filter is a standard Laplacian-of-Gaussian convolution: a second-derivative function designed to measure changes in intensity values without being too sensitive to noise. It produces a peak at the start of a change in intensity value and another at the end of the change.

The points obtained after the filtration in this step have a certain degree of repeatability. Hence, the number of points detected after the filtration process is sufficient for use in the recognition process.

Figure 2: Initial extrema detection (a), extrema after filtration (b)

3.2. Local Image Descriptor

In the earlier steps, location, scale and orientation have been assigned to each keypoint. These parameters describe the local image region, hence providing invariance to these parameters.

KPCA [12] was investigated as a generalization of PCA. Whereas PCA captures only second-order correlations, KPCA considers higher-order correlations in images, which allows it to model data generated by non-Gaussian distributions.

KPCA can be computed by using the fact that PCA can be applied to the dot-product matrix instead of the covariance matrix. Let {x_i ∈ R^M}, i = 1, …, N, represent a data set. KPCA first maps the data into a higher-dimensional feature space F by a function φ: R^M → F, and then performs PCA on the mapped data. If the data matrix X is defined as [φ(x_1) φ(x_2) … φ(x_N)], the covariance matrix C is as follows:

C = (1/N) Σ_{i=1}^{N} φ(x_i) φ(x_i)^T = (1/N) X X^T    (6)

The mapped data are assumed centred, i.e., (1/N) Σ_{i=1}^{N} φ(x_i) = 0. The eigenvalues and eigenvectors of the covariance matrix C can be computed by solving the following eigenvalue problem:

λu = Ku    (7)

where K is the N×N dot-product matrix defined as K = (1/N) X^T X, with entries

K_ij = (1/N) φ(x_i)·φ(x_j) = (1/N) k(x_i, x_j)    (8)

Consider λ_1 ≥ … ≥ λ_P to be the non-zero eigenvalues of K (P ≤ N, P ≤ M) and u^1, …, u^P the corresponding eigenvectors. C has the same non-zero eigenvalues as K, with a one-to-one correspondence between the non-zero eigenvectors {u^h} of K and the non-zero eigenvectors {v^h} of C, i.e. v^h = α_h X u^h, where α_h is a normalization constant. If both eigenvectors have unit length, α_h = 1/√(λ_h N); it is assumed that ‖u^h‖ = 1/√(λ_h N), so that α_h = 1.

Eigenvectors in kernel space can be represented in terms of linearly independent samples. Without loss of generality, assume that a set of m linearly independent samples {φ(x_1), …, φ(x_m)} (m ≤ N) spans the space in which the N training samples are distributed. Then the ith eigenvector v_i can be represented as:

v_i = [φ(x_1), …, φ(x_m)] [α_1i, …, α_mi]^T = Φ_m α_i    (9)

where α_i = [α_1i, …, α_mi]^T (i = 1, …, m) is a coefficient vector. For some test data x, its hth principal component y_h can be computed through the kernel function as:

y_h = v^h · φ(x) = Σ_{i=1}^{N} u_i^h k(x_i, x)    (10)

Hence the φ-image of x can be reconstructed from its projections onto the first H (≥ P) principal components in F by using a projection operator P_H:

P_H φ(x) = Σ_{h=1}^{H} y_h v^h    (11)

The kernel used is the Gaussian kernel:

k(x, y) = exp(−‖x − y‖² / (2σ²))    (12)

3.3. Implementation Details

In order to compute the image descriptor, a set of images is selected and the keypoints are detected; for each keypoint a 41x41-pixel image patch is extracted. The horizontal and vertical gradients are calculated, forming a vector, and these vectors are then put into a matrix. The kernel function is applied and the covariance calculated. The eigenvectors and eigenvalues are computed, and the first n eigenvectors are selected to form the projection matrix. The projection matrix is computed once and stored for later use.

While matching the keypoints, the location, scale and orientation of each keypoint act as the input to this descriptor. For each keypoint a 41x41-pixel image patch is extracted at the given scale and rotated to its orientation. The horizontal and vertical gradients are calculated, forming a vector. This vector is multiplied by the pre-computed projection matrix, which results in a KPCA-SIFT descriptor. The keypoints of all images in the data set are found, and all pairs of keypoint descriptors from the images are examined; those with Euclidean distance less than a threshold are considered a match.

3.4. Feature Matching

Feature matching is performed by computing the Euclidean distance between two feature vectors to determine whether the two vectors correspond to the same keypoint.

4. Evaluation

We evaluate the algorithm on real images with different transformations (scaling, rotation, illumination change, change in viewpoint and addition of noise). We use the criterion proposed in [3], which is based on the number of correct and false matches obtained from image comparison.
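A compact sketch of the KPCA computation of Sections 3.2-3.3, with the Gaussian kernel of Eq. (12), might look as follows. This is illustrative only, not the authors' code: the data size, σ and number of components are assumptions, and the kernel matrix is explicitly double-centred so that the zero-mean assumption behind Eq. (6) holds in feature space.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), as in Eq. (12)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma ** 2))

def kpca_project(X, sigma=1.0, n_components=2):
    """Project the training samples onto the leading kernel principal
    components, via the eigenproblem of Eq. (7)."""
    N = X.shape[0]
    K = gaussian_kernel_matrix(X, sigma)
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one   # centre K in feature space
    lam, U = np.linalg.eigh(Kc)                  # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]               # reorder to descending
    # Normalize eigenvectors so the feature-space eigenvectors have unit length.
    alphas = U[:, :n_components] / np.sqrt(np.maximum(lam[:n_components], 1e-12))
    return Kc @ alphas                           # projections y_h, as in Eq. (10)

# Toy stand-in for the per-keypoint gradient vectors described in Section 3.3.
X = np.random.RandomState(0).rand(20, 8)
Y = kpca_project(X, sigma=1.0, n_components=2)
```

In the paper's setting, the rows of `X` would be the gradient vectors of the 41x41 patches, and the projection matrix would be computed once and stored.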

Receiver Operating Characteristics (ROC) and recall-precision are popular metrics that are sometimes used interchangeably. Both consider that the number of correct positives has to be increased while the number of false positives is decreased. ROC is quite suitable for the evaluation of classifiers, as the rate of false detection is well defined [14], while recall-precision is suitable for the evaluation of detectors, as the number of false detections relative to the total number of detections is accurately given by 1 − precision, even though the total number of negatives cannot be determined.

Following the approach used in [2], the performance of SIFT on keypoint matching is measured as follows: matches for a point in an image are found in the whole data set. This is a detection task, as the total number of negatives is not well defined, and hence the suitable metric is recall-precision. We use recall vs. 1-precision graphs as in [3].

The keypoints in the images are identified using the modified SIFT algorithm, and all pairs of keypoints are evaluated. If the Euclidean distance between the feature vectors of a pair of keypoints is below a threshold, the points are considered a match. When the two keypoints correspond to the same location, the match is termed a correct-positive; if the two keypoints come from different locations, it is termed a false-positive. The total number of positives for a dataset is known a priori; from these we can formulate recall and 1-precision:

recall = no. of correct positives / total no. of positives    (13)

and

1 − precision = no. of false positives / total no. of matches (correct and false)    (14)

The graphs of recall vs. 1-precision are generated.

5. Discussion of Results

This section outlines the experiments that have been conducted with the implementation proposed in the previous section, in order to present an evaluation as well as a comparison with already established techniques. Evaluation is carried out as presented in the previous section, and the results from the different transformations are presented. The transformations applied were change in illumination, rotation, change in viewpoint, blurring, addition of noise and scaling.

5.1. Image DataSet

The technique is evaluated on real images from the database provided by Ke and Sukthankar (http://www.cs.cmu.edu/~yke/pcasift/). The database contains images with different transformations, including change in viewpoint and scaling. We have applied rotation to the images ourselves, as rotated versions were not available in the database.

The transformations are significant enough to evaluate the performance of the proposed technique and to provide a comparison with the established technique as well as the other alternatives proposed during the course of this work.

5.2. Experimental Results

For rotation testing, the images are recorded by rotating them in steps of three degrees (3°, 6°, …). It has been observed that the percentage of matches increased for 6° rotation. For change in scale, the images are scaled by factors of 0.5, 0.75 and 1.5. For change in illumination, the illumination level is changed by 50, 75 and 100 (Adobe Photoshop 7.0). For change in viewpoint, the images are taken by changing the viewing direction (Ke's database). Gaussian noise has been added (Matlab). Experimenting on a larger database would give a better analysis of the proposed technique. The results acquired are shown in Figure 3.

6. Comparison of Techniques

This section outlines and elaborates on a comparison between three techniques: two proposed by us, and the SIFT algorithm, which so far has been considered the most efficient algorithm, with substantial experimental results. One of our techniques has been outlined in the previous section. The other also follows the main steps of the SIFT algorithm, but applies the Haar wavelet for the keypoint localization.

Recent research has shown the Haar wavelet to be a good feature for object recognition [15]; the usefulness of Haar features for the recognition process has been studied by Leo et al. [15].

The Haar wavelet operates on data by calculating the sums and differences of adjacent elements. We also used this differencing and averaging procedure for filtering the extrema detected in step 1. Wavelets work well for scene matching, but other techniques are needed to extract features that are repeatable; the Haar step was therefore subsequently replaced by the 2-D Laplacian filter, and the points obtained this way are repeatable to a fair extent even for a transformed image.
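One level of the Haar averaging-and-differencing step described above can be sketched as follows (a toy 1-D illustration of the transform, not the authors' filtering code):

```python
import numpy as np

def haar_1d(signal):
    """One level of the Haar transform: pairwise averages and
    differences of adjacent elements (signal length must be even)."""
    s = np.asarray(signal, dtype=float)
    averages = (s[0::2] + s[1::2]) / 2.0
    differences = (s[0::2] - s[1::2]) / 2.0
    return averages, differences

avg, diff = haar_1d([9, 7, 3, 5])
# averages: [8.0, 4.0]; differences: [1.0, -1.0]
```

The averages give a coarse (smoothed) view of the data while the differences capture local detail; large differences flag abrupt intensity changes, which is what makes the step usable as an extrema filter.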

7. Conclusion

This section first highlights the results acquired by the proposed technique and then compares three techniques: 1) SIFT, 2) KPCA-SIFT (Haar) and 3) KPCA-SIFT (Laplace). It is evident from the results that in the cases of rotation and change of scale SIFT performs better, while KPCA-SIFT (Haar) also performs better than KPCA-SIFT (Laplace). For a change in viewpoint, KPCA-SIFT (Laplace) surpasses the performance of SIFT and KPCA-SIFT (Haar). For the change in illumination, in some cases there is an overlap between KPCA-SIFT (Laplace) and SIFT, while KPCA-SIFT (Haar) performs better than both. Similarly, for the addition of Gaussian noise, KPCA-SIFT (Laplace) performs similarly to SIFT, varying over a larger interval, but KPCA-SIFT (Haar) does not perform well. For blurred images, of the three techniques only SIFT shows some results, and if the blurring increases it fails altogether. The graphs comparing the three techniques are shown in Figure 4.

8. Future work

Future work will focus on invariant methods for extrema detection and an efficient method for orientation assignment. The computational burden of the keypoint filtration step can be reduced further by carefully selecting the initial keypoints. Besides this, techniques such as kernel discriminant analysis (KDA), a non-linear discriminating technique based on the kernel method, can be used for the descriptor computation to extract non-linear discriminating features. Coloured images can be used for the detection of objects, which would make the approach more extensive but also easier to manage for identifying objects. It can also be extended to object recognition in videos.

9. Acknowledgement

The authors acknowledge the support and help provided by Asad Ali, Rubina Sultan, Nagina Hassan and Madiha Hussain Malik. Thanks to Scott Ettinger for the initial code of SIFT.

10. References

[1] K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, October 2005.
[2] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 2004.
[3] Y. Ke and R. Sukthankar, "PCA-SIFT: A More Distinctive Representation for Local Image Descriptors", Proc. Conference on Computer Vision and Pattern Recognition, 511-517, 2004.
[4] K. Mikolajczyk and C. Schmid, "Scale & Affine Invariant Interest Point Detectors", International Journal of Computer Vision, 60(1), 63-86, 2004.
[5] M. Brown and D. Lowe, "Invariant Features from Interest Point Groups", BMVC, 2002.
[6] C. Harris and M. Stephens, "A Combined Corner and Edge Detector", Proc. 4th Alvey Vision Conference, 147-151, Manchester, UK, 1988.
[7] K. Mikolajczyk and C. Schmid, "Indexing Based on Scale Invariant Interest Points", Proc. ICCV, Vol. 1, 525-531, 2001.
[8] S. Belongie, J. Malik and J. Puzicha, "Shape Matching and Object Recognition Using Shape Contexts", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 4, 509-522, 2002.
[9] W. Freeman and E. Adelson, "The Design and Use of Steerable Filters", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 9, 891-906, 1991.
[10] F. Schaffalitzky and A. Zisserman, "Multi-View Matching for Unordered Image Sets", Proc. 7th European Conference on Computer Vision, 414-431, 2002.
[11] L. Van Gool, T. Moons and D. Ungureanu, "Affine/Photometric Invariants for Planar Intensity Patterns", Proc. 4th European Conference on Computer Vision, 642-651, 1996.
[12] V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[13] B. Scholkopf, A. Smola and K.-R. Muller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem", Neural Computation, Vol. 10, No. 5, 1299-1319, 1998.
[14] S. Agarwal and D. Roth, "Learning a Sparse Representation for Object Detection", Proc. European Conference on Computer Vision, 113-130, 2002.
[15] M. Leo, T. D'Orazio, P. Spagnolo and A. Distante, "Wavelet and ICA Preprocessing for Ball Recognition in Soccer Images".

Figure 3: Matching with 0° rotation (a); matching with −3° and 3° rotation (b); matching with illumination level set at 50 (c); matching with change in viewpoint (d); matching with added noise (e).

(Each panel plots 1-precision against recall for SIFT, KPCA-SIFT (Laplace) and KPCA-SIFT (Haar).)

Figure 4 (left to right): SIFT, the proposed technique and SIFT (Haar) on matching tasks where the images have been: rotated (a), scaled (b), Gaussian noise added (c), illumination changed (d), viewpoint changed (e), blurred (f).

