
INTRODUCTION

In our world, vision plays a very important role, and computers are slowly catching up
with the qualities of human vision with the help of image descriptors.

In the early days image descriptors were based on low-level features but nowadays
the descriptors are approaching image analysis from a higher level, resulting in image
descriptors that are based on, for instance, salient details or image patches.

Interest points are a specific kind of salient details, which describe locations in an
image that are interesting in a certain way.

TOP-SURF is an image descriptor that combines interest points with visual words. It
harnesses the high-level qualities of interest points, while significantly reducing the
memory needed to represent and compare images.

Because TOP-SURF is based on SURF, let us first discuss SURF in more detail.

Challenges in computer vision

Types of variance:
Illumination
Scale
Rotation
Affine
Perspective

We want features with high repeatability, i.e. features that can be reliably detected
again under these variations.
Literature Survey

Sl No | Description                      | Author(s)                                              | Year
1     | SURF: Speeded Up Robust Features | Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool | 2008
2     | TOP-SURF: A Visual Words Toolkit | Bart Thomee, Erwin M. Bakker, Michael S. Lew            | 2012
Existing System


SURF (Speeded Up Robust Features)

It is a robust local feature detector, first presented by Herbert Bay et al. in 2006, that can
be used in computer vision tasks such as object recognition or 3D reconstruction.

It is partly inspired by the SIFT descriptor.

SURF is based on sums of 2D Haar wavelet responses and makes efficient use of integral
images.

For features, it uses the sum of the Haar wavelet response around the point of interest. Again,
these can be computed with the aid of the integral image.
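The integral image (summed-area table) that makes these box sums cheap can be sketched in a few lines. This is a minimal illustration; the function names are ours, not part of any SURF implementation:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] holds the sum of img[0:y, 0:x]."""
    # A leading row/column of zeros avoids bounds checks in box_sum.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum over the rectangle img[y0:y1, x0:x1] in O(1) via 4 lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```

Once the table is built, any rectangular sum (and hence any box-filter or Haar wavelet response) costs four array lookups, independent of the rectangle size.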
1> Representing an image

The SURF technique uses a Hessian matrix-based measure for the
detection of interest points and a distribution of Haar wavelet responses
within the interest point neighbourhood as descriptor.

An image is analyzed at several scales, so interest points can be
extracted from both global (coarse) and local (fine) image details.

Additionally, the dominant orientation of each of the interest points is
determined to support rotation-invariant matching.
Detected interest points in an image, including their
orientation and scale.
Hessian matrix interest points

SURF uses a Hessian-based blob detector to find interest points. The determinant of the
Hessian matrix expresses the strength of the response and is a measure of the local
change around the point:

H(x, σ) = | Lxx(x, σ)  Lxy(x, σ) |
          | Lxy(x, σ)  Lyy(x, σ) |

where Lxx(x, σ) is the convolution of the second-order Gaussian derivative with the
image at point x, and similarly for Lxy and Lyy.

These convolutions are very costly to calculate, so they are approximated and
sped up with the use of integral images and approximated (box-filter) kernels.

In the illustration, grey areas correspond to 0 in
the kernel, whereas white areas are positive and
black areas are negative. This way it is possible to
calculate the approximated convolution
efficiently for an arbitrarily sized kernel utilizing
the integral image.
With box-filter responses Dxx, Dyy and Dxy, the approximated determinant is
det(H_approx) = Dxx·Dyy − (w·Dxy)². The weight w is theoretically sensitive to scale,
but in practice it can be kept constant at 0.9.
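The blob response and a crude candidate selection can be sketched as follows. This is an illustrative sketch, not SURF's actual non-maximum suppression; the names are ours:

```python
import numpy as np

def hessian_determinant(Dxx, Dyy, Dxy, w=0.9):
    """Approximated Hessian determinant used as the SURF blob response.

    Dxx, Dyy, Dxy are the box-filter approximations of the second-order
    Gaussian derivatives; w = 0.9 compensates for the approximation error.
    """
    return Dxx * Dyy - (w * Dxy) ** 2

def candidate_points(response, threshold):
    """Crude candidate selection: every pixel whose response exceeds the
    threshold (real SURF applies 3x3x3 non-maximum suppression instead)."""
    ys, xs = np.where(response > threshold)
    return list(zip(ys.tolist(), xs.tolist()))
```

For example, `hessian_determinant(4.0, 4.0, 2.0)` gives 16 − (0.9·2)² = 12.76.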
2> Matching an image

When enough interest points in the first image match with those in the second image
the images are likely to depict the same scene or object(s).

To determine these matches, SURF uses the nearest-neighbour ratio matching technique.

A point in the first image is matched to its nearest neighbour in the second image only
if that distance is less than 0.65 times the distance to any other point in the
second image.
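The ratio test can be sketched with a brute-force nearest-neighbour search (real implementations use approximate search; the function name is ours):

```python
import numpy as np

def ratio_matches(desc1, desc2, ratio=0.65):
    """Nearest-neighbour ratio test: keep (i, j) only when the closest
    descriptor j in image 2 is nearer than `ratio` times the distance
    to the second-closest descriptor in image 2."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)  # distances to all points
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```

Ambiguous points (whose two best candidates are nearly equally close) produce no match at all, which is exactly what makes the ratio test robust.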
Ex: Near-duplicate images: original image (left), framing (middle),
overlaid text and logo (right).
Matching with a 10° rotated version.
Proposed System

TOP-SURF

When trying to find matches in collections containing millions of images, it
is clear that using the SURF method in its default form is infeasible storage-wise. One of
the reasons for developing TOP-SURF was to overcome this issue by significantly reducing
the descriptor size.

1> Representing an image
Several steps need to be performed in order to calculate the TOP-SURF descriptor of an
image.

1.1> Representative interest points

The aim is to compose a general-purpose imagery set that is representative of
the kind of images used by researchers and students in content-based image retrieval.

For each of these images we extract their SURF interest points and randomly choose 25
points.
1.2> Clustering into visual words

Here we devise an approach based on the bag-of-words technique of Philbin et al.
to group the collection of representative interest points into a number of clusters.

For each cluster we determine its 100 nearest neighbours, i.e. its closest interest points. If a
point is close to multiple clusters, we assign it only to the cluster it is closest to.

Because discovering the exact nearest neighbours in such a high dimensional space is very
time consuming, we use an approximate nearest neighbours technique based on a forest of
randomized kd-trees to speed up this process.

The final clusters are commonly referred to as the visual word dictionary.
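The clustering step can be sketched with a plain k-means loop, where the final cluster centres play the role of visual words. This brute-force sketch deliberately omits the forest of randomized kd-trees used for the approximate nearest-neighbour search:

```python
import numpy as np

def kmeans(points, k, iters=10, seed=0):
    """Minimal k-means: the resulting centres form the visual word
    dictionary.  Brute-force distance computation for clarity only."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each point to its single closest centre
        d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):  # skip empty clusters
                centres[c] = members.mean(axis=0)
    return centres, labels
```

In the real pipeline the descriptors are 64-dimensional SURF vectors and k is very large (e.g. 200,000), which is precisely why an approximate nearest-neighbour structure is needed.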






1.3> Selecting the most descriptive visual words

Given a particular total number of available visual words, we can now calculate
the TOP-SURF descriptor of an image.

First we extract its regular SURF descriptor. We then convert the detected points into a
frequency histogram of occurring visual words, by analyzing which visual word each
interest point is most similar to.

Next, we apply the tf-idf weighting to assign a score to all the visual words in the histogram.

To form our image descriptor we finally select the highest-scoring visual words. Because we
only use the top N visual words, we call the descriptor TOP-SURF.
The histogram of the 25 highest scoring visual words of the
image shown above when using a dictionary of 200,000
visual words.
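The histogram construction and top-N selection can be sketched as below. The `idf` table (visual word id → inverse document frequency) is a hypothetical input; the function name is ours:

```python
from collections import Counter

def top_surf_descriptor(word_ids, idf, top_n):
    """Build the tf-idf weighted visual-word histogram for one image and
    keep only the top_n highest-scoring words.

    word_ids: the visual word each interest point was assigned to.
    idf: hypothetical lookup of inverse document frequencies per word.
    """
    tf = Counter(word_ids)               # term frequency per visual word
    total = sum(tf.values())
    scores = {w: (c / total) * idf[w] for w, c in tf.items()}
    # retain only the N highest-scoring visual words
    return dict(sorted(scores.items(), key=lambda kv: -kv[1])[:top_n])
```

Keeping only the top N entries is what shrinks the descriptor from a 200,000-bin histogram down to a handful of (word, score) pairs.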
2> Matching an image

To compare the TOP-SURF descriptors of two images we determine the normalized
cosine similarity between their tf-idf histograms T_A and T_B, expressed as a distance:

d_cos(T_A, T_B) = 1 − (T_A · T_B) / (|T_A| |T_B|)

A distance of 0 means the descriptors are identical and a distance of 1 means they are
completely different.

Note that, by definition, comparisons with an image in which zero interest points have been
detected will always result in a distance of 1, which is the desired behaviour.
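The distance computation over sparse histograms (dicts mapping visual word id → tf-idf score) can be sketched as follows; the function name is ours:

```python
import math

def cosine_distance(ta, tb):
    """d_cos = 1 - (T_A . T_B) / (|T_A| |T_B|) over sparse tf-idf
    histograms.  Returns 1.0 when either histogram is empty, matching
    the zero-interest-point behaviour described above."""
    dot = sum(ta[w] * tb[w] for w in ta.keys() & tb.keys())
    na = math.sqrt(sum(v * v for v in ta.values()))
    nb = math.sqrt(sum(v * v for v in tb.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0  # no interest points detected in at least one image
    return 1.0 - dot / (na * nb)
```

Because tf-idf scores are non-negative, the result always lies in [0, 1]: 0 for identical histograms, 1 for histograms with no visual words in common.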
3> Steps involved in TOP-SURF
Consider the following images
Near-duplicate images: original image (left), framing (middle), overlaid text and logo (right).
Interest points for original image
After extracting descriptors we get the following
Interest points for framed image
When we compare the descriptors we get the following result
Comparison Results
Applications
Object tracking.
Object detection.
Due to its small descriptor size, TOP-SURF is very useful for mobile and embedded
applications.
Automatic robotic navigation based on real-time video input.
3D reconstruction.
Conclusion
TOP-SURF reduces the amount of space used to store image descriptors.
It uses kd-trees for fast indexing.
It reduces the time required for matching.
It overcomes computer vision challenges such as rotation, perspective and affine
variations.
It has high repeatability.
Future Enhancements
Instead of only combining interest points with visual words, visual words could be
combined into visual phrases, which may open up possibilities for improved
matching of objects.
References
http://en.wikipedia.org/wiki/Scale-invariant_feature_transform

http://en.wikipedia.org/wiki/SURF

http://press.liacs.nl/researchdownloads/topsurf/

http://www.vision.ee.ethz.ch/~surf/eccv06.pdf

Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool, "SURF: Speeded Up
Robust Features", Computer Vision and Image Understanding (CVIU), Vol. 110, No.
3, pp. 346-359, 2008.
