
CS 223-B Lecture, Part A: Advanced Features
Sebastian Thrun, Gary Bradski

http://robots.stanford.edu/cs223b/index.html

Readings
This lecture is in 2 separate parts: A - Fourier, Gabor, SIFT; and B - Texture and other operators. Part B is optional due to time limitations, but good to look through nevertheless. Read: Computer Vision, Forsyth & Ponce,
Chapters 7 and (optional, for texture) 9, but do it lightly, just for the gist. David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, IJCV 2004.
Just read/take notes on the basic flow of the algorithm.

W. Freeman and E. Adelson, The Design and Use of Steerable Filters, IEEE Trans. Patt. Anal. and Machine Intell., Vol. 13, No. 9.
Read pages 1-15.

Left over questions


Calibration question: the optimization is based on gradient descent iterations, which depend on finding a good initial starting guess. How do we scale image derivatives? Great question.
Images exist as brightness values over pixels. What are the units, then, of a simple derivative operator like [-1 0 1]? 1-D image: in the features lecture we only wanted to find edges (identification), but what if we had instead wanted to make measurements? In optical flow, we end up wanting to calculate the velocity v, which is found (in the optical flow lecture) to be the temporal derivative (image difference) It = I(t+1) - I(t), in brightness units, divided by the spatial derivative Ix, in brightness/pixel: vx [pixels] = It / Ix [brightness / (brightness/pixel)]. Oops! Our [-1 0 1] derivative is a factor of 2 too great => NEED TO NORMALIZE: Ix: [-1/2 0 1/2].
[Figure: a 1-D image, brightness vs. pixels.]

Ix: [-1 0 1], the spatial derivative, has units 2*brightness/pixel, since the samples it differences lie 2 pixels apart.

The Sobel operator likewise needs to be normalized by 1/8:

 1/8   2/8   1/8
  0     0     0
-1/8  -2/8  -1/8
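Below is a minimal numpy sketch of the normalization point (not from the slides; scipy's correlate stands in for convolution, and the ramp image is made up): with properly scaled kernels, a ramp of slope 3 brightness/pixel measures exactly 3.

```python
import numpy as np
from scipy.ndimage import correlate

# Central difference: the differenced samples are 2 pixels apart,
# hence the 1/2, giving units of brightness per pixel.
ix_kernel = np.array([-0.5, 0.0, 0.5])

# Sobel y-derivative (y increasing downward), normalized by 1/8.
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]]) / 8.0

# A ramp rising 3 brightness units per pixel...
ramp = 3.0 * np.arange(8)
img = np.tile(ramp[:, None], (1, 8))

# ...should measure a slope of exactly 3 brightness/pixel.
print(np.correlate(ramp, ix_kernel, mode='valid'))  # [3. 3. 3. 3. 3. 3.]
print(correlate(img, sobel_y)[4, 4])                # 3.0
```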

Good Features beat Good Algorithms


For tasks such as recognition, tracking, and segmentation, experience shows: with the right features, all algorithms work well. With the wrong features, good algorithms work marginally better than bad/simple algorithms, but none work well.

Fourier Transform 1
Foundational trick: represent a signal/data in terms of an orthogonal basis. For example, a vector v in 3-space can be represented as a projection onto 3 orthonormal vectors. In the same way, a function can be represented as a point projected into a space of (infinitely many) orthogonal functions. For Fourier transforms, we project a function onto a space of cos and sin functions.

Intuitively, how do we know this sin, cos basis is orthogonal?


Sin and cos periodically spend as much time above as below the axis. If the frequencies are mismatched, the functions cancel each other out when integrated over minus to plus infinity.
Formally, one could use the orthogonality relations, e.g. the integral of sin(mx)·sin(nx) over [-π, π], which is π for m = n and 0 otherwise, to prove this.

* Eqns from Computer Vision IT412

Fourier Transform 2
The continuous Fourier transform is defined as F(u) = ∫ f(x) e^{-i2πux} dx. The inverse transform, f(x) = ∫ F(u) e^{i2πux} du, recovers the signal from its frequency components. In general, the Fourier transform is complex: F(u) = R(u) + i·I(u). The Fourier Spectrum is then |F(u)| = sqrt(R(u)² + I(u)²), and the Phase is φ(u) = atan2(I(u), R(u)). We often view the Power Spectrum, P(u) = |F(u)|².
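A short numpy sketch of these quantities (the test signal and its frequencies are made up for illustration):

```python
import numpy as np

# Compute spectrum, phase, and power spectrum of a 1-D signal.
t = np.arange(256) / 256.0
signal = np.sin(2 * np.pi * 8 * t) + 0.5 * np.cos(2 * np.pi * 21 * t)

F = np.fft.fft(signal)
spectrum = np.abs(F)    # |F(u)| = sqrt(R^2 + I^2)
phase = np.angle(F)     # atan2(I, R)
power = spectrum ** 2   # power spectrum

# Peaks appear at the two component frequencies, 8 and 21 cycles.
print(np.argsort(spectrum[:128])[-2:])  # -> bins 21 and 8 (in some order)
```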

Fourier Properties
The Fourier Transform:
is linear; its spatial scale is inverse to frequency (stretching in space compresses the spectrum); a shift in space becomes a phase change in frequency. Its symmetry for real signals is F(-u) = F*(u),

where * is the complex conjugate.

Convolution Property: convolution in space becomes multiplication in frequency, f * g <=> F·G.

Note that the scale property implies a delta function transforms to a uniform function.

Fourier Discrete (DFT)


Animals and machines live in a discrete world. To move from the continuous Fourier world to its discrete version, we sample: multiplying the signal by an infinite series of delta functions spaced Δ apart convolves its spectrum with an impulse train of uniform spacing 1/Δ, replicating the spectrum periodically.

Fourier Discrete (DFT) 2


All real world signals are band limited. That is, they don't have infinite frequencies nor infinite spatial extent. This is good; otherwise our discrete Fourier copies would collide and alias together. But what if we still sample too seldom? Even band-limited copies will eventually collide.

How do we keep the copies apart? Sample at at least twice the signal's band-limit frequency => the Nyquist criterion:

f_c ≤ 1/(2Δ), where Δ is our sample interval.
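A small numpy sketch of the criterion (the frequencies are made up): a 9 Hz sinusoid sampled below twice its frequency aliases to a lower frequency.

```python
import numpy as np

# 9 Hz signal; 48 Hz sampling is safe, 12 Hz is below 2*9 = 18 Hz.
fs_ok, fs_low, f_sig = 48.0, 12.0, 9.0

t_ok = np.arange(0, 1, 1 / fs_ok)
t_low = np.arange(0, 1, 1 / fs_low)

peak_ok = np.argmax(np.abs(np.fft.rfft(np.sin(2 * np.pi * f_sig * t_ok))))
peak_low = np.argmax(np.abs(np.fft.rfft(np.sin(2 * np.pi * f_sig * t_low))))

print(peak_ok)   # 9  -> frequency recovered correctly
print(peak_low)  # 3  -> 9 Hz folds down to |12 - 9| = 3 Hz
```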

2D DFT
Discrete Fourier Transform (DFT):

F(u,v) = (1/MN) Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} f(x,y) e^{-i2π(ux/M + vy/N)}

Inverse DFT:

f(x,y) = Σ_{u=0}^{M-1} Σ_{v=0}^{N-1} F(u,v) e^{i2π(ux/M + vy/N)}

Optimally implemented on serial machines via the Fast Fourier Transform (FFT); the direct DFT is faster on parallel machines.
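A sketch verifying the double-sum definition against the FFT on a tiny made-up image (note numpy's convention puts the 1/MN factor on the inverse transform rather than the forward one):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((4, 4))
M, N = f.shape

# Explicit (unnormalized) 2D DFT double sum.
F = np.zeros((M, N), dtype=complex)
for u in range(M):
    for v in range(N):
        for x in range(M):
            for y in range(N):
                F[u, v] += f[x, y] * np.exp(-2j * np.pi * (u * x / M + v * y / N))

print(np.allclose(F, np.fft.fft2(f)))  # True
```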

Fourier Examples
[Figure: raw images and their Fourier amplitudes.]
Sinusoid, higher frequency -> DC term + side lobes, wide spacing.

Sinusoid, lower frequency -> DC term + side lobes, close spacing.

Sinusoid, tilted -> tilted spectrum.
Images from Steve Lehar http://cns-alumni.bu.edu/~slehar An Intuitive Explanation of Fourier Theory

More Fourier Examples


Fourier basis element:

e^{i2π(ux+vy)}

(example shows the real part). F_{u,v}(x,y) = const. for (ux+vy) = const. For the vector (u,v): magnitude gives frequency, direction gives orientation.
Slides from Marc Pollefeys, Comp 256 lecture 7

More Fourier Examples


Here u and v are larger than in the previous slide.

Slides from Marc Pollefeys, Comp 256 lecture 7

More Fourier Examples


And larger still...

Slides from Marc Pollefeys, Comp 256 lecture 7

Fourier Filtering

Multiplying by a filter in the frequency domain <=> convolving with the filter in the spatial domain.

[Figure: Fourier amplitude before and after filtering.]
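A sketch of this convolution property (the image size and Gaussian transfer function are assumptions): multiplying in frequency matches circular convolution in space.

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((16, 16))

# Gaussian low-pass transfer function, centered on DC.
u = np.fft.fftfreq(16)
H = np.exp(-(u[:, None]**2 + u[None, :]**2) / (2 * 0.1**2))

filtered = np.real(np.fft.ifft2(np.fft.fft2(img) * H))

# Explicit circular convolution with the spatial kernel ifft2(H).
kernel = np.real(np.fft.ifft2(H))
check = np.zeros_like(img)
for dy in range(16):
    for dx in range(16):
        check += kernel[dy, dx] * np.roll(img, (dy, dx), axis=(0, 1))
print(np.allclose(filtered, check))  # True
```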
Images from Steve Lehar http://cns-alumni.bu.edu/~slehar An Intuitive Explanation of Fourier Theory

Fourier Lens
Remember that the Fourier transform takes delta functions to uniform, and uniform to delta? Well, when focused at infinity (parallel rays to a point), so do lenses!

A lens approximates a Fourier transform, processed at the speed of light.
Figures from Steve Lehar http://cns-alumni.bu.edu/~slehar An Intuitive Explanation of Fourier Theory

Phase Carries More Information


Raw Images:

Magnitude and Phase: reconstruct (inverse FFT), mixing the magnitude image of one with the phase image of the other.

Phase Wins
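A structural sketch of the classic demonstration (random arrays stand in for images here; with real photographs each hybrid resembles the image that donated the phase):

```python
import numpy as np

rng = np.random.default_rng(2)
img_a = rng.random((128, 128))
img_b = rng.random((128, 128))

Fa, Fb = np.fft.fft2(img_a), np.fft.fft2(img_b)

# Magnitude of A with phase of B, and vice versa.
hybrid_ab = np.real(np.fft.ifft2(np.abs(Fa) * np.exp(1j * np.angle(Fb))))
hybrid_ba = np.real(np.fft.ifft2(np.abs(Fb) * np.exp(1j * np.angle(Fa))))
print(hybrid_ab.shape, hybrid_ba.shape)  # (128, 128) (128, 128)
```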

Phase Coherence for Feature Detection?


Note that the Fourier components for a square wave cohere (are in phase) at the step junction: there they must all pass through zero right at the step edge, and achieve local maxima at the corners.

Phase coherence is maximal at the corner points of triangle and trapezoid waves too:

Triangle Wave

Trapezoid Wave

Images: Peter Kovesi, Proc. VIIth Digital Image Computing: Techniques and Applications, Sun C., Talbot H., Ourselin S. and Adriaansen T. (Eds.), 10-12 Dec. 2003, Sydney

Phase Coherence for Feature Detection


Gist of the idea: the Fourier transform yields a series of real and imaginary sinusoidal terms. At any point x, the local Fourier components will each have an amplitude A_n(x) and a phase angle φ_n(x). Vector addition of these terms yields a vector E(x) at the average phase angle.

Morrone defined a measure that is 1 at absolute phase coherence -- everything points in the same direction -- and zero for no phase coherence. Local maxima indicate edges and corners, insensitive to contrast in the image.

In practice, these local components are calculated with Gabor filters at several orientations, which can yield oriented edges and corners; a minimal sketch of the measure follows the reference.

Images: Peter Kovesi, Proc. VIIth Digital Image Computing: Techniques and Applications, Sun C., Talbot H., Ourselin S. and Adriaansen T. (Eds.), 10-12 Dec. 2003, Sydney
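A minimal sketch of the coherence measure, assuming toy amplitudes and phases rather than real Gabor outputs: PC(x) = |E(x)| / Σ_n A_n(x), which is 1 when all phases agree and near 0 when they are scattered, regardless of contrast.

```python
import numpy as np

def phase_coherence(amps, phases):
    # Vector sum of components A_n * exp(i*phi_n), normalized by
    # total amplitude; epsilon avoids 0/0 on empty regions.
    E = np.sum(amps * np.exp(1j * phases))
    return np.abs(E) / (np.sum(amps) + 1e-12)

amps = np.array([1.0, 0.5, 0.25, 0.125])

print(phase_coherence(amps, np.zeros(4)))                    # 1.0: coherent
print(phase_coherence(amps, np.array([0, 1.6, 3.1, 4.7])))   # much lower
```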

Phase Coherence for Feature Detection


Comparison of phase coherence vs. the Harris corner detector. The Harris response varies by 2 or more orders of magnitude -- how do you threshold it? Phase coherence can only vary between 0 and 1 and is not sensitive to contrast or lighting.
Images: Peter Kovesi, Proc. VIIth Digital Image Computing: Techniques and Applications, Sun C., Talbot H., Ourselin S. and Adriaansen T. (Eds.), 10-12 Dec. 2003, Sydney

Gabor filters and Jets


Global information is used for physical systems identification:
e.g., the impulse response of a centrifuge identifies resonance points, which indicate which spin frequencies to avoid.

Local information is used for physical signal analysis.


In images, it is the relationship of details that matters, not (usually) things like average brightness.

In 1946, Gabor suggested representing signals jointly over space/time and frequency in what he called information diagrams. He showed that a Gaussian occupies minimal area in such diagrams. Pure time analysis and pure frequency analysis are the two extremes of such a representation.

Gabor filters and Jets


Gabor filters are formed by modulating a complex sinusoid by a Gaussian function. Gabor filters became popular in vision partly because J. G. Daugman (1980, 88, 90) showed that the receptive fields of most orientation-selective neurons in the (cat's) brain look very much like Gabor functions. As with Gabor filters, the brain often makes use of overcomplete, non-orthogonal functions.
J. G. Daugman, Two dimensional spectral analysis of cortical receptive field profiles, Vision Res., vol. 20, pp. 847-856, 1980. J. Daugman, Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 7, pp. 1169-1179, 1988. Daugman, J. G. (1990) An information-theoretic view of analogue representation in striate cortex, Computational Neuroscience, Ed. Schwartz, E. L., Cambridge, MA: MIT Press, 403-424.

Gabor filters and Jets


A rotated Gaussian modulating an oriented complex sinusoid.

2D Gabor filter:

g(x,y) = exp( -( x'²/(2σ_x²) + y'²/(2σ_y²) ) ) · exp( i2πW x' ),
with x' = x cos θ + y sin θ and y' = -x sin θ + y cos θ,

where σ_x and σ_y control the spatial extent of the filter, θ is the orientation of the filter, and W is the radial frequency of the sinusoid.

Depending on one's task (object ID, texture analysis, tracking, ...) one must then decide what size filters to use, in what orientations, and at what frequencies.

Gabor filters and Jets


In practice, once the scales, orientations and radial frequencies are chosen, one usually sets up filters in quadrature (90° phase shift) pairs -- the cosine (even) and sine (odd) parts of the complex filter -- and just empirically normalizes them so that the response to a uniform background is zero. For quadrature pairs, in practice the center point (p,q) is set to (0,0).

The magnitude response is then calculated as m(x,y) = sqrt( R_even(x,y)² + R_odd(x,y)² ), as in the sketch below.
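A sketch of a Gabor quadrature pair and its magnitude response (the kernel size, isotropic sigma, and parameters are assumptions; the slide's formula allows separate σ_x, σ_y):

```python
import numpy as np

def gabor_pair(size=31, sigma=4.0, theta=0.0, W=0.1):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinates
    gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    even = gauss * np.cos(2 * np.pi * W * xr)    # cosine (even) filter
    odd = gauss * np.sin(2 * np.pi * W * xr)     # sine (odd) filter
    even -= even.mean()  # empirical normalization: zero response to a
    return even, odd     # uniform background (odd already sums to 0)

even, odd = gabor_pair(theta=np.pi / 4)

# Magnitude response at one location, for a patch of matching size.
rng = np.random.default_rng(3)
patch = rng.random(even.shape)
magnitude = np.hypot(np.sum(even * patch), np.sum(odd * patch))
print(magnitude)
```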

Gabor filters and Jets


Von der Malsburg organized Gabor filters at multiple scales and orientations into a vector, or "Jet".

A graph of such Jets (Elastic Graph Matching) has proven to be a good primitive for object recognition.
L. Wiskott, J.-M. Fellous, N. Krüger, C. von der Malsburg, Face Recognition by Elastic Bunch Graph Matching, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19(7), July 1997, pp. 775-779. Image from Laurenz Wiskott, http://itb.biologie.hu-berlin.de/~wiskott/

Gabor filters and Jets Example


Gabor filters used in a training and recognition flow chart:

A Bayes-net facial model is used instead of a Malsburg Elastic Graph Model (EGM), with a pose variable added. Results: BN pose face recognition vs. EGM, by pose.
Gang Song, Tao Wang, Yimin Zhang, Wei Hu, Guangyou Xu, Gary Bradski, Face Modeling and Recognition Using Bayesian Networks, Submitted to CVPR 2004

Scale
3D-to-2D perspective projection gives widely varying scale for the same object, so computer vision needs to address scale. The Gabor discussion above addressed image scale via the sigmas of the modulating Gaussians and the frequency of the complex sinusoid. We can deal with scale directly by repeatedly down-sampling the image to look for coarser and coarser patterns. We call this scale space, or Image Pyramids.

Image Pyramids
Commonly, we down-sample by 2 or sqrt(2); sqrt(2) obviously calls for inter-pixel interpolation.

Gaussian blur => Gaussian Pyramid => Laplacian Pyramid (~ error pyramid: each level is the difference between a Gaussian level and the expanded next-coarser level).

For down-sampling by 2, a typical Gaussian sigma is 1.4; for sqrt(2), sigma is typically sqrt(1.4).

A full power-of-2 pyramid at most doubles the number of pixels to process. A sketch follows.
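A sketch of both pyramids with factor-2 down-sampling (sigma = 1.4 per the slide; the image, depth, and bilinear up-sampling are assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def build_pyramids(img, levels=4, sigma=1.4):
    gaussian, laplacian = [img], []
    for _ in range(levels - 1):
        blurred = gaussian_filter(gaussian[-1], sigma)
        down = blurred[::2, ::2]                      # down-sample by 2
        up = zoom(down, 2, order=1)[:gaussian[-1].shape[0],
                                    :gaussian[-1].shape[1]]
        laplacian.append(gaussian[-1] - up)           # band-pass residual
        gaussian.append(down)
    return gaussian, laplacian

img = np.random.default_rng(4).random((64, 64))
g, l = build_pyramids(img)
print([level.shape for level in g])  # [(64,64), (32,32), (16,16), (8,8)]
```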

Steerability
Bill Freeman, in his 1992 thesis, determined the necessary conditions for steerability -- the ability to synthesize a filter of any orientation from a linear combination of filters at fixed orientations. The simplest example is oriented first-derivative-of-Gaussian filters at 0° and 90°.

Steering eqn: G_θ = cos(θ)·G_0° + sin(θ)·G_90°, e.g. a 30° filter synthesized from the 0° and 90° filter set; the filter set and its responses on a raw image are shown in the reference. A sketch follows.
Taken from: W. Freeman, E. Adelson, The Design and Use of Steerable Filters, IEEE Trans. Patt. Anal. and Machine Intell., vol 13, #9, pp 891-900, Sept 1991
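A sketch of the steering equation for the first-derivative-of-Gaussian pair (kernel size and sigma are assumptions): the steered filter equals the directly rotated one.

```python
import numpy as np

half = 10
y, x = np.mgrid[-half:half + 1, -half:half + 1]
gauss = np.exp(-(x**2 + y**2) / (2 * 3.0**2))

g0 = -x * gauss    # d/dx of Gaussian (0 degree basis)
g90 = -y * gauss   # d/dy of Gaussian (90 degree basis)

theta = np.deg2rad(30)
g30 = np.cos(theta) * g0 + np.sin(theta) * g90  # steered 30 degree filter

# Same result as rotating the derivative direction directly:
direct = -(x * np.cos(theta) + y * np.sin(theta)) * gauss
print(np.allclose(g30, direct))  # True
```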

Steerability
Freeman showed that any band-limited filter can form a steerable basis, with as many basis filters as it has non-zero Fourier coefficients. An important example is the 2nd derivative of Gaussian (~Laplacian):

Taken from: W. Freeman, E. Adelson, The Design and Use of Steerable Filters, IEEE Trans. Patt. Anal. and Machine Intell., vol 13, #9, pp 891-900, Sept 1991

Steerable Pyramid
We may combine steerability with pyramids to get a steerable Laplacian pyramid, as shown below.

Decomposition: a high-pass residual at the top (since the pyramid levels are band pass and the bottom is low pass), oriented band-pass levels in between, and a low-pass band at the bottom. Reconstruction inverts the process.

[Figure: 2-level decomposition of a white-circle example.]
Images from: http://www.cis.upenn.edu/~eero/steerpyr.html

Scale Invariant Feature Transform


The idea is to find local features that stay the same (as much as possible) under:
- Scale change
- 2D rotation in the image x,y plane
- 3D rotation (affine variation)
- Illumination change

Collections of such features can be used for reliable:

- 3D object recognition
- User interfaces, toy interfaces
- Robot localization, navigation and mapping
- Digital image stitching and organization
- 3D scene understanding

Scale Invariant Feature Transform


High Level Algorithm
1. Find peak responses (over scale) in a Laplacian pyramid.
2. Localize each response with sub-pixel accuracy.
3. Only keep corner-like responses.
4. Assign an orientation.
5. Create a recognition signature.
6. Solve for affine parameters (~3D rotation changes).

Scale Invariant Feature Transform


From the Gaussian scale pyramid, create Difference of Gaussian (DoG) images,

and find maximum responses over both space and scale (comparing each sample to its neighbors in the same and adjacent DoG levels), as in the sketch below:
Images from: David G. Lowe, Object recognition from local scale-invariant features, International Conference on Computer Vision, Corfu, Greece (September 1999), pp. 1150-1157
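A sketch of the DoG extrema step (the sigmas and threshold are assumptions, and blurring the full image stands in for a true pyramid): blur at increasing sigmas, difference adjacent blurs, and keep points that are extrema over a 3x3x3 (scale, y, x) neighborhood.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_extrema(img, sigmas=(1.0, 1.4, 2.0, 2.8, 4.0), thresh=0.02):
    blurred = [gaussian_filter(img, s) for s in sigmas]
    dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    mag = np.abs(dog)  # extrema of either sign
    peaks = (mag == maximum_filter(mag, size=3)) & (mag > thresh)
    peaks[[0, -1]] = False        # need neighbors above and below in scale
    return np.argwhere(peaks)     # rows of (dog level, y, x)

img = np.zeros((64, 64))
img[28:36, 28:36] = 1.0           # a bright blob to detect
print(dog_extrema(img)[:5])
```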

Scale Invariant Feature Transform


At the location and scale of each peak found, find the gradient orientation:
Images from: David G. Lowe, Object recognition from local scale-invariant features, International Conference on Computer Vision, Corfu, Greece (September 1999), pp. 1150-1157

Use the gradients to keep only corner-like peaks, in a manner similar to the Harris corner detector. At each peak location and scale, use the gradients to form slip-tolerant orientation-histogram recognition keys:

Scale Invariant Feature Transform


To account for out-of-image-plane (3D) rotation, solve for affine distortion parameters. For the features found, set up a system of equations

which takes the form A x = b. The over-determined (least squares) solution is then:

x = (AᵀA)⁻¹ Aᵀ b

A sketch follows the reference.

Eqns from: David G. Lowe, Object recognition from local scale-invariant features, International Conference on Computer Vision, Corfu, Greece (September 1999), pp. 1150-1157
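A sketch of the least-squares fit (the matches are made up; each keypoint pair (x,y) -> (u,v) contributes two rows, with unknowns [m1, m2, m3, m4, tx, ty]):

```python
import numpy as np

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
m = np.array([[1.1, -0.2], [0.3, 0.9]])   # true affine matrix (assumed)
t = np.array([2.0, 1.0])                  # true translation (assumed)

A = np.zeros((2 * len(pts), 6))
b = np.zeros(2 * len(pts))
for i, (x, y) in enumerate(pts):
    A[2 * i] = [x, y, 0, 0, 1, 0]
    A[2 * i + 1] = [0, 0, x, y, 0, 1]
    b[2 * i], b[2 * i + 1] = m @ [x, y] + t  # matched location (u, v)

# Over-determined least-squares solution x = (A^T A)^-1 A^T b:
params, *_ = np.linalg.lstsq(A, b, rcond=None)
print(params)  # [ 1.1 -0.2  0.3  0.9  2.   1. ]
```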

Scale Invariant Feature Transform


Recognition example: learn models of SIFT features, taking the object outline from background subtraction.
Images from: David Lowe, Object Recognition from Local Scale-Invariant Features, Proc. of the International Conference on Computer Vision, Corfu (Sept. 1999)

Objects may then be found under occlusion and 3D rotation.

Scale Invariant Feature Transform


Image stitching example: attach images together from keypoints by solving the homography between them; find similar images in a roll and stitch them.
Images from: M. Brown and D. G. Lowe, Recognising Panoramas, In Proceedings of the 9th International Conference on Computer Vision (ICCV 2003)

Scale Invariant Feature Transform


Localizing example: given key images, find and trigger on them [1]. Find different views of the same scene in video [2].

1) David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, Submitted to International Journal of Computer Vision. Version date: June 2003
2) Josef Sivic and Andrew Zisserman, Video Google: A Text Retrieval Approach to Object Matching in Videos, ICCV 2003

Log-Polar Transform
Go from Euclidean (x,y) space to log-polar space: writing a point as r·e^{iθ}, log(r·e^{iθ}) = log r + iθ maps it to (log r, θ) space. The log-polar transform is always done relative to a chosen center point (xc, yc).
1) Images, further advances in: George Wolberg, Siavash Zokai, Robust Image Registration Using Log-Polar Transform, ICIP 2000

[Figure: two images and their log-polar transforms about (xc, yc), with axes θ and log r.]

Rotation and scale are converted to shifts along the θ or log r axis. Shifting back to a canonical location gives rotation and scale invariance. If used on a Fourier magnitude image (translation invariant), we get rotation, scale and translation invariance (called the Fourier-Mellin transform) [1]. A sketch follows.
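A sketch of the resampling (grid sizes and nearest-neighbor sampling are assumptions): rotation becomes a shift along the theta axis, scaling a shift along the log r axis.

```python
import numpy as np

def log_polar(img, xc, yc, n_r=64, n_theta=64):
    r_max = np.hypot(*img.shape) / 2
    log_r = np.linspace(0, np.log(r_max), n_r)
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(np.exp(log_r), theta, indexing='ij')
    # Nearest-neighbor sample back into the Cartesian image.
    x = np.clip((xc + rr * np.cos(tt)).astype(int), 0, img.shape[1] - 1)
    y = np.clip((yc + rr * np.sin(tt)).astype(int), 0, img.shape[0] - 1)
    return img[y, x]   # rows: log r, columns: theta

img = np.random.default_rng(5).random((128, 128))
print(log_polar(img, 64, 64).shape)  # (64, 64)
```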

Bilateral Filtering
We want smoothing that preserves edges.
This is typically done via Perona and Malik's anisotropic diffusion. More clever is the Tomasi and Manduchi* approximation: rather than just convolving with a Gaussian in space, the convolution weights combine a Gaussian in space with a Gaussian in gray-level values, so pixels across an edge get little weight. A sketch follows the reference.
* C. Tomasi and R. Manduchi, "Bilateral Filtering for Gray and Color Images", Proceedings of the 1998 IEEE International Conference on Computer Vision, Bombay, India
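A sketch of the bilateral weighting at a single pixel (the sigmas, radius, and step-edge image are assumptions):

```python
import numpy as np

def bilateral_pixel(img, y, x, radius=3, sigma_s=2.0, sigma_r=0.1):
    ys = slice(y - radius, y + radius + 1)
    xs = slice(x - radius, x + radius + 1)
    patch = img[ys, xs]
    dy, dx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    w_space = np.exp(-(dx**2 + dy**2) / (2 * sigma_s**2))       # spatial
    w_range = np.exp(-(patch - img[y, x])**2 / (2 * sigma_r**2))  # gray level
    w = w_space * w_range
    return np.sum(w * patch) / np.sum(w)

# A step edge stays sharp: each side keeps its own value.
img = np.hstack([np.zeros((16, 8)), np.ones((16, 8))])
print(bilateral_pixel(img, 8, 7), bilateral_pixel(img, 8, 8))
# ~0.0 on the dark side, ~1.0 on the bright side
```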

But Bio-Vision is more dynamic


Artifacts of a competitive edge/diffusion process: the Neon Color Spreading illusion.

The best explanation is Grossberg and Mingolla's: edge detectors need to be shut off, which is performed by competitive inhibition. When weaker edges meet stronger ones, the weaker edge is suppressed, breaking the dikes that hold back the diffusion process. When the edges are disconnected, the illusion goes away or is diminished, as below:
Grossberg, S., & Mingolla, E. (1985). Neural Dynamics of Form Perception: Boundary Completion. Psychol. Rev., 92, 173--211.

Local vs. Global


Still, vision is a stranger thing than simple processing:

[Figure: visual illusions.]

Computer vision often misses the fact that vision is an active sense.
These lines are straight. Nothing is moving here.
