: Advanced Features
Sebastian Thrun Gary Bradski
http://robots.stanford.edu/cs223b/index.html
Readings
This lecture is in 2 separate parts: A - Fourier, Gabor, SIFT and B - Texture and other operators. B is optional due to time limitations, but good to look through nevertheless.
Read:
Computer Vision, Forsyth & Ponce, Chapters 7 and (optional, for texture) 9, but do it lightly, just for the gist.
David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, IJCV04. Just read/take notes on the basic flow of the algorithm.
W. Freeman and E. Adelson, The Design and Use of Steerable Filters, IEEE Trans. Patt. Anal. and Machine Intell., Vol. 13, No. 9. Read pages 1-15.
Pixels
Ix: [-1 1],
0 0 0
Fourier Transform 1
Foundational trick: represent a signal/data in terms of an orthogonal basis. For example, a vector v in 3-space can be represented by its projections onto 3 orthonormal vectors: $v = (v \cdot e_1) e_1 + (v \cdot e_2) e_2 + (v \cdot e_3) e_3$.
In the same way, a function can be represented as a point projected into a space of (infinitely many) orthogonal functions. For Fourier transforms, we project a function into a space of cos and sin functions.
Fourier Transform 2
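The continuous Fourier transform is defined as $F(u) = \int_{-\infty}^{\infty} f(x) \, e^{-i 2\pi u x} \, dx$.
The inverse transform gets back $f(x)$ from its frequency components: $f(x) = \int_{-\infty}^{\infty} F(u) \, e^{i 2\pi u x} \, du$.
In general, the Fourier transform is complex: $F(u) = R(u) + i I(u)$.
The Fourier spectrum is then $|F(u)| = \sqrt{R^2(u) + I^2(u)}$.
The phase is then $\phi(u) = \tan^{-1}\left( I(u) / R(u) \right)$.
We often view the power spectrum $P(u) = |F(u)|^2 = R^2(u) + I^2(u)$.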
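A minimal sketch of the spectrum, phase, and power spectrum, using NumPy's discrete FFT as a stand-in for the continuous transform. The function name `spectrum_phase_power` and the 64-sample test cosine are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def spectrum_phase_power(signal):
    """Return the complex transform plus its spectrum, phase and power."""
    F = np.fft.fft(signal)        # complex: R(u) + i I(u)
    spectrum = np.abs(F)          # sqrt(R^2 + I^2)
    phase = np.angle(F)           # atan2(I, R)
    power = spectrum ** 2         # R^2 + I^2
    return F, spectrum, phase, power

# Hypothetical test signal: a cosine with 4 cycles over 64 samples.
n = np.arange(64)
signal = np.cos(2 * np.pi * 4 * n / 64)
F, spectrum, phase, power = spectrum_phase_power(signal)
peak_bin = int(np.argmax(spectrum[:32]))   # strongest non-negative frequency
```

For this cosine the energy sits in bin 4 and its mirror bin 60, as the symmetry of a real signal requires.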
Fourier Properties
Fourier Transform:
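Is linear: $a f(x) + b g(x) \leftrightarrow a F(u) + b G(u)$
Its spatial scale is inverse to frequency: $f(ax) \leftrightarrow \frac{1}{|a|} F(u/a)$
Shift goes to phase change: $f(x - x_0) \leftrightarrow e^{-i 2\pi u x_0} F(u)$
Fourier transform symmetries: for real-valued $f(x)$, $F(-u) = F^*(u)$, so the magnitude is even and the phase is odd.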
Convolution Property
How do we keep the copies apart? Sample at at least twice the signal's band-limit frequency => Nyquist Criterion
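A small sketch of what happens when the Nyquist criterion is violated. The 7 Hz tone and the two sampling rates are made-up numbers for illustration; `dominant_frequency` is a hypothetical helper.

```python
import numpy as np

def dominant_frequency(f_signal, f_sample, duration=10.0):
    """Sample a sine of f_signal Hz at f_sample Hz; return the strongest frequency seen."""
    n = int(round(f_sample * duration))
    t = np.arange(n) / f_sample
    x = np.sin(2 * np.pi * f_signal * t)
    amplitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(n, d=1.0 / f_sample)
    return freqs[np.argmax(amplitude)]

well_sampled = dominant_frequency(7.0, 20.0)   # 20 Hz > 2 * 7 Hz: tone is preserved
aliased = dominant_frequency(7.0, 10.0)        # 10 Hz < 2 * 7 Hz: copies overlap
```

In the undersampled case the 7 Hz tone folds back to 10 - 7 = 3 Hz, which is the overlap of spectral copies that the criterion is meant to prevent.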
2D DFT
Discrete Fourier Transform (DFT)
Inverse DFT
On serial machines, the DFT is optimally implemented via the Fast Fourier Transform (FFT); on parallel machines, the direct DFT is faster.
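A minimal sketch of the direct 2D DFT and its inverse written as matrix products, checked against NumPy's FFT. The normalization (no factor on the forward transform, 1/(MN) on the inverse) is NumPy's convention and an assumption here; the slide's own equations may place the factor differently.

```python
import numpy as np

def dft_matrix(n, sign):
    """W[u, x] = exp(sign * i * 2*pi * u * x / n)."""
    k = np.arange(n)
    return np.exp(sign * 2j * np.pi * np.outer(k, k) / n)

def dft2_direct(f):
    """F(u,v) = sum_x sum_y f(x,y) exp(-i 2 pi (u x / M + v y / N))."""
    M, N = f.shape
    return dft_matrix(M, -1) @ f @ dft_matrix(N, -1)

def idft2_direct(F):
    """Inverse of dft2_direct, with the 1/(M N) factor on this side."""
    M, N = F.shape
    return (dft_matrix(M, +1) @ F @ dft_matrix(N, +1)) / (M * N)

rng = np.random.default_rng(0)
image = rng.random((8, 6))
F_direct = dft2_direct(image)
F_fast = np.fft.fft2(image)
round_trip = idft2_direct(F_direct)
```

The direct form costs O(N^2) per row or column where the FFT costs O(N log N), but each output coefficient is an independent sum, which is what makes it attractive on parallel hardware.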
Fourier Examples
Raw image: sinusoid, higher frequency. Fourier amplitude: DC term + side lobes, wide spacing.
Raw image: sinusoid, tilted. Fourier amplitude: tilted spectrum.
Images from Steve Lehar http://cns-alumni.bu.edu/~slehar An Intuitive Explanation of Fourier Theory
Basis function $F_{u,v}(x,y) = e^{i 2\pi (ux + vy)}$; example, real part. $F_{u,v}(x,y)$ = const. for $(ux + vy)$ = const. Vector $(u,v)$: magnitude gives frequency, direction gives orientation.
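A short sketch of how the vector (u, v) shows up in the spectrum. A plane wave is generated and the peak of its Fourier amplitude is read back. Frequencies are in cycles per image width, and `plane_wave` and `peak_frequency` are hypothetical helper names.

```python
import numpy as np

def plane_wave(u, v, size=64):
    """Real part of the basis function: cos(2*pi*(u*x + v*y)/size)."""
    y, x = np.mgrid[0:size, 0:size]
    return np.cos(2 * np.pi * (u * x + v * y) / size)

def peak_frequency(image):
    """Return the signed (u, v) of the strongest non-DC Fourier coefficient."""
    size = image.shape[0]
    amplitude = np.abs(np.fft.fft2(image))
    amplitude[0, 0] = 0.0                       # ignore the DC term
    row, col = np.unravel_index(np.argmax(amplitude), amplitude.shape)
    v = int(row) if row < size // 2 else int(row) - size
    u = int(col) if col < size // 2 else int(col) - size
    return u, v

u, v = peak_frequency(plane_wave(3, 4))
frequency = np.hypot(u, v)                      # magnitude of (u, v)
orientation = np.arctan2(v, u) % np.pi          # direction, modulo 180 degrees
```

A real-valued wave has two mirror peaks, at (u, v) and (-u, -v), so the orientation is only defined modulo 180 degrees.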
Slides from Marc Pollefeys, Comp 256 lecture 7
Fourier Filtering
Multiply by a filter in the frequency domain => convolve with the filter in the spatial domain.
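A minimal numerical check of this equivalence. With the DFT the equivalence holds for circular (wrap-around) convolution, which is the assumption made here. The 3x3 box filter and the random image are illustrative.

```python
import numpy as np

def circular_convolve2d(image, kernel):
    """Direct circular convolution; kernel is zero-padded to the image shape."""
    out = np.zeros_like(image, dtype=float)
    rows, cols = image.shape
    for dy in range(rows):
        for dx in range(cols):
            if kernel[dy, dx] != 0:
                # np.roll gives rolled[y, x] = image[y - dy, x - dx] (with wrap-around)
                out += kernel[dy, dx] * np.roll(image, shift=(dy, dx), axis=(0, 1))
    return out

def fourier_filter(image, kernel):
    """Same filtering done as a product in the frequency domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)))

rng = np.random.default_rng(1)
image = rng.random((16, 16))
kernel = np.zeros((16, 16))
kernel[:3, :3] = 1.0 / 9.0                     # 3x3 box blur
spatial = circular_convolve2d(image, kernel)
spectral = fourier_filter(image, kernel)
```

To reproduce ordinary (non-circular) convolution this way, both arrays would first need zero-padding to at least the sum of their sizes minus one.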
Fourier Amplitude
Fourier Lens
Remember that Fourier transform takes delta functions to uniform, and uniform to delta? Well, when focused at infinity (parallel rays to a point), so do lenses!
Figures from Steve Lehar http://cns-alumni.bu.edu/~slehar An Intuitive Explanation of Fourier Theory
Magnitude and Phase: Reconstruct (inverse FFT) mixing the magnitude and phase images
Phase Wins
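A sketch of the mixing experiment, assuming two grayscale images of the same shape held as NumPy arrays. `mix_magnitude_and_phase` is a hypothetical name, and the random arrays below are placeholders for real photographs.

```python
import numpy as np

def mix_magnitude_and_phase(magnitude_source, phase_source):
    """Inverse FFT of |F(magnitude_source)| combined with the phase of F(phase_source)."""
    magnitude = np.abs(np.fft.fft2(magnitude_source))
    phase = np.angle(np.fft.fft2(phase_source))
    hybrid = magnitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(hybrid))

rng = np.random.default_rng(2)
image_a = rng.random((32, 32))    # placeholder for the first photograph
image_b = rng.random((32, 32))    # placeholder for the second photograph
a_mag_b_phase = mix_magnitude_and_phase(image_a, image_b)
same_image = mix_magnitude_and_phase(image_a, image_a)   # sanity check: gives image_a back
```

With real photographs, the hybrid is typically recognizable as the image that supplied the phase, which is the point of the slide.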
Phase coherence is maximal at corner points of triangle and trapezoid waves too
Triangle Wave
Trapezoid Wave
Images: Peter Kovesi, Proc. VIIth Digital Image Computing: Techniques and Applications, Sun C., Talbot H., Ourselin S. and Adriaansen T. (Eds.), 10-12 Dec. 2003, Sydney
Morrone defined a measure that is 1 at absolute phase coherence (everything points in the same direction) and 0 when there is no phase coherence. Local maxima indicate edges and corners, and the measure is insensitive to contrast in the image.
In practice, these local components are calculated with Gabor filters at several orientations that can yield oriented edges and corners.
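A minimal 1D sketch of such a measure. It assumes the commonly quoted ratio form, the magnitude of the summed component vectors divided by the sum of their amplitudes, since the slide's own formula is not reproduced here. The components come from the Fourier series of a square wave rather than from Gabor filters.

```python
import numpy as np

def phase_congruency(amplitudes, phases):
    """|sum_n A_n exp(i phi_n)| / sum_n A_n: 1 if all phases agree, near 0 if they cancel."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    phases = np.asarray(phases, dtype=float)
    energy = np.abs(np.sum(amplitudes * np.exp(1j * phases)))
    total = np.sum(amplitudes)
    return energy / total if total > 0 else 0.0

def square_wave_components(x, n_terms=50):
    """Local amplitude and phase of the terms (1/k) sin(k x), k odd, at position x."""
    k = 2 * np.arange(n_terms) + 1
    return 1.0 / k, k * x - np.pi / 2          # sin(kx) = cos(kx - pi/2)

pc_at_step = phase_congruency(*square_wave_components(0.0))           # on the edge
pc_on_plateau = phase_congruency(*square_wave_components(np.pi / 2))  # mid-plateau
```

Scaling all amplitudes by a constant leaves the ratio unchanged, which is why the measure ignores contrast.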
In 1946, Gabor suggested representing signals jointly over time and frequency, in what he called information diagrams. He showed that a Gaussian occupies minimal area in such diagrams. Time analysis and frequency analysis are the two extremes of such an analysis.
2D Gabor filter:
Depending on one's task (object ID, texture analysis, tracking, ...), one must then decide what size filters, in what orientations, and at what frequencies to use.
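A sketch of one common form of the 2D Gabor filter: an isotropic Gaussian envelope multiplied by a complex sinusoid along direction theta. The slide's exact parameterization is not reproduced here, so the form and the parameter names (`size`, `sigma`, `frequency`, `theta`) are assumptions.

```python
import numpy as np

def gabor_kernel(size, sigma, frequency, theta):
    """Complex Gabor kernel; size should be odd, frequency is in cycles per pixel."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_theta = x * np.cos(theta) + y * np.sin(theta)        # coordinate along the wave
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(2j * np.pi * frequency * x_theta)
    return envelope * carrier

# A small hypothetical bank: 4 orientations at 2 frequencies.
bank = [gabor_kernel(21, 4.0, f, t)
        for f in (0.1, 0.2)
        for t in np.arange(4) * np.pi / 4]
```

The real part is an even (bar-like) detector and the imaginary part an odd (edge-like) detector. The size, orientation, and frequency choices in the bank are exactly the task-dependent decisions described above.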
A graph of such Jets (Elastic Graph Matching) has proven to be a good primitive for object recognition.
L. Wiskott, J-M. Fellous, N. Krüger, C. von der Malsburg, Face Recognition by Elastic Bunch Graph Matching, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19(7), July 1997, pp. 775-779.
Image from Laurenz Wiskott, http://itb.biologie.hu-berlin.de/~wiskott/
Bayes net facial model instead of a Malsburg Elastic Graph Model (EGM), with a pose variable added. Results: BN pose face recognition vs. EGM.
Pose
Gang Song, Tao Wang, Yimin Zhang, Wei Hu, Guangyou Xu, Gary Bradski, Face Modeling and Recognition Using Bayesian Networks, Submitted to CVPR 2004
Scale
3D to 2D perspective projections give widely varying scale for the same object, so computer vision needs to address scale. The Gabor discussion above addressed image scale via the sigma of the modulating Gaussians and the frequency of the complex sinusoid. We can deal with scale directly by repeatedly down-sampling the image to look for coarser and coarser patterns. We call this scale space, or image pyramids.
Image Pyramids
Commonly, we down-sample by 2 or sqrt(2). Down-sampling by sqrt(2) calls for inter-pixel interpolation.
Gaussian blur
Gaussian Pyramid
Laplacian Pyramid
For down-sampling by 2, a typical Gaussian sigma is 1.4. For sqrt(2), sigma is typically sqrt(1.4).
Laplacian Pyramid ~ Error Pyramid
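A minimal NumPy-only sketch of Gaussian and Laplacian pyramids for down-sampling by 2 with sigma = 1.4. The nearest-neighbour `upsample` is a simplification chosen for brevity, and all function names are hypothetical. Each Laplacian level is the error between a Gaussian level and the expanded next-coarser level.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with reflected borders (image must exceed the radius)."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def upsample(image, shape):
    """Nearest-neighbour expansion by 2, cropped to the target shape."""
    up = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def gaussian_pyramid(image, levels, sigma=1.4):
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(gaussian_blur(pyramid[-1], sigma)[::2, ::2])
    return pyramid

def laplacian_pyramid(image, levels, sigma=1.4):
    gauss = gaussian_pyramid(image, levels, sigma)
    bands = [g - upsample(g_next, g.shape) for g, g_next in zip(gauss[:-1], gauss[1:])]
    return bands + [gauss[-1]]                  # keep the coarsest low-pass image

def reconstruct(laplacian):
    image = laplacian[-1]
    for band in reversed(laplacian[:-1]):
        image = band + upsample(image, band.shape)
    return image

rng = np.random.default_rng(3)
image = rng.random((64, 64))
gauss = gaussian_pyramid(image, 3)
lap = laplacian_pyramid(image, 3)
restored = reconstruct(lap)
```

Because each band stores exactly what the coarser level fails to predict, adding the bands back in order recovers the original image.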
Steerability
Bill Freeman, in his 1992 thesis, determined the necessary conditions for steerability: the ability to synthesize a filter of any orientation from a linear combination of filters at fixed orientations. The simplest example of this is oriented first-derivative-of-Gaussian filters at 0° and 90°:
Filter Set:
Response:
Raw Image
Taken from: W. Freeman, T. Adelson, The Design and Use of Steerable Filters, IEEE Trans. Patt. Anal. and Machine Intell., vol 13, #9, pp 891-900, Sept 1991
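A minimal sketch of steering the first derivative of a Gaussian: the filter at angle theta is cos(theta) times the 0° filter plus sin(theta) times the 90° filter. The kernel size, sigma, and normalization are illustrative assumptions. The check compares the steered kernel with one computed directly along the rotated axis.

```python
import numpy as np

def g1_basis(size=21, sigma=3.0):
    """First-derivative-of-Gaussian kernels at 0 and 90 degrees, plus the grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return x, y, -x / sigma ** 2 * g, -y / sigma ** 2 * g

def steer_g1(g1_0, g1_90, theta):
    """Synthesize the filter at angle theta from the two basis filters."""
    return np.cos(theta) * g1_0 + np.sin(theta) * g1_90

def g1_direct(x, y, theta, sigma=3.0):
    """The same filter computed directly along the rotated axis, for comparison."""
    x_theta = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return -x_theta / sigma ** 2 * g

x, y, g1_0, g1_90 = g1_basis()
theta = 0.7
steered = steer_g1(g1_0, g1_90, theta)
direct = g1_direct(x, y, theta)
```

Since convolution is linear, the same weights steer the filter responses: filter the image once with each basis kernel, then combine the two response images for any angle.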
Steerability
Freeman showed that any band-limited signal could form a steerable basis with as many basis filters as it had non-zero Fourier coefficients. An important example is the 2nd derivative of Gaussian (~Laplacian):
Taken from: W. Freeman, T. Adelson, The Design and Use of Steerable Filters, IEEE Trans. Patt. Anal. and Machine Intell., vol 13, #9, pp 891-900, Sept 1991
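A sketch of steering the second derivative of a Gaussian from three basis filters, using the directional second derivative identity with weights cos², 2·cos·sin, and sin². The paper's own basis filters and interpolation functions may differ in normalization and sign convention, so the basis here (G_xx, G_xy, G_yy) is an assumption.

```python
import numpy as np

def g2_basis(size=25, sigma=3.0):
    """Second partial derivatives of a Gaussian: G_xx, G_xy, G_yy, plus the grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    g_xx = (x ** 2 / sigma ** 4 - 1.0 / sigma ** 2) * g
    g_xy = (x * y / sigma ** 4) * g
    g_yy = (y ** 2 / sigma ** 4 - 1.0 / sigma ** 2) * g
    return x, y, g_xx, g_xy, g_yy

def steer_g2(g_xx, g_xy, g_yy, theta):
    """Second derivative along direction theta as a weighted sum of three basis filters."""
    c, s = np.cos(theta), np.sin(theta)
    return c * c * g_xx + 2.0 * c * s * g_xy + s * s * g_yy

def g2_direct(x, y, theta, sigma=3.0):
    """The same filter computed directly along the rotated axis, for comparison."""
    x_theta = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return (x_theta ** 2 / sigma ** 4 - 1.0 / sigma ** 2) * g

x, y, g_xx, g_xy, g_yy = g2_basis()
theta = 1.1
steered = steer_g2(g_xx, g_xy, g_yy, theta)
direct = g2_direct(x, y, theta)
```

Three basis filters are needed here, against two for the first derivative.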
Steerable Pyramid
We may combine steerability with pyramids to get a Steerable Laplacian Pyramid, as shown below.
Decomposition
High pass at the top, band pass within the pyramid, low pass at the bottom.
Reconstruction
Low Pass
Images from: http://www.cis.upenn.edu/~eero/steerpyr.html
Images from: David G. Lowe, Object recognition from local scale-invariant features, International Conference on Computer Vision, Corfu, Greece (September 1999), pp. 1150-1157
Use the gradients to keep only corner-like peaks, in a manner similar to the Harris corner detector.
At each peak location and scale, use gradients to form slip-tolerant orientation histogram recognition keys:
Eqns from: David G. Lowe, Object recognition from local scale-invariant features, International Conference on Computer Vision, Corfu, Greece (September 1999), pp. 1150-1157
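A simplified sketch of a gradient orientation histogram over a patch, weighted by gradient magnitude. It illustrates the idea only and is not Lowe's full key construction. The bin count of 36 and the function name are assumptions.

```python
import numpy as np

def orientation_histogram(patch, n_bins=36):
    """Histogram of gradient directions in a patch, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))          # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)           # direction in [0, 2*pi)
    bins = np.floor(angle / (2 * np.pi) * n_bins).astype(int) % n_bins
    return np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)

# Hypothetical patches: intensity ramps along x and along y.
ramp_x = np.tile(np.arange(16, dtype=float), (16, 1))
ramp_y = ramp_x.T
hist_x = orientation_histogram(ramp_x)
hist_y = orientation_histogram(ramp_y)
```

Because every pixel votes into a coarse angular bin regardless of its exact position in the patch, small shifts of the patch change the histogram only slightly, which is the slip tolerance referred to above.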
1) David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, Submitted to International Journal of Computer Vision. Version date: June 2003
2) Josef Sivic and Andrew Zisserman, Video Google: A Text Retrieval Approach to Object Matching in Videos, ICCV 2003
Log-Polar Transform
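Go from Euclidean (x, y) to log-polar space: $\log(r e^{i\theta}) = \log r + i\theta \Rightarrow (\log r, \theta)$ space. The log-polar transform is always done relative to a chosen center point $(x_c, y_c)$: $r = \sqrt{(x - x_c)^2 + (y - y_c)^2}$, $\theta = \operatorname{atan2}(y - y_c, \, x - x_c)$.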
1) Images, further advances in: George Wolberg, Siavash Zokai, ROBUST IMAGE REGISTRATION USING LOG-POLAR TRANSFORM, ICIP 2000
[Figure: two images with center (xc, yc), radius r, and angle θ marked, each paired with its Log-Polar image (axis: log r).]
Rotation and scale are converted to shifts along the θ or log r axis. Shifting back to a canonical location gives rotation and scale invariance. If used on a Fourier magnitude image (which is translation invariant), we get rotation, scale, and translation invariance (called the Fourier-Mellin transform) [1].
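A small coordinate-level sketch of why scaling and rotation about the center become shifts. `to_log_polar` is a hypothetical helper, and the point, scale factor, and rotation angle are made-up values.

```python
import numpy as np

def to_log_polar(x, y, xc=0.0, yc=0.0):
    """Map (x, y) to (log r, theta) relative to the center (xc, yc)."""
    dx, dy = x - xc, y - yc
    return np.log(np.hypot(dx, dy)), np.arctan2(dy, dx)

x, y = 3.0, 4.0
log_r, theta = to_log_polar(x, y)

# Scale the point by 2 about the center: log r shifts by log 2, theta is unchanged.
log_r_scaled, theta_scaled = to_log_polar(2.0 * x, 2.0 * y)

# Rotate the point by 0.3 rad about the center: theta shifts by 0.3, log r is unchanged.
a = 0.3
x_rot = x * np.cos(a) - y * np.sin(a)
y_rot = x * np.sin(a) + y * np.cos(a)
log_r_rot, theta_rot = to_log_polar(x_rot, y_rot)
```

Resampling a whole image on a (log r, θ) grid applies this mapping to every pixel, so a scaled or rotated image turns into a translated log-polar image.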
Bilateral Filtering
We want smoothing that preserves edges.
Typically done via P. Perona and J. Malik's anisotropic diffusion. More clever is the Tomasi and Manduchi* approximation: rather than just convolving with a Gaussian in space, the convolution weights combine a Gaussian in space with a Gaussian in gray-level values.
* C. Tomasi and R. Manduchi, "Bilateral Filtering for Gray and Color Images", Proceedings of the 1998 IEEE International Conference on Computer Vision, Bombay, India
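A brute-force sketch of this weighting for a grayscale image: each output pixel is a normalized average of its neighbours, weighted by a Gaussian in spatial distance times a Gaussian in gray-level difference. The parameter names and defaults (`sigma_space`, `sigma_range`) are assumptions, and the loop is written for clarity, not speed.

```python
import numpy as np

def bilateral_filter(image, sigma_space=2.0, sigma_range=0.1):
    """Edge-preserving smoothing of a 2D grayscale image with values roughly in [0, 1]."""
    image = image.astype(float)
    radius = int(np.ceil(3 * sigma_space))
    padded = np.pad(image, radius, mode="edge")
    rows, cols = image.shape
    weighted_sum = np.zeros_like(image)
    weight_total = np.zeros_like(image)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            neighbor = padded[radius + dy:radius + dy + rows,
                              radius + dx:radius + dx + cols]
            w_space = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma_space ** 2))
            w_range = np.exp(-(neighbor - image) ** 2 / (2.0 * sigma_range ** 2))
            weighted_sum += w_space * w_range * neighbor
            weight_total += w_space * w_range
    return weighted_sum / weight_total

# Hypothetical test image: a clean vertical step edge.
step = np.zeros((16, 16))
step[:, 8:] = 1.0
filtered_step = bilateral_filter(step)
```

Across the step the gray-level difference is large, so the range weight is close to zero and the two sides do not mix. Within a flat region the range weight is close to 1 and the filter behaves like an ordinary Gaussian blur.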
The best explanation is Grossberg and Mingolla's: edge detectors need to be shut off, which is performed by competitive inhibition. When a weaker edge meets a stronger one, the weaker edge is suppressed, breaking the dikes that hold back the diffusion process. When the edges are disconnected, the illusion goes away or is diminished, as below:
Grossberg, S., & Mingolla, E. (1985). Neural Dynamics of Form Perception: Boundary Completion. Psychol. Rev., 92, 173-211.
Computer vision often misses the fact that vision is an active sense.
These lines are straight. Nothing is moving here.