https://sites.google.com/site/deeplearningcvpr2014
Tutorial Overview
Basics
Introduction
Supervised Learning
Unsupervised Learning
- Honglak Lee
- Marc'Aurelio Ranzato
- Graham Taylor
Libraries
Torch7
Theano/Pylearn2
CAFFE
- Marc'Aurelio Ranzato
- Ian Goodfellow
- Yangqing Jia
Advanced topics
Object detection
Regression methods for localization
Large scale classification and GPU parallelization
Learning transformations from videos
Multimodal and multi task learning
Structured prediction
- Pierre Sermanet
- Alex Toshev
- Alex Krizhevsky
- Roland Memisevic
- Honglak Lee
- Yann LeCun
Traditional recognition pipeline:
Image → feature representation (hand-crafted) → learning algorithm (e.g., SVM)
Low-level vision features (edges, SIFT, HOG, etc.) → object detection / classification
SIFT
HoG
Spin image
Textons
Motivation
Features are key to recent progress in recognition
Multitude of hand-designed features currently in use
Where next? Better classifiers? Building better features?
Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2010
Slide: R. Fergus
Mid-Level Representations
Mid-level cues: continuation, parallelism, junctions, corners
Object parts
Feature hierarchy: Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier
Learned features: pixels → 1st layer (edges) → 2nd layer (object parts) → 3rd layer (objects)
Unsupervised Learning
Learn statistical structure or dependencies of the data from unlabeled data
Layer-wise training
Useful when the amount of labels is not large
Model landscape (shallow vs. deep, supervised vs. unsupervised):
Supervised, shallow: Perceptron, Logistic Regression
Supervised, deep: Neural Net
Unsupervised: Sparse coding*, Denoising Autoencoder
Supervised Learning
Slide: R. Fergus
Feed-forward: convolve input → non-linearity (rectified linear) → pooling (local max)
Supervised: train convolutional filters by back-propagating classification error
Layer stack (bottom to top): Convolution (learned) → Non-linearity → Pooling
Slide: R. Fergus
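The three feed-forward stages above (convolution, rectified-linear non-linearity, local max pooling) can be sketched in a few lines. This is an illustrative single-channel NumPy sketch, not any particular library's API; the image size and the edge filter are made up.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (really cross-correlation, as in most convnets)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # Rectified linear non-linearity
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Local max pooling over non-overlapping size x size regions."""
    h = x.shape[0] // size * size
    w = x.shape[1] // size * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# One feed-forward stage: convolve -> rectify -> pool
image = np.random.RandomState(0).randn(8, 8)
kernel = np.array([[1., 0., -1.]] * 3)   # crude vertical-edge filter
feature_map = max_pool(relu(conv2d_valid(image, kernel)))
```

An 8x8 input with a 3x3 filter gives a 6x6 response map, which 2x2 pooling shrinks to 3x3; in the supervised setting, the kernel values would be learned by back-propagation instead of being hand-set.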
One stage: filter with dictionary (convolutional or tiled) + non-linearity → spatial/feature pooling (sum or max) → [optional] normalization between feature responses → output features
Slide: R. Fergus
Filtering (convolutional): input → feature map
Slide: R. Fergus
Non-Linearity
Per-element (independent):
Tanh
Sigmoid: 1/(1+exp(-x))
Rectified linear: max(0, x)
- Simplifies backprop
- Makes learning faster
- Avoids saturation issues
- Preferred option
Slide: R. Fergus
Pooling
Spatial pooling over non-overlapping / overlapping regions
Sum or max within each region
See Boureau et al., ICML 2010, for a theoretical analysis
Slide: R. Fergus
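Sum and max pooling over non-overlapping regions differ only in the reduction applied to each region; a small NumPy sketch (the feature-map values are illustrative, not from the slide):

```python
import numpy as np

def pool(x, size, mode="max"):
    """Spatial pooling over non-overlapping size x size regions."""
    h = x.shape[0] // size * size
    w = x.shape[1] // size * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.sum(axis=(1, 3))

fmap = np.array([[1., 2., 0., 0.],
                 [3., 4., 0., 1.],
                 [0., 0., 5., 6.],
                 [0., 2., 7., 8.]])
max_pooled = pool(fmap, 2, "max")   # [[4, 1], [2, 8]]
sum_pooled = pool(fmap, 2, "sum")   # [[10, 1], [2, 26]]
```

Max pooling keeps the strongest response per region (giving some translation invariance), while sum pooling accumulates total activation.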
Normalization
Contrast normalization (across feature maps):
Local mean = 0, local std. = 1, computed over a local 7x7 Gaussian window
Equalizes the feature maps
[Figure: feature maps before and after contrast normalization]
Slide: R. Fergus
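The contrast normalization described above can be sketched directly: subtract the local mean and divide by the local standard deviation at every position. For brevity this illustrative sketch uses a uniform (box) window rather than the 7x7 Gaussian weighting mentioned on the slide.

```python
import numpy as np

def local_contrast_normalize(fmap, size=7, eps=1e-5):
    """Make the local mean 0 and the local std 1 at every pixel.

    Simplification: uniform window instead of a Gaussian-weighted one."""
    h, w = fmap.shape
    r = size // 2
    out = np.empty_like(fmap)
    for i in range(h):
        for j in range(w):
            patch = fmap[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = (fmap[i, j] - patch.mean()) / (patch.std() + eps)
    return out

fmap = np.random.RandomState(1).rand(16, 16) * 10.0
normed = local_contrast_normalize(fmap)
```

A constant feature map normalizes to all zeros, which is exactly the "equalizing" effect: regions that differ only in overall brightness or contrast produce comparable responses.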
Apply Gabor filters → spatial pool (sum) → normalize to unit length → feature vector
Slide: R. Fergus
Applications
Handwritten text/digits
MNIST (0.17% error [Ciresan et al. 2011])
Arabic & Chinese [Ciresan et al. 2012]
Slide: R. Fergus
Application: ImageNet
[Figure: validation classification examples]
[Bar chart: classification error (%) by team: SuperVision, ISI, Oxford, INRIA, Amsterdam]
Slide: R. Fergus
Feature Generalization
Pre-train on ImageNet; retrain the classifier on Caltech-256
CNN features also transfer to Caltech-101, SUN, VOC 2012, and other datasets
Works well with as few as 6 training examples
Slide: R. Fergus
Industry Deployment
Used in Facebook, Google, Microsoft
Image recognition, speech recognition, etc.
Fast at test time
Slide: R. Fergus
Unsupervised Learning
Model distribution of input data
Can use unlabeled data (unlimited)
Can be refined with standard supervised techniques (e.g., backprop)
Useful when the amount of labels is small
Unsupervised Learning
Main idea: model the distribution of the input data
Objectives: reconstruction error + regularizer (sparsity, denoising, etc.), or log-likelihood of the data
Models
Example: Auto-Encoder
Output Features
Encoder: feed-forward / bottom-up path
Decoder: feed-back / generative / top-down path
Slide: R. Fergus
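As a concrete (and deliberately minimal) instance of the encoder/decoder idea, here is a tied-weight linear auto-encoder trained by gradient descent on the reconstruction error. Everything here (data, sizes, learning rate) is made up for illustration; real auto-encoders typically use non-linear encoders and regularizers such as sparsity or denoising.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 8)              # unlabeled data: 200 samples, 8 dims

n_hidden = 3
W = rng.randn(8, n_hidden) * 0.1   # tied weights: encoder W, decoder W.T

def encode(X, W):                  # feed-forward / bottom-up path
    return X @ W

def decode(H, W):                  # feed-back / top-down path
    return H @ W.T

initial_mse = np.mean((decode(encode(X, W), W) - X) ** 2)

lr = 0.1
for _ in range(500):
    err = decode(encode(X, W), W) - X                    # reconstruction error
    # gradient of 0.5 * ||X W W^T - X||^2 w.r.t. the tied weight matrix W
    W -= lr * (X.T @ err @ W + err.T @ X @ W) / len(X)

final_mse = np.mean((decode(encode(X, W), W) - X) ** 2)
```

Because the code (here 3-dimensional) is narrower than the input (8-dimensional), the network cannot simply copy its input; minimizing reconstruction error forces it to learn the directions that capture most of the data's structure.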
Stacked Auto-Encoders
Each layer pairs an Encoder with a Decoder during training; stacking gives: Input Image → Features → Features → Class label
Bengio et al., NIPS 2007; Vincent et al., ICML 2008; Ranzato et al., NIPS 2007
Slide: R. Fergus
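The layer-wise recipe above can be sketched end to end: train an auto-encoder on the input, take its features, train the next auto-encoder on those features, and so on. The per-layer learner below is a tiny tied-weight linear auto-encoder, a deliberately simplified stand-in for the models in the cited papers; all sizes are made up.

```python
import numpy as np

rng = np.random.RandomState(0)

def train_ae_layer(X, n_hidden, n_steps=400, lr=0.1):
    """Greedily train one tied-weight linear auto-encoder layer on its input."""
    W = rng.randn(X.shape[1], n_hidden) * 0.1
    for _ in range(n_steps):
        err = X @ W @ W.T - X                             # reconstruction error
        W -= lr * (X.T @ err @ W + err.T @ X @ W) / len(X)
    return W

X = rng.randn(300, 10)        # unlabeled input data
W1 = train_ae_layer(X, 6)     # layer 1 trained on raw input
H1 = X @ W1                   # layer-1 features
W2 = train_ae_layer(H1, 3)    # layer 2 trained on layer-1 features
H2 = H1 @ W2                  # layer-2 features

# At test time the decoders (the W.T halves) are discarded; H2 would feed
# a supervised classifier, and the whole stack can be fine-tuned by backprop.
```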
At Test Time
Remove the decoders and keep only the stacked encoders:
Input Image → Encoder → Features → Encoder → Features → Encoder → Class label
Bengio et al., NIPS 2007; Vincent et al., ICML 2008; Ranzato et al., NIPS 2007
Slide: R. Fergus
Natural Images
[Figure: natural images used as unlabeled training data]
A test example x is reconstructed from a few dictionary elements:
x ≈ 0.8 * w36 + 0.3 * w42 + 0.5 * w65
Coefficients (feature representation): [0, 0, ..., 0, 0.8, 0, ..., 0, 0.3, 0, ..., 0, 0.5, ...]
Compact & easily interpretable
[Olshausen & Field, Nature 1996, Ranzato et al., NIPS 2007; Lee et al., NIPS 2007;
Lee et al., NIPS 2008; Jarret et al., CVPR 2009; etc.]
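The reconstruction on this slide is just a sparse linear combination of dictionary atoms. A short NumPy sketch (the dictionary here is random, purely for illustration; the atom indices 36/42/65 and the coefficients 0.8/0.3/0.5 come from the slide):

```python
import numpy as np

rng = np.random.RandomState(0)
n_atoms, dim = 128, 64              # hypothetical: 128 basis functions over 8x8 patches
D = rng.randn(n_atoms, dim)         # rows are the dictionary atoms w_1 ... w_128

# Sparse code: only atoms 36, 42 and 65 are active
a = np.zeros(n_atoms)
a[36], a[42], a[65] = 0.8, 0.3, 0.5

x = a @ D                           # x ~ 0.8*w36 + 0.3*w42 + 0.5*w65

# The coefficient vector a itself is the compact, easily interpretable feature
n_active = np.count_nonzero(a)      # 3 of 128 coefficients are non-zero
```

In practice the dictionary is learned from natural images and the sparse code for a patch is found by optimization (reconstruction error plus a sparsity penalty); this sketch only shows the representation itself.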
[Figure: convolutional feature learning. The input image is filtered with filters 1-4; the filtering output is shrunk (max over 2x2). Stacking weights W1, W2, W3 yields Layer 1 activations (coefficients), Layer 2 units, and filter visualizations; a higher-layer unit acts as an eye detector]
[Figure: example images and learned class-specific features for Cars, Elephants, Chairs]
Applications:
Object recognition (Lee et al., ICML 2009; Sohn et al., ICCV 2011; Sohn et al., ICML 2013)
Verification (Huang et al., CVPR 2012)
Image alignment (Huang et al., NIPS 2012)
Cf. Convnet [Krizhevsky et al., 2012]; Deconvnet [Zeiler et al., CVPR 2010]
Slide: M. Ranzato
Achieved state-of-the-art performance on ImageNet classification with 10k categories
Le et al., "Building high-level features using large-scale unsupervised learning", 2011
Unsupervised models
Work well given limited amounts of labeled data
Promise to exploit virtually unlimited amounts of data without the need for labeling
Summary
Deep learning of feature hierarchies shows great promise for computer vision problems