Outline
Discuss the value of modular, deep, gradient-based systems, especially in robotics
Introduce a new and useful family of modules
Properties of new family
Online training with non-Gaussian priors
e.g., encouraging sparsity or multi-task weight sharing
[System diagram: an RGB camera, NIR camera, and IMU data feed a classification stage and a motion planner driven by a goal system (with camera and lighting inputs); training signals include webcam/LabelMe data, labeled 3-D points, human-driven example paths, a classification cost, and a variance cost on motion plans; arrows indicate both data flow and gradient flow.]
New Modules
Modules that are important in this system require two new abilities:
Induce new priors on weights
Allow modules to solve internal optimization problems
Alternate Priors
[Diagram: a chain of modules (M1 → M2 → M3), each with weights w_i, feeding a loss function c; shown three times under different weight-update rules.]
L2 Backpropagation (standard additive gradient step):
w_i^{t+1} = w_i^t - α ∇_i c
KL-divergence prior (exponentiated gradient, multiplicative step):
w_i^{t+1} = w_i^t e^{-α ∇_i c}
General mirror-descent update through a link function ∇f:
∇f(w_i^{t+1}) = ∇f(w_i^t) - α ∇c(w_i^t)
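The two concrete update rules above can be sketched on a toy quadratic loss. This is a minimal illustration, not the paper's training code; the target vector and step size are made up for the example. The additive (L2) step and the multiplicative (exponentiated-gradient / KL) step both converge, but the multiplicative step keeps every weight strictly positive at every iteration.

```python
import numpy as np

def grad(w, target):
    """Gradient of the quadratic loss c(w) = 0.5 * ||w - target||^2."""
    return w - target

target = np.array([0.2, 0.05, 0.7, 0.05])  # illustrative positive target
alpha = 0.5

# Standard (L2) backpropagation: additive update w <- w - alpha * grad
w_l2 = np.full(4, 0.25)
for _ in range(500):
    w_l2 = w_l2 - alpha * grad(w_l2, target)

# KL / exponentiated-gradient update: multiplicative, weights stay positive
w_kl = np.full(4, 0.25)
for _ in range(500):
    w_kl = w_kl * np.exp(-alpha * grad(w_kl, target))

print(w_l2, w_kl)  # both approach the target
```

The multiplicative form is exactly mirror descent with the negative-entropy link function, which is why it pairs naturally with a KL prior on the weights.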
New Modules
Modules that are important in this system require two new abilities:
Induce new priors on weights
Allow modules to solve internal optimization problems, which produce interesting nonlinear effects, such as inhibition, that involve coupled outputs
Sparse Approximation
[Diagram: three ways to map an input onto a basis with inhibition between elements: direct connections, a linear projection, and a KL-regularized optimization.]
Sparse Approximation
Assumes the input is a sparse combination of elements, plus observation noise
Many possible elements
Only a few present in any particular example
Olshausen and Field, Sparse Coding of Natural Images Produces Localized, Oriented, Bandpass Receptive Fields, Nature 95
Doi and Lewicki, Sparse Coding of Natural Images Using an Overcomplete Set of Limited Capacity Units, NIPS 04
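The generative assumption above (many possible elements, only a few active per example, plus observation noise) is easy to sketch. Sizes, the number of active elements, and the noise level below are illustrative choices, not values from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

n_elements, dim = 50, 20                  # many candidate elements
basis = rng.normal(size=(dim, n_elements))

# Only a few elements are present in any particular example.
w = np.zeros(n_elements)
active = rng.choice(n_elements, size=3, replace=False)
w[active] = rng.uniform(0.5, 1.5, size=3)

# Observed input: sparse combination of basis elements plus noise.
x = basis @ w + 0.01 * rng.normal(size=dim)
print(np.count_nonzero(w), "of", n_elements, "elements active")
```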
Sparse Approximation
Semantic meaning is sparse
Sparse Approximation
[Diagram: basis coefficients w_1 produce a reconstruction r_1 = B w_1 of the input; the reconstruction error (cross-entropy) supplies the error gradient used to update the coefficients.]
Sparse Approximation
[Diagram: KL-regularized coefficients on a KL-regularized basis; training examples enter as input, sparse coding computes r = B w^(i) for each example, reconstruction error (cross-entropy) trains the basis, and the coefficients are passed on as the module's output.]
Optimization Modules
L1-regularized sparse approximation:
min_w  (reconstruction loss)  +  λ ||w||_1  (regularization term)
The problem is convex.
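A standard way to solve this convex objective, with a squared-error reconstruction loss, is ISTA (iterative shrinkage-thresholding). This is a generic sketch of that solver, not the source's implementation (the talk elsewhere uses a cross-entropy reconstruction loss); all data and parameter values are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, B, lam, n_iters=500):
    """Minimize 0.5*||x - B w||^2 + lam*||w||_1 by ISTA."""
    step = 1.0 / np.linalg.norm(B, ord=2) ** 2   # 1 / Lipschitz const of the smooth part
    w = np.zeros(B.shape[1])
    for _ in range(n_iters):
        g = B.T @ (B @ w - x)                    # gradient of the reconstruction loss
        w = soft_threshold(w - step * g, step * lam)
    return w

rng = np.random.default_rng(0)
B = rng.normal(size=(20, 50))
w_true = np.zeros(50)
w_true[[3, 17, 41]] = [1.0, -0.8, 1.2]
x = B @ w_true + 0.01 * rng.normal(size=20)

w_hat = ista(x, B, lam=0.1)
print(np.count_nonzero(np.abs(w_hat) > 1e-3))   # only a few coefficients survive
```

Because the objective is convex, ISTA with this step size decreases the objective monotonically, and the soft threshold drives most coefficients exactly to zero.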
Preliminary Results
L1 sparse coding
Main Points
Modular, gradient-based systems are an important design tool for large-scale learning systems
New tools are needed to include a family of modules that have important properties
Presented a generalized backpropagation technique that:
Allows priors that encourage, e.g., sparsity (KL prior): uses mirror descent to modify weights
Uses implicit differentiation to compute gradients through modules (e.g., sparse approximation) that internally solve an optimization problem
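The implicit-differentiation idea in the second point can be illustrated on a module whose inner problem is ridge-regularized least squares, chosen here because its optimality condition is smooth; the talk's sparse-approximation modules use KL/L1 regularizers instead. Everything below (the scalar parameter theta, the downstream loss, the sizes) is an assumption made for the sketch. Differentiating the optimality condition g(w, θ) = 0 gives dw/dθ = -H⁻¹ ∂g/∂θ, which we check against finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5)
B0 = rng.normal(size=(5, 3))
D = rng.normal(size=(5, 3))   # direction in which the basis varies with theta
lam = 0.1

def inner_solve(theta):
    """Inner problem: w*(theta) = argmin_w 0.5||x - B w||^2 + 0.5*lam*||w||^2."""
    B = B0 + theta * D
    return np.linalg.solve(B.T @ B + lam * np.eye(3), B.T @ x), B

def outer_loss(w):
    return 0.5 * np.sum(w ** 2)   # illustrative downstream loss on the module output

# Optimality condition: g(w, theta) = B^T (B w - x) + lam * w = 0 at w*.
theta = 0.3
w_star, B = inner_solve(theta)
H = B.T @ B + lam * np.eye(3)                       # dg/dw at the optimum
dg_dtheta = D.T @ (B @ w_star - x) + B.T @ (D @ w_star)
dw_dtheta = -np.linalg.solve(H, dg_dtheta)          # implicit function theorem
grad_implicit = w_star @ dw_dtheta                  # chain rule: dL/dw . dw/dtheta

# Finite-difference check of the implicit gradient
eps = 1e-6
lp = outer_loss(inner_solve(theta + eps)[0])
lm = outer_loss(inner_solve(theta - eps)[0])
grad_fd = (lp - lm) / (2 * eps)
print(grad_implicit, grad_fd)   # the two should agree closely
```

The same recipe applies to any inner optimizer with a differentiable optimality condition: backpropagate through the solution by solving one linear system in the Hessian, without unrolling the solver's iterations.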
Acknowledgements
The authors would like to thank the UPI team, especially Cris Dima, David Silver, and Carl Wellington
Thanks also to DARPA and the Army Research Office for supporting this work through the UPI program and the NDSEG fellowship