Machine Learning
Srihari
3. Early stopping
Invariances
Tangent propagation
What is Regularization?
In machine learning (also, statistics and inverse problems):
introducing additional information to prevent over-fitting
(or to solve an ill-posed problem)
A possible approach is to get a maximum likelihood estimate of M, the number of hidden units, that balances under-fitting and over-fitting
Sinusoidal Regression Problem
Two-layer network trained on 10 data points
M = 1, 3 and 10 hidden units
Minimizing sum-of-squares error function
Using conjugate gradient descent
Generalization error is not a simple function of M due to presence of local minima in error function
Need to control network complexity to avoid over-fitting
Choose a relatively large M and control complexity by addition of regularization term
Simplest regularizer is weight decay
$$\tilde{E}(\mathbf{w}) = E(\mathbf{w}) + \frac{\lambda}{2}\mathbf{w}^{T}\mathbf{w}$$
Effective model complexity is determined by the choice of regularization coefficient $\lambda$
Regularizer is equivalent to a zero mean Gaussian prior over weight vector w
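The weight-decay error function can be illustrated with a small sketch. It is not taken from the slides: it assumes a linear model with polynomial features instead of a two-layer network, plain gradient descent instead of conjugate gradients, and hypothetical names (`fit`, `regularized_error`, `lam`) chosen for illustration. The only point is that the regularizer adds `lam * w` to the gradient and so shrinks the weights.

```python
import random


def predict(w, x):
    """Linear model y(x, w) = w . x (stand-in for the network output)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sum_of_squares_error(w, data):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2."""
    return 0.5 * sum((predict(w, x) - t) ** 2 for x, t in data)


def regularized_error(w, data, lam):
    """E~(w) = E(w) + (lam / 2) * w^T w."""
    return sum_of_squares_error(w, data) + 0.5 * lam * sum(wi * wi for wi in w)


def regularized_gradient(w, data, lam):
    """Gradient of E~(w): data-term gradient plus the decay term lam * w."""
    grad = [lam * wi for wi in w]
    for x, t in data:
        residual = predict(w, x) - t
        for i, xi in enumerate(x):
            grad[i] += residual * xi
    return grad


def fit(data, lam, dim, lr=0.01, steps=2000):
    """Plain gradient descent on E~(w), starting from w = 0."""
    w = [0.0] * dim
    for _ in range(steps):
        g = regularized_gradient(w, data, lam)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def norm(w):
    return sum(wi * wi for wi in w) ** 0.5


# Ten noisy points, features [1, x, x^2] (arbitrary illustrative data).
random.seed(0)
data = [([1.0, x, x * x], 2.0 * x + random.gauss(0.0, 0.1))
        for x in [i / 9 for i in range(10)]]

w_plain = fit(data, lam=0.0, dim=3)   # no regularization
w_decay = fit(data, lam=1.0, dim=3)   # weight decay with lambda = 1
print(norm(w_plain), norm(w_decay))   # the decayed weights have the smaller norm
```

Larger values of `lam` pull the solution further toward zero, which is the sense in which the coefficient sets the effective model complexity.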
First-layer hidden unit activations: $z_j = h\left(\sum_i w_{ji} x_i + w_{j0}\right)$
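Linear transformation of the input data: $x_i \to \tilde{x}_i = a x_i + b$
Network mapping is unchanged if first-layer weights and biases are transformed as
$$w_{ji} \to \tilde{w}_{ji} = \frac{1}{a} w_{ji} \quad \text{and} \quad w_{j0} \to \tilde{w}_{j0} = w_{j0} - \frac{b}{a} \sum_i w_{ji}$$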
If we train one network using original data
Another for which input and/or target variables are transformed by one of the linear transformations
Then they should only differ by the weights as given
Regularizer should have this property
Otherwise it arbitrarily favors one solution over another
Simple weight decay does not have this property
Treats all weights and biases on an equal footing
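A small numerical check can make this concrete. The sketch below is illustrative only. It assumes tanh hidden units, arbitrary weight values, and the input map x_i -> a*x_i + b, compensated by dividing each first-layer weight by a and subtracting (b/a) times the sum of a unit's weights from its bias. The helper names (`hidden_units`, `compensate`, `weight_decay_penalty`) are hypothetical.

```python
import math


def hidden_units(x, W1, b1):
    """z_j = tanh(sum_i w_ji * x_i + w_j0) for each hidden unit j."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b0)
            for row, b0 in zip(W1, b1)]


def compensate(W1, b1, a, b):
    """Weights and biases that undo the input transformation x_i -> a*x_i + b."""
    W1_new = [[w / a for w in row] for row in W1]
    b1_new = [b0 - (b / a) * sum(row) for row, b0 in zip(W1, b1)]
    return W1_new, b1_new


def weight_decay_penalty(W1, b1):
    """Simple weight decay: sum of squares over all weights AND biases."""
    return sum(w * w for row in W1 for w in row) + sum(b0 * b0 for b0 in b1)


# Arbitrary first layer: 3 hidden units, 2 inputs.
W1 = [[0.5, -1.2], [2.0, 0.3], [-0.7, 0.9]]
b1 = [0.1, -0.4, 0.25]
x = [0.8, -1.5]
a, b = 3.0, -2.0

x_tilde = [a * xi + b for xi in x]
W1_tilde, b1_tilde = compensate(W1, b1, a, b)

z = hidden_units(x, W1, b1)
z_tilde = hidden_units(x_tilde, W1_tilde, b1_tilde)

# The two networks compute the same hidden activations ...
print(max(abs(p - q) for p, q in zip(z, z_tilde)))
# ... but simple weight decay assigns them different penalties.
print(weight_decay_penalty(W1, b1), weight_decay_penalty(W1_tilde, b1_tilde))
```

The two parameter settings represent the same function, yet the penalty differs between them, so simple weight decay prefers one over the other for no reason tied to the data.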
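Regularizer invariant under the linear transformations (biases excluded): $\frac{\lambda_1}{2} \sum_{w \in \mathcal{W}_1} w^2 + \frac{\lambda_2}{2} \sum_{w \in \mathcal{W}_2} w^2$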
Priors are governed by four hyper-parameters
$\alpha_1^b$, precision of Gaussian of first layer bias
$\alpha_1^w$, precision of Gaussian of first layer weights
$\alpha_2^b$, precision of Gaussian of second layer bias
$\alpha_2^w$, precision of Gaussian of second layer weights
3. Early Stopping
Alternative to regularization for controlling complexity
[Figure: training set error versus iteration step]
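Early stopping is commonly implemented by monitoring the error on a held-out validation set and keeping the parameters from the step where that error was lowest. The sketch below is a minimal illustration under stated assumptions, not the procedure from the slides. The `patience` rule, the callback-style interface and the one-parameter toy model are all hypothetical choices.

```python
import copy


def early_stopping(train_step, validation_error, get_state,
                   max_steps=1000, patience=10):
    """Train until validation error has not improved for `patience` steps.

    Returns (best_step, best_error, best_state), where best_state is a copy
    of the model state at the step with the lowest validation error.
    """
    best_error = float("inf")
    best_step = 0
    best_state = copy.deepcopy(get_state())
    for step in range(1, max_steps + 1):
        train_step()
        err = validation_error()
        if err < best_error:
            best_error, best_step = err, step
            best_state = copy.deepcopy(get_state())
        elif step - best_step >= patience:
            break  # validation error has stopped improving
    return best_step, best_error, best_state


# Toy model: one weight; the training optimum (w = 2.0) differs from the
# validation optimum (w = 1.5), so validation error falls and then rises.
model = {"w": 0.0}


def train_step(lr=0.05):
    grad = 2.0 * (model["w"] - 2.0)      # gradient of training error (w - 2)^2
    model["w"] -= lr * grad


def validation_error():
    return (model["w"] - 1.5) ** 2


best_step, best_error, best_state = early_stopping(
    train_step, validation_error, lambda: model)
print(best_step, best_state["w"], model["w"])
```

Training continues for `patience` further steps past the best point, which is why the best state is stored as a copy and not read off the final model.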
Invariances
Quite often in classification problems, predictions should be invariant under one or more transformations of the input variables
Example: a handwritten digit should be assigned the same classification irrespective of its position in the image (translation) and its size (scale)
Such transformations produce significant changes in the raw data (e.g., pixel intensities), yet need to produce the same output from the classifier
In speech recognition, the corresponding transformation is nonlinear time warping along the time axis
Impractical
Number of different samples grows exponentially with number of transformations
Approaches to Invariance
1. Training set augmented by transforming training patterns according to desired invariances
E.g., shift each image into different positions
2. Add regularization term to error function that penalizes changes in model output when input is transformed.
Leads to tangent propagation.
3. Invariance built into pre-processing by extracting features invariant to required transformations
4. Build invariance property into structure of neural network (convolutional networks)
Local receptive fields and shared weights
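The first approach, augmenting the training set with shifted copies of each image, can be sketched as follows. This is an illustrative example under assumptions: images are plain lists of rows, pixels shifted past the border are discarded and replaced by a fill value, and the function names are hypothetical.

```python
def shift_image(image, dx, dy, fill=0):
    """Translate a 2-D image (list of rows) by dx columns and dy rows."""
    h, w = len(image), len(image[0])
    shifted = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:   # drop pixels that leave the frame
                shifted[nr][nc] = image[r][c]
    return shifted


def augment_with_shifts(images, labels, offsets):
    """Return the original patterns plus one shifted copy per offset.

    Each shifted copy keeps the label of the pattern it came from, which is
    how the desired invariance is communicated to the model.
    """
    aug_images, aug_labels = list(images), list(labels)
    for img, lab in zip(images, labels):
        for dx, dy in offsets:
            aug_images.append(shift_image(img, dx, dy))
            aug_labels.append(lab)
    return aug_images, aug_labels


# A tiny 4x4 "digit" with an arbitrary label.
digit = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
images, labels = augment_with_shifts([digit], [7], offsets)
print(len(images), labels)
```

Each added offset multiplies the training set size, so the cost of this approach grows quickly when several transformations are combined.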
Synthetically warp each handwritten digit image before presentation to model
Easy to implement but computationally costly
Random displacements Δx, Δy ∈ (0,1) generated at each pixel
Then smoothed by convolution with Gaussians of width 0.01, 30 and 60 respectively
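Two-dimensional input space showing effect of continuous transformation with single parameter $\xi$
Vector resulting from transformation is $\mathbf{s}(\mathbf{x}, \xi)$ so that $\mathbf{s}(\mathbf{x}, 0) = \mathbf{x}$
Tangent to curve M is given by
$$\boldsymbol{\tau}_n = \left.\frac{\partial \mathbf{s}(\mathbf{x}_n, \xi)}{\partial \xi}\right|_{\xi = 0}$$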
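The tangent vector can be estimated numerically. The sketch below takes rotation of a 2-D point as the transformation, which is an assumption made for illustration. It uses central finite differences, then evaluates a penalty of the tangent-propagation form, one half of the summed squared directional derivative of the output along each tangent. The single-output functions and all names are hypothetical.

```python
import math


def rotate(x, xi):
    """s(x, xi): rotate the 2-D point x by angle xi (radians); s(x, 0) = x."""
    c, s = math.cos(xi), math.sin(xi)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]


def tangent_vector(transform, x, eps=1e-6):
    """Central-difference estimate of d s(x, xi) / d xi at xi = 0."""
    plus, minus = transform(x, eps), transform(x, -eps)
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


def directional_derivative(f, x, tau, eps=1e-6):
    """Rate of change of the scalar output f at x along direction tau."""
    xp = [xi + eps * ti for xi, ti in zip(x, tau)]
    xm = [xi - eps * ti for xi, ti in zip(x, tau)]
    return (f(xp) - f(xm)) / (2 * eps)


def tangent_penalty(f, points, transform):
    """1/2 * sum_n (derivative of f along the tangent at x_n)^2."""
    return 0.5 * sum(
        directional_derivative(f, x, tangent_vector(transform, x)) ** 2
        for x in points)


def radial(x):
    """Rotation-invariant output: squared distance from the origin."""
    return x[0] ** 2 + x[1] ** 2


def first_coordinate(x):
    """Output that changes under rotation."""
    return x[0]


points = [[1.0, 0.0], [0.5, 2.0], [-1.5, 0.7]]
tau = tangent_vector(rotate, [1.0, 2.0])          # analytically (-x2, x1)
print(tau)
print(tangent_penalty(radial, points, rotate))            # close to zero
print(tangent_penalty(first_coordinate, points, rotate))  # clearly positive
```

A function that is already invariant to the transformation incurs essentially no penalty, while one that changes along the tangent direction is penalized. Added to the error function, a term of this kind pushes the model toward local invariance.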
Tangent distance, used to build invariance properties into nearest-neighbor classifiers, is a related technique
Instead of treating the image as input to a fully connected network, local receptive fields are used
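A minimal sketch of a single feature map with local receptive fields and shared weights follows. It is an illustration under assumptions, not a full convolutional network: no bias, no nonlinearity, no subsampling, "valid" borders, and a hand-picked 2x2 kernel. As in most neural network code, the operation is technically a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """One feature map: every output unit looks at a kernel-sized patch
    (its local receptive field) and all units share the same kernel weights."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def vertical_bar(column, size=5):
    """A size x size image containing a short vertical bar in `column`."""
    img = [[0] * size for _ in range(size)]
    img[1][column] = 1
    img[2][column] = 1
    return img


kernel = [[1, -1],
          [1, -1]]                       # simple vertical-edge detector

features = conv2d_valid(vertical_bar(1), kernel)
features_shifted = conv2d_valid(vertical_bar(2), kernel)

# Shifting the input by one column shifts the feature map by one column.
for row, row_shifted in zip(features, features_shifted):
    print(row, row_shifted)

# Parameter counts for a 5x5 input and a 4x4 output map.
shared_weights = len(kernel) * len(kernel[0])   # one kernel reused everywhere
fully_connected_weights = 25 * 16               # every input to every output
print(shared_weights, fully_connected_weights)
```

Because the same weights are applied at every position, a shifted input produces a correspondingly shifted feature map, and the number of free parameters is far smaller than in the fully connected case.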