Machine Learning: Supervised Learning
A training set is fed into a learning algorithm, which produces a hypothesis h; given an input x, h produces the predicted output.
The Hypothesis Representation
h(x) = 1 / (1 + e^(−θᵀx))   (logistic regression)
If h(x) = 0.7, that means there is a 70% chance that the input x is malicious.
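As a minimal sketch of this hypothesis (the parameter vector θ and input x below are made up purely for illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z)); output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Logistic-regression hypothesis h(x) = g(theta^T x)."""
    return sigmoid(np.dot(theta, x))

# Hypothetical parameters and input, for illustration only.
theta = np.array([0.2, 0.5])
x = np.array([1.0, 1.0])  # x[0] = 1 is the bias/intercept feature
print(h(theta, x))        # a probability between 0 and 1
```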
The sigmoid activation function g(z) = 1 / (1 + e^(−z)) is an S-shaped curve that rises from 0 to 1, crossing 0.5 at z = 0.
When we have an example where y = 1, we hope h(x) is close to 1.
For h(x) to be close to 1, θᵀx must be much larger than 0.
Decision Boundary
h(x) = g(θ0 + θ1·x1 + θ2·x2)
For example:
θ0 = −3
θ1 = 1
θ2 = 1
So we predict y = 1 whenever −3 + x1 + x2 ≥ 0, i.e. whenever x1 + x2 ≥ 3; the line x1 + x2 = 3 is the decision boundary.
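With the slide's parameters (θ0 = −3, θ1 = θ2 = 1), the boundary x1 + x2 = 3 can be checked directly; a small sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-3.0, 1.0, 1.0])  # theta0 = -3, theta1 = 1, theta2 = 1

def predict(x1, x2):
    """Predict y = 1 when g(theta^T x) >= 0.5, i.e. when x1 + x2 >= 3."""
    z = theta @ np.array([1.0, x1, x2])
    return 1 if sigmoid(z) >= 0.5 else 0

print(predict(2.0, 2.0))  # x1 + x2 = 4 >= 3 -> predicts 1
print(predict(1.0, 1.0))  # x1 + x2 = 2 <  3 -> predicts 0
```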
SVM COST FUNCTION
For SVMs the convention is to use a different parameter called C
So we minimize C·A + B (instead of A + λ·B, as in logistic regression).
If C were equal to 1/λ, then the two objectives (C·A + B and A + λ·B) would be minimized by the same value of θ.
So, our overall objective is:
min over θ of C·A + B, where A = Σᵢ [ y(i)·cost1(θᵀx(i)) + (1 − y(i))·cost0(θᵀx(i)) ] and B = ½ Σⱼ θⱼ²
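A sketch of the C·A + B objective using the standard hinge-style costs (the split into cost1/cost0 follows the course's convention; any data passed in here is hypothetical):

```python
import numpy as np

def cost1(z):
    """Cost for a positive example (y = 1): zero once z >= 1."""
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    """Cost for a negative example (y = 0): zero once z <= -1."""
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    """C*A + B: A sums per-example costs, B = (1/2) * sum_j theta_j^2."""
    z = X @ theta
    A = np.sum(np.where(y == 1, cost1(z), cost0(z)))
    B = 0.5 * np.sum(theta[1:] ** 2)  # theta_0 is conventionally not regularized
    return C * A + B
```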
Large Margin Intuition
If you have a positive example, you only really need z = θᵀx to be greater than or equal to 0.
If this is the case, then you predict 1.
The SVM wants a bit more than that: it doesn't want to *just* get it right, but to have z be quite a bit bigger than zero (z ≥ 1 for positives, z ≤ −1 for negatives).
This throws in an extra safety margin factor.
Sometimes people refer to SVM as large margin classifiers
Consider a case where we set C to be huge, e.g. C = 100,000
So considering we're minimizing CA + B
If C is huge, we're going to pick a θ so that A is equal to zero
What is the optimization problem here - how do we make A = 0?
Making A = 0
If y = 1
Then to make our "A" term 0, we need to find a value of θ so that θᵀx is greater than or equal to 1
Similarly, if y = 0
Then to make "A" = 0, we need to find a value of θ so that θᵀx is less than or equal to −1
So, if we think of our optimization problem as a way to ensure that this first "A"
term is equal to 0, we can re-factor the optimization problem into just minimizing the
"B" (regularization) term, because
when A = 0 --> C·A = 0
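To see this numerically (the z = θᵀx values below are made up so that every constraint holds):

```python
def example_cost(z, y):
    """Per-example SVM cost: cost1 for y = 1, cost0 for y = 0."""
    return max(0.0, 1.0 - z) if y == 1 else max(0.0, 1.0 + z)

# Every positive example has theta^T x >= 1 and every negative one <= -1,
# so the A term vanishes and C*A + B reduces to B alone.
zs = [1.0, 2.5, -1.0, -3.0]
ys = [1, 1, 0, 0]
A = sum(example_cost(z, y) for z, y in zip(zs, ys))
print(A)  # 0.0
```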
Large Margin Classifier
min over θ of ½ Σⱼ₌₁ⁿ θⱼ²
s.t. θᵀx(i) ≥ 1 if y(i) = 1
     θᵀx(i) ≤ −1 if y(i) = 0
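A small check of these margin constraints on toy 1-D data (the data and θ here are hypothetical):

```python
import numpy as np

def satisfies_margin(theta, X, y):
    """True if theta^T x >= 1 for every y = 1 example and theta^T x <= -1 for every y = 0 example."""
    z = X @ theta
    return bool(np.all(np.where(y == 1, z >= 1.0, z <= -1.0)))

# Toy data: first column is the bias feature; positives at x >= 3, negatives at x <= 1.
X = np.array([[1.0, 3.0], [1.0, 4.0], [1.0, 1.0], [1.0, 0.0]])
y = np.array([1, 1, 0, 0])
theta = np.array([-2.0, 1.0])  # decision boundary at x = 2
print(satisfies_margin(theta, X, y))  # True
```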
The black line has a larger minimum distance (margin) from any of the training examples
The SVM is more sophisticated than the large-margin view alone might suggest.
If you were just maximizing the margin, then the SVM would be very sensitive to outliers.
If C is reasonably small (not too large), then you stick with the black decision boundary.
Pros
It does not suffer from the curse of dimensionality
It helps to avoid overfitting
It works really well when there is a clear margin of separation
It is effective in cases where the number of dimensions is greater than the number of samples
It uses a subset of training points (the support vectors) in the decision function, so it is also memory efficient
Cons
It does not perform well when we have a large dataset, because the required training time is higher
It also does not perform very well when the data set has more noise
Applications of SVM
Time series analysis
Text Categorization
Machine Vision
Classification
Anomaly Detection
Regression
Thank You