
TRANSCRIPT MY CLASS NOTES

Here is an example of a very popular classification algorithm, the k nearest neighbor. We will explore this algorithm with the help of an example. Let's say we have observations on 7 data points, with 2 features, x1 and x2, for each of these 7 points. Our response is a binary variable that takes on one of two colors, green or blue. We can plot a scatterplot of the given data points with x1 on the x axis and x2 on the y axis.

Color each point according to the color in the response. So if the response is green, color the point green. Now, given this scatterplot, what if you are shown a new red point on the scatterplot and asked to decide its color? Which color would you assign to the red point, green or blue? By now you should have an idea of how linear classifiers work.
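To make this setup concrete, here is a minimal Python sketch of the scatterplot described above. The 7 points, their colors, and the position of the new red point are all made up for illustration; the transcript does not give the actual values.

```python
import matplotlib.pyplot as plt

# Hypothetical data: 2 features (x1, x2) per point and a binary color response
x1 = [1.0, 2.0, 1.5, 6.0, 7.0, 6.5, 5.5]
x2 = [2.0, 1.0, 2.5, 6.0, 5.0, 7.0, 6.5]
color = ["green", "green", "green", "blue", "blue", "blue", "blue"]

plt.scatter(x1, x2, c=color)        # color each point according to its response
plt.scatter([4.0], [4.5], c="red")  # the new point whose color we must decide
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()
```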

If you use a linear classifier on this dataset to predict the color for the new point, you would try to find the line that best separates the green dots from the blue dots. If the red point falls on the right side of the line, you would mark it as blue. If it falls on the left side of the line, you would mark it as green.

One additional point that we have not mentioned explicitly is that there might be cases where perfect separation using a straight line or a decision plane is not possible. The methods will still work; in that case, all possible decision planes will make mistakes, so you choose the line that makes the minimum number of mistakes. We now take a diversion from the approach of trying to fit decision planes and introduce another family of learning algorithms.

Before moving further into this particular algorithm, let's take a look at an important geometrical concept, the concept of distances. Distances are an important part of machine learning algorithms. Having a measure of distance helps you figure out similar observations in the data, identify outlying observations, and things like that. Distances also play a particularly important role in unsupervised learning methods.

The measure of a distance helps the computer figure out how far apart 2 data points are, even in a very high dimensional setting. For example, in the 2-dimensional case, how far apart are the points 0, 0 and 1, 2 on a scatterplot? Would you agree that 0, 0 and 1, 2 are much closer together than 0, 0 and 100, -100? Yes, you would, but how would you quantify this? In mathematical terms, the 2-dimensional space charted out in a scatterplot is also called the Euclidean space or a Euclidean plane.

Euclidean distance is a geometric concept of distance between 2 points on a Euclidean plane. The mathematical formula is also quite simple. In a 2-dimensional plane, let's say you have 2 points, (x1, y1) and (x2, y2). Take the difference of the x values and the y values separately, square each of them, and add them up. This gives you the squared distance. Take the square root of the squared distance to obtain the Euclidean distance between the 2 points: d = sqrt((x2 - x1)^2 + (y2 - y1)^2). So in the example here, the Euclidean distance between the points (1, 2) and (3, 1) is approximately 2.236.
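As a quick check of this calculation, here is the same arithmetic in Python for the points (1, 2) and (3, 1):

```python
import math

x1, y1 = 1, 2   # first point
x2, y2 = 3, 1   # second point

# Difference of the x values and the y values, squared and added up
squared_distance = (x2 - x1) ** 2 + (y2 - y1) ** 2   # 4 + 1 = 5
distance = math.sqrt(squared_distance)               # sqrt(5)

print(distance)  # 2.2360679...
```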

Euclidean distance is not limited to simple two dimensional settings. It generalizes to cases where the number of dimensions is more than 2. In general, distance measures are not limited to Euclidean distance. There are many different distance measures defined on the Euclidean plane: for example, there is the Manhattan distance and the Minkowski distance, which are again different measurements of distance on a Euclidean plane. In fact, Euclidean distance works only when you are working on the Euclidean plane. If you are not in a Euclidean setting, you need to work with more complicated measures.
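For illustration, here is a small Python sketch of these alternative measures. The Minkowski distance of order p covers both: p = 1 gives the Manhattan distance and p = 2 gives back the Euclidean distance.

```python
def minkowski(p_point, q_point, p):
    """Minkowski distance of order p between two points."""
    return sum(abs(a - b) ** p for a, b in zip(p_point, q_point)) ** (1 / p)

a, b = (1, 2), (3, 1)
print(minkowski(a, b, p=1))  # Manhattan: |3 - 1| + |1 - 2| = 3
print(minkowski(a, b, p=2))  # Euclidean: sqrt(5), about 2.236
print(minkowski(a, b, p=3))  # a higher-order Minkowski distance
```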

Equipped with this concept of distance, we come back to the K nearest neighbor algorithm, and the idea now is very simple. For the new red point, you look at the nearest neighbors of this red point in the data set that you already have. How do you compute whether a point is near the red point or not? You calculate the distance between the points, which is nothing but the Euclidean distance. From the red point, calculate the Euclidean distances to each of the 7 points in the given data.

Arrange the data points in ascending order of distance, so that the lowest distance comes first. Take the first K points in this order and take a majority vote of their responses. For example, say K is equal to 3. After ordering the data closest first, we look at the first 3 responses; if these responses are green, green and blue, the majority vote in this case is green. So go ahead and mark the color of the red point as green. Now the exercise for you is to complete this problem assuming K = 3.
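Putting the whole procedure together, here is a minimal Python sketch of the K = 3 vote. The 7 labeled points and the red query point are hypothetical stand-ins for the data in the example.

```python
import math
from collections import Counter

# Hypothetical stand-ins for the 7 labeled points and the new red point
points = [
    ((1.0, 2.0), "green"), ((2.0, 1.0), "green"), ((1.5, 2.5), "green"),
    ((6.0, 6.0), "blue"), ((7.0, 5.0), "blue"),
    ((6.5, 7.0), "blue"), ((5.5, 6.5), "blue"),
]
red_point = (2.0, 2.0)

def euclidean(p, q):
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# Sort the 7 points by their Euclidean distance from the red point,
# lowest distance first
neighbors = sorted(points, key=lambda pc: euclidean(pc[0], red_point))

# Take the first K responses and hold a majority vote
k = 3
votes = Counter(color for _, color in neighbors[:k])
print(votes.most_common(1)[0][0])  # "green" for this made-up data
```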

What immediately stands out in this problem is: what is k? How do you know how many nearest neighbors to look at? The value of k is unknown. Many machine learning algorithms depend on parameters like this, which cannot be estimated directly from the data. But all the key results of this algorithm are based on a fixed value of the parameter.

These are typically called tuning parameters or hyperparameters. To build a good model, it is sometimes very important to find a good value for a tuning parameter. An algorithm can have one or multiple tuning parameters. Hyperparameter optimization is a very big sub-discipline in machine learning, and in this course we will only take a look at one of the many ways of doing hyperparameter optimization.

In order to select a tuning parameter for the k nearest neighbor, define a candidate list of values, like k equal to 3, 5 and 7; these are some popular choices. This is also one of the many cases where splitting your data into train, test and validate comes in really handy. For each value of K, run the nearest neighbor and calculate the error on the testing split. Select the K for which your misclassification rate is the lowest.
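Here is a sketch of this search in Python. It uses scikit-learn and a synthetic dataset purely for illustration; the transcript does not prescribe any particular library or data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-feature binary data standing in for the real dataset
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

best_k, best_error = None, float("inf")
for k in (3, 5, 7):                                  # the candidate list
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error = 1 - model.score(X_test, y_test)          # misclassification rate
    if error < best_error:
        best_k, best_error = k, error

print(best_k, best_error)  # the K with the lowest misclassification rate
```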

This way of determining a tuning parameter is called grid search, which is simply a search through a manually specified subset of the hyperparameter space of a learning algorithm. Some other popular ways to determine tuning parameters are randomized search, Bayesian optimization, and even gradient-based optimization.

So let's get back to our end-to-end workflow. At the start, read all of the data into memory. Split the data randomly into train, test and validate, making sure that the train split gets the majority of your data. Perform extensive exploration on your training dataset. You can omit, add, or create new features in the training set. Prepare a list of candidate models. These can be different models, or the same model with different tuning parameters for the purposes of grid search.
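As a sketch of the splitting step, one common way is two chained random splits. The 60/20/20 fractions below are an assumption; the transcript only says the train split should get the majority of the data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for "read all of the data into memory"
X = np.random.rand(1000, 2)
y = np.random.randint(0, 2, size=1000)

# First carve off the train split (60% here, an assumed fraction), then
# split the remainder evenly so test and validate get 20% each
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.6,
                                                    random_state=0)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5,
                                                random_state=0)

print(len(X_train), len(X_test), len(X_val))  # 600 200 200
```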

Remember that a 3 nearest neighbor is a different model from a 5 nearest neighbor, although they are from the same family of models. Once the training is done, evaluate each of these models on the testing split. Fix a performance measure beforehand; it is recommended not to alter your performance measure after seeing the results on the testing split.

Select the model that performs best on the testing split and report it as your final model. Calculate another round of the performance measure for this model using your validate split. This measure is a proxy for how well your model will perform when deployed on new and unseen data points. If the performance on validate is not up to the mark, or it decays very fast, you might need to repeat all or some of the process. Most professionals working in this field rarely get a model right at the first attempt. There is no harm in this; it can often be a long and tedious process.
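Tying the last steps together, here is a sketch of selecting on the testing split and reporting on validate. The data, the candidate k values, and the use of accuracy as the fixed performance measure are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data and a 60/20/20 split, both assumptions for illustration
X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.6,
                                                    random_state=0)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5,
                                                random_state=0)

# Candidate models: the same family with different tuning parameters
candidates = [KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
              for k in (3, 5, 7)]

# Select on the testing split, using accuracy as the measure fixed beforehand
final_model = max(candidates, key=lambda m: m.score(X_test, y_test))

# One more round of measurement on validate: the proxy for unseen data
print(final_model.score(X_val, y_val))
```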
