
Question 1

The image of Charlie contains 500×500 pixels. Each of these pixels is coded
with 3 coordinates between 0 and 255, representing the amounts of red, green,
and blue. The size of the image depends on the number of bits used to store
the color of a pixel, multiplied by the number of pixels. Eight bits are
required to record a coordinate between 0 and 255, so each pixel takes
8 × 3 = 24 bits. The number of bits required to store Charlie is therefore:

m · p · b = (500 · 500) · (8 · 3) = 6,000,000 bits

For this problem, I passed the image data as a tensor to the provided function
maptocluster.m with a q × k = 3 × 2 matrix of centers. maptocluster.m then
outputs a 500×500 matrix of points mapped to 2 different cluster centers.
Since each pixel now needs only a single bit to indicate its center, the new
size of the clustered image is:

500 · 500 · 1 = 250,000 bits
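
A quick MATLAB check of this arithmetic (variable names are mine):

    m = 500; p = 500;            % image dimensions in pixels
    b = 8 * 3;                   % bits per pixel: 8 bits per color channel
    original_bits = m * p * b    % = 6,000,000 bits

    k = 2;                       % number of cluster centers
    bits_per_pixel = ceil(log2(k));            % 1 bit to index 2 centers
    clustered_bits = m * p * bits_per_pixel    % = 250,000 bits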

Question 2:

Convex function: a function is convex if the line segment between any two
points on its graph lies on or above the graph.
With the problem formulated with so few centers, we can afford to fix
hypothetical values for one center and the two data points x1, x2, making D
and R functions of the remaining center c1 alone:

c2 = 1, x2 = 1, x1 = 0

In this case our function D becomes:

D(c1) = min_j ||x1 − cj||² + min_j ||x2 − cj||²
      = min{||0 − c1||², ||0 − c2||²} + min{||1 − c1||², ||1 − 1||²}
      = min{c1², 1} + 0
      = min{c1², 1}
What this function represents is the squared distance between x1 = 0 and its
nearest center. While c1 lies within −1 ≤ c1 ≤ 1, the point x1 will be mapped
to c1 no matter what, and the total distance will be a quadratic function of c1.
Say that c1 were outside of this range, though, for instance c1 = 2. In this
case the point would simply map to the other cluster center c2, so both D and
R stop changing once |c1| > 1.
To summarize, the functions have the following behavior:

D and R behave quadratically when |c1| ≤ 1.

D(c1) and R(c1) remain at a constant value when |c1| ≥ 1.


To make this easier to imagine, here is a diagram of D(c1 ):
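
Here is a short MATLAB sketch that generates this plot:

    % Plot D(c1) = min(c1^2, 1) for the toy configuration
    % x1 = 0, x2 = 1, c2 = 1 used above.
    c1 = linspace(-3, 3, 601);
    D  = min(c1.^2, 1);        % quadratic for |c1| <= 1, constant beyond
    plot(c1, D, 'LineWidth', 1.5)
    xlabel('c_1'), ylabel('D(c_1)')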

We know that the definition of a convex function is one in which we can pick
any two points on the graph, draw a line between them, and have that line lie
on or above the graph. In this case, if we pick a value of c1 between −1 and 1
and a second point outside this region, we get a line which lies below the
graph at some points. Therefore, R and D are not convex functions.
(b)
It is clear from part (a) that the functions D(c1) and R(c1) are not smooth at
c1 = ±1. One of the qualities a function must have in order to be
differentiable is smoothness. Therefore, D(c1) and R(c1) are not
differentiable when there are two points and two centers.
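
To make the kink explicit, compare the one-sided derivatives of
D(c1) = min{c1², 1} at c1 = 1:

    from the left:  d/dc1 (c1²) = 2c1 = 2 at c1 = 1
    from the right: d/dc1 (1)   = 0

The two one-sided derivatives disagree, so D has a corner at c1 = 1 (and, by
symmetry, at c1 = −1) and cannot be differentiable there.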
(c)
Derive a minimization formula for the case where k = 1 and n is arbitrary.
In other words, derive a minimization formula for when we have any number of
data points and only one center.
We want to pick a center c which minimizes the sum of squared distances to
all of the points (with a single center, every point maps to c, so no minimum
over centers is needed):

D(c) = Σ_{i=1}^n di² = ||x1 − c||² + ||x2 − c||² + ||x3 − c||² + ... + ||xn − c||²

Expanding the above expression gives:

D(c) = (x1² − 2x1c + c²) + (x2² − 2x2c + c²) + (x3² − 2x3c + c²) + ... + (xn² − 2xnc + c²)
Taking the derivative with respect to c and collecting terms:

dD/dc = (−2x1 + 2c) + (−2x2 + 2c) + (−2x3 + 2c) + ... + (−2xn + 2c)
      = 2nc − 2(x1 + x2 + x3 + ... + xn)

Dividing by 2 leaves:

nc − (x1 + x2 + x3 + ... + xn)
Setting this equal to zero and solving for c:

nc = x1 + x2 + x3 + ... + xn

c = (x1 + x2 + x3 + ... + xn) / n = (1/n) Σ_{i=1}^n xi
In other words, the best center is simply the average of the data points!
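
As a quick numeric sanity check of this result, MATLAB's fminunc should land
on the mean:

    x = randn(1, 10);                 % arbitrary 1-D data points
    D = @(c) sum((x - c).^2);         % objective from part (c)
    c_opt = fminunc(D, 0);            % numeric minimizer
    disp([c_opt, mean(x)])            % the two values should agree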
(d)
The data point xj will be mapped to one of our centers; let's call it cj. The
point only affects the placement of cj, which sits at the average of the
points that make up its cluster. With this in mind, we can say that as we move
the point xj farther away, its center will slowly follow it, moving away from
the other points mapped to it.
Questions 3 and 4:
poc_q3_q4.m
Use your favorite optimization algorithm to minimize R with ℓ = 2 and the
Euclidean norm. Use Figure 1's data and provide a function to evaluate R. Try
k = 3, 4, 5.
For this problem, I employed the fminunc optimizer recommended by O'Leary. I
wrote four functions (see the sketch after this list):

poc_calculate_distance: calculates D for the clusters.

poc_calculate_radius: calculates R as well as the radii of the different clusters.

poc_map_to_cluster: maps the data points to the different clusters.

poc_my_cluster: implements the k-means algorithm.
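
For reference, here is a minimal sketch of the iteration poc_my_cluster
performs; the interface, variable layout, and convergence test shown here are
my own assumptions rather than the exact code.

    % Minimal k-means sketch (assumed interface; the real poc_* functions
    % may differ).  X is q-by-n (one data point per column), C is q-by-k.
    function [C, labels] = kmeans_sketch(X, C, maxit)
        [~, n] = size(X);  k = size(C, 2);
        for it = 1:maxit
            % Assignment step: map each point to its nearest center
            % (the job of poc_map_to_cluster).
            d2 = zeros(k, n);
            for j = 1:k
                d2(j, :) = sum((X - C(:, j)).^2, 1);  % squared distances
            end
            [~, labels] = min(d2, [], 1);
            % Update step: move each center to the mean of its points,
            % which is exactly the result derived in question 2(c).
            Cold = C;
            for j = 1:k
                if any(labels == j)
                    C(:, j) = mean(X(:, labels == j), 2);
                end
            end
            if norm(C - Cold, 'fro') < 1e-10, break, end
        end
    end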

(a)
We are dealing with q = 3 and k = 3, 4, 5. Together these give a total of
q · k different variables to take into account.
(b)
Displayed below is a table comparing the computation speed of the different
algorithms used for clustering the pixels:
Variables (q·k)    R minimization (s)    D minimization (s)    k-means (s)
       9                17.1933                 7.4806           3.0800
      12                24.3919                 9.4167           3.0839
      15                31.1522                11.1050           3.0388
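
Timings like these can be collected with MATLAB's tic and toc; a hypothetical
harness (the data variable and the call shown are placeholders):

    % Hypothetical timing harness for the comparison above.
    for k = [3 4 5]
        C0 = reshape(linspace(0, 1, 3*k), 3, k);   % evenly spaced init
        tic
        poc_my_cluster(data, C0);                  % or the R/D minimizers
        fprintf('k = %d: %.4f s\n', k, toc)
    end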

For greater clarity, displayed below is a graph comparing the performance of
these three different methods:

The algorithm optimizing the radius is the most time-intensive. It tends to
take about 3 times longer than the same algorithm optimizing distance, and is
far slower than the k-means algorithm. It should also be noted that the run
time of the optimization algorithms appears to increase linearly, although the
curve for R has a much steeper slope than the one for D.
What distinguishes the k-means algorithm (aside from the improved results to
be discussed in the next section) is that there appears to be no increase in
the time it takes to finish processing the image. Its run time stays roughly
the same as the number of centers increases.

(c)
There is an obvious difference between the image quality these algorithms
produce and the quality of the original image. We are categorizing the pixels
into k different categories, each summarized as one combination of
coordinates. In other words, instead of using the full 0 to 255 range for each
color channel of each pixel, we map each pixel to a center with a
predetermined RGB value. The more of these predetermined RGB values we have,
the better the approximation we make when we assign each pixel's color by
mapping it to a cluster.
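
A sketch of this quantization step, assuming the mapping returns a 500×500
matrix labels of center indices and a k×3 matrix centers of RGB values (both
names are mine):

    % Replace each pixel by the RGB value of its cluster center.
    compressed = zeros(500, 500, 3);
    for ch = 1:3
        compressed(:, :, ch) = reshape(centers(labels(:), ch), 500, 500);
    end
    image(uint8(compressed))   % view the quantized image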

From the resulting images, it looks like minimizing in terms of R produced the
best quality image, but this also took the most time. D does a slightly poorer
job but takes much less time, and k-means tends too much towards the blue
scale. I should note that the way we select the initial placement of the
centers is probably a huge factor here, as my examples are very different from
O'Leary's results despite similar methods. I think this is due to my
initialization of the centers: I create an evenly spaced vector of length q·k
and then reshape it into the center coordinates.
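
Concretely, that initialization amounts to something like the following,
assuming pixel coordinates scaled to [0, 1]:

    q = 3; k = 3;
    C0 = reshape(linspace(0, 1, q*k), q, k);   % q-by-k matrix of centers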
(d)
How might a good value of k be determined experimentally?
One possibility is to keep track of the radius or total distance reported by
the clustering algorithm for different values of k. We could keep looping
until the clustered image either equals the size of the original image, or
meets some user-set criterion indicating that the changes to the radii are
small enough that adding more cluster centers brings only small improvements.
We could also use the 2-norm of the change in each pixel to measure how
different the clustered image is from the original:

||xj − xj^c||

where xj^c represents pixel j's value after being clustered.
Furthermore, we might relate this measurement of change to the size of the
image by constructing the ratio

( Σ_{j=1}^n ||xj − xj^c||² ) / (clustered image size)

and then evaluate this ratio over a number of different k values.
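
A sketch of that experiment; the poc_my_cluster interface and the initializer
are assumptions, and X stands for the 3×n pixel data:

    % Evaluate the total squared change for a range of k and look for the
    % point where adding centers stops helping.
    err = zeros(1, 10);
    for k = 1:10
        C0 = reshape(linspace(0, 1, 3*k), 3, k);
        [C, labels] = poc_my_cluster(X, C0);       % assumed interface
        err(k) = sum(sum((X - C(:, labels)).^2));  % numerator of the ratio
    end
    plot(1:10, err, '-o'), xlabel('k'), ylabel('total squared change')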

Questions 5 and 6:
poc_question5.m
First, note the placement of the initial centers that we feed to the
algorithm: each has one coordinate fixed at 1 (or −1), while the other
coordinate varies from −1 to 1. The result is the graph below:

Displayed below are the graphical results of running poc_my_cluster.m with k = 2 to 4:

When k = 2, we get what looks like a reasonable clustering of the data into
two groups: −1 and 1.

When k = 3, we get three groupings:

(1) (1, xj < 0)
(2) (1, xj > 0)
(3) (−1, xj)
This seems less reasonable: we have an arbitrary division of one side into two
sub-parts, while the other side is still one group.
For k = 4 we get the two sides split into their own respective halves based on
whether xj is greater than or less than 0.
As I mentioned above, we start off with a reasonable grouping of the data
points between −1 and 1. As I add more cluster centers, though, the algorithm
groups them more arbitrarily. I would say that k = 4 does make more sense than
k = 3, however, as it groups the data points into (±1, xj > 0) and
(±1, xj < 0). Three clusters does this with one of the categories but fails to
apply the same criterion to the other side. It seems like what we would want
in the case of k = 3 is a set of centers placed away from the corners and
closer to the middle of the graph shown above. As it turns out, in question 6
we get behavior closer to this.
Table of radii for the different cluster sizes:

k=2       k=3       k=4
2.0000    2.0000    0.8889
1.7778    1.7778    0.8889
          0.8889    0.8889
                    0.8889

For the above system, we see that we do get progressively smaller radii for
the different clusterings we have chosen. One thing I can gather from this is
that a decrease in the size of the radius is not necessarily an indicator of a
meaningful or optimal number of categories.
Part 2:

Displayed below is a picture of the plots of the clusters under the center
initialization mentioned in part (b) of the question:

The difference between this initialization and the one from part (a) is
obvious. In the case of k = 2, we get centers which appear to be at opposite
ends of the diagonal of the graph. When we move up to k = 3, we still get a
somewhat arbitrary splitting of the groups, but less so than before. When
k = 4, we get a similar result to (a).
k=2       k=3       k=4
1.0000    1.7958    0.5185
1.3333    1.0444    1.3333
          1.1746    1.8889
                    0.3333
There are two things I noticed here. First, when k = 4, the centers have
completely different radii, whereas in part 1 of this question k = 4 produces
four centers with identical radii. Also, the second initialization has
slightly higher average radii (in the case of k = 4, an average radius of
1.0185 versus part 1's 0.8889), and we can therefore say that it is less
accurate at grouping the data points into relevant clusters.
Observations:
1.
A fundamental part of clustering is that we group the data under different
labels and then adjust the labels to fit the data better. It must be kept in
mind, though, that we may initialize too many clusters (labels) and end up
with less meaningful results. In the above case, we start with a data set
located at combinations of 1 and −1. We then feed the clustering algorithm a
list of centers consisting of these points. For k = 2, the data points are
grouped nearly evenly between −1 and 1. However, when we increase the number
of categories, we end up with data points occupying different centers but with
similar placement relative to one another. For question 5, this occurs when k
is greater than 2.

Question 6:
poc_question6.m
Displayed below are graphs showing the clustering of the data after multiplying
the data points by 100:

The groupings of the data appear much less arbitrary. We see that for k = 3,
the algorithm now produces three groups: one clustered along the bottom of the
graph, one centered in the upper left-hand area, and one centered in the upper
right-hand area. When k = 4, we get a result similar to k = 4 without the
multiplication by 100.


Displayed below is a table showing the radii created by the algorithm for different values of k:
k=2        k=3        k=4
44.4557    44.4557    44.4444
44.4557    44.4444    44.4444
           44.4444    44.4444
                      44.4444
Scaling the second part of question 5 by 100 produces the below set of radii:
k=2        k=3        k=4
50.0225    44.4457    27.7823
51.8615    66.6742    66.6742
           47.2460    33.3671
                       1.0000

This final change to the coordinates actually produces a better (smaller)
average radius than part 1 of this question. For k = 4, the average radius
becomes 32.2059 as opposed to 44.4444 from part 1.
Based on the results shown in the graphs and the radii I obtained, I can say
that scaling the data by 100 has produced much better results than leaving the
data unchanged. There is no change in the general clustering in the case of
k = 2, but for k = 3, 4 we can see that the data points are grouped into k
different sets, spread out from −100 to 100.
Coordinate scaling may help with classification in these sorts of problems
because an appropriately chosen scaling can improve the arrangement into
categories. For instance, if a particular part of the data set is very
similar, scaling that section up would increase the distance between those
data points and reduce mistakes made in categorization.
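
As an illustration of this idea (my own sketch, not part of the assignment),
one could stretch the tightly packed coordinates before clustering:

    X  = rand(3, 1000);          % placeholder q-by-n data
    w  = 1 ./ std(X, 0, 2);      % larger weight where the spread is small
    Xs = w .* X;                 % scaled data (implicit expansion, R2016b+)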
