
CL688: Artificial Intelligence in Process Engineering Tutorial Set 2

Linear and nonlinear classifiers


1. Generate four 2-dimensional data sets xi, i = 1, . . . , 4, each containing data vectors from two
classes. In all xi's the first class (denoted -1) contains 100 vectors uniformly distributed in the
square [0, 2] x [0, 2]. The second class (denoted +1) contains another 100 vectors uniformly
distributed in the squares [3, 5] x [3, 5], [2, 4] x [2, 4], [0, 2] x [2, 4], and [1, 3] x [1, 3] for x1, x2,
x3, and x4, respectively. Each data vector is augmented with a third coordinate that equals 1.
Perform the following steps:
(a) Plot the four data sets and notice that as we move from x1 to x3 the classes approach each
other but remain linearly separable. In x4 the two classes overlap.
(b) Run the perceptron algorithm for each xi, i = 1, . . . , 4, with learning rate parameters 0.01
and 0.05 and initial estimate [1, 1, 0.5]^T for the parameter vector (a MATLAB sketch of the
data generation and perceptron update follows this problem).
(c) Run the perceptron algorithm for x3 with learning rate 0.05, using as initial estimates for w
[1, 1, 0.5]^T and [1, 1, 0.5]^T.
(d) Comment on the results.
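
The following is a minimal MATLAB sketch of one way to generate the four data sets and run the
perceptron rule. It is an illustration rather than a prescribed implementation: the seed and the
cap on the number of passes are assumptions, since the problem fixes neither.

    % Sketch: uniform two-class data and the perceptron update (illustrative only).
    rng(0);                                    % assumed seed; the problem fixes none here
    lo = [3 3; 2 2; 0 2; 1 1];                 % lower-left corners of the class +1 squares for x1..x4
    X = cell(1,4); y = cell(1,4);
    for i = 1:4
        C1 = 2*rand(100,2);                                      % class -1: uniform on [0,2] x [0,2]
        C2 = [lo(i,1) + 2*rand(100,1), lo(i,2) + 2*rand(100,1)]; % class +1: uniform on the i-th square
        X{i} = [[C1; C2], ones(200,1)];                          % augment with a third coordinate equal to 1
        y{i} = [-ones(100,1); ones(100,1)];
    end

    % Perceptron rule: w <- w + rho*y*x for every misclassified sample, swept over epochs.
    w = [1; 1; 0.5]; rho = 0.01;               % initial estimate and learning rate from part (b)
    for epoch = 1:1000                         % assumed cap on the number of passes
        miscl = find(y{1}.*(X{1}*w) <= 0);     % samples violating y*(w'*x) > 0
        if isempty(miscl), break; end
        for k = miscl'
            w = w + rho*y{1}(k)*X{1}(k,:)';
        end
    end
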
2. (a) Generate a set X1 of N1 = 200 data vectors, such that the first 100 vectors stem from class
1, which is modeled by the Gaussian distribution with mean m1 = [0, 0, 0, 0, 0]^T. The rest stem
from class 2, which is modeled by the Gaussian distribution with mean m2 = [1, 1, 1, 1, 1]^T.
Both distributions share the following covariance matrix:

S = [ 0.9   0.3   0.2    0.05   0.02
      0.3   0.8   0.1    0.2    0.05
      0.2   0.1   0.7    0.015  0.07
      0.05  0.2   0.015  0.8    0.01
      0.02  0.05  0.07   0.01   0.75 ]

Generate an additional data set X2 of N2 = 200 data vectors, following the prescription used
for X1. Apply the optimal Bayes classifier on X2 and compute the classification error.
(b) Augment each feature vector in X1 and X2 by adding a 1 as the last coordinate. Define the
class labels as -1 and +1 for the two classes, respectively. Using X1 as the training set, obtain
the minimum squared-error (LS) estimate w. Use this estimate to classify the vectors of X2
according to the inequality
    w^T x > (<) 0
Compute the probability of error. Compare the results with those obtained in part (a).
(A MATLAB sketch of both classifiers follows this problem.)
(c) Repeat the previous steps, first with X2 replaced by a set X3 containing N3 = 10,000 data
vectors and then with a set X4 containing N4 = 100,000 data vectors. Both X3 and X4 are
generated using the prescription adopted for X1. Comment on the results.
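
A possible MATLAB sketch of parts (a) and (b) is given below. It assumes equal priors (100 vectors
per class), uses mvnrnd from the Statistics and Machine Learning Toolbox, and exploits the fact
that, for a common covariance matrix and equal priors, the Bayes rule reduces to assigning each
vector to the class of the nearer mean in the Mahalanobis sense.

    % Sketch: Bayes and least-squares classifiers for the two Gaussian classes (equal priors assumed).
    m1 = zeros(1,5); m2 = ones(1,5);
    S  = [0.9  0.3  0.2   0.05  0.02;
          0.3  0.8  0.1   0.2   0.05;
          0.2  0.1  0.7   0.015 0.07;
          0.05 0.2  0.015 0.8   0.01;
          0.02 0.05 0.07  0.01  0.75];
    genX = @(n) [mvnrnd(m1, S, n/2); mvnrnd(m2, S, n/2)];   % first half class 1, second half class 2
    lab  = @(n) [ones(n/2,1); 2*ones(n/2,1)];
    X1 = genX(200); y1 = lab(200);
    X2 = genX(200); y2 = lab(200);

    % (a) Bayes classifier: compare Mahalanobis distances to the two means (common covariance).
    Si = inv(S);
    d1 = sum(((X2 - m1)*Si).*(X2 - m1), 2);
    d2 = sum(((X2 - m2)*Si).*(X2 - m2), 2);
    err_bayes = mean((1 + (d2 < d1)) ~= y2);                % assign class 2 when its mean is nearer

    % (b) LS classifier: append a trailing 1, relabel the classes as -1/+1, solve the normal equations.
    A1 = [X1, ones(200,1)];  t1 = 2*y1 - 3;                 % class 1 -> -1, class 2 -> +1
    w  = A1 \ t1;                                           % minimum squared-error estimate
    A2 = [X2, ones(200,1)];
    err_ls = mean((1 + (A2*w > 0)) ~= y2);                  % w'*x > 0  =>  class 2 (+1)
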
3. In the 2-dimensional space, we are given two equiprobable classes, which follow Gaussian dis-
tributions with means m1 = [0, 0]^T and m2 = [1.2, 1.2]^T and covariance matrices S1 = S2 = 0.2I,
where I is the 2 x 2 identity matrix.
(a) Generate and plot a data set X1 containing 200 points from each class (400 points total), to
be used for training (use the value of 50 as seed for the built-in MATLAB randn function).
Generate another data set X2 containing 200 points from each class, to be used for testing
(use the value of 100 as seed for the built-in MATLAB randn function).
(b) Based on X1, generate SVM classifiers that separate the two classes, using C = 0.1, 0.2, 0.5, 1, 2, 20.
Set tol = 0.001. (A MATLAB sketch follows this problem.)
i. Compute the classification error on the training and test sets.
ii. Count the support vectors.
iii. Compute the margin (2/||w||).
iv. Plot the classifier as well as the margin lines.
v. Are your results reproducible? (What would happen if you changed your randn seed?
Try the classification a few times.)
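
One possible route, sketched below, uses fitcsvm from the Statistics and Machine Learning Toolbox;
the course may instead provide its own SMO routine, in which case the tol = 0.001 setting should be
passed to that solver. Interpreting "seed 50/100" as rng(50)/rng(100) is an assumption.

    % Sketch: Gaussian training/test sets and linear soft-margin SVMs for several values of C.
    m1 = [0 0]; m2 = [1.2 1.2]; S = 0.2*eye(2);
    rng(50);  Xtr = [mvnrnd(m1,S,200); mvnrnd(m2,S,200)]; ytr = [-ones(200,1); ones(200,1)];
    rng(100); Xte = [mvnrnd(m1,S,200); mvnrnd(m2,S,200)]; yte = [-ones(200,1); ones(200,1)];

    for C = [0.1 0.2 0.5 1 2 20]
        mdl    = fitcsvm(Xtr, ytr, 'KernelFunction','linear', 'BoxConstraint', C);
        etr    = mean(predict(mdl, Xtr) ~= ytr);      % training error
        ete    = mean(predict(mdl, Xte) ~= yte);      % test error
        nsv    = sum(mdl.IsSupportVector);            % number of support vectors
        margin = 2/norm(mdl.Beta);                    % geometric margin 2/||w|| (linear kernel)
        fprintf('C = %5.2f  err_tr = %.3f  err_te = %.3f  #SV = %3d  margin = %.3f\n', ...
                C, etr, ete, nsv, margin);
    end
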
4. (a) Generate a 2-dimensional data set X1 (training set) as follows. Select N = 150 data points
in the 2-dimensional [-5, 5] x [-5, 5] region according to the uniform distribution (set the
seed for the rand function equal to 0). Assign a point x = [x(1), x(2)]^T to the class +1 (-1)
according to the rule 0.05(x(1)^3 + x(1)^2 + x(1) + 1) > (<) x(2). (Clearly, the two classes are
nonlinearly separable; in fact, they are separated by the curve associated with the cubic.) Plot
the points in X1. Generate an additional data set X2 (test set) using the same prescription
as for X1 (set the seed for the rand function equal to 100).
(b) Design a linear SVM classifier. Compute the training and test errors and count the number
of support vectors. Use a tolerance of 0.001.
(c) Generate a nonlinear SVM classifier using the radial basis kernel functions (Eq. 6.100 of the
handout) for sigma = 0.1 and 2. Use a tolerance of 0.001. Compute the training and test error
rates and count the number of support vectors. Plot the decision regions designed by the
classifier. (A MATLAB sketch of the data generation and the kernel SVMs follows this problem.)
(d) Repeat step (c) using the polynomial kernel functions (x^T y + beta)^n for (n, beta) = (5, 0) and (3, 1).
Draw conclusions.
(e) Design the SVM classifiers using the radial basis kernel function with sigma = 1.5 and using the
polynomial kernel function with n = 3 and beta = 1.
(f) Comment on the reproducibility of your answers.
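
The sketch below shows the cubic-rule data generation and kernel SVMs via fitcsvm, with C left
at its default of 1 since the problem does not fix it. How fitcsvm's 'KernelScale' relates to the
sigma of Eq. 6.100, and the fact that its built-in polynomial kernel fixes the offset at 1 (so the
(n, beta) = (5, 0) case would need a custom kernel function), are assumptions to verify against
the handout.

    % Sketch: cubic-rule data and kernel SVMs (kernel parameterisations are implementation-dependent).
    rng(0);   Xtr = -5 + 10*rand(150,2);        % training points, uniform on [-5,5] x [-5,5]
    rng(100); Xte = -5 + 10*rand(150,2);        % test points
    labfun = @(X) sign(0.05*(X(:,1).^3 + X(:,1).^2 + X(:,1) + 1) - X(:,2));  % +1 above the cubic
    ytr = labfun(Xtr); yte = labfun(Xte);

    % RBF kernel: fitcsvm rescales X by 'KernelScale' and applies exp(-||x-y||^2),
    % i.e. exp(-||x-y||^2 / sigma^2); check this against the kernel definition in Eq. 6.100.
    for sigma = [0.1 2]
        mdl = fitcsvm(Xtr, ytr, 'KernelFunction','gaussian', 'KernelScale', sigma);
        fprintf('RBF  sigma = %.1f  err_tr = %.3f  err_te = %.3f  #SV = %d\n', sigma, ...
                mean(predict(mdl,Xtr) ~= ytr), mean(predict(mdl,Xte) ~= yte), sum(mdl.IsSupportVector));
    end

    % Polynomial kernel with beta = 1: fitcsvm's built-in form is (1 + x'*y)^n.
    mdl = fitcsvm(Xtr, ytr, 'KernelFunction','polynomial', 'PolynomialOrder', 3);
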
5. (a) Generate a 2-dimensional data set X1 (training set) as follows. Consider the nine squares
[i, i + 1] x [j, j + 1], i = 0, 1, 2, j = 0, 1, 2, and draw randomly from each one 30 uniformly
distributed points. The points that stem from squares for which i + j is even (odd) are
assigned to class +1 (-1) (like the white and black squares on a chessboard). Plot the data
set and generate an additional data set X2 (test set) following the prescription used for X1
(as in the previous problem, set the seed for rand at 0 for X1 and 100 for X2). (A sketch of
the chessboard data generation follows this problem.)
(b) i. Design a linear SVM classifier. Compute the training and test errors and count the
number of support vectors.
ii. Employ the previous algorithm to design nonlinear SVM classifiers with radial basis
kernel functions. Use sigma = 1, 1.5, 2, 5. Compute the training and test errors and count
the number of support vectors.
iii. Repeat for polynomial kernel functions, using n = 3, 5 and beta = 1.
(c) Draw your conclusions regarding the efficacy of the SVM, and the reproducibility of your
results.
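
Below is a short sketch of the chessboard data generation; the SVM calls themselves mirror the
sketch given after Problem 4, and seeding via rng is again an assumption.

    % Sketch: 3 x 3 "chessboard" training set; repeat with rng(100) to obtain the test set X2.
    rng(0);
    X1 = []; y1 = [];
    for i = 0:2
        for j = 0:2
            P  = [i + rand(30,1), j + rand(30,1)];   % 30 uniform points in [i,i+1] x [j,j+1]
            c  = 1 - 2*mod(i+j, 2);                  % +1 when i+j is even, -1 when it is odd
            X1 = [X1; P];  y1 = [y1; c*ones(30,1)];
        end
    end
    scatter(X1(:,1), X1(:,2), 20, y1, 'filled'); axis equal;   % plot the two classes
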
6. (a) i. Generate a data set X1 consisting of 400 2-dimensional vectors that stem from two
classes. The first 200 stem from the first class, which is modeled by the Gaussian
distribution with mean m1 = [8, 8]^T; the rest stem from the second class, modeled
by the Gaussian distribution with mean m2 = [8, 8]^T. Both distributions share the
covariance matrix

S = [ 0.3  1.5
      1.5  9.0 ]

ii. Perform PCA on X1 and compute the percentage of the total variance explained by
each component.
iii. Project the vectors of X1 along the direction of the first principal component and plot
the data set X1 together with its projection onto the first principal component. Comment
on the results.
(b) Repeat on data set X2, which is generated as X1 but with m1 = [1, 0]^T and m2 = [1, 0]^T.
(c) Compare the results obtained and draw conclusions.
(d) Apply Fisher's linear discriminant analysis on the data set X2 and compare the results
obtained with those from the analysis above. (A MATLAB sketch of PCA and Fisher's
discriminant follows this problem.)
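
A sketch of PCA via the eigendecomposition of the sample covariance matrix, and of Fisher's
discriminant direction, is given below; the values assigned to m1 and m2 are placeholders to be
set to the means specified in the problem, and mvnrnd again assumes the Statistics and Machine
Learning Toolbox.

    % Sketch: PCA on the pooled data and Fisher's linear discriminant.
    m1 = [8 8]; m2 = [8 8];                       % set to the class means given in the problem
    S  = [0.3 1.5; 1.5 9.0];
    X  = [mvnrnd(m1, S, 200); mvnrnd(m2, S, 200)];
    y  = [ones(200,1); 2*ones(200,1)];

    % PCA: eigenvectors of the sample covariance; explained variance = eigenvalue / total variance.
    [V, D]   = eig(cov(X), 'vector');
    [D, idx] = sort(D, 'descend');  V = V(:, idx);
    explained = 100*D/sum(D)                      % percentage of variance per principal component
    scores    = (X - mean(X)) * V(:,1);           % projection onto the first principal component

    % Fisher's LDA: w proportional to Sw^(-1)*(mu1 - mu2), Sw the within-class scatter (up to scale).
    mu1 = mean(X(y==1,:))';  mu2 = mean(X(y==2,:))';
    Sw  = cov(X(y==1,:)) + cov(X(y==2,:));
    wF  = Sw \ (mu1 - mu2);
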
© SBN, 2014, IIT-Bombay
