
UNIVERSITY OF PUERTO RICO

MAYAGÜEZ CAMPUS

DEPARTMENT OF INDUSTRIAL ENGINEERING

ININ 6048 Final Project

Wall Following Robot Path Planning Problem:

A Statistical Learning Approach

Presented by:
Rafael Batista 502-15-0623

Presented to:
Saylisse Davila Padilla, PhD.

Date:
15 December 2016
Contents
I. ABSTRACT
II. INTRODUCTION
III. LITERATURE REVIEW
IV. METHODS
A. DEVELOPED APPROACH
B. FLOW CHART REPRESENTATION OF THE METHODOLOGY
C. METHODOLOGY EXPLANATION
V. RESULTS
A. PREPROCESSING RESULTS
B. TUNING RESULTS
C. MODEL TRAINING, VALIDATION AND SELECTION
VI. CONCLUSION
Figures
Figure 1. SCITOS G5 robot, taken from (MetraLabs GmbH, 2011)
Figure 2. Sketch of the robot's navigation environment, taken from (Freire et al., 2009)
Figure 3. First part of the methodology flow chart
Figure 4. Second part of the methodology flow chart
Figure 5. Analysis of multivariate normality for the dataset
Figure 6. Graphical analysis for outlier detection using the R package CORElearn
Figure 7. Scree plot of the principal components and variance explained
Figure 8. First linear discriminant loadings histogram
Figure 9. Second linear discriminant loadings histogram
Figure 10. Third linear discriminant loadings histogram
Figure 11. Tuning RF sample size per class for LDA loadings
Figure 12. Tuning RF sample size per class for PCA rotated dataset
Figure 13. Tuning SVM gamma and cost with radial kernel for LDA loadings dataset
Figure 14. Tuning SVM gamma and cost with polynomial kernel for PCA rotated dataset
Figure 15. Tuning SVM gamma and cost with radial kernel for PCA rotated dataset
Figure 16. One standard error rule graphic for kNN and PCA
Figure 17. One standard error rule graphic for RF and PCA
Figure 18. One standard error rule graphic for SVM and PCA
Figure 19. Proposed arrangement for stacked learner
Figure 20. One standard error rule graphic for the stacked learner and PCA
Figure 21. One standard error rule graphic for kNN and LDA
Figure 22. One standard error rule graphic for RF and LDA
Figure 23. One standard error rule graphic for SVM and LDA
Figure 24. One standard error rule graphic for the stacked learner and LDA

Tables
Table I. Dataset Characteristics
Table II. Variability explained by principal components
Table III. SVM tuning results
Table IV. Constructed dataset for stacked learner
I. Abstract

Autonomous robot navigation has become an important research subject. This technology offers
the opportunity to add value to multiple industrial processes, as well as to enhance our quality
of life. This work presents a machine learning approach for implementing the autonomous
navigation system of a wall-following robot. For this purpose, the following machine learning
algorithms are put under test: k-nearest neighbor (kNN), random forest (RF), and support vector
machine (SVM). Performance evaluation by means of cross validation has been used for
selecting the best model in each case. The selected dataset for this work is composed of the
measurements of twenty-four ultrasonic distance sensors and the desired response for each
observation. Furthermore, data preprocessing (outlier detection, collinearity reduction, and
dimensionality reduction) and tuning of the machine learning algorithms have been taken into
account. Linear discriminant analysis (LDA) and principal components analysis (PCA) are
applied to our dataset as preprocessing techniques, and their performance for our application is
compared. Additionally, this work proposes the implementation of a stacked machine learner for
improving the classification process. Our results show that the best performance for a single
classifier, with both PCA and LDA preprocessing, was obtained by using random forest, and that
the stacked learner achieved the best performance in all cases.

II. Introduction
Navigation within a closed environment remains an open issue in the design of autonomous
robots. Path planning is a fundamental part of the robot navigation process: the robot must
decide which path to follow from the assessment of the measurements provided by its location
sensors. One of the first approaches for the solution of this problem was the implementation of
rule-based algorithms, in what is known as reactive planning (Otte, 2015). This required the
creation and adjustment of a set of rules using, as a reference, the expected values of the
measurements obtained from the robot's sensors, creating a finite state machine that models the
desired robot behavior. Each state represents the action the robot will take, and the transitions
between states depend on the tuning made by the system designer.

Nevertheless, this type of solution requires a constant process of adjustment until the desired
behavior is obtained, depending on the expert knowledge of the system designer. For this reason,
the study of machine learning techniques as a substitute for, or complement to, the
aforementioned algorithms is of interest for the implementation of the path planning process.
This opens the possibility of adding robustness and diminishing configuration times in the
creation of the robot's navigation system.
There are many challenges in the creation of a robust path following process. Dynamically
adjusting to changes in the environment and the need for real-time computation are open
research topics in the creation of path planning algorithms. Machine learning techniques could
help improve the performance and reduce the computational time of traditional path planning
algorithms (Pol & Murugan, 2015).

This work proposes the study of path planning with machine learning techniques for a wall
following robot, using a dataset composed of twenty-four measurements from ultrasonic distance
sensors (UDS) (Freire, A. Veloso, M. Barreto, 2010). These UDS are placed around the robot
body, and the measures obtained from the sensors are in meters. This problem is defined as a
classification problem, in which a training dataset is used for supervised learning. The
resulting system should be capable of choosing one of the following actions for the robot: (1)
move forward, (2) turn slight right, (3) turn slight left and (4) turn sharp right.

Finally, it is the proposal of this work to apply three nonlinear classification methods and obtain
an acceptable classification error for the chosen dataset. The performance of the three selected
nonlinear classifiers will be evaluated using multiple performance measures and cross-validation
techniques. Furthermore, the required data preprocessing will be carried out to determine whether
there is any correlation between two or more measured variables using principal components
analysis (PCA) (Hastie, Tibshirani, & Friedman, 2009), with the expectation of reducing the
collinearity between the predictors.

III. Literature Review

Path planning is an open research topic undergoing constant evolution, and it is the subject of
many works in the robotic navigation literature. A trending research line on this subject is the
implementation of machine learning algorithms to solve diverse types of path planning problems.
One of these path planning problems is motion planning in dynamic environments, in which the
robot must make a path following decision from the measurements of distance sensors mounted on
its chassis. (Freire, Barreto, Veloso, & Varela, 2009) present a dynamic path planning problem
using a wall following robot; this problem is stated as a non-linearly separable classification
problem, therefore, they had to select machine learning algorithms capable of handling this
condition. In our work, the dataset provided by (Freire, A. Veloso, M. Barreto, 2010) through the
UCI machine learning repository will be used.
(Freire et al., 2009) proposed the use of a short term memory mechanism in conjunction with
four standard neural network classifiers. In contrast, our current work proposes the use of the
same dataset but with nonlinear classifiers and without a memory mechanism, which implies that no
memory allocation will be required for the implementation of the machine learning algorithms.
From this perspective, the proposed methods for handling the nonlinear classification problem
are: k-nearest neighbor (k-NN), random forests and support vector machines.
In the case of k-NN, (Zhou, Wen, & Yang, 2016) proposed the use of this technique to isolate
faults in industrial processes with nonlinear, multimode and non-Gaussian distributed data. The
authors note that traditional data-driven fault isolation methods based on principal
components analysis (PCA) assume that the data follow a Gaussian distribution with unknown
mean and covariance, but many industrial processes don't satisfy this assumption. For this
reason, k-NN is proposed as a nonparametric classification method with no assumption about the
data distribution. The k-NN method exploits the distance relations in local samples: normal
samples will be close to their neighbors, and if the distance of a sample is greater than a preset
threshold then the sample is faulty, thus detecting a fault in the system. Nevertheless, (Zhou et al.,
2016) comment on the difficulty of isolating the type of fault using k-NN because of the lack of
knowledge of complex industrial processes; therefore, a combination of contribution analysis
(CA), based on PCA decomposition, and k-NN is proposed. The authors call the proposed
technique variable contribution by k-NN (VCkNN). (Zhou et al., 2016) report good fault
isolation results with the hybrid VCkNN method in comparison with CA methods.
Within the scope of the problem presented by our work, and taking into consideration the work
presented by (Zhou et al., 2016), k-NN is a valid alternative for addressing the nonlinearity of the
path following problem. Furthermore, our dataset has a total of twenty-four distance
measurement sensors, thus the use of PCA for reducing the collinearity of the data obtained from
the sensors is of interest; this would improve the performance of the k-NN algorithm by reducing
the number of predictors in the distance calculations. Our implementation won't require any
predictor contribution analysis, for this reason the VCkNN algorithm is not of interest.

Two important aspects to consider in the implementation of kNN are the tuning required by the
algorithm (selection of the number of neighbors and type of distance measurement) and the
computational time as a function of the number of observations. These factors create the necessity
of evaluating other machine learning algorithms to solve the dynamic path planning problem.
Due to the aforementioned requirements of the k-NN algorithm, it is important to compare its
performance with another machine learning technique with fewer requirements for tuning and
computational time; this is the case of random forest (RF). (Belgiu & Drăguț, 2016) define RF as
an ensemble classifier that produces multiple decision trees using random sampling
processes on the training dataset. This technique offers the possibility of reducing computational
time with only two tuning parameters: the number of random trees and the number of variables to
be selected for the best split when growing the trees (Belgiu & Drăguț, 2016). The authors
propose the use of RF for remote sensing applications, in which high dimensionality,
nonlinearity and collinearity are characteristics of the obtained measures.

Nevertheless, (Belgiu & Drăguț, 2016) note that there are some important considerations
when choosing the training samples for the RF classifier regarding the balance of the classes and
how representative the training data are in comparison with the target classes. Concerning the path
following application, RF is a good alternative for handling the dimensionality of our dataset and
ensuring low processing time, as the implementation of RF is based on decision trees; on the
other hand, the required settings for the RF algorithm must be made to ensure that our training
process takes into consideration the class distribution in our dataset. (Belgiu & Drăguț, 2016)
report good results in their literature review regarding the implementation of RF for the
classification of hyperspectral data, which has similar characteristics to the data in our chosen dataset.
Another machine learning algorithm with applications in the solution of path planning problems
is the support vector machine (SVM). (Morales, Toledo, & Acosta, 2016) propose the use of SVM
for the generation of trajectories using a geographical map as input. This map identifies
different types of obstacles and free routes with the purpose of creating the training dataset for
the SVM algorithm. The authors define the problem of multiclass SVM (MSVM) and the use of
nonlinear boundaries for addressing the nonlinearity of the path planning problem. With the
training data, the MSVM algorithm is trained to identify a separating hyperplane between the
obstacles, and the path with the shortest distance is selected by means of an optimization
algorithm. The results show that the method is able to find a smooth path and has comparable
performance to other heuristic graph-based techniques.
Finally, for the case of the wall-following robot dataset, SVM implementation seems a viable
alternative. Nevertheless, the tuning process for SVM is not an easy task (Tian, Gao, & Lu,
2007). Also, it is important to note that the comparison of these machine learning algorithms by
means of cross-validation will be an important part of our work.

IV. Methods

This section discusses the methods that were applied to achieve the proposed objectives of
this project. A brief description of the approach taken for the implementation of this project is
presented, alongside a flowchart showing the complete methodology followed in this project.
Finally, this section ends with a detailed description of this flowchart.
A. Developed Approach

The proposed dataset consists of measurements obtained from ultrasonic sensors; these sensors
have the same construction characteristics and, for this reason, scaling of the data is not
necessary. All measurements are presented in meters, and only outliers (due to noise) or missing
values need to be checked. As previously stated, the objective of this project is to develop a
machine learning algorithm capable of choosing the correct moving decision for the robot as a
function of the measurements from its sensors. For this purpose, a reference dataset is presented
with the corresponding class labels with the goal of training machine learning algorithms.
Therefore, this project focuses on supervised learning methods for the path following decision
made by the robot. Fig. 1 shows a photo of the robot used by (Freire, A. Veloso, M. Barreto,
2010) to generate this dataset.

Figure 1. SCITOS G5 robot, taken from (MetraLabs GmbH, 2011)


This dataset has a total of 5,456 observations with twenty-four predictors and four possible
classes in the response: (1) move forward, (2) turn slight right, (3) turn slight left and (4) turn
sharp right. Table I shows an overview of the dataset generated for this problem. As we can see,
there is imbalance between the classes, the ranges of measurements from the sensors are on the
same scale, there are no missing values, and all the predictors are quantitative.

Table I. Dataset Characteristics


Variable            Description               Type          Values (min, ..., max)                              Predictor  Missing Values
UDS1                Position angle = 180      Quantitative  (0.400, 0.401, 0.402), ..., (5.000, 4.866, 4.860)   YES        0%
UDS3                Position angle = -150     Quantitative  (0.470, 0.471, 0.492), ..., (5.029, 5.028, 5.026)   YES        0%
UDS6                Position angle = -105     Quantitative  (1.114, 1.115, 1.118), ..., (5.005, 5.000, 4.980)   YES        0%
UDS9                Position angle = -60      Quantitative  (0.836, 0.854, 0.861), ..., (5.000, 4.956, 4.955)   YES        0%
UDS12               Position angle = -15      Quantitative  (0.778, 0.779, 0.780), ..., (5.000, 4.992, 4.981)   YES        0%
UDS15               Position angle = 30       Quantitative  (0.495, 0.496, 0.497), ..., (5.000, 4.921, 4.920)   YES        0%
UDS18               Position angle = 75       Quantitative  (0.354, 0.355, 0.356), ..., (5.000, 4.608, 4.591)   YES        0%
UDS21               Position angle = 120      Quantitative  (0.380, 0.381, 0.382), ..., (5.000, 4.822, 4.812)   YES        0%
UDS24               Position angle = 165      Quantitative  (0.377, 0.380, 0.381), ..., (5.000, 4.871, 4.865)   YES        0%
Move Forward        Path following decision   Qualitative   2205 (40.41%)                                       NO         0%
Sharp Right Turn    Path following decision   Qualitative   2097 (38.43%)                                       NO         0%
Slight Left Turn    Path following decision   Qualitative   328 (6.01%)                                         NO         0%
Slight Right Turn   Path following decision   Qualitative   826 (15.13%)                                        NO         0%

From (Freire et al., 2009), it is stated that the heuristic algorithm used for generating the chosen
dataset was put under test in the room shown in Fig. 2. Four rounds of measurements were taken
by (Freire et al., 2009) to be used in the training process.

Figure 2. Sketch of the robot's navigation environment, taken from (Freire et al., 2009)

B. Flow chart representation of the methodology

Figs. 3-4 show the flow chart containing the methodology followed for the implementation of
this project. This methodology is focused on these main parts: loading the dataset, preprocessing,
model training, cross validation (CV), model selection, stacked learners, and drawing
conclusions. A comparison between linear discriminant analysis (LDA) and PCA is included in
the methodology.

Figure 3. First part of the methodology flow chart.

Figure 4. Second part of the methodology flow chart.

C. Methodology explanation

All the parts of the proposed methodology were implemented in R. The first part of the proposed
methodology is to load the dataset and identify any missing values. If a missing value is found,
the complete row in which the missing value appears is removed. In the case of this work, no
missing data were found; for this reason, the dataset was loaded from the comma separated value
file obtained from (Freire, A. Veloso, M. Barreto, 2010) without any modification. This dataset
was loaded into a data frame to serve as an input for the next part of our methodology,
preprocessing.
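
A minimal sketch of this loading step is given below; it assumes the UCI file
sensor_readings_24.csv (containing the twenty-four readings plus the Decision column, as used in
Appendix A) is available in the working directory.

```r
# Load the UCI wall-following dataset and check for missing values
data <- read.csv("sensor_readings_24.csv")

sum(is.na(data))                      # no missing values were found in this dataset
data <- data[complete.cases(data), ]  # rows with missing values would be dropped here
```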
For preprocessing, the first step consisted of testing whether the chosen dataset is multivariate
normal; this was done using the package MVN. This test resulted in a negative outcome, and for
this reason the robust Mahalanobis distance could not be used for detecting outliers in our dataset.
Therefore, for outlier detection, the package CORElearn was used. This approach consisted of
training a RF model with the loaded dataset and, with the measurements of the distance
between the nodes of this RF, determining whether an observation is an outlier. This outlier
detection package provides a graphical approach to determine the threshold for outlier detection.
After removing the observations containing outliers, a new data frame was created as an input
for the second part of the preprocessing stage: collinearity and model input dimensionality
reduction.
It is the interest of this work to reduce the collinearity between the input predictors for the
training process and, consequently, to diminish the number of predictors used in this training.
Therefore, LDA and PCA were considered for this part of the preprocessing of the data. Both
approaches are considered because the dataset turned out to be non-multivariate normal, which is
a good opportunity to make an assessment of both techniques in cases similar to the one in this
project. Using the data frame with outliers removed, both PCA and LDA models were trained,
and the loadings for LDA and the rotated data for PCA were stored in separate data frames,
ending the preprocessing stage of our project.
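
A minimal sketch of this step is shown below; it assumes the data frames data.nout (predictors
with outliers removed) and data.nout.resp (the same rows including the Decision column) from
the outlier removal step, and the object names sensor.pca.rot and sensor.lda are only illustrative.

```r
library(MASS)

# PCA on the 24 sensor readings; no scaling is applied since all sensors share the same units
sensors.pca <- prcomp(data.nout, scale. = FALSE)
sensor.pca.rot <- as.data.frame(sensors.pca$x[, 1:15])   # rotated data, first fifteen components
sensor.pca.rot$decision <- data.nout.resp$Decision

# LDA on the same predictors; with four classes at most three discriminants are available
myLDA <- lda(Decision ~ ., data = data.nout.resp)
lda.values <- predict(myLDA)
sensor.lda <- as.data.frame(lda.values$x[, 1:3])          # LDA scores ("loadings" dataset)
sensor.lda$decision <- data.nout.resp$Decision
```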
Tuning the machine learning algorithms is an essential part of obtaining the best possible
results; in our case, this tuning process was conducted for RF and SVM. In the case of kNN, as a
non-weighted approach was considered, the only tuning parameter is the number of neighbors,
which is taken as a complexity measure in the model training, validation and selection part. For
RF, since an unbalanced dataset is presented, the number of samples extracted from each class was
the selected tuning parameter. For this process an iterative loop was constructed, and the sample
sizes for the two majority classes were changed (a total of one hundred steps). The two
minority classes were left at their maximum sample value (taking into consideration the data
partitioning done with 10-fold CV). Both the LDA and PCA resultant data frames were used; in each
case, using the RF model object, the out of bag (OOB) error was used as a measure and plotted to
find the best value for the number of samples for the two majority classes.
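
The sketch below illustrates this tuning loop under the assumptions above (the sensor.lda data
frame from the preprocessing sketch, minority class sample sizes fixed at 250 and 700, and
illustrative variable names); the same loop is run with the PCA rotated data frame.

```r
library(randomForest)

sizes   <- round(seq(1000, 1800, length.out = 100))  # candidate sample sizes for the majority classes
oob.err <- numeric(length(sizes))

for (i in seq_along(sizes)) {
  rf.fit <- randomForest(decision ~ ., data = sensor.lda, ntree = 500,
                         sampsize = c(sizes[i], sizes[i], 250, 700))  # order follows the factor levels
  oob.err[i] <- rf.fit$err.rate[rf.fit$ntree, "OOB"]                  # OOB error of the full forest
}

plot(sizes, oob.err, type = "l", xlab = "# Samples per Class", ylab = "OOB Error")
```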

After this, the tuning of SVM was done by using the function tune.svm from the R package
e1071; this is a grid search based algorithm for selecting the best gamma and cost for the SVM.
For the LDA data frame, only a radial kernel was considered (the polynomial kernel presented
convergence problems). In the case of PCA, both radial and polynomial kernels were considered.
This SVM tuning function uses a cross validation approach for obtaining the best gamma and
cost parameters. Finally, the results of this tuning process were used in the model training,
validation and selection process.
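
A sketch of this grid search is shown below; the parameter grids are illustrative rather than the
exact ranges used in the project, and sensor.lda is the LDA-scores data frame assumed above.

```r
library(e1071)

# Grid search over gamma and cost for a radial-kernel SVM, with internal cross validation
svm.tune.lda <- tune.svm(decision ~ ., data = sensor.lda, kernel = "radial",
                         gamma = c(0.5, 1, 2), cost = c(10, 100, 1000))

svm.tune.lda$best.parameters    # gamma and cost of the best model found
svm.tune.lda$best.performance   # corresponding cross-validation error
```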
The first model training, validation and selection process was done using as input the dataset
rotated by the PCA loadings. Additionally, it is important to highlight the error measurements
used for the CV process of this section: classification error, the kappa statistic and the multiclass
logarithmic loss. The CV process consisted of 10-fold cross validation, and the obtained errors
were stored for each fold (on each iteration the complexity of the model was changed). In the case
of kNN, the chosen complexity measure was the number of neighbors used in the training of the
model; for RF, the complexity measure was the number of trees; and finally, for SVM, the
complexity measure was the degree of the polynomial kernel.
After completing the model training and validation for the three selected methods, a desirability
function was constructed for the selection process. Before applying this desirability function, the
necessary error normalization was conducted (between zero and one); furthermore, the errors were
oriented so that the lower the error, the higher the value of the desirability function. The selected
desirability function was D = 0.5 d(C.E.) + 0.4 d(Kappa) + 0.1 d(MLogLoss), where each d(.) is the
normalized, desirability-oriented value of the corresponding measure. The reasoning behind the
selected weights is that we desire a model with, first, a low classification error (to prevent
choosing the wrong path following decision and damaging the robot), second, a high kappa
value (better performance than a random path following decision) and, third, a low multiclass
logarithmic loss (high precision in the distribution of class probabilities). After constructing the
desirability function for each model, the one standard error rule (using a graphical approach) was
implemented and the best models were selected.
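
The following sketch shows one way this desirability score can be computed; the vectors cv.err,
cv.kappa and cv.mll are assumed to hold the per-fold CV results for a given model and complexity
level.

```r
# Rescale a vector to the [0, 1] range
range01 <- function(x) (x - min(x)) / (max(x) - min(x))

d.ce  <- 1 - range01(cv.err)    # low classification error  -> high desirability
d.kap <- range01(cv.kappa)      # high kappa                -> high desirability
d.mll <- 1 - range01(cv.mll)    # low multiclass log loss   -> high desirability

D <- 0.5 * d.ce + 0.4 * d.kap + 0.1 * d.mll

# Mean desirability and its standard error per complexity level, used for the
# graphical one standard error rule shown in Figs. 16-24
c(mean = mean(D), se = sd(D) / sqrt(length(D)))
```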
With these best models, an ensemble of machine learning algorithms was constructed. This
ensemble consisted of using the class probability predictions from each of the three best models
and constructing a new dataset by binding the original response from the PCA rotated dataset.
The central part of this ensemble was a random forest classifier using this new dataset as input.
For the evaluation of the performance of this meta-learner, a CV process with the previous
performance measures was carried out.
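
A sketch of this stacking step is given below, reusing the object names assumed in the previous
sketches (sensor.pca.rot and the tuned settings of nine neighbors, 100 trees and a third degree
polynomial kernel). For brevity the base models are fit and predicted on the same data, whereas in
the project the meta-learner is evaluated with 10-fold CV.

```r
library(kknn)
library(randomForest)
library(e1071)

# Class-probability predictions from the three tuned base learners
p.knn <- predict(train.kknn(decision ~ ., data = sensor.pca.rot, ks = 9, kernel = "rectangular"),
                 newdata = sensor.pca.rot, type = "prob")
p.rf  <- predict(randomForest(decision ~ ., data = sensor.pca.rot, ntree = 100),
                 newdata = sensor.pca.rot, type = "prob")
svm.fit <- svm(decision ~ ., data = sensor.pca.rot, kernel = "polynomial", degree = 3,
               probability = TRUE)
p.svm <- attr(predict(svm.fit, newdata = sensor.pca.rot, probability = TRUE), "probabilities")

# New dataset: the twelve predicted probabilities plus the original class labels (as in Table IV)
stacked <- data.frame(p.knn, p.rf, p.svm, Decision = sensor.pca.rot$decision)

# RF meta-learner trained on the stacked dataset
meta.rf <- randomForest(Decision ~ ., data = stacked, ntree = 10)
```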
In the case of the dataset generated using the LDA loadings, a similar process was performed for
kNN and RF. In the case of SVM, as a radial kernel was used, the chosen complexity measure
was the value of the cost (the lower the cost, the less complex the model) (Fletcher, 2009). An
ensemble was constructed, but only using the kNN and RF models. Finally, conclusions are
presented with a discussion of the recommended model for the dataset under test in this project.

V. Results

This section presents the results obtained after following the previous methodology. First, the
results for data preprocessing are shown; after this, the tuning process for the selected methods;
then the model training, validation and selection; and finally, the evaluation of the stacking/meta-
learning approach.

A. Preprocessing Results

After loading the UDS measurements dataset, it was tested for missing values and none were
found. Also, as can be seen in Table I, all the data are presented on the same scale. Therefore, the
first step of our preprocessing stage is to test the multivariate normality of our dataset; for this,
the functions mardiaTest and hzTest from the R package MVN were applied, obtaining the
results shown in Fig. 5. As we can see, the dataset was found not to be multivariate normal.

[Figure: chi-square Q-Q plot of the squared Mahalanobis distances against the chi-square quantiles]

Figure 5. Analysis of multivariate normality for the dataset.

With this result, an appropriate method for outlier detection was applied. The package
CORElearn has an implementation of outlier detection by means of training a RF model. This
was applied to our dataset with the functions CoreModel and rfOutliers. The rfOutliers
function computes the distance between nodes of the RF; then the absolute value of these
distances is taken and the plot shown in Fig. 6 is constructed, where the X axis represents the
sample number and the Y axis is the absolute value of the RF node distance.
[Figure: absolute RF node distance per observation, colored by class, against the observation index]
Figure 6. Graphical analysis for outlier detection using the R package CORElearn.

From this figure, a threshold node distance greater than 4.5 was selected for labeling an
observation as an outlier. After removing the identified outliers, the next step was the collinearity
and input dimensionality reduction process. For this, LDA and PCA were implemented; Fig. 7
shows the scree plot for PCA and Figs. 8-10 show the histograms for LDA.

[Figure: scree plot (sensors.pca) of the variances of the first ten principal components]
Figure 7. Scree plot of the principal components and variance explained.

[Figure: stacked histograms of the first linear discriminant scores for each class]

Figure 8. First linear discriminant loadings histogram.


[Figure: stacked histograms of the second linear discriminant scores for each class]

Figure 9. Second linear discriminant loadings histogram.

[Figure: stacked histograms of the third linear discriminant scores for each class]

Figure 10. Third linear discriminant loadings histogram.

For the case of PCA the function prcomp was used, and for LDA the function lda was
used. As can be seen in Table II, with PCA the first fifteen principal components explained
85% of the variability of the dataset; therefore, fifteen principal components were chosen for
PCA. In the case of LDA, looking at the previous histograms, the combination of the first and
second linear discriminants seems to give good separation for all the classes. Nevertheless, for
the LDA analysis, all three discriminants were chosen.

Table II. Variability explained by principal components.


                        PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8   PC9   PC10  PC11  PC12  PC13  PC14  PC15
Standard deviation      2.15  1.85  1.35  1.26  1.15  1.08  1.04  1.00  0.88  0.85  0.80  0.79  0.77  0.73  0.72
Proportion of Variance  0.19  0.14  0.08  0.07  0.05  0.05  0.05  0.04  0.03  0.03  0.03  0.03  0.02  0.02  0.02
Cumulative Proportion   0.19  0.33  0.41  0.48  0.53  0.58  0.63  0.67  0.70  0.73  0.76  0.78  0.81  0.83  0.85

B. Tuning Results

The next step in our project was the implementation of the tuning process for the selected
methods. As previously stated, this tuning process was only conducted for RF and SVM. In the
case of RF, the OOB error was calculated by using the object created after running the function
randomForest from the R package randomForest. This tuning process was divided into two
parts: first tuning the sample size for LDA, and then tuning the sample size for PCA. Fig. 11
shows the results for the two majority classes using the resulting dataset after applying LDA.
The sample sizes for the two minority classes were set to 250 and 700. The code developed for this
part is presented in Appendix C.

[Figure: two panels of OOB error against the number of samples per class (1000 to 1800) for the first and second majority classes]


Figure 11. Tuning RF sample size per class for LDA loadings.

This same approach was repeated for the PCA rotated dataset. The sample sizes for the two
minority classes are the same, and the variation of the sample size for the two majority classes
is done in one hundred steps (as in the previous case). Fig. 12 shows the results for this dataset and
RF.

[Figure: two panels of OOB error against the number of samples per class (1000 to 1800) for the first and second majority classes]


Figure 12. Tuning RF sample size per class for PCA rotated dataset.

After finishing the tuning process for RF, the following procedure was the tuning process
for SVM. For the LDA loadings dataset, Fig. 13 shows the resulting tuning process using a radial
kernel. In the case of the PCA rotated dataset, Figs. 14-15 show the resulting plots for the
polynomial and radial kernels, respectively.
[Figure: heat map of SVM performance over gamma (0.6 to 2.0) and cost (200 to 1000); CV error between about 0.25 and 0.29]
Figure 13. Tuning SVM gamma and cost with radial kernel for LDA loadings dataset.

[Figure: heat map of SVM performance over gamma (0.6 to 2.0) and cost (20 to 100); CV error between about 0.11 and 0.14]
Figure 14. Tuning SVM gamma and cost with polynomial kernel for PCA rotated dataset.

[Figure: heat map of SVM performance over gamma (0.5 to 4.0) and cost (200 to 1000); CV error between about 0.1 and 0.6]
Figure 15. Tuning SVM gamma and cost with radial kernel for PCA rotated dataset.

Table III shows the results given by the tune.svm function for each case. As can be seen, the
polynomial kernel offers better performance than the radial kernel for the PCA rotated dataset. In
the case of LDA, as previously explained, only a radial kernel is considered.

Table III. SVM tuning results


Dataset Kernel Degree Cost Gamma Resulting Error
PCA rotated Radial NA 10 0.5 0.1082
PCA rotated Polynomial Third 0.1 0.5 0.1049
LDA loadings Radial NA 100 2 0.2515

C. Model Training, Validation and Selection

After obtaining the necessary tuning parameters for our models, the following stage consisted of
training and testing the selected approaches for our project. This section is divided into two parts:
evaluating the models with the results obtained from the PCA implementation, and evaluating the
models with the results obtained from the LDA implementation. In each part the following
methods are evaluated: kNN, RF, SVM and the meta-learner. The code developed for this part is
presented in Appendix A and B.
i. PCA rotated dataset.
The following figures show the constructed one standard error rule graphics for each of the
studied methods. The idea behind this approach is to select the least complex method with the
highest desirability function value. For kNN, the higher the number of neighbors, the less
complex the model; for RF, the lower the number of trees, the less complex the model; and for
SVM, the lower the degree of the polynomial, the less complex the model.

[Figure: mean desirability against the number of neighbors (10 down to 1)]
Figure 16. One standard error rule graphic for kNN and PCA.

From Fig. 16, we can see that the highest desirability function value was obtained for six neighbors;
nevertheless, applying the one standard error rule, the model with the best performance is the one
with nine neighbors.

[Figure: mean desirability against the number of trees (10 to 2000)]
Figure 17. One standard error rule graphic for RF and PCA.

From Fig. 17, we can see that the highest desirability function value was obtained for 2,000 trees;
nevertheless, applying the one standard error rule, the model with the best performance is the one
with 100 trees.

[Figure: mean desirability against the degree of the polynomial kernel (2 to 6)]
Figure 18. One standard error rule graphic for SVM and PCA.

From Fig. 18, we can see that the highest desirability function value was obtained for a polynomial
kernel of third degree. In this case the least complex model was the best performer, and for this
reason it was chosen as the best obtained model with SVM.

In general, the method with the best performance for the PCA rotated dataset was RF. It is
important to note that SVM presented a high MLogLoss error in comparison with kNN and RF;
for this reason it was the method with the lowest performance. In terms of classification error
and kappa statistics, all the methods presented similar characteristics. After identifying the
models with the best performance, a stacked learning approach was tested. Table IV shows
the resulting dataset constructed by using the probabilities assigned to each class by the
previously selected best methods. Fig. 19 shows the arrangement made for this stacked learner.

Table IV. Constructed dataset for stacked learner


    P1.KNN  P2.KNN  P3.KNN  P4.KNN  P1.RF  P2.RF  P3.RF  P4.RF  P1.SVM  P2.SVM  P3.SVM  P4.SVM  Decision
1 0.00 0.33 0.00 0.67 0.02 0.02 0.00 0.96 0.76 0.16 0.08 0.00 Slight-Right-Turn
2 0.00 0.33 0.00 0.67 0.01 0.01 0.00 0.98 0.77 0.15 0.07 0.00 Slight-Right-Turn
3 0.00 0.33 0.00 0.67 0.01 0.01 0.00 0.98 0.77 0.16 0.07 0.00 Slight-Right-Turn
4 0.00 0.33 0.00 0.67 0.02 0.04 0.00 0.94 0.75 0.17 0.08 0.00 Slight-Right-Turn
5 0.00 0.33 0.00 0.67 0.01 0.01 0.00 0.98 0.77 0.16 0.07 0.00 Slight-Right-Turn
6 0.00 0.33 0.00 0.67 0.01 0.01 0.00 0.98 0.77 0.16 0.07 0.00 Slight-Right-Turn

[Figure: the PCA rotated dataset feeds kNN (9 neighbors), RF (100 trees) and SVM (degree 3); their class probabilities and the dataset class labels form a new dataset on which a RF is trained and cross validated to produce the result]

Figure 19. Proposed arrangement for stacked learner.

Fig. 20 shows the results obtained from the training and validation of the stacked learner; as we
can see, the maximum value of our desirability function is achieved with only ten trees.
Therefore, the best performance for the PCA rotated dataset has been obtained by using this
stacked learning approach.

[Figure: mean desirability against the number of trees (10 to 2000); values remain close to 1.0]
Figure 20. One standard error rule graphic for the stacked learner and PCA.

ii. LDA loading dataset.


The same procedure of the previous section was followed for the analysis of the models using
the LDA loadings dataset, as can be seen in the following figures.

[Figure: mean desirability against the number of neighbors (10 down to 1)]

Figure 21. One standard error rule graphic for kNN and LDA.

Analyzing Fig. 21, we can see that the highest desirability function value was obtained for eight
neighbors; nevertheless, applying the one standard error rule, even ten neighbors offered a good
performance in this case.

[Figure: mean desirability against the number of trees (10 to 2000)]
Figure 22. One standard error rule graphic for RF and LDA.

From Fig. 22, we can see that the highest desirability function value was obtained for 600 trees;
nevertheless, applying the one standard error rule, the model with the best performance is the one
with 100 trees.

[Figure: mean desirability against the SVM cost (up to 1000)]
Figure 23. One standard error rule graphic for SVM and LDA.

Contrary to what was seen for kNN and RF, SVM performance is degraded when using the LDA
loadings dataset as input. Even though in the tuning process the best selected cost was 10, we can
see from our CV process that the best performance was obtained for a cost equal to 1000. Due to
the low performance offered by SVM in this case, this model wasn't considered for the stacking
process using the LDA loadings dataset as input. Fig. 24 shows the result of the stacking process;
as we can see, using just RF and kNN, the performance is similar to that of Fig. 20.

[Figure: mean desirability against the number of trees (10 to 2000); values remain close to 1.0]

Figure 24. One standard error rule graphic for the stacked learner and LDA.

VI. Conclusion

This project had the objective of evaluating the performance of machine learning algorithms
applied to an autonomous robot navigation problem. This problem was stated as a dynamic
navigation problem, and a reference dataset was used for the development of this project
proposal. Our results showed the importance of the data preprocessing stage for improving the
performance of the subsequent machine learning algorithms; in our case, fewer than ten outliers
were detected in our experimental dataset. Furthermore, the multivariate normality test showed
that the experimental dataset was not multivariate normal; for this reason, the Mahalanobis
distance could not be considered for outlier detection and, instead, a RF based approach was used.

Additionally, it was possible to successfully apply the LDA and PCA techniques for the
reduction of collinearity and dimensionality in the data input for the model training, validation
and selection process. One of the contributions presented in this work is the set of results showing
the differences between applying PCA and LDA in this type of application, concluding that for kNN
and RF the best performance was obtained using the resulting dataset after applying LDA, and
for SVM the best performance was obtained by using the resulting dataset after applying PCA.
Our results show that the best single classifier performance was obtained by using the
combination of RF and the LDA loadings dataset; this resulted in a desirability value of 94.75%.
Another important aspect of this work was the evaluation of the combination of heterogeneous
prediction models to create a new stacked learner with improved performance. For this ensemble,
the best performance was obtained by using the combination of RF and kNN with the LDA
loadings dataset; this resulted in a desirability value of 99.99%.

Future work related to this project should focus on the evaluation of the real-time performance of
the selected algorithms for testing with a real robot system. Also, testing new optimization
techniques for SVM tuning, with the objective of speeding up the tuning process, would help to
achieve a better parameter selection for this algorithm and to improve the performance of this
classifier using the LDA loadings dataset.

REFERENCES

Belgiu, M., & Drăguț, L. (2016). Random forest in remote sensing: A review of applications and future directions.
ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24-31.
https://doi.org/10.1016/j.isprsjprs.2016.01.011
Fletcher, T. (2009). Support Vector Machines Explained. Retrieved from www.cs.ucl.ac.uk/staff/T.Fletcher/
Freire, A. Veloso, M. Barreto, G. (2010). UCI Machine Learning Repository
[http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of
Information and Computer Science.
Freire, A. L., Barreto, G. A., Veloso, M., & Varela, A. T. (2009). Short-term memory mechanisms in neural network
learning of robot navigation tasks: A case study. 2009 6th Latin American Robotics Symposium, LARS 2009,
(4). https://doi.org/10.1109/LARS.2009.5418323
Hastie, T. J., Tibshirani, R. J., & Friedman, J. H. (2009). The elements of statistical learning: data mining,
inference, and prediction. New York: Springer.
MetraLabs GmbH. (2011). SCITOS G5 Embedded PC and Operating System. Retrieved December 14, 2016, from
www.metralabs.com
Morales, N., Toledo, J., & Acosta, L. (2016). Path planning using a Multiclass Support Vector Machine. Applied Soft
Computing, 43, 498-509. https://doi.org/10.1016/j.asoc.2016.02.037
Otte, M. W. (2015). A Survey of Machine Learning Approaches to Robotic Path-Planning. Cs.Colorado.Edu.
Retrieved from http://www.cs.colorado.edu/~mozer/Teaching/Computational Modeling Prelim/Otte.pdf
Pol, R. S., & Murugan, M. (2015). A review on indoor human aware autonomous mobile robot navigation through a
dynamic environment survey of different path planning algorithm and methods. 2015 International
Conference on Industrial Instrumentation and Control, ICIC 2015, 1339-1344.
Tian, J., Gao, M., & Lu, E. (2007). Dynamic Collision Avoidance Path Planning for Mobile Robot Based on Multi-
sensor Data Fusion by Support Vector Machine. Mechatronics and Automation, 2007. ICMA 2007.
International Conference on, 2779-2783.
Zhou, Z., Wen, C., & Yang, C. (2016). Fault Isolation Based on k-Nearest Neighbor Rule for Industrial Processes.
IEEE Transactions on Industrial Electronics, 63(4), 2578-2586.

APPENDICES

APPENDIX A PCA R MARKDOWN IMPLEMENTATION

---
title: "Wall_Following"
author: "Rafael Batista"
date: "4 de diciembre de 2016"
output: pdf_document
---

```{r setup, include=FALSE}


knitr::opts_chunk$set(echo = TRUE)
```

##Loading Packages
```{r, echo=F}
#library(mlbench)
#library(reshape)
#library(caret)
#library(PRROC)
#library(class)
library(ggplot2)
library (MASS)

#Library for Multi Log Loss error


library(MLmetrics)

#Library for kappa error


library(psych)

```

***

##Loading Sensors' Data


```{r, echo=F}
#Reading data file
data=read.csv(file ="C:/Users/Rafael Batista/Desktop/R Files/sensor_readings_24.csv")

#Inspecting data and searching for missing values


summary(data)
head(data)
number.na<-sum(is.na(data))
number.na
```

No missing data was found in the original dataset. The dataset has 5456
observations with 24 predictors and one response. This response is labeled as:
-Move-Forward
-Sharp-Right-Turn
-Slight-Left-Turn
-Slight-Right-Turn

***

#Testing for multivariate normality and searching for outliers

```{r}

#Library for multivariate normality test


library(MVN)

result<-mardiaTest(data[,-25],qqplot=TRUE)
result
result<-hzTest(data[,-25],qqplot=FALSE)
result
detach("package:MVN", unload=TRUE)

#Library for detecting outliers by diverse methods


library(CORElearn)
#Library for detecting outliers using robust Mahalanobis distance
#library(mvoutlier)

dataset <- data


#Creating random forest model for outlier detection, using a distance measure between nodes
md <- CoreModel(Decision ~ ., dataset, model="rf", rfNoTrees=50, maxThreads=1)
outliers <- rfOutliers(md, dataset)
plot(md, dataset, rfGraphType="outliers")
#Identifying observations as outliers
resOut<-which(abs(outliers) >= 4.5);
resOut
data.outlier=subset(data, select = -c(Decision))

#Data is not multivariate normal, so Mahalanobis distance is not applied.


#MAHALANOBIS
# res<-aq.plot(x=data.outlier, delta=qchisq(0.975, df=ncol(data.outlier)), quan=1, alpha=0.05)
# resOut<-which(abs(outliers) == TRUE);

#Removing outlier and reorganizing the data


data.nout<-data.outlier[-c(resOut),]
data.nout.resp<-data[-c(resOut),]
row.names(data.nout) <- 1:nrow(data.nout)

#Removing no further needed variables


rm(data.outlier)
rm(data)
rm(dataset)
rm(outliers)

#Detaching no further needed packages


detach("package:mvoutlier", unload=TRUE)
detach("package:CORElearn", unload=TRUE)
```
The dataset is not multivariate normal (two different tests were conducted and the
same result was found); for this reason the Mahalanobis distance couldn't be used
for outlier detection (when performed, almost 300 outliers are detected). For
this reason, random forest was used as an alternative for detecting outliers.
In this case only 5 outliers were detected; the threshold for the outlier
detection was selected by visual inspection of the outlier graph generated by
the package "CORElearn".

***

##Applying LDA to reduce collinearity


```{r}

#Building LDA model


myLDA=lda(Decision ~., data=data.nout.resp)
myLDA
lda.values <- predict(myLDA)
dev.off()
par(mar=c(2,4,2,2))

#Stacked Histogram
ldahist(data = lda.values$x[,1], g=data.nout.resp$Decision)
ldahist(data = lda.values$x[,2], g=data.nout.resp$Decision)
ldahist(data = lda.values$x[,3], g=data.nout.resp$Decision)

#Selecting loadings
sensor.lda<-as.data.frame(lda.values$x[,1:3])
sensor.lda<-cbind(sensor.lda,data.nout.resp$Decision)
colnames(sensor.lda)[4]<-"decision"
head(sensor.lda)

#Removing no further needed variables


rm(data.nout)
rm(data.nout.resp)
```
In this case, Linear Discriminant Analysis is being tested. As we can see from
the histograms, the combination of the first and second discriminants gives good
separation between the classes.

***

##Using KNN method for classification and and CV


```{r}

#Library kknn is selected for the implementation of knn (non-weighted)


library(kknn)

#Creating list for storing the results of the cross-validation process


ErrorKNN1 <- vector("list", 10)
ErrorKNN2 <- vector("list", 10)
ErrorKNN3 <- vector("list", 10)

#Creating labels for the application of one standard error rule (last section of the code)
knes=rep(1:10, each=10)

ii=1

#Using number of neighbors as model complexity.


#(the cvErrorKNN function defined below must be created before running these loops)
for (kn in 1:10)
{
ErrorKNN1[[ii]]=cvErrorKNN(sensor.lda[,1:3],sensor.lda[,4],PM="error",k=10,kn=kn,model="classification")
ii=ii+1;
}
ii=1
for (kn in 1:10)
{
ErrorKNN2[[ii]]=cvErrorKNN(sensor.lda[,1:3],sensor.lda[,4],PM="kappa",k=10,kn=kn,model="classification")
ii=ii+1;
}
ii=1
for (kn in 1:10)
{
ErrorKNN3[[ii]]=cvErrorKNN(sensor.lda[,1:3],sensor.lda[,4],PM="mLogLoss",k=10,kn=kn,model="classification")
ii=ii+1;
}

#Function for cross validation using knn


cvErrorKNN=function(X,Y,PM,k=3,kn=3,model="classification")
{

data=data.frame(X,Y)
ycol=ncol(data)
colnames(data)[ycol]<-"Y"
nFolds=k
permRows=sample(x=1:nrow(data),size=nrow(data),replace=FALSE)
performanceMeasure=matrix(nrow=nFolds,ncol=1)

# Create testing and training folds


obsFold=floor(nrow(data)/nFolds)
pending=nrow(data)-floor(nrow(data)/nFolds)*nFolds
j=0
for (i in 1:nFolds){
if (i>=(nFolds-pending+1) & pending>0) {
assign(paste("F",i,sep=""),data[permRows[(j+1):(j+obsFold)],]) ; j= j +
obsFold + 1 }
else
{ assign(paste("F",i,sep=""),data[permRows[(j+1):(j+obsFold)],]); j= j +
obsFold }
}

if (is.null(PM)==TRUE & is.null(model)==TRUE) {"One or more arguments missing"
} else {
if (model=="classification" & is.null(PM)==TRUE) {PM="error"}
if (model=="regression" & is.null(PM)==TRUE) {PM="PRESS"}

#Fit model and estimate error


for (i in 1:nFolds){
testing=get(paste("F",i,sep=""))
trainingRows=setdiff(1:nrow(data),as.numeric(row.names(testing)))
training=data[trainingRows,]

if (model=="classification"){ data$Y=as.factor(data$Y)
} else {data$Y=as.numeric(data$Y) }

#Predicting classes
myKNN <- train.kknn(Y~., data = training, ks = kn, kernel =
"rectangular")
predicted <- predict(myKNN, newdata=testing)
actual=testing$Y

if (model=="classification")
{
#Calculating classes probabilities to be used as input for Multiclass Logarithmic Loss
probs=predict(myKNN,newdata=testing,type="prob")
}

#Error evaluation
if (model=="classification") {
if (PM=="error") { performanceMeasure[i]=sum(actual!
=predicted)/nrow(testing)}
if(PM=="kappa") {
cm=as.matrix(table(Actual = actual, Predicted = predicted))
#Using the function for Kappa from the package "psych"
performanceMeasure[i]=cohen.kappa(cm)$kappa
}
if (PM=="mLogLoss") {
#This function needs the probabilities; it comes from the package "MLmetrics".
performanceMeasure[i]=MultiLogLoss(y_pred = probs,y_true = actual)

}
} else { if(model=="regression" & PM=="PRESS") {
performanceMeasure[i]=sum((actual-predicted)*(actual-predicted))
} else { if(model=="regression" & PM=="MAD")
{ performanceMeasure[i]=sum(abs(actual-predicted)/length(predicted))
}
}
}
if(model=="regression" & PM=="RMSE")
{ performanceMeasure[i]=(sum((actual-
predicted)^2)/length(predicted))^0.5}

}
if ((model=="classification"& (PM=="error" | PM=="kappa" |
PM=="mLogLoss")) | (model=="regression" & (PM=="PRESS" | PM=="MAD" |
PM=="RMSE"))) {
performanceMeasure=round(performanceMeasure,4)
meanError=round(mean(performanceMeasure),4)
sdError=round(sd(performanceMeasure),4)

output=list(meanError,sdError,performanceMeasure)
output1=list(performanceMeasure)
names(output)=c(paste("mean",PM,sep="."),paste("sd",PM,sep="."),PM)

return(output1)
# return(performanceMeasure)
# return(meanError)
# return(sdError)
} else {
"One or more invalid arguments have been inputed."
}
}
}

#Removing function
rm(cvErrorKNN)

#Detaching package
detach("package:kknn", unload=TRUE)

```
In this section the kNN model was trained and cross validated using three
errors: classification error, Kappa and Multiclass Logarithmic Loss. For the
complexity evaluation, the number of neighbors was the chosen variable; only
the rectangular kernel was used, as the proposal of this project is only for
standard kNN (which is the rectangular case, as stated in the documentation for
the "kknn" package).
***

##Using Random Forest method for classification and and CV


```{r}
#Package randomForest is selected for the implementation of random forest
library(randomForest)

#Using number of trees as model complexity.


tree_n=c(10,20,50,100,200,300,400,500,1000,2000)

#Creating labels for the application of one standard error rule (last section of the code)
tree_n_es=rep(tree_n, each=10)

ii=1
#Creating list for storing CV errors
ErrorRF1 <- vector("list", 10)
ErrorRF2 <- vector("list", 10)
ErrorRF3 <- vector("list", 10)
for (n_t in tree_n)
{
ErrorRF1[[ii]]=cvErrorRF(sensor.lda[,1:3],sensor.lda[,4],PM="error",k=10,nt=n_t,model="classification")
ii=ii+1;
}
ii=1
for (n_t in tree_n)
{
ErrorRF2[[ii]]=cvErrorRF(sensor.lda[,1:3],sensor.lda[,4],PM="kappa",k=10,nt=n_t,model="classification")
ii=ii+1;
}
ii=1
for (n_t in tree_n)
{
ErrorRF3[[ii]]=cvErrorRF(sensor.lda[,1:3],sensor.lda[,4],PM="mLogLoss",k=10,nt=n_t,model="classification")
ii=ii+1;
}

#Function for cross validation using random forest


cvErrorRF=function(X,Y,PM,k=3,nt=50,model="classification")
{
library(randomForest)
data=data.frame(X,Y)
ycol=ncol(data)
colnames(data)[ycol]<-"Y"
nFolds=k
permRows=sample(x=1:nrow(data),size=nrow(data),replace=FALSE)
performanceMeasure=matrix(nrow=nFolds,ncol=1)

# Create testing and training folds


obsFold=floor(nrow(data)/nFolds)
pending=nrow(data)-floor(nrow(data)/nFolds)*nFolds
j=0
for (i in 1:nFolds){
if (i>=(nFolds-pending+1) & pending>0) {
assign(paste("F",i,sep=""),data[permRows[(j+1):(j+obsFold)],]) ; j= j +
obsFold + 1 }
else
{ assign(paste("F",i,sep=""),data[permRows[(j+1):(j+obsFold)],]); j= j +
obsFold }
}

if (is.null(PM)==TRUE & is.null(model)==TRUE) {"One or more arguments missing"
} else {
if (model=="classification" & is.null(PM)==TRUE) {PM="error"}
if (model=="regression" & is.null(PM)==TRUE) {PM="PRESS"}

#Fit model and estimate error


for (i in 1:nFolds){
testing=get(paste("F",i,sep=""))
trainingRows=setdiff(1:nrow(data),as.numeric(row.names(testing)))
training=data[trainingRows,]

if (model=="classification"){ data$Y=as.factor(data$Y)
} else {data$Y=as.numeric(data$Y) }

#Creating model; different weights for the classes were tested, with no good result.
#Also undersampling the majority classes was tested, but with no good results.
myRF=randomForest(Y~.,data=training, ntree = nt,sampsize=c(round(1660),round(1610),round(250), round(700)))
#classwt = c(1,1,2,2)
#sampsize=c(200, 200, 200, 200), strata = data$Y
predicted=predict(myRF,newdata=testing)
actual=testing$Y
if (model=="classification")
{
#Calculating classes probabilities to be used as input for Multiclass Logarithmic Loss
probs=predict(myRF,newdata=testing,type="prob")
}

if (model=="classification") {
if (PM=="error") { performanceMeasure[i]=sum(actual!
=predicted)/nrow(testing)}
if(PM=="kappa") {
cm=as.matrix(table(Actual = actual, Predicted = predicted))
#Using the function for Kappa from the package "psych"
performanceMeasure[i]=cohen.kappa(cm)$kappa
}
if (PM=="mLogLoss") {
#This function needs the probabilities; it comes from the package "MLmetrics".
performanceMeasure[i]=MultiLogLoss(y_pred = probs,y_true = actual)
}
} else { if(model=="regression" & PM=="PRESS") {
performanceMeasure[i]=sum((actual-predicted)*(actual-predicted))
} else { if(model=="regression" & PM=="MAD")
{ performanceMeasure[i]=sum(abs(actual-predicted)/length(predicted))
}
}
}
if(model=="regression" & PM=="RMSE")
{ performanceMeasure[i]=(sum((actual-
predicted)^2)/length(predicted))^0.5}

}
if ((model=="classification"& (PM=="error" | PM=="kappa" |
PM=="mLogLoss")) | (model=="regression" & (PM=="PRESS" | PM=="MAD" |
PM=="RMSE"))) {
performanceMeasure=round(performanceMeasure,4)
meanError=round(mean(performanceMeasure),4)
sdError=round(sd(performanceMeasure),4)
output=list(meanError,sdError,performanceMeasure)
names(output)=c(paste("mean",PM,sep="."),paste("sd",PM,sep="."),PM)
output1=list(performanceMeasure)
return(output1)
} else {
"One or more invalid arguments have been inputed."
}
}
}

detach("package:randomForest", unload=TRUE)

```

In this section the random forest model was trained and cross-validated using
three errors: classification error, Kappa and Multiclass Logarithmic Loss. For
the complexity evaluation, the number of trees was the chosen variable; in
addition, different weights for each class were tested, but without improving
the results. A small sketch of how the three metrics are computed for one fold
is shown below.
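The following chunk is a minimal, self-contained sketch (not part of the
original analysis) of how the three metrics are obtained for a single fold,
using the same packages loaded above; the toy vectors `actual`, `predicted`
and `probs` are hypothetical.

```{r}
#Minimal illustration of the three CV metrics on a toy fold
library(psych)      #cohen.kappa()
library(MLmetrics)  #MultiLogLoss()

actual    <- factor(c("A", "A", "B", "B"))
predicted <- factor(c("A", "B", "B", "B"))
#Predicted class probabilities, one row per observation, columns named by class
probs <- matrix(c(0.9, 0.1,
                  0.4, 0.6,
                  0.2, 0.8,
                  0.1, 0.9), ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("A", "B")))

mean(actual != predicted)                               #classification error
cohen.kappa(as.matrix(table(actual, predicted)))$kappa  #Cohen's kappa
MultiLogLoss(y_pred = probs, y_true = actual)           #multiclass log loss
```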
***

##Using SVM and CV


```{r}
library("e1071")

#Choosing as a complexity measure the cost of the radial kernel for the SVM
d_g=c(0.1,1,10,100,1000)

#Creating labels for the cost values of the radial kernel


dges=c(rep(0.1,10),rep(1,10),rep(10,10),rep(100,10),rep(1000,10))
ErrorSVM1 <- vector("list", 5)
ErrorSVM2 <- vector("list", 5)
ErrorSVM3 <- vector("list", 5)

#Running CV function for SVM


ii=1
for (dg_i in d_g)
{
ErrorSVM1[[ii]]=cvErrorSVM(sensor.lda[,1:3],sensor.lda[,4],PM="error",k=10,dg=
dg_i,model="classification")
ii=ii+1;
}
ii=1
for (dg_i in d_g)
{
ErrorSVM2[[ii]]=cvErrorSVM(sensor.lda[,1:3],sensor.lda[,4],PM="kappa",k=10,dg=
dg_i,model="classification")
ii=ii+1;
}
ii=1
for (dg_i in d_g)
{
ErrorSVM3[[ii]]=cvErrorSVM(sensor.lda[,1:3],sensor.lda[,4],PM="mLogLoss",k=10,
dg=dg_i,model="classification")
ii=ii+1;
}

#Function for cross validation using SVM


cvErrorSVM=function(X,Y,PM,k=3,dg=3,model="classification")
{
data=data.frame(X,Y)
ycol=ncol(data)
colnames(data)[ycol]<-"Y"
nFolds=k
permRows=sample(x=1:nrow(data),size=nrow(data),replace=FALSE)
performanceMeasure=matrix(nrow=nFolds,ncol=1)

# Create testing and training folds


obsFold=floor(nrow(data)/nFolds)

pending=nrow(data)-floor(nrow(data)/nFolds)*nFolds
j=0
for (i in 1:nFolds){
if (i>=(nFolds-pending+1) & pending>0) {
assign(paste("F",i,sep=""),data[permRows[(j+1):(j+obsFold)],]) ; j= j +
obsFold + 1 }
else
{ assign(paste("F",i,sep=""),data[permRows[(j+1):(j+obsFold)],]); j= j +
obsFold }
}

if (is.null(PM)==TRUE & is.null(model)==TRUE) {"One or more arguments


missing"
} else {
if (model=="classification" & is.null(PM)==TRUE) {PM="error"}
if (model=="regression" & is.null(PM)==TRUE) {PM="PRESS"}

#Fit model and estimate error


for (i in 1:nFolds){
testing=get(paste("F",i,sep=""))
trainingRows=setdiff(1:nrow(data),as.numeric(row.names(testing)))
training=data[trainingRows,]

if (model=="classification"){ data$Y=as.factor(data$Y)
} else {data$Y=as.numeric(data$Y) }
#Creating model; different weights for the classes were tested, with no good results.
#A radial kernel was also tested.
#mySVM=svm(Y~., data=training, kernel="radial", cost=1, gamma=0.5) #Good
#Best model using polynomial kernel with cost=0.01 gamma=1
mySVM=svm(Y~., data=training, kernel="radial", cost=dg,
gamma=2,probability = TRUE)
predicted=predict(mySVM,newdata=testing)
actual=testing$Y
if (model=="classification")
{
#Calculating class probabilities to be used as input for Multiclass Logarithmic Loss
probs1=predict(mySVM,newdata=testing,probability = TRUE)
probs=attr(probs1, "probabilities")
}

if (model=="classification") {
if (PM=="error") { performanceMeasure[i]=sum(actual!
=predicted)/nrow(testing)}
if(PM=="kappa") {
cm=as.matrix(table(Actual = actual, Predicted = predicted))
performanceMeasure[i]=cohen.kappa(cm)$kappa
}
if (PM=="mLogLoss") {
performanceMeasure[i]=MultiLogLoss(y_pred = probs,y_true = actual)
}
} else { if(model=="regression" & PM=="PRESS") {
performanceMeasure[i]=sum((actual-predicted)*(actual-predicted))
} else { if(model=="regression" & PM=="MAD")
{ performanceMeasure[i]=sum(abs(actual-predicted)/length(predicted))
}
}
}
if(model=="regression" & PM=="RMSE")
{ performanceMeasure[i]=(sum((actual-
predicted)^2)/length(predicted))^0.5}

}
if ((model=="classification"& (PM=="error" | PM=="kappa" |
PM=="mLogLoss")) | (model=="regression" & (PM=="PRESS" | PM=="MAD" |
PM=="RMSE"))) {
performanceMeasure=round(performanceMeasure,4)
meanError=round(mean(performanceMeasure),4)
sdError=round(sd(performanceMeasure),4)
output=list(meanError,sdError,performanceMeasure)
names(output)=c(paste("mean",PM,sep="."),paste("sd",PM,sep="."),PM)
output1=list(performanceMeasure)
return(output1)
} else {
"One or more invalid arguments have been inputed."
}
}
}

rm(cvErrorSVM)
detach("package:e1071", unload=TRUE)
detach("package:psych", unload=TRUE)
detach("package:MLmetrics", unload=TRUE)

```

***

##One Standard Error Rule for all the methods


```{r}

#Finding maximum and minimum values for normalizing (between 0 and 1) the Multiclass Logarithmic Loss
max_value=max(c(max(unlist(ErrorKNN3)),max(unlist(ErrorRF3)),max(unlist(ErrorSVM3))))
min_value=min(c(min(unlist(ErrorKNN3)),min(unlist(ErrorRF3)),min(unlist(ErrorSVM3))))

#Normalizing function
nrmlze <- function(x) {(x - min_value)/(max_value - min_value)}

#For inverting the error (the greater the better, in the case of classification error and Multi Logarithmic Loss)
invrs<-function (x) 1-x

#Normalizing KNN errors


ErrorKNN1n=invrs(unlist(ErrorKNN1))
ErrorKNN2n=unlist(ErrorKNN2)

ErrorKNN3n=invrs(nrmlze(unlist(ErrorKNN3)))
#Normalizing random forest errors
ErrorRF1n=invrs(unlist(ErrorRF1))
ErrorRF2n=unlist(ErrorRF2)
ErrorRF3n=invrs(nrmlze(unlist(ErrorRF3)))

#Normalizing SVM errors


ErrorSVM1n=invrs(unlist(ErrorSVM1))
ErrorSVM2n=unlist(ErrorSVM2)
ErrorSVM3n=invrs(nrmlze(unlist(ErrorSVM3)))

#Constructing desirability functions


desir_KNN=0.5*ErrorKNN1n+0.4*ErrorKNN2n+0.1*ErrorKNN3n
desir_RF=0.5*ErrorRF1n+0.4*ErrorRF2n+0.1*ErrorRF3n
desir_SVM=0.5*ErrorSVM1n+0.4*ErrorSVM2n+0.1*ErrorSVM3n

#Constructing data frame for one standard error plot


onstderror_KNN=as.data.frame(cbind(knes,desir_KNN))
onstderror_RF=as.data.frame(cbind(tree_n_es,desir_RF))
onstderror_SVM=as.data.frame(cbind(dges,desir_SVM))

#Binding labels for data summarization


names(onstderror_KNN)=c("Number.of.Neighbors","Error")
names(onstderror_RF)=c("Number.of.Trees","Error")
names(onstderror_SVM)=c("Poly.Degree","Error")

#Using library dplyr


library(dplyr)

#Grouping data by Number.of.Neighbors and calculating the mean and standard deviation of the desirability function
oneSTDerror=tbl_df(onstderror_KNN)
summarizedData=oneSTDerror %>% group_by(Number.of.Neighbors) %>%
summarise(meanDes = mean(Error),stdDes=sd(Error)/sqrt(n()))
#Plotting using ggplot (flipped: the lower the number of neighbors, the greater the complexity of the model)
ggplot(summarizedData, aes(x=Number.of.Neighbors, y=meanDes,
colour=Number.of.Neighbors)) +
geom_errorbar(aes(ymin=meanDes-stdDes, ymax=meanDes+stdDes), width=.1) +
geom_line() + geom_point() + scale_y_continuous(limits = c(0.79,0.83))
+scale_x_reverse(limits=c(10,0))

#Grouping data by Number.of.Trees and calculating the mean and standard deviation of the desirability function
oneSTDerror=tbl_df(onstderror_RF)
summarizedData=oneSTDerror %>% group_by(Number.of.Trees) %>%
summarise(meanDes = mean(Error),stdDes =sd(Error)/sqrt(n()))
#Plotting using ggplot
ggplot(summarizedData, aes(x=Number.of.Trees, y=meanDes,
colour=Number.of.Trees)) +
geom_errorbar(aes(ymin=meanDes-stdDes, ymax=meanDes+stdDes), width=.1) +
geom_line() + geom_point() + scale_y_continuous(limits = c(0.9,0.960))

#Grouping data by Poly.Degree and calculating the mean and standard deviation of the desirability function
oneSTDerror=tbl_df(onstderror_SVM)

summarizedData=oneSTDerror %>% group_by(Poly.Degree) %>%
summarise(meanDes = mean(Error),stdDes =sd(Error)/sqrt(n()))
#Plotting using ggplot
ggplot(summarizedData, aes(x=Poly.Degree, y=meanDes, colour=Poly.Degree)) +
geom_errorbar(aes(ymin=meanDes-stdDes, ymax=meanDes+stdDes), width=.1) +
geom_line() + geom_point() + scale_y_continuous(limits = c(0.5,0.8))
+scale_x_reverse(limits=c(1000,0))

detach("package:dplyr", unload=TRUE)

```
In this part the evaluation of the constructed models, in terms of complexity
and the chosen errors, is shown using the desirability function (the greater
the value, the lower the error of the model). This allows choosing the least
complex model with the best response. For kNN, using the one standard error
rule, the least complex model with the best response is N=9. In the case of
random forest, the least complex model with the best response is ntrees=100.
Finally, for SVM, the best case is a third degree polynomial kernel. A small
sketch of the selection rule itself is given below.
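The selection step can be sketched as follows (a minimal illustration, not the
original code): take the best mean desirability, subtract one standard error,
and keep the least complex setting whose mean is still above that threshold.
The example assumes `summarizedData` holds the random forest summary with the
columns Number.of.Trees, meanDes and stdDes produced above; `best`, `threshold`
and `candidates` are hypothetical names.

```{r}
#One-standard-error selection on the summarized desirability values
best      <- which.max(summarizedData$meanDes)
threshold <- summarizedData$meanDes[best] - summarizedData$stdDes[best]
#Least complex model (fewest trees) still within one standard error of the best
candidates <- summarizedData[summarizedData$meanDes >= threshold, ]
candidates[which.min(candidates$Number.of.Trees), ]
```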
***

##Stacking
```{r}

#Loading needed packages


library(kknn)
library(randomForest)
library(e1071)
library(MLmetrics)
library(psych)

#Using three best models


#KNN
myKNN <- train.kknn(decision~., data = sensor.lda, ks = 10, kernel =
"rectangular")
probsKNN=predict(myKNN,newdata=sensor.lda,type="prob")

#RF
myRF=randomForest(decision~.,data=sensor.lda, ntree = 100)
probsRF=predict(myRF,newdata=sensor.lda,type="prob")

#SVM
mySVM=svm(decision~., data=sensor.lda, kernel="radial", cost=0.1,
gamma=1,probability = TRUE)
probs1SVM=predict(mySVM,newdata=sensor.lda,probability = TRUE)
probsSVM=attr(probs1SVM, "probabilities")

#Creating a new data frame using the class probabilities and binding the response of the LDA-reduced dataset
stacking_data=as.data.frame(cbind(probsKNN,probsRF))
colnames(stacking_data)=c(1:8)
stacking_data=cbind(stacking_data,sensor.lda$decision)
colnames(stacking_data)=c(1:8,"decision")
head(stacking_data)

#Using LDA for inference on the relations between the new predictors and the response
myLDA=lda(decision ~., data=stacking_data)
myLDA
stacked.lda.values <- predict(myLDA)

library(randomForest)
ii=1
ErrorRF1s <- vector("list", 10)
ErrorRF2s <- vector("list", 10)
ErrorRF3s <- vector("list", 10)
for (n_t in tree_n)
{
ErrorRF1s[[ii]]=cvErrorRF(stacking_data[,1:8],stacking_data[,9],PM="error",k=1
0,nt=n_t,model="classification")
ii=ii+1;
}
ii=1
for (n_t in tree_n)
{
ErrorRF2s[[ii]]=cvErrorRF(stacking_data[,1:8],stacking_data[,9],PM="kappa",k=1
0,nt=n_t,model="classification")
ii=ii+1;
}
ii=1
for (n_t in tree_n)
{
ErrorRF3s[[ii]]=cvErrorRF(stacking_data[,1:8],stacking_data[,9],PM="mLogLoss",
k=10,nt=n_t,model="classification")
ii=ii+1;
}
ErrorRF1sn=invrs(unlist(ErrorRF1s))
ErrorRF2sn=unlist(ErrorRF2s)
ErrorRF3sn=invrs(unlist(ErrorRF3s))

desir_RFs=0.5*ErrorRF1sn+0.4*ErrorRF2sn+0.1*ErrorRF3sn
onstderror_RFs=as.data.frame(cbind(tree_n_es,desir_RFs))
names(onstderror_RFs)=c("Number.of.Trees","Error")

library(dplyr)

oneSTDerror=tbl_df(onstderror_RFs)
summarizedData=oneSTDerror %>% group_by(Number.of.Trees) %>%
summarise(meanDes = mean(Error),stdDes =sd(Error)/sqrt(n()))

ggplot(summarizedData, aes(x=Number.of.Trees, y=meanDes,


colour=Number.of.Trees)) +
geom_errorbar(aes(ymin=meanDes-stdDes, ymax=meanDes+stdDes), width=.1) +
geom_line() + geom_point() + scale_y_continuous(limits = c(0.995,1.005))

detach("package:dplyr", unload=TRUE)
```
For the stacking process, the best performing classifiers were chosen (one
each for kNN, RF and SVM). After this, the complete rotated dataset was used
to train the classifiers and obtain the class probabilities. The resulting
probabilities for each one of the four classes and the original dataset's
response were bound together, and this new dataset was used as the input for a
random forest classifier. Additionally, cross-validation with 10 folds was
used to test this approach (using classification error, Kappa and multiclass
logarithmic loss). The results show that the highest value of the desirability
function is obtained with this scheme (a value of one); for this reason, the
best results of this project were obtained with this ensemble learning
strategy. A sketch of how the stacked ensemble would score new data is given
below.
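The chunk below is a minimal sketch (an assumption, not the original code) of
how the stacked ensemble would score a new observation: the base models fitted
above produce class probabilities, which are arranged like `stacking_data` and
passed to a stacked random forest. The objects `myStackedRF`, `stackTrain`,
`newObs`, `p_knn`, `p_rf` and `newStacked` are hypothetical names introduced
only for this illustration; syntactic column names p1..p8 are used so that the
formula interface accepts them.

```{r}
library(kknn)
library(randomForest)

#Hypothetical stacked random forest fitted on the stacked data frame
stackTrain=stacking_data
colnames(stackTrain)=c(paste0("p",1:8),"decision")
myStackedRF=randomForest(decision~.,data=stackTrain, ntree = 100)

#New observation (here simply the first row of the rotated dataset)
newObs=sensor.lda[1,]

#Base-model class probabilities for the new observation
p_knn=predict(myKNN,newdata=newObs,type="prob")
p_rf=predict(myRF,newdata=newObs,type="prob")

#Arranging the probabilities with the same column names used for training
newStacked=as.data.frame(cbind(p_knn,p_rf))
colnames(newStacked)=paste0("p",1:8)

#Final prediction from the stacked model
predict(myStackedRF,newdata=newStacked)
```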

APPENDIX B PCA R MARKDOWN IMPLEMENTATION

---
title: "Wall_Following"
author: "Rafael Batista"
date: "4 de diciembre de 2016"
output: pdf_document
---

```{r setup, include=FALSE}


knitr::opts_chunk$set(echo = TRUE)
```
##Loading Packages
```{r, echo=F}
#library(mlbench)
#library(reshape)
#library(caret)
#library(PRROC)
#library(class)
library(ggplot2)
library (MASS)
library (xtable)

#Library for Multi Log Loss error


library(MLmetrics)

#Library for kappa error


library(psych)

```

***

##Loading Sensors' Data


```{r, echo=F}
#Reading data file
data=read.csv(file ="C:/Users/Rafael Batista/Desktop/R Files/sensor_readings_24.csv")

#Inspecting data and searching for missing values


summary(data)
head(data)
number.na<-sum(is.na(data))
number.na
```

No missing data was found in the original dataset. The dataset has 5456
observations with 24 predictors and one response. This response is labeled as:
-Move-Forward
-Sharp-Right-Turn
-Slight-Left-Turn
-Slight-Right-Turn
A quick check of the class balance is sketched below.
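The following chunk is a minimal sketch (not part of the original analysis) of
that check, assuming the response column is named Decision as in the chunks
below.

```{r}
#Counts and proportions of the four navigation classes
table(data$Decision)
round(prop.table(table(data$Decision)), 3)
```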

***

#Testing for multivariate normality and searching for outliers


```{r}

#Library for multivariate normality tests


library(MVN)

result<-mardiaTest(data[,-25],qqplot=TRUE)
result
result<-hzTest(data[,-25],qqplot=FALSE)

result
detach("package:MVN", unload=TRUE)

#Library for detecting outliers by diverse methods


library(CORElearn)
#Library for detecting outliers using robust Mahalanobis distance
#library(mvoutlier)

dataset <- data


#Creating random forest model for outlier detection, using a distance measure between nodes
md <- CoreModel(Decision ~ ., dataset, model="rf", rfNoTrees=50, maxThreads=1)
outliers <- rfOutliers(md, dataset)
plot(md, dataset, rfGraphType="outliers")
#Identifying observations as outliers
resOut<-which(abs(outliers) >= 4.5);
resOut
data.outlier=subset(data, select = -c(Decision))

#Data is not multivariate normal, so Mahalanobis distance is not applied.


#MAHALANOBIS
# res<-aq.plot(x=data.outlier, delta=qchisq(0.975, df=ncol(data.outlier)),
quan=1, alpha=0.05)
# resOut<-which(abs(outliers) == TRUE);

#Removing outlier and reorganizing the data


data.nout<-data.outlier[-c(resOut),]
data.nout.resp<-data[-c(resOut),]
row.names(data.nout) <- 1:nrow(data.nout)

#Removing no further needed variables


rm(data.outlier)
rm(data)
rm(dataset)
rm(outliers)

#Detaching no further needed packages


detach("package:CORElearn", unload=TRUE)
```
The dataset is not multivariate normal (two different tests were conducted,
with the same result); for this reason, Mahalanobis distance could not be used
for outlier detection (when applied, almost 300 outliers are flagged). Random
forest was therefore used as an alternative for detecting outliers. In this
case only 5 outliers were detected; the threshold for outlier detection was
selected by visual inspection of the outlier graph generated by the package
"CORElearn". A numerical view of the same scores is sketched below.

***

##Applying PCA to reduce collinearity


```{r}

#Building PCA model


sensors.pca=prcomp(x=data.nout,center=TRUE,scale=TRUE)
print(sensors.pca)

#Scree Plot
plot(sensors.pca, type = "l")
summary(sensors.pca)

#Selecting first 15 principal components and binding response


sensor.rot<-as.data.frame(sensors.pca$x[,1:15])
sensor.rot<-cbind(sensor.rot,data.nout.resp$Decision)
colnames(sensor.rot)[16]<-"decision"
head(sensor.rot)

#Removing no further needed variables


rm(sensors.pca)
rm(data.nout)
rm(data.nout.resp)
```
The first 15 PCs were chosen (explaining more than 86% of the variability in
the data) to be used as an input for the classification methods. The rotated
data for these 15 principal components is selected and the response column is
bound to this data. A quick check of the cumulative variance is sketched below.
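The chunk below is a minimal sketch (not part of the original analysis) of that
check; it assumes `sensors.pca` is still in memory, i.e. that it is run before
the rm() call in the chunk above. The name `cumvar` is hypothetical.

```{r}
#Cumulative proportion of variance explained by the first 15 components
cumvar <- cumsum(sensors.pca$sdev^2) / sum(sensors.pca$sdev^2)
round(cumvar[15], 3)
```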

***


##Using KNN method for classification and CV


```{r}

#Library kknn is selected for the implementation of kNN (non-weighted)


library(kknn)

#Creating list for storing the results of the cross-validation process


ErrorKNN1 <- vector("list", 10)
ErrorKNN2 <- vector("list", 10)
ErrorKNN3 <- vector("list", 10)

#Creating labels for the application of the one standard error rule (last section of the code)
knes=c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10),rep(6,10),rep(7,10),rep(8,10),rep(9,10),rep(10,10))

ii=1

#Using number of neighbors as model complexity.


for (kn in 1:10)
{
ErrorKNN1[[ii]]=cvErrorKNN(sensor.rot[,1:15],sensor.rot[,16],PM="error",k=10,kn=kn,model="classification")
ii=ii+1;
}

ii=1
for (kn in 1:10)
{
ErrorKNN2[[ii]]=cvErrorKNN(sensor.rot[,1:15],sensor.rot[,16],PM="kappa",k=10,kn=kn,model="classification")
ii=ii+1;
}
ii=1
for (kn in 1:10)
{
ErrorKNN3[[ii]]=cvErrorKNN(sensor.rot[,1:15],sensor.rot[,16],PM="mLogLoss",k=10,kn=kn,model="classification")
ii=ii+1;
}

#Function for cross validation using knn


cvErrorKNN=function(X,Y,PM,k=3,kn=3,model="classification")
{

data=data.frame(X,Y)
ycol=ncol(data)
colnames(data)[ycol]<-"Y"
nFolds=k
permRows=sample(x=1:nrow(data),size=nrow(data),replace=FALSE)
performanceMeasure=matrix(nrow=nFolds,ncol=1)

# Create testing and training folds


obsFold=floor(nrow(data)/nFolds)
pending=nrow(data)-floor(nrow(data)/nFolds)*nFolds
j=0
for (i in 1:nFolds){
if (i>=(nFolds-pending+1) & pending>0) {
assign(paste("F",i,sep=""),data[permRows[(j+1):(j+obsFold)],]) ; j= j +
obsFold + 1 }
else
{ assign(paste("F",i,sep=""),data[permRows[(j+1):(j+obsFold)],]); j= j +
obsFold }
}

if (is.null(PM)==TRUE & is.null(model)==TRUE) {"One or more arguments


missing"
} else {
if (model=="classification" & is.null(PM)==TRUE) {PM="error"}
if (model=="regression" & is.null(PM)==TRUE) {PM="PRESS"}

#Fit model and estimate error


for (i in 1:nFolds){
testing=get(paste("F",i,sep=""))
trainingRows=setdiff(1:nrow(data),as.numeric(row.names(testing)))
training=data[trainingRows,]

if (model=="classification"){ data$Y=as.factor(data$Y)
} else {data$Y=as.numeric(data$Y) }

#Predicting classes
myKNN <- train.kknn(Y~., data = training, ks = kn, kernel =
"rectangular")
predicted <- predict(myKNN, newdata=testing)
actual=testing$Y

if (model=="classification")
{
#Calculating class probabilities to be used as input for Multiclass Logarithmic Loss
probs=predict(myKNN,newdata=testing,type="prob")
}

#Error evaluation
if (model=="classification") {
if (PM=="error") { performanceMeasure[i]=sum(actual!
=predicted)/nrow(testing)}
if(PM=="kappa") {
cm=as.matrix(table(Actual = actual, Predicted = predicted))
#Using the function for Kappa from the package "psych"
performanceMeasure[i]=cohen.kappa(cm)$kappa
}
if (PM=="mLogLoss") {
#This function needs the probabilities; it is from the package "MLmetrics".
performanceMeasure[i]=MultiLogLoss(y_pred = probs,y_true = actual)

}
} else { if(model=="regression" & PM=="PRESS") {
performanceMeasure[i]=sum((actual-predicted)*(actual-predicted))
} else { if(model=="regression" & PM=="MAD")
{ performanceMeasure[i]=sum(abs(actual-predicted)/length(predicted))
}
}
}
if(model=="regression" & PM=="RMSE")
{ performanceMeasure[i]=(sum((actual-
predicted)^2)/length(predicted))^0.5}

}
if ((model=="classification"& (PM=="error" | PM=="kappa" |
PM=="mLogLoss")) | (model=="regression" & (PM=="PRESS" | PM=="MAD" |
PM=="RMSE"))) {
performanceMeasure=round(performanceMeasure,4)
meanError=round(mean(performanceMeasure),4)
sdError=round(sd(performanceMeasure),4)

output=list(meanError,sdError,performanceMeasure)
output1=list(performanceMeasure)
names(output)=c(paste("mean",PM,sep="."),paste("sd",PM,sep="."),PM)
return(output1)
# return(performanceMeasure)
# return(meanError)
# return(sdError)
} else {
"One or more invalid arguments have been inputed."
47
}
}
}

#Removing function
rm(cvErrorKNN)

#Detaching package
detach("package:kknn", unload=TRUE)

```
In this section the kNN model was trained and cross-validated using three
errors: classification error, Kappa and Multiclass Logarithmic Loss. For the
complexity evaluation, the number of neighbors was the chosen variable. Only
the rectangular kernel was used, since this project considers only standard
(unweighted) kNN, which corresponds to the rectangular kernel as stated in the
documentation of the "kknn" package. A quick sanity check of the chosen number
of neighbors is sketched below.
***

##Using Random Forest method for classification and CV


```{r}
#Package randomForest is selected for the implementation of random forest
library(randomForest)

#Using number of trees as model complexity.


tree_n=c(10,20,50,100,200,300,400,500,1000,2000)

#Creating labels for the application of the one standard error rule (last section of the code)
tree_n_es=c(rep(10,10),rep(20,10),rep(50,10),rep(100,10),rep(200,10),rep(300,10),rep(400,10),rep(500,10),rep(1000,10),rep(2000,10))

ii=1
#Creating list for storing CV errors
ErrorRF1 <- vector("list", 10)
ErrorRF2 <- vector("list", 10)
ErrorRF3 <- vector("list", 10)
for (n_t in tree_n)
{
ErrorRF1[[ii]]=cvErrorRF(sensor.rot[,1:15],sensor.rot[,16],PM="error",k=10,nt=
n_t,model="classification")
ii=ii+1;
}
ii=1
for (n_t in tree_n)
{
ErrorRF2[[ii]]=cvErrorRF(sensor.rot[,1:15],sensor.rot[,16],PM="kappa",k=10,nt=
n_t,model="classification")
ii=ii+1;
}
ii=1
for (n_t in tree_n)
{

ErrorRF3[[ii]]=cvErrorRF(sensor.rot[,1:15],sensor.rot[,16],PM="mLogLoss",k=10,
nt=n_t,model="classification")
ii=ii+1;
}

#Function for cross validation using random forest


cvErrorRF=function(X,Y,PM,k=3,nt=50,model="classification")
{
library(randomForest)
data=data.frame(X,Y)
ycol=ncol(data)
colnames(data)[ycol]<-"Y"
nFolds=k
permRows=sample(x=1:nrow(data),size=nrow(data),replace=FALSE)
performanceMeasure=matrix(nrow=nFolds,ncol=1)

# Create testing and training folds


obsFold=floor(nrow(data)/nFolds)
pending=nrow(data)-floor(nrow(data)/nFolds)*nFolds
j=0
for (i in 1:nFolds){
if (i>=(nFolds-pending+1) & pending>0) {
assign(paste("F",i,sep=""),data[permRows[(j+1):(j+obsFold)],]) ; j= j +
obsFold + 1 }
else
{ assign(paste("F",i,sep=""),data[permRows[(j+1):(j+obsFold)],]); j= j +
obsFold }
}

if (is.null(PM)==TRUE & is.null(model)==TRUE) {"One or more arguments


missing"
} else {
if (model=="classification" & is.null(PM)==TRUE) {PM="error"}
if (model=="regression" & is.null(PM)==TRUE) {PM="PRESS"}

#Fit model and estimate error


for (i in 1:nFolds){
testing=get(paste("F",i,sep=""))
trainingRows=setdiff(1:nrow(data),as.numeric(row.names(testing)))
training=data[trainingRows,]

if (model=="classification"){ data$Y=as.factor(data$Y)
} else {data$Y=as.numeric(data$Y) }

#Creating model; different weights for the classes were tested, with no good results.
#Undersampling the majority classes was also tried, with no good results either.
myRF=randomForest(Y~.,data=training, ntree = nt,sampsize=c(1450, 1410,
250, 700))
#classwt = c(1,1,2,2)
#sampsize=c(200, 200, 200, 200), strata = data$Y
predicted=predict(myRF,newdata=testing)
actual=testing$Y
if (model=="classification")
{

#Calculating class probabilities to be used as input for Multiclass Logarithmic Loss
probs=predict(myRF,newdata=testing,type="prob")
}

if (model=="classification") {
if (PM=="error") { performanceMeasure[i]=sum(actual!
=predicted)/nrow(testing)}
if(PM=="kappa") {
cm=as.matrix(table(Actual = actual, Predicted = predicted))
#Using the function for Kappa from the package "psych"
performanceMeasure[i]=cohen.kappa(cm)$kappa
}
if (PM=="mLogLoss") {
#This function needs the probabilities; it is from the package "MLmetrics"
performanceMeasure[i]=MultiLogLoss(y_pred = probs,y_true = actual)
}
} else { if(model=="regression" & PM=="PRESS") {
performanceMeasure[i]=sum((actual-predicted)*(actual-predicted))
} else { if(model=="regression" & PM=="MAD")
{ performanceMeasure[i]=sum(abs(actual-predicted)/length(predicted))
}
}
}
if(model=="regression" & PM=="RMSE")
{ performanceMeasure[i]=(sum((actual-
predicted)^2)/length(predicted))^0.5}

}
if ((model=="classification"& (PM=="error" | PM=="kappa" |
PM=="mLogLoss")) | (model=="regression" & (PM=="PRESS" | PM=="MAD" |
PM=="RMSE"))) {
performanceMeasure=round(performanceMeasure,4)
meanError=round(mean(performanceMeasure),4)
sdError=round(sd(performanceMeasure),4)
output=list(meanError,sdError,performanceMeasure)
names(output)=c(paste("mean",PM,sep="."),paste("sd",PM,sep="."),PM)
output1=list(performanceMeasure)
return(output1)
} else {
"One or more invalid arguments have been inputed."
}
}
}

detach("package:randomForest", unload=TRUE)

```
In this section the random forest model was trained and cross-validated using
three errors: classification error, Kappa and Multiclass Logarithmic Loss. For
the complexity evaluation, the number of trees was the chosen variable; in
addition, different weights for each class and undersampling of the majority
classes were tested, but without improving the results. A sketch of a balanced
stratified alternative is given below.
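The chunk below is a minimal sketch (an assumption, not one of the settings
actually used above) of the balanced alternative mentioned in the comments: a
stratified bootstrap that draws the same number of observations from every
class, with the per-class sample size kept at the size of the smallest class.
The name `balancedRF` is hypothetical.

```{r}
library(randomForest)
#Balanced stratified sampling: the same number of observations per class
balancedRF <- randomForest(decision ~ ., data = sensor.rot,
                           ntree = 100,
                           strata = sensor.rot$decision,
                           sampsize = rep(250, 4))
balancedRF$confusion
detach("package:randomForest", unload=TRUE)
```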
***

##Using SVM and CV
```{r}
library("e1071")

#Choosing as a complexity measure the degree of the polynomial kernel for the SVM
d_g=c(2,3,4,5,6)

#Creating labels for degrees of polynomial kernel


dges=c(rep(2,10),rep(3,10),rep(4,10),rep(5,10),rep(6,10))
ErrorSVM1 <- vector("list", 5)
ErrorSVM2 <- vector("list", 5)
ErrorSVM3 <- vector("list", 5)

#Running CV function for SVM


ii=1
for (dg_i in d_g)
{
ErrorSVM1[[ii]]=cvErrorSVM(sensor.rot[,1:15],sensor.rot[,16],PM="error",k=10,d
g=dg_i,model="classification")
ii=ii+1;
}
ii=1
for (dg_i in d_g)
{
ErrorSVM2[[ii]]=cvErrorSVM(sensor.rot[,1:15],sensor.rot[,16],PM="kappa",k=10,d
g=dg_i,model="classification")
ii=ii+1;
}
ii=1
for (dg_i in d_g)
{
ErrorSVM3[[ii]]=cvErrorSVM(sensor.rot[,1:15],sensor.rot[,16],PM="mLogLoss",k=1
0,dg=dg_i,model="classification")
ii=ii+1;
}

#Function for cross validation using SVM


cvErrorSVM=function(X,Y,PM,k=3,dg=3,model="classification")
{
data=data.frame(X,Y)
ycol=ncol(data)
colnames(data)[ycol]<-"Y"
nFolds=k
permRows=sample(x=1:nrow(data),size=nrow(data),replace=FALSE)
performanceMeasure=matrix(nrow=nFolds,ncol=1)

# Create testing and training folds


obsFold=floor(nrow(data)/nFolds)
pending=nrow(data)-floor(nrow(data)/nFolds)*nFolds
j=0
for (i in 1:nFolds){
if (i>=(nFolds-pending+1) & pending>0) {
assign(paste("F",i,sep=""),data[permRows[(j+1):(j+obsFold)],]) ; j= j +
obsFold + 1 }

else
{ assign(paste("F",i,sep=""),data[permRows[(j+1):(j+obsFold)],]); j= j +
obsFold }
}

if (is.null(PM)==TRUE & is.null(model)==TRUE) {"One or more arguments


missing"
} else {
if (model=="classification" & is.null(PM)==TRUE) {PM="error"}
if (model=="regression" & is.null(PM)==TRUE) {PM="PRESS"}

#Fit model and estimate error


for (i in 1:nFolds){
testing=get(paste("F",i,sep=""))
trainingRows=setdiff(1:nrow(data),as.numeric(row.names(testing)))
training=data[trainingRows,]

if (model=="classification"){ data$Y=as.factor(data$Y)
} else {data$Y=as.numeric(data$Y) }
#Creating model; different weights for the classes were tested, with no good results.
#A radial kernel was also tested.
#mySVM=svm(Y~., data=training, kernel="radial", cost=1, gamma=0.5) #Good
#Best model using polynomial kernel with cost=0.01 gamma=1
mySVM=svm(Y~., data=training, kernel="polynomial",degree=dg, cost=0.1,
gamma=0.5,probability = TRUE)
predicted=predict(mySVM,newdata=testing)
actual=testing$Y
if (model=="classification")
{
#Calculating class probabilities to be used as input for Multiclass Logarithmic Loss
probs1=predict(mySVM,newdata=testing,probability = TRUE)
probs=attr(probs1, "probabilities")
}

if (model=="classification") {
if (PM=="error") { performanceMeasure[i]=sum(actual!
=predicted)/nrow(testing)}
if(PM=="kappa") {
cm=as.matrix(table(Actual = actual, Predicted = predicted))
performanceMeasure[i]=cohen.kappa(cm)$kappa
}
if (PM=="mLogLoss") {
performanceMeasure[i]=MultiLogLoss(y_pred = probs,y_true = actual)
}
} else { if(model=="regression" & PM=="PRESS") {
performanceMeasure[i]=sum((actual-predicted)*(actual-predicted))
} else { if(model=="regression" & PM=="MAD")
{ performanceMeasure[i]=sum(abs(actual-predicted)/length(predicted))
}
}
}
if(model=="regression" & PM=="RMSE")
{ performanceMeasure[i]=(sum((actual-
predicted)^2)/length(predicted))^0.5}

}
if ((model=="classification"& (PM=="error" | PM=="kappa" |
PM=="mLogLoss")) | (model=="regression" & (PM=="PRESS" | PM=="MAD" |
PM=="RMSE"))) {
performanceMeasure=round(performanceMeasure,4)
meanError=round(mean(performanceMeasure),4)
sdError=round(sd(performanceMeasure),4)
output=list(meanError,sdError,performanceMeasure)
names(output)=c(paste("mean",PM,sep="."),paste("sd",PM,sep="."),PM)
output1=list(performanceMeasure)
return(output1)
} else {
"One or more invalid arguments have been inputed."
}
}
}

rm(cvErrorSVM)
detach("package:e1071", unload=TRUE)
detach("package:psych", unload=TRUE)
detach("package:MLmetrics", unload=TRUE)

```

***

##One Standard Error Rule for all the methods


```{r}

#Finding maximum and minimum values for normalizing (between 0 and 1) the Multiclass Logarithmic Loss
max_value=max(c(max(unlist(ErrorKNN3)),max(unlist(ErrorRF3)),max(unlist(ErrorSVM3))))
min_value=min(c(min(unlist(ErrorKNN3)),min(unlist(ErrorRF3)),min(unlist(ErrorSVM3))))

#Normalizing function
nrmlze <- function(x) {(x - min_value)/(max_value - min_value)}

#For inverting the error (the greater the better, in the case of classification error and Multi Logarithmic Loss)
invrs<-function (x) 1-x

#Normalizing KNN errors


ErrorKNN1n=invrs(unlist(ErrorKNN1))
ErrorKNN2n=unlist(ErrorKNN2)
ErrorKNN3n=invrs(nrmlze(unlist(ErrorKNN3)))
#Normalizing random forest errors
ErrorRF1n=invrs(unlist(ErrorRF1))
ErrorRF2n=unlist(ErrorRF2)
ErrorRF3n=invrs(nrmlze(unlist(ErrorRF3)))
#Normalizing SVM errors

ErrorSVM1n=invrs(unlist(ErrorSVM1))
ErrorSVM2n=unlist(ErrorSVM2)
ErrorSVM3n=invrs(nrmlze(unlist(ErrorSVM3)))

#Constructing desirability functions


desir_KNN=0.5*ErrorKNN1n+0.4*ErrorKNN2n+0.1*ErrorKNN3n
desir_RF=0.5*ErrorRF1n+0.4*ErrorRF2n+0.1*ErrorRF3n
desir_SVM=0.5*ErrorSVM1n+0.4*ErrorSVM2n+0.1*ErrorSVM3n

#Constructing data frame for one standard error plot


onstderror_KNN=as.data.frame(cbind(knes,desir_KNN))
onstderror_RF=as.data.frame(cbind(tree_n_es,desir_RF))
onstderror_SVM=as.data.frame(cbind(dges,desir_SVM))

#Binding labels for data summarization


names(onstderror_KNN)=c("Number.of.Neighbors","Error")
names(onstderror_RF)=c("Number.of.Trees","Error")
names(onstderror_SVM)=c("Poly.Degree","Error")

#Using library dplyr


library(dplyr)

#Grouping data by Number.of.Neighbors and calculating the mean and standard deviation of the desirability function
oneSTDerror=tbl_df(onstderror_KNN)
summarizedData=oneSTDerror %>% group_by(Number.of.Neighbors) %>%
summarise(meanDes = mean(Error),stdDes=sd(Error)/sqrt(n()))
#Plotting using ggplot (flipped: the lower the number of neighbors, the greater the complexity of the model)
ggplot(summarizedData, aes(x=Number.of.Neighbors, y=meanDes,
colour=Number.of.Neighbors)) +
geom_errorbar(aes(ymin=meanDes-stdDes, ymax=meanDes+stdDes), width=.1) +
geom_line() + geom_point() + scale_y_continuous(limits = c(0.8,0.83))
+scale_x_reverse(limits=c(10,0))

#Grouping data by Number.of.Trees and calculating the mean and standard deviation of the desirability function
oneSTDerror=tbl_df(onstderror_RF)
summarizedData=oneSTDerror %>% group_by(Number.of.Trees) %>%
summarise(meanDes = mean(Error),stdDes =sd(Error)/sqrt(n()))
#Plotting using ggplot
ggplot(summarizedData, aes(x=Number.of.Trees, y=meanDes,
colour=Number.of.Trees)) +
geom_errorbar(aes(ymin=meanDes-stdDes, ymax=meanDes+stdDes), width=.1) +
geom_line() + geom_point() + scale_y_continuous(limits = c(0.82,0.89))

#Grouping data by Poly.Degree and calculating the mean and standard deviation of the desirability function
oneSTDerror=tbl_df(onstderror_SVM)
summarizedData=oneSTDerror %>% group_by(Poly.Degree) %>%
summarise(meanDes = mean(Error),stdDes =sd(Error)/sqrt(n()))
#Plotting using ggplot
ggplot(summarizedData, aes(x=Poly.Degree, y=meanDes, colour=Poly.Degree)) +
geom_errorbar(aes(ymin=meanDes-stdDes, ymax=meanDes+stdDes), width=.1) +
geom_line() + geom_point() + scale_y_continuous(limits = c(0.65,0.8))

detach("package:dplyr", unload=TRUE)

```
In this part the evaluation of the constructed models, in terms of complexity
and the chosen errors, is shown using the desirability function (the greater
the value, the lower the error of the model). This allows choosing the least
complex model with the best response. For kNN, using the one standard error
rule, the least complex model with the best response is N=9. In the case of
random forest, the least complex model with the best response is ntrees=100.
Finally, for SVM, the best case is a third degree polynomial kernel.
***

##Stacking
```{r}

#Loading needed packages


library(kknn)
library(randomForest)
library(e1071)
library(MLmetrics)
library(psych)
library(xtable)

#Using three best models


#KNN
myKNN <- train.kknn(decision~., data = sensor.rot, ks = 9, kernel =
"rectangular")
probsKNN=predict(myKNN,newdata=sensor.rot,type="prob")

#RF
myRF=randomForest(decision~.,data=sensor.rot, ntree = 100)
probsRF=predict(myRF,newdata=sensor.rot,type="prob")

#SVM
mySVM=svm(decision~., data=sensor.rot, kernel="polynomial",degree=3, cost=0.1,
gamma=0.5,probability = TRUE)
probs1SVM=predict(mySVM,newdata=sensor.rot,probability = TRUE)
probsSVM=attr(probs1SVM, "probabilities")

#Creating a new data frame using the class probabilities and binding the response of the PCA-rotated dataset
stacking_data=as.data.frame(cbind(probsKNN,probsRF,probsSVM))
colnames(stacking_data)=c(1:12)
stacking_data=cbind(stacking_data,sensor.rot$decision)
colnames(stacking_data)=c(1:12,"decision")
head(stacking_data)
table1<-xtable(head(stacking_data))

#Using LDA for inference on the relations between the new predictors and the response
myLDA=lda(decision ~., data=stacking_data)
myLDA

library(randomForest)
ii=1
ErrorRF1s <- vector("list", 10)
ErrorRF2s <- vector("list", 10)
ErrorRF3s <- vector("list", 10)
for (n_t in tree_n)
{
ErrorRF1s[[ii]]=cvErrorRF(stacking_data[,1:12],stacking_data[,13],PM="error",k
=10,nt=n_t,model="classification")
ii=ii+1;
}
ii=1
for (n_t in tree_n)
{
ErrorRF2s[[ii]]=cvErrorRF(stacking_data[,1:12],stacking_data[,13],PM="kappa",k
=10,nt=n_t,model="classification")
ii=ii+1;
}
ii=1
for (n_t in tree_n)
{
ErrorRF3s[[ii]]=cvErrorRF(stacking_data[,1:12],stacking_data[,13],PM="mLogLoss
",k=10,nt=n_t,model="classification")
ii=ii+1;
}
ErrorRF1sn=invrs(unlist(ErrorRF1s))
ErrorRF2sn=unlist(ErrorRF2s)
ErrorRF3sn=invrs(unlist(ErrorRF3s))

desir_RFs=0.5*ErrorRF1sn+0.4*ErrorRF2sn+0.1*ErrorRF3sn
onstderror_RFs=as.data.frame(cbind(tree_n_es,desir_RFs))
names(onstderror_RFs)=c("Number.of.Trees","Error")

library(dplyr)

oneSTDerror=tbl_df(onstderror_RFs)
summarizedData=oneSTDerror %>% group_by(Number.of.Trees) %>%
summarise(meanDes = mean(Error),stdDes =sd(Error)/sqrt(n()))

ggplot(summarizedData, aes(x=Number.of.Trees, y=meanDes,


colour=Number.of.Trees)) +
geom_errorbar(aes(ymin=meanDes-stdDes, ymax=meanDes+stdDes), width=.1) +
geom_line() + geom_point() + scale_y_continuous(limits = c(0.995,1.005))

detach("package:dplyr", unload=TRUE)
```
For the stacking process, the best performing classifiers were chosen (one
each for kNN, RF and SVM). After this, the complete rotated dataset was used
to train the classifiers and obtain the class probabilities. The resulting
probabilities for each one of the four classes and the original dataset's
response were bound together, and this new dataset was used as the input for a
random forest classifier. Additionally, cross-validation with 10 folds was
used to test this approach (using classification error, Kappa and multiclass
logarithmic loss). The results show that the highest value of the desirability
function is obtained with this scheme (a value of one); for this reason, the
best results of this project were obtained with this ensemble learning
strategy.
***

APPENDIX C PARAMETER TUNING CODE


---
title: "Tunning_WF_Robot"
author: "Rafael Batista"
date: "13 de diciembre de 2016"
output: html_document
---

```{r setup, include=FALSE}


knitr::opts_chunk$set(echo = TRUE)
```

##Tuning Algorithms
#Tuning RF

##Loading Packages
```{r, echo=F}
#library(mlbench)
#library(reshape)
#library(caret)
#library(PRROC)
#library(class)
library(ggplot2)
library (MASS)

#Library for Multi Log Loss error


library(MLmetrics)

#Library for kappa error


library(psych)

```

##Loading Sensors' Data


```{r, echo=F}
#Reading data file
data=read.csv(file ="C:/Users/Rafael Batista/Desktop/R Files/sensor_readings_24.csv")

#Inspecting data and searching for missing values


summary(data)
head(data)
number.na<-sum(is.na(data))
number.na
```

#Outlier Detection
```{r}
#Library for detecting outliers by diverse methods
library(CORElearn)
#Library for detecting outliers using robust Mahalanobis distance
library(mvoutlier)

dataset <- data


#Creating random forest model for outlier detection, using a distance measure between nodes
md <- CoreModel(Decision ~ ., dataset, model="rf", rfNoTrees=50, maxThreads=1)
outliers <- rfOutliers(md, dataset)
plot(abs(outliers))
plot(md, dataset, rfGraphType="outliers")
#Identifying observations as outliers
resOut<-which(abs(outliers) >= 4.5);
data.outlier=subset(data, select = -c(Decision))

#Removing outlier and reorganizing the data


data.nout<-data.outlier[-c(resOut),]
data.nout.resp<-data[-c(resOut),]
row.names(data.nout) <- 1:nrow(data.nout)

```

##Applying LDA to reduce collinearity

```{r}

#Building LDA model


myLDA=lda(Decision ~., data=data.nout.resp)
myLDA
lda.values <- predict(myLDA)
dev.off()
par(mar=c(2,2,2,2))

#Stacked Histogram
ldahist(data = lda.values$x[,1], g=data.nout.resp$Decision)
ldahist(data = lda.values$x[,2], g=data.nout.resp$Decision)
ldahist(data = lda.values$x[,3], g=data.nout.resp$Decision)

#Selecting loadings
sensor.lda<-as.data.frame(lda.values$x[,1:3])
sensor.lda<-cbind(sensor.lda,data.nout.resp$Decision)
colnames(sensor.lda)[4]<-"decision"
head(sensor.lda)

```

##Applying PCA to reduce collinearity


```{r}

#Building PCA model


sensors.pca=prcomp(x=data.nout,center=TRUE,scale=TRUE)
print(sensors.pca)

#Scree Plot
plot(sensors.pca, type = "l")
summary(sensors.pca)

#Selecting first 15 principal components and binding response


sensor.rot<-as.data.frame(sensors.pca$x[,1:15])
sensor.rot<-cbind(sensor.rot,data.nout.resp$Decision)
colnames(sensor.rot)[16]<-"decision"
head(sensor.rot)

```

##Tuning Random Forest with LDA


```{r}
library(randomForest)
#RF
v1=c(1:5448)
v1s=sample(v1,500)
sensor.lda_s=sensor.lda[-c(v1s),]
row.names(sensor.lda_s) <- 1:nrow(sensor.lda_s)
myRF=randomForest(decision~.,data=sensor.lda_s, ntree = 200)
layout(matrix(c(1,2),nrow=1),width=c(5,2))
par(mar=c(5,4,4,0)) #No margin on the right side
plot(myRF, log="y")

par(mar=c(5,0,4,2)) #No margin on the left side
plot(c(0,1),type="n", axes=F, xlab="", ylab="")
legend("top", colnames(myRF$err.rate),col=1:4,cex=0.8,fill=1:4)

oob=array(0,100)
samp=array(0,100)
ii=1
yy=0
for (i in 1:100)
{
yy=yy+10
myRF=randomForest(decision~.,data=sensor.lda_s, ntree = 200, sampsize=c(1950-
yy, 1900-yy, 250, 700))
oob[ii]=myRF$err.rate[200,1]
samp[ii]=yy
ii=ii+1
}
min(oob)
weight=samp[which(oob==min(oob))]
weight
dev.off()
par(mfrow = c(2, 1))
par(mar=c(4,4,2,2))
plot(x=(1950+10)-samp,y=oob,ylab="OOB Error",xlab="# Samples per
Class",main="First Majoritarian Class")
abline(h=min(oob))
plot(x=(1900+10)-samp,y=oob,ylab="OOB Error",xlab="# Samples per
Class",main="Second Majoritarian Class")
abline(h=min(oob))
1900-samp[which(oob==min(oob))]
1950-samp[which(oob==min(oob))]
```

##Tuning Random Forest with PCA


```{r}

#RF
sensor.rot_s=sensor.rot[-c(v1s),]
myRF=randomForest(decision~.,data=sensor.rot_s, ntree = 200,
classwt=c(1,1,1,1))
layout(matrix(c(1,2),nrow=1),width=c(5,2))
par(mar=c(5,4,4,0)) #No margin on the right side
plot(myRF, log="y")
par(mar=c(5,0,4,2)) #No margin on the left side
plot(c(0,1),type="n", axes=F, xlab="", ylab="")
legend("top", colnames(myRF$err.rate),col=1:4,cex=0.8,fill=1:4)

oob=array(0,100)
samp=array(0,100)
ii=1
yy=0
for (i in 1:100)
{
yy=10+yy

myRF=randomForest(decision~.,data=sensor.rot_s, ntree = 200, sampsize=c(1940-
yy, 1900-yy, 250, 700))
oob[ii]=myRF$err.rate[200,1]
samp[ii]=yy
ii=ii+1
}
min(oob)
weight=samp[which(oob==min(oob))]
weight
dev.off()
par(mfrow = c(2, 1))
par(mar=c(4,4,2,2))
plot(x=(1950+10)-samp,y=oob,ylab="OOB Error",xlab="# Samples per
Class",main="First Majoritarian Class")
abline(h=min(oob))
plot(x=(1900+10)-samp,y=oob,ylab="OOB Error",xlab="# Samples per
Class",main="Second Majoritarian Class")
abline(h=min(oob))
1950-samp[which(oob==min(oob))]
1900-samp[which(oob==min(oob))]
```

##Tuning SVM with LDA


```{r}
library("e1071")
#Tuning SVM
#Takes a long time; it was run for both radial and polynomial kernels, and the
#algorithm had convergence problems for the polynomial kernel,
#for this reason the radial kernel was chosen.
x=sensor.lda[,1:3]
y=sensor.lda[,4]
svm_tune1 <- tune(svm, train.x=x,train.y=y,kernel="radial",
ranges=list(cost=10^(-1:3), gamma=c(0.5,1,2)))
print(svm_tune1)
```
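The selected hyper-parameters can also be read directly from the tune() result.
This is a minimal sketch of that step, assuming `svm_tune1` from the chunk
above is available.

```{r}
library(e1071)
#Best cost/gamma combination and its cross-validated error
svm_tune1$best.parameters
svm_tune1$best.performance
```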

##Tuning SVM with PCA


```{r}
#Tuning SVM
#Takes a long time; it was run and the results were cost=0.01 gamma=1 with a polynomial kernel.
#A radial kernel was also tested, but the lowest classification error was for the polynomial, with degree=3.
x=sensor.rot[,1:15]
y=sensor.rot[,16]
svm_tune2 <- tune(svm, train.x=x,train.y=y,kernel="polynomial",
ranges=list(cost=10^(-2:2), gamma=c(0.5,1,2)))
print(svm_tune2)

#Tuning SVM
#Takes a long time; it was run and the results were cost=0.01 gamma=1 with a polynomial kernel.
#A radial kernel was also tested, but the lowest classification error was for the polynomial, with degree=3.
x=sensor.rot[,1:15]

y=sensor.rot[,16]
svm_tune3 <- tune(svm, train.x=x,train.y=y,kernel="radial",
ranges=list(cost=10^(-2:3), gamma=c(0.5,1,2,3,4)))
print(svm_tune3)

```
