Task 1 Establish baseline scores for classification accuracy, F-score and classification time (testing)
Record the classification accuracy, overall F-score and classification time (i.e. the time taken to test the
model on the supplied test set).
Classification accuracy:
Correctly Classified Instances 470 89.8662 %
Overall F-score :
0.871
Classification time (i.e. Time taken to test model on supplied test set):
Time taken to test model on supplied test set: 0.05 seconds*
Task 2 Perform feature selection and investigate effects on accuracy and model building time
(a) Record the classification accuracy, overall F-score and the classification time values with the use
of the CorrelationAttributeEval filter.
Classification accuracy:
Correctly Classified Instances 479 91.587 %
Overall F-score:
0.897
Classification time values:
Time taken to test model on supplied test set: 0.05 seconds*
(b) Record the classification accuracy, overall F-score and the classification time values with the use
of the CfsSubsetEval filter.
Classification accuracy:
Correctly Classified Instances 475 90.8222 %
Overall F-score:
0.886
Classification time:
Time taken to test model on supplied test set: 0.05 seconds*
(c) Record the classification accuracy, overall F-score and the classification time values with the use
of the InfoGainAttributeEval filter.
Classification accuracy:
Correctly Classified Instances 475 90.8222 %
Overall F-score:
0.898
Classification time:
Time taken to test model on supplied test set: 0.04 seconds*
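The four sets of results recorded above are easier to compare side by side. The following is a minimal sketch in base R that tabulates them. It assumes a test set of 523 instances, which is the size used by the evaluation loop in the R code of this report; the column names are illustrative only.

```r
# Results as recorded in Tasks 1 and 2 of this report.
# Assumption: the test set holds 523 instances (the loop bound used in the R code).
n_test <- 523

results <- data.frame(
  setup       = c("Baseline", "CorrelationAttributeEval",
                  "CfsSubsetEval", "InfoGainAttributeEval"),
  correct     = c(470, 479, 475, 475),
  f_score     = c(0.871, 0.897, 0.886, 0.898),
  test_time_s = c(0.05, 0.05, 0.05, 0.04),
  stringsAsFactors = FALSE
)

# Accuracy is the share of correctly classified test instances.
results$accuracy_pct <- round(100 * results$correct / n_test, 4)

# Change in overall F-score relative to the baseline (first row).
results$f_gain <- results$f_score - results$f_score[1]

# Print the setups ordered by overall F-score, best first.
print(results[order(-results$f_score), ])
```

The recomputed accuracies match the percentages reported by Weka, which is a quick check that the counts and the test set size are consistent.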
library(RWeka) # provides read.arff, InfoGainAttributeEval and J48
TraindataSecom <- read.arff("C:/Users/rpears/Desktop/Secom/SecomTrain.arff")
colnames(TraindataSecom)[colnames(TraindataSecom) == "591"] <- "class" # set the target attribute name to "class" in the training file
TestdataSecom <- read.arff("C:/Users/rpears/Desktop/Secom/SecomTest.arff")
colnames(TestdataSecom)[colnames(TestdataSecom) == "591"] <- "class" # set the target attribute name to "class" in the testing file
actual <- TestdataSecom[, 591] # get the class values from the test file
A <- InfoGainAttributeEval(class ~ ., data = TraindataSecom, na.action = NULL) # rank features by their information gain score
ranked_list <- A[order(A)] # sort in ascending order
A[order(-A)] # print the features in descending order of information gain, together with their gain values
# Note: TraindataSecom1 and cols.dont.want are not defined in this listing; they are assumed to come from a step that drops the low-ranked features from the training data
classifier <- J48(class ~ ., data = TraindataSecom1, na.action = NULL) # build the model on the reduced training dataset (the version with 50 attributes)
TestdataSecom1 <- TestdataSecom[, !names(TestdataSecom) %in% cols.dont.want, drop = FALSE] # drop low-ranked features from the test dataset
pred <- predict(classifier, TestdataSecom1, na.action = NULL, seed = 1) # apply the model to the reduced test dataset to make predictions
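# Confusion matrix counts: first index = actual class, second index = predicted class
# (1 stands for class -1, 2 stands for class 1)
P11<-0 # actual -1, predicted -1
P12<-0 # actual -1, predicted 1
P21<-0 # actual 1, predicted -1
P22<-0 # actual 1, predicted 1
for ( K in seq(1,523))
{
  if(actual[K]==-1){
    if(pred[K]==-1){
      P11<-P11+1
    }
    else
    {
      P12<-P12+1
    }
  }
  else if (actual[K]==1){
    if(pred[K]==1){
      P22<-P22+1
    }
    else
    {
      P21<-P21+1
    }
  }
}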
Prec_1<-(P11/(P11+P21))
Prec_2<-(P22/(P22+P12))
Recall_1<-(P11/(P11+P12))
Recall_2<-(P22/(P22+P21))
F_1<-(2*Prec_1*Recall_1)/(Prec_1+Recall_1)
F_2<-(2*Prec_2*Recall_2)/(Prec_2+Recall_2)
F_overall<-(F_1*462+F_2*61)/523
paste("This is the F overall score",F_overall)
t<-system.time(predict(classifier,TestdataSecom1, na.action=NULL,seed=1)) # time the prediction step; t[[1]] is the user CPU time, t[["elapsed"]] is the wall-clock time
paste("This is the total classification time", t[[1]])
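The counting loop and the F-score formulas above can also be written more compactly. The following is a minimal sketch in base R, not the lab's required method: it builds the confusion matrix with table() and derives the class weights from the data instead of hard-coding 462 and 61. The toy label vectors and the helper f_score are hypothetical and only serve to make the example runnable without the SECOM files.

```r
# Toy labels (hypothetical), using the same -1 / 1 coding as the lab data.
actual <- factor(c(-1, -1, -1, 1,  1, -1, 1, -1), levels = c(-1, 1))
pred   <- factor(c(-1, -1,  1, 1, -1, -1, 1, -1), levels = c(-1, 1))

# Rows are actual classes, columns are predicted classes.
cm <- table(actual = actual, predicted = pred)
P11 <- cm["-1", "-1"]  # actual -1, predicted -1
P12 <- cm["-1", "1"]   # actual -1, predicted 1
P21 <- cm["1", "-1"]   # actual 1, predicted -1
P22 <- cm["1", "1"]    # actual 1, predicted 1

# Hypothetical helper: F-score of one class from its true positives,
# false positives and false negatives.
f_score <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

F_1 <- f_score(tp = P11, fp = P21, fn = P12)  # class -1
F_2 <- f_score(tp = P22, fp = P12, fn = P21)  # class 1

# Weight each class by its number of actual instances.
n1 <- sum(actual == "-1")
n2 <- sum(actual == "1")
F_overall <- (F_1 * n1 + F_2 * n2) / (n1 + n2)
print(F_overall)
```

For these toy labels the counts are 4, 1, 1 and 2, which gives an overall F-score of 0.75. If a class is never predicted, its precision is 0/0 and the result is NaN, so a real run may need a guard for that case.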
(a) Run the above code and paste the overall F-score and classification time into your lab report.
(b) Now use a for loop to perform feature selection with K values in the range [10, 50] in intervals
of 5. You will need to look up R help on using a for loop with increments. Paste your code in your
submission.
Code:
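# Confusion matrix counts: first index = actual class, second index = predicted class
P11<-0 # actual -1, predicted -1
P12<-0 # actual -1, predicted 1
P21<-0 # actual 1, predicted -1
P22<-0 # actual 1, predicted 1
for ( K in seq(1,523))
{
  if(actual[K]==-1)
  {
    if(pred[K]==-1)
    {
      P11<-P11+1
    }
    else
    {
      P12<-P12+1
    }
  }
  else if (actual[K]==1)
  {
    if(pred[K]==1)
    {
      P22<-P22+1
    }
    else
    {
      P21<-P21+1
    }
  }
}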
Output:
0.7679383
Number of features selected: 45
This is the F overall score: 0.9067615
This is the total classification time: 0.04 *
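For part (b), the loop over K in steps of 5 could be sketched as follows. This is a minimal base R outline under stated assumptions, not the submitted solution: the scores vector is random toy data standing in for the information gain values, and low_ranked and evaluate_k are hypothetical helpers. In the lab, evaluate_k would rebuild the J48 model on the reduced training data and return the overall F-score and classification time.

```r
set.seed(1)

# Toy stand-in for the information gain scores of 60 hypothetical features.
scores <- setNames(runif(60), paste0("f", 1:60))

# Hypothetical helper: names of the features to drop when keeping the top k.
low_ranked <- function(scores, k) {
  ranked <- sort(scores, decreasing = TRUE)
  names(ranked)[-seq_len(k)]
}

# Placeholder for the real evaluation step. Here it only returns the number
# of features kept, so the sketch runs without RWeka or the SECOM files.
evaluate_k <- function(k) {
  cols.dont.want <- low_ranked(scores, k)
  length(scores) - length(cols.dont.want)
}

# seq() with the 'by' argument gives the increments: 10, 15, ..., 50.
k_values <- seq(10, 50, by = 5)

# Collect one row of results per value of K.
results <- data.frame(k = k_values, kept = NA_integer_)
for (i in seq_along(k_values)) {
  results$kept[i] <- evaluate_k(k_values[i])
}
print(results)
```

Storing one row per K in a data frame makes it straightforward to pick the K with the best score once the placeholder is replaced by the real evaluation.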