library(class)
library(MASS)
library(ISLR)
library(car)
library(leaps)
# The first and second columns of the data are names, so to get the numeric data matrix, delete the first
# column
set.seed(100)
k.max = 10
data = pharma.scaled
wss
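The fragment above sets up an elbow-method search, but the computation of wss is not shown. A minimal sketch of how such a curve is often computed, assuming wss holds the total within-cluster sum of squares for k = 1, ..., k.max. The scaled iris measurements (demo.scaled) stand in for pharma.scaled, which is not available here, and nstart = 25 is an arbitrary choice:

```r
set.seed(100)
k.max <- 10
demo.scaled <- scale(iris[, 1:4])   # stand-in for pharma.scaled (assumption)

# Total within-cluster sum of squares for each candidate number of clusters
wss <- sapply(1:k.max, function(k) {
  kmeans(demo.scaled, centers = k, nstart = 25)$tot.withinss
})

# Elbow plot: look for the k after which the curve flattens
plot(1:k.max, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The number of clusters is then usually read off at the bend of the curve rather than at its minimum, since wss keeps decreasing as k grows.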
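# Install the packages once if they are not already available
install.packages("tree")
install.packages("ISLR")
library(tree)
library(ISLR)
attach(heart)
# Fit a classification tree predicting hd from all other columns
# (hd must be a factor for tree() to treat this as classification)
tree.hearts=tree(hd~.,heart)
tree.hearts
summary(tree.hearts)
# Draw the tree with uniform branch lengths and label the nodes
plot(tree.hearts,type="uniform")
text(tree.hearts,pretty=0)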
2b. For pruning, we first divide the data into training and test sets (we have assumed 300 samples for
training and 162 samples for testing).
set.seed(2)
train=sample(1:nrow(heart), 300)
test=heart[-train,]
tree.hearts=tree(hd~.,heart,subset=train)
tree.pred=predict(tree.hearts,test,type="class")
table(tree.pred,test$hd)
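A confusion table of predicted against actual classes is easier to compare across trees once it is reduced to an accuracy or error rate. A minimal sketch, assuming the correct predictions sit on the diagonal of the table; the label vectors (actual, predicted) are hypothetical and are not results from the heart data:

```r
# Hypothetical labels for illustration only; not results from the heart data
actual    <- factor(c("yes", "no", "no", "yes", "no", "yes"))
predicted <- factor(c("yes", "no", "yes", "yes", "no", "no"))

# Rows are predictions, columns are true classes
conf.mat <- table(predicted, actual)

# Correct predictions lie on the diagonal
accuracy   <- sum(diag(conf.mat)) / sum(conf.mat)
error.rate <- 1 - accuracy
```

This relies on both factors having the same levels in the same order, so that the diagonal really pairs each class with itself.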
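The cv.tree() function reports the number of terminal nodes of each tree considered ('size'), the corresponding cross-validation error ('dev', which counts misclassifications when FUN=prune.misclass is used) and the value of the cost-complexity parameter ('k').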
set.seed(3)
cv.heart=cv.tree(tree.hearts,FUN=prune.misclass)
names(cv.heart)
cv.heart
plot(cv.heart$size,cv.heart$dev,type="b")
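Besides reading the tree size off the plot, it can be picked programmatically. A minimal sketch, assuming the rule "smallest tree among those with the lowest cross-validation error"; cv.demo is a hypothetical stand-in with the same size and dev components as the cv.tree() output, and its numbers are illustrative only:

```r
# Hypothetical stand-in for the list returned by cv.tree(); values are illustrative only
cv.demo <- list(size = c(12, 9, 6, 3, 1),
                dev  = c(80, 74, 74, 79, 95))

# Smallest tree among those that reach the minimum cross-validation error
best.size <- min(cv.demo$size[cv.demo$dev == min(cv.demo$dev)])
best.size
```

Breaking ties in favour of the smaller tree is a design choice that favours interpretability; other rules are equally defensible.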
Finally, we apply the prune.misclass() function to prune the tree to the size chosen from the cross-validation plot (here, nine terminal nodes).
prune.hearts=prune.misclass(tree.hearts,best=9)
plot(prune.hearts)
text(prune.hearts,pretty=0)
We then test the performance of the pruned classification tree on the test data.
tree.pred=predict(prune.hearts,test,type="class")
table(tree.pred,test$hd)
3 b. 1. The given boxplot shows the price for each fuel type. The bottom edge of the box (rectangle)
shows the first quartile, the middle line shows the second quartile (the median, which is not
necessarily the mean) and the top edge shows the third quartile.
The lowest line of the plot (the end of the lower whisker, not part of the box) indicates the lowest value,
while the topmost line (the end of the upper whisker) indicates the highest value of the parameter,
excluding any points drawn separately as outliers.
2. From the given boxplot we can see that the first quartile is almost the same for CNG and Diesel and
slightly higher for Petrol.
3. Similarly, the second quartile is higher for Petrol than for CNG and Diesel, which shows that
the median price of Petrol is slightly higher than that of CNG and Diesel.
4. Along similar lines, the third quartile is higher for Diesel than for CNG and Petrol.
5. Diesel has both the lowest price and the highest price among all the fuel types; however, its median
price is lower than that of Petrol.
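The quantities a boxplot draws can be checked numerically. A minimal sketch using base R's boxplot.stats(), which returns the lower whisker, lower hinge, median, upper hinge and upper whisker; the price vector is hypothetical and is not the data behind the plot discussed above:

```r
# Hypothetical prices for illustration only; not the data behind the plot
price <- c(5, 7, 8, 9, 10, 12, 30)

bp <- boxplot.stats(price)
stats <- bp$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker

med <- median(price)   # the middle line of the box
avg <- mean(price)     # not drawn by a standard boxplot

stats
bp$out                 # points beyond the whiskers, drawn individually
```

With a skewed sample like this one the mean and the median differ, which is why the middle line of the box should be read as the median.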