file:///D:/KOMAL/SIMPLILEARN/MY%20COURSES/IN%20PROGRESS/My%20Codes_ML_DS/pdf%20conversion/htmls/komal_RF_roc_auc_HypTu… 1/17
9/21/2018 Untitled35
In [95]: df_train.head()
Out[95]:
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            F
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.250
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.28
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.925
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.10
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.050
In [114]: # Cleaning
# We will remove ‘Cabin’, ‘Name’ and ‘Ticket’ columns
Out[114]:
PassengerId Survived Pclass Sex Age SibSp Parch Fare Embarked
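The code of the cleaning cell is not preserved above. A minimal sketch of how the three columns could be dropped with pandas is given here; the two-row frame is a made-up stand-in, and only the column names follow the notebook.

```python
import pandas as pd

# Tiny stand-in for df_train; the rows are illustrative, not the real data set.
df_train = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived": [0, 1],
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
    "Ticket": ["A/5 21171", "STON/O2. 3101282"],
    "Cabin": [None, "C85"],
})

# drop() returns a new frame and leaves df_train untouched.
df_train_dropped = df_train.drop(columns=["Cabin", "Name", "Ticket"])
print(df_train_dropped.columns.tolist())
```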
Out[115]: PassengerId 0
Survived 0
Pclass 0
Sex 0
Age 177
SibSp 0
Parch 0
Fare 0
Embarked 2
dtype: int64
In [119]: df_train_dropped.isnull().sum()
Out[119]: PassengerId 0
Survived 0
Pclass 0
Sex 0
Age 0
SibSp 0
Parch 0
Fare 0
Embarked 0
dtype: int64
In [120]: df_train_dropped['Embarked'].value_counts()
Out[120]: S 646
C 168
Q 77
Name: Embarked, dtype: int64
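Out[115] reports missing values in Age and Embarked, and Out[119] reports none, but the cell that handled them is not shown. The sketch below is one common way to do it, assuming the median for Age and the most frequent value for Embarked; both choices are assumptions, not necessarily what was run here.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with one gap in each column.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Embarked": ["S", "C", None, "S"],
})

# Assumed strategy: median for the numeric column, mode for the categorical one.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

print(df.isnull().sum())
```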
In [121]: df_train_dropped.head()
Out[121]:
PassengerId Survived Pclass Sex Age SibSp Parch Fare Embarked
In [122]: df_train_dropped['Pclass'].value_counts()
Out[122]: 3 491
1 216
2 184
Name: Pclass, dtype: int64
Out[124]:
   PassengerId  Survived  Age   SibSp  Parch  Fare     Pclass_1  Pclass_2  Pclass_3  Em
0  1            0         22.0  1      0      7.2500   0         0         1         0
1  2            1         38.0  1      0      71.2833  1         0         0         1
2  3            1         26.0  0      0      7.9250   0         0         1         0
3  4            1         35.0  1      0      53.1000  1         0         0         0
4  5            0         35.0  0      0      8.0500   0         0         1         0
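The indicator columns in Out[124] (Pclass_1, Pclass_2, ...) look like the result of one-hot encoding. The encoding cell itself is not shown, so the following is only a sketch of how `pd.get_dummies` produces columns of that shape on a made-up three-row frame.

```python
import pandas as pd

# Illustrative rows; only the column names follow the notebook.
df = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
    "Fare": [7.25, 71.2833, 7.925],
})

# Each listed column is replaced by one 0/1 column per distinct value.
df_dummied = pd.get_dummies(df, columns=["Pclass", "Embarked", "Sex"], dtype=int)
print(df_dummied.columns.tolist())
```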
clf = RandomForestClassifier(random_state=42)
In [128]: # Fit the classifier on the training features (X_train) and target labels (y_train)
clf.fit(X_train, y_train)
importances = list(clf.feature_importances_)
In [131]: importances
Out[131]: [0.17704930921912193,
0.1701609513162492,
0.028821370222286914,
0.03565150653155073,
0.1973100861519638,
0.019865619648430373,
0.016223643943662054,
0.05675591752670881,
0.014813114457404308,
0.0058535692652198195,
0.0158160841729409,
0.11751209931796955,
0.14416672822649168]
In [132]: df_train_dummied.columns
Out[142]:
Feature Importance
0 Parch 0.20
1 PassengerId 0.18
2 Survived 0.17
3 Sex_female 0.14
4 Embarked_S 0.12
5 Pclass_2 0.06
6 SibSp 0.04
7 Age 0.03
8 Fare 0.02
9 Pclass_1 0.02
10 Embarked_Q 0.02
11 Pclass_3 0.01
12 Embarked_C 0.01
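The table above lists Survived, the target, among the features. That may mean the names were taken from the full data frame rather than from the frame passed to `fit`, which would shift the labels against the importances. A hedged sketch of pairing the two from the same object, on synthetic data with hypothetical column names `f0`..`f4` and `target`:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; nothing here comes from the Titanic set.
X_arr, y_arr = make_classification(n_samples=200, n_features=5, random_state=42)
df = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(5)])
df["target"] = y_arr

X = df.drop(columns=["target"])
y = df["target"]

clf = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

# Names come from X (what the model saw), not from df, which still holds the target.
df_feature_importance = (
    pd.DataFrame({"feature": X.columns, "importance": clf.feature_importances_})
    .sort_values("importance", ascending=False)
    .reset_index(drop=True)
)
print(df_feature_importance)
```

Because `feature_importances_` follows the column order of the training frame, building both columns from `X` keeps every name next to its own score.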
In [141]: ax = df_feature_importance.plot(
    kind='bar',
    x='feature',
    y='importance',
    figsize=(10, 8),
    title='Feature importances for Random Forest Model',
    grid=True,
    legend=True,
    fontsize=12,
    color='orange',
);
In [152]: # comparing actual response values (y_test) with predicted response values (y_pred)
print("model accuracy:", metrics.accuracy_score(y_test, y_pred) * 100)
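The cell that defines `y_pred` is not shown. Presumably it comes from `clf.predict(X_test)`; the sketch below assumes that, and uses synthetic data and an arbitrary 75/25 split in place of the notebook's own variables.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data and an assumed 75/25 split.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

clf = RandomForestClassifier(n_estimators=10, random_state=42)
clf.fit(X_train, y_train)

# Hard class labels (0 or 1) for the held-out rows.
y_pred = clf.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
print("model accuracy:", accuracy * 100)
```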
In [153]: # We take the second column of the output, which holds the
# predicted probabilities of the positive class.
# Columns are ordered as in clf.classes_, here [0, 1], where 1 is our positive class
predictions_prob = clf.predict_proba(X_test)[:, 1]
predictions_prob
Out[153]: array([0.4, 0. , 0.2, 1. , 0.1, 1. , 0.9, 0. , 0.9, 0.7, 0.4, 0.1, 0.2,
0. , 0. , 0.9, 0.6, 0.9, 0.1, 0.1, 0.3, 0.7, 0.2, 0.1, 0.1, 0.1,
0.2, 0. , 0.2, 0.7, 0. , 0.7, 0.6, 0.3, 0.4, 0.5, 0.1, 0.8, 1. ,
0. , 0. , 0.1, 0.2, 0.2, 0.4, 0. , 0.3, 0. , 0.3, 0.6, 1. , 1. ,
0.1, 0.4, 0. , 0.7, 0.1, 0.9, 0.9, 0.9, 0.3, 1. , 1. , 0.1, 0. ,
1. , 0.1, 0.4, 0.4, 0.9, 1. , 1. , 0.9, 0.9, 0. , 0. , 1. , 1. ,
1. , 0.4, 0.1, 1. , 1. , 0. , 0.1, 0.3, 1. , 1. , 0.1, 0.1, 0.3,
0.1, 0.4, 0. , 0. , 0.2, 0.5, 0.1, 1. , 0.1, 0.2, 0.1, 1. , 0. ,
0.4, 0.5, 1. , 0.1, 0.1, 0. , 1. , 0.1, 1. , 0.7, 0.3, 0. , 0.3,
0.4, 1. , 0. , 0.3, 1. , 1. , 0.5, 0.1, 0.5, 1. , 0.6, 0.2, 0. ,
0.9, 0.2, 0. , 0.7, 1. , 0.2, 1. , 0.2, 0. , 0. , 0.1, 1. , 0. ,
0.3, 0.3, 1. , 0. , 0.6, 1. , 0.2, 0.1, 0.2, 0.1, 0.7, 0. , 0.1,
0.7, 1. , 1. , 0.5, 0.3, 0.5, 0.1, 1. , 0.1, 0.4, 0. , 1. , 0. ,
0.1, 0.6, 1. , 0.7, 0.7, 0.2, 0. , 0.2, 1. , 0.6, 0.9, 0. , 0.6,
0.1, 0.2, 0.5, 0.6, 0.2, 0.2, 0. , 1. , 0.1, 0. , 0.1, 0. , 1. ,
1. , 1. , 0. , 1. , 0.1, 0.1, 0.3, 1. , 0. , 0.2, 0.6, 0.2, 0.3,
0.3, 0.1, 0.5, 0.1, 0.9, 0. , 0.2, 0.4, 1. , 0.4, 0.9, 0.3, 0. ,
1. , 0.1])
A low AUC may indicate that AUC is not the most suitable metric for the problem at hand.
It may also indicate overfitting, but that is hard to judge without knowing which dataset (training, validation or test)
the low value was measured on.
Out[156]: 0.8541002850913971
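The cell that produced this score is not preserved. One way to obtain an AUC from predicted probabilities is `roc_curve` followed by `auc`, sketched here on synthetic data; whether the notebook used this route or `roc_auc_score` directly is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, not the Titanic set.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

clf = RandomForestClassifier(n_estimators=10, random_state=42)
clf.fit(X_train, y_train)

# Probability of the positive class for each test row.
predictions_prob = clf.predict_proba(X_test)[:, 1]

# False/true positive rates over all thresholds, then the area under that curve.
fpr, tpr, thresholds = roc_curve(y_test, predictions_prob)
roc_auc = auc(fpr, tpr)
print(roc_auc)
```

For a binary problem, `roc_auc_score(y_test, predictions_prob)` gives the same number in one call.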
HYPERPARAMETER TUNING
n_estimators
train_results = []
test_results = []
plt.ylabel('AUC score')
plt.xlabel('n_estimators')
plt.legend();
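The loop that fills `train_results` and `test_results` is missing from the export. A sketch of what such a sweep over `n_estimators` might look like follows; the grid of values and the synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

n_estimators = [1, 2, 4, 8, 16, 32, 64]  # assumed grid, not taken from the notebook
train_results = []
test_results = []
for n in n_estimators:
    rf = RandomForestClassifier(n_estimators=n, random_state=42)
    rf.fit(X_train, y_train)
    # AUC on the data the forest was trained on, and on the held-out data.
    train_results.append(roc_auc_score(y_train, rf.predict_proba(X_train)[:, 1]))
    test_results.append(roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))

print(list(zip(n_estimators, test_results)))
```

The two lists can then be plotted against `n_estimators` with `plt.plot(..., label=...)` so that `plt.legend()` has entries to show.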
max_depth
In [171]: # max_depth is the maximum depth of each tree in the forest. The deeper the tree,
# the more splits it has and the more information it captures about the data.
# We fit a forest for each depth from 1 to 32 and plot the training and test AUC scores.
train_results = []
test_results = []
plt.ylabel('AUC score')
plt.xlabel('Tree depth')
plt.legend();
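The depth sweep itself is not preserved. As a sketch under the same assumptions as before (synthetic data, 10 trees per forest), the gap between training and test AUC at each depth is one simple way to see where deeper trees start to overfit.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

max_depths = list(range(1, 33))  # integer depths 1..32, as described in the comment
train_results = []
test_results = []
for depth in max_depths:
    rf = RandomForestClassifier(n_estimators=10, max_depth=depth, random_state=42)
    rf.fit(X_train, y_train)
    train_results.append(roc_auc_score(y_train, rf.predict_proba(X_train)[:, 1]))
    test_results.append(roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))

# A growing train-minus-test gap is a sign of overfitting at that depth.
gaps = [tr - te for tr, te in zip(train_results, test_results)]
print(max(zip(gaps, max_depths)))
```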
min_samples_split
In [174]: # min_samples_split is the minimum number of samples required to split an internal node.
# It can range from two samples up to all of the samples (given as a fraction, up to 1.0).
# Increasing it constrains each tree in the forest, because a node needs more samples
# before it can be split.
# Here we vary the parameter from 10% to 100% of the samples.
train_results = []
test_results = []
plt.ylabel('AUC score')
plt.xlabel('min_samples_split')
plt.legend();
In [177]: # We can clearly see that when we require all of the samples at each node,
# the model cannot learn enough about the data.
# This is an underfitting case.
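The sweep behind this observation is not shown. In scikit-learn a float `min_samples_split` is read as a fraction of the training samples, so the 10% to 100% range can be sketched as below; the data is synthetic and the ten-step grid is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fractions 0.1, 0.2, ..., 1.0 of the training samples (assumed grid).
min_samples_splits = [i / 10 for i in range(1, 11)]
train_results = []
test_results = []
for fraction in min_samples_splits:
    rf = RandomForestClassifier(
        n_estimators=10, min_samples_split=fraction, random_state=42
    )
    rf.fit(X_train, y_train)
    train_results.append(roc_auc_score(y_train, rf.predict_proba(X_train)[:, 1]))
    test_results.append(roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))

print(list(zip(min_samples_splits, test_results)))
```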
In [178]: list(range(1,X_train.shape[1]))
max_features
In [179]: # max_features is the number of features to consider when looking for the best split.
max_features = list(range(1,X_train.shape[1]))
train_results = []
test_results = []
plt.ylabel('AUC score')
plt.xlabel('max_features')
plt.legend();
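The loop for this sweep is also missing. Note that `range(1, X_train.shape[1])` stops one short of the total number of features; the sketch below uses `+ 1` so that the "all features" case is included, which is an assumption about intent. Data and forest size are again stand-ins.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# 1 .. number of features, upper bound included (assumed intent).
max_features = list(range(1, X_train.shape[1] + 1))
train_results = []
test_results = []
for k in max_features:
    rf = RandomForestClassifier(n_estimators=10, max_features=k, random_state=42)
    rf.fit(X_train, y_train)
    train_results.append(roc_auc_score(y_train, rf.predict_proba(X_train)[:, 1]))
    test_results.append(roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))

print(list(zip(max_features, test_results)))
```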