II. INTRODUCTION
The paper focuses on predicting the possible number of heart attack patients in the dataset using data mining techniques, and determines which model gives the highest percentage of correct predictions for the diagnoses.
There are eight main attributes in the dataset. Each attribute except disease is a main cause of heart disease. Each cause is categorized into some predefined measures. These measures are categorized in order to make the result efficient [8][11].
Figure 3. Rest blood Pressure
Figure 4. Blood_sugar
IV. RESULT
A. Final Result
When the algorithm is applied to the dataset, the result shown in figure 3.1 is produced. It consists of dataset analysis information such as the total number of instances, the classified and unclassified instances, classification accuracy measures, detailed accuracy measures and the confusion matrix.
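How the counts of classified instances relate to a confusion matrix can be illustrated with a small tally. The following Python sketch is only an illustration and not the tool's implementation: the helper `confusion_matrix` and the five toy labels are hypothetical and are not taken from the heart disease dataset.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Tally (actual, predicted) pairs; rows are actual classes, columns predicted."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


# Hypothetical toy labels, used only to show the bookkeeping.
actual = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]

matrix = confusion_matrix(actual, predicted, ["positive", "negative"])
correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal entries
incorrect = len(actual) - correct

print(matrix)              # [[2, 1], [1, 1]]
print(correct, incorrect)  # 3 2
```

The diagonal of the matrix holds the correctly classified instances and the off-diagonal cells hold the incorrectly classified ones.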
|  exercice_angina=(no)
|  |  age=(Avg)|(Min)
|  |  |  max_heart_rate=(A)|(B): negative(17.0/10.0)
|  |  |  max_heart_rate!=(A)|(B)
|  |  |  |  rest_bpress=(Poor): negative(3.0/1.0)
|  |  |  |  rest_bpress!=(Poor): positive(7.0/1.0)
|  |  age!=(Avg)|(Min): positive(7.0/2.0)
|  exercice_angina!=(no): positive(54.0/6.0)
Number of Leaf Nodes: 6
Size of the Tree: 11
Time taken to build model: 0.08 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances       167     79.9043 %
Incorrectly Classified Instances      42     20.0957 %
Kappa statistic                        0.5913
Mean absolute error                    0.2779
Root mean squared error                0.3869
Relative absolute error               56.3624 %
Root relative squared error           77.9345 %
Total Number of Instances            209
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.761    0.171    0.778      0.761   0.769      0.838     positive
               0.829    0.239    0.815      0.829   0.822      0.838     negative
Weighted Avg.  0.799    0.209    0.799      0.799   0.799      0.838
=== Confusion Matrix ===
  a   b   <-- classified as
 70  22 |  a = positive
 20  97 |  b = negative
Figure 12. Rest blood Pressure
Figure 11 above shows the detailed accuracy and the confusion matrix produced on the basis of the classes in the dataset. The detailed accuracy is measured by major measures such as TP (True Positive) rate, FP (False Positive) rate, Precision, Recall and F-Measure. These measures are calculated from the confusion matrix in figure 12.
TP rate = TP / (TP + FN)
   ex: (70) / (70 + 22) = 0.761 => Positive
       (97) / (20 + 97) = 0.829 => Negative
FP rate = FP / (FP + TN)
   ex: (20) / (20 + 97) = 0.171 => Positive
       (22) / (22 + 70) = 0.239 => Negative
Precision = TP / (TP + FP)
   ex: (70) / (70 + 20) = 0.778 => Positive
       (97) / (22 + 97) = 0.815 => Negative
Recall = TP / (TP + FN)
   ex: (70) / (70 + 22) = 0.761 => Positive
       (97) / (20 + 97) = 0.829 => Negative
F-Measure = (2 * recall * precision) / (recall + precision)
   ex: (2 * 0.761 * 0.778) / (0.761 + 0.778) = 0.769 => Positive
       (2 * 0.829 * 0.815) / (0.829 + 0.815) = 0.822 => Negative
*Note
• TP = 70, FN = 22, FP = 20, TN = 97
• Recall = TP rate
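The worked examples above can be checked directly from the four confusion-matrix counts. The following is a minimal Python sketch, not part of the original experiment; the function name `class_metrics` and its argument names are chosen here for illustration. Each class is treated in turn as the class of interest, so the counts swap roles for the negative class.

```python
def class_metrics(tp, fn, fp, tn):
    """Per-class measures computed from confusion-matrix counts."""
    tp_rate = tp / (tp + fn)    # share of this class that was found
    fp_rate = fp / (fp + tn)    # share of the other class mislabeled as this one
    precision = tp / (tp + fp)  # share of predictions for this class that are right
    recall = tp_rate            # recall is the same quantity as TP rate
    f_measure = 2 * recall * precision / (recall + precision)
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Confusion matrix from the classifier output (rows: actual, columns: predicted):
#   70 22 | a = positive
#   20 97 | b = negative
positive = class_metrics(tp=70, fn=22, fp=20, tn=97)
negative = class_metrics(tp=97, fn=20, fp=22, tn=70)

for name, metrics in (("positive", positive), ("negative", negative)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Rounded to three decimals, the results agree with the TP Rate, FP Rate, Precision, Recall and F-Measure columns of the classifier output for both classes.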
In figure 13 above there are some important measures, called errors, that measure the correctly and incorrectly classified instances in the dataset. The errors are calculated by the predefined formulas discussed previously.
C. Detailed Accuracy by Class
The detailed accuracy measures such as TP rate, FP rate, Precision, Recall, F-Measure and ROC area are estimated by class.
E. Rules Interpretation
By applying the CART algorithm to the dataset, some rules are generated that help to predict the correct cause of heart disease.
JRIP rules:
  a   b   <-- classified as
 70  22 |  a = positive
 20  97 |  b = negative
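The summary figures reported by the classifier (167 of 209 instances correct, kappa statistic 0.5913) can be recovered from this confusion matrix alone. A minimal Python sketch follows, assuming the usual definitions of accuracy (diagonal total over all instances) and Cohen's kappa (observed agreement corrected for the agreement expected by chance). The function name `accuracy_and_kappa` is illustrative and not from the original experiment.

```python
def accuracy_and_kappa(matrix):
    """Accuracy and Cohen's kappa from a square confusion matrix (rows: actual)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    observed = correct / total  # observed agreement, i.e. accuracy

    row_totals = [sum(row) for row in matrix]        # actual class sizes
    col_totals = [sum(col) for col in zip(*matrix)]  # predicted class sizes
    # Agreement expected if predictions were independent of the actual class.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

    kappa = (observed - expected) / (1 - expected)
    return correct, total, observed, kappa


matrix = [[70, 22],
          [20, 97]]
correct, total, accuracy, kappa = accuracy_and_kappa(matrix)
print(f"{correct} of {total} correct, accuracy {accuracy:.4%}, kappa {kappa:.4f}")
# 167 of 209 correct, accuracy 79.9043%, kappa 0.5913
```

The mean absolute error and the other error measures cannot be derived this way, since they depend on the predicted class probabilities rather than on the counts.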
In figure 17 the root node is 'exercice_angina' and its child node is 'chest_pain'. If a person's 'exercice_angina' value is 'yes', the person falls into the 'positive' category. (72.0/12.0) means that 72 instances followed this rule and 12 did not. The same applies to the value 'no', after which the condition for the 'chest_pain' attribute is checked. Figure 17 shows the decision tree built by the classifier; it includes the root node, child nodes and leaf nodes with their possible/predicted values. For example, the number of leaf nodes is 6 and the size of the tree is 11. Here the Simple CART algorithm was implemented in practice: the analysis was run on the "Disease" attribute, a confusion matrix was generated, and essential measures such as TP rate, FP rate, Precision and Recall were computed. Other attributes can be analyzed in the same way, which may help to solve more complex situations or queries in the prediction of heart attack diseases.
CONCLUSION
The research undertook an experiment on the application of a mining algorithm (Simple CART) in order to predict heart attacks and to compare the best available methods of prediction. The experiment can serve as an important tool for physicians to predict risky cases in practice and advise accordingly. The model from the classification will be able to answer more complex queries in the prediction of heart attack diseases. The predictive accuracy determined by the Simple CART algorithm suggests that the parameters used are reliable indicators for predicting the presence of heart diseases.
REFERENCES
[1] S. Aruna, Dr S.P. Rajagopalan and L.V. Nandakishore, "An Empirical Comparison Of Supervised learning algorithms in Disease Detection", Vol. 1, August 2011.
[2] Leonard Gordon, "Using Classification and Regression Trees (CART) in SAS Enterprise Miner For Applications in Public Health", Paper 089-2013, 2013.
[3] Vikas Chaurasia, Saurabh Pal, "Early Prediction of Heart Diseases Using Data Mining Techniques", Vol. 1, 2013.
[4] Thenmozhi, P. Deepika, M. Meiyappasamy, "Different Data Mining Techniques Involved in Heart Disease Prediction: A Survey", Volume 3, ISSN No. 2277-8179, September 2014.
[5] Sushilkumar Kalmegh, "Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News", Vol. 2 Issue 2, February 2015.
[6] T. Miranda Lakshmi, A. Martin, R. Mumtaj Begum, Dr. V. Prasanna Venkatesan, "An Analysis on Performance of Decision Tree Algorithms using Student's Qualitative Data", I.J. Modern Education and Computer Science, 2013, 5, 18-27.
[7] Jyoti Rohilla, Preeti Gulia, "Analysis of Data Mining Techniques for Diagnosing Heart Disease", Volume 5, Issue 7, July 2015.
[8] Hlaudi Daniel Masethe, Mosima Anna Masethe, "Prediction of Heart Disease using Classification Algorithms", Vol II WCECS 2014, 22-24 October, 2014, San Francisco, USA.
[9] Nidhi Bhatla, Kiran Jyoti, "An Analysis of Heart Disease Prediction using Different Data Mining Techniques", International Journal of Engineering and Technology, Vol. 1 issue 8, 2012.
[10] Chaitrali S. Dangare and Sulabha S. Apte, "Improved Study Of Heart Disease Prediction Using Data Mining Classification Techniques", International Journal Of Computer Applications, Vol. 47, No. 10, pp. 0975-888, 2012.
[11] Atul Kumar Pandey, Prabhat Pandey, K.L. Jaiswal and Ashok Kumar Sen, "A Heart Disease Prediction Model using Decision Tree", IOSR Journal of Computer Engineering, Vol. 12, Issue 6, (Jul. - Aug. 2013), pp. 83-86.
[12] Tina R. Patil, Mrs. S.S. Sherekar, "Performance Analysis of Naive Bayes and J48 Classification algorithm for Data Classification", International Journal Of Computer Science and Applications, Vol. 6, No. 2, Apr 2013.