ABSTRACT
Data mining is the process of automating information discovery. The artificial neural network (ANN) is a widely used
data mining method for extracting patterns. Classification is one of the important data mining techniques for
classifying a given set of input data. In this experiment, the Cleveland Heart Disease Dataset is classified using a
multilayer perceptron (MLP) neural network classifier. Performance measures are compared for the chosen optimal
parameters of the MLP neural network: when it is trained and tested over cross validation, a training accuracy of
98 ± 0.5%, a testing accuracy of 98 ± 1.5%, an overall accuracy of 97 ± 1.2%, a sensitivity of 95 ± 0.5% and a
specificity of 100% are achieved. This shows the consistent performance of the MLP neural network compared to other
models. In this work the heart disease dataset is classified using 13 input attributes as well as 16 input
attributes. The accuracy difference between 13 attributes and 16 attributes is 1.67% on training data, 3.7% on
testing data and 1.47% in overall accuracy. The results obtained in this experiment show the efficiency and accuracy
of the MLP NN.
Keywords: Heart Disease dataset, MLP, Neural Network, Back-Propagation Algorithm, Classification, PE, Knowledge Data
Discovery
1. INTRODUCTION
Data mining is an important step in the discovery of knowledge from large datasets. In recent years data mining has
found significance in every field, including healthcare [1]. A major challenge for healthcare organizations is the
provision of quality services at affordable costs. Quality service implies diagnosing patients correctly and
administering effective treatment [2]. Medical data comprises a number of tests essential to diagnosing a particular
disease. Integration of clinical decision support with computer-based patient records could reduce medical errors,
increase patient safety, decrease unwanted practice variation, and improve patient outcomes [15]. Global burden of
disease estimates for 2001 by World Bank country groups indicate a severity statistic of 25.2% for India, and
according to the literature survey it has since increased to 46% [3]. Effective and efficient automated heart disease
classification systems can therefore be beneficial in the healthcare sector.
The aim of our work is to introduce a classification approach using a Multilayer Perceptron (MLP) with the
back-propagation learning algorithm on the heart disease dataset. Classification is one of the important techniques
of data mining [5]. Classification is the process of finding a set of models or functions that describe and
distinguish data classes or concepts. In classification, the input is a set of data, called the training set, in
which each record consists of several fields or attributes. One of the attributes, called the classifying attribute,
indicates the class to which each record belongs. The aim of classification is to build a model of the classifying
attribute based upon the other attributes, so that records that are not from the training dataset can be
classified [4]. The artificial neural network is a widely used technique for extracting patterns in data mining. An
ANN automatically allows arbitrary nonlinear relations between the independent and dependent variables, and allows
all possible interactions between the independent variables. Because of these advantages, the neural network
technique is adopted for the classification of the dataset [5]. A parallel processing approach is implemented at each
node to increase the efficiency of classification.
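As an illustration of the approach named above, the following is a minimal sketch of an MLP with one tanh hidden layer trained by back-propagation. It is not the implementation used in this work: the toy data, the four hidden nodes, the random initial weights and all function names are assumptions made for the example. Only the tanh transfer function and the step size of 0.1 follow the parameters reported in the paper.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    # One weight row per node; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layer, inputs):
    # Weighted sum of the inputs plus bias, passed through tanh.
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + row[-1]) for row in layer]

def train_step(hidden, output, x, target, step=0.1):
    # Forward pass through the hidden and output layers.
    h = forward(hidden, x)
    y = forward(output, h)
    # Output deltas: error times the tanh derivative (1 - y^2).
    d_out = [(t - yi) * (1 - yi * yi) for t, yi in zip(target, y)]
    # Hidden deltas: output deltas propagated back through the output weights.
    d_hid = [(1 - hj * hj) * sum(d_out[k] * output[k][j] for k in range(len(output)))
             for j, hj in enumerate(h)]
    # Weight updates (gradient descent on the squared error).
    for k, row in enumerate(output):
        for j in range(len(h)):
            row[j] += step * d_out[k] * h[j]
        row[-1] += step * d_out[k]
    for j, row in enumerate(hidden):
        for i in range(len(x)):
            row[i] += step * d_hid[j] * x[i]
        row[-1] += step * d_hid[j]
    # Mean squared error of this sample, measured before the update.
    return sum((t - yi) ** 2 for t, yi in zip(target, y)) / len(y)

# Hypothetical toy data: the class (+1 or -1) depends only on the first feature.
samples = [([0.0, 0.0], [-1.0]), ([0.0, 1.0], [-1.0]),
           ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]

rng = random.Random(0)
hidden = init_layer(2, 4, rng)   # 2 inputs -> 4 hidden nodes (assumed size)
output = init_layer(4, 1, rng)   # 4 hidden nodes -> 1 output node

errors = []
for epoch in range(500):
    epoch_error = sum(train_step(hidden, output, x, t) for x, t in samples) / len(samples)
    errors.append(epoch_error)

print("first epoch error:", errors[0])
print("last epoch error:", errors[-1])
```

The per-epoch error recorded in `errors` corresponds to the kind of curve a training graph plots against the number of epochs.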
Database          Number of instances
Cleveland         303
Hungarian         294
Switzerland       123
Long Beach V.A    200
The goal (output) field is a five-bit value that represents five different classes: class 0, normal person; class 1, first stroke; class 2, second stroke; class 3, third stroke; and class 4, end of life.
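One plausible reading of the five-bit output is a one-bit-per-class (one-hot) encoding. The sketch below assumes that reading, which the text does not confirm; the function names are hypothetical.

```python
# Class labels as described in the text.
CLASS_NAMES = {
    0: "normal person",
    1: "first stroke",
    2: "second stroke",
    3: "third stroke",
    4: "end of life",
}

def encode_class(label, n_classes=5):
    # Assumed encoding: one bit per class, set only for the given label.
    if not 0 <= label < n_classes:
        raise ValueError("label out of range: %r" % (label,))
    return [1 if i == label else 0 for i in range(n_classes)]

def decode_class(outputs):
    # The predicted class is the index of the largest network output.
    return max(range(len(outputs)), key=lambda i: outputs[i])

print(encode_class(3))
print(CLASS_NAMES[decode_class([0.1, 0.2, 0.9, 0.0, -0.3])])
```

Decoding by the largest output lets the same scheme work with continuous network outputs, which are rarely exactly 0 or 1.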
Description       Range
Cig-per-day       Continuous
Yrs               Continuous
Family history    1 = True, 0 = False
[Flowchart: Start; Preprocessing; decision "Are Test Results OK?" (Yes / No); Trained and Verified ANN; Predicted Result; Stop]
Figure 2(a): Training graph converged at 704 epochs. X axis = number of epochs, Y axis = error difference
Figure 2(b): Error graph converged at 704 epochs. X axis = number of instances, Y axis = scaled output value and error
difference
4.2.4 Verification
Once convergence is achieved, the ANN is declared trained and its verification begins. Verification is normally
similar to the check carried out during training: the predicted outputs of the ANN are compared with the actual ones,
the only difference being that the dataset used this time is different from the one used in training. Once the
verification results match, the ANN is declared trained and verified for application purposes. Periodic verification
of the ANN, with retraining if verification fails, is a normal process with ANNs.
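The verification step can be sketched as follows. The 90%/10% partition mirrors the optimal training and testing percentages reported in this work; the record format, the required accuracy threshold and the function names are assumptions made for the example.

```python
import random

def split_dataset(records, train_fraction=0.9, seed=0):
    # Shuffle a copy so the caller's list is left unchanged, then cut it.
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def verify(predict, records, required_accuracy):
    # Compare predicted outputs with the actual ones on data not used in training.
    correct = sum(1 for features, actual in records if predict(features) == actual)
    accuracy = correct / len(records)
    return accuracy, accuracy >= required_accuracy

# Hypothetical records: (features, actual class).
records = [([i], i % 2) for i in range(20)]
train_set, held_out = split_dataset(records)

# Stand-in for a trained network's prediction function.
def predict(features):
    return features[0] % 2

accuracy, verified = verify(predict, held_out, required_accuracy=0.9)
print(len(train_set), len(held_out), accuracy, verified)
```

If `verified` is false, the network would be retrained and verified again, as described above.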
4.2.5 Testing
Once an ANN is declared trained and verified, it can be applied to the classification problem. In this phase it is
provided with new users' heart disease data and asked to classify it. The results it generates are taken as correct.
4.3 Architecture for the Classification of the Heart Disease Dataset
The architecture for the classification of the heart disease dataset is shown in Figure 3. Initially the Cleveland
database (76 attributes) and its subset of 16 attributes were acquired, and a database structure for the system was
set in place for loading the database as well as for presenting help on the database. A scalable approach is used
through a Database module with two scripts, labeled Database Info and Database Load. The first provides information
about the database features/attributes and their naming; the second is provided for loading the
[Figure 3 blocks: Heart Database; Database Modules (Load & Info.); Preprocessing Modules; Training Module; Training result; Verification Module; Verification; Application Module; Classification Result]
Figure 3: Architecture of the System for classification of Heart Disease dataset using MLP [5]
Training Module:- The ANN is trained using an MLP with the back-propagation learning algorithm.
Training Result:- The predicted results, obtained by summing the inputs multiplied by the adjusted weights.
Verification Module:- In this module the predicted output of the ANN is compared with the actual output.
Verification Result:- Once the verification results match, the ANN is declared trained and verified for application
purposes.
Application Module:- Once the ANN is declared trained and verified, the weights from the input to the hidden layer
and from the hidden to the output layer are stored and reloaded for application to the classification problem. In
this phase it is provided with a new patient's heart disease data as input and displays as its result the class to
which the patient belongs.
Classification Result:- For the heart disease data of any new patient, it reports whether the patient is a healthy
person and, if not, to which class the patient belongs. If the input is given as a file containing patient records,
the classification result takes the form of a confusion matrix.
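The Application Module and Classification Result described above can be sketched as follows: trained weights are stored, reloaded, used to classify a batch of records, and the outcome is summarized as a confusion matrix. The JSON file format, the tiny hand-made network, the sample records and every function name are assumptions made for the example, not the system's actual storage format or data.

```python
import json
import math
import os
import tempfile

def save_weights(path, hidden, output):
    # Store input-to-hidden and hidden-to-output weights (bias last in each row).
    with open(path, "w") as f:
        json.dump({"hidden": hidden, "output": output}, f)

def load_weights(path):
    with open(path) as f:
        data = json.load(f)
    return data["hidden"], data["output"]

def layer_output(layer, inputs):
    # Weighted sum of the inputs plus bias, passed through tanh.
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + row[-1]) for row in layer]

def classify(hidden, output, features):
    # The predicted class is the output node with the largest activation.
    scores = layer_output(output, layer_output(hidden, features))
    return max(range(len(scores)), key=lambda k: scores[k])

def confusion_matrix(actual, predicted, n_classes):
    # Rows are actual classes, columns are predicted classes.
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Hypothetical two-class network: positive input -> class 1, negative -> class 0.
hidden = [[1.0, 0.0]]
output = [[-1.0, 0.0], [1.0, 0.0]]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "weights.json")
    save_weights(path, hidden, output)
    reloaded_hidden, reloaded_output = load_weights(path)

# Hypothetical batch of records with known classes.
features = [[-2.0], [3.0], [0.5], [-0.1]]
actual = [0, 1, 0, 0]
predicted = [classify(reloaded_hidden, reloaded_output, f) for f in features]
matrix = confusion_matrix(actual, predicted, n_classes=2)
print(predicted)
print(matrix)
```

Off-diagonal entries of the matrix count misclassifications, so a record of class 0 predicted as class 1 appears in row 0, column 1.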
Parameter                  Range            Optimal Values
-                          10% to 90%       90%
-                          10% to 90%       10%
-                          1000 to 10000    Class 0: 704, Class 1: 954, Class 2: 604, Class 3: 1742, Class 4: 689
Number of Hidden layer     1 to 3           1
Number of hidden           2 to 100         60
-                          -                Tanh
-                          0 to 0.9         0.1
-                          -                Tanh
Step                       -                0.1
Classification
Accuracy
90%
10%
Trainin
Testing
g Data
Data
96.69%
96.29%
%
Sensitivit
y
%
Specificit
y
Error
Limit
92.56%
100%
0.000
1
98.367%
95.86%
100%
0.1
100%
The above table implies that the MLP NN classifier in this work possesses more learning ability than previously
implemented techniques. The most important criterion in this work is the extent to which the MLP NN classifier is
able to correctly classify the exemplars [16]. To confirm whether the proposed model is consistently capable of more
accurate classification, different data partition sets are used to train the network. The confusion matrices show
that the MLP neural classifier has the advantage of reducing misclassifications among neighboring classes compared to
other NN classifiers [13].
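For reference, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below derives both, together with overall accuracy, from a multi-class confusion matrix. It assumes that class 0 (healthy) is the negative class and that every other class counts as disease. The matrix values are illustrative only and are not results from this work.

```python
def binary_metrics(matrix, healthy_class=0):
    # matrix[a][p] counts records of actual class a predicted as class p.
    n = len(matrix)
    sick = [c for c in range(n) if c != healthy_class]
    tn = matrix[healthy_class][healthy_class]
    fp = sum(matrix[healthy_class][p] for p in sick)
    fn = sum(matrix[a][healthy_class] for a in sick)
    # Any disease class predicted as any disease class counts as a true positive.
    tp = sum(matrix[a][p] for a in sick for p in sick)
    total = tp + tn + fp + fn
    # Overall accuracy still requires the exact class to match (the diagonal).
    correct = sum(matrix[i][i] for i in range(n))
    return {
        "accuracy": 100.0 * correct / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }

# Illustrative three-class matrix (hypothetical counts).
example = [
    [10, 0, 0],
    [1, 8, 1],
    [0, 0, 10],
]
metrics = binary_metrics(example)
print(metrics)
```

Under this convention a confusion between two disease classes lowers overall accuracy but not sensitivity, which is why the two figures can differ.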
Network              Performance (% Accuracy, % Sensitivity, % Specificity, error limit)   References
13 input, 2 output   Accuracy 94%, Error Rate 0.1                                          [11]
The performance comparison of the proposed technique with others on the same dataset, shown in Table 5 above,
indicates that the proposed MLP NN classifier with 16 input attributes outperforms earlier researchers' techniques.
Previous related research on the heart disease dataset reports 94% accuracy. With the selected parameters, when the
MLP NN is trained several times and tested over cross validation, an overall accuracy of 98.16%, a sensitivity of
95.86% and a specificity of 100% are achieved, which shows more consistent performance than other neural networks.
7. CONCLUSION
The proposed neural network method proved to be reliable for the diagnosis of angina in patients with heart disease.
Additional studies with a larger number of patients are required to improve the accuracy and usefulness of the
artificial neural network. It is observed that, among the networks compared, the MLP NN is the fastest and simplest
in design, with the lowest MSE and the highest accuracy.
Given their wide range of applicability, neural networks are well suited to solving complex problems because of their
ability to learn complex and nonlinear relationships, including from noisy or less precise information. From the
design of the neural networks, it is evident that the MLP NN requires a more compact architecture than other NNs in
terms of the number of hidden nodes required for classification. The number of parameters, such as weights and
biases, required for designing the MLP NN is considerably lower than for the others. This simple and compact
structure indicates the feasibility of the MLP NN for online and hardware implementation [14].
Whenever new dataset findings are listed, this classification system can be retrained to accommodate the new
knowledge. This MLP NN classifier can be used to assist physicians in detecting the heart disease class for
preliminary diagnosis, helping them work toward a more accurate diagnosis of heart disease.
REFERENCES
[1] Anamica Gupta, Naveen Kumar and Vasuda Bhatnagar, Analysis of Medical Data using Data Mining and Formal
Concept Analysis, Proceedings of World Academy of Science, Engineering and Technology, Vol. 6, June 2005.
[2] Bonow, Libby, Mann, Zipes, Heart Disease: A Textbook of Cardiovascular Medicine, Eighth edition, Saunders,
Elsevier, 2006.
[3] Mathers C.D., Lopez A., Stein D., Deaths and Disease Burden by Cause: Global Burden of Disease Estimates by
World Bank Country Group, 2004.
[4] John Shafer, Rakesh Agrawal, and Manish Mehta, SPRINT: A Scalable Parallel Classifier for Data Mining, In
Proceedings of the VLDB Conference, Bombay, India, 1996.
[5] Manjusha B. Wadhonkar, P.A. Tijare, S.N. Sawalkar, Artificial Neural Network Approach for Classification of
Heart Disease Dataset, International Journal of Application or Innovation in Engineering &
Management (IJAIEM), Vol. 3, Issue 4, pp. 388-392, April 2014.
[6] R. Rojas, Neural Networks: A Systematic Introduction, Springer-Verlag, 1996.
[7] R.P. Lippmann, Pattern Classification using Neural Networks, IEEE Commun. Mag., pp. 47-64, 1989.
[8] Simon Haykin, Neural Network: A Comprehensive Foundation, Pearson Prentice Hall, New Delhi, 2007.
[9] Murphy P.M. and Aha D.W., UCI Machine Learning Databases Repository, Irvine, CA: University of California,
Department of Information and Computer Science, ftp://ftp.ics.uci.edu/pub/machine-learning-databases/heart/, 2004.
[10] Bose, N.K. and Liang, P., Neural Network Fundamentals with Graphs, Algorithms and Applications, Tata
McGraw-Hill Publishing Company Ltd., New Delhi, 2001.
[11] Dr. K. Usha Rani, Analysis of Heart Disease Dataset using Neural Network Approach, International Journal of
Data Mining & Knowledge Management (IJDKP), Vol. 1, No. 5, pp. 1-6, September 2011.
[12] Hagan, M.T., Demuth H.B., Beale M.H., Neural Network Design, PWS Publishing, Boston, MA, 1997.
[13] Ranjana Raut, Dr. S.V. Dudul, Intelligent Diagnosis of Heart Diseases using Neural Network Approach,
International Journal of Computer Applications (0975-8887), Vol. 1, No. 2, pp. 97-102, 2010.
[14] Reyneri, L.M., Implementation Issues of Neuro Fuzzy Hardware: Going Towards HW/SW Co-design, IEEE
Trans. on Neural Networks, Vol. 14, No. 1, pp. 176-194, 2003.
[15] Sahana Devanathan, Ambika R., Heart Disease Prediction System using Bayes Theorem, International Journal of
Scientific Engineering Research, Vol. 4, Issue 4, pp. 1914-1918, Apr 2013.
[16] Nadir N. Chamiya, Sanjay V. Dudul, Classification of Material Type and its Surface Properties using Digital
Signal Processing Techniques and Neural Network, Applied Soft Computing, Elsevier, Vol. 11, Issue 1,
pp. 1108-1116, Jan 2011.
AUTHOR
Manjusha B. Wadhonkar, M.E. (Computer Engineering), Second Year, Computer Science and Engineering Department,
Sipna College of Engineering and Technology, Amaravati (M.S.).