
International Journal of Application or Innovation in Engineering & Management (IJAIEM)

Web Site: www.ijaiem.org Email: editor@ijaiem.org


Volume 4, Issue 5, May 2015

ISSN 2319 - 4847

A DATA MINING APPROACH FOR CLASSIFICATION OF HEART DISEASE DATASET USING NEURAL NETWORK

Miss. Manjusha B. Wadhonkar1, Prof. P. A. Tijare2, Prof. S. N. Sawalkar3

1 M.E. Computer Engineering, Computer Science and Engineering Department, Sipna College of Engineering & Technology, Amaravati
2 Associate Professor, Computer Science and Engineering Department, Sipna College of Engineering & Technology, Amaravati
3 Assistant Professor, Computer Science and Engineering Department, Sipna College of Engineering & Technology, Amaravati

ABSTRACT
Data mining is the process of automating information discovery. The artificial neural network (ANN) is a widely used
data mining method for extracting patterns. Classification is one of the important data mining techniques for
assigning a given set of input data to classes. In this experiment, the Cleveland Heart Disease Dataset is classified
using a multilayer perceptron (MLP) neural network classifier. Performance measures are compared for the chosen
optimal parameters of the MLP neural network. When the network is trained and tested over cross validation, a
training accuracy of 98 ± 0.5%, a testing accuracy of 98 ± 1.5%, an overall accuracy of 97 ± 1.2%, a sensitivity of
95 ± 0.5% and a specificity of 100% are achieved. This shows the consistent performance of the MLP neural network
compared to other models. In this work the heart disease dataset is classified using 13 input attributes as well as
16 input attributes. The accuracy difference between 13 attributes and 16 attributes is 1.67% on training data, 3.7%
on testing data and 1.47% overall. The results obtained in this experiment show the efficiency and accuracy of the
MLP NN.
Keywords: Heart Disease dataset, MLP, Neural Network, Back-Propagation Algorithm, Classification, PE, Knowledge Data
Discovery

1. INTRODUCTION
Data mining is an important step in the discovery of knowledge from large datasets. In recent years data mining has
found significance in every field, including healthcare [1]. A major challenge for healthcare organizations is the
provision of quality services at affordable costs. Quality service implies diagnosing patients correctly and
administering effective treatment [2]. Medical data comprise a number of tests essential to diagnose a particular
disease. Integration of clinical decision support with computer-based patient records could reduce medical errors,
increase patient safety, decrease unwanted practice variation, and improve patient outcomes [15]. Global burden of
disease estimates for 2001 by World Bank Country Groups give a severity statistic of 25.2% for India in 2001, and
according to the literature survey it has since increased to 46% [3]. Effective and efficient automated heart disease
classification systems can therefore benefit the healthcare sector.
The aim of our work is to introduce a classification approach using a Multilayer Perceptron (MLP) with the Back
Propagation learning algorithm on the heart disease dataset. Classification is one of the important techniques of data
mining [5]. Classification is the process of finding a set of models or functions which describe and distinguish data
classes or concepts. In classification, the input is a set of data, called the training set, where each record consists
of several fields or attributes. One of the attributes, called the classifying attribute, indicates the class to which
each record belongs. The aim of classification is to build a model of the classifying attribute based upon the other
attributes, so that records which are not from the training dataset can be classified [4]. The artificial neural network
is a widely used technique for the extraction of patterns in data mining. An ANN has two advantages: it automatically
allows arbitrary nonlinear relations between the independent and dependent variables, and it allows all possible
interactions between the independent variables. Because of these advantages, the neural network technique is adopted
for the classification of the dataset [5]. A parallel processing approach is implemented at each node to increase the
efficiency of classification.


2. LITERATURE SURVEY AND RELATED WORK


Integration of clinical decision support with computer-based patient records could reduce medical errors, increase
patient safety, decrease unwanted practice variation, and improve patient outcomes [15]. Global burden of disease
estimates for 2001 by World Bank Country Groups give a severity statistic of 25.2% for India in 2001, and according
to the literature survey it has since increased to 46% [6]. In spite of the rapid development of pathological research,
more than 60,000 people die suddenly each year in India due to cardiovascular diseases [5]. A number of techniques
have been used for the identification or prediction of heart diseases, such as waveform analysis, time-frequency
analysis, complexity measures and Neuro-Fuzzy RBF NN, but classification accuracies have been observed only up to
79% [5]. Classification of the heart disease dataset using an ANN with feature selection gives only 80% accuracy, so
there is still enough scope for improvement by choosing an appropriate NN model [7]. The above analysis shows that
neural networks with 8 input attributes and with 13 input attributes have reached an accuracy of approximately 81%
so far [8].

3. DATASET FOR THE CLASSIFICATION OF HEART DISEASE


Data for the classification of the heart disease dataset are obtained from four different datasets of the UCI Center
for Machine Learning and Intelligent Systems [5]. The database contains 76 attributes in total, but for classification
a subset of 17 of them is used, namely Age (in years), Sex, Chest pain type, Resting blood pressure, Serum cholesterol,
Fasting blood sugar, Resting ECG, Maximum heart rate achieved, Exercise induced angina, ST depression induced by
exercise relative to rest, The slope of the peak exercise ST segment, Number of major vessels, Number of cigarettes
per day, Years as a smoker and fam_history; the last feature is the output, based on the classification of heart
disease. Table 1 shows the names of the datasets and their numbers of instances [9].
Table 1: Database Names and their Instances [5]

| Name of Database | Number of instances |
|------------------|---------------------|
| Cleveland        | 303                 |
| Hungarian        | 294                 |
| Switzerland      | 123                 |
| Long Beach V.A   | 200                 |

The goal or output field is a five-bit value which represents five different classes: class 0 - normal person, class 1 - first stroke, class 2 - second stroke, class 3 - third stroke and class 4 - end of life.

4. DESIGN OF CLASSIFICATION SYSTEM


The design of the neural network mainly consists of the topology (i.e. the arrangement of processing elements (PEs),
connections, and patterns in the neural network) and the architecture of the network [10]. For the classification of
the heart disease dataset using 13 input attributes and one output, testing results give a maximum accuracy of 90.6%
for a single-layer and 94% for a multilayer feed-forward network [11]. To increase the accuracy of classification, our
system uses three additional input attributes, namely number of cigarettes per day, years as a smoker and fam_History,
each of which increases the risk of cardiovascular disease. This system is thus an attempt to introduce a
classification approach using a multilayer perceptron (MLP) with the back-propagation algorithm, with 16 input
attributes and one output attribute. The output attribute is the resultant class to which the patient belongs and is
displayed as a combination of five bits: [1 0 0 0 0] represents class 0, [0 1 0 0 0] class 1, and so on. Attribute
values for cig-per-day, yrs and family-history are tabulated in Table 2.
Table 2: Proposed attribute values

| Name           | Description       | Range      |
|----------------|-------------------|------------|
| Cig-per-day    | Value             | Continuous |
| Yrs            | Value             | Continuous |
| Family history | 1=True, 0=False   | 1, 0       |
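
The five-bit output encoding described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and decoding by taking the largest output (argmax) is an assumption, since the paper does not state how the network outputs are mapped back to a class.

```python
NUM_CLASSES = 5  # classes 0..4, as listed in Section 3


def encode_class(label: int) -> list[int]:
    """Return the five-bit target vector, e.g. class 1 -> [0, 1, 0, 0, 0]."""
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"class label must be in 0..{NUM_CLASSES - 1}")
    return [1 if i == label else 0 for i in range(NUM_CLASSES)]


def decode_class(outputs: list[float]) -> int:
    """Map five network outputs to a class label.

    Assumption: the class is the index of the largest output value.
    """
    if len(outputs) != NUM_CLASSES:
        raise ValueError(f"expected {NUM_CLASSES} outputs")
    return max(range(NUM_CLASSES), key=lambda i: outputs[i])
```

With this encoding, a target of class 0 becomes [1, 0, 0, 0, 0], matching the example given in the text.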


4.1 Multilayer Perceptron Neural Network


For more complex decision functions, the inputs are fed into a number of perceptron nodes, each with its own set of
weights and threshold [5]. The outputs of these nodes are given as inputs to another layer of nodes. The output of the
final layer of nodes is the output of the network. Such a network is termed a multilayer perceptron (MLP) [7]. The
layers of nodes whose inputs and outputs are seen only by other nodes are termed hidden [8]. The back-propagation
learning algorithm, a supervised learning method, is used to compute the connection weights. There are different
variants of the back-propagation algorithm in the literature [12]-[13].
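
The layered computation described above can be sketched as a forward pass. This is an illustrative sketch under stated assumptions: the 16-60-5 layer sizes and the tanh transfer function are taken from Table 3, while the helper names, the weight range of -0.5 to 0.5 and the dummy input are hypothetical choices made only for the example.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights for one layer; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(weights, inputs):
    """Each node sums its weighted inputs plus bias and applies tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + row[-1])
            for row in weights]


def mlp_forward(hidden_w, output_w, inputs):
    """Feed the inputs through the hidden layer, then the output layer."""
    hidden = layer_forward(hidden_w, inputs)
    return layer_forward(output_w, hidden)


rng = random.Random(0)
hidden_w = init_layer(16, 60, rng)   # 16 inputs -> 60 hidden nodes
output_w = init_layer(60, 5, rng)    # 60 hidden nodes -> 5 output nodes
outputs = mlp_forward(hidden_w, output_w, [0.5] * 16)  # dummy scaled input
```

The hidden activations are visible only to the output layer, which is why that layer is called hidden.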
4.2 Data Flow for the Classification of Heart Disease Dataset
The workflow for the classification of the dataset is shown in Figures 1(a) and 1(b), which give a brief description of
the fundamental steps to be followed when applying ANNs to the classification of the heart disease dataset.

Figure 1(a): Training Procedure for Classification of Heart Disease Dataset Using MLP Network [5]. Flowchart steps: Start; Collect patient data from database; Preprocessing; ANN training using Back Propagation algorithm; Are test results OK? (Yes / No); Trained and verified ANN; Stop.

Figure 1(b): Testing of ANN-based Classification of Heart Disease Dataset for new patient data [5]. Flowchart steps: Start; New patient data acquisition; Preprocessing; Processing for prediction of class; Predicted result; Stop.

4.2.1 Data Collection


The neural network is trained using the Cleveland dataset of example cases. This dataset consists of patient records
stored in a database. The database contains a number of reliable examples to be given as input to the training network.
4.2.2 Pre-processing
Data in the training dataset must be pre-processed before evaluation by the neural network. Input data are scaled to
the interval (0, 1) because the transfer function used is the logistic one. During pre-processing of the Cleveland
dataset, 11 records were found to contain missing attribute values; these are removed from the dataset to improve the
classification performance. Thus a total of 272 records are given as input to the neural network.
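
The two pre-processing steps (dropping incomplete records, then scaling) can be sketched as below. This is a hedged illustration: the paper does not give its scaling formula, so min-max scaling is an assumption, and it maps to the closed interval [0, 1]. The "?" missing-value marker, the function names and the three sample rows are likewise hypothetical.

```python
def drop_incomplete(records, missing="?"):
    """Keep only records with no missing-value marker (assumed to be '?')."""
    return [r for r in records if missing not in r]


def min_max_scale(rows):
    """Scale every column to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


# Hypothetical rows: age, sex, resting blood pressure.
raw = [["63", "1", "145"], ["67", "?", "160"], ["41", "0", "130"]]
complete = drop_incomplete(raw)
numeric = [[float(v) for v in row] for row in complete]
scaled = min_max_scale(numeric)
```

In practice the column minima and maxima would be computed on the training data and reused for new patient records, so that both are scaled consistently.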


4.2.3 Training & verification using ANN


The neural network is trained with the heart disease database using a feed-forward neural network model and the
back-propagation learning algorithm with momentum and variable learning rate. The input layer of the network consists
of 16 neurons, one for each of the 16 attributes in the database. Several neural networks with a single hidden layer
are constructed and trained with the heart disease dataset. A maximum number of epochs, within which the training is
expected to converge, is selected prior to training. Convergence is said to be achieved when the error between the
output generated by the trained network and the actual output from the database falls within an error limit preset
before the training. If convergence is not achieved, training is repeated with a new network configuration (i.e. a
different hidden neuron count). Figures 2(a) and 2(b) show the training graph and the error graph, which depict the
actual output, the output predicted by the trained neural network and the absolute error difference between the actual
and predicted outputs.

Figure 2(a): Training graph converged at 704 epochs. X axis = Number of epochs, Y axis = Error difference

Figure 2(b): Error graph converged at 704 epochs. X axis = Number of instances, Y axis = Scaled output value and error difference
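
The training procedure of Section 4.2.3 (back-propagation with momentum, a preset maximum number of epochs, and an error limit as the stopping rule) can be sketched as follows. This is a simplified illustration, not the authors' code: it uses a fixed step size rather than a variable learning rate, measures convergence as the largest absolute output error seen during an epoch, and trains on a hypothetical four-sample toy problem. The function name, momentum value and seed are assumptions.

```python
import math
import random


def train_mlp(samples, n_hidden, max_epochs=1000, error_limit=0.1,
              step=0.1, momentum=0.7, seed=0):
    """Train a one-hidden-layer tanh MLP with online back-propagation."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    # Each weight row ends with a bias weight (fed by a constant 1.0).
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
           for _ in range(n_out)]
    dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    dw_o = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
    history = []  # worst absolute error observed in each epoch
    for _ in range(max_epochs):
        worst = 0.0
        for x, target in samples:
            xb = list(x) + [1.0]
            h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            y = [math.tanh(sum(w * v for w, v in zip(row, hb))) for row in w_o]
            err = [t - o for t, o in zip(target, y)]
            worst = max(worst, max(abs(e) for e in err))
            # Deltas: error times the tanh derivative (1 - output^2).
            d_o = [e * (1 - o * o) for e, o in zip(err, y)]
            d_h = [(1 - h[j] * h[j]) * sum(d_o[k] * w_o[k][j]
                                           for k in range(n_out))
                   for j in range(n_hidden)]
            # Weight change = step * gradient term + momentum * last change.
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    dw_o[k][j] = step * d_o[k] * hb[j] + momentum * dw_o[k][j]
                    w_o[k][j] += dw_o[k][j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    dw_h[j][i] = step * d_h[j] * xb[i] + momentum * dw_h[j][i]
                    w_h[j][i] += dw_h[j][i]
        history.append(worst)
        if worst <= error_limit:  # converged within the preset error limit
            break
    return w_h, w_o, history


# Hypothetical toy data: two inputs, two one-hot output classes.
toy = [([0, 0], [1, 0]), ([0, 1], [0, 1]), ([1, 0], [0, 1]), ([1, 1], [0, 1])]
w_h, w_o, history = train_mlp(toy, n_hidden=3)
```

If the loop ends because `max_epochs` is reached rather than because the error limit is met, the text's remedy applies: retrain with a different hidden neuron count.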
4.2.4 Verification
Once convergence is achieved the ANN is declared to be trained and its verification is initiated. Verification is
similar to the check carried out during training, comparing the predicted outputs of the ANN with the actual ones; the
only difference is that the dataset used is different from the one used in training. Once the verification results
match, the ANN is declared trained and verified for application purposes. Periodic verification of the ANN, with
retraining if verification fails, is a normal process with ANNs.
4.2.5 Testing
Once an ANN is declared to be trained and verified it is usable for the classification problem. In this phase it is
provided with a new user's heart disease data and asked to classify it. The resulting outputs are taken as the
classification results.
4.3 Architecture for the classification of heart Disease Dataset
The architecture for the classification of the heart disease dataset is shown in Figure 3. Initially the Cleveland
database (76 attributes) and its subset (16 attributes) were acquired, and a database structure for the system was set
in place for loading the database as well as for presenting help on the database. A scalable approach is taken through
a Database module which uses two scripts, labeled Database Info and Database Load. The first provides information
about the database features/attributes and their naming; the second loads the database into memory for processing.


Figure 3: Architecture of the System for classification of Heart Disease dataset using MLP [5]. Blocks shown: GUI for the Classification of Heart Disease Dataset Using MLP with Back Propagation Algorithm; Heart Database; Database Modules (Load & Info.); Preprocessing Modules; Training Module; Training Result; Verification Module; Verification Result; Application Module; Classification Result.
Training Module: The ANN is trained using an MLP with the back-propagation learning algorithm.
Training Result: The predicted results, obtained by summing the inputs multiplied by the adjusted weights.
Verification Module: In this module the predicted output of the ANN is compared with the actual output.
Verification Result: Once the verification results match, the ANN is declared trained and verified for application
purposes.
Application Module: Once an ANN is declared to be trained and verified, the weights from the input to the hidden layer
and from the hidden to the output layer are stored and reloaded for application to the classification problem. In this
phase it is provided with a new patient's heart disease data as input and displays as the result the class to which
the patient belongs.
Classification Result: For the inputs of any new patient's heart disease data, it reports whether the patient is a
healthy person or, if not, the class to which the patient belongs. If the input is given as a file containing patient
records, the classification result takes the form of a confusion matrix.
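
Storing and reloading the trained weights, as the Application Module does, could look like the sketch below. The paper does not say how the weights are persisted; the JSON file format, the function names and the tiny placeholder weight matrices are all assumptions made for illustration.

```python
import json
import os
import tempfile


def save_weights(path, hidden_w, output_w):
    """Write both weight matrices (lists of lists of floats) to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"hidden": hidden_w, "output": output_w}, fh)


def load_weights(path):
    """Read the weight matrices back in the order (hidden, output)."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return data["hidden"], data["output"]


# Placeholder weights standing in for a trained network.
hidden_w = [[0.25, -0.5, 0.125], [0.75, 0.0, -0.25]]
output_w = [[0.5, -0.125, 0.375]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mlp_weights.json")  # hypothetical file name
    save_weights(path, hidden_w, output_w)
    restored = load_weights(path)
```

Keeping the weights in a file means the network does not need to be retrained each time a new patient record is classified.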

5. EXPERIMENTAL RESULTS AND DISCUSSION OF MLP NN CLASSIFIER


The network is trained several times with different random initializations of the connection weights to ensure true
learning. Training terminates on convergence, i.e. when the error difference between the actual and predicted outputs
is less than or equal to the error limit. The results also establish that a data partition of 90% training and 10%
testing gives the best results. The best transfer function for neurons in the hidden layer as well as the output layer
is found to be tanh (hyperbolic tangent). Details of the training algorithm and its parameters are given in Table 3.
The MLP neural network is trained using the back-propagation algorithm with a supervised learning rule. The designed
classifier is evaluated on cross validation with regard to percent classification accuracy, specificity and sensitivity.
Table 3: Variable Parameters of MLP NN (16-60-05)

| Parameter | Range | Optimal Values |
|-----------|-------|----------------|
| Exemplars for training | 10% to 90% | 90% |
| Exemplars for cross validation | 10% to 90% | 10% |
| Number of epochs | 1000 to 10000 | Class 0 - 704, Class 1 - 954, Class 2 - 604, Class 3 - 1742, Class 4 - 689 |
| Number of hidden layers | 1 to 3 | 1 |
| Number of hidden neurons | 2 to 100 | 60 |
| Transfer function of the neurons in hidden layer | Tanh, Sigmoid, Linear tanh, Log sigmoid, Bias axon, Linear axon, Axon | Tanh |
| Transfer function of the neuron in the output layer | Tanh, Sigmoid, Linear tanh, Log sigmoid, Bias axon, Linear axon, Axon | Tanh |
| Supervised learning rule | Step, Momentum, Conjugate Gradient (CG) | Step |
| Step size at hidden and output layer | 0 to 1 | 0.1 |
| Error limit | 0 to 0.9 | 0.1 |
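
The 90% / 10% data partition reported as optimal in Table 3 can be sketched as a random split. This is a minimal illustration: the shuffling scheme, the seed and the function name are assumptions, since the paper only states the partition percentages.

```python
import random


def partition(records, train_fraction=0.9, seed=0):
    """Shuffle the records and split them into training and held-out parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 100 placeholder record ids stand in for the patient records.
train, held_out = partition(range(100))
```

Repeating the split with different seeds gives different partitions, which is one way to check that the reported accuracy does not depend on a single lucky split.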

5.1 Selection of Error Criteria


Normally the Euclidean or L2 norm is used. When a problem incorporates a very high degree of nonlinearity, different
error norms can be examined for their suitability in computing the error between the output of the NN model and the
desired output. For the MLP NN, the L2 norm provides the highest classification accuracy on training, testing and
cross validation.
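
For reference, the L2 (Euclidean) error norm mentioned above can be written as follows. The symbols are notation introduced here for illustration and do not appear in the paper: d_k is the desired value of output k, y_k is the corresponding network output, and K is the number of outputs (K = 5 for the five-bit output used in this work).

```latex
E = \lVert \mathbf{d} - \mathbf{y} \rVert_2
  = \sqrt{\sum_{k=1}^{K} \left( d_k - y_k \right)^2}
```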
5.2 Performance Measures of MLP NN
The proposed neural network is trained using the back-propagation algorithm, and a confusion matrix is computed on
cross validation so as to ensure that its performance does not depend on any specific data partitioning scheme. Rows
are selected randomly by a factor n which depends upon the data partitioning percentage between training and cross
validation. Table 4 shows the performance measures for the MLP NN classifier with different datasets with respect to
normal and diseased heart instances.
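Table 4: Performance Measures for MLP NN Classifiers

| Data Sets | % Classification Accuracy (90% Training Data) | % Classification Accuracy (10% Testing Data) | % Sensitivity | % Specificity | Error Limit |
|-----------|-----------------------------------------------|----------------------------------------------|---------------|---------------|-------------|
| 13:60:05 MLP, 90% training data | 96.69% | 96.29% | 92.56% | 100% | 0.0001 |
| 16:60:05 MLP, 90% training data | 98.367% | 100% | 95.86% | 100% | 0.1 |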

The above table implies that the MLP NN as a classifier in this work possesses more learning ability compared to
previously implemented techniques. The most important criterion in this work is the extent to which the MLP NN
classifier is able to correctly classify the exemplars [16]. To confirm whether the proposed model is consistently
capable of more accurate classification, different data partition sets are used to train the network. From the
confusion matrices it was found that the MLP neural classifier has the advantage of reducing misclassifications among
neighboring classes compared to other NN classifiers [13].
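
The accuracy, sensitivity and specificity measures used above can be computed from actual and predicted class labels as sketched below. This is an illustration under a stated assumption: classes 1 to 4 are all treated as "diseased" and class 0 as "normal", following the paper's phrase "normal and diseased heart instances". The paper does not give its exact formulas, and the function name and sample labels are hypothetical.

```python
def binary_measures(actual, predicted, normal_class=0):
    """Return (accuracy, sensitivity, specificity) in percent."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        a_diseased = a != normal_class
        p_diseased = p != normal_class
        if a_diseased and p_diseased:
            tp += 1          # diseased, flagged as diseased
        elif not a_diseased and not p_diseased:
            tn += 1          # normal, flagged as normal
        elif p_diseased:
            fp += 1          # normal, wrongly flagged as diseased
        else:
            fn += 1          # diseased, wrongly flagged as normal
    total = tp + tn + fp + fn
    accuracy = 100.0 * (tp + tn) / total if total else 0.0
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    specificity = 100.0 * tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


# Hypothetical labels for eight patients.
actual = [0, 0, 0, 1, 2, 3, 4, 1]
predicted = [0, 0, 0, 1, 2, 0, 4, 3]
measures = binary_measures(actual, predicted)  # (87.5, 80.0, 100.0)
```

Note that under this binary view, predicting class 3 for a class 1 patient still counts as a correct "diseased" detection; a per-class confusion matrix would be needed to see misclassification between neighboring classes.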

6. RESULTS AND DISCUSSION


Table 5: Performance comparison of the proposed technique with others on the same dataset

| Technique | Performance (% Accuracy, % Sensitivity, % Specificity, error limit) | References |
|-----------|----------------------------------------------------------------------|------------|
| Previous technique: 13 input, 2 output | Accuracy 94%, Error Rate 0.1 | [11] |
| 13:60:05 MLP, 90% train data | Accuracy 96.69%, Sensitivity 92.56%, Specificity 100%, Error Rate 0.0001 | |
| 16:60:05 MLP, 90% train data | Accuracy 98.161%, Sensitivity 96.86%, Specificity 100%, Error Rate 0.1 | |


The performance comparison of the proposed technique with others on the same dataset, shown in Table 5, indicates that
the proposed MLP NN classifier with 16 input attributes outperforms the techniques of earlier researchers. Previous
related research on the heart disease dataset reports 94% accuracy. With the selected parameters of the MLP NN, when
it is trained several times and tested over cross validation, an overall accuracy of 98.16%, a sensitivity of 95.86%
and a specificity of 100% are achieved, which shows more consistent performance than other neural networks.

7. CONCLUSION
The proposed neural network method proved to be reliable for the diagnosis of angina in patients with heart disease.
Additional studies with a larger number of patients are required to improve the accuracy and usefulness of the
artificial neural network. It is observed that the MLP NN is the fastest and simplest in design, with the lowest MSE
and the highest accuracy.
Given the wide applicability of ANNs, neural networks are well suited to solving complex problems because of their
ability to learn complex and nonlinear relationships, including from noisy or less precise information. From the design
of the neural networks, it is evident that MLP NNs require a compact architecture compared to other NNs in terms of
the number of hidden nodes required for classification. The number of parameters, such as weights and biases, required
for the design of the MLP NN is considerably lower than for other networks. This simple and compact structure
indicates the feasibility of the MLP NN for online implementation and hardware implementation [14].
Whenever new dataset findings are listed, this classification system can be retrained to accommodate the new
knowledge. This MLP NN classifier can be used to assist physicians in detecting the heart disease class for
preliminary diagnosis, helping them refine the diagnosis of heart disease.

REFERENCES
[1] Anamica Gupta, Naveen Kumar and Vasuda Bhatnagar, Analysis of Medical Data using Data Mining and Formal Concept Analysis, Proceedings of World Academy of Science, Engineering and Technology, Vol. 6, June 2005.
[2] Bonow, Libby, Mann, Zipes, Heart Disease: A Textbook of Cardiovascular Medicine, Eighth edition, Saunders, Elsevier, 2006.
[3] Mathers C.D., Lopez A., Stein D., Deaths and disease burden by cause: Global Burden of Disease estimates by World Bank Country Group, 2004.
[4] John Shafer, Rakesh Agrawal, and Manish Mehta, SPRINT: A Scalable Parallel Classifier for Data Mining, In Proceedings of the VLDB Conference, Bombay, India, 1996.
[5] Manjusha B. Wadhonkar, P. A. Tijare, S. N. Sawalkar, Artificial Neural Network Approach for Classification of Heart Disease Dataset, International Journal of Application or Innovation in Engineering & Management (IJAIEM), Vol. 3, Issue 4, pp. 388-392, April 2014.
[6] R. Rojas, Neural Networks: A Systematic Introduction, Springer-Verlag, 1996.
[7] R. P. Lippmann, Pattern Classification using Neural Networks, IEEE Commun. Mag., pp. 47-64, 1989.
[8] Simon Haykin, Neural Network: A Comprehensive Foundation, Pearson Prentice Hall, New Delhi, 2007.
[9] Murphy P.M. and Aha D.W., UCI Machine Learning Databases Repository, Irvine, CA: University of California, Department of Information and Computer Science, ftp://ftp.ics.uci.edu/pub/machine-learning-databases/heart/, 2004.
[10] Bose, N.K. and Liang, P., Neural Network Fundamentals with Graphs, Algorithms and Applications, Tata McGraw-Hill Publishing Company Ltd., New Delhi, 2001.
[11] Dr. K. Usha Rani, Analysis of Heart Disease Dataset using Neural Network Approach, International Journal of Data Mining & Knowledge Management (IJDKP), Vol. 1, No. 5, pp. 1-6, September 2011.
[12] Hagan, M.T., Demuth H.B., Beale M.H., Neural Network Design, PWS Publishing, Boston, MA, 1997.
[13] Ranjana Raut, Dr. S.V. Dudul, Intelligent Diagnosis of Heart Diseases using Neural Network Approach, International Journal of Computer Applications (0975-8887), Vol. 1, No. 2, pp. 97-102, 2010.
[14] Reyneri, L.M., Implementation Issues of Neuro Fuzzy Hardware: going towards HW/SW co design, IEEE Trans. on Neural Networks, Vol. 14, No. 1, pp. 176-194, 2003.
[15] Sahana Devanathan, Ambika R., Heart Disease Prediction System using Bayes Theorem, International Journal of Scientific Engineering Research, Vol. 4, Issue 4, pp. 1914-1918, Apr 2013.
[16] Nadir N. Chamiya, Sanjay V. Dudul, Classification of material type and its surface properties using Digital signal Processing techniques and neural network, Applied Soft Computing, Elsevier, Vol. 11, Issue 1, pp. 1108-1116, Jan 2011.


AUTHOR
Manjusha B. Wadhonkar, M.E. (Computer Engineering) Second Year, Computer Science and Engineering Department,
Sipna College of Engineering and Technology, Amaravati (M.S.).
