International Journal of Advanced Computer Research (ISSN (print): 2249-7277 ISSN (online): 2277-7970)
Volume-4 Number-3 Issue-16 September-2014
false presumptions and unpredictable effects. The large amount of data is a key resource to be processed and analyzed for knowledge extraction that enables support for decision making. A neural network, which is able to train on the data through multiple layers, proves to be a well-performing technique for heart disease prediction [5]. With the help of artificial neural networks, doctors can analyze, model, and make sense of complex clinical data across a broad range of medical applications, as they are a powerful tool for medical data mining [6] [7].

One of these promising methods is artificial neural networks (ANNs), which have emerged as a well-performing technique for heart disease prediction [8]. ANNs are a highly effective tool for classification tasks, as well as for solving many important problems such as signal enhancement, identification, and prediction of signals and factors. An important feature of ANNs is their adaptivity in complex information processing during the data mining process. This makes it possible to apply ANNs in cases where it is impossible to create a strict mathematical model but a sufficiently representative set of samples is available. The other important characteristic of neural networks is their capacity to generalize input information and to give correct answers for unfamiliar data, which makes them effective in solving complicated classification problems [6].

The major problem associated with the neural network is the selection of the number of hidden neurons in the network structure [3]. This is very important because a neural network trained to very small errors may not respond properly in prediction. There is an overtraining issue in the design of the neural network training process. Overtraining is similar to overfitting of data in the neural network. This issue has to be solved because the neural network fits the training data so closely that it loses its generalization ability over the test data. One of the search algorithms that looks for an optimal solution to a problem is the Genetic Algorithm [9]. The genetic algorithm is based on Darwin's theory of evolution, "survival of the fittest". Genetic algorithms are a way of solving problems by mimicking the processes nature uses (selection, crossover, mutation and accepting) to evolve a solution to the problem [10]. Combining the two data mining techniques, neural network and genetic algorithm, therefore helps to increase the accuracy of prediction.

The rest of the paper is structured as follows. Section 2 reviews related work on the application of data mining techniques to heart disease prediction. In Section 3, the Genetic Neural Network with proper selection of neural network architecture for data mining is proposed for heart disease prediction. The experimental results of the heart disease prediction system using the Genetic-Neural Approach are explained in Section 4. Finally, the paper presents the conclusion based on the analysis of the results of the Genetic-Neural Approach for Heart Disease Prediction and defines the future work to be carried out for further research in healthcare services.

2. Literature Survey

Huge amounts of data generated by healthcare transactions are too complex and voluminous to be processed and analyzed by traditional methods, and hence call for technological interventions to simplify the management of those data. Decision making can be improved by using data mining to discover patterns and trends in large amounts of complex data. Several approaches have been tried in the search for an efficient technique of medical diagnosis for various diseases. Popular data mining tasks are association rules, classification, clustering, prediction and sequential patterns.

Classification techniques are capable of processing a large amount of data. Classification is one of the most widely used methods of data mining in healthcare organizations. The common classification techniques used in healthcare are Bayesian Networks, Support Vector Machines, the Nearest Neighbor method, Decision Trees, Fuzzy Logic, Fuzzy-based Neural Networks, Artificial Neural Networks, and Genetic Algorithms [11].

In 2012, R. Bhuvaneswari et al. [12] used the Naive Bayes classifier in medical applications. Two of the well-known algorithms used in data mining classification are the Backpropagation Neural Network (BNN) and Naive Bayesian (NB). NB calculates the priors, that is, the probability of the object among all objects based on previous experience. The Bayesian technique is built on the concept of probability. The posterior is calculated from the prior by Bayes' rule. Depending on the precise nature of the probability model, Naive Bayes classifiers can be trained very efficiently in a supervised learning setting.
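As a minimal sketch of the Bayes' rule computation described above, the following Python snippet multiplies a class prior by per-feature likelihoods and normalizes the result to obtain the posterior. The function name, the two example features and all probability values are hypothetical and chosen only for illustration; they are not taken from [12] or from any dataset.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Return P(class | observed features) for each class.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: P(feature = 1 | class)}}
    observed:    {feature: 0 or 1}
    """
    unnormalized = {}
    for cls, prior in priors.items():
        score = prior
        for feature, value in observed.items():
            p_one = likelihoods[cls][feature]
            # Naive assumption: features are independent given the class.
            score *= p_one if value == 1 else (1.0 - p_one)
        unnormalized[cls] = score
    evidence = sum(unnormalized.values())
    return {cls: score / evidence for cls, score in unnormalized.items()}


# Hypothetical numbers, for illustration only.
priors = {"disease": 0.3, "healthy": 0.7}
likelihoods = {
    "disease": {"smoking": 0.6, "diabetes": 0.4},
    "healthy": {"smoking": 0.2, "diabetes": 0.1},
}
posterior = naive_bayes_posterior(
    priors, likelihoods, {"smoking": 1, "diabetes": 0}
)
print(posterior)
```

With these made-up inputs the unnormalized scores are 0.3 x 0.6 x 0.6 = 0.108 for "disease" and 0.7 x 0.2 x 0.9 = 0.126 for "healthy", so the posterior for "disease" is 0.108 / 0.234, or about 0.46.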
In 2011, Milan Kumari et al. [11] analyzed a cardiovascular disease dataset using different data mining algorithms, such as Support Vector Machine (SVM), Artificial Neural Networks (ANNs), Decision Tree, and the RIPPER classifier. The authors analyzed the performance of these algorithms through several statistical measures such as accuracy and error rate. The accuracies of RIPPER, Decision Tree, ANN and SVM are 81.08%, 79.05%, 80.06% and 84.12% respectively, while the error rates for RIPPER, Decision Tree, ANN and SVM are 2.756, 0.2755, 0.2248 and 0.1588 respectively. Of these four classification models, SVM predicts cardiovascular disease with the lowest error rate and the highest accuracy.

In 2012, Mai Shouman et al. [13] combined different classifiers through voting in order to outperform single classifiers. The decisions of multiple classifiers are combined using an aggregation technique called voting. The idea of multiple-classifier voting is to divide the training data into smaller equal subsets and to build a classifier for each subset. The results show that applying voting could not enhance the K-Nearest Neighbor accuracy in the diagnosis of heart disease.

In 2013, Senthil Kumar et al. [14] proposed a method that uses components of fuzzy logic such as fuzzification, an Advanced Fuzzy Resolution Mechanism and defuzzification. Fuzzification is the process of transforming crisp values into fuzzy values. In the analysis of heart disease, the fuzzy resolution mechanism uses the predicted value with five layers, each layer having its own nodes. The results were tested with the Cleveland heart disease dataset. The Fuzzy Resolution Mechanism was developed using MATLAB. The defuzzification process converts the fuzzy set into crisp values.

In 2013, Nabeel Al-Milli [15] developed a heart disease prediction system that uses the backpropagation algorithm to train multilayer neural networks in a supervised manner. The error-correction learning rule is the basis for the backpropagation algorithm. The algorithm uses a forward pass and a backward pass through the different layers of the network. In the forward pass, the synaptic weights of the network are kept fixed. In the backward pass, the synaptic weights are all adjusted in accordance with an error-correction rule. The error signal is calculated as the difference between the desired output and the actual response of the network. Then the error signal is backpropagated through the network. The actual response of the network moves nearer to the desired response, in a statistical sense, as the synaptic weights are adjusted. The generalized delta rule, which minimizes the error, is used for the weight adjustment in the network. Thus a medical decision support system can be developed, particularly for the diagnosis of heart disease.

In 2013, Syed Umar Amin et al. [4] developed a genetic neural network hybrid system. This system uses the global optimization advantage of the genetic algorithm for initialization of the neural network weights. A backpropagation algorithm is used to train the networks, with the initial synaptic weights optimized by the Genetic Algorithm.

The determination of the neural network structure is a challenging task. Even though significant progress has been made in classification-related areas of neural networks, a number of issues in applying neural networks have not been solved successfully or completely. A small network may not be able to provide good performance owing to its limited information processing power. A large network, on the other hand, may have some redundant connections. A neural network trains on the input data patterns through different layers and may converge locally, which does not provide an optimal solution to the problem [4]. Thus the selection of a proper network architecture plays a key role in training on the input data.

There is a need for a decision support system for predicting the presence or absence of heart disease. Wrong diagnosis or poor clinical decisions lead to mortality. Not all clinicians are equally good at predicting heart disease, in which diagnosis plays a very important role. In the case of heart disease time is precious; proper diagnosis at the right time saves the lives of many patients. The system can be considered as assisting the doctor in decision making. Next, the proposed solution to overcome these limitations in medical data mining for heart disease prediction is presented.

3. Proposed System

Diagnosing heart disease is considered a non-linear problem that shows the complex causal relationship between the variables. However, there is a new computational paradigm called an artificial
Genetic Algorithm is an adaptive heuristic search algorithm based on the evolutionary ideas of natural selection and genetics, which can be used to initialize the neural network weights. Thus the Genetic-Neural Network takes advantage of the global optimization of the genetic algorithm for initialization of the neural network. Then the genetic algorithm fitness function is used to predict the heart disease.

4. Results and Discussion

The experimental results of the heart disease prediction system using the Genetic-Neural Approach are explained in this section. The system was developed using MATLAB R2012a. The Global Optimization Toolbox and the Neural Network Toolbox were used for implementing the algorithm. The data on risk factors related to heart disease were collected from 50 people and are provided by the American Heart Association [4].

Table 4.1: Risk Factor Values and their Encoding

Name              | Description
Sex               | Male (1), Female (0)
Age               | 20-34 (-2), 35-50 (-1), 51-60 (0), 61-79 (1), >79 (2)
Blood Cholesterol | Below 200 mg/dL - Low (-1), 200-239 mg/dL - Normal (0), 240 mg/dL and above - High (1)
Blood Pressure    | Below 120 mm Hg - Low (-1), 120 to 139 mm Hg - Normal (0), Above 139 mm Hg - High (-1)
Hereditary        | Family member diagnosed with HD - Yes (1), otherwise No (0)
Smoking           | Yes (1) or No (0)
Alcoholic Intake  | Yes (1) or No (0)
Physical Activity | Low (-1), Normal (0) or High (1)
Diabetes          | Yes (1) or No (0)
Diet              | Low (-1), Normal (0) or High (1)
Obesity           | Yes (1) or No (0)
Stress            | Yes (1) or No (0)
Heart Disease     | Yes (1) or No (0)

The dataset was composed of 12 important risk factors, including sex, age, family history, blood pressure, smoking habit, alcohol consumption, physical inactivity, diabetes, blood cholesterol, poor diet, and obesity. The system indicates whether the patient has a risk of heart disease or not. Most of the heart disease patients had many similarities in the risk factors. Table 4.1 shows the identified important risk factors with their corresponding values and, in brackets, their encoded values, which were used as input to the system [4].

The approach uses the Backpropagation Neural Network as the learning algorithm for this heart disease prediction system. However, backpropagation learning depends on several parameters of the Multi-Layer Feedforward Neural Network, such as the number of neurons in the hidden layers and the initialization of the neural network weights. For this reason, the Genetic Algorithm is used to obtain the optimal parameter values and weights for backpropagation learning, so that the performance of the Genetic Algorithm together with the Multi-Layer Feedforward Neural Network is increased.

The Multi-Layer Feedforward Neural Network is constructed by calculating the number of nodes in the input, hidden and output layers. The number of input nodes is taken as 12, equal to the number of risk factors in the heart disease prediction system, and the number of output nodes is taken as 1, corresponding to the predicted output 'Yes' or 'No'. The number of hidden nodes is determined from the mean square error (MSE) obtained as the neural network is trained using the Backpropagation Learning Algorithm with 1 to 10 hidden nodes. The Genetic Algorithm (GA) is applied to initialize the neural network weights. The Genetic Algorithm is used to calculate the number of hub, that is, the number of layers in the neural network, along with the total number of weights and biases used to initialize the network for each generation of the genetic algorithm. Hence there are (I*Hn+10) + (10*O+2) total weights and biases. The fitness function is calculated for each chromosome based on the mean square error. After selection, crossover and mutation in the GA, the chromosomes with lower adaptation are replaced with better ones, and the better and fitter chromosomes (optimized solutions), which correspond to the interconnecting weights and thresholds of the neural network, are generated. Further, the Genetic Algorithm along with the Multi-Layer Feedforward Neural Network (MLFNN) is used to predict heart disease. Thus, the Genetic-Neural Approach for Heart Disease Prediction helps to increase the accuracy of medical resource utilization in order to reduce the health care.

4.1 Performance Metrics
The Genetic-Neural Approach for Heart Disease Prediction is evaluated by computing its accuracy.
Accuracy is determined as follows: Accuracy = (TP + TN) / (TP + FP + TN + FN), where TP denotes true positives, the positive tuples that were correctly labeled by the classifier; TN denotes true negatives, the negative tuples that were correctly labeled by the classifier; FP denotes false positives, the negative tuples that were incorrectly labeled by the classifier; and FN denotes false negatives, the positive tuples that were incorrectly labeled by the classifier. The values of TP, TN, FP, and FN are obtained by training on the Heart Disease Dataset using the Genetic-Neural Approach and are then used to compute the accuracy.

4.2 Experimental Results
The neural network is constructed with 12 nodes in the input layer, 1 to 10 hidden nodes, and 1 output node; the number of hidden nodes at which the minimum mean square error occurs is taken as the number of hidden nodes. The results show that the minimum mean square error occurred at 6 hidden nodes, so the number of hidden nodes is 6, as shown in Figure 4.1.

Table 4.2: Comparison of the Genetic-Neural Approach for Heart Disease Prediction by Using 6 and 10 Hidden Nodes

Technique      | Genetic Neural Approach For Heart Disease Prediction using 6 hidden nodes | Genetic Neural Network Based Data Mining in Prediction of Heart Disease Using Risk Factors using 10 hidden nodes
True Positive  | 17              | 10
True Negative  | 32              | 33
False Positive | 0               | 0
False Negative | 1               | 7
Time Required  | 22.6322 seconds | 25.3133 seconds
Accuracy       | 98%             | 84%

5. Conclusion and Future Scope

Data mining techniques are nowadays widely used in the healthcare industry for predicting diseases. Applying these techniques to patient medical datasets has resulted in innovations, standards and decision support systems that have had significant success in improving the health of patients and the overall quality of medical services. In this study, an experiment is conducted with the Heart Disease dataset using a Multi-Layer Neural Network trained with the Backpropagation Learning Algorithm. The Genetic Algorithm is used to optimize the initialization of the neural network weights. This work demonstrates Genetic Neural Network based prediction of heart disease, improving the accuracy to 98% using an optimized neural network architecture, and predicts whether the patient is suffering from heart disease or not.
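The 98% accuracy quoted above can be reproduced from the counts reported in Table 4.2 for the 6-hidden-node network, using the accuracy formula of Section 4.1. The short Python check below is only an illustration; the function and variable names are assumptions of this sketch.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Counts reported in Table 4.2 for the network with 6 hidden nodes.
acc_6_hidden = accuracy(tp=17, tn=32, fp=0, fn=1)
print(acc_6_hidden)  # 0.98
```

Here 49 of the 50 cases are labeled correctly, which gives 49 / 50 = 0.98.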
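The genetic initialization of network weights described in Section 4 can be sketched in a few lines of pure Python. This is not the authors' MATLAB implementation: the truncation selection, one-point crossover, Gaussian mutation, population size, flat weight layout and the synthetic toy data are all assumptions made for this illustration. The backpropagation fine-tuning that would follow is omitted.

```python
import math
import random


def sigmoid(s):
    # Clamp the input so math.exp cannot overflow.
    s = max(-60.0, min(60.0, s))
    return 1.0 / (1.0 + math.exp(-s))


def forward(weights, x, n_hidden):
    """Output of a one-hidden-layer sigmoid network with a flat weight list.

    Assumed layout: for each hidden node, n_in weights then a bias;
    finally n_hidden output weights then the output bias.
    """
    n_in = len(x)
    pos = 0
    hidden = []
    for _ in range(n_hidden):
        s = weights[pos + n_in] + sum(weights[pos + j] * x[j] for j in range(n_in))
        hidden.append(sigmoid(s))
        pos += n_in + 1
    s = weights[pos + n_hidden] + sum(
        weights[pos + h] * hidden[h] for h in range(n_hidden)
    )
    return sigmoid(s)


def mse(weights, data, n_hidden):
    """Mean square error over (inputs, target) pairs; lower is fitter."""
    return sum((forward(weights, x, n_hidden) - y) ** 2 for x, y in data) / len(data)


def evolve_weights(data, n_in, n_hidden, pop_size=20, generations=30, seed=0):
    """Evolve flat weight vectors (chromosomes) that minimize the MSE."""
    rng = random.Random(seed)
    n_weights = n_hidden * (n_in + 1) + (n_hidden + 1)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=lambda w: mse(w, data, n_hidden))
        best_per_generation.append(mse(population[0], data, n_hidden))
        parents = population[:pop_size // 2]   # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: perturb each gene with probability 0.1.
            child = [w + rng.gauss(0.0, 0.1) if rng.random() < 0.1 else w
                     for w in child]
            children.append(child)
        population = parents + children
    best = min(population, key=lambda w: mse(w, data, n_hidden))
    return best, best_per_generation


# Synthetic toy data: 12 encoded inputs per case and an arbitrary label rule.
data_rng = random.Random(1)
toy_data = []
for _ in range(20):
    x = [data_rng.choice([-1, 0, 1]) for _ in range(12)]
    toy_data.append((x, 1.0 if sum(x) > 0 else 0.0))

best_weights, best_mse_history = evolve_weights(toy_data, n_in=12, n_hidden=6)
```

Because the fitter half of each generation is carried over unchanged, the best MSE recorded in `best_mse_history` never increases from one generation to the next. The returned `best_weights` would then serve as the starting point for backpropagation training.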