
IJSRD - International Journal for Scientific Research & Development | Vol. 4, Issue 03, 2016 | ISSN (online): 2321-0613

A Survey on Classification Algorithms in Data Mining of Bioinformatics


S. Muthulakshmi (1), Dr. R. Porkodi (2)
(1) Research Scholar, (2) Assistant Professor
(1,2) Department of Computer Science & Engineering, Bharathiar University
Abstract: Data mining is used to extract information from large amounts of data. Data mining models are of two kinds, predictive and descriptive. Classification is a data mining technique that falls under the predictive model. It is used in many application areas, such as artificial intelligence, machine learning, statistics and database systems. Data mining can be applied to problems in these areas to improve the efficiency of systems and the design of machines. This paper surveys which algorithms give the best results. The researchers surveyed used different classification algorithms, namely K-Nearest Neighbour classifiers, Decision tree, Bayesian network, Support Vector Machine and Artificial Neural Networks. This paper also presents a comparison of all five algorithms as used in bioinformatics research.

Key words: Classification, Decision tree, Bayesian network, k-nearest neighbour classifier, Support vector machine, Artificial neural network

I. INTRODUCTION
Classification is a data mining task that is used to predict values. Classification requires at least two classes, and the classes are predefined. The input of the classification model is the attributes of a data sample, and the output is the class to which the sample belongs. Classification is the partitioning or ordering of objects into classes. Because the classes are predefined, the classification system can be trained to allocate objects to them. Training is based on a training sample that contains a set of data samples whose classes are already known. Testing and validation play an important role in classification techniques: classifying the test data and comparing the result with the known result determines the accuracy.

A microarray database is a repository that contains microarray gene expression data. The key uses of a microarray database are to store the measurement data, manage a searchable index, and make the data available to other applications for analysis and interpretation. The models used to solve a problem are classified as predictive and descriptive. Microarray technology has become one of the significant tools that many biologists use to monitor genome-wide expression levels of genes in a given organism. A microarray is typically a glass slide onto which DNA molecules are fixed in an orderly manner at specific locations called spots. Classification is the process of finding a model that describes and distinguishes data classes or concepts. The purpose of this model is to predict the class of objects whose class label is unknown.

Bioinformatics is a combination of molecular biology and computer science. In this field, computers are used to store, extract, organize, analyse, interpret and integrate biological and genetic information. Bioinformatics is very important for identifying human diseases and genomic information. It deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software engineering, data mining, image processing, modeling and simulation, signal processing, discrete mathematics, control and system theory, circuit theory, and statistics. Bioinformatics generates new knowledge, and computational tools are also used to create that knowledge.

The paper is organized as follows: section 1 gives the introduction to data mining and microarrays, section 2 describes the literature review, section 3 describes the various classification algorithms, section 4 presents a comparison of classification algorithms, and finally the paper is concluded in section 5.

II. LITERATURE REVIEW
S. Archana and Dr. K. Elangovan et al. [1] discuss how classification algorithms can be implemented on different types of data sets, such as patient data and financial data, according to their performance. These classification techniques show how data can be determined and grouped when a new set of data is available. Depending on their performance, these algorithms can also be used to detect natural disasters such as cloud bursts and earthquakes.

David B. Fogel et al. [2] presented the use of neural network techniques for detecting breast cancer; related works also addressed breast cancer diagnosis using the back propagation method with multilayer perceptrons. In contrast to back propagation, they used evolutionary computational methods and algorithms, which often outperform more classic optimization techniques.

Shadab Adam Pattekari et al. [3] developed a prototype Heart Disease Prediction System (HDPS) using Decision Trees, Naive Bayes and Neural Networks. In this system the user answers predefined questions. The system then retrieves hidden data from a stored database and compares the user's values with the trained dataset.

Endo et al. [4] implemented common machine learning algorithms to predict the survival rate of breast cancer patients. Logistic regression had the highest accuracy, the artificial neural network showed the highest specificity, and the J48 decision tree model had the best sensitivity.

Sonali Agarwal, G. N. Pandey, and M. D. Tiwari et al. [5] established the Support Vector Machine (SVM) as the best classifier, with maximum accuracy and minimum root mean square error (RMSE). Their work aims to build confidence in data mining techniques so that present education and business systems may adopt them as a strategic management tool.

K. Srinivas, B. Kavitha Rani and Dr. A. Goverdhan et al. [6] examined the potential use of classification-based data mining techniques such as Rule Based, Decision tree, Naive Bayes and Artificial Neural Network on the massive volume of healthcare data. These techniques can predict the likelihood of patients getting a heart disease.
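The studies reviewed in this section are compared in terms of accuracy, sensitivity and specificity. As a reminder of how these measures follow from comparing predicted labels with known labels, here is a minimal Python sketch. The function names and the toy labels are illustrative assumptions and are not taken from any of the cited papers.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity as a dictionary.

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # 1 - false positive rate
    }


# Made-up known labels and classifier outputs (1 = disease, 0 = healthy)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(summarize(y_true, y_pred))
```

On this toy data three of the four positive cases and four of the six negative cases are classified correctly, which is why a single accuracy figure can hide quite different sensitivity and specificity values.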

All rights reserved by www.ijsrd.com 612


(IJSRD/Vol. 4/Issue 03/2016/166)

Shweta Kharya et al. [7] discussed various data mining approaches that have been utilized for breast cancer diagnosis and prognosis. The decision tree was found to be the best predictor, with 93.62% accuracy on a benchmark dataset and also on the SEER data set.

Tina R. Patil, Mrs. S. S. Sherekar et al. [20] made a comparative evaluation of the Naive Bayes and J48 classifiers on a bank dataset using the WEKA tool, aiming to maximize the true positive rate and minimize the false positive rate of defaulters rather than only achieving higher classification accuracy.

N. Poomani, R. Porkodi et al. [21] compared various supervised learning algorithms to find the best classifier. The experimental results show that the J48graft classifier has the highest accuracy and the lowest error rate, 0.9587, among the classification algorithms compared. Based on the experimental results, the probabilistic model is not well suited to classifying the breast cancer dataset.

R. Porkodi and G. Suganya [36] implemented classification algorithms on a colon cancer dataset. The experimental results show that the highest accuracy is obtained by both the KNN and Neural Network classifiers, which produce better accuracy than the Support Vector Machine, Random Forest and Naive Bayes classifiers.

III. CLASSIFICATION ALGORITHMS
Classification is one of the most widely used data mining methods in healthcare. Classification algorithms can be useful for forecasting the outcome of some diseases or for discovering the genetic behaviour of a growth. A classifier is a model that describes a predefined set of classes or concepts. The model is constructed by analysing database tuples described by attributes; it is then used to predict categorical class labels and to classify data based on the training set.

Classification techniques in data mining are capable of processing a large amount of data and can be used to classify newly available data. A classification algorithm is a procedure that takes some value or set of values as input and generates some value or set of values as output. The result of a given problem is the output obtained after solving the problem. An algorithm is considered correct if, for every input instance, it terminates with the correct output; otherwise it is not considered a correct algorithm. This paper gives a detailed description of five algorithms, namely Decision tree, Bayesian network, K-nearest neighbour, Support vector machine and Artificial neural network.

A. Decision Tree
A decision tree is a predictive modeling technique from the field of data mining that builds a simple tree-like structure. The Decision Tree (DT) is one of the classification techniques in data mining. It builds the classification model in the form of a tree structure. It divides the whole training set into smaller and smaller subsets while the decision tree is incrementally developed. The result is a tree with decision nodes and leaf nodes. A decision tree classifier helps to turn a complex decision into an easy process by subdividing the complex decision into simpler decisions. In a decision process, a decision tree can be used to visually and explicitly represent decisions and decision making.

A decision tree describes data rather than a decision; the resulting classification tree can, however, be an input for decision making. The decision tree is one of the popular algorithms; it is able to handle both categorical and numerical data and requires little computation. Decision trees are often simpler to interpret. A decision tree is a directed tree with one node, called the root, that has no incoming edges. All other nodes have exactly one incoming edge. Each non-leaf node is called an internal node or splitting node and contains a decision, while each leaf node represents the most appropriate target value and is assigned to one class. Decision trees can be used to analyze and represent classifier models. Decision trees also refer to a hierarchical model of decisions and their costs. When a tree is used for classification, it is called a classification tree. There are several specific decision tree algorithms, namely ID3 (Iterative Dichotomiser 3), C4.5 and CART (Classification and Regression Tree).

ID3 is one of the most important decision tree algorithms. It uses information gain to determine a suitable attribute for each node of the generated decision tree: the attribute with the highest information gain is selected as the test attribute of the current node. This information-based approach effectively reduces the number of divisions required to classify an object. ID3 is a supervised learning algorithm based on information entropy. It is built from a set of training data belonging to several classes. The algorithm derives a set of rules that allows the class of an item to be predicted, and identifies the attributes that differentiate one class from the others. ID3 requires all values of the dataset to be known, and uses the dataset to determine which attributes are important and at which position in the decision tree each one should be placed.

The C4.5 algorithm is the successor of the ID3 algorithm. It reduces the error rate by replacing an internal node with a leaf node. C4.5 accepts both continuous and categorical attributes when building the decision tree. It has an enhanced method of tree pruning that reduces the misclassification errors due to noise. C4.5 also handles attributes with different costs and training data with missing attribute values.

CART stands for Classification and Regression Trees. It is characterized by the fact that each internal node has exactly two outgoing edges. The splits are selected and the resulting tree is pruned by cost-complexity pruning. When provided, CART can take misclassification costs into account during tree induction, and it also allows users to provide a prior probability distribution. A major characteristic of CART is its capacity to generate regression trees. Regression trees predict a real number and not a class.

B. Bayesian Network
The Naive Bayes algorithm is a simple probabilistic classifier that calculates a set of probabilities by counting combinations of values in a data set. It applies Bayes' theorem under the assumption that all attributes are independent given the class variable. A Bayesian network (BN) is a graphical model of the probability relationships among a set of variables. A BN consists of two components. The first component is a directed


acyclic graph whose nodes are the random variables and whose edges connect the nodes (random variables). The second component is a set of parameters that describe the conditional probability of each variable given its parents. A naive Bayes classifier assumes that the presence or absence of a particular feature is unrelated to the presence or absence of any other feature, given the class variable. Naive Bayes classifiers can be trained very efficiently in a supervised learning setting, and the method is important for several reasons.

Bayes' theorem relates the posterior probability to the likelihood, the class prior probability and the predictor prior probability:

$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$

For a tuple $X = (x_1, x_2, \ldots, x_n)$ with independent attributes, this gives

$$P(C \mid X) \propto P(x_1 \mid C)\, P(x_2 \mid C) \cdots P(x_n \mid C)\, P(C)$$

P(C|X) is the posterior probability of the class (target) given the predictor (attribute).
P(C) is the prior probability of the class.
P(X|C) is the likelihood, which is the probability of the predictor given the class.
P(X) is the prior probability of the predictor.

POSTERIOR = PRIOR × LIKELIHOOD / EVIDENCE

where the posterior is the predicted probability that the event will occur, the prior reflects past experience, the likelihood is the chance of the observation given the event, and the evidence is the overall probability of the observation.

C. K-Nearest Neighbors
The KNN algorithm is based on a similarity measure: it stores all available cases and identifies an unknown data point based on its nearest neighbors. It is easy to understand, yet it works remarkably well in practice, especially in classification. KNN is a supervised classification technique that is used extensively. It is easy to implement and training is very fast. KNN is particularly well suited for multimodal classes. In this method the training tuples are represented in N-dimensional space; given an unknown tuple, the k-nearest neighbor classifier searches for the k training tuples that are closest to the unknown sample and assigns the sample to the nearest class. The K nearest neighbor method is simple to apply to small data sets, but when applied to large volumes of data or high-dimensional data it results in slower performance. Accuracy is a major issue in data classification, and improvements have been made to the K nearest neighbor method in order to improve it. The weighted nearest neighbor classifier (wk-NNC) is one such method; it adds a weight to each of the neighbors used in a classification. KNN relies on a distance function.

Distance functions, for two tuples $x$ and $y$ in N-dimensional space:

$$\text{Euclidean} = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}$$

$$\text{Manhattan} = \sum_{i=1}^{N} |x_i - y_i|$$

$$\text{Minkowski} = \left[ \sum_{i=1}^{N} \left( |x_i - y_i| \right)^{q} \right]^{1/q}$$

Hamamoto's bootstrapped training set can also be used as a substitute for the training patterns. Each training pattern is replaced by a weighted mean of a few of its neighbors from its own class of training patterns. This method has been shown to improve the accuracy of classification. However, the time to create the bootstrapped set is $O(n^2)$, where n is the number of training patterns. The K-Nearest Neighbor Mean Classifier (k-NNMC) finds the k nearest neighbors for each class of training patterns separately. The classification is done according to the nearest mean pattern. This improvement shows better classification accuracy when compared to other techniques using Hamamoto's bootstrapped training set.

Hamming distance (for categorical attributes):

$$D_H = \sum_{i=1}^{N} |x_i - y_i|, \qquad x_i = y_i \Rightarrow D = 0, \quad x_i \neq y_i \Rightarrow D = 1$$

D. Support Vector Machine
Support vector machines (SVM) are another type of machine learning tool. A support vector machine constructs a hyperplane in a high- or infinite-dimensional space, which can be used for classification, regression or other tasks. SVMs were first applied to protein sequence classification and have also been applied to remote homology detection. SVMs are supervised binary classifiers used to find a linear separation between different classes of points in a multi-dimensional space. In 2-D space this separator is a line, and in 3-D it is a plane. The SVM finds an optimal separating hyperplane between members and non-members of a given class in an abstract space. SVMs as applied to gene expression data begin with a collection of known classifications of genes. One could build a classifier capable of discriminating between members and non-members of a given class. This would be useful in recognizing new members of a class among genes of unknown function. The classifier could also be applied to the original set of training data to identify outliers that may have been previously unrecognized. A special property of the SVM is that it simultaneously minimizes the empirical classification error and maximizes the geometric margin. SVMs are therefore called Maximum Margin Classifiers. The equation below is the hyperplane:

$$aX + bY = C$$

The main idea in SVM is an optimal hyperplane that can be used in classification to separate linear patterns. The optimal hyperplane is selected from the set of hyperplanes that classify the patterns, according to its margin, which is the distance from the hyperplane to the nearest point of each pattern class. The major purpose of SVM is to maximize the margin so that the given patterns are classified correctly; a large margin also tends to classify new patterns correctly. A given pattern can be mapped into another space by the function $\Phi(x)$ associated with a kernel, i.e. $x \rightarrow \Phi(x)$. The choice of kernel function is an important aspect of SVM-based classification. The commonly used kernel functions are LINEAR, POLY, RBF and SIGMOID. For example, the equation for the Poly kernel function is given as:

$$K(x, y) = \langle x, y \rangle^{p}$$

E. Artificial Neural Networks
Neural networks are used in pattern recognition and classification. A neural network is a combination of nodes that are connected in a topology in which each node has input and output connections to other nodes. Neural networks are also called connectionist models because they are represented by weighted functions. Neural networks built from simple individual processing elements can perform


complex methods. In a single-layer neural network, the weights and biases are trained to produce the correct output when the network is presented with the corresponding input vector. Artificial neural networks are made up of connected artificial neurons. Artificial neural networks are used to understand biological neural networks and to solve artificial intelligence problems. These problems can be solved without modelling a real biological system, because the real biological nervous system is highly complex. Artificial neural network algorithms attempt to abstract this complexity and to focus on what matters most, in theory, from an information processing point of view.

Fig. 1: Example of Artificial neural network (input layer, hidden layer, output layer)

A neural network (NN), called an artificial neural network (ANN) or simulated neural network (SNN) in the case of artificial neurons, is an interconnected group of natural or artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network. The basic topologies of neural networks are the feed forward neural network and the recurrent network. In a feed forward neural network, the information flow starts from the input node. The information flows in one direction only, from the input node to the hidden node and finally to the output node. In each node one or more processing elements (PE) may be active. A PE is used to simulate the neurons in the brain. PEs receive input from the outside world or from the previous layer. There are no cycles or loops in this network. In a recurrent neural network, by contrast, data flows bi-directionally and feedback connections exist. A neural network consists of three parts: the architecture, the learning algorithm and the activation function. Neural networks are programmed to store, recognize and retrieve patterns or database entries, to solve ill-defined problems, and to filter noise from measured data.

IV. COMPARISON OF CLASSIFICATION ALGORITHMS
An extensive survey has been conducted of classification algorithms in data mining for different medical datasets. The outcome of the survey is a comparison of various classification algorithms based on the experimental dataset used, the outcome of the research and the demerits, which are listed in Table 1.

S.No. | Author Name | Classification Algorithm | Data Set | Outcome | Demerits
1 | Rajendran, Madheswaran and Naganandhini | Decision tree | Brain tumor | Decision tree gives 90% accuracy. | Iterative training procedure.
2 | Vikas Chaurasia, Saurabh Pal | C4.5, ID3, CART | Heart dataset | CART accuracy is 83.49% and the total time taken to build the model is 0.23 seconds. | To reduce the error rate.
3 | K. Rajeswari, V. Vaithiyanathan and Shailaja V. Pede | Decision Tree, SVM, Neural Network | Heart Cleveland | The SVM algorithm gives an accuracy of 84.16%. | To improve the accuracy of Decision tree and Neural network.
4 | Tina R. Patil, Mrs. S. S. Sherekar | Naive Bayes, J48 | Bank dataset | J48 is better than Naive Bayes. | To increase the accuracy level.
5 | Abdelghani Bellaachia and Erhan Gauven | Naive Bayes, C4.5, Neural Network | Breast cancer | C4.5 performs better than the other two techniques. | Improve the performance of Naive Bayes and Neural network.
6 | Nidhi Bhatla, Kiran Jyoti | Naive Bayes, Decision Tree, Neural Network | Heart dataset | Neural network shows 100% accuracy and Decision tree also performed well with 99.62%. | Improve the Neural network efficiency.
7 | Kharrat | Support Vector Machine | Human brain dataset | Support vector machine gives 96.36% accuracy. | Iterative, slow training, and nonlinear.
8 | Tu, Shin and Shin | Decision Tree | Heart disease | Decision tree gives 90% accuracy. | Iterative training procedure, overtraining sensitive.


9 | R. Porkodi and G. Suganya | Naive Bayes, KNN, Neural Network, Random Forest, SVM | Colon cancer | KNN and Neural Network give the better accuracy. | To improve the present correction genes.
10 | N. Poomani and R. Porkodi | Naive Bayes, CART, J48 Graft, JRip | Breast cancer | The highest accuracy is found in the J48graft classifier, which gives 0.979 with the lowest error rate, 0.9587. | Improve the execution time.
Table 1: Comparison of Classification Algorithms

V. CONCLUSION
This paper deals with classification techniques in data mining. Data mining is applied in various fields, one of which is bioinformatics. Classification is used to predict values. In data mining, classification comprises various algorithms, namely Decision Tree, Naive Bayes, K-Nearest Neighbors, Support Vector Machine and Artificial Neural Networks. Compared to K-Nearest Neighbors, Decision Tree and Bayesian Network (BN), Support Vector Machine and Artificial Neural Networks generally have different operational profiles. The aim of a classification technique is to generate more precise and accurate results. The classification results are compared by dataset in Table 1. The objective of this paper is to improve the accuracy and performance of the classifier. Among the classification algorithms, the Decision Tree classifier is applied to several of the datasets listed in Table 1. The comparison table shows how the classification algorithms performed on different datasets and identifies which one gives the best accuracy among the different classifiers. Future work will address gene identification, gene prediction and gene analysis.

REFERENCES
[1] S. Archana and Dr. K. Elangovan, "Survey of Classification Techniques in Data Mining," Vol. 2, Issue 2, February 2014, pp. 65-71.
[2] David B. Fogel, Eugene C. Wasson and Edward M. Boughton, "Evolving neural networks for detecting breast cancer," Elsevier Science Ireland Ltd., 1995.
[3] Shadab Adam Pattekari and Asma Parveen, "Prediction system for heart disease using naive Bayes," International Journal of Advanced Computer and Mathematical Sciences, ISSN 2230-9624, Vol. 3, Issue 3, 2012, pp. 290-294.
[4] Endo, T. Shibata and H. Tanaka, "Comparison of seven algorithms to predict breast cancer survival," Biomedical Soft Computing and Human Sciences, vol. 13, pp. 11-16, 2008.
[5] Sonali Agarwal, G. N. Pandey and M. D. Tiwari, "Data Mining in Education: Data Classification and Decision Tree Approach."
[6] K. Srinivas, B. Kavitha Rani and Dr. A. Govrdhan, "Applications of Data Mining Techniques in Healthcare and Prediction of Heart Attacks," International Journal on Computer Science and Engineering, 2010.
[7] Shweta Kharya, "Using Data Mining Techniques For Diagnosis And Prognosis Of Cancer Disease," International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 2, No. 2, April 2012.
[8] Amin S. U., Agarwal K. and Beg R., "Genetic neural network based data mining in prediction of heart disease using risk factors," Information & Communication Technologies (ICT), 2013 IEEE Conference on, pp. 1227-1231, 11-12 April.
[9] S. Ghorai, A. Mukherjee and P. K. Dutta, "Cancer Classification from Gene Expression Data by NPPC Ensemble," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, No. 3, May/June 2011.
[10] Topon Kumar Paul and Hitoshi Iba, "Prediction of cancer class with majority voting genetic programming classifier using gene expression data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, No. 2, April-June 2009.
[11] Kharrat A., Gasmi K., Messaoud M. B., Benamrane N. and Abid M., "A hybrid approach for automatic classification of brain MRI using genetic algorithm and support vector machine," Leonardo Journal of Science, ISSN 1582-0233, pp. 71-82, 2010.
[12] Nidhi Bhatla and Kiran Jyoti, "An Analysis of Heart Disease Prediction using Different Data Mining Techniques," International Journal of Engineering Research & Technology (IJERT), 2012.
[13] Romeo M., F. Burden, M. Quinn, B. Wood and D. McNaughton, "Infrared Microspectroscopy And Artificial Neural Networks In The Diagnosis Of Cervical Cancer," U.S. National Library of Medicine, National Institutes of Health, Vol. 44(1), pp. 179-87, 1998.
[14] Dr. Santhosh Baboo and S. Sasikala, "A Survey on data mining techniques in gene selection and cancer classification," International Journal of Computer Science and Information Technology, April 2010.
[15] D. Lavanya and Dr. K. Usha Rani, "Analysis of feature selection with classification: Breast cancer datasets," Vol. 2, No. 5, Oct-Nov 2011, pp. 756-763.
[16] Krishnaiah, "Diagnosis of Lung Cancer Prediction System Using Data Mining Classification Techniques," International Journal of Computer Science and Information Technologies, Vol. 4 (1), 2013, 39-45, www.ijcsit.com, ISSN: 0975-9646.
[17] N. Revathy and R. Amalraj, "Accurate Cancer Classification Using Expressions Of Very Few Genes," International Journal of Computer Applications, Vol. 14, No. 4, 201.
[18] Manaswini Pradhan and Dr. Ranjit Kumar Sahu, "Predict the onset of diabetes disease using Artificial Neural Network (ANN)."


[19] Wang X. and Gotoh O., "Cancer Classification Using Single Genes," Genome Informatics, Vol. 23, pp. 179-188, 2009.
[20] Tina R. Patil and Mrs. S. S. Sherekar, "Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification."
[21] N. Poomani and R. Porkodi, "A Comparative Study of Classification Algorithms for Breast Cancer Microarray Dataset: A study," IJSRD - International Journal for Scientific Research & Development, Vol. 2, Issue 12, 2015, ISSN (online): 2321-0613.
[22] Fayyad, Piatetsky-Shapiro, Smyth and Uthurusamy, "Advances in Knowledge Discovery and Data Mining," AAAI/MIT Press, 1996. Jiawei Han and Micheline Kamber, "Data mining: concepts and techniques," San Francisco: Morgan Kaufmann Publishers, 2001.
[23] Xin Yao and Yong Liu, "Neural Networks for Breast Cancer Diagnosis," IEEE, 1999.
[24] Jinn-Yi Yeh, "Applying Data Mining Techniques For Cancer Classification On Gene Expression Data," Department of Management Information Systems, National Chiayi University, Chiayi City, Taiwan.
[25] Zurada J. M., "An Introduction To Artificial Neural Networks Systems," St. Paul: West Publishing, 1992.
[26] Lundin M., Lundin J., Burke B. H., Toikkanen S., Pylkkänen L. and Joensuu H., "Artificial Neural Networks Applied to Survival Prediction in Breast Cancer," Oncology International Journal for Cancer Research and Treatment, vol. 57, 1999.
[27] Sarvestan Soltani A., Safavi A. A., Parandeh M. N. and Salehi M., "Predicting Breast Cancer Survivability using data mining techniques," Software Technology and Engineering (ICSTE), 2nd International Conference, 2010, vol. 2, pp. 227-231.
[28] Delen Dursun, Walker Glenn and Kadam Amit, "Predicting breast cancer survivability: a comparison of three data mining methods," Artificial Intelligence in Medicine, vol. 34, pp. 113-127, June 2005.
[29] V. A. Sitar-Taut, "Using machine learning algorithms in cardiovascular disease risk evaluation," Journal of Applied Computer Science and Mathematics, 2009.
[30] Padmavati J., "A Comparative study on Breast Cancer Prediction Using RBF and MLP," International Journal of Scientific & Engineering Research, vol. 2, Jan. 2011.
[31] Rajendran P., Madheswaran M. and Naganandhini K., "An improved pre-processing technique with image mining approach for the medical image classification," in Computing Communication and Networking Technologies (ICCCNT), 2010 International Conference on, pp. 1-7, IEEE, 2010.
[32] Vikas Chaurasia and Saurabh Pal, "Early Prediction of Heart Diseases Using Data Mining Techniques," Carib.j.SciTech, 2013, Vol. 1, pp. 208-217.
[33] Milan Kumari and Sunila Godara, "Comparative Study of Data Mining Classification Methods in Cardiovascular Disease Prediction," IJCST, Vol. 2, Issue 2, June 2011.
[34] Prof. K. Rajeswari, Dr. V. Vaithiyanathan and Shailaja V. Pede, "Feature Selection for Classification in Medical Data Mining," Volume 2, Issue 2, March-April 2013, ISSN 2278-6856.
[35] Tu M. C., Shin D. and Shin D., "A comparative study of medical data classification methods based on decision tree and bagging algorithms," in Dependable, Autonomic and Secure Computing, 2009 (DASC'09), Eighth IEEE International Conference on, pp. 183-187, IEEE.
[36] R. Porkodi and G. Suganya, "A Comparative Study on Classification Algorithms in Data Mining Using Microarray Dataset of Colon Cancer," International Journal of Advanced Research in Computer Science and Software Engineering, Volume 5, Issue 5, May 2015, ISSN: 2277 128X.
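As a supplementary illustration of the ID3 attribute selection described in Section III-A, the following Python sketch computes information entropy and information gain and picks the attribute with the highest gain as the test attribute for a node. It is a minimal sketch under stated assumptions: the function names, the dictionary-based row format and the tiny made-up dataset are illustrative, and tree construction and pruning are not covered.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Information entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the subsets produced by the split
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 selection step: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Made-up training tuples with two categorical attributes
rows = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
]
labels = ["sick", "sick", "healthy", "healthy"]

print(best_attribute(rows, labels, ["fever", "cough"]))
```

In this toy data the "fever" attribute separates the two classes completely, so it has the highest gain, while "cough" leaves each subset evenly mixed and has zero gain.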

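As a supplementary illustration of the K-nearest neighbor method and the distance functions of Section III-C, the Python sketch below classifies a query tuple by majority vote among its k closest training tuples. The Minkowski distance is used because it reduces to the Manhattan distance for q = 1 and the Euclidean distance for q = 2. The function names and the sample points are illustrative assumptions, and the weighted and bootstrapped variants mentioned in the text are not implemented.

```python
from collections import Counter


def minkowski(x, y, q=2):
    """Minkowski distance; q=1 gives Manhattan, q=2 gives Euclidean."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)


def knn_predict(train_points, train_labels, query, k=3, q=2):
    """Assign the query to the majority class among its k nearest tuples."""
    # Rank all training tuples by their distance to the query
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: minkowski(pair[0], query, q))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up two-dimensional training tuples forming two groups
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (4.0, 4.0)))
```

Every prediction scans the whole training set, which is the reason the method becomes slow on large or high-dimensional data, as noted in Section III-C.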
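As a supplementary illustration of the naive Bayes rule of Section III-B (posterior = prior × likelihood / evidence), the following Python sketch estimates the prior P(C) and the per-attribute likelihoods from counts and normalizes by the evidence P(X). It is a minimal sketch under stated assumptions: the function names and the small made-up dataset are illustrative, attributes are categorical, and no smoothing is applied, so an attribute value never seen with a class gives that class a zero score.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[class][attribute index][value] -> number of occurrences
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
    return class_counts, value_counts


def posterior(model, row):
    """Return P(C | X) for every class C, assuming independent attributes."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(C)
        for i, value in enumerate(row):
            score *= value_counts[c][i][value] / n_c  # likelihood P(x_i | C)
        scores[c] = score
    evidence = sum(scores.values())  # P(X), summed over all classes
    return {c: s / evidence for c, s in scores.items()}


# Made-up tuples: (chest pain, high blood pressure) -> diagnosis
rows = [("no", "no"), ("no", "yes"), ("no", "no"), ("yes", "no"),
        ("yes", "yes"), ("yes", "no"), ("yes", "yes"), ("no", "yes")]
labels = ["neg", "neg", "neg", "neg", "pos", "pos", "pos", "pos"]

model = train_naive_bayes(rows, labels)
print(posterior(model, ("yes", "yes")))
```

For the query ("yes", "yes") the unnormalized scores are 0.5 × 3/4 × 3/4 for "pos" and 0.5 × 1/4 × 1/4 for "neg", so dividing by their sum gives a posterior of 0.9 for "pos".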