Neha Aitavade
Saumil Shah
Student, Student, Student, Faculty
Thakur College of Engineering and Technology
amivora00@gmail.com
neha.aitavade@gmail.com
saumilshah500@gmail.com
rashmi.thakur@thakureducation.org
ABSTRACT
Data mining is the process of analyzing large volumes of data
from different perspectives and summarizing it into useful
information. This information can be converted into
knowledge about historical patterns and future trends. Data
mining is the extraction of hidden predictive information from
large databases; it is a powerful technology with great
potential to help organizations focus on the most important
information in their data warehouses [6]. Another widely
used term is knowledge discovery from data, or KDD [7]. Data
mining plays a significant role in the field of information
technology. The health care industry today generates large
amounts of complex data about patients, hospital resources,
diseases, diagnosis methods, electronic patient records, etc.
Data mining techniques are very useful for making medical
decisions in curing diseases. The health care industry collects
huge amounts of health care data which, unfortunately, are not
mined to discover hidden information for effective decision
making. The discovered knowledge can be used by health
care administrators to improve the quality of service. In this
paper, we propose a method to identify the frequency of diseases
in a particular geographical area over a given time period with
the help of data mining tools.
General Terms
Data mining, Frequent diseases, Medical data, Future trends.
Keywords
Feature selection, KDD, Health care, Data mining, Apriori
algorithm, ID3 algorithm, K-means clustering algorithm.
1. INTRODUCTION
2. RELATED WORK
Mining Diseases based on Feature Selection
A large population creates a great demand for doctors, and
their shortage causes problems, so a console can play an
important role to some extent. It enables users to make
predictions themselves even when they are at a remote location
where doctors are hard to reach regularly. Integrating it into
web portals would save cost and time. Data mining derives its
name from the similarities between searching for valuable
business information in a large database and mining a mountain
for a vein of valuable ore.
Type 1 (Based on age): Different diseases occur frequently in
different age groups, because each age group is prone to
particular diseases due to its daily activities. Hence it is
important to sort the diseases by age group for future
use and awareness.
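The age-based grouping described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the age bands in age_group, the function names, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict


def age_group(age):
    """Map an age in years to a coarse band (hypothetical cut-offs)."""
    if age < 13:
        return "child"
    if age < 20:
        return "teen"
    if age < 60:
        return "adult"
    return "senior"


def frequent_diseases_by_age(records, top_n=1):
    """Count diseases per age band and return the top_n most common in each.

    records is an iterable of (age, disease) pairs.
    """
    counts = defaultdict(Counter)
    for age, disease in records:
        counts[age_group(age)][disease] += 1
    return {group: c.most_common(top_n) for group, c in counts.items()}


# Made-up sample records, for illustration only.
sample_records = [
    (5, "measles"), (8, "measles"), (7, "asthma"),
    (34, "diabetes"), (45, "diabetes"), (40, "hypertension"),
    (70, "arthritis"),
]
by_age = frequent_diseases_by_age(sample_records)
```

With real data, the age bands would be chosen to match the population under study rather than fixed in code.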
Proposed Ideas
[1] Apriori Algorithm (Association Rule)
It is the fundamental and most important algorithm for mining
frequent itemsets. It was first given by Agrawal and Srikant in
1994 [7]. It is a level-wise algorithm that works in an
iterative fashion to discover all frequent itemsets in a
database. It uses prior knowledge of frequent itemset
properties [8]. Frequent itemsets are the sets of items that
satisfy a minimum support threshold. This algorithm takes only
categorical input and associates attributes present in the
dataset. A property associated with this algorithm, called the
Apriori property, states that any subset of a frequent itemset
is also a frequent itemset. For example, if {x,y,z} is a
frequent itemset, then {x}, {y}, {z}, {x,y}, {x,z} and {y,z}
must also be frequent. The execution of this algorithm is
organized in two phases. In the first phase, candidate itemsets
are generated, and in the second phase the frequent itemsets
are selected from them [9]. The resulting large itemsets are
used to produce association rules from the database.
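The two phases described above (candidate generation, then selection of frequent itemsets by support count) can be sketched in a few lines. This is a minimal, unoptimized illustration under stated assumptions: support is an absolute count, the function name apriori is ours, and the sample records are made up.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    transactions is an iterable of item collections; min_support is an
    absolute count (an assumption made for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep those meeting min_support.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Phase 1: join frequent (k-1)-itemsets into candidate k-itemsets,
        # pruning with the Apriori property (all (k-1)-subsets are frequent).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Phase 2: count support and keep the frequent candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Made-up symptom records, for illustration only.
sample_transactions = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"fever", "fatigue"},
    {"cough", "fatigue"},
    {"fever", "cough", "fatigue"},
]
frequent_itemsets = apriori(sample_transactions, min_support=3)
```

In this sample every single item and every pair reaches the threshold of 3, while the three-item set occurs in only two records and is therefore discarded in the second phase.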
COMPARATIVE ANALYSIS

Parameter | K-Means Algorithm | ID3 Algorithm | Apriori Algorithm
Principle | Based on the principle of clustering | Based on the principle of decision trees | Based on the principle of association rules
Time complexity | O(I*k*m*n) | O(m n^2) | O(R^i)
Space complexity | O((m+k)n) | O(n) | O[MN + (R^1 + R^2 + ... + R^M)] = O(MN + (1 - R^M)/(1 - R))
Efficiency and implementation | Relatively efficient and easy to implement | Easy to implement and comprehensible | Relatively efficient and easy to implement
Sum of squared error (SSE) | Tries to minimize the sum of squared error (SSE) | - | -
Works on | Unlabeled data | Labeled data | Unlabeled data
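To make the K-means entries in the comparison concrete, the sketch below runs the assignment and update steps and reports the sum of squared error (SSE) that the algorithm tries to minimize. It is an illustration under assumptions, not the paper's implementation: the initial centroids are supplied by the caller, the function names and sample points are ours, and the symbols in O(I*k*m*n) are read in the usual way (I iterations, k clusters, m points, n attributes), which matches the nesting of the loops.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Assignment step: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(
            range(len(centroids)),
            key=lambda j: squared_distance(p, centroids[j]),
        )
        clusters[nearest].append(p)
    return clusters


def kmeans(points, initial_centroids, max_iter=100):
    """Return (centroids, clusters, sse) after at most max_iter iterations."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    clusters = assign(points, centroids)
    sse = sum(
        squared_distance(p, centroids[j])
        for j, members in enumerate(clusters)
        for p in members
    )
    return centroids, clusters, sse


# Made-up two-dimensional points, for illustration only.
sample_points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
final_centroids, final_clusters, final_sse = kmeans(
    sample_points, [(0.0, 0.0), (10.0, 10.0)]
)
```

No class labels are passed in: the clusters are formed from the distances alone. The result depends on the initial centroids, so practical use typically repeats the run from several starting points and keeps the lowest SSE.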
4. CONCLUSION
5. REFERENCES