Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 3, Issue 1, January February 2014 ISSN 2278-6856
Lovely Professional University, School of technology and sciences, NH-1, Punjab, India
Abstract: Decision trees are used to analyze data in a well-organized way. An ample amount of data is available today; data mining is used to refine that data, and decision trees are produced to classify it. There are various algorithms for decision tree generation, but traditional algorithms have performance issues. This paper describes a proposed novel algorithm that combines previous algorithms with some enhancements. The novel approach to generating decision trees provides a well-organized and efficient way of building decisions. This research focuses on implementing an algorithm intended to perform better than existing algorithms. Existing algorithms have many issues regarding accuracy, which create problems for an organization, and the proposed algorithm is being designed to solve those problems.

Keywords: Data Mining, Decision Trees, Tree generation algorithms, Univariate algorithm, Multivariate algorithm, Pruning.

1. INTRODUCTION

In today's world a huge amount of data can be gathered from a variety of sources, but not all of it is useful. In the data mining process, we analyze the data and then summarize it into useful information. Knowledge extraction is used by many organizations to reduce fraud and to mine raw data.

1.1 Decision Trees

A decision tree is a tree-like graph used for classifying data sets and for making decisions in decision-making systems. It consists of a root node, internal nodes (which represent test conditions), and leaf nodes (which represent classes). It is used in the knowledge discovery process. A decision tree performs classification in two stages: tree growing and tree pruning. Tree pruning is the most important step, and it is useful for generating a tree free of outliers. A decision tree is also known as a classification tree because it classifies the data set. It uses a greedy algorithm that follows a divide-and-conquer strategy, in which a difficult problem is divided into easier problems.

1.2 Generation of classification Trees

The steps to build a classification tree are as follows. First, check whether all cases belong to the same class, that is, whether the tree is a leaf. Then calculate the information gain for each attribute. After that, find the best splitting attribute [1].

A decision tree uses a greedy algorithm to classify data. There are two phases in its generation: the growth phase (tree growing or building) and the pruning phase (tree cutting). The tree-growing phase proceeds top-down. In this phase the tree is partitioned until all data items in a node belong to the same class. The tree-pruning phase proceeds bottom-up. It is the most important phase, in which the tree is cut back to prevent overfitting and to improve the accuracy of the decision tree. There are two types of pruning: post-pruning (done after the tree is created) and pre-pruning (done while the tree is being created) [1].

1.3 Algorithms involved in decision tree generation

Many algorithms that give good results are used to create decision trees. Data mining techniques are used for systematic analysis of large data sets, and the decision tree is one of the most popular and efficient of these techniques. Two approaches are involved: the univariate approach, which deals with small amounts of data, and the multivariate approach, which deals with large amounts of data and removes noise. There are many algorithms for generating classification trees. Some of them are:

C4.5 generates classifiers as decision trees and can also express them in a more logical rule-set form [4].

CART (Classification and Regression Trees) is used in artificial intelligence, data mining, and machine learning [4].

J48 is an implementation of the C4.5 algorithm. It uses two pruning techniques with a bottom-up strategy, in which nodes are replaced by leaves, starting from the leaves and moving toward the root node [2].

M5P is commonly used to build regression trees whose leaves are multivariate linear models. The attribute tested at each node is chosen to maximize the error reduction, measured as a function of the standard deviation of the output parameter. M5P also removes noisy data, which univariate algorithms cannot do [5].

The main purpose of this paper is to discuss the development of a data mining algorithm that improves performance and the efficiency of results. There are 4 sections in this paper. Section 2 describes previous work.
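The steps of Section 1.2 (leaf check, information gain for each attribute, choice of the best splitting attribute) can be illustrated with a minimal Python sketch. It assumes categorical attributes and entropy-based information gain as used in ID3-style trees; the function names and the toy weather data are hypothetical and are not taken from the cited work.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by partitioning the cases on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


def best_splitting_attribute(rows, labels, attributes):
    """Return None if all cases share one class (leaf), else the highest-gain attribute."""
    if len(set(labels)) <= 1:
        return None
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data set (illustration only).
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

print(best_splitting_attribute(rows, labels, ["outlook", "windy"]))
```

In this toy data the attribute "outlook" separates the two classes completely while "windy" does not, so "outlook" has the higher gain. A full tree builder would apply the same selection recursively to each partition.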
2. PREVIOUS WORK
Earlier, two approaches to classification trees were developed.

Univariate decision tree: This approach is used for small data sets. One attribute is tested at each internal node, and the split is performed on that attribute.

Multivariate decision tree: This approach is used for large data sets. A univariate tree may sometimes be inaccurate, and a multivariate decision tree removes this issue.

The J48 algorithm is slow on large and noisy data. Its space complexity is very high because values are stored repeatedly in arrays. As a result, M5P is used to create decision and regression trees. In the M5P algorithm, P stands for prime. The multivariate approach is better than the univariate approach because it can deal with large quantities of data [3].
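The difference between the two approaches can be illustrated with a small Python sketch. It assumes the common definition in which a univariate node tests one attribute against a threshold and a multivariate node tests a weighted sum of attributes; the functions, weights, and data points are hypothetical illustrations and are not part of J48 or M5P.

```python
def univariate_test(x, attribute, threshold):
    """Univariate node: compare a single attribute with a threshold."""
    return x[attribute] > threshold


def multivariate_test(x, weights, threshold):
    """Multivariate node: compare a weighted sum of attributes with a threshold."""
    return sum(w * v for w, v in zip(weights, x)) > threshold


def accuracy(test, points, labels):
    """Fraction of points for which the node outcome equals the class label."""
    return sum(test(p) == y for p, y in zip(points, labels)) / len(points)


# Hypothetical data: the class is True when x0 + x1 > 1 (a diagonal boundary).
points = [(0.1, 0.2), (0.9, 0.3), (0.3, 0.9), (0.8, 0.8), (0.6, 0.1), (0.2, 0.6)]
labels = [x0 + x1 > 1 for x0, x1 in points]

# Try every single-attribute threshold taken from the data values.
best_univariate = max(
    accuracy(lambda p, a=a, t=t: univariate_test(p, a, t), points, labels)
    for a in (0, 1)
    for t in sorted({p[a] for p in points})
)

# One multivariate test on the sum of both attributes.
multivariate = accuracy(
    lambda p: multivariate_test(p, (1.0, 1.0), 1.0), points, labels
)

print(best_univariate, multivariate)
```

On this toy data no single-attribute threshold classifies every point correctly, while one test on the sum of both attributes does. This is the kind of inaccuracy that the multivariate approach is meant to remove; a univariate tree would need several nested splits to approximate the same boundary.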
4. CONCLUSION
A new algorithm is being proposed that combines some features of two data mining algorithms in order to improve performance. No algorithm can be claimed to be 100% accurate in all applications. After comparing different algorithms, we find that the novel algorithm to be designed can improve the performance of classification trees. In the novel approach, some features of a univariate tree algorithm are used; if noise remains, it is removed by applying some features of a multivariate algorithm, together with additional features of the new algorithm. As a result, the performance of the classification tree can be improved compared with earlier algorithms.
References
[1] S. Anupama Kumar, A Naive Based approach of Model Pruned trees on Learners Response, International Journal of Advanced Research in Computer Science and Software Engineering, 2012, 9, pp. 52-57.
[2] W. Nor Haizan W. Mohamed, A Comparative Study of Reduced Error Pruning Method in Decision Tree Algorithms, In Proceedings of the IEEE International Conference on Control System, Computing and Engineering, pp. 23 - 25 Nov. 2012.
AUTHORS
Srishti Taneja received the B.E. degree in Computer Science Engineering from Lovely Professional University in 2012. She is pursuing an M.Tech. at Lovely Professional University.

Richa Sapra received the B.Tech and M.Tech degrees in Information Technology and Computer Science from Guru Nanak Dev Engg. College, Ludhiana, in 2007 and Lovely Professional University, Phagwara, in 2012, respectively. Since 2012 she has been working at Lovely Professional University, Phagwara, as an Assistant Professor.