
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)

Web Site: www.ijettcs.org | Email: editor@ijettcs.org, editorijettcs@gmail.com | Volume 3, Issue 1, January-February 2014 | ISSN 2278-6856

A Hybrid Approach for Classification Tree Generation


Srishti Taneja1, Ms. Richa Sapra2

1,2 Lovely Professional University, School of Technology and Sciences, NH-1, Punjab, India

Abstract: Decision trees are used to analyze data in a well-organized way. These days there is an ample amount of data, so data mining is performed to purify it, and decision trees are produced to classify the data. There are various algorithms for decision tree generation, but traditional algorithms have some performance issues. This paper describes a novel algorithm being proposed which is an amalgamation of previous algorithms with some enhancements. The novel approach for generation of decision trees provides a well-organized and efficient way of decision building. This research focuses on the implementation of an algorithm which improves performance and is better than already existing algorithms. Existing algorithms have many issues regarding accuracy of data, which create problems for an organization, and to solve those problems an algorithm is being designed.

Keywords: Data Mining, Decision Trees, Tree generation algorithms, Univariate algorithm, Multivariate algorithm, Pruning.

1. INTRODUCTION
In today's world there is a huge amount of data which can be gathered from a variety of sources, but not all of it is valuable. In the data mining process, we analyze the data and then summarize it into useful information. Knowledge extraction is used by many organizations to reduce fraud and to mine raw data.

1.1 Decision Trees
A decision tree is a tree-like graph which is used for classification of data sets and for taking decisions in a decision-making system. It is a classification tree which includes a root node, leaf nodes (which represent classes) and internal nodes (which represent test conditions). It is used in the knowledge discovery process. A decision tree performs classification in two stages: tree growing and tree pruning. Tree pruning is the most important step and is useful for outlier-free tree generation. A decision tree is also known as a classification tree, since it classifies a data set. It uses a greedy algorithm which follows the divide-and-conquer strategy, in which a difficult problem is divided into easier sub-problems.

1.2 Generation of Classification Trees
The steps to build a classification tree are: first check whether all cases refer to the same class and whether the tree is a leaf or not; then calculate the information gain for each attribute; after that, find the best splitting attribute [1]. A small worked example of these steps is given below.
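As a concrete illustration of the steps above (our sketch, not the authors' implementation), the following computes the information gain of each attribute on a small hypothetical data set and picks the best splitting attribute. The helper names and the toy records are illustrative assumptions only.

```python
# Minimal sketch of the information-gain step described in Section 1.2,
# using a small hypothetical data set (illustration, not the paper's code).
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Entropy reduction obtained by splitting on one attribute."""
    base = entropy(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    weighted = sum(len(part) / len(labels) * entropy(part)
                   for part in partitions.values())
    return base - weighted

# Hypothetical records: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["no", "no", "yes", "no"]

gains = {i: information_gain(rows, labels, i) for i in range(2)}
best = max(gains, key=gains.get)   # best splitting attribute
print(gains, "best attribute index:", best)
```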

A decision tree uses a greedy algorithm for classification of data. There are two phases in its generation: the growth phase (tree growing or building) and the pruning phase (tree cutting). The tree-growing phase works in a top-down manner: the tree is partitioned until all data items belong to the same class. The tree-pruning phase works in a bottom-up manner; it is the most important phase, in which the tree is cut back to prevent overfitting and to improve the accuracy of the decision tree. Pruning is of two types: post-pruning (done after creation of the tree) and pre-pruning (done during creation of the tree) [1].

1.3 Algorithms involved in decision tree generation
For the creation of decision trees, many algorithms are used which give good results. Data mining techniques are used for systematic analysis of large data sets, and the decision tree is one of the most popular and efficient techniques in data mining. Two approaches are involved, the univariate approach and the multivariate approach, which deal with small amounts of data and with large amounts of data (including noise removal), respectively. Some of the algorithms for generation of classification trees are the following. C4.5 generates classifiers as decision trees and, in addition, can also express classifiers in a more logical rule-set form [4]. CART (Classification and Regression Trees) is used in the fields of artificial intelligence, data mining and machine learning [4]. J48 is an implementation of the C4.5 algorithm; it uses two pruning techniques with a bottom-up strategy in which nodes are replaced by leaves, i.e. it starts from the leaves and moves towards the root node [2]. M5P is commonly used to develop regression trees whose leaves are combinations of multivariate linear models; the splitting attribute at each node is chosen to maximize error reduction, measured as a function of the standard deviation of the output parameter. The M5P algorithm also removes the noisy part of the data, which is not possible with univariate algorithms [5].
The main purpose of this paper is to present a data mining algorithm to be developed which improves the performance and efficiency of the results. There are four sections in this paper: Section 2 describes the previous work, Section 3 describes the current status of the work, which involves the novel hybrid approach to be designed and the flow of the algorithm, and Section 4 comprises the conclusion.
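The growth and pruning phases described above can be tried out with off-the-shelf tools. The sketch below is only a hedged illustration: it uses scikit-learn's CART-based DecisionTreeClassifier with an entropy criterion (C4.5-style splits) and cost-complexity post-pruning; the data set and the alpha value are arbitrary choices of ours, and this is not the algorithm proposed in this paper.

```python
# Illustration of growth phase vs. post-pruning with scikit-learn's
# CART-based classifier (assumed setup, not the paper's proposed method).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Growth phase: fully grown tree (top-down partitioning until leaves are pure).
grown = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_tr, y_tr)

# Pruning phase: post-pruning via the cost-complexity parameter alpha.
pruned = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.02,
                                random_state=0).fit(X_tr, y_tr)

print("grown  nodes:", grown.tree_.node_count, "test acc:", grown.score(X_te, y_te))
print("pruned nodes:", pruned.tree_.node_count, "test acc:", pruned.score(X_te, y_te))
```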


2. PREVIOUS WORK
Earlier, two approaches for classification trees were used. Univariate decision tree: this approach is used for small data sets; one attribute is taken at each internal node and splitting is performed on it. Multivariate decision tree: this approach is used for large data sets; a univariate tree may sometimes result in an inaccurate tree, and the multivariate decision tree removes this issue. The J48 algorithm runs slowly for huge and noisy data, and its space complexity is very high because values are repeatedly stored in arrays; as a result, M5P is used to create decision and regression trees (in the M5P algorithm, P stands for prime). The multivariate approach is better than the univariate approach as it allows dealing with a huge quantity of data [3].
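M5P itself ships with WEKA rather than with a Python library. Purely as a conceptual sketch of the "multivariate linear models at the leaves" idea mentioned above, the following fits a shallow regression tree and then a separate linear model inside each leaf. The synthetic data and the depth/leaf-size settings are our own assumptions, not details taken from M5P or from this paper.

```python
# Conceptual M5P-style model tree: regression tree structure with a
# multivariate linear model fitted in every leaf (assumed setup).
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))                 # two numeric attributes
y = np.where(X[:, 0] > 0, 2 * X[:, 1], -X[:, 1]) + rng.normal(0, 0.1, 300)

tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=30).fit(X, y)
leaves = tree.apply(X)                                # leaf id for every sample

# One linear model per leaf, as in an M5P-style model tree.
leaf_models = {leaf: LinearRegression().fit(X[leaves == leaf], y[leaves == leaf])
               for leaf in np.unique(leaves)}

def predict(x):
    leaf = tree.apply(x.reshape(1, -1))[0]
    return leaf_models[leaf].predict(x.reshape(1, -1))[0]

print(predict(np.array([1.0, 2.0])))                  # roughly 2 * 2 = 4
```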

3. CURRENT STATUS OF WORK


3.1 Scope of the Study
This research focuses on the implementation of an algorithm that is better than the already existing algorithms for decision tree generation. Existing algorithms have many issues regarding the accuracy of data, which create problems for an organization, and to solve those problems an algorithm is being designed. A novel algorithm is to be designed so that the accuracy of the decision tree is increased and an efficient decision tree is generated. The trees generated by existing algorithms may not be sufficiently accurate, so a novel algorithm is to be designed to increase their accuracy. Using the novel approach, efficiency can also be improved, i.e. time complexity is reduced and less time is taken to generate decision trees, so organizations can take better decisions. Results generated using the novel approach will also be more accurate and contain fewer errors.

3.2 Methodology
Data is collected and this raw data is mined for knowledge extraction. Decision trees are then generated using tree-generating tools, and different mining techniques are applied to the data. By doing this, previously unknown knowledge is extracted in order to improve the decision-making process. The collected data may contain a lot of complex aggregation over the raw data; this raw data is converted into knowledge in the form of a well-structured graphical representation known as a decision tree. We try to explore how to improve the performance of the decision tree so that efficiency also increases, and for that the novel approach is being implemented. The novel approach combines several tasks: it creates a decision tree and after that it also removes outliers. Flow of the novel approach: the new algorithm to be designed is applied to processed data, and it comprises some features of traditional algorithms along with some novel features; as a result, an efficient outcome is obtained (a conceptual sketch of this flow is given after Figure 1).

Figure 1: Flowchart of proposed work
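The paper describes the flow only at a high level (Figure 1). As a loose sketch of that flow under our own assumptions, the code below builds a baseline classification tree, removes likely outliers with scikit-learn's IsolationForest (our choice of noise filter, not one named in the paper), and then regenerates the tree on the cleaned data.

```python
# Loose sketch of the Section 3.2 flow: build a tree, remove outliers,
# rebuild the tree. Detector and parameters are illustrative assumptions.
from sklearn.datasets import load_wine
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Step 1: generate an initial classification tree from the processed data.
baseline = DecisionTreeClassifier(random_state=0)
print("baseline acc:", cross_val_score(baseline, X, y, cv=5).mean())

# Step 2: remove likely outliers/noise, then regenerate the tree.
mask = IsolationForest(contamination=0.05, random_state=0).fit_predict(X) == 1
cleaned = DecisionTreeClassifier(random_state=0)
print("cleaned acc:", cross_val_score(cleaned, X[mask], y[mask], cv=5).mean())
```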

4. CONCLUSION
A new algorithm is being proposed which combines some features of two data mining algorithms and improves performance. 100% accuracy cannot be guaranteed for any algorithm in all applications. After analyzing comparisons of different algorithms, it is observed that the novel algorithm to be designed can improve the performance of the classification tree. In the novel approach, some features of the univariate tree algorithm are used, and if noise remains it can be removed by applying some features of the multivariate algorithm, along with some additional features of the new algorithm to be designed. As a result, the performance of the classification tree can be improved compared to earlier algorithms.

References
[1] S. Anupama Kumar, "A Naive Based Approach of Model Pruned Trees on Learners' Response", International Journal of Advanced Research in Computer Science and Software Engineering, 2012, 9, pp. 52-57.
[2] W. Nor Haizan W. Mohamed, "A Comparative Study of Reduced Error Pruning Method in Decision Tree Algorithms", in Proceedings of the IEEE International Conference on Control System, Computing and Engineering, 23-25 Nov. 2012.

[3] Neeraj Bhargava, "Decision Tree Analysis on J48 Algorithm for Data Mining", International Journal of Advanced Research in Computer Science and Software Engineering, 2013.
[4] http://www.slideshare.net/asad.taj/top10-algorithmsdata-mining
[5] Du Zhang and Jeffrey J. P. Tsai, Advances in Machine Learning Applications in Software Engineering.
[6] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd edition, University of Illinois at Urbana-Champaign.

AUTHORS
Srishti Taneja received the B.E. degree in Computer Science Engineering from Lovely Professional University in 2012. She is pursuing an M.Tech. at Lovely Professional University.
Richa Sapra received the B.Tech. in Information Technology from Guru Nanak Dev Engineering College, Ludhiana, in 2007 and the M.Tech. in Computer Science from Lovely Professional University, Phagwara, in 2012. Since 2012 she has been working with Lovely Professional University, Phagwara, as an Assistant Professor.

