
IPASJ International Journal of Computer Science (IIJCS)

Web Site: http://www.ipasj.org/IIJCS/IIJCS.htm


A Publisher for Research Motivation ........ Email:editoriijcs@ipasj.org
Volume 5, Issue 7, July 2017 ISSN 2321-5992

A Decision Tree Model for Discovering the Nature of Travel Expenses of Employees
C. Ilaiyaraja 1, Dr. N. R. Ananthanarayanan 2

1 Research Scholar, Department of CSA, SCSVMV University, Kanchipuram, T.N.
2 Associate Professor, Department of CSA, SCSVMV University, Kanchipuram, T.N.

Abstract
The travelling expenses of a concern can rise to alarming levels if they are not monitored and regulated. In the present system, a large
difference in travel expenses was noticed among employees at the same level. Hence, an attempt was made to regulate the travel
expenses incurred by employees of different levels who go on official travel.

Keywords: Data Mining, Decision Tree, Knowledge Discovery, Rule based Classification

1. INTRODUCTION
This research work aims to streamline the travel expenses of employees who go on official travel. For the purpose
of the study, the travelling expenses are categorized into
Travel Request Wise details,
City Wise details, and
Employee Level Wise details.

OBJECTIVES OF THE STUDY: The objective of this work is to regulate the travel expenses incurred by employees by
categorizing them as
Approved: expenses that fall within a permissible range and can be approved
Refer: expenses that should be approved by a senior
Query: expenses for which a query is made to the employee seeking an explanation

2. REVIEW OF LITERATURE
Teklu Urgessa, Wookjae Maeng and Joong Seek Lee [1] experimented with five implementations of three data mining
classification techniques to extract important insights from tourism data. The dataset contained 12030
instances and 56 attributes before selection and preprocessing. The implementations selected for comparison were
Decision Tree, C4.5 (J48 in Weka)
Random Forest
SMO
Projective Adaptive Resonance Theory (PART)
Multilayer Perceptron (MLP)

The authors found that the Random Forest algorithm outperformed the rest (76%) on the entire attribute set.

Yan-Ying Chen, An-Jung Cheng, and Winston H. Hsu [2] focused on a personalized recommendation framework to
provide a context-aware recommendation system. The experiments were conducted on 19 major cities in the world and
established that using people attributes and travel group types has the potential to improve personalized
travel recommendation, especially in locations where people have diverse choices of next stops.

The authors adopted a Bayesian learning model because of its effectiveness in recommendation systems
and, most importantly, because it can be applied to a real-time mobile recommendation service.


Pairaya Juwattanasamran, Sarawut Supattranuwong and Sukree Sinthupinyo [3] attempted to identify
travellers' interests from their search behaviour when they search for tourism destinations. Questionnaires
were used as a tool to collect data.

Using the RapidMiner program with a training set of 2,000 transactions, they found 78 rules with a minimum support of 0.0230,
a maximum support of 0.490, a minimum confidence of 0.211 and a maximum confidence of 1.000. The authors claimed that their
paper demonstrated that applying data mining to the tourism sector can increase the opportunity for tourism firms
to operate competitively and respond to travellers' demand effectively.

Nitesh V. Chawla [4] studied issues concerning decision trees and imbalanced data sets. The author considers a data
set imbalanced if the classes are not equally represented. A popular way to deal with imbalanced data sets, according
to the author, is to either over-sample the minority class or under-sample the majority class.

The author presented two versions of over-sampling: one replicates each minority class example, and the other
creates new synthetic examples (SMOTE, Synthetic Minority Over-sampling TEchnique). The author observed in his
study that SMOTE was, on average, better than the under-sampling and plain over-sampling techniques.
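The idea of creating synthetic minority examples can be sketched as follows. This is a simplified illustration of the interpolation step only, not the procedure from [4]: the function name `smote_sketch`, the parameter `k`, and the toy points are assumptions made for the example.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical minority-class points (two numeric features each).
minority_points = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0)]
new_points = smote_sketch(minority_points, n_new=5)
print(new_points)
```

Each synthetic point lies on a line segment between two existing minority points, which is what distinguishes this approach from simple replication.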

Jiao Yabing [5] proposed an improved Apriori algorithm and described the difference between the
traditional Apriori algorithm and the improved one. The optimised algorithm prunes L(k-1) before Ck is
constructed. The author claimed that this decreases the number of possible combinations, reduces the number of candidate
item sets in Ck, and reduces the number of times the process is repeated. For large databases, this algorithm can save
time and increase the efficiency of data mining.

3. DESIGN OF THE STUDY


3.1 DATA SOURCE: Historical data on past expenses incurred are used in this study. The population data
consist of 5242 records after irrelevant data were eliminated from the study.
3.2 DATA ORGANISATION: The data considered for analysis in this study were categorized on the basis of
3.2.1 Travel Request Type: Five travel request types were present in the population data, viz. 1. Relocation
2. Transfer 3. Deputation 4. Business Travel 5. Candidate Interview. Business Travel accounted for 85% of the
transactions and the other request types together for only 15%. Hence the research work is
performed on the basis of Request Type = Business Travel and Request Type = Others.
3.2.2 City: The population data cover the cities to which employees have travelled so far. The cities
are categorized as Elite, Elegant and Classic based on the number of transactions.
3.2.3 Employee Level: The employee levels are categorized as Executive, Senior Executive, Manager and Senior
Manager on the basis of designation.
3.3 VARIABLES STUDIED: The following are the variables considered for the study:
1. Employee Level: categorized as Executive, Sr. Executive, Manager, Sr. Manager
2. Start date of Travel: From Date
3. End date of Travel: To Date
4. Period of Travel: To Date - From Date
5. To Place: City, categorized as Elite, Elegant and Classic
6. Request Type: Business Travel and Others
7. Amount: For the purpose of research, the per-day cost is calculated, i.e. amount / number of days on travel, which gives
the average travel cost per day. The number of travel days is calculated by subtracting From Date from To Date.
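The per-day cost described in item 7 can be sketched as follows. The function name `per_day_cost` and the sample figures are hypothetical. Treating a same-day trip as one travel day is also an assumption, since the paper does not say how a zero-day period is handled.

```python
from datetime import date

def per_day_cost(amount: float, from_date: date, to_date: date) -> float:
    """Average travel cost per day: amount / (To Date - From Date)."""
    days = (to_date - from_date).days
    if days < 0:
        raise ValueError("To Date must not be earlier than From Date")
    if days == 0:
        # Assumption: a same-day trip counts as one travel day.
        days = 1
    return amount / days

# Hypothetical trip: 900 spent between 1 March and 6 March 2017 (5 days).
cost = per_day_cost(900.0, date(2017, 3, 1), date(2017, 3, 6))
print(cost)  # 180.0
```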
3.4 TRAINING DATA: The training data comprise 500 records selected from the sampling data and
modified for use by the Weka data mining tool.
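The paper does not state which file format was used. As one possibility, the records could be written in ARFF, Weka's native text format. The sketch below assumes that choice; the attribute names, the relation name, and the sample record are hypothetical.

```python
# Attribute layout; None marks a numeric attribute, a list marks a nominal one.
ATTRIBUTES = [
    ("request_type", ["Business Travel", "Others"]),
    ("city", ["Elite", "Elegant", "Classic"]),
    ("employee_level", ["Executive", "Sr. Executive", "Manager", "Sr. Manager"]),
    ("per_day_cost", None),
    ("status", ["Approved", "Refer", "Query"]),
]

def quote(value):
    """ARFF values containing spaces must be quoted."""
    text = str(value)
    return "'" + text + "'" if " " in text else text

def to_arff(records, relation="travel_expenses"):
    """Render a list of dict records as ARFF text."""
    lines = ["@relation " + relation, ""]
    for name, values in ATTRIBUTES:
        if values is None:
            lines.append("@attribute %s numeric" % name)
        else:
            nominal = ",".join(quote(v) for v in values)
            lines.append("@attribute %s {%s}" % (name, nominal))
    lines += ["", "@data"]
    for rec in records:
        lines.append(",".join(quote(rec[name]) for name, _ in ATTRIBUTES))
    return "\n".join(lines) + "\n"

# Hypothetical training record.
sample = [{
    "request_type": "Business Travel",
    "city": "Elite",
    "employee_level": "Sr. Manager",
    "per_day_cost": 150.0,
    "status": "Approved",
}]
arff_text = to_arff(sample)
print(arff_text)
```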
3.5 IDENTIFYING EXPENSE RANGE: The expense ranges identified for the different groups are given in the table
below, which forms the basis for framing the classification rules.


Table 1: Rules Framed

Sl.  Request Type     City     Employee Level  Expense Range  Percentage  Status
1    Business Travel  Elite    Sr. Manager     Any amount     -           Approved
2    Business Travel  Elegant  Sr. Manager     Any amount     -           Approved
3    Business Travel  Classic  Sr. Manager     Any amount     -           Approved
4    Business Travel  Elite    Manager         <=180          66%         Approved
5    Business Travel  Elite    Manager         181-240        26%         Refer
6    Business Travel  Elite    Manager         >240           8%          Query
7    Business Travel  Elegant  Manager         <=140          71%         Approved
8    Business Travel  Elegant  Manager         141-220        22%         Refer
9    Business Travel  Elegant  Manager         >220           8%          Query
10   Business Travel  Classic  Manager         Any amount     -           Approved
11   Business Travel  Elite    Sr. Executive   <=110          67%         Approved
12   Business Travel  Elite    Sr. Executive   111-160        22%         Refer
13   Business Travel  Elite    Sr. Executive   >160           11%         Query
14   Business Travel  Elegant  Sr. Executive   <=90           69%         Approved
15   Business Travel  Elegant  Sr. Executive   91-140         21%         Refer
16   Business Travel  Elegant  Sr. Executive   >140           10%         Query
17   Business Travel  Classic  Sr. Executive   <=130          66%         Approved
18   Business Travel  Classic  Sr. Executive   >130           34%         Refer
19   Business Travel  Elite    Executive       <=60           66%         Approved
20   Business Travel  Elite    Executive       61-110         28%         Refer
21   Business Travel  Elite    Executive       >110           6%          Query
22   Business Travel  Elegant  Executive       <=50           67%         Approved
23   Business Travel  Elegant  Executive       51-100         27%         Refer
24   Business Travel  Elegant  Executive       >100           6%          Query
25   Business Travel  Classic  Executive       <=80           68%         Approved
26   Business Travel  Classic  Executive       >80            32%         Refer
27   Others           Any      Any             <=180          67%         Approved
28   Others           Any      Any             181-230        30%         Refer
29   Others           Any      Any             >230           3%          Query
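The rules in Table 1 can be read as a lookup of two thresholds per group. The following is a minimal sketch of that reading, not the authors' implementation. The names `BUSINESS_RULES`, `OTHERS_RULE` and `classify` are hypothetical, the city category is spelled "Elegant" as in Section 3.2.2, and amounts that fall between the integer boundaries of the table (for example 180.5) are assumed to belong to the higher range.

```python
import math

INF = math.inf

# (city, employee level) -> (approve_max, refer_max).
# Amounts above refer_max are queried; INF means the band has no upper limit.
BUSINESS_RULES = {
    ("Elite", "Sr. Manager"): (INF, INF),
    ("Elegant", "Sr. Manager"): (INF, INF),
    ("Classic", "Sr. Manager"): (INF, INF),
    ("Elite", "Manager"): (180, 240),
    ("Elegant", "Manager"): (140, 220),
    ("Classic", "Manager"): (INF, INF),
    ("Elite", "Sr. Executive"): (110, 160),
    ("Elegant", "Sr. Executive"): (90, 140),
    ("Classic", "Sr. Executive"): (130, INF),
    ("Elite", "Executive"): (60, 110),
    ("Elegant", "Executive"): (50, 100),
    ("Classic", "Executive"): (80, INF),
}
OTHERS_RULE = (180, 230)  # rows 27-29: any city, any employee level

def classify(request_type, city, level, amount):
    """Return 'Approved', 'Refer' or 'Query' for one expense."""
    if request_type == "Business Travel":
        approve_max, refer_max = BUSINESS_RULES[(city, level)]
    else:
        approve_max, refer_max = OTHERS_RULE
    if amount <= approve_max:
        return "Approved"
    if amount <= refer_max:
        return "Refer"
    return "Query"

print(classify("Business Travel", "Elite", "Manager", 200))  # Refer
```

Keeping the thresholds in a table-like dictionary means a change to an expense range touches one entry rather than a chain of conditionals.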

4. METHODOLOGY
4.1 MODEL FOR ANALYSIS: The J48 algorithm was used to build a decision tree model for analysis because of its
ability to represent the outcome as a tree-like structure. A tree-like structure in the result helps to identify the
knowledge in a simple and effective manner. The training and test data are used for this analysis.
Pseudocode: The general algorithm for building decision trees is:
1. Check for the base cases.
2. For each attribute a, find the normalized information gain (gain ratio) from splitting on a.
3. Let a_best be the attribute with the highest normalized information gain.
4. Create a decision node that splits on a_best.
5. Recurse on the subsets obtained by splitting on a_best, and add those nodes as children of the decision node.
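Steps 2 and 3 can be sketched for nominal attributes as follows. This is an illustration of the gain ratio criterion only and not Weka's J48 code, which additionally handles numeric attributes, missing values and pruning. The function names and the toy records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Information gain from splitting on attr, normalized by split information."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(
        (len(g) / total) * math.log2(len(g) / total) for g in groups.values()
    )
    # An attribute with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0

def best_attribute(rows, labels, attrs):
    """Step 3: pick the attribute with the highest gain ratio."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))

# Toy records (hypothetical): the label depends only on the employee level.
rows = [
    {"city": "Elite", "level": "Manager"},
    {"city": "Elite", "level": "Executive"},
    {"city": "Classic", "level": "Manager"},
    {"city": "Classic", "level": "Executive"},
]
labels = ["Approved", "Refer", "Approved", "Refer"]
print(best_attribute(rows, labels, ["city", "level"]))  # level
```

In the toy data, splitting on `level` separates the classes completely (gain ratio 1.0), while splitting on `city` gives no information (gain ratio 0.0).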

4.2 ANALYSIS OF DATA: The Weka data mining tool is used to construct the decision tree because of its
effectiveness in representing the tree structure for the J48 algorithm. The constructed tree is shown below.

Figure 1 Decision Tree

Figure 2 Class Label Prediction

5. CONCLUSION
The Decision Tree model helped to identify the ranges of expenses that can be approved, that should be
referred, and that are to be queried. It also helped to predict the class label based on the supplied training
data.

References
[1] A. Teklu Urgessa, Wookjae Maeng and Joong Seek Lee (2017) Application of Data Mining Techniques for
Tourism Knowledge Discovery, International Journal of Computer, Electrical, Automation, Control and
Information Engineering, Vol. 11, No. 1.
[2] Yan-Ying Chen, An-Jung Cheng, and Winston H. Hsu (2013) Travel Recommendation by Mining People
Attributes and Travel Group Types From Community-Contributed Photos, IEEE Transactions on Multimedia,
Vol. 15, No. 6.


[3] Pairaya Juwattanasamran, Sarawut Supattranuwong and Sukree Sinthupinyo (2013) Applying Data Mining to
Analyze Travel Pattern in Searching Travel Destination Choices, The International Journal of Engineering and
Science, Vol. 2, Issue 4, Pages 38-44.
[4] Nitesh V. Chawla (2003) C4.5 and Imbalanced Data sets: Investigating the effect of sampling method,
probabilistic estimate, and decision tree structure, ICML, Washington DC.
[5] Jiao Yabing (2013) Research of an Improved Apriori Algorithm in Data Mining Association Rules, International
Journal of Computer and Communication Engineering, Vol. 2, No. 1.

AUTHORS
Ilaiyaraja C completed his MCA degree at Bharathidasan University in 2012. He is an expert in the
area of developing software solutions. He is currently employed as a Lead Software Engineer providing
software-based support to clients. His research interest is Data Mining.

Dr. N. R. Anantha Narayanan is working as an Associate Professor in the Department of Computer
Science and Applications, SCSVMV University, Enathur, Kanchipuram. He received his
Ph.D. degree in the area of computer science from SCSVMV University. His research interests are
E-Learning, Data Mining, Soft Computing, and Data Analysis using SPSS, AMOS and R-Software.
