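International Journal of Computational Intelligence and Information Security, June 2013 Vol. 4, No. 6 ISSN: 1837-7823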
Abstract
Intrusion detection can be defined as the act of detecting actions that attempt to compromise the confidentiality, integrity or availability of a resource. There are many approaches to intrusion detection, such as anomaly-based, signature-based and machine-learning-based detection. The machine learning approach can be very useful for developing intrusion detection systems. This paper compares various rule-based machine learning algorithms, all of which fall under supervised learning. The NSL-KDD dataset [12] and the Waikato Environment for Knowledge Analysis (WEKA) [3] are used to evaluate the performance of the algorithms.
Keywords: Intrusion detection, NSL-KDD dataset, WEKA
1. Introduction
The purpose of network security is to protect the network from unauthorized access, destruction and disclosure. Many techniques that help protect computer systems and computer networks have emerged in the field of network security. One of the techniques used to secure a network and detect intrusions is the Intrusion Detection System (IDS). An IDS is a mechanism that detects unauthorized and malicious activity in computer systems. There are two main approaches to intrusion detection: signature detection and anomaly detection. Machine learning techniques have also been applied to intrusion detection in many ways. This paper presents the evaluation and results of rule-based machine learning algorithms applied to the NSL-KDD dataset [12].
2. Review of Literature
Intrusion Detection Systems help information systems deal with attacks. An IDS gathers and analyzes information from various areas within a computer or a network to identify intrusions, which include attacks from outside the organization as well as attacks from within it. There are two main approaches to intrusion detection: signature-based detection and anomaly-based detection. The signature-based approach looks for the signatures of known attacks, which exploit weaknesses in system and application software [5]. It applies pattern matching techniques against a frequently updated database of attack signatures. It is useful for detecting already known attacks but not new ones; still, many attacks can be detected this way because they have clear and distinct signatures. Anomaly-based detection examines network traffic and flags data that is incorrect, invalid or generally abnormal. This method is useful for detecting unwanted traffic that is not specifically known [5]. Various methods, such as machine learning and statistical methods, can be used in the anomaly detection approach to distinguish anomalous behavior from normal behavior.

Machine learning algorithms can be applied to intrusion detection problems to find interesting intrusion patterns in data [10]. This requires the data to be labelled. For this purpose the NSL-KDD dataset, which has the required characteristics, is used, and the rule-based learning algorithms are applied to it.

K. Shafi et al. [11] propose a methodology to create a fully labelled network intrusion detection dataset that is suitable for machine learning algorithms. The dataset is created from real background traffic and simulated attacks, and it is evaluated with supervised machine learning algorithms in WEKA. F. Gharibian and A. Ghorbani (2007) [2] compare supervised machine learning techniques, namely Naive Bayes, Gaussian, Decision Tree and Random Forests. They compare the ability of each technique to detect the attack categories in the KDD dataset and, based on the results, propose the most suitable technique for identifying each attack category. According to M. Panda and M. Patra (2007) [9], naive Bayes used as an anomaly-based network intrusion detection technique gives better results than a back-propagation neural network approach in terms of false positive rate, cost and computational time when applied to the KDD99 dataset. Their experiments were carried out in WEKA. J. Zhang et al. (2008) [13] present a framework that applies a data mining algorithm called random forests to misuse, anomaly and hybrid network-based IDSs. In misuse detection, intrusion patterns are built automatically by the random forests algorithm from training data.
In anomaly detection, novel intrusions are detected by the outlier detection mechanism of the random forests algorithm. G. Oreku and F. Mtenzi (2009) [8] present the use of data mining techniques to discover consistent and useful patterns of system features that describe program and user behavior. According to them, a set of relevant system features can be used to recognize both anomalies and known intrusions. D. Zhao et al. (2010) [14] propose a hybrid IDS that combines network and host IDSs and operates in anomaly and misuse detection modes. Data mining programs are applied to learn rules that capture the behavior of intrusions and of normal activities. K. Qazanfari et al. (2012) [10] propose an intrusion detection system that uses Support Vector Machine (SVM) and Multilayer Perceptron (MLP) machine learning algorithms to distinguish normal from abnormal behavior. S. M. Hussein et al. (2012) [4] discuss an anomaly detection engine based on the Naive Bayes, J48graft decision tree and Bayes Net algorithms in WEKA.
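To make the contrast between the signature-based and anomaly-based approaches described at the start of this section concrete, here is a toy Python sketch. The signature strings, the baseline of connection sizes and the three-standard-deviation threshold are all illustrative assumptions, not values taken from this paper or from any real IDS.

```python
import statistics

# Hypothetical signature database: substrings that identify known attacks.
SIGNATURES = {
    "dir-traversal": "../../",
    "sql-injection": "' OR '1'='1",
}

def signature_match(payload):
    """Signature-based check: names of known attack patterns found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

def is_anomalous(value, baseline, k=3.0):
    """Anomaly-based check: True if value lies more than k population
    standard deviations away from the mean of the normal baseline."""
    mean = statistics.mean(baseline)
    spread = statistics.pstdev(baseline)
    return abs(value - mean) > k * spread

# Hypothetical bytes-per-connection values observed during normal operation.
BASELINE = [200, 220, 210, 190, 205, 215]

print(signature_match("GET /../../etc/passwd HTTP/1.0"))  # -> ['dir-traversal']
print(signature_match("GET /index.html HTTP/1.0"))         # -> []
print(is_anomalous(5000, BASELINE))                        # -> True
print(is_anomalous(212, BASELINE))                         # -> False
```

The signature check can only report patterns already in its database, so a new attack with an unseen payload passes it unnoticed; the anomaly check needs no knowledge of attacks, only a model of normal behavior, which is why it can flag unknown traffic but also why it depends on that model being accurate.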
3. Methodology
This section presents the methodology used to carry out the work. The workflow is described in Figure 1. The main aim is to analyze the performance of the rule-based classifiers available in WEKA. For this purpose the NSL-KDD dataset is applied to each rule-based classifier, and the outputs of the classifiers are compared with one another. The measures used for the comparison are accuracy, false alarm rate, and the numbers of correctly and incorrectly classified instances. The best-performing rule-based classifier is identified by analyzing the results on these metrics.
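The comparison step can be sketched as follows: each classifier's predictions on the same labelled test set are reduced to the measures listed above, and the classifiers are then compared. This is a minimal Python sketch under several assumptions: the labels and the two classifiers are hypothetical, the false alarm rate is taken to be WEKA's per-class FP rate for class Normal (matching the "(class=Normal)" label used in the results), and the tie-breaking rule for choosing the best classifier is our own choice, not one stated in the paper.

```python
# Hypothetical labels for a tiny test set and two hypothetical classifiers.
ACTUAL = ["normal", "normal", "anomaly", "anomaly", "anomaly", "normal"]
PREDICTIONS = {
    "classifier_A": ["normal", "anomaly", "anomaly", "anomaly", "normal", "normal"],
    "classifier_B": ["normal", "normal", "anomaly", "anomaly", "anomaly", "anomaly"],
}

def evaluate(actual, predicted, cls="normal"):
    """Correct/incorrect counts, overall accuracy (%) and FP rate (%) for `cls`."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    # FP rate for `cls`: share of instances of other classes predicted as `cls`.
    others = [p for a, p in zip(actual, predicted) if a != cls]
    fp_rate = 100.0 * sum(p == cls for p in others) / len(others)
    return {
        "correct": correct,
        "incorrect": len(actual) - correct,
        "accuracy": 100.0 * correct / len(actual),
        "fp_rate": fp_rate,
    }

results = {name: evaluate(ACTUAL, pred) for name, pred in PREDICTIONS.items()}

# Assumed selection rule: highest accuracy first, lower FP rate as tie-breaker.
best = max(results, key=lambda name: (results[name]["accuracy"], -results[name]["fp_rate"]))

for name, metrics in results.items():
    print(name, metrics)
print("best:", best)
```

Here classifier_B wins because it misclassifies one instance instead of two; with real WEKA output the same reduction would be applied to the per-instance predictions or, more simply, to the summary statistics WEKA already prints.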
For the experiments, KDDTest-21.arff, a subset of the KDDTest+.arff file, is used. It contains 11850 instances [7]. The attribute names and types are listed in Table 1.

Table 1: Attribute name and type in KDDTest-21.arff file
Attribute Name      Attribute Type  Attribute Name               Attribute Type
Duration            Real            is_guest_login               Nominal
protocol_type       Nominal         Count                        Real
Service             Nominal         srv_count                    Real
Flag                Nominal         serror_rate                  Real
src_bytes           Real            srv_serror_rate              Real
dst_bytes           Real            rerror_rate                  Real
Land                Nominal         srv_rerror_rate              Real
wrong_fragment      Real            same_srv_rate                Real
Urgent              Real            diff_srv_rate                Real
Hot                 Real            srv_diff_host_rate           Real
num_failed_logins   Real            dst_host_count               Real
logged_in           Nominal         dst_host_srv_count           Real
num_compromised     Real            dst_host_same_srv_rate       Real
root_shell          Real            dst_host_diff_srv_rate       Real
su_attempted        Real            dst_host_same_src_port_rate  Real
num_root            Real            dst_host_srv_diff_host_rate  Real
num_file_creations  Real            dst_host_serror_rate         Real
num_shells          Real            dst_host_srv_serror_rate     Real
num_access_files    Real            dst_host_rerror_rate         Real
num_outbound_cmds   Real            dst_host_srv_rerror_rate     Real
is_host_login       Nominal         Class                        Nominal
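The Real/Nominal distinction in Table 1 comes from the ARFF header, where nominal attributes list their permitted values in braces and numeric attributes are declared with a type keyword such as real. A minimal, standard-library-only Python sketch that recovers such a table from a header is shown below. The header excerpt is an illustrative assumption written in ARFF syntax, not a verbatim copy of KDDTest-21.arff, and the hypothetical `parse_arff_header` function only distinguishes the nominal and numeric types that appear in Table 1.

```python
import re

# "@attribute <name> <type>", where <name> may be wrapped in single quotes.
ATTRIBUTE_LINE = re.compile(r"^@attribute\s+('[^']*'|\S+)\s+(.+)$", re.IGNORECASE)

def parse_arff_header(lines):
    """Return (name, 'Nominal' | 'Real') for each attribute declared before @data."""
    attributes = []
    for raw in lines:
        line = raw.strip()
        if line.lower().startswith("@data"):
            break  # the header ends where the data section starts
        match = ATTRIBUTE_LINE.match(line)
        if match is None:
            continue  # @relation, % comments, blank lines
        name = match.group(1).strip("'")
        spec = match.group(2).strip()
        # Nominal attributes enumerate their values in braces; this sketch
        # assumes every other attribute is numeric, as in Table 1.
        kind = "Nominal" if spec.startswith("{") else "Real"
        attributes.append((name, kind))
    return attributes

# Illustrative header excerpt in ARFF syntax (not copied from the real file).
SAMPLE_HEADER = """@relation 'KDDTest-21'
@attribute 'duration' real
@attribute 'protocol_type' {'tcp','udp','icmp'}
@attribute 'land' {'0','1'}
@attribute 'src_bytes' real
@attribute 'class' {'normal','anomaly'}
@data""".splitlines()

for name, kind in parse_arff_header(SAMPLE_HEADER):
    print(f"{name:<16}{kind}")
```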
5) OneR: Learns a one-level decision tree, i.e. a set of rules that all test one particular attribute, chosen as the attribute whose rules make the fewest errors.
6) Part: Generates a PART decision list using a separate-and-conquer strategy: in each iteration it builds a partial C4.5 decision tree and turns the best leaf into a rule.
7) NNge: A nearest-neighbour-like algorithm that uses non-nested generalized exemplars.
8) Ridor: An implementation of a Ripple-Down Rule learner, which first generates a default rule and then the exceptions to it with the least (weighted) error rate.
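The idea behind OneR can be shown in a few lines. The following is a minimal Python sketch of the 1R procedure for nominal attributes only: for each attribute it builds one rule per attribute value (predict that value's most frequent class), counts the training errors, and keeps the attribute with the fewest errors. WEKA's OneR additionally discretizes numeric attributes, which this sketch omits; the toy records and the functions `one_r` and `predict` are hypothetical illustrations, not WEKA code.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """1R on nominal attributes: return (attribute index, value->class rules, training errors)."""
    best = None
    for attr in range(len(rows[0])):
        # Class frequencies for every value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # One rule per value: predict the value's most frequent class.
        rules = {value: freq.most_common(1)[0][0] for value, freq in counts.items()}
        # Errors = training instances whose class differs from their value's rule.
        errors = sum(sum(freq.values()) - freq[rules[value]] for value, freq in counts.items())
        if best is None or errors < best[2]:  # ties keep the earlier attribute
            best = (attr, rules, errors)
    return best

def predict(model, row, default):
    """Apply the learned rule set; unseen attribute values fall back to `default`."""
    attr, rules, _ = model
    return rules.get(row[attr], default)

# Hypothetical toy records: (protocol_type, flag) -> class.
ROWS = [("tcp", "SF"), ("udp", "SF"), ("tcp", "SF"),
        ("tcp", "S0"), ("icmp", "REJ"), ("tcp", "S0")]
LABELS = ["normal", "normal", "normal", "anomaly", "anomaly", "anomaly"]

model = one_r(ROWS, LABELS)
print(model)
print(predict(model, ("udp", "S0"), default="normal"))  # -> anomaly
```

On this toy data protocol_type leaves two training errors (tcp is split evenly between the classes) while flag leaves none, so the learned model is a single-attribute rule set on flag.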
Table 2: Comparison of various measures for rule-based classifiers
Classifier       Correctly classified  Incorrectly classified  Overall   False alarm rate
                 instances             instances               accuracy  (class=Normal)
ConjunctiveRule  10258                 1592                    86.5654%  8.7%
DecisionTable    11268                 582                     95.0886%  3.5%
DTNB             11314                 536                     95.4768%  3.4%
JRip             11492                 358                     96.9789%  1.8%
OneR             10867                 983                     91.7046%  4.3%
Part             11540                 310                     97.384%   1.7%
NNge             11401                 449                     96.211%   2%
Ridor            11386                 464                     96.0844%  3.2%
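The counts in Table 2 are internally consistent: every row sums to the 11850 test instances, and overall accuracy is correct / (correct + incorrect). The short Python check below recomputes the accuracy column from the two count columns and orders the classifiers by it. The dictionary is transcribed from Table 2; the ranking uses overall accuracy alone, which is only one of the metrics the methodology considers.

```python
# (correctly classified, incorrectly classified, reported accuracy %) from Table 2.
TABLE_2 = {
    "ConjunctiveRule": (10258, 1592, 86.5654),
    "DecisionTable": (11268, 582, 95.0886),
    "DTNB": (11314, 536, 95.4768),
    "JRip": (11492, 358, 96.9789),
    "OneR": (10867, 983, 91.7046),
    "Part": (11540, 310, 97.384),
    "NNge": (11401, 449, 96.211),
    "Ridor": (11386, 464, 96.0844),
}

def overall_accuracy(correct, incorrect):
    """Percentage of test instances classified correctly."""
    return 100.0 * correct / (correct + incorrect)

ranking = sorted(
    ((name, overall_accuracy(c, i)) for name, (c, i, _) in TABLE_2.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, acc in ranking:
    print(f"{name:<16}{acc:.4f} %")
```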
Table 3: Class-wise accuracy
Classifier       Normal  Anomaly
ConjunctiveRule  65%     91.3%
DecisionTable    88.8%   96.5%
DTNB             90.5%   96.6%
JRip             91.7%   98.2%
OneR             73.9%   95.7%
Part             93.4%   98.3%
NNge             88%     98%
Ridor            92.8%   96.8%
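The paper does not define "class-wise accuracy" or "false alarm rate (class=Normal)" explicitly, but the numbers suggest WEKA's per-class TP rate and FP rate. In a two-class problem, WEKA's FP rate for class Normal is the share of Anomaly instances predicted as Normal; since every Anomaly instance is predicted either Normal or Anomaly, that share equals 100% minus the TP rate of class Anomaly. The Python check below, with values transcribed from Tables 2 and 3, shows that this relation holds for every classifier. This is an inference from the reported figures, not a definition given by the authors.

```python
# (Anomaly accuracy % from Table 3, false alarm rate (class=Normal) % from Table 2).
REPORTED = {
    "ConjunctiveRule": (91.3, 8.7),
    "DecisionTable": (96.5, 3.5),
    "DTNB": (96.6, 3.4),
    "JRip": (98.2, 1.8),
    "OneR": (95.7, 4.3),
    "Part": (98.3, 1.7),
    "NNge": (98.0, 2.0),
    "Ridor": (96.8, 3.2),
}

def fp_rate_from_tp_rate(tp_rate_other_class):
    """Two-class case: FP rate (%) for one class = 100 - TP rate (%) of the other class."""
    return 100.0 - tp_rate_other_class

for name, (anomaly_acc, far) in REPORTED.items():
    derived = fp_rate_from_tp_rate(anomaly_acc)
    print(f"{name:<16}derived {derived:.1f}%  reported {far:.1f}%")
```

Read this way, the "false alarm rate" column measures Anomaly traffic that slipped through as Normal, which is worth keeping in mind when comparing it with the more common IDS usage of "false alarm" (Normal traffic flagged as an attack).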
Figure 2: Classification of instances by rule-based classifiers (number of correctly and incorrectly classified instances)
Figure 4: False alarm rate of rule-based classifiers
Figure 5: Class-wise accuracy of rule-based classifiers for class Normal and class Anomaly
References
[1] Chandolikar N. S., and Nandavadekar V. D., (2012), Comparative analysis of two algorithms for intrusion attack classification using KDD CUP dataset, International Journal of Computer Science and Engineering (IJCSE), Vol. 1, Issue 1, pp. 81-88.
[2] Gharibian F., and Ghorbani A. A., (2007), Comparative study of supervised machine learning techniques for intrusion detection, In Fifth Annual Conference on Communication Networks and Services Research (CNSR'07), pp. 350-358, IEEE.
[3] Hall M., Frank E., Holmes G., Pfahringer B., Reutemann P., and Witten I. H., (2009), The WEKA Data Mining Software: An Update, SIGKDD Explorations, Vol. 11, Issue 1.
[4] Hussein S. M., Ali F., and Kasiran Z., (2012), Evaluation Effectiveness of Hybrid IDS Using Snort with Naive Bayes to Detect Attacks, IEEE.
[5] Information Assurance Tools Report: Intrusion Detection Systems, (2009), IATAC, Herndon, VA.
[6] Kurundkar G. D., Naik N. A., and Khamitkar S. D., (2012), Network Intrusion Detection using SNORT, International Journal of Engineering Research and Applications (IJERA), Vol. 2, Issue 2, pp. 1288-1296.
[7] NSL-KDD data set for network-based intrusion detection systems, (2009), Available on: http://nsl.cs.unb.ca/NSL-KDD/.
[8] Oreku G. S., and Mtenzi F. J., (2009), Intrusion Detection Based on Data Mining, In Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing (DASC'09), pp. 696-701, IEEE.
[9] Panda M., and Patra M. R., (2007), Network intrusion detection using naive Bayes, International Journal of Computer Science and Network Security, Vol. 7, No. 12.
[10] Qazanfari K., Mirpouryan S., and Gharaee H., (2012), A Novel Hybrid Anomaly Based Intrusion Detection Method, 6th International Symposium on Telecommunications (IST'2012).
[11] Shafi K., Abbass H. A., and Zhu W., A Methodology to Evaluate Supervised Learning Algorithms for Intrusion Detection.
[12] Tavallaee M., Bagheri E., Lu W., and Ghorbani A. A., (2009), A Detailed Analysis of the KDD CUP 99 Data Set, Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defence Applications.
[13] Zhang J., Zulkernine M., and Haque A., (2008), Random-Forests-Based Network Intrusion Detection Systems, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 38, No. 5.
[14] Zhao D., Xu Q., and Feng Z., (2010), Analysis and Design for Intrusion Detection System Based on Data Mining, Second International Workshop on Education Technology and Computer Science.