I. INTRODUCTION
Information has become a valuable resource
in modern society. However, the security of
information systems is a critical problem because of the openness of networks. In network
security, a completely secure system is not achievable. Instead, it is more practical to establish a system that is easy to
implement and, at the same time, to construct
a corresponding security assistance system
according to the security policies. Intrusion
detection is such a security system: it serves as
a supplement to firewalls, which defend the
computer system against attacks [1].
Intrusion detection detects external intrusions and supervises unauthorized activities of
internal users by identifying and responding to
malicious network communication and computer usage behavior. Intrusion detection aims
to detect intrusions by studying the process
and characteristics of intrusion behavior, thereby enabling a real-time response to intrusion
events and the invasion process. Two basic
intrusion detection technologies exist, namely,
anomaly detection and misuse detection [2].
Currently, most of the relevant literature
focuses on intrusion detection based on machine learning and combines different criteria
Related work compared by technique, dataset, problem domain, evaluation method, and baseline:
Devaraju S et al. [3]: ARMA; KDD-Cup99; misuse detection; DR, FA
[7]: TANN; KDD-Cup99; anomaly detection; DR, accuracy, FP, FN; K-NN/SVM
Lin WC et al. [8]: CANN; KDD-Cup99; anomaly detection; K-NN/SVM
A.S. Eesa et al. [5]: DT; KDD-Cup99; anomaly detection; N/A
E. de la Hoz et al. [6]: GHSOM; DARPA/NSL-KDD; anomaly detection; NB, RF, DT
Wang FN et al. [9]: KPCA + RVM; KDD-Cup99; anomaly detection; RVM
areas, evaluation methods, and baseline classifiers. Most studies used the KDD Cup 99 dataset. The most widely used evaluation measures
are accuracy and detection rate, and the k-NN classifier is the most widely used baseline classifier. Most techniques in the literature
combine two methods for intrusion detection:
a clustering method is applied first, and a classifier is then used.
Tsai CF and Lin CY proposed an intrusion detection technique based on hybrid machine
learning [7]. First, k-means clustering is performed to obtain cluster centers. Then, the area of the triangle formed by two cluster centers and
one data point from the given dataset is calculated,
thereby forming a new feature signature of the
data. Finally, the k-NN classifier is used to
classify similar attacks on the basis of the new
feature represented by the triangular areas [7].
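The triangle-area step described above can be sketched as follows. This is a minimal illustration, not the implementation from [7]: the function names `triangle_area` and `triangle_area_features`, the use of Heron's formula, and the toy cluster centers are all assumptions made for the example.

```python
import itertools
import math


def triangle_area(p, c1, c2):
    """Area of the triangle p-c1-c2, computed from its side lengths (Heron's formula)."""
    a = math.dist(p, c1)
    b = math.dist(p, c2)
    c = math.dist(c1, c2)
    s = (a + b + c) / 2.0
    # Clamp at zero: rounding can make the product slightly negative for collinear points.
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))


def triangle_area_features(point, centers):
    """Return one triangle area per unordered pair of cluster centers."""
    return [
        triangle_area(point, c1, c2)
        for c1, c2 in itertools.combinations(centers, 2)
    ]


# Hypothetical cluster centers (in practice they would come from k-means).
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
point = (4.0, 3.0)
areas = triangle_area_features(point, centers)
print(areas)
```

With k cluster centers this yields k(k-1)/2 areas per data point, which would then be passed to a k-NN classifier in place of the original features.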
Lin WC, Ke SW, and Tsai CF also used an intrusion detection technique based on hybrid
machine learning [8]. They first used k-means
clustering to obtain cluster centers and nearest
neighbors, and then they obtained a new feature by calculating the distances between data
points and cluster centers or nearest neighbors.
Finally, the k-NN classifier and SVM were used
for classification based on the new feature. They
achieved a detection accuracy close to that of
[Figure 1: density distributions by class; panels: normal, R2L, DOS, U2R; axis label: Density]
To illustrate the validity of density, the density distributions of the normal and abnormal classes
in the KDD Cup 99 corpus are calculated, as
shown in Figure 1. The definition of local density is given later.
Figure 1 indicates that the density distribution of each data type is different. The local
density of normal data is mostly distributed at
higher values of approximately 497-500, and
[Figure 1, continued: panel PROBING; axis label: Density]
[Figure: flowchart of the method. Labels: dataset S; local density; distance between Ci and point (d1); nearest neighbors Ni; distance between Ni and point (d2); Distance; Dlabel; no label data; Testing; K-NN classify; Label = normal; Label = attacks]
calculating the local density. The dotted line represents the truncated distance dc. To calculate
the local density of point I, a circle of radius dc is drawn
around center I.
Then, the number of points within this circle
is counted. The density of point I shown in
the graph is equal to 5.
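The counting procedure described above can be sketched in a few lines. This is an illustrative sketch only: the function name `local_density`, the strict comparison against dc, the exclusion of the point itself from the count, and the toy coordinates are assumptions, chosen so that the result matches the value of 5 in the example.

```python
import math


def local_density(points, index, dc):
    """Count the other points that lie strictly within distance dc of points[index]."""
    center = points[index]
    return sum(
        1
        for j, other in enumerate(points)
        if j != index and math.dist(center, other) < dc
    )


# Hypothetical 2-D data: point I is points[0]; five points fall inside the circle.
points = [
    (0.0, 0.0),                                          # point I
    (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5), (0.3, 0.3),  # inside
    (2.0, 2.0), (-3.0, 1.0),                             # outside
]
density_of_I = local_density(points, 0, dc=1.0)
print(density_of_I)
```

This brute-force version compares one point with every other point; for a dataset the size of KDD Cup 99, a spatial index or batched distance computation would likely be needed.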
Fig.3 Extracting the cluster centers with the use of k-means and determining the
nearest neighbor
(3)
The other distance is the distance between
the sample point and its nearest neighbor. The
distance measure used in this paper is the Euclidean distance. Assuming that each sample in the original
dataset is an m-dimensional vector, the
distance d2 from point A (a1, a2, ..., am) to
its nearest neighbor B (b1, b2, ..., bm) is
defined in Formula (4):
$$d_2 = \sqrt{\sum_{i=1}^{m} (a_i - b_i)^2} \tag{4}$$
Figure 5 shows an example of calculating
these two distances in the case of 5 classes. The black
solid lines represent the distances between
data point A and the 5 cluster centers. The red
dashed line represents the distance between
data point A and its nearest neighbor B. After
these two distances are obtained, d1 and d2 are
added together. A new distance Di is thus obtained
for each sample point to act as the first new
feature, as defined in Formula (5):
$$D_i = d_1 + d_2 \tag{5}$$
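A minimal sketch of the distance feature follows. Formula (3) is not legible in this text, so the sketch assumes that d1 is the sum of the Euclidean distances from the point to all cluster centers, as in CANN [8]. The function name `distance_feature`, the candidate-neighbor list, and the toy coordinates are hypothetical.

```python
import math


def distance_feature(point, centers, candidates):
    """Return D = d1 + d2 for one sample point.

    d1: sum of Euclidean distances to all cluster centers (assumed form of Formula (3)).
    d2: Euclidean distance to the nearest of the candidate neighbors (Formula (4)).
    """
    d1 = sum(math.dist(point, c) for c in centers)
    d2 = min(math.dist(point, n) for n in candidates)
    return d1 + d2


# Hypothetical data: two cluster centers and two candidate neighbors of point A.
A = (0.0, 0.0)
centers = [(3.0, 4.0), (0.0, 5.0)]
candidates = [(1.0, 0.0), (0.0, 2.0)]
D = distance_feature(A, centers, candidates)
print(D)
```

Here both centers lie at distance 5 from A and the nearest candidate lies at distance 1, so the single value D summarizes the point's position relative to all clusters and its closest neighbor.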
China Communications July 2016
Fig.5 Distance between point A and cluster centers, and distance between point A
and nearest neighbor
training and testing sets are combined to obtain the two-dimensional feature vector for the
k-NN classifier testing.
IV. EXPERIMENTS
4.1.1 Dataset
dimensions of feature descriptions and one dimension of category label, for a total of 42 dimensions. Following the work of Zhang et al. [11],
19 features are selected.
After the 19 features are extracted, the quantitative features need to be normalized. Afterwards,
duplicate records are removed to obtain a
single dataset. Figure 6 shows the composition
of the remaining data. The training dataset has
119845 records, and the training and testing datasets have 177463 records.
Fig.6 Composition of the remaining data by class (Normal, PROBING, DOS, U2R, R2L); bar labels: 66.65%, 64.57%, 28.77%, 33.09%, 1.47%, 2.41%
4.1.2 Evaluations
Confusion matrix (rows: actual tag; columns: classified tag)
Tags      Normal   Attacks
Normal    TN       FP
Attacks   FN       TP
(6)
(7)
(8)
True positives (TP): the number of malicious executables correctly classified as malicious
True negatives (TN): the number of benign
programs correctly classified as benign
False positives (FP): the number of benign
programs falsely classified as malicious
False negatives (FN): the number of malicious executables falsely classified as benign
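The counts TP, TN, FP, and FN defined in Section 4.1.2 are commonly combined into accuracy, detection rate, and false alarm rate. Formulas (6)-(8) are not legible in this text, so the sketch below uses the standard textbook definitions as an assumption. The function name `evaluate` and the counts are hypothetical and are not results from the paper.

```python
def evaluate(tp, tn, fp, fn):
    """Standard binary evaluation measures computed from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        # Share of all samples that were classified correctly.
        "accuracy": (tp + tn) / total,
        # Share of attacks that were detected (also called recall).
        "detection_rate": tp / (tp + fn),
        # Share of normal samples that were flagged as attacks.
        "false_alarm_rate": fp / (fp + tn),
    }


# Hypothetical counts, for illustration only.
metrics = evaluate(tp=90, tn=880, fp=20, fn=10)
print(metrics)
```

A high detection rate with a low false alarm rate is the usual goal; accuracy alone can be misleading when normal traffic dominates the dataset.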
V. CONCLUSION
In this paper, we proposed a new hybrid machine learning-based intrusion detection method called DCNN, which effectively reduces
the feature dimension of the original dataset
into a simple and representative two-dimensional vector. It saves time and improves the
accuracy in our experiment on the KDD Cup
99 dataset. Experimental results show that
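Class: counts (column order Normal, PROBING, DOS, U2R, R2L; some cells missing); accuracy
Normal: 68741, 1852, 6124, 593, 79; 88.83%
PROBING: 126, 1594, 21, 11; 90.77%
DOS: 8147, 1218, 30287; 76.38%
U2R: 19, 27; 51.92%
R2L: 241, 44, 254, 452; 45.38%

Class: counts (column order Normal, PROBING, DOS, U2R, R2L; some cells missing); accuracy
Normal: 69630, 240, 7079, 80, 360; 89.97%
PROBING: 94, 1595, 60; 90.83%
DOS: 3056, 793, 35549, 118, 136; 89.65%
U2R: 10, 33; 63.16%
R2L: 162, 31, 801; 80.42%

Class: counts (column order Normal, PROBING, DOS, U2R, R2L; some cells missing); accuracy
Normal: 76375, 896, 82, 11, 25; 98.69%
PROBING: 42, 1711; 97.44%
DOS: 1992, 502, 36955, 69, 134; 93.20%
U2R: 14, 35; 67.31%
R2L: 67, 15, 48, 862; 86.55%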
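The three confusion matrices above can be cross-checked by summing their correctly classified counts. The sketch below assumes that the matrices cover the 119845-sample set mentioned in Section 4.1.1 and that the diagonal entry of each row is the count matching its per-class accuracy. The table names and the helper `overall_accuracy` are hypothetical.

```python
TOTAL_SAMPLES = 119845  # assumption: dataset size stated in Section 4.1.1

# Correctly classified counts in class order Normal, PROBING, DOS, U2R, R2L.
diagonals = {
    "first table": [68741, 1594, 30287, 27, 452],
    "second table": [69630, 1595, 35549, 33, 801],
    "third table": [76375, 1711, 36955, 35, 862],
}


def overall_accuracy(diagonal, total):
    """Overall accuracy in percent: correctly classified samples over all samples."""
    return 100.0 * sum(diagonal) / total


accuracies = {
    name: round(overall_accuracy(diagonal, TOTAL_SAMPLES), 2)
    for name, diagonal in diagonals.items()
}
print(accuracies)
```

Under these assumptions the three results are 84.36, 89.79, and 96.74, which agree with the three accuracy values that appear in the comparison chart.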
[Figure: comparison of KNN, CANN, and DCNN]
                    KNN     CANN    DCNN
Accuracy(%)         84.36   89.79   96.74
Detection Rate(%)   78.91   91.96   94.93
False Alarm(%)      11.17   4.55    2.69
References
[1] Yao Lan, Wang Xinmei. Present situation and
development trend of intrusion detection
system. Telecommunications Science. 2002.
(12):31-35.
[2] YAO Jun-lan. Intrusion detection technology
and its development trend. Information Technology. 2006. (4):172-175.
[3] Devaraju S, Ramakrishnan S. Detection of
Attacks for IDS using Association Rule Mining Algorithm. IETE JOURNAL OF RESEARCH.
2015. 61(6):624-633.
[4] ZHANG Yi, LIU Yan-heng et al. Intrusion detection system based on association rules. Journal
of Jilin University. 2006. (2).
[5] A.S. Eesa, Z. Orman, A.M.A. Brifcani. A novel feature-selection approach based on the cuttlefish
optimization algorithm for intrusion detection
systems. EXPERT SYSTEMS WITH APPLICATIONS. 2015. 42(5):2670-2679.
[6] E. de la Hoz, E. de la Hoz, A. Ortiz, J. Ortega, A. Martinez-Alvarez. Feature selection by
multi-objective optimization: application to network anomaly detection by hierarchical self-organising maps. KNOWLEDGE-BASED SYSTEMS.
2014. 71(SI):322-338.
[7] Tsai CF, Lin CY. A triangle area based nearest
neighbors approach to intrusion detection.
PATTERN RECOGNITION. 2010. 43(1):222-229.
[8] Lin WC, Ke SW, Tsai CF. CANN: An intrusion
detection system based on combining cluster centers and nearest neighbors. KNOWLEDGE-BASED SYSTEMS. 2015. 78:13-21.
[9] Wang FN, Wang SS. Solving the intrusion detection problem with KPCA-RVM. DESIGN, MANUFACTURING AND MECHATRONICS. 2016. 520-527.
[10] Rodriguez A, Laio A. Clustering by fast search
and find of density peaks. SCIENCE. 2014.
344(6191):1492-1496.
[11] X.-Q. Zhang, C.H. Gu, J.J. Lin. Intrusion detection
system based on feature selection and support
vector machine. International Conference on
Communications and Networking in China.
2006. pp. 1-5.
Biographies
Xiujuan Wang received her PhD in Information and
Signal Processing in July 2006 at the Beijing University of Posts and Telecommunications. She is currently
a lecturer at the College of Computer Sciences, Beijing University of Technology. Her research
interests include information and signal processing
and network security. E-mail: xjwang@bjut.edu.cn
Chenxi Zhang is currently pursuing her Master's degree at
the College of Computer Sciences, Beijing University
of Technology. Her research interests include information and network security. She is the corresponding
author. E-mail: 15110005031@163.com
Kangfeng Zheng received his PhD in Information
and Signal Processing in July 2006 at the Beijing University of Posts and Telecommunications. He is currently
an associate professor at the School of Computer
Science and Technology, Beijing University of Posts
and Telecommunications. His research interests include networking and system security, and network
information processing. E-mail: kfzheng@bupt.edu.cn