
ARTICLE IN PRESS

Journal of Network and Computer Applications 32 (2009) 721–732

Contents lists available at ScienceDirect

Journal of Network and Computer Applications


journal homepage: www.elsevier.com/locate/jnca

Predicting intrusion goal using dynamic Bayesian network with transfer probability estimation

Li Feng a,c,*, Wei Wang b, Lina Zhu a, Yi Zhang a

a Center of Dependable and Secure Computing (CDSC) of WuHan Digital Engineering Institute, WuHan, Hubei Province 430074, China
b French National Institute for Research in Computer Science and Control (INRIA) Sophia Antipolis, France
c State Key Laboratory for Manufacturing Systems (SKLMS) and MOE Key Lab for Intelligent Networks and Network Security (KLINNS), Xi'an Jiaotong University, Xi'an, China

Article info

Article history:
Received 2 March 2008
Received in revised form 21 May 2008
Accepted 13 June 2008

Keywords:
Intrusion prediction
Plan recognition
Dynamic Bayesian network
Transfer probability estimation
System call sequences

Abstract

Predicting the intentions of an observed agent and taking corresponding countermeasures is an essential part of future proactive intrusion detection systems (IDS) and intrusion prevention systems (IPS). In this paper, an approach based on a dynamic Bayesian network with transfer probability estimation is developed to predict whether the goal of a system call sequence is normal or not, with early warnings being launched so that appropriate countermeasures can be taken in advance. Since a complete set of system call state transfers can hardly be built in real environments, the empirical results show that newly emerging system call transfers have a great impact on prediction performance if a dynamic Bayesian network is used directly, without transfer probability estimation. Therefore, we estimate the probability of new state transfers and use it, together with the probabilities in the conditional probability table (CPT), to predict the goals of system call sequences. This overcomes the difficulty of manually selecting compensating parameters in the earlier dynamic Bayesian network approach [Feng L, Guan X, Guo S, Gao Y, Liu P. Predicting the intrusion intentions by observing system call sequences. Computers & Security 2004; 23/3: 241–252] and makes our prediction model more applicable. The University of New Mexico (UNM) and KLINNS data sets were analyzed, and the experimental results show that the approach predicts the goals of system call sequences with high accuracy and dispenses with much of the manual work of selecting compensating parameters.

© 2008 Elsevier Ltd. All rights reserved.

The research presented in this paper was supported in part by the 863 High Tech Plan (No. 2007AA01Z464) of China and the Defense Pre-Research Projects of the Eleventh Five-Year-Plan of China (Nos. C0820061362-06 and A1420080183).
* Corresponding author at: Center of Dependable and Secure Computing (CDSC) of WuHan Digital Engineering Institute, WuHan, Hubei Province 430074, China. Tel.: +86 2787787006.
E-mail addresses: fengli_xjtu@163.com (L. Feng), wei.wang.email@gmail.com (W. Wang), an235000@163.com (L. Zhu), zhangyi98@sohu.com (Y. Zhang).

1084-8045/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jnca.2008.06.002

1. Introduction

Computer security is a rapidly developing and extremely important research domain. Hackers use various attack techniques to intrude into or crash their targets in order to achieve their goals. Predicting the hacker's goal is therefore vital for a proactive intrusion prevention system: intrusion detection must move from detecting attacks that have already happened to predicting the future actions of attackers. Geib and Goldman (2001) first proposed a model based on plan recognition to predict the goals of hackers, which depends on a hierarchical plan library that provides recipes for achieving goals. Huang and Wicks (1999) addressed a conceptual architecture for identifying the attack strategy, which aims to drive the components of various intrusion detection systems (IDS) to work together.

Audit information such as host system logs, network traffic or IDS alarms can trace a hacker's behavior from various viewpoints. In our work, we choose system call sequences as observation data. Each operating system has its own internal functions built into the kernel, which are invoked from user space. In UNIX-like systems these functions are called system calls, and they represent the transitions from user space to kernel space. Accordingly, to a large extent, a sequence of system calls in kernel space represents the plan of a user or hacker in user space to achieve a certain goal. In general, such a goal can be classified into two main types: normal and abnormal. An anomalous system call transfer can reasonably be regarded as a wrong or malicious action in the plan of an observed process. Therefore, it can be greatly beneficial to block malicious actions by examining the plan or goal of a sequence of kernel function calls.

In recent years, much intrusion detection research has used system call sequences as a valuable data source. Forrest et al. (1996) initially introduced a simple anomaly detection method called time-delay embedding (tide), based on
monitoring system calls invoked by active and privileged processes. Profiles of normal behavior were built by enumerating all distinct, contiguous, fixed-length system call sequences that occur in the training data sets; unmatched sequences in actual detection are considered anomalous. In subsequent research, the approach was extended by various methods. For example, Lee and Stolfo (1998) explored a data mining approach to study a sample of system call data and characterize the sequences contained in normal data by a small set of rules. The sequences violating those rules were then treated as anomalies for monitoring and detection purposes. Warrender et al. (1999) proposed a Hidden Markov Model (HMM) based method for modeling and evaluating invisible events. Yeung and Ding (2003) and Lee and Xiang (2001) used information-theoretic measures for anomaly detection. Liao and Vemuri (2002) used a K-nearest neighbor (K-NN) classifier and Hu et al. (2003) applied robust support vector machines (SVM) to model program behavior and classified each process as normal or abnormal based on system call data. Sharma et al. (2007) adopted kernel-based similarity measures to detect anomalous events. In our previous work, we also employed non-negative matrix factorization (NMF) (Wang et al., 2004), self-organizing maps (SOM) (Wang et al., 2006) and principal component analysis (PCA) (Wang et al., 2008) to profile program and user behavior using system call sequences. These existing methods (Forrest et al., 1996; Lee and Stolfo, 1998; Warrender et al., 1999; Yeung and Ding, 2003; Lee and Xiang, 2001; Liao and Vemuri, 2002; Hu et al., 2003; Sharma et al., 2007; Wang et al., 2004, 2006, 2008) based on system call data have been shown to be effective for detecting malicious actions. However, they are only able to detect intrusions after attacks have occurred, either partially or fully, which makes it difficult to block the attack in real time. Therefore, it is most desirable to incorporate a prediction function into system call based IDS for predicting the type of goal, so that the proper response can be taken before substantial damage is done to the systems.

Plan recognition is the process of inferring the goals of an agent from observations of the agent's actions, which is considered an inference problem under uncertain conditions (Robert et al., 1999; Charniak and Goldman, 1993). Current related research mainly focuses on: (1) predicting plans or goals during cooperative interactions; (2) understanding stories (natural language processing); (3) recognizing the plans of an agent that is unaware of the plans being monitored, known as keyhole plan recognition (Albrecht et al., 1997; Wærn and Stenborg, 1995). There are two main features of keyhole plan recognition: (1) the monitored agent is not aware that its behavior is monitored and analyzed; (2) the observed data is incomplete. In traditional plan recognition methods, the plan library is built manually, which greatly hinders the wide application of plan recognition. To overcome this obstacle, machine-learning approaches are applied to collect information about the plans and to make decisions. Being capable of modeling a time-varying system, the dynamic Bayesian network (DBN) is one of the few methods that enables us to develop effective methods for recognizing and monitoring time-varying plans (Charniak and Goldman, 1993; Nicholson and Brady, 1994; Friedman et al., 1997). DBN-based plan recognition was first proposed by Albrecht et al. (1997) for predicting the goals of multiple players in a multi-user dungeon (MUD) game, with good experimental results.

In this paper, we propose an approach based on DBN with transfer probability estimation (DBN with TPE) to predict the goals of intruders by observing system call sequences. The domain of goal states for observed agents includes normal and anomalous goals. A normal goal is one with which a normal user completes specific daily tasks, while an anomalous goal is that of a malicious user or hacker exploiting the vulnerable target host. By building the structure of the DBN and the condition probability table (CPT) from training data, system call sequences can be modeled to predict their intentions. Theoretically, the Bayesian formula is based on the total probability theorem and an exclusive partition of the event space. In reality, however, it is difficult to have an ideal partition of system call transfers into normal and anomalous sequence sets. Sequences with different types of goals may share the same sub-sequences, and certain parts of an anomalous system call sequence can be the same as those of a normal sequence. To mitigate the impact of prediction errors, we previously proposed an approach based on parameter compensation of the condition probability distribution of normal system call sequences and successfully predicted the goals with good accuracy (Feng et al., 2004). However, parameter compensation requires a great deal of manual selection by trial and error, and the ideal compensating parameters can hardly be chosen. These defects hamper its wide application in real environments. To solve this problem efficiently, we propose an approach using TPE based on DBN. Bayesian theory needs some prior information about observed agents, but complete information about the system call transfers is hard to obtain. In experiments, we discovered that newly emerging transfer states that are not included in the CPT have a bad impact on prediction performance. The approach in Feng et al. (2004) simply assigns a very small fixed value to the probability of newly emerging transfer states. In this paper, we estimate the probability of newly emerging system call transfers that are not included in the CPT, which greatly reduces the cost of manually selecting compensating parameters.

The remainder of this paper is organized as follows. Section 2 clarifies how to model an intrusion goal prediction system by DBN with TPE. In Section 3, comparative experimental results are given for DBN without TPE and the model with TPE. Section 4 draws some conclusions about our approach and outlines future work.

2. The model for predicting the goals based on DBN with transfer probability estimation (TPE)

2.1. Structure and condition probability table (CPT)

To discover the irregularity of system call sequences and detect intrusions, the following state variables are defined (Feng et al., 2004).

(1) System call (S): represents all possible calls in a system call sequence. The size of the state space of system calls is |S|, which is the total number of possible calls in an operating system.
(2) Goal (Q): a set of state variables describing the goal of a sequence as normal or abnormal. Goal Q consists of two classes of states: normal and abnormal. The normal goal denotes that normal users or processes want to accomplish normal tasks. The abnormal goal is what intruders or hackers intend to reach by exploiting the vulnerabilities of the system.

Fig. 1. DBN for predicting the goal of a system call sequence (nodes S0, S1, S2, S3 for system calls; Q0 and Q for the goal).
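As a rough illustration of how a CPT over these two variables might be estimated from labelled training sequences, the sketch below counts call-to-call transfers per goal. It is only a sketch under stated assumptions: the function name `build_cpt`, the toy system call names, and the choice to normalize each count by the number of transfers leaving the previous call are illustrative and are not taken from the paper.

```python
from collections import Counter


def build_cpt(sequences_by_goal):
    """Estimate P(S_k | S_{k-1}, q_i) and transfer counts for each goal.

    sequences_by_goal maps a goal label to a list of system call sequences.
    Returns {goal: {(prev_call, cur_call): (probability, count)}}.
    Assumption: probabilities are counts normalized per previous call.
    """
    cpt = {}
    for goal, sequences in sequences_by_goal.items():
        pair_counts = Counter()      # how often prev -> cur was observed
        outgoing = Counter()         # how often prev was followed by anything
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                pair_counts[(prev, cur)] += 1
                outgoing[prev] += 1
        cpt[goal] = {
            pair: (count / outgoing[pair[0]], count)
            for pair, count in pair_counts.items()
        }
    return cpt


# Hypothetical toy traces, for illustration only.
training = {
    "normal": [["open", "read", "close"], ["open", "read", "read", "close"]],
    "abnormal": [["open", "execve", "close"]],
}
cpt = build_cpt(training)
```

A transfer that never occurs in the training data for a goal, such as `("open", "execve")` under the normal goal here, is simply absent from that goal's table; handling such absent entries is the problem that transfer probability estimation addresses.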
A DBN for predicting the goal of a system call sequence based on the above definition is shown in Fig. 1, where $Q_0$ is the initial goal state and $S_k$ is the kth system call in a sequence at some time. A system call transfer $(S_k \mid S_{k-1}, q_i)$ refers to a transition from a certain system call $S_{k-1}$ at time step $k-1$ to the following one $S_k$ at time step $k$ under the goal $q_i$. This model stipulates that the system call $S_k$ depends on the current goal Q and the previous system call $S_{k-1}$. These dependencies are based on the following two assumptions:

(1) The goal Q of the current system call sequence will not change in the ongoing process but only depends on the initial value $Q_0$.
(2) The sequence has the Markov nature, that is, the current call only depends on the previous call and the current Q, but is independent of history.

The above assumptions are reasonable according to the nature of a running program. The first assumes that the process's goal is constant during its life cycle. The second simplifies the modeling of possible state transitions. Any process executed for a specified task will invoke a sequence of system calls in kernel space to achieve its goal. In essence, a plan of an observed running process corresponds to a sequence of function calls with a goal. Consequently, any deviation from the planned action will affect the system call transfers to some extent. Naturally, a process always has its own built-in task or planned main goal, no matter how the normal user uses it.

We define some variables as follows before describing our approach:

$q$: the initial goal state;
$q_i$: the ith goal state; also denotes the state space including all the sub-sequences of the $q_i$ goal state;
$S_k$ ($k = 0, 1, 2, \ldots, n-1$): the kth call in the system call sequence of length n;
$B_k = \{q, s_0, \ldots, s_{k-1}, s_k\}$: a segment of system calls from $s_0$ to $s_k$ whose initial state is q;
$P_{i(k-1)k}$ ($k = 1, 2, \ldots, n$): the transition probability from the (k-1)th call $S_{k-1}$ to the kth call $S_k$ in the system call sequence of length n for the ith goal state;
$M_{i(k-1)k}$ ($k = 1, 2, \ldots, n$): the total number of state transitions for goal state i between the (k-1)th call $S_{k-1}$ and the kth call $S_k$ in the system call sequence of length n;
$\hat{P}(S_k \mid S_{k-1}, q_i)$: the estimated transfer probability from $S_{k-1}$ to $S_k$ with goal $q_i$; how to calculate it is explained in Section 2.3. $\hat{P}_a$ and $\hat{P}_n$ denote the values of the TPEs for anomalous and normal system call sequences, respectively;
$\alpha$: the normalization factor.

2.2. DBN with TPE

The DBN with TPE for predicting the goal of a system call sequence proceeds as follows.

Initial step:

$$P(Q = q_i \mid B_0) = P(Q = q_i \mid q) = \tfrac{1}{2}$$

$$P(S_1 \mid B_0) = \sum_{i=1}^{2} P(S_1 \mid S_0, q_i)\, P(Q = q_i \mid B_0)$$

The kth step:

if $(S_k \mid S_{k-1}, q_i) \in \mathrm{CPT}$:

$$P(Q = q_i \mid B_k) = \alpha\, P(S_k \mid S_{k-1}, q_i)\, P(Q = q_i \mid B_{k-1})$$

$$P(S_k \mid B_{k-1}) = \sum_{i=1}^{2} P(S_k \mid S_{k-1}, q_i)\, P(Q = q_i \mid B_{k-1})$$

if $(S_k \mid S_{k-1}, q_i) \notin \mathrm{CPT}$:

$$P(Q = q_i \mid B_k) = \alpha\, \hat{P}(S_k \mid S_{k-1}, q_i)\, P(Q = q_i \mid B_{k-1})$$

$$P(S_k \mid B_{k-1}) = \sum_{i=1}^{2} \hat{P}(S_k \mid S_{k-1}, q_i)\, P(Q = q_i \mid B_{k-1})$$

In the initial step, $P(Q = q_i \mid B_0)$ is specified as 0.5, which ensures that the anomalous goal and the normal goal have the same probability at the beginning of a segment of a system call sequence. This specification is based on the observation that, in real environments, any system call in various combinations may be used in either normal or abnormal operation. Two different situations must be considered in the kth step. When $(S_k \mid S_{k-1}, q_i)$ belongs to the CPT, $P(S_k \mid S_{k-1}, q_i)$ participates in the calculation of $P(Q = q_i \mid B_k)$ and $P(S_k \mid B_{k-1})$. If $(S_k \mid S_{k-1}, q_i)$ does not belong to the CPT, $\hat{P}(S_k \mid S_{k-1}, q_i)$ substitutes for $P(S_k \mid S_{k-1}, q_i)$ to infer the goal of $B_k$. How to calculate $\hat{P}(S_k \mid S_{k-1}, q_i)$ is explained in the following section. In the above equations, $\alpha$ is a normalization factor.

2.3. Transfer probability estimation (TPE)

When the newly emerging system call transfers $(S_k \mid S_{k-1}, q_i)$ are not included in the CPT, we use the estimated probability to calculate the prediction index. The method for estimating the transfer probability is as follows:

$$\hat{P}(S_k \mid S_{k-1}, q_i) = \frac{\sum_{j=1}^{n} P_{i(j-1)j}}{m} \times \mathrm{Sparsity\_index}, \qquad (S_k \mid S_{k-1}, q_i) \notin \mathrm{CPT},\quad P_{i(j-1)j} \in \mathrm{CPT},\quad 1 \le k, j \le n,\quad i \in \{0, 1\}$$

In the above equation, Sparsity_index denotes the extent to which such system call transfers are infrequent. In the experiments, we assign 0.001 to it for all processes. This assumes that, in general, one unexpected system call transfer occurs in every 1000 system call transfers. The variable m is the total number of system call transfers for a type of process in the CPT (Table 1).

Table 1
Condition probability table (CPT) for system call transfer

Previous call   Current call   Transfer probability   Transfer times   Goal
S_0             S_1            P_i01                  M_i01            q_i
S_1             S_2            P_i12                  M_i12            q_i
…               …              …                      …                …
S_{n-1}         S_n            P_i(n-1)n              M_i(n-1)n        q_i

2.4. Some indexes for estimating the performance

To measure the overall prediction performance, the average prediction index across the temporal horizon is defined below:

$$P_v = \frac{\sum_{t=1}^{T} P(Q = q_i \mid B_t)}{T}$$

where T is the total number of system calls in a sequence. In addition, we define the prediction time index $\gamma$, indicating when the prediction results can be assured. This measures how quickly the goal is detected. A small $\gamma$ ($0 < \gamma < 1$) means that we obtain a desirable result quickly. The formal definition of average
prediction time index is

$$\gamma = \frac{T_M}{T}$$

where $T_M$ is the number of temporal steps at which $P_v$ reaches its peak. To determine $T_M$, the fluctuation addition based on the sliding window is defined as

$$T_M = T_k \quad \text{iff} \quad \left| \sum_{n=k}^{k+WL} \frac{\left| 100\% - P(Q = q_i \mid B_n) \right|}{100\%} \right| < 1$$

where WL is the sliding window length, usually specified as 1/5–1/8 of the sequence length. Tables 2–5 list the minimum and maximum values of $P_v$ and $\gamma$. Additionally, the stability index $\delta$ is defined to describe the degree of fluctuation after the prediction index has reached its maximum value:

$$\delta = \sum_{k=T_M}^{T} \frac{\left| 100\% - P(Q = q_i \mid B_k) \right|}{100\%}, \qquad 0 \le P(Q = q_i \mid B_k) \le 1$$

From the above definition, a large $\delta$ means that the prediction index fluctuates a lot.

Since more than one system call sequence is processed, the indices defined above take values in ranges, that is

$$P_v \in [\mathrm{Min}\,P_v, \mathrm{Max}\,P_v], \qquad \gamma \in [\gamma_{\min}, \gamma_{\max}], \qquad \delta \in [\delta_{\min}, \delta_{\max}]$$

In the following tests and verification, the goal is considered identified when the prediction index reaches 100%.

3. Experiment results and discussion

3.1. Introduction

In order to validate the efficiency and performance of our approach, we designed a comparative experiment for DBN with TPE and DBN without TPE. Both were applied to the benchmark data from the University of New Mexico (UNM). The experimental results show that DBN with TPE predicts the goal of system call sequences more efficiently and with higher accuracy than DBN without TPE. The comparison of experimental results is shown in Sections 3.2 and 3.3, respectively. A further validation test for DBN with TPE was also performed on the KLINNS data, which is examined in detail in Section 3.4.

Table 2
Experimental results for predicting the goal of anomalous system call sequences (UNM data set)

System calls   Total number of sequences   $\hat{P}_a$   γmin (%)   γmax (%)   Min Pv   Max Pv   δmin   δmax
Inetd          30                          0.000737      5.3        46.3       0.5403   0.9575   0      0
Named          5                           0.000622      1.9        47         0.9930   0.9999   0      0
Xlock          2                           0.000633      24.4       26         0.9559   0.9585   0      0
Stide          19                          0.000677      5.0        45.3       0.7946   0.9774   0      0
Ps             31                          0.000657      5.4        29.8       0.8673   0.9778   0      0
Login          5                           0.000766      2.4        3.4        0.9908   0.9916   0      0
FTP            4                           0.000625      0.9        23.6       0.9994   0.8771   0      0

Table 3
Experimental results for predicting the goal of normal system call sequences (UNM data set)

System calls   Total number of sequences   $\hat{P}_a$   γmin (%)   γmax (%)   Min Pv   Max Pv   δmin   δmax
Inetd          2                           0.000779      2.9        28.1       0.9616   0.9989   0      0
Named          15                          0.000532      0.01       51.6       0.5138   1.0000   0      0
Xlock          39                          0.000487      0.7        66.7       0.9938   0.4145   0      0
Stide          10                          0.000667      5.0        25.9       0.8629   0.9737   0      0
Ps             24                          0.000674      7.4        28.6       0.8326   0.9617   0      0
Login          12                          0.000672      0.39       0.41       0.9996   0.9996   0      0
FTP            7                           0.000576      0.85       66.6       0.5609   0.9967   0      0

Table 4
Experimental results for predicting the goal of anomalous system call sequences (KLINNS data set)

System calls   Total number of sequences   $\hat{P}_a$   γmin (%)   γmax (%)   Min Pv   Max Pv   δmin   δmax
Http           33                          0.000879      9.1        9.1        0.9999   0.9999   0      0
FTP            2                           0.000635      31.4       32.7       0.7233   0.7342   0      0
Samba          4                           0.000662      73.5       73.5       0.7023   0.7023   0      0

Table 5
Experimental results for predicting the goal of normal system call sequences (KLINNS data set)

System calls   Total number of sequences   $\hat{P}_a$   γmin (%)   γmax (%)   Min Pv   Max Pv   δmin   δmax
Http           18                          0.000822      0.20       0.38       100%     100%     0      0
FTP            3                           0.000644      22.5       27.3       0.9295   0.9417   0      0
Samba          12                          0.000606      2.8        36.7       0.7932   0.9842   0      0

3.2. Prediction results of DBN without TPE for UNM data sets

Forrest et al. collected system call sequences of active processes over different platforms (e.g., Slackware Linux with kernel 2.0.35, SunOS 4.1.4) for modeling intrusion detection systems (Forrest et al., 1996). These include different kinds of processes. Some processes run as daemons and others do not, and the programs vary widely in their size and complexity. Different kinds of intrusions, such as buffer overflows, symbolic link attacks and Trojan programs, have been audited in the UNM data set. It only includes programs that run with root privilege, because exploitation of these programs has the greatest potential to harm the server.

We processed the Login, Ps, Stide, Named, Xlock, FTP and Inetd processes. Because the DBN structure proposed in this paper is not suited to complex programs with multiple simultaneous processes and time-varying goals, processes like sendmail were not processed in our experiments. Prediction results by DBN without TPE are shown in Figs. 2–8. These figures show that the overall prediction performance is not at all what we expected. In Figs. 2, 6 and 7, obviously false early-warnings occur when we predict the normal goals of FTP, Stide and Named. In Fig. 2(b), the prediction index nearly reaches 0 at about 200 steps, which indicates that normal FTP processes have been taken as anomalous ones. Fig. 5(b) shows a great fluctuation in the prediction result of a normal login process by DBN without TPE. This is because the normal sequence and the anomalous sequence are similar at the beginning of a login process. This kind of false prediction also appears in Figs. 6(b) and 7(b), with very small prediction indexes. From these figures, we also find that large fluctuations affect the prediction of the goals of anomalous system call sequences. Fortunately, however, none of the early-warnings is missed. Figs. 2(a) and 7(a) show that the fluctuations have a negative impact on the prediction at specific time steps when anomalous system call sequences are analyzed. Additionally, the prediction indexes for both Inetd and Ps also fail to meet our expectations.

The prediction results for DBN without TPE show that it is less capable of predicting the goals of system call sequences, because of lower prediction indexes and much more fluctuation. Although it does not miss any early-warning, the many false early-warnings raised while predicting the goals of normal system call sequences are intolerable. DBN without TPE only uses a fixed value for the probability of a newly emerging state transfer, and in general this cannot measure the occurrence frequency accurately. The influence of false early-warnings and of the greater fluctuation on prediction performance can be seen in Figs. 2–8.
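The recursive goal update of Section 2.2, in the variant evaluated in this section (a fixed value for any transfer absent from the CPT), might be sketched as follows. This is a minimal sketch under assumptions: the function name `predict_goal`, the layout of `cpt` as `{goal: {(prev, cur): probability}}`, the toy probabilities, and the placeholder value `1e-6` for unseen transfers are all illustrative; the source only says that a very small fixed value is used. The normalization factor is taken to be the reciprocal of $P(S_k \mid B_{k-1})$.

```python
def predict_goal(calls, cpt, unseen_prob=1e-6, goals=("normal", "abnormal")):
    """Return P(Q = q_i | B_k) for every step k of one system call sequence.

    unseen_prob is the fixed fallback for transfers missing from the CPT
    (hypothetical value; this is the 'DBN without TPE' behaviour).
    """
    belief = {g: 0.5 for g in goals}          # initial step: P(Q = q_i | B_0) = 1/2
    history = []
    for prev, cur in zip(calls, calls[1:]):
        unnormalized = {
            g: cpt.get(g, {}).get((prev, cur), unseen_prob) * belief[g]
            for g in goals
        }
        evidence = sum(unnormalized.values())  # P(S_k | B_{k-1})
        belief = {g: v / evidence for g, v in unnormalized.items()}
        history.append(dict(belief))
    return history


# Hypothetical toy CPT, for illustration only.
toy_cpt = {
    "normal": {("open", "read"): 0.9, ("read", "close"): 0.8},
    "abnormal": {("open", "read"): 0.3, ("read", "execve"): 0.7},
}
history = predict_goal(["open", "read", "execve"], toy_cpt)
```

In this toy run a single transfer that is unseen under the normal goal pushes the normal posterior almost to zero, which gives some intuition for why the value assigned to unseen transfers matters so much for false early-warnings.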

Fig. 2. The prediction results of FTP system call sequence in UNM data set based on DBN without TPE: (a) prediction index for anomalous goal over an anomalous FTP system call sequence; (b) prediction index for normal goal over a normal FTP system call sequence.

Fig. 3. The prediction results of Inetd system call sequence in UNM data set based on DBN without TPE: (a) prediction index for anomalous goal over an anomalous Inetd system call sequence; (b) prediction index for normal goal over a normal Inetd system call sequence.

Fig. 4. The prediction results of Ps system call sequence in UNM data based on DBN without TPE: (a) prediction index for anomalous goal over an anomalous Ps system call sequence; (b) prediction index for normal goal over a normal Ps system call sequence.

Fig. 5. The prediction results of Login system call sequence in UNM data based on DBN without TPE: (a) prediction index for anomalous goal over an anomalous Login system call sequence; (b) prediction index for normal goal over a normal Login system call sequence.

Fig. 6. The prediction results of Stide system call sequence in UNM data based on DBN without TPE: (a) prediction index for anomalous goal over an anomalous Stide system call sequence; (b) prediction index for normal goal over a normal Stide system call sequence.

3.3. Prediction results of DBN with TPE for UNM data sets

Unexpected system call transfers that are not in the CPT will greatly influence prediction performance if their TPEs are not measured properly. One important feature of DBN with TPE is that it estimates the transfer probability of a newly emerging system call, which is then used to calculate the prediction index for the normal or anomalous goal of a specific system call sequence. To compare with DBN without TPE, we also processed the UNM data set; the prediction results are shown in Figs. 9–15. For further validation, the KLINNS data set was also processed and the results are shown in Section 3.4.
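The estimate of Section 2.3 can be illustrated with a short sketch. It assumes that m is the number of transfer entries recorded in the CPT for one process type and goal, so that the estimate is the mean recorded transfer probability scaled by Sparsity_index; that reading, the function name `estimate_transfer_probability`, and the toy probabilities are assumptions made for illustration.

```python
def estimate_transfer_probability(cpt_probs, sparsity_index=0.001):
    """Estimated probability for a transfer that is absent from the CPT.

    cpt_probs maps (prev_call, cur_call) to the recorded transfer
    probability for one goal. Assumption: m is the number of CPT entries.
    """
    probs = list(cpt_probs.values())
    if not probs:
        raise ValueError("the CPT has no recorded transfers")
    mean_probability = sum(probs) / len(probs)   # (sum of P_i(j-1)j) / m
    return mean_probability * sparsity_index


# Hypothetical CPT entries for one goal, for illustration only.
normal_cpt = {("open", "read"): 0.9, ("read", "close"): 0.5}
p_hat_normal = estimate_transfer_probability(normal_cpt)
```

With Sparsity_index set to 0.001 as in the experiments, the estimate is three orders of magnitude below the mean recorded probability and differs from process to process, rather than being one hand-picked constant.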
Fig. 7. The prediction results of Named system call sequence in UNM data based on DBN without TPE: (a) prediction index for anomalous goal over an anomalous Named system call sequence; (b) prediction index for normal goal over a normal Named system call sequence.

Fig. 8. The prediction results of Xlock system call sequence in UNM data based on DBN without TPE: (a) prediction index for anomalous goal over an anomalous Xlock system call sequence; (b) prediction index for normal goal over a normal Xlock system call sequence.

Fig. 9. The prediction results of FTP system call sequence in UNM data based on DBN with TPE: (a) prediction index for anomalous goal over an anomalous FTP system call sequence; (b) prediction index for normal goal over a normal FTP system call sequence.
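The curves in these figures are summarized in the tables by the indices of Section 2.4, and one possible way to compute them from a sequence of prediction index values is sketched below. Several details are assumptions rather than statements from the paper: $T_M$ is taken as the first step whose full sliding window satisfies the fluctuation condition, windows that would run past the end of the sequence are not evaluated, and all function names and the toy trace are illustrative.

```python
def average_prediction_index(p):
    """P_v: mean of the prediction index over all T steps."""
    return sum(p) / len(p)


def settling_step(p, window):
    """T_M (1-based): first step k whose window k..k+window has a summed
    deviation from 100% below 1. Returns None if no window qualifies."""
    for k in range(len(p) - window):
        deviation = sum(abs(1.0 - x) for x in p[k:k + window + 1])
        if deviation < 1.0:
            return k + 1
    return None


def prediction_time_index(p, window):
    """gamma = T_M / T; smaller values mean the goal is identified earlier."""
    t_m = settling_step(p, window)
    return None if t_m is None else t_m / len(p)


def stability_index(p, window):
    """delta: summed deviation from 100% from step T_M to the end."""
    t_m = settling_step(p, window)
    if t_m is None:
        return None
    return sum(abs(1.0 - x) for x in p[t_m - 1:])


# Hypothetical prediction index trace, for illustration only.
trace = [0.5, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
p_v = average_prediction_index(trace)
gamma = prediction_time_index(trace, window=2)
delta = stability_index(trace, window=2)
```

For this trace the index settles at step 4 of 8, so gamma is 0.5, and delta is 0 because the index stays at 100% afterwards, which is the pattern the zero entries for δmin and δmax in the tables describe.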

Comparing with results of Figs. 215 indicate that DBN with system call sequence in Fig. 9(b) greatly improves away the false
TPE is more efcient than DBN without TPE to predict the goals of early-warning of Fig. 2(b) and the former shows the more stable
both normal and anomalous system call sequences in general. and sensitive prediction performance. The same improvement
There is less uctuation in Fig. 9(a) than that in Fig. 2(a) and takes effect evidently on Named system call sequences in Fig. 14.
the former launches anomalous goal alarm earlier with less Similarly, there is also less uctuation in anomalous goal
uctuation. Furthermore, the prediction results of normal FTP prediction for Xlock in Fig. 15. The prediction results shown in
ARTICLE IN PRESS

728 L. Feng et al. / Journal of Network and Computer Applications 32 (2009) 721732

1.2 1.01
Prediction index for anomalous goal

Prediction index for normal goal


1.00
1.0
0.99
0.8
0.98
0.6 0.97
0.96
0.4
0.95
0.2
0.94
0.0 0.93
1 30 59 88 117 146 175 204 233 262 1 26 51 76 101 126 151 176 201 226
Anomalous Inetd system call sequence Normal Inetd system call sequence

Fig. 10. The prediction results of Inetd system call sequence in UNM data based on DBN with TPE.

1.2 1.1
Prediction index for anomalous goal

Prediction index for normal goal

1.0
1.0
0.9

0.8 0.8
0.7
0.6 0.6
0.5
0.4
0.4
0.2 0.3
1 17 33 49 65 81 97 113 129 145 1 41 81 121 161 201 241 281 321 361
Anomalous Ps system call sequence Normal Ps system call sequence

Fig. 11. The prediction results of Ps system call sequence in UNM data based on DBN with TPE.

1.1 1.1
Prediction index for anomalous goal

Prediciton index for normal goal

1.0
1.0

0.9
0.9
0.8

0.8
0.7

0.6 0.7
1 43 85 127 169 211 253 295 337 379 1 78 155 232 309 386 463 540 617 694
Anomalous Login system call sequence Normal Login system call sequence

Fig. 12. The prediction results of Login system call sequence in UNM data based on DBN with TPE.

Figs. 1013 clearly illustrate that the DBN with TPE predicts the following tables. From the illustration of Tables 2 and 3, we nd
goals with very high accuracy and stability. that (1) there are different prediction effects even for the same
The results for all system call sequences of UNM data are given processes. According to the values of sensitivity index g listed in
in Tables 2 and 3. In the experiments for UNM data set, _
we assign Table 2, the processes with the largest deviation between gmin and
Sparsity_index
_
to 0.001 for all processes, with which Pn for normal gmax are anomalous Named processes. The slowest has predicted
goal and Pa for anomalous goal can be calculated as listed in the anomalous goal at 47% of sequence length, while the fastest
ARTICLE IN PRESS

L. Feng et al. / Journal of Network and Computer Applications 32 (2009) 721732 729

1.1 1.1

Prediction index for anomalous goal

Prediction index for normal goal


1.0 1.0
0.9
0.9
0.8
0.8
0.7
0.7
0.6
0.6
0.5
0.4 0.5

0.3 0.4
1 121 241 361 481 601 721 841 961 1081 1 47 93 139 185 231 277 323 369 415
Anomalous Stide system call sequence Normal Stide system call sequence

Fig. 13. The prediction results of Stide system call sequence in UNM data based on DBN with TPE.

1.02 1.2
Prediction index for anomalous goal

Prediction index for normal goal

1.00 1.0

0.98 0.8

0.96 0.6

0.94 0.4

0.92 0.2

0.90 0.0
1 61 121 181 241 301 361 421 481 541 1 41 81 121 161 201 241 281 321 361
Anomalous Named system call sequence Normal Named system call sequence

Fig. 14. The prediction results of Named system call sequence in UNM data based on DBN with TPE.

1.2 1.2
Prediction index for anomalous goal

Prediction index for normal goal

1.0 1.0

0.8
0.8
0.6
0.6
0.4
0.4
0.2

0.2 0.0
1 47 93 139 185 231 277 323 369 415 1 28 55 82 109 136 163 190 217 244
Anomalous Xlock system call sequence Normal Xlock system call sequence

Fig. 15. The prediction results of Xlock system call sequence in UNM data based on DBN with TPE.

has done it at 1.9% of sequence length. Login is the one with the smallest deviation, which indicates that Login processes behave in a simpler and more uniform way. Similarly, in Table 3, Xlock processes have the largest deviation and those of Login have the smallest, which shows that Xlock involves more diverse and complex behaviors, while Login does not. (2) Various TPEs have been calculated for different processes. The different values of Pa and Pn in Tables 2 and 3

indicate that TPEs, rather than small fixed decimal fractions, can measure the possibility of a newly emerging transfer more accurately. In addition, according to the definition in Section 2.4, d measures the stability during the goal prediction. All values of dmin and dmax listed in Tables 2 and 3 are zero, which indicates that DBN with TPE has better stability of the prediction index than the approach in Feng et al. (2004).

3.4. Prediction results of DBN with TPE for KLINNS data sets

3.4.1. Data collection

To further validate our method, extensive testing was also performed on data sets collected from the computer network systems in our own KLINNS Laboratory. Several normal and anomalous Http, FTP and Samba system call sequences were collected on a RedHat Linux system with kernel 2.4.7-10. The data collector was implemented by tapping into the kernel. Each sequence is a series of system calls invoked by an observed process from the beginning to the end of its life cycle. Sequence lengths vary widely because of the differences in program complexity and users' goals. The data collection procedure is presented as follows.

3.4.1.1. Collection of normal system calls sequence. According to Forrest's collection mechanism (Forrest et al., 1996), the data for normal system calls are also divided into two types: synthetic and live data. Synthetic data are traces obtained by running prepared scripts or simulators to impersonate a real user's behavior. Live normal data are traces of the processes collected during normal usage of a real computer system used by real normal users. We obtained the synthetic httpd system call traces with the simulation tool webstress, which impersonates users to access and test the payload of WWW servers. For FTP and Samba, we collected the live data, spanning a time period of about 1 month. The users were graduate students with rich computer experience in our lab who were working for the Integrated Network Security System Project (INSSP) funded by 863 High Tech Plan of China.

3.4.1.2. Collection of anomalous system calls sequence. Intrusion traces involved some attacking events associated with httpd, FTP and Samba. Apache is widely used WWW service software with an important component called Apache-SSL implementing the SSL service. It has a remote buffer overflow vulnerability that allows remote hackers to execute arbitrary commands with root privilege. Anomalous httpd system call traces are collected from such Apache-SSL exploitation. The widely known wu-FTPd has a remote file globbing heap corruption vulnerability, which allows remote attackers to execute arbitrary commands at the victim host. The anomalous traces of FTP consist of the exploitation behavior against the above vulnerability. Similarly, Samba 2.2.8 and above versions have a vulnerability leading to remote buffer overflow when illegal users send packets of excessive length to the Samba server. The synthetic anomalous Samba system call traces were also collected through some automated exploitation scripts.

3.4.2. Experiment results and discussion

The experiment results for KLINNS data sets illustrated in Figs. 16–18 show the prediction performance on the extended data set. From Fig. 16(a), it is seen that the prediction index gradually reaches 100% after the 300th step for anomalous goal prediction of anomalous FTP and remains stable up to the end of the sequence. A similar prediction effect is obtained for normal goal prediction on normal FTP processes in Fig. 16(b). The prediction results for Samba and Http are consistent with these. In a word, the prediction performance for KLINNS data based on DBN with TPE is satisfying, with high accuracy and less fluctuation.

The overall prediction results of CNSIS data sets are shown in Tables 4 and 5. We have some findings based on g and Pv. The Http processes are the first to be predicted because of their minimum g and maximum Pv, which also indicates that the Http processes behave more regularly. We also assign Sparsity_index to 0.001 for all processes, with which the Pn for normal goal and Pa for anomalous goal were calculated and listed in Tables 4 and 5, respectively. Meanwhile, the values of d in this experiment also show better stability than those in our previous work (Feng et al., 2004).

4. Conclusion and future works

Compared with the traditional approach of intrusion detection, a proactive IDS and intrusion prevention system should react in a quicker and more agile way instead of only sending large numbers of alerts. To effectively prevent hackers from breaking into information systems, early warning is vital in a proactive IDS and prevention system, and it can greatly improve the performance of state-of-the-art IDS. In this paper, we are mainly concerned with goal prediction by analyzing system call sequences of different processes. DBN with TPE is proposed to predict the goals of different processes in the UNM and KLINNS data sets with very high accuracy and better stability. It clears the way for early-warnings

Fig. 16. The prediction results of FTP system call sequence in KLINNS data sets (DBN with TPE). (a) Prediction index for anomalous goal over the anomalous Ftp system call sequence. (b) Prediction index for normal goal over the normal Ftp system call sequence.

Fig. 17. The prediction results of Http system call sequence in KLINNS data sets (DBN with TPE). Left panel: prediction index for anomalous goal over the anomalous Http system call sequence. Right panel: prediction index for normal goal over the normal Http system call sequence.

Fig. 18. The prediction results of Samba system call sequence in KLINNS data sets (DBN with TPE). Left panel: prediction index for anomalous goal over the anomalous Samba system call sequence. Right panel: prediction index for normal goal over the normal Samba system call sequence.
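
DBN with TPE is motivated by system call transfers that appear in an observed sequence but were never seen in the training data. As a minimal sketch of that situation (not the authors' implementation), the following Python collects the adjacent system call pairs seen in a set of training sequences and lists the pairs in a test sequence that are new. The function names and the toy call names are assumptions for illustration only; how a probability is then assigned to each new transfer is the subject of the TPE defined earlier in the paper and is not reproduced here.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# A transfer is an ordered pair (previous call, next call).
Transfer = Tuple[str, str]


def observed_transfers(sequences: Iterable[Sequence[str]]) -> Set[Transfer]:
    """Collect every adjacent pair of system calls seen in the training sequences."""
    seen: Set[Transfer] = set()
    for seq in sequences:
        seen.update(zip(seq, seq[1:]))
    return seen


def unseen_transfers(test_seq: Sequence[str], seen: Set[Transfer]) -> List[Transfer]:
    """List, in order of occurrence, the transfers of test_seq absent from training."""
    return [pair for pair in zip(test_seq, test_seq[1:]) if pair not in seen]


# Toy data (hypothetical call names, not taken from the UNM or KLINNS traces).
training = [["open", "read", "close"], ["open", "read", "read", "close"]]
test = ["open", "read", "write", "close"]

seen = observed_transfers(training)
new = unseen_transfers(test, seen)
print(new)  # [('read', 'write'), ('write', 'close')]
```

Here two of the three transfers in the test sequence never occurred in training, which is the case where a probability has to be estimated rather than looked up.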

on which system call based IPS depend to take corresponding countermeasures. Based on the extensive experimental results, we find: (1) DBN with TPE greatly improves the prediction performance so as to provide early-warnings to administrators or network interaction systems for quick response against further attacks; (2) DBN with TPE hardly needs any manual selection of the parameters mentioned in the approach of DBN with parameter compensation (Feng et al., 2004). The transfer probability can be automatically calculated once the Sparsity_index has been assigned, which makes it more usable in real environments.

There is still much more to be explored. Our future work will focus on the following aspects: (1) the current structure of DBN is not adaptive to complex programs with multiple running processes and different goals, such as sendmail. A more complex model and an associated improved approach of goal prediction for such programs will be considered; (2) command sequences and other sequential behaviors related to computer security will be analyzed by this approach to distinguish the goals of attackers and normal users.

References

Albrecht D, Zukerman I, Nicholson A, Bud A. Towards a Bayesian model for keyhole plan recognition in large domains. In: Proceedings of the sixth international conference on user modeling. Sardinia, Italy, 1997. p. 365–376.
Charniak E, Goldman R. A Bayesian model of plan recognition. Artif Intell 1993;64(1):53–79.
Feng L, Guan X, Guo S, Gao Y, Liu P. Predicting the intrusion intentions by observing system call sequences. Comput Secur 2004;23(3):241–52.
Forrest S, Hofmeyr SA, Somayaji A, Longstaff TA. A sense of self for Unix processes. In: Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy; 1996. p. 120–8.
Friedman N, Geiger D, Goldszmidt M. Bayesian network classifiers. Mach Learn 1997;29(2–3):131–63.
Geib C, Goldman R. Plan recognition in intrusion detection systems. In: DARPA Information Survivability Conference and Exposition (DISCEX) vol. 1; 2001. p. 46–55.
Goldman RP, Geib CW, Miller CA. A new model of plan recognition. In: Proceedings of the 1999 conference on uncertainty in artificial intelligence, 1999. p. 245–54.
Hu W, Liao Y, Vemuri V. Robust support vector machines for anomaly detection in computer security. In: Proceeding of the 2003 International Conference on Machine Learning and Applications (ICMLA03). California: Los Angeles; 2003.
Huang M, Wicks T. A large-scale distribution intrusion detection framework based on attack strategy analysis. Comput Networks 1999;31(23–24):2465–75.
Lee W, Stolfo S. Data mining approaches for intrusion detection. In: Proceedings of the 7th USENIX Security Symposium. USENIX Association; 1998. p. 79–94.
Lee W, Xiang D. Information-theoretic measures for anomaly detection. In: Proceedings of the 2001 IEEE Symposium on Security and Privacy; 2001.
Liao Y, Vemuri V. Use of k-nearest neighbor classifier for intrusion detection. Comput Secur 2002;21(5):439–48.
Nicholson AE, Brady JM. Dynamic belief networks for discrete monitoring. IEEE Trans Syst Man and Cybern 1994;24(11):1593–610.
Sharma A, Pujari A, Paliwal K. Intrusion detection using text processing techniques with a kernel based similarity measure. Comput Secur 2007;26(7–8):488–95.

Wang W, Guan X, Zhang X. Profiling program and user behaviors for anomaly intrusion detection based on non-negative matrix factorization. In: Proceedings of 43rd IEEE Conference on Control and Decision (CDC2004). Paradise Island, Bahamas: Atlantis; 2004. p. 99–104.
Wang W, Guan X, Zhang X, Yang L. Profiling program behavior for anomaly intrusion detection based on the transition and frequency property of computer audit data. Comput Secur 2006;25(7):539–50.
Wang W, Guan X, Zhang X. Processing of massive audit data streams for real-time anomaly intrusion detection. Comput Commun 2008;31(1):58–72.
Warrender C, Forrest S, Pearlmutter B. Detecting intrusions using system calls: alternative data models. In: Proceedings of 1999 IEEE Symposium on Security and Privacy; 1999. p. 133–45.
Wærn A, Stenborg O. Recognizing the plans of a replanning user. In: Proceedings of the IJCAI95 workshop on the next generation of plan recognition systems: challenges for and insight from related areas of AI. Montreal, Canada, 1995. p. 113–8.
Yeung D, Ding Y. Host-based intrusion detection using dynamic and static behavioral models. Pattern Recognition 2003;36(1):229–43.
