Terran Lane
School of Electrical and Computer Engineering and
CERIAS
Purdue University, West Lafayette, IN 47907-1287
email: terran@ecn.purdue.edu
[Figure 2 plots: (a) Accuracy and (b) Time to Alarm (logarithmic axis), each plotted against Tested User, USER0 to USER7.]
Figure 2: Accuracies, (a), and mean times-to-alarm, (b), for an HMM model (K = 50) of USER0's behaviors.
rate parameter, r, was tested across the range 0.5%–10%, all of the observed false alarm rates are greater than this (8.3%–18.3%). This is a result of the training and parameterization data failing to fully reflect the behavioral distribution present in the testing data. Because the user has changed behaviors or tasks over the interval between the generation of training and testing data, the profile does not include all of the behaviors present in the test data. This phenomenon is actually exacerbated by the batch-mode experimental setup used here. We have investigated online techniques for this domain which employ an instance based learning technique (IBL) [Lane and Brodley, 1998], and have found that they do perform better than the corresponding batch-mode IBL sensors. In future work, we will be investigating techniques for online versions of the HMM user modeling sensor.

The complete set of results for all profiles and folds for the HMM user model with K = 50 is shown in Figure 3. These plots are intended not as a reference for individual accuracy or time-to-alarm (TTA) values, but to convey a sense of the general performance of the anomaly detection sensor under different operating conditions and to highlight some behavioral characteristics of the detection system. In these plots, each column displays the results for a single user's profile (the same data as are displayed for USER0 in Figure 2). Now, however, all three folds are given for each profile.

The primary point of interest in these plots is that true acceptance abilities (the ability to correctly identify the profiled user as him or herself) are generally good, as evidenced by high accuracies and long times to generation of false alarms (i.e. the "o" symbols are clustered toward the top of each Y axis). In addition, the true detection abilities (the ability to correctly identify that an imposter is not the profiled user) are generally fair to good, as evidenced by reasonable accuracies and short times to generation of true alarms. Note that mediocre true detection abilities may be acceptable because each intruder need be caught only once.

The obvious and notable exception to the general performance trends is USER4. While the sensor displays strong true accept abilities with respect to this user, it provides only poor true detection abilities. This is an example of the decision thresholds (t_max and t_min, as described in Section 2.4) being set to artificially extreme values, resulting in a spuriously large acceptance region. Thus, the system has effectively decided that "everything is USER4", and no real differentiation is being done: it is simply accepting most behaviors as normal. Examination of USER4's training data reveals that this user appears to devote entire shell sessions to single tasks (such as the compile-debug cycle), which appear as rather repetitious and monotonous patterns. Because this user is working in the X-Windows environment, tasks can be assigned to single shell sessions, and those shell sessions may be long-lived (some were over 2,000 commands). Thus, the training data may display only one or two sessions and a very small number of behaviors, while the parameter selection data displays a different (but also small) set of behaviors. Because there may be little overlap between training and parameter selection data, the observed similarity-to-profile frequency distribution may be distorted, and the selected decision thresholds would then be poorly chosen.

A converse behavior occurs with Profile 1 on fold 2 (the set of circles at the lowest end of Profile 1 in Figure 3). This profile displays relatively low true accept rates in comparison to other profiles and folds, but very high true detect rates (often 100%). This is an example of the user model deciding that "nothing is USER1" because the acceptance region has been set too narrowly. As with USER4, this arises because different behaviors are displayed in the training and testing data. In this case, the parameter selection data reflects the training data well, but the test data is different from both of them. As a result, the acceptance range is narrowly focused on high-similarity behaviors, but the behaviors encountered in the testing data have lower similarity.
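The effect of the two decision thresholds on the two kinds of accuracy can be made concrete with a small sketch. It assumes, hypothetically, that a similarity-to-profile score is accepted as normal exactly when t_min <= score <= t_max (the precise rule of Section 2.4 is not reproduced here), and it treats true accept and true detect accuracy as simple fractions of accepted or flagged scores. The function names and the toy scores are illustrative only; they are not data from these experiments.

```python
from typing import Sequence


def accepted(score: float, t_min: float, t_max: float) -> bool:
    """Assumed acceptance rule: a score counts as 'normal' iff it lies in [t_min, t_max]."""
    return t_min <= score <= t_max


def true_accept_accuracy(user_scores: Sequence[float], t_min: float, t_max: float) -> float:
    """Fraction of the profiled user's own scores that are (correctly) accepted."""
    return sum(accepted(s, t_min, t_max) for s in user_scores) / len(user_scores)


def true_detect_accuracy(imposter_scores: Sequence[float], t_min: float, t_max: float) -> float:
    """Fraction of an imposter's scores that are (correctly) flagged as anomalous."""
    return sum(not accepted(s, t_min, t_max) for s in imposter_scores) / len(imposter_scores)


# Hypothetical toy similarity scores, chosen only to illustrate the trade-off.
user = [0.8, 0.7, 0.9, 0.6]
imposter = [0.3, 0.5, 0.65, 0.2]

regions = {
    "too wide ('everything is the user')": (0.0, 1.0),
    "too narrow ('nothing is the user')": (0.85, 1.0),
    "balanced": (0.55, 1.0),
}
for label, (t_min, t_max) in regions.items():
    ta = true_accept_accuracy(user, t_min, t_max)
    td = true_detect_accuracy(imposter, t_min, t_max)
    print(f"{label}: true accept = {ta:.2f}, true detect = {td:.2f}")
```

With these toy scores the wide region gives true accept 1.00 and true detect 0.00 (the USER4 pattern), the narrow region gives 0.25 and 1.00 (the Profile 1, fold 2 pattern), and the balanced region gives 1.00 and 0.75.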
[Figure 3 plots: (a) Accuracy and (b) Time to Alarm (logarithmic axis), each plotted against Profile 0 to Profile 7.]
Figure 3: Results for all user profiles and folds. Each column now displays a single profile tested against all test sets (i.e. each column is the equivalent of Figure 2).
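Panel (b) of Figures 2 and 3 reports mean times-to-alarm. As a hedged illustration only, the sketch below assumes that the sensor emits one accept/alarm decision per token and that a time-to-alarm is the number of tokens elapsed since the previous alarm (or since the start of the stream); the definition actually used for these figures may differ, and the function name is hypothetical. Under this assumption, long values are desirable on the profiled user's own data (rare false alarms) and short values are desirable on an imposter's data (prompt true alarms).

```python
from typing import List, Optional, Sequence


def mean_time_to_alarm(alarms: Sequence[bool]) -> Optional[float]:
    """Mean number of tokens between successive alarms (hypothetical TTA definition).

    alarms[i] is True if the sensor raised an alarm on token i. The first gap is
    measured from the start of the stream. Returns None if no alarm was raised.
    """
    gaps: List[int] = []
    previous = -1  # index of the previous alarm; -1 stands for "start of stream"
    for i, raised in enumerate(alarms):
        if raised:
            gaps.append(i - previous)
            previous = i
    if not gaps:
        return None
    return sum(gaps) / len(gaps)


# Alarms on tokens 2 and 4 give gaps of 3 and 2 tokens, so the mean is 2.5.
print(mean_time_to_alarm([False, False, True, False, True]))
```

Because such gaps can span several orders of magnitude (the TTA axes are logarithmic), a plain arithmetic mean taken across users is sensitive to a single user with very large values.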
4.2 Number of Hidden States

An open question in the use of HMMs for modeling is the choice of K, the number of hidden states. When the states have a clear domain interpretation, as for example in fault monitoring, the value of K may be naturally dictated by the domain. When K is not so conveniently available, however, we can employ an empirical analysis to discover an appropriate value. To examine the impact of K on sensor performance, we constructed models with K ∈ {1, 2, 15, 30} and tested them under the same conditions used for K = 50. The case K = 1 is a degenerate form of an HMM, equivalent to frequency estimation of the alphabet symbols with all time steps of the sequence data considered to be statistically independent. Effectively, the data is considered to have been generated by a multinomial process with |Σ| elements drawn according to the distribution B (the output symbol generation distribution). Because the K = 1 case has different qualitative behaviors than the other cases, we discuss it separately.

Results for the K = 1 case are displayed in Figure 4. These figures are comparative, plotting the results for the K = 1 mode on the vertical axis versus results for the K = 50 mode on the horizontal axis. The diagonal line is the iso-performance surface; points falling above it indicate higher performance by the K = 1 sensor, while points falling below it (to its right) indicate higher performance by the K = 50 sensor.

The general result of Figure 4 is that the 50-state HMM has much stronger true detection accuracies and TTAs. Although the true accept points are scattered more uniformly across the iso-performance surface (61 of the 120 true accept accuracy measurements fall on the K = 50 side of the line), the K = 50 system appears to have a slight margin, at an average of 1% higher true accept accuracy⁵ than that reported by the K = 1 system.

⁵ To measure the relative accuracy performance between two systems, we employ a mean of accuracy value differences. Thus, the difference in true detect rates between method 1 and method 2 is:

$$\frac{1}{N} \sum_{t \in \text{opponent test sets}} \bigl(\mathrm{accuracy}_{\mathrm{method1}}(t) - \mathrm{accuracy}_{\mathrm{method2}}(t)\bigr)$$

where N is the number of opponent test sets.

[Figure 4 plots: (a) Comparative Accuracies and (b) Comparative Mean Times to Alarm (logarithmic axes), with K=1 on the vertical axis and K=50 on the horizontal axis.]
Figure 4: Comparisons of HMM user models with K = 1 (vertical axis) to K = 50 (horizontal axis). Accuracies appear in (a) and TTAs in (b). The "o" symbols denote true accept rates and mean times to false alarms, and the "+" symbols denote true detect rates and mean times to true alarms.

At first appearance, these results, while slight, seem to indicate at least that the K = 50 sensor is performing no worse than the K = 1 sensor in terms of true accept accuracy and better in terms of true detection. The situation becomes somewhat more confused, however, when mean time-to-alarm is considered. In this dimension, the K = 50 model has superior time to false alarm, at an average of 15.6 tokens longer than K = 1, but inferior time to true alarm, at 36.9 tokens longer. It turns out that this is skewed by USER4 (note that the logarithmic range of the TTA data allows a single user to significantly skew a simple additive mean). While the K = 1 model also suffers from the "everything is USER4" syndrome, it does so to a much lesser degree than does the K = 50 model and, thus, appears to be far more effective at separating other users from USER4. When USER4 is removed from the sample, the differences between K = 50 and K = 1 in the TTA domain favor K = 50, for which the mean TTA is 14.6 tokens longer for false alarms and 14.4 tokens shorter for true alarms.

Results comparing the sensor system at K = 2 and K = 30 to K = 50 are given in Figure 5 (we omit K = 15, as it falls on the spectrum between K = 2 and K = 30 but is not otherwise unusual). Again, values for K = 50 are plotted on the horizontal axis while values for other settings of K appear on their respective vertical axes.

Figure 5 reflects a trend which is most dramatic in the K = 2 plots and which becomes less pronounced as K increases. The general result is that the K = 50 sensor has superior or equivalent true accept accuracies, but inferior true detect accuracies (albeit by a narrow margin, on average). The qualitative result in the TTA domain is similar, but the aggregate results are skewed by a few of the tested users: in this case USER0, USER1, and
USER5. The structure of this trend is open to multiple interpretations. The "improved true accept coupled with degraded true detect performance" can be viewed as an indication that K = 50 is subject to the "everybody is the profiled user" difficulty with respect to smaller values of K. We can, however, take the converse interpretation that the models with smaller values of K are evidencing a "nobody is the profiled user" problem. Thus, the spectrum of values of K represents a spectrum of tradeoffs between user-oriented (at large values of K) and imposter-oriented (at small values of K) performance. This observation is compatible with the interpretation that the models with larger K's are encoding a broader range of user behaviors than are the smaller models, although more investigation is required to verify this hypothesis.

In general, the optimal number of hidden states for maximum discriminability is user dependent and seems to be related to the syntactic complexity displayed in the user's data. For example, USER4's data, which is extremely repetitive and employs only simple shell commands, is best modeled by a single-state model, while USER7's data, which displays some complex shell actions such as multi-stage pipelines, is best modeled by a 15-state HMM.

5 Extensions and Implications

The techniques presented here are not limited solely to the domain of anomaly detection nor to explicitly security-oriented tasks. A number of other possible uses could be realized with straightforward modifications to this framework.

- User Identification: The most obvious extension to this work is the capacity to identify one particular user from a set of known users solely through behavioral characteristics. This use, however, is also mostly of security interest, as methods such as passwords or physical tokens can identify users more quickly and accurately when they can be trusted. The online monitoring approach to identification serves mainly as a verification of and backup to primary identification techniques.
- Group Identification: A more visibly useful extension is to identify users as members of groups rather than as individuals. By constructing models of a group's exemplar behaviors, an individual can be automatically assigned to a group and inherit environmental customizations appropriate to that group's needs.
- Behavioral Identification: At a finer grain, a user's behaviors may be segmented by class (e.g. writing, play, coding, web surfing). Such an approach has been examined with manually constructed HMMs by Orwant [Orwant, 1995]. By analyzing the substructure of the interconnections in an automatically generated HMM, behavioral classes might be automatically identified and associated with appropriate responses for a user interface.
- Behavioral Prediction: HMMs can be run not only as observational models but also as generative models. In such a framework, they could be used to predict a user's next actions and provide time-saving shortcuts (such as opening menus or initiating expensive computations early).

The observation that USER4, for example, displays qualitatively different behaviors than do other users (because USER4 is modeled more effectively by the single-state model while the other users are modeled more effectively by the multi-state models) indicates that the HMM framework is capable of discerning some types of behavioral groupings. The results on the choice of K are also consistent with the interpretation that users fall along a spectrum of behavioral complexities which can be identified by models of differing complexity. Under the privacy-oriented framework employed here, it is difficult to employ some of this knowledge because cross-validation testing is impossible, but in a less constrained setting these types of distinctions could be extracted fairly easily and used to assist the user.

[Figure 5 plots: (a) Comparative Accuracies and (b) Comparative Mean Times to Alarm for K=2; (c) Comparative Accuracies and (d) Comparative Mean Times to Alarm for K=30; in each panel K=50 is on the horizontal axis.]
Figure 5: Comparisons of K = 2, (a) and (b), and K = 30, (c) and (d), (on their respective vertical axes) to K = 50 (horizontal axis).

6 Conclusions and Future Work

We have demonstrated the use of hidden Markov models for user profiling in the domain of anomaly detection. The key results of the empirical investigation are:

- HMMs can be used to identify users by their command line behavioral patterns.
- These models suffer from two general classes of errors: overly permissive ("everybody is the profiled user") and overly restrictive ("nobody is the profiled user").
- The number of hidden states in the HMM represents a spectrum of tradeoffs between these two error classes. Larger models were found to be more effective at identifying the valid user, while smaller models were generally better at discerning imposters.
- The optimal number of hidden states is user-dependent and appears to reflect a measure of the syntactic complexity present in the command line data.

We found that single-state HMM models (effectively token frequency estimation models) display qualitatively different behaviors than do multi-state models. For most of the profiled users, the multi-state models were more effective than the single-state model. The exception was USER4, whose data consisted of long sessions, each of which encoded a small number of tasks.

An open problem for this user profiling technique is the ability to select appropriate model parameters (such as K) from data or prior knowledge. In supervised learning domains, cross-validation search may be used to select appropriate parameter settings, but we must seek unsupervised techniques for this domain. The observation that the optimal choice of K seems to be related to behavioral complexity presents a potential approach to this problem. It is possible that measures of data complexity, such as entropy, could be used to select appropriate model parameters.

Finally, the sensor employed here functions off-line. In other work [Lane and Brodley, 1998], we have found that on-line extensions to an instance based user modeling sensor displayed heightened performance. We are currently investigating extensions of the HMM anomaly detection sensor to on-line mode, and expect that similar performance improvements will be realized by this change.

Acknowledgments

Portions of this work were supported by contract MDA904-97-C-0176 from the Maryland Procurement Office, and by sponsors of the Center for Education and Research in Information Assurance and Security, Purdue University. We would like to thank Carla Brodley, Craig Codrington, and our reviewers for their helpful comments on this work. We would also like to thank our data donors and, especially, USER4, whose data forced us to examine this domain more closely than we might otherwise have done.

References

[Anderson, 1980] J. P. Anderson. Computer security threat monitoring and surveillance. Technical Report, Washington, PA, 1980.
[Angluin, 1987] D. Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75:87–106, 1987.
[Casella and Berger, 1990] G. Casella and R. L. Berger. Statistical Inference. Brooks/Cole, Pacific Grove, CA, 1990.
[Chenoweth and Obradovic, 1996] T. Chenoweth and Z. Obradovic. A multi-component nonlinear prediction system for the S&P 500 index. Neurocomputing, 10(3):275–290, 1996.
[Davison and Hirsh, 1998] B. D. Davison and H. Hirsh. Predicting sequences of user actions. In Proceedings of the AAAI-98/ICML-98 Joint Workshop on AI Approaches to Time-series Analysis, pages 5–12, 1998.
[Denning, 1987] D. E. Denning. An intrusion-detection model. IEEE Transactions on Software Engineering, 13(2):222–232, 1987.
[Forrest et al., 1996] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff. A sense of self for Unix processes. In Proceedings of 1996 IEEE Symposium on Computer Security and Privacy, 1996.
[Fukunaga, 1990] K. Fukunaga. Statistical Pattern Recognition (second edition). Academic Press, San Diego, CA, 1990.
[Lane and Brodley, 1997] T. Lane and C. E. Brodley. Sequence matching and learning in anomaly detection for computer security. In Proceedings of AAAI-97 Workshop on AI Approaches to Fraud Detection and Risk Management, pages 43–49, 1997.
[Lane and Brodley, 1998] T. Lane and C. E. Brodley. Approaches to online learning and concept drift for user identification in computer security. In Fourth International Conference on Knowledge Discovery and Data Mining, pages 259–263, 1998.
[Norton, 1994] S. W. Norton. Learning to recognize promoter sequences in E. coli by modelling uncertainty in the training data. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 657–663, Seattle, WA, 1994.
[Oppenheim and Schafer, 1989] A. Oppenheim and R. Schafer. Discrete-Time Signal Processing. Signal Processing. Prentice Hall, Englewood Cliffs, New Jersey, 1989.
[Orwant, 1995] J. Orwant. Heterogeneous learning in the Doppelganger user modeling system. User Modeling and User-Adapted Interaction, 4(2):107–130, 1995.
[Provost and Fawcett, 1998] F. Provost and T. Fawcett. Robust classification systems for imprecise environments. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, Madison, WI, 1998. AAAI Press.
[Quinlan, 1993] J. R. Quinlan. C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo, CA, 1993.
[Rabiner and Juang, 1993] L. Rabiner and B. H. Juang. Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, New Jersey, 1993.
[Rabiner, 1989] L. R. Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), February 1989.
[Rivest and Schapire, 1989] R. L. Rivest and R. E. Schapire. Inference of finite automata using homing sequences. In Proceedings of the Twenty First Annual ACM Symposium on Theoretical Computing, pages 411–420, 1989.
[Salzberg, 1995] S. Salzberg. Locating protein coding regions in human DNA using a decision tree algorithm. Journal of Computational Biology, 2(3):473–485, 1995.
[Smyth, 1994a] P. Smyth. Hidden Markov monitoring for fault detection in dynamic systems. Pattern Recognition, 27(1):149–164, 1994.
[Smyth, 1994b] P. Smyth. Markov monitoring with unknown states. IEEE Journal on Selected Areas in Communications, special issue on intelligent signal processing for communications, 12(9):1600–1612, 1994.
[Srikant and Agrawal, 1996] R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), Avignon, France, 1996.
[Yoshida and Motoda, 1996] K. Yoshida and H. Motoda. Automated user modeling for intelligent interface. International Journal of Human-Computer Interaction, 8(3):237–258, 1996.